230215-nlp-common-task.md

File metadata and controls

100 lines (87 loc) · 4.43 KB

Common NLP tasks:

  • Classification

    • Text Classification: assigning a category to a sentence or document (e.g., spam filtering).
    • Sentiment Analysis: identifying the polarity of a piece of text (e.g., positive, negative, or neutral).
  • Information Retrieval and Document Ranking

    • Sentence/document similarity: determining how similar two texts are.
    • Question Answering: the task of answering a question in natural language.
  • Text-to-Text Generation

    • Machine Translation: translating from one language to another.
    • Text Generation: creating text that appears indistinguishable from human-written text.
    • Text Summarization: creating a shortened version of one or more documents that preserves most of their meaning.
    • Text Simplification: making a text easier to read and understand, while preserving its main ideas and approximate meaning.
    • Lexical Normalization: transforming non-standard text (e.g., informal spellings or abbreviations) into a standard register.
    • Paraphrase Generation: creating an output sentence that preserves the meaning of the input but varies its word choice and grammar.
  • Knowledge bases, entities and relations

    • Relation extraction: extracting semantic relationships from a text. Extracted relationships usually hold between two or more entities and fall into specific semantic categories (e.g., lives in, sister of).
    • Relation prediction: identifying a named relation between two named semantic entities.
    • Named Entity Recognition: tagging entities in text with their corresponding type, typically in BIO notation.
    • Entity Linking: recognizing named entities and disambiguating them by linking each one to its entry in a knowledge base (typically Wikidata).
  • Topics and Keywords

    • Topic Modeling: identifying abstract “topics” underlying a collection of documents.
    • Keyword Extraction: identifying the most relevant terms to describe the subject of a document.
  • Chatbots

    • Intent Detection: capturing the semantics behind messages from users and assigning them to the correct label.
    • Slot Filling: extracting the values of certain types of attributes (or slots, such as cities or dates) for a given entity from text.
    • Dialog Management: managing the state and flow of conversations.
  • Text Reasoning

    • Common Sense Reasoning: using “common sense” or world knowledge to make inferences.
    • Natural Language Inference: determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
  • Fake News and Hate Speech Detection

    • Fake News Detection: detecting and filtering out texts containing false and misleading information.
    • Stance Detection: determining an individual’s reaction to a primary actor’s claim. It is a core part of a set of approaches to fake news assessment.
    • Hate Speech Detection: detecting whether a piece of text contains hate speech.
  • Text-to-Data and vice versa

    • Text-to-Speech: technology that reads digital text aloud.
    • Speech-to-Text: transcribing speech to text.
    • Text-to-Image: generating photo-realistic images that are semantically consistent with a given text description.
    • Data-to-Text: producing text from non-linguistic input, such as databases of records, spreadsheets, and expert system knowledge bases.
  • Text Preprocessing

    • Coreference Resolution: clustering mentions in text that refer to the same underlying real-world entities.
    • Part-of-Speech (POS) tagging: tagging each word in a text with its part of speech. A part of speech is a category of words with similar grammatical properties, such as noun, verb, adjective, adverb, pronoun, preposition, or conjunction.
    • Word Sense Disambiguation: associating words in context with their most suitable entry in a pre-defined sense inventory (typically WordNet).
    • Grammatical Error Correction: correcting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors.
    • Feature Extraction: extracting generic numerical features from text, usually embeddings.
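
The BIO notation mentioned under Named Entity Recognition tags the first token of an entity with `B-`, each following token of the same entity with `I-`, and every token outside an entity with `O`. The following is a minimal Python sketch that assumes entities arrive as hypothetical `(start, end, label)` token spans with an exclusive end. The `PER`/`LOC` labels and the helper name `spans_to_bio` are illustrative choices, not taken from any specific library.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return tags


tokens = ["Barack", "Obama", "visited", "Paris", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(spans_to_bio(tokens, spans))  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
```

The separate `B-` prefix lets a tagger tell apart two adjacent entities of the same type, which a plain inside/outside scheme cannot do.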
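
For Sentence/document similarity, one simple lexical baseline is cosine similarity between bag-of-words count vectors. This is an illustration chosen here, not a method named in these notes. Embedding-based approaches often apply the same cosine formula to dense vectors instead of word counts. Below is a minimal standard-library sketch, where the tokenization (lowercased `\w+` matches) is an assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two texts' word-count vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("The cat sat on the mat", "The cat lay on the mat"), 3))  # 0.875
print(cosine_similarity("The cat sat on the mat", "Stock prices fell sharply"))  # 0.0
```

This baseline counts only shared words, so it rates paraphrases that use different vocabulary as dissimilar. That limitation is the usual motivation for similarity based on embeddings.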
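
For Keyword Extraction, a common unsupervised baseline ranks a document's terms by TF-IDF, which this sketch uses as an illustration rather than a method taken from the notes. Under TF-IDF, terms that are frequent in the document but rare across the collection score highest. The sketch below assumes a tiny hypothetical stopword list and breaks score ties alphabetically. The helper `top_keywords` is also hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list for the example
STOPWORDS = {"a", "an", "the", "as", "in", "of", "after", "and", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs: List[str], doc_index: int, k: int = 3) -> List[str]:
    """Rank terms of docs[doc_index] by TF-IDF against the whole collection."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term appears
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(len(docs) / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for reproducible output
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    "football fans watched the football match as the striker scored",
    "the bank raised the interest rate",
    "the match ended after a late goal",
]
print(top_keywords(docs, 0))  # ['football', 'fans', 'scored']
```

In this example, `match` ranks below the other content words of the first document. It also appears in the third document, which lowers its inverse document frequency.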