Text Summarization

1 Types of Tasks

1.1 Applications

  • outlines or abstracts
  • summaries of email threads
  • action items
  • simplifying text

1.2 Target

  • Single-document summarization: abstract, outline, headline
  • Multi-document summarization
    • retrieve stories on the same event
    • web pages about some topics or questions

2 Methods

2.1 Single Document Summarization

2.1.1 Generating Summaries

  • Content Selection: retrieving relevant sentences.
    Supervised methods do not clearly outperform unsupervised ones, so unsupervised methods are more common.
    • tf-idf
    • topic signature (informative words)
      • log-likelihood ratio
      • mutual information
  • Information Ordering: reordering the sentences, usually keeping the document order.
  • Sentence Realization: cleaning up the sentences, usually using the original sentences.
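
The three steps above can be sketched as a small unsupervised extractive summarizer. This is only a minimal illustration under stated assumptions, not a reference implementation: the function names `tokenize` and `summarize` are hypothetical, each sentence is treated as a "document" when computing idf, and a sentence is scored by its average tf-idf weight.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences, kept in document order.

    Assumption of this sketch: each sentence counts as one 'document'
    for the idf statistics.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences in which each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Term frequency over the whole input.
    tf = Counter(word for tokens in tokenized for word in tokens)

    def score(tokens):
        if not tokens:
            return 0.0
        # Average tf-idf weight, so long sentences are not favoured.
        return sum(tf[w] * math.log(n / df[w]) for w in tokens) / len(tokens)

    # Content selection: rank sentence indices by score.
    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    # Information ordering: restore the original document order.
    chosen = sorted(ranked[:k])
    # Sentence realization: reuse the original sentences unchanged.
    return [sentences[i] for i in chosen]
```

A word that occurs in every sentence gets an idf of zero, so a sentence made up only of such words is never preferred over one containing rarer words.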

2.1.2 Evaluation

ROUGE-N evaluates a generated summary by counting the n-grams it shares with human-written reference summaries.
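
A minimal sketch of this counting, assuming the recall-oriented form of the metric (matched n-grams divided by the total number of n-grams in the references) and plain whitespace tokenization. The names `ngrams` and `rouge_n` are illustrative and do not refer to any particular toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """Recall-oriented n-gram overlap of one candidate with the references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped overlap: an n-gram matches at most as many times
        # as it occurs in the candidate summary.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, `rouge_n("the cat sat on the mat", ["the cat was on the mat"], n=2)` shares three of the reference's five bigrams ("the cat", "on the", "the mat"), giving 0.6.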

2.2 Multi-document Summarization (Question Answering)

Two kinds of methods: top-down (information extraction) and bottom-up (snippet).

2.2.1 Snippet

  • Splitting sentences from documents
  • Simplifying sentences (trimming parse trees, rule-based)
  • Extracting sentences (Maximal Marginal Relevance)
    • Relevant to query
    • Novel to extracted answers
  • Ordering
    • Chronological ordering
    • Coherence (neighboring sentences are similar or discuss the same entity)
    • Topic ordering (learn the orders of topics from source files)
  • Sentence realization
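
The extraction step can be sketched as a greedy Maximal Marginal Relevance loop: at each step, pick the sentence that is most relevant to the query while being least similar to the sentences already chosen. This is a rough sketch under assumptions that are not in the notes: similarity is cosine over raw bag-of-words counts, the trade-off weight `lam` defaults to 0.5, and the helper names `bow`, `cosine`, and `mmr_select` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, sentences, k=2, lam=0.5):
    """Greedily pick k sentences, trading relevance against redundancy."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Relevant to the query.
            relevance = cosine(vecs[i], q)
            # Novel with respect to the answers extracted so far.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` the redundancy penalty vanishes and the loop reduces to ranking by query relevance alone. Lower values push the selection toward sentences that add new information.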

2.2.2 Information extraction (domain-specific answering)

  • Predefined forms for a specific domain (e.g., the forms predefined in BioNLP)
    • Biography of a person (birth/death, fame factor, education, nationality)
    • Definition (hypernym)
    • Medical answer (problem, intervention, outcome)
  • Framework
    • Document retrieval
    • Predicate identification ?
    • Data-driven analysis ?
    • Definition creation

Author: Xiao LIU

Created: 2014-10-29 Wed 18:05