Text Summarization

Table of Contents

1. Type of Tasks
- 1.1. Applications
- 1.2. Target
2. Methods
- 2.1. Single Document Summarization
  - 2.1.1. Generating Summarization
  - 2.1.2. Evaluation
- 2.2. Multi-document Summarization (Question Answering)
  - 2.2.1. Snippet
  - 2.2.2. Information extraction (domain-specific answering)

1 Type of Tasks

1.1 Applications

Outlines or abstracts
Summaries of email threads
Action items
Text simplification

1.2 Target

Single-document summarization: abstract, outline, headline
Multi-document summarization:
- retrieve stories on the same event
- web pages about some topics or questions

2 Methods

2.1 Single Document Summarization

2.1.1 Generating Summarization

Content Selection: retrieving relevant sentences.
Supervised methods have not proven better than unsupervised ones, so unsupervised methods are more common.
- tf-idf
- topic signature (informative words)
  - log-likelihood ratio
  - mutual information
Information Ordering: reordering the sentences, usually keeping the document order.
Sentence Realization: cleaning up the sentences, usually using the original sentences.
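
The three steps above can be sketched as a minimal unsupervised extractive summarizer. This is only an illustration under stated assumptions, not a reference implementation: each sentence is treated as its own "document" when computing idf, a sentence's score is the average tf-idf weight of its words, and the names `tokenize` and `summarize` are hypothetical. Selected sentences are returned in document order and left unmodified.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs (an assumed, very simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Content selection: average tf-idf weight of the words in the sentence.
        total = sum(count * math.log(n / df[w]) for w, count in tf.items())
        scores.append(total / len(tokens))

    # Pick the top-k sentence indices by score.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Information ordering: restore document order.
    # Sentence realization: reuse the original sentences as they are.
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["the cat sat", "the the the", "the dog barked loudly"]
    print(summarize(doc, k=2))
```

Words that occur in every sentence get an idf of zero here, so they contribute nothing to a sentence's score; a real system would more likely estimate idf from a background corpus.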

2.1.2 Evaluation

ROUGE-N evaluates a generated summary by counting the n-grams it shares with human-written reference summaries.
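
A minimal sketch of the recall form of ROUGE-N, assuming whitespace tokenization and lowercasing (both are simplifications, and the function names are hypothetical). Matches are clipped so that an n-gram is credited at most as many times as it appears in the generated summary.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams."""
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each match by the count available in the candidate.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    human = ["the cat was on the mat"]
    print(rouge_n(generated, human, n=1))
    print(rouge_n(generated, human, n=2))
```

In this example three of the five reference bigrams ("the cat", "on the", "the mat") also occur in the generated summary, so the bigram score is 3/5.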

2.2 Multi-document Summarization (Question Answering)

Two kinds of methods: top-down (information extraction) and bottom-up (snippet).

2.2.1 Snippet

Splitting documents into sentences
Simplifying sentences (trimming parse trees, rule-based)
Extracting sentences (Maximal Marginal Relevance)
- Relevant to query
- Novel to extracted answers
Ordering
- Chronological ordering
- Coherence (neighboring sentences are similar or discuss the same entity)
- Topic ordering (learn the orders of topics from source files)
Sentence realization
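
The extraction step can be illustrated with a small greedy Maximal Marginal Relevance loop. This is a sketch under assumptions rather than a definitive implementation: sentences and the query are compared with cosine similarity over raw word counts, and the trade-off weight `lam` and the names `bow`, `cosine`, and `mmr_select` are hypothetical. Each round picks the sentence that is most relevant to the query and least similar to what has already been extracted.

```python
import math
import re
from collections import Counter


def bow(text):
    # Bag-of-words vector from lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(query, sentences, k=2, lam=0.7):
    """Greedily select k sentences, trading relevance against redundancy."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            # Relevant to the query ...
            relevance = cosine(vecs[i], q)
            # ... and novel with respect to the already extracted sentences.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]


if __name__ == "__main__":
    pool = ["cat food is cheap", "cat food is cheap today", "dogs like food"]
    print(mmr_select("cat food", pool, k=2, lam=0.5))
```

With `lam=1.0` the novelty term is switched off and the loop reduces to ranking by query relevance alone, which would return the two near-duplicate sentences in the example.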

2.2.2 Information extraction (domain-specific answering)

Predefined forms for a specific domain (e.g., the forms predefined in BioNLP).
- Biography of a person (birth/death, fame factor, education, nationality)
- Definition (hypernym)
- Medical answer (problem, intervention, outcome)
Framework
- Document retrieval
- Predicate identification ?
- Data-driven analysis ?
- Definition creation

Author: Xiao LIU

Created: 2014-10-29 Wed 18:05
