Text Summarization
1 Types of Tasks
1.1 Applications
- outlines or abstracts
- summaries of email threads
- action items
- text simplification
1.2 Target
- Single-document summarization: abstract, outline, headline
- Multi-document summarization
  - stories retrieved about the same event
  - web pages about a topic or question
2 Methods
2.1 Single Document Summarization
2.1.1 Generating Summaries
- Content selection: retrieving the relevant sentences.
  Supervised methods have not been shown to clearly outperform unsupervised ones, so unsupervised methods are more common.
  - tf-idf
  - topic signature (informative words)
    - likelihood
    - mutual information
- Information ordering: reordering the selected sentences, usually by keeping the document order.
- Sentence realization: cleaning up the sentences, usually by using the original sentences unchanged.
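The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the names `tokenize` and `summarize` are hypothetical, the tokenizer is a simple regex, and each sentence is treated as its own "document" when computing idf.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokenizer (an assumption; any tokenizer would do)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(sentences, k=2):
    """Return the k sentences with the highest average tf-idf weight."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokens for w in set(toks))

    def score(toks):
        if not toks:
            return 0.0
        tf = Counter(toks)
        # Average over the sentence length so that long sentences are
        # not favored merely for being long.
        return sum(tf[w] * math.log(n / df[w]) for w in tf) / len(toks)

    # Content selection: rank sentence indices by tf-idf score.
    ranked = sorted(range(n), key=lambda i: score(tokens[i]), reverse=True)
    # Information ordering: restore the original document order.
    chosen = sorted(ranked[:k])
    # Sentence realization: reuse the original sentences unchanged.
    return [sentences[i] for i in chosen]
```

Words that occur in every sentence get an idf of zero here, so a sentence made up only of very common words scores low and is unlikely to be selected.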
2.1.2 Evaluation
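The ROUGE-n method evaluates a generated summary by counting the n-grams it shares with human-written reference summaries. The score is the fraction of reference n-grams that also appear in the generated summary, so it is a recall-oriented measure.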
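A minimal sketch of the ROUGE-n computation, assuming whitespace tokenization and lowercasing (both simplifications; the function names `ngrams` and `rouge_n` are chosen for this example):

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """ROUGE-n recall of a candidate summary against reference summaries."""
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped match count: an n-gram is credited at most as many
        # times as it occurs in the candidate.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, with the candidate "the cat sat on the mat" and the single reference "the cat was on the mat", three of the five reference bigrams are matched, so `rouge_n(..., n=2)` gives 0.6.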
2.2 Multi-document Summarization (Question Answering)
Two kinds of methods: top-down (information extraction) and bottom-up (snippets).
2.2.1 Snippet
- Splitting documents into sentences
- Simplifying sentences (trimming parse trees, rule-based)
- Extracting sentences (Maximal Marginal Relevance)
  - Relevant to the query
  - Novel with respect to the sentences already extracted
- Ordering
  - Chronological ordering
  - Coherence (neighboring sentences are similar or discuss the same entity)
  - Topic ordering (learn the order of topics from the source documents)
- Sentence realization
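The extraction step can be illustrated with a small greedy Maximal Marginal Relevance loop. This is a sketch under assumptions: similarity is plain bag-of-words cosine, `lam` is a hypothetical name for the trade-off weight between relevance and novelty, and the helper names are invented for the example.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, sentences, k=2, lam=0.7):
    """Greedily pick k sentences that are relevant to the query and novel."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            relevance = cosine(vecs[i], q)
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` the loop reduces to ranking by query relevance alone; lowering `lam` increasingly penalizes sentences that repeat what has already been extracted.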
2.2.2 Information extraction (domain-specific answering)
- Predefined forms for a specific domain (e.g., the forms predefined in BioNLP)
  - Biography of a person (birth/death, fame factor, education, nationality)
  - Definition (hypernym)
  - Medical answer (problem, intervention, outcome)
- Framework
  - Document retrieval
  - Predicate identification ?
  - Data-driven analysis ?
  - Definition creation
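To make the idea of a predefined form concrete, here is a toy sketch that fills a biography form from sentences. Everything in it is an assumption made for illustration: the `Biography` class, its slot names, and the two hand-written regular expressions are hypothetical and far simpler than what a real information extraction system would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Biography:
    """Hypothetical form for the 'biography of a person' template."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    nationality: Optional[str] = None


# Hand-written patterns, one per slot (purely illustrative).
LIFESPAN = re.compile(r"\((\d{4})\s*-\s*(\d{4})\)")
NATIONALITY = re.compile(r"\bwas an? ([A-Z][a-z]+) ")


def fill_biography(name, sentences):
    """Fill the form's slots from sentences that mention the person."""
    form = Biography(name=name)
    for s in sentences:
        if name not in s:
            continue  # crude stand-in for the retrieval step
        m = LIFESPAN.search(s)
        if m and form.birth_year is None:
            form.birth_year = int(m.group(1))
            form.death_year = int(m.group(2))
        m = NATIONALITY.search(s)
        if m and form.nationality is None:
            form.nationality = m.group(1)
    return form
```

Slots that no pattern matches stay `None`, which keeps missing information explicit rather than guessed.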