Automatic Text Summarization Based on the Global Document Annotation

Katashi NAGAO
Graduate School of Information Science, Nagoya University


The GDA (Global Document Annotation) project proposes a tag set that allows machines to automatically infer the underlying semantic and pragmatic structure of documents. Its objective is to promote the development and spread of NLP/AI applications that turn GDA-tagged documents into versatile and intelligent content, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring. This paper discusses automatic text summarization based on GDA. Its main features are a domain- and style-independent algorithm and personalized summarization that reflects readers' interests and preferences. Our approach naturally outperforms traditional summarization methods, which simply pick out sentences that score highly on superficial clues such as word counts. To calculate the importance score of a text element, the algorithm uses spreading activation on an intra-document network that connects text elements via thematic, rhetorical, and coreferential relations. The proposed method is flexible enough to dynamically generate summaries of various sizes. A summary browser supporting personalization is also reported.
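
To make the idea of spreading activation over an intra-document network concrete, the following is a minimal sketch and not the algorithm as implemented in this work. It assumes that text elements are nodes, that thematic, rhetorical, and coreferential relations are undirected weighted links, and that a reader's interests enter as per-node external input. The function names (`spread_activation`, `summarize`), the decay constant, the update rule, and the link weights are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def spread_activation(edges, external, decay=0.5, iterations=50):
    """Return an importance score for every text element (illustrative only).

    edges:    iterable of (node_a, node_b, weight) undirected links, e.g. a
              coreference or rhetorical relation between two elements
    external: dict node -> external input (assumed here to model reader interest)
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))
    nodes = set(neighbors) | set(external)

    # Total link weight per node, used so that each node shares its
    # activation among its neighbors in proportion to link weight.
    out_weight = {n: sum(w for _, w in neighbors[n]) for n in nodes}

    activation = {n: external.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            incoming = sum(
                w / out_weight[m] * activation[m] for m, w in neighbors[n]
            )
            # With decay < 1 this update is a contraction, so it converges.
            updated[n] = external.get(n, 0.0) + decay * incoming
        activation = updated
    return activation


def summarize(elements, scores, size):
    """Keep the `size` highest-scoring elements, in document order."""
    ranked = sorted(elements, key=lambda e: scores.get(e, 0.0), reverse=True)
    chosen = set(ranked[:size])
    return [e for e in elements if e in chosen]


# Hypothetical three-sentence document in which s1 is linked to s2 and s3.
document = ["s1", "s2", "s3"]
links = [("s1", "s2", 1.0), ("s1", "s3", 1.0)]

neutral = spread_activation(links, {s: 1.0 for s in document})
# A reader interested in s3 is modeled by giving it more external input.
personal = spread_activation(links, {"s1": 1.0, "s2": 1.0, "s3": 2.0})
```

Because every element receives a score, a summary of any requested size can be produced by changing only the `size` argument of `summarize`, and personalization amounts to changing the external input before the scores are computed.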