The Structure of Scientific Articles
Applications to Citation Indexing and Summarization
Simone Teufel
Finding a particular scientific document amidst a sea of thousands of
other documents can often seem like an insurmountable task. The
Structure of Scientific Articles shows how linguistic theory can
provide a solution by analyzing rhetorical structures to make
information retrieval easier and faster.
Through the use of an improved citation indexing system, this
indispensable volume applies empirical discourse studies to pressing
issues of document management, including attribution, the author's
stance towards other work, and problem-solving processes.
Simone Teufel is a senior lecturer in the Natural Language and
Information Processing Group at the University of Cambridge Computer
Laboratory.
- 1 Introduction
- 1.1 Text Understanding and Information Management
- 1.2 Discourse Structure and Scientific Argument
- 1.3 Outline of this Book
- 2 Information Retrieval and Citation Indexes
- 2.1 Information Needs in Science
- 2.2 Keyword-Based Search
- 2.2.1 Information Retrieval Methods
- 2.2.2 Evaluation of Information Retrieval Systems
- 2.3 Citation-Based Search
- 2.3.1 The Citation System and Bibliometry
- 2.3.2 Citation Indexes and Search
- 3 Summarisation
- 3.1 Human Summarisation
- 3.1.1 Summary Journals and Professional Abstractors
- 3.1.2 Structure in Abstracts
- 3.2 Automatic Summarisation
- 3.2.1 Fact Extraction Methods
- 3.2.2 Text Extraction Methods
- 4 New Methods for Information Access
- 4.1 Rhetorical Extracts
- 4.2 Citation Maps
- 5 Experimental Corpora
- 5.1 Computational Linguistics (CmpLG)
- 5.1.1 Source
- 5.1.2 Properties
- 5.1.3 Citation behaviour
- 5.2 Chemistry
- 5.3 Genetics, Cardiology, Agriculture
- 5.4 SciXML
- 5.4.1 Description
- 5.4.2 Transformation from Source Formats
- 6 The Knowledge Claim Discourse Model (KCDM)
- 6.1 Overview of the Model
- 6.2 Level 0: Goals in Argumentation
- 6.3 Level 1: Rhetorical Moves
- 6.4 Level 2: Knowledge Claim Attribution
- 6.5 Level 3: Hinging
- 6.6 Level 4: Linearisation and Presentation
- 6.7 Traditional Intention-Based Discourse Models
- 7 Annotation Scheme Design
- 7.1 Fundamental Concepts
- 7.2 The KCA Scheme (Knowledge Claim Attribution)
- 7.3 The CFC Scheme (Citation Function Classification)
- 7.4 The AZ Scheme (Argumentative Zoning)
- 7.5 Alternative Scheme Definitions
- 8 Reliability Studies
- 8.1 Agreement Metrics, Ceilings and Baselines
- 8.2 Study I: Knowledge Claim Attribution (KCA)
- 8.3 Study II: Argumentative Zoning (AZ)
- 8.4 Study III: Argumentative Zoning, Untrained
- 8.5 Study IV: Citation Function Classification (CFC)
- 8.6 Post-Hoc Analyses of Study II Data
- 9 Meta-Discourse
- 9.1 Actions/States
- 9.2 Agents/Entities
- 9.3 Significance for Text Understanding
- 9.4 Practical Issues
- 9.4.1 Agent- and Action Recognition in Meta-Discourse
- 9.4.2 Ambiguous Mentions of Entities
- 9.4.3 Lexical Equivalence
- 9.5 Use of Meta-Discourse in the Literature
- 9.6 Cross-Discipline Differences in Meta-Discourse
- 10 Features
- 10.1 Entity-Based Meta-Discourse (Ent)
- 10.2 Action-Based Meta-Discourse (Act)
- 10.3 Formulaic Meta-Discourse (Formu, F-Strength, Formu-XXX)
- 10.4 Scientific Attribution (SciAtt-X)
- 10.5 Citations (Cit)
- 10.6 Tense, Voice and Aspect (Syn)
- 10.7 Category History (Hist)
- 10.8 Structural Indicators (Loc, Struct)
- 10.9 Content and Sentence Length (Cont, Len)
- 11 Automatic AZ, KCA and CFC
- 11.1 Feature Determination
- 11.2 Statistical Classification
- 12 Evaluation
- 12.1 Intrinsic Evaluation
- 12.1.1 Automatic AZ
- 12.1.2 Automatic KCA
- 12.1.3 Automatic CFC
- 12.2 Extrinsic Evaluation (AZ)
- 12.2.1 Experimental Design
- 12.2.2 Results
- 13 Applying the KCDM to Other Disciplines
- 13.1 Application to Chemistry
- 13.1.1 Domain Knowledge-Free Annotation
- 13.1.2 Argumentative Zoning II (AZ-II)
- 13.2 Variant AZ-Schemes
- 13.2.1 For Computer Science (Feltrim et al.)
- 13.2.2 For Biology (Mizuta and Collier)
- 13.2.3 For Astrophysics (Merity et al.)
- 13.2.4 For Legal Texts (Hachey and Grover)
- 13.3 Automatic Meta-Discourse Discovery
- 14 Outlook
- 14.1 Support Tools for Scientific Writing
- 14.2 Automatic Review Generation
- 14.3 Scientific Summaries Beyond Extraction
- 14.4 Digital Libraries and Robust AZ
- 15 Conclusions
- 15.1 An Interdisciplinary Project
- 15.2 Limitations
- A CmpLG-D Articles
- B DTD for SciXML
- C Guidelines
- C.1 KCA Guidelines (1998)
- C.2 AZ Guidelines (1998)
- C.3 CFC Guidelines; Excerpt (2005)
- D Lexical Resources
- D.1 Concept Lexicon
- D.2 Formulaic Patterns
- D.3 Entity Patterns
- D.4 Action Lexicon
- References
- Author Index
- Index
July 2010
ISBN (Paperback): 9781575865560
ISBN (Cloth): 9781575865553
ISBN (Electronic): 9781575867328
Distributed by the University of Chicago Press