<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1126"> <Title>Mapping Collocational Properties into Machine Learning Features</Title> <Section position="3" start_page="225" end_page="225" type="intro"> <SectionTitle> 2 The Event Categorization Task </SectionTitle> <Paragraph position="0"> This work is part of a larger project on processing newspaper articles to support automatic segmentation and summarization. A fundamental component of reporting is evidentiality (Chafe 1986, van Dijk 1988): What source does the reporter give for his information? Is the information being presented as fact, opinion, or speculation? Our end application is a segmentation of the text into factual and non-factual segments, to include in a document profile for summarization and retrieval. A prerequisite to answering such questions is recognizing where in the text speech events and private states (belief, opinions, perception) are presented. That is the problem addressed here.</Paragraph> <Paragraph position="1"> Specifically, the main state or event of each sentence is classified into one of the following event categories: 1. ps: clauses about private states. &quot;Philip Morris hopes that by taking its Bill of Rights theme to the airwaves it will reach the broadest possible audience.&quot; 2. se.ds: clauses presenting speech events in the form of direct speech. &quot;I'm hopeful that we'll have further discussions,&quot; Mr. Hahn said.</Paragraph> <Paragraph position="2"> 3. se.ms: speech-event clauses that are mixtures of direct and indirect speech. &quot;The company said the fastener business 'has been under severe cost pressures for some time.'&quot; 4. se.o: clauses presenting speech events in the form of indirect speech, together with clauses about speech events that do not fall in the other speech-event categories.</Paragraph> <Paragraph position="3"> &quot;Stelco Inc. said it plans to shut down three Toronto-area plants.&quot; 5. 
ps|event: private state and either a speech event or other action. &quot;They were at odds over the price.&quot; 6. other: clauses that are not included in any of the other categories. &quot;The fasteners, nuts and bolts, are sold to the North American auto market.&quot; Speech events and private states are very frequent in newspaper articles, accounting for 48% of the sentences in our corpus. Note that the speech event category is broken into subcategories, corresponding to different styles. The styles vary in the amount of paraphrase they admit, which in turn strongly affects how the sentence can be integrated into the surrounding discourse. We expect these distinctions to be important for future discourse segmentation tasks.</Paragraph> <Paragraph position="4"> This event categorization task is very challenging. The language used for speech events and private states is rich and varied. Metaphor and idiom are widely used (Barnden 1992) and there is a great deal of syntactic and part-of-speech variation. The classification is also highly context-dependent. For example, a word like agree may simply refer to a belief, as in He agrees that interest rates may go down, but may also refer to a specific speech event, as in She said they should begin, and he agreed. For another example, many words normally associated with non-verbal actions may refer directly to speech events, if they appear in a strong speech context: e.g., attack, estimate, explore, guide, analyze, rise above, measure, etc.</Paragraph> <Paragraph position="5"> We developed detailed coding instructions for manual annotation of the data, and performed an inter-coder reliability study, including two expert annotators and one naive annotator. The results of the study, which will be reported elsewhere, are very good. The coding instructions, the annotations of the data, and the results of the study will be made available on the project web site. 
The event categorization task is a challenging test for the issues concerning collocations addressed in this paper. However, it is important to note that these issues are relevant for any NLP task for which collocational information may be useful, including word-sense disambiguation.</Paragraph> </Section> </Paper>