File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/04/n04-2006_relat.xml
Size: 2,111 bytes
Last Modified: 2025-10-06 14:15:45
<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2006">
  <Title>Automatic Article Restoration</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>2 Related Work</SectionTitle>
    <Paragraph position="0">The article generation task can be viewed as a classification problem, whose input is a set of features drawn from the context of an NP, and whose output is the most likely article for that NP. The context features are typically extracted from the syntactic parse tree of the sentence.</Paragraph>
    <Paragraph position="1">Heine (1998) takes a Japanese NP as input and classifies it as either definite or indefinite. A hierarchy of rules, ordered by priority, is hand-crafted; the rules involve the presence or absence of honorifics, demonstratives, possessives, counting expressions, and a set of verbs and postpositions that provide strong hints. In the appointment scheduling domain, these rules classify 79.5% of the NPs with an accuracy of 98.9%; the remaining NPs are classified by searching for their referents in the discourse context. Knight and Chander (1994) use decision trees to pick either a/an or the for NPs extracted from the Wall Street Journal (WSJ). The trees contain over 30,000 features, including lexical features (e.g., the two words before and after the NP) and abstract features (e.g., whether the word after the head noun is a past-tense verb). By classifying the more frequent head nouns with the trees and guessing the for the rest, they reach an overall accuracy of 78%.</Paragraph>
    <Paragraph position="2">Minnen et al. (2000) apply a memory-based learning approach to choose among a/an, the, and null. Their features are drawn from two sources: first, from the Penn Treebank, such as the NP head and its part-of-speech (POS) and functional tags, the category and functional tags of the constituent embedding the NP, and other determiners in the NP; and second, from a Japanese-to-English translation system, such as the countability preference and semantic class of the NP head. Their best result is an accuracy of 83.6%.</Paragraph>
  </Section>
</Paper>
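To make the classification framing described above concrete, the following is a minimal, self-contained sketch of a memory-based article classifier, loosely in the spirit of Minnen et al. (2000). The feature names, training instances, and overlap metric are invented for illustration and are not taken from any of the cited systems.

```python
# Toy sketch of article selection as classification (illustrative only).
# An NP is described by a small feature dict; a new NP receives the article
# of its most similar stored training instance (memory-based learning).

from collections import Counter

def np_features(head, prev_word, next_word, head_pos):
    """Build a simple feature dict for an NP from its local context."""
    return {
        "head": head.lower(),
        "prev": prev_word.lower(),
        "next": next_word.lower(),
        "head_pos": head_pos,
    }

# Hypothetical training memory: (feature dict, observed article).
MEMORY = [
    (np_features("report", "published", "on", "NN"), "the"),
    (np_features("report", "wrote", "about", "NN"), "a"),
    (np_features("information", "gave", "to", "NN"), "null"),
]

def overlap(f1, f2):
    """Count matching feature values (a crude similarity metric)."""
    return sum(1 for k in f1 if f1[k] == f2.get(k))

def classify(features, k=1):
    """Majority vote over the k stored instances most similar to `features`."""
    ranked = sorted(MEMORY, key=lambda inst: overlap(features, inst[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    test = np_features("report", "published", "on", "NN")
    print(classify(test))  # -> "the" (identical to the first stored instance)
```

A decision-tree learner (Knight and Chander, 1994) or a hand-ordered rule cascade (Heine, 1998) could be plugged into the same feature-extraction step; only the classification function changes.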