<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1026"> <Title>Facilitating Treebank Annotation Using a Statistical Parser</Title> <Section position="3" start_page="0" end_page="2" type="intro"> <SectionTitle> 3. METHODOLOGY </SectionTitle> <Paragraph position="0"> For the present experiment, the parsing model was trained on the entire treebank (99,720 words). We then prepared a new set of 20,202 segmented, POS-tagged words of Xinhua newswire text, which was blindly divided into 3 sets of equal size (±10 words).</Paragraph> <Paragraph position="1"> Each set was then annotated in two or three passes, as summarized by the following table:</Paragraph> <Section position="1" start_page="0" end_page="2" type="sub_section"> <SectionTitle> Set Pass 1 Pass 2 Pass 3 </SectionTitle> <Paragraph position="0"> Here &quot;Annotators A&amp;B&quot; means that Annotator B checked the work of Annotator A; then, for each point of disagreement, both annotators worked together to arrive at a consensus structure. &quot;Parser&quot; is Chiang's parser, adapted to parse Chinese text as described by Bikel and Chiang [1].</Paragraph> <Paragraph position="1"> &quot;Revised parser&quot; is the same parser with additional modifications suggested by Annotator A after correcting Set 2. These revisions primarily resulted from a difference between the artificial evaluation metric used by Bikel and Chiang [1] and this real-world task. The metric used earlier, following common practice, did not take punctuation or empty elements into account, whereas the present task ideally requires that they be present and correctly placed. Thus the following changes were made: The parser was originally trained on data with the punctuation marks moved, and it did not move the punctuation marks back in its output. For Set 3 we simply removed the preprocessing phase which moved the punctuation marks.</Paragraph> <Paragraph position="2"> Similarly, the parser was trained on data which had all empty elements removed.
In this case we simply applied a rule-based postprocessor which inserted null relative pronouns.</Paragraph> <Paragraph position="3"> Finally, the parser often produced an NP (or VP) which dominated only a single NP (respectively, VP), whereas such a structure is not specified by the bracketing guidelines. Therefore we applied another rule-based postprocessor to remove these nodes. (This modification would have helped the original evaluation as well.)</Paragraph> <Paragraph position="4"> In short, none of the modifications required major changes to the parser, but they did improve annotation speed significantly, as we will see below.</Paragraph> </Section> </Section> </Paper>