<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2003"> <Title>A Robust and Hybrid Deep-Linguistic Theory Applied to Large-Scale Parsing</Title> <Section position="6" start_page="8" end_page="8" type="concl"> <SectionTitle> 5 Applications and Evaluation </SectionTitle> <Paragraph position="0"> One of the main advantages of a dependency-based parser such as Pro3Gres over other parsing approaches is that a mapping from the syntactic layer to a semantic layer (meaning representation) is partly simplified (Mollá et al., 2000; Rinaldi et al., 2002).</Paragraph> <Paragraph position="1"> The original version of the QA system used the Link Grammar (LG) parser (Sleator and Temperley, 1993), which, however, had a number of significant shortcomings. In particular, the set of dependency relations used in LG is very idiosyncratic, which makes any syntactic-semantic mapping created for LG necessarily unportable and difficult to extend and maintain.</Paragraph> <Paragraph position="2"> A recent line of research concerns applications for the Semantic Web. The documents available in the World Wide Web are mostly written in natural language. As such, they are understandable only to humans. One of the directions of Semantic Web research is to add a layer to the documents that formalizes their content, making it understandable also to software agents. Such Semantic Web annotations can be seen as a way to explicitly mark the meaning of certain parts of the documents.</Paragraph> <Paragraph position="3"> The dependency relations provided by a parser such as Pro3Gres, combined with domain-specific axioms, allow the creation of (some of) the semantic annotations, as described in (Rinaldi et al., 2003; Kaljurand et al., 2004).</Paragraph> <Paragraph position="4"> The modified QA system (using Pro3Gres) is being exploited in the area of 'Life Sciences', for applications concerning Knowledge Discovery over Medline abstracts (Rinaldi et al., 2004a; Dowdall et al., 2004). 
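The combination of dependency relations with domain-specific axioms can be pictured with a minimal sketch. All relation names, axioms and predicates below are illustrative assumptions, not the actual Pro3Gres inventory or the axioms of (Rinaldi et al., 2003).

```python
# Hypothetical sketch: turning syntactic dependency triples into
# semantic annotations via domain axioms keyed on (relation, head).
# Relation labels and predicates are invented for illustration.

def annotate(dependencies, axioms):
    """Map (relation, head, dependent) triples to semantic triples
    using axioms of the form {(relation, head): predicate}."""
    annotations = []
    for rel, head, dep in dependencies:
        predicate = axioms.get((rel, head))
        if predicate is not None:
            annotations.append((predicate, head, dep))
    return annotations

deps = [("subj", "inhibits", "protein_A"), ("obj", "inhibits", "protein_B")]
axioms = {("subj", "inhibits"): "agent", ("obj", "inhibits"): "theme"}
print(annotate(deps, axioms))
# [('agent', 'inhibits', 'protein_A'), ('theme', 'inhibits', 'protein_B')]
```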
We illustrate some of the differences between general-purpose parsing and the parsing of highly technical texts like Medline, and give two evaluations.</Paragraph> <Section position="1" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 5.1 General unrestricted texts </SectionTitle> <Paragraph position="0"> We first report an evaluation on sentences from an open domain, which gives a good impression of the performance of the parser on general, unrestricted text.</Paragraph> <Paragraph position="1"> In traditional constituency approaches, parser evaluation is done in terms of the correspondence of the bracketing between the gold standard and the parser output. (Lin, 1995; Carroll et al., 1999) suggest evaluating on the linguistically more meaningful level of syntactic relations. Two evaluations on the syntactic relation level are reported in the following.</Paragraph> <Paragraph position="2"> First, we report a general-purpose evaluation using a hand-compiled gold-standard corpus (Carroll et al., 1999), which contains the grammatical relation data of 500 random sentences from the Susanne corpus.</Paragraph> <Paragraph position="3"> The performance, shown in table 2, is, according to (Preiss, 2003), similar to that of a large selection of statistical parsers and a grammatical relation finder. Relations involving long-distance dependencies form part of this set. 
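Evaluation at the syntactic relation level reduces to comparing two sets of relation triples. A minimal sketch of the standard precision/recall/F-score computation over such sets (the metric itself is standard; the example triples are invented):

```python
def relation_prf(gold, predicted):
    """Precision, recall and F1 over sets of (relation, head, dependent)
    triples, as used for grammatical-relation evaluation."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # triples the parser got exactly right
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("subj", "eats", "cat"), ("obj", "eats", "fish")}
pred = {("subj", "eats", "cat"), ("obj", "eats", "mouse")}
print(relation_prf(gold, pred))
# (0.5, 0.5, 0.5)
```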
In order to measure their performance specifically, a selection of them is also given: WH-Subject (WHS), WH-Object (WHO), passive Subject (PSubj), control Subject (CSubj), and the anaphor of the relative clause pronoun (RclSubjA).</Paragraph> </Section> <Section position="2" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 5.2 Parsing highly technical language </SectionTitle> <Paragraph position="0"> While measuring general parsing performance is fundamental in the development of any parsing system, concentrating on a single domain risks fostering domain dependence.</Paragraph> <Paragraph position="1"> In order to assess how the parser performs on domains markedly different from the training corpus, the parser has been applied to the GENIA corpus (Kim et al., 2003), 2000 MEDLINE abstracts of more than 400,000 words describing the results of biomedical research.</Paragraph> <Paragraph position="2"> Average sentence length is 27 words, and the language is very technical and extremely domain-specific. But the most striking characteristic of this domain is the frequency of MultiWord Terms (MWT), which are known to cause serious problems for NLP systems (Sag et al., 2002; Dowdall et al., 2003). The token-to-chunk ratio (number of tokens divided by the number of chunks) is unusually high: NPs = 2.3, VPs = 1.3.</Paragraph> <Paragraph position="3"> The GENIA corpus does not include any syntactic annotation (making standard evaluation more difficult), but approx. 
100,000 MWTs are annotated and assigned a semantic type from the GENIA ontology.</Paragraph> <Paragraph position="4"> This novel parsing application is designed to determine how parsing performance interacts with MWT recognition, as well as the applicability of and possible improvements to the probabilistic model over this domain, and to test the hypothesis that terminology is the key to a successful parsing system. We do not discard this information, thus simulating a situation in which a near-perfect terminology-recognition tool is at one's disposal. MWTs are regarded as chunks; parsing thus takes place between the heads of MWTs, words and chunks.</Paragraph> <Paragraph position="5"> 100 random sentences from the GENIA corpus have been manually annotated for this evaluation and compared to the parser output. Despite the extreme complexity and technical language, parsing performance under these conditions is considerably better than on the Carroll corpus when using automated chunking, as table 3 reveals.</Paragraph> <Paragraph position="6"> It is worth noting that 10 of the 17 subject precision errors (out of 171 subjects) are &quot;hard&quot; cases involving long-distance dependencies (1 control, 4 relative pronouns) and 5 verb group chunking errors. Equally interesting, 2 of the 4 object recall errors (out of 79 objects) are due to 1 mistagging and 1 mischunking.</Paragraph> <Paragraph position="7"> In practice, MWT extraction is still not automated to the level of the chunking or Named Entity recognition simulated in this experiment (for a comprehensive review of the state of the art see (Castellví et al., 2001)). This is in large part due to the lack of definitive orthographic, morphological and syntactic characteristics that differentiate MWTs from canonical phrases. MWT extraction thus remains a semi-automated task performed in cycles, with the result of each cycle requiring manual validation. 
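The setup in which MWTs are regarded as chunks can be sketched as a preprocessing step that collapses each annotated term to a single token before parsing. The head-is-rightmost-token heuristic and the example spans are assumptions for illustration, not the paper's actual rule.

```python
# Hypothetical sketch of MWT-as-chunk preprocessing: each annotated
# multi-word term span is replaced by its head token, so the parser
# only relates heads, ordinary words and chunks.

def collapse_mwts(tokens, mwt_spans):
    """Replace each MWT span (start, end), inclusive and 0-indexed,
    by its head token, assumed here to be the rightmost token."""
    ends = {start: end for start, end in mwt_spans}
    out, i = [], 0
    while i < len(tokens):
        if i in ends:
            out.append(tokens[ends[i]])  # keep only the head of the MWT
            i = ends[i] + 1
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "activation"]
print(collapse_mwts(tokens, [(0, 2), (4, 6)]))
# ['expression', 'requires', 'activation']
```

Collapsing seven tokens to three heads mirrors the high token-to-chunk ratio reported above: the parser sees far fewer, but semantically denser, units.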
The return on this time-consuming activity is a set of MWT characteristics which can be used to fine-tune the algorithms during the next extraction cycle.</Paragraph> </Section> <Section position="3" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> We have suggested a robust, deep-linguistic grammar theory delivering grammatical relation structures as output, which are closer to predicate-argument structures than pure constituency structures, and more informative when non-local dependencies are involved. We have presented an implementation of the theory that is used for large-scale parsing. An evaluation at the grammatical relation level shows that its performance is state-of-the-art.</Paragraph> </Section> </Section> </Paper>