<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2112">
  <Title>How bad is the problem of PP-attachment? A comparison of English, German and Swedish</Title>
  <Section position="8" start_page="86" end_page="86" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The most important conclusion to be drawn from the above experiments and observations is that data sets must be profiled when working and reporting on PP attachment experiments. The profile should certainly answer the following questions:
      1. What types of nouns were used when the tuples were extracted (regular nouns, proper names, deverbal nouns, etc.)?
      2. Are there prepositions which dominate in frequency and attachment rate (like the English preposition of)? If so, what does the data set look like without these dominating prepositions?
      3. What types of prepositions were considered (regular prepositions; contracted prepositions, e.g. German am, im, zur; derived prepositions, e.g. English prepositions derived from gerund verb forms such as following, including, pending; etc.)?
      4. Is the extraction procedure restricted to noun+PP sequences in the verb phrase, or does it consider all such sequences?
      5. What is the noun attachment rate (NAR) in the data set?
      In order to find dominating prepositions, we suggest a data profile that includes the frequency and the NAR of every preposition in the data set. This will also give an overall picture of the number of prepositions involved.</Paragraph>
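The per-preposition profile suggested above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in the experiments: the tuple layout (verb, noun, preposition, attachment label), the labels "N" and "V", the function names and the toy data are all assumptions made for the example.

```python
from collections import Counter

def profile_prepositions(tuples):
    """Return {preposition: (frequency, noun attachment rate)}.

    Assumed tuple layout: (verb, noun, preposition, attachment),
    with attachment "N" for noun attachment and "V" for verb attachment.
    """
    freq = Counter()
    noun_attached = Counter()
    for _verb, _noun, prep, attachment in tuples:
        prep = prep.lower()
        freq[prep] += 1
        if attachment == "N":
            noun_attached[prep] += 1
    # most_common() lists the most frequent prepositions first
    return {p: (n, noun_attached[p] / n) for p, n in freq.most_common()}

def overall_nar(tuples, exclude=()):
    """NAR over all tuples, optionally leaving out some prepositions."""
    excluded = {p.lower() for p in exclude}
    kept = [t for t in tuples if t[2].lower() not in excluded]
    if not kept:
        return None
    return sum(1 for t in kept if t[3] == "N") / len(kept)

# Hypothetical toy data, for illustration only
toy_tuples = [
    ("eat", "pizza", "with", "V"),
    ("see", "president", "of", "N"),
    ("buy", "book", "of", "N"),
    ("put", "book", "on", "V"),
    ("read", "report", "on", "N"),
]

for prep, (count, nar) in profile_prepositions(toy_tuples).items():
    print(prep, count, round(nar, 2))
print("NAR, all prepositions:", overall_nar(toy_tuples))
print("NAR, without 'of':", overall_nar(toy_tuples, exclude=["of"]))
```

Reporting the overall NAR both with and without a dominating preposition addresses question 2 directly, and the per-preposition table shows how many prepositions the data set involves.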
    <Paragraph position="1"> Our experiments have also shown the advantages of large treebanks for comparative linguistic studies. Such treebanks are even more valuable if they come in the same representation schema (e.g. TIGER-XML) so that they can be queried with the same tools. TIGER-Search has proven to be a suitable treebank query tool for our experiments, although its statistics function broke down on some of the frequency counts we tried on large treebanks. For example, it was not possible to get a list of all prepositions with their occurrence frequencies from a 50,000-sentence treebank.</Paragraph>
    <Paragraph position="2"> Another item on our TIGER-Search wish list is a batch mode so that we could run a set of queries and obtain a list of frequencies. Currently, we have to trigger each query manually and copy its frequency results by hand into an Excel file.</Paragraph>
    <Paragraph position="3"> Other than that, TIGER-Search is a wonderful tool which allows for quick sanity checks of the queries with the help of the highlighted tree structure displays in its GUI.</Paragraph>
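A preposition frequency list of the kind mentioned above could in principle also be computed outside TIGER-Search by streaming over the TIGER-XML file. The following is a hedged sketch, not something done in the experiments. It assumes that terminals are stored as t elements carrying word and pos attributes and that each sentence is an s element. The default tag set (the STTS tags APPR and APPRART, as in German treebanks) is likewise an assumption and has to be adapted to the treebank at hand. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_prepositions(source, prep_tags=("APPR", "APPRART")):
    """Count lower-cased word forms of terminals whose pos tag is in prep_tags.

    source is a file name or a binary file object with TIGER-XML content.
    The element and attribute names (t, s, word, pos) are assumptions
    about the export format of the treebank.
    """
    counts = Counter()
    # iterparse streams the file, so a large treebank is not
    # loaded into memory as a single tree
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "t" and elem.get("pos") in prep_tags:
            counts[elem.get("word", "").lower()] += 1
        elif elem.tag == "s":
            elem.clear()  # drop the finished sentence to save memory
    return counts
```

Calling count_prepositions on a treebank file and then most_common() on the result would list the prepositions in descending order of frequency. Running the function in a loop over several files or tag sets gives a simple substitute for the batch mode.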
    <Paragraph position="4"> We have compared noun attachment rates in English, German and Swedish over treebanks from various sources and with various annotation schemes. Of course, the results would be even more comparable if the treebanks were built on the same translated texts, i.e. on parallel corpora. Currently, there are no large parallel treebanks available. However, our group is working on such a parallel treebank for English, German and Swedish. Design decisions and first results were reported in (Volk and Samuelsson, 2004) and (Samuelsson and Volk, 2005). We believe that such parallel treebanks will allow a more focused and more detailed comparison of phenomena across languages.</Paragraph>
  </Section>
</Paper>