File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/c00-1065_abstr.xml
Size: 5,156 bytes
Last Modified: 2025-10-06 13:41:35
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1065"> <Title>Hypertags</Title> <Section position="1" start_page="0" end_page="446" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Srinivas (97) enriches traditional morpho-syntactic POS tagging with syntactic information by introducing Supertags. Unfortunately, words are assigned on average a much higher number of Supertags than traditional POS tags. In this paper, we develop the notion of Hypertag, first introduced in Kinyon (00a) and in Kinyon (00b), which makes it possible to factor the information contained in several Supertags into a single structure and to encode functional information in a systematic manner. We show why other possible solutions based on mathematical properties of trees are unsatisfactory and also discuss the practical usefulness of this approach.</Paragraph> <Paragraph position="1"> Introduction As a first step prior to parsing, traditional Part of Speech (POS) tagging assigns limited morpho-syntactic information to lexical items. These labels can be more or less fine-grained depending on the tagset, but syntactic information is often absent or limited. Also, most lexical items are assigned several POS. Although lexical ambiguities are dealt with by POS taggers, either in a rule-based or in a probabilistic manner, it is useful to delay this decision until a later parsing step (e.g. Giguet (98) shows that knowing constituent boundaries is crucial for solving lexical ambiguity correctly). In order to do so, it would help to be able to encode several POS into one compact representation.</Paragraph> <Paragraph position="2"> In order to assign richer syntactic information to lexical items, Joshi & Srinivas (94) and Srinivas (97) introduce the notion of Supertags, developed within the framework of Tree Adjoining Grammars (TAG). 
The idea behind Supertags is to assign to each word in a sentence, instead of a traditional POS, an &quot;elementary tree&quot;, which constitutes a primitive syntactic structure within the TAG framework. A supertagged text can then be input to a parser or shallow parser, thus alleviating the task of the parser. Several problems remain, though: * Even when no lexical ambiguity occurs, each word can anchor several trees (several hundred for some verbs). On average for English, a word is associated with 1.5 POS and with 9 supertags (Joshi (99)). One common solution to the problem is to retain only the &quot;best&quot; supertag for each word, or possibly the 3 best supertags for each word, but such an early decision has an adverse effect on the quality of parsing if the wrong supertag(s) have been kept: one typically obtains between 75% and 92% accuracy when supertagging, depending on the type of text being supertagged and on the technique used (cf. Srinivas (97), Chen & al. (99), Srinivas & Joshi (99)). 
This means that as many as one word in 4 may be assigned the wrong supertag, whereas typical POS taggers usually achieve an accuracy above 95%.</Paragraph> <Paragraph position="3"> * Supertagged texts rely heavily on the TAG framework and therefore may be difficult to exploit without being familiar with this formalism.</Paragraph> <Paragraph position="4"> * Supertagged texts are difficult to read and thus difficult to annotate manually.</Paragraph> <Paragraph position="5"> * Some structural information contained in Supertags is redundant. * Some information is missing, especially with respect to syntactic functions.</Paragraph> <Paragraph position="6"> So our idea is to investigate how supertags can be underspecified so that, instead of associating a set of supertags with each word, one could associate one single structure, which we call a hypertag, and which contains the same information as a set of supertags as well as functional information. Our practical goal is fourfold: a) delaying decision for parsing; b) obtaining a compact and readable representation, which can be manually annotated as a step towards building a treebank for French (cf. Abeillé & al. (00a), Clément & Kinyon (00)); c) extracting linguistic information on a large scale, such as lexical preferences for verb subcategorization frames (cf. Kinyon (99a)); d) building an efficient, but nonetheless psycholinguistically motivated, processing model for TAGs (cf. Kinyon (99b)). Thus, in addition to being well-defined computational objects (point a), hypertags should be &quot;readable&quot; (point b) and also motivated from a linguistic point of view (points c & d).</Paragraph> <Paragraph position="7"> In the first part of this paper, we briefly introduce the LTAG framework and give examples of supertags. In a second part, we investigate several potential ways to underspecify supertags, and show why these solutions are unsatisfactory. 
In a third part, we explain the solution we have adopted, building on the notion of MetaGrammar introduced by Candito (96) and Candito (99). Finally, we discuss how this approach can be used in practice, and why it is interesting for frameworks other than LTAGs.</Paragraph> </Section> </Paper>