<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1620">
<Title>Multilingual Deep Lexical Acquisition for HPSGs via Supertagging</Title>
<Section position="4" start_page="164" end_page="165" type="intro">
<SectionTitle>2 Past Research</SectionTitle>
<Paragraph position="0">According to Baldwin (2005b), research on DLA falls into two categories: in vitro methods, where we leverage a secondary language resource to generate an abstraction of the words we hope to learn lexical items for, and in vivo methods, where the target resource itself is used directly to perform DLA.</Paragraph>
<Paragraph position="1">Supertagging is an instance of in vivo DLA, as it operates directly over data tagged with the lexical type system of the precision grammar of interest.</Paragraph>
<Paragraph position="2">Research on supertagging relevant to this paper includes the work of Baldwin (2005b) on training a transformation-based learner over data tagged with ERG lexical types. We discuss this method in detail in Section 5.2 and replicate it over our English data set for direct comparability with this previous research.</Paragraph>
<Paragraph position="3">As mentioned above, other work on supertagging has tended to view it as a means of driving a beam search to prune the parser search space (Bangalore and Joshi, 1999; Clark and Curran, 2004). In supertagging, token-level annotations (gold-standard, automatically generated or otherwise) for a given DLR are used to train a sequential tagger, akin to training a POS tagger over POS-tagged data taken from the Penn Treebank.</Paragraph>
<Paragraph position="4">One related in vivo approach to DLA targeted specifically at precision grammars is that of Fouvry (2003). Fouvry uses the grammar to guide the process of learning lexical items for unknown words, generating underspecified lexical items for all unknown words and parsing with them.</Paragraph>
<Paragraph position="5">Syntactico-semantic interaction between unknown words and pre-existing lexical items during parsing provides insight into the nature of each unknown word. By combining such fragments of information, it is possible to incrementally arrive at a consolidated lexical entry for that word. That is, the precision grammar itself drives the incremental learning process within a parsing context. An alternative approach is to compile out a set of word templates for each lexical type (with the important qualification that they do not rely on pre-processing of any form), and check for corpus occurrences of an unknown word in such contexts.</Paragraph>
<Paragraph position="6">That is, the morphological, syntactic and/or semantic predictions implicit in each lexical type are made explicit in the form of templates which represent distinguishing lexical contexts of that lexical type. This approach has been shown to be particularly effective over web data, where the sheer size of the data precludes the possibility of linguistic preprocessing but at the same time ameliorates the effects of data sparseness inherent in any lexicalised DLA approach (Lapata and Keller, 2004).</Paragraph>
<Paragraph position="7">Other work on DLA (e.g. Korhonen (2002), Joanis and Stevenson (2003), Baldwin (2005a)) has tended to take an in vitro DLA approach, extrapolating away from a DLR to corpus or web data, and analysing occurrences of words through the conduit of an external resource (e.g. a secondary parser or POS tagger). In vitro DLA can also take the form of resource translation, mapping one DLR onto another to arrive at the lexical information in the desired format.</Paragraph>
</Section>
</Paper>