<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1238"> <Title>2 Feature identification</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Feature identification </SectionTitle> <Paragraph position="0"> AUG encodes lexical properties as feature structures (specifying such things as part-of-speech, number, tense, person, thematic role, etc.) whose values percolate up through a subsumption hierarchy by the process of unification (Sanfilippo, 1993). Syntactic constraints are imposed by forcing agreement between features of grammatically related structures. Kazman (1994) argues that features correspond to semantic properties associated with thematic categories (e.g. nouns, verbs and adjectives) and that learning syntax is equivalent to figuring out how these properties impose constraints on the functional categories (e.g. determiners, auxiliaries, and complementizers) of a particular language. This study takes the slightly stronger position that the process by which thematic and functional categories are combined is mediated by morphological inflection. As in Kazman's system, Babel, the focus here is on the role of inflectional affixes in the acquisition of agreement.</Paragraph> <Paragraph position="1"> But unlike Babel, which makes inferences over semantically related words identified through set operations on input already tagged with attributes, this work addresses feature identification as a bootstrapping problem, where inflectional affixes and the constraints they impose are inferred from plain text.</Paragraph> <Paragraph position="2"> A first approximation. The first objective is to detect when and how inflection is manifest. This is addressed through generalisation on a word suffix tree (WST) constructed for the vocabulary of the language. A WST is a derivative of a letter-based multiway trie built from an ordered set of words.
Each distinct sequence of characters along a path in the trie is collapsed into a single node, resulting in a WST for which all leaf nodes are common suffixes to the prefix terminated by their parent node (Andersson, 1996). A sample portion of a WST is shown in Figure 1. Note that the symbol $ is a kind of NULL suffix, which shows that the parent node is itself a suffix and thus corresponds to the end of an actual word. It follows that its leaf nodes correspond to genuine morphological suffixes.</Paragraph> <Paragraph position="3"> Given that regular inflection is largely realised through suffixation on root categories, a first approximation of these categories may be given by assigning a common lexical identity to words that share the same set of suffixes. That is, it is assumed that words which inflect in the same ways likely belong to the same syntactic category. Clearly not all suffixes are inflectional. Therefore, some general restrictions</Paragraph> </Section> </Paper>
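<!--
The WST construction and the suffix-set grouping described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the construction used in the paper: the function names (build_trie, compress, suffix_sets, group_by_suffix_set) and the nested-dict representation are hypothetical, and requiring a $ leaf before a node's other leaves are counted as suffixes is only one reading of the text.

```python
from collections import defaultdict

END = "$"  # NULL suffix: the path so far is itself a complete word


def build_trie(words):
    """Build a letter-based multiway trie (nested dicts) from an ordered word set."""
    root = {}
    for word in sorted(set(words)):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # leaf marking the end of an actual word
    return root


def compress(node):
    """Collapse each non-branching chain of letters into a single labelled node."""
    out = {}
    for label, child in node.items():
        # Extend the label while the child has exactly one outgoing edge.
        while label != END and len(child) == 1:
            (next_label, next_child), = child.items()
            if next_label == END:
                child = {}  # the chain ends a word, so the label becomes a leaf
                break
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


def suffix_sets(node, prefix=""):
    """Yield (stem, leaf labels) for nodes that end a word and have other leaves."""
    leaves = frozenset(label for label, child in node.items() if not child)
    # Assumption: only nodes carrying the $ leaf are treated as candidate stems.
    if END in leaves and len(leaves) > 1:
        yield prefix, leaves
    for label, child in node.items():
        if child:
            yield from suffix_sets(child, prefix + label)


def group_by_suffix_set(words):
    """Assign a common identity to stems that share the same set of suffixes."""
    groups = defaultdict(list)
    for stem, suffixes in suffix_sets(compress(build_trie(words))):
        groups[suffixes].append(stem)
    return dict(groups)
```

For a toy vocabulary containing walk, walks, walked, walking, talk, talks, talked and talking, group_by_suffix_set places the stems "walk" and "talk" under the same suffix set ($, s, ed, ing), which is the sense in which words that inflect alike receive a common lexical identity. The sketch applies none of the restrictions on non-inflectional suffixes that the text goes on to mention.
-->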