File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/00/w00-1427_relat.xml

Size: 6,069 bytes

Last Modified: 2025-10-06 14:15:37

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1427">
  <Title>Robust, Applied Morphological Generation</Title>
  <Section position="5" start_page="204" end_page="206" type="relat">
    <SectionTitle>
4 Related Work
</SectionTitle>
    <Paragraph position="0"> We are following a well-established line of research into the use of finite-state techniques for lexical and shallow syntactic NLP tasks (e.g. Karttunen et al. (1996)). Lexical transducers have been used extensively for morphological analysis, and in theory a finite-state transducer implementing an analyser can be reversed to produce a generator. However, we are not aware of published research on finite-state morphological generators (1) establishing whether in practice they perform with similar efficiency to morphological analysers, (2) quantifying their type/token accuracy with respect to an independent, extensive 'gold standard', and (3) indicating how easily they can be integrated into larger systems.</Paragraph>
    <Paragraph position="1"> Furthermore, although a number of finite-state compilation toolkits (e.g. Karttunen (1994)) are publicly available or can be licensed for research use, associated large-scale linguistic descriptions--for example English morphological lexicons--are usually commercial products and are therefore not freely available to the NLG research community.</Paragraph>
    <Paragraph position="2"> The work reported here is also related to work on lexicon representation and morphological processing using the DATR representation language (Cahill, 1993; Evans and Gazdar, 1996). However, we adopt less of a theoretical and more of an engineering perspective, focusing on morphological generation in the context of wide-coverage practical NLG applications.</Paragraph>
    <Paragraph position="3"> There are also parallels to research in the two-level morphology framework (Koskenniemi, 1983), although in contrast to our approach this framework has required exhaustive lexica and hand-crafted morphological (unification) grammars in addition to orthographic descriptions (van Noord, 1991; Ritchie et al., 1992). The SRI Core Language Engine (Alshawi, 1992) uses a set of declarative segmentation rules which are similar in content to our rules and are used in reverse to generate word forms. The system, however, is not freely available, again requires an exhaustive stem lexicon, and the rules are not compiled into an efficiently executable finite-state machine but are only interpreted.</Paragraph>
    <Paragraph position="4"> The work that is perhaps the most similar in spirit to ours is that of the LADL group, in their compilation of large lexicons of inflected word forms into finite-state transducers (Mohri, 1996). The resulting analysers run at a comparable speed to our generator and the (compacted) executables are of similar size. However, a full form lexicon is unwieldy and inconvenient to update, and a system derived from it cannot cope gracefully with unknown words because it does not contain generalisations about regular or subregular morphological behaviour.</Paragraph>
    <Paragraph position="5"> The morphological components of current widely-used NLG systems tend to consist of hard-wired procedural code that is tightly bound to the workings of the rest of the system.</Paragraph>
    <Paragraph position="6"> For instance, the Nigel grammar (Matthiessen, 1984) contains Lisp code that classifies verb, noun and adjective endings, and these classes are picked up by further code inside the KPML system (Bateman, 2000) itself which performs inflectional generation by stripping off variable length trailing strings and concatenating suffixes. All morphologically subregular forms must be entered explicitly in the lexicon, as well as irregular ones. The situation is similar in FUF/SURGE, morphological generation in the SURGE grammar (Elhadad and Robin, 1996) being performed by procedures which inspect lemma endings, strip off trailing strings when appropriate, and concatenate suffixes.</Paragraph>
    <Paragraph position="7"> In current NLG systems, orthographic information is distributed throughout the lexicon and is applied via the grammar or by hard-wired code. This makes orthographic processing difficult to decouple from the rest of the system, compromising maintainability and ease of reuse.</Paragraph>
    <Paragraph position="8"> For example, in SURGE, markers for a/an usage can be added to lexical entries for nouns to indicate that their initial sound is consonant- or vowel-like, and is contrary to what their orthography would suggest. (This is only a partial solution since adjectives, adverbs--and more rarely other parts of speech--can follow the indefinite article and thus need the same treatment.) The appropriate indefinite article is inserted by procedures associated with the grammar. In DRAFTER-2 (Power et al., 1998), an a/an feature can be associated with any lexical entry, and its value is propagated up to the NP level through leftmost rule daughters in the grammar (Power, personal communication).</Paragraph>
    <Paragraph position="9"> Both of these systems interleave orthographic processing with other processes in realisation.</Paragraph>
    <Paragraph position="10"> In addition, neither has a mechanism for stating exceptions for whole subclasses of words, for example those starting 'us' followed by a vowel--such as 'use' and 'usual'--which must be preceded by 'a'. KPML appears not to perform this type of processing at all.</Paragraph>
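    To make the notion of a subclass exception concrete, the following is a minimal sketch, not the rule set of any system discussed here. It assumes an ordered list of spelling patterns in which the first match wins. The names SUBCLASS_RULES, WORD_EXCEPTIONS and indefinite_article are hypothetical, as is the single-word entry for 'hour', which stands in for a per-entry lexical marker.

```python
import re

# Illustrative rules only (an assumption, not taken from any cited system).
# They are tried in order and the first matching pattern wins.
SUBCLASS_RULES = [
    # Whole subclass: 'us' + vowel letter sounds consonant-like (use, usual).
    (re.compile(r"us[aeiou]"), "a"),
    # Orthographic default: any other initial vowel letter takes 'an'.
    (re.compile(r"[aeiou]"), "an"),
]

# Hypothetical single-word exception, comparable to a per-entry lexical marker.
WORD_EXCEPTIONS = {"hour": "an"}


def indefinite_article(next_word: str) -> str:
    """Choose 'a' or 'an' from the spelling of the word that follows."""
    word = next_word.lower()
    if word in WORD_EXCEPTIONS:
        return WORD_EXCEPTIONS[word]
    for pattern, article in SUBCLASS_RULES:
        if pattern.match(word):  # match() anchors at the start of the word
            return article
    return "a"  # default for words beginning with a consonant letter
```

    Under these assumptions a single pattern covers every word in the 'us' plus vowel subclass, whereas a per-entry marker has to be repeated on each lexical entry. Because the function looks only at the following word, it applies equally to nouns, adjectives and adverbs.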
    <Paragraph position="11"> We are not aware of any literature describing (practical) NLG systems that generate contractions. However, interesting linguistic research in this direction is reported by Pullum and Zwicky (In preparation). This work investigates the underlying syntactic structure of sentences that block auxiliary reductions, for example those with VP ellipsis as in (5).</Paragraph>
    <Paragraph position="12"> (5) *She's usually home when he's.</Paragraph>
  </Section>
</Paper>