<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1024"> <Title>Managing information at linguistic interfaces*</Title> <Section position="3" start_page="161" end_page="164" type="metho"> <SectionTitle> 2 The Verbmobil Interface Term </SectionTitle> <Paragraph position="0"> The VIT encodes various pieces of information produced and used in the linguistic modules. The content of a VIT corresponds to a single segment (or utterance) in a dialog turn. This partitioning of turns enables the linguistic components to work incrementally. null</Paragraph> <Section position="1" start_page="161" end_page="162" type="sub_section"> <SectionTitle> 2.1 Multiple Levels of Information </SectionTitle> <Paragraph position="0"> A VIT is a record-like data structure whose fields are filled with semantic, scopal, sortal, morphosyntactic, prosodic, discourse and other information (see Table 1). These slots can be seen as analysis layers collecting different types of linguistic information that is produced by several modules. The information within and between the layers is linked together using constant symbols, called &quot;labels&quot;, &quot;instances&quot; and &quot;holes&quot;. These constants could be interpreted as skolemized logical variables which each denote a node in a graph. Besides purely linguistic information, a VIT contains a unique segment identifier that encodes the time span of the analyzed speech input, the analyzed path of the original word lattice, the producer of the VIT, which language is represented, etc. This identifier is used, for example, to synchronize the processing of analyses from different parsers. For processing aspects of VITs see Section 3.</Paragraph> <Paragraph position="1"> A concrete example of a VIT is given in Figure 2 in a Prolog notation where the slots are also marked.</Paragraph> <Paragraph position="2"> This example is further discussed in Section 2.2.</Paragraph> </Section> <Section position="2" start_page="162" end_page="163" type="sub_section"> <SectionTitle> 2.2 VIT Semantics </SectionTitle> <Paragraph position="0"> The core semantic content of the VIT is contained in the two slots: Conditions and Constraints.</Paragraph> <Paragraph position="1"> The conditions represent the predicates of the semantic content and the constraints the semantic dependency structure over those predicates. This partitioning between semantic content and semantic structure is modelled on the kind of representational metalanguage employed in UDRS semantics (Reyle, 1993) to express underspecification.</Paragraph> <Paragraph position="2"> The semantic representation is, thus, a metalanguage expression containing metavariables, termed labels, that may be assigned to object language constructs. Moreover, such a metalanguage is minimally recursive 3, in that recursive structure is expunged from the surface level by the use of metavariables over the recursive constituents of the object language. null In UDRSs quantifier dependencies and other scope information are underspecified because the constraints provide incomplete information about the assignment of object language structures to labels. However, a constraint set may be monotonically extended to provide a complete resolution. VIT semantics follows a similar strategy but somewhat extends the expressivity of the metalanguage.</Paragraph> <Paragraph position="3"> There are two constructs in the VIT semantic metalanguage which provide for an extension in expressivity relative to UDRSs. 
</Section>
<Section position="2" start_page="162" end_page="163" type="sub_section"> <SectionTitle> 2.2 VIT Semantics </SectionTitle>
<Paragraph position="0"> The core semantic content of the VIT is contained in two slots: Conditions and Constraints.</Paragraph>
<Paragraph position="1"> The conditions represent the predicates of the semantic content, and the constraints the semantic dependency structure over those predicates. This partitioning between semantic content and semantic structure is modelled on the kind of representational metalanguage employed in UDRS semantics (Reyle, 1993) to express underspecification.</Paragraph>
<Paragraph position="2"> The semantic representation is thus a metalanguage expression containing metavariables, termed labels, that may be assigned to object language constructs. Moreover, such a metalanguage is minimally recursive (we owe the term minimal recursion to Copestake et al. (1995), but the mechanism they describe was already in use in UDRSs), in that recursive structure is expunged from the surface level by the use of metavariables over the recursive constituents of the object language. In UDRSs, quantifier dependencies and other scope information are underspecified because the constraints provide incomplete information about the assignment of object language structures to labels. However, a constraint set may be monotonically extended to provide a complete resolution. VIT semantics follows a similar strategy but somewhat extends the expressivity of the metalanguage.</Paragraph>
<Paragraph position="3"> There are two constructs in the VIT semantic metalanguage which provide for an extension in expressivity relative to UDRSs. These have both been adopted from immediate precursors within the project, such as Bos et al. (1996), and further refined. The first is the mechanism of holes and plugging, which originates in the hole semantics of Bos (1996). This requires a distinction between the two types of metavariable employed: labels and holes. Labels denote the instantiated structures, primarily the individual predicates. Holes, on the other hand, mark the underspecified argument positions of propositional arguments and scope domains. Resolution consists of an assignment of labels to holes, a plugging.</Paragraph>
<Paragraph position="4"> In general, the constraint set contains partial constraints on such a plugging, expressed by the &quot;less than or equal&quot; relation (leq), but no actual equations. A constraint leq(L,H) should be interpreted disjunctively: the label L is either subordinate to the hole H or equal to it. Alternatively, the relation can be seen as imposing a partial ordering.</Paragraph>
<Paragraph position="5"> A valid plugging must observe all such constraints.</Paragraph>
<Paragraph position="6"> The other extension in expressivity was the introduction of additional constraints between labels. These form another type of abstraction away from logical structure, in that conjunctive structure is also represented by constraints. The purpose of this further abstraction is to allow lexical predicates to be linked into additional structures, beyond those required for a logical semantics. For example, focus structure may group predicates that belong to different scope domains. More immediately, prosodic information, expressed at the lexical level, can be linked into the VIT via lexical predicates. The constraints expressing conjunctive structure relate basic predicate labels and group labels, which correspond to a conjunctive structure in the object language, such as a DRS, intuitively a single box. Grouping constraints take the form in_g(L,G), where the &quot;in group&quot; relation (in_g) denotes that the label L is a member of the group G. The content of a group is thus defined by the set of such grouping constraints.</Paragraph>
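<Paragraph> As an illustration of how such a constraint slot might be consumed, the following sketch assumes that the constraints are collected in a plain Prolog list of leq(Label,Hole) and in_g(Label,Group) terms, in the style of Figure 2; the helper predicates and the labels in the example query are illustrative only and not part of the VIT ADT package.
    :- use_module(library(lists)).

    % Collect the members of a group from the grouping constraints.
    group_members(Group, Constraints, Members) :-
        findall(L, member(in_g(L, Group), Constraints), Members).

    % Collect the holes that a label must be subordinate (or equal) to.
    upper_bounds(Label, Constraints, Holes) :-
        findall(H, member(leq(Label, H), Constraints), Holes).

    % ?- group_members(l8, [in_g(l9, l8), in_g(l10, l8), leq(l8, h2)], Ms).
    % Ms = [l9, l10].
</Paragraph>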
<Paragraph position="8"> These two forms of abstraction from logical structure have a tendency to expand the printed form of a VIT, relative to a recursive term structure, but in one respect this is clearly a more compact representation, since lexical predicates occur only once but may engage in various types of structure.</Paragraph>
</Section>
<Section position="3" start_page="163" end_page="164" type="sub_section"> <SectionTitle> 2.3 An Example </SectionTitle>
<Paragraph position="0"> Figure 2 shows the VIT produced by the analysis of the German sentence Jedes Treffen mit Ihnen hat ein interessantes Thema (&quot;Every meeting with you has an interesting topic&quot;). The instances, which correspond to object language variables, are represented by the sequence {i1, i2, ...}, holes by {h1, h2, ...} and labels, including group and predicate labels, by {l1, l2, ...}. The base label of a predicate appears in its first argument position. The predicates haben, arg2 and arg3 share the same label because they form the representation of a single predication, in so-called neo-Davidsonian notation (e.g. Parsons, 1991). The two groups l20 and l8 form the restrictions of the existential quantifier, ein, and the universal, jed, respectively. Two of the scoping constraints place the quantifiers' labels below the top hole, the argument of the mood operator (decl). The other two link the quantifiers' respective scopes to the bottom label, in this case the main verb, but no constraints are imposed on the relative scope of the quantifiers. The whole structure is best viewed as a (partial) subordination hierarchy.</Paragraph>
<Paragraph position="1"> A complete resolution would result from an assignment of the labels {l1, l5, l17} to the three holes {h23, h19, h7}. Taking into account the implicit constraint that any argument to a predicate is automatically subordinate to its label, there are in fact only two possibilities, the pluggings p1 and p2, corresponding to the two relative scopings of the quantifiers (see the sketch at the end of this subsection).</Paragraph>
<Paragraph position="2"> A full resolution is, however, rarely necessary. Where transfer requires more specific scoping constraints, these can be provided in a pairwise fashion, based on default heuristics from word order or prosodic cues, or at the cost of an additional call to the resolution component.</Paragraph>
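<Paragraph> The following sketch replays this resolution step on a schematic two-quantifier configuration. The names used here (q1, q2, verb, top, scope1, scope2) are placeholders rather than the actual labels and holes of Figure 2, and the code illustrates the combinatorics only; it is not the Verbmobil resolution component.
    :- use_module(library(lists)).

    % Two quantifier labels, each owning one scope hole; the verb label owns none.
    scope_hole(q1, scope1).
    scope_hole(q2, scope2).

    % A plugging assigns each of the three labels to one of the three holes.
    plugging([top-A, scope1-B, scope2-C]) :-
        permutation([q1, q2, verb], [A, B, C]).

    % below(Hole, Label, Plugging): Label ends up at or below Hole, going through
    % plugged labels and their scope holes; the visited list blocks cyclic pluggings.
    below(H, L, P) :- below(H, L, P, [H]).
    below(H, L, P, _)    :- member(H-L, P).
    below(H, L, P, Seen) :-
        member(H-L1, P),
        scope_hole(L1, H1),
        \+ member(H1, Seen),
        below(H1, L, P, [H1|Seen]).

    % Admissible pluggings: the verb must end up below both scope holes,
    % i.e. the implicit argument-subordination constraint mentioned above.
    admissible(P) :-
        plugging(P),
        below(scope1, verb, P),
        below(scope2, verb, P).

    % ?- findall(P, admissible(P), Ps), length(Ps, N).
    % N = 2.   % the two relative scopings of the quantifiers
</Paragraph>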
</Section> </Section>
<Section position="4" start_page="164" end_page="165" type="metho"> <SectionTitle> 3 VIT Processing </SectionTitle>
<Section position="1" start_page="164" end_page="164" type="sub_section"> <SectionTitle> 3.1 Lists as Data Structures </SectionTitle>
<Paragraph position="0"> Almost all fields in a VIT record are represented as lists (in typical AI languages, such as Lisp and Prolog, lists are built-in, and they can easily be ported to other programming languages). In general, list elements do not introduce any further recursive embedding, i.e. the elements are like fixed records with fields containing constants. The non-recursive nature of these list elements makes the access and manipulation of information (e.g. addition, deletion, refinement) very convenient. Moreover, the number of list elements can be kept small by distributing information over different slots according to its linguistic nature (see Section 3.2 below) or the producer of the specific kind of data (see Section 2.1). The kind of structuring adopted and the relative shortness of the lists make for rapid access to and operations on VITs.</Paragraph>
<Paragraph position="1"> It is significant that efficient information access depends not only on an appropriate data structure, but also on the representational formalism implemented in the individual data items. This property has been presented as an independent motivation for minimally recursive representations from the Machine Translation point of view (Copestake et al., 1995), and has been most thoroughly explored in the context of the substitution operations required for transfer. We believe we have taken this argument to its logical conclusion, in implementing a non-recursive semantic metalanguage in an appropriate data structure. This, in itself, provides sufficient motivation for opting for such representations rather than, say, feature structures or the recursive QLFs (Alshawi et al., 1991) of CLE (Alshawi, 1992).</Paragraph>
</Section>
<Section position="2" start_page="164" end_page="164" type="sub_section"> <SectionTitle> 3.2 ADT Package </SectionTitle>
<Paragraph position="0"> In general, linguistic analysis components are very sensitive to changes in input data caused by modifications of analyses or by increasing coverage. Obviously, there is a need for some kind of robustness at the interface level, especially in large distributed software projects like Verbmobil, with parallel development of different components. Therefore, components that communicate with each other should abstract over the data types used at their interfaces. This is really a further projection of standard software engineering practice into the implementation of linguistic modules.</Paragraph>
<Paragraph position="1"> In this spirit, all access to and manipulation of the information in a VIT is mediated by an abstract data type (ADT) package (Dorna, 1996). The ADT package can be used to build a new VIT, to fill it with information, to copy and delete information within a VIT, to check the contents (see Section 3.3 below), to get specific information, to print a VIT, etc. To give an example of the abstraction, there is no need to know where specific information is stored for later lookup; the ADT package manages the addition of a piece of information to the appropriate slot. This means that the external treatment of the VIT as an interface term is entirely independent of the internal implementation and data structure within any of the modules, and vice versa.</Paragraph>
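<Paragraph> The following sketch illustrates this style of interface in Prolog. The predicate names, the simplified vit/7 layout (the same one assumed in the sketch in Section 2.1 above) and the slot-selection rule are assumptions made for exposition; they are not the actual interface of the ADT package described in (Dorna, 1996).
    :- use_module(library(lists)).

    % Build an empty VIT around a given identifier.
    vit_new(Id, vit(Id, [], [], [], [], [], [])).

    % The package, not the caller, decides which slot a term belongs in;
    % here the decision is keyed, naively, on the functor name.
    vit_add(Term, vit(Id, Conds, Cons, S, M, P, D),
                  vit(Id, Conds, [Term|Cons], S, M, P, D)) :-
        functor(Term, F, _),
        memberchk(F, [leq, in_g]).
    vit_add(Term, vit(Id, Conds, Cons, S, M, P, D),
                  vit(Id, [Term|Conds], Cons, S, M, P, D)) :-
        functor(Term, F, _),
        \+ memberchk(F, [leq, in_g]).

    % Read information back without exposing the term layout to the caller.
    vit_get(conditions,  vit(_, Conds, _, _, _, _, _), Conds).
    vit_get(constraints, vit(_, _, Cons, _, _, _, _), Cons).

    % ?- vit_new(vit_id(seg042, de, syn_parser), V0),
    %    vit_add(leq(l2, h1), V0, V1),
    %    vit_get(constraints, V1, Cs).
    % Cs = [leq(l2, h1)].
</Paragraph>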
</Section>
<Section position="3" start_page="164" end_page="165" type="sub_section"> <SectionTitle> 3.3 Consistency Checking </SectionTitle>
<Paragraph position="0"> As a side effect of adopting an extensive ADT package, we were able to provide a variety of checking and quality control functions. They are especially useful at the interfaces between linguistic modules, for detecting format and content errors. At the format checking level, language-specific on-line dictionaries are used to ensure compatibility between the components. A content checker is used to test language-independent structural properties, such as missing or wrongly bound variables, missing or inconsistent information, and cyclicity.</Paragraph>
<Paragraph position="1"> As far as we are aware, this is the first time that the results of linguistic components dealing with semantics can be systematically checked at module interfaces. It has been shown that this form of testing is well suited to error detection in components with rapidly growing linguistic coverage. It is worth noting that the source language lexical coverage in the Verbmobil Research Prototype is around 2500 words, rising to 10K at the end of the second phase.</Paragraph>
<Paragraph position="2"> Furthermore, the complexity of the information produced by the linguistic components makes automatic output control a necessity.</Paragraph>
<Paragraph position="3"> The same checking can be used to define a quality rating, e.g. for correctness, interpretability, etc., of the content of a VIT. Such results are far more productive in improving a system than common, purely quantitative measures based on failure or success rates.</Paragraph>
</Section> </Section> </Paper>