<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1185">
  <Title>Constraint-based RMRS Construction from Shallow Grammars</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 RMRS - For Partial Semantic
Representation
</SectionTitle>
    <Paragraph position="0"> Copestake (2003) presents a formalism for partial semantic representation that is derived from MRS semantics (Copestake et al., 2003). Robust Minimal Recursion Semantics is designed to support novel forms of integrated shallow and deep NLP by accommodating semantic representations produced by NLP components of varying degrees of partiality and depth of analysis, ranging from PoS taggers and NE recognisers via chunk and (non-)lexicalised context-free grammars to deep grammars like HPSG with MRS output structures.</Paragraph>
    <Paragraph position="1"> The potential of variable-depth semantic analysis is most evident for applications with conflicting requirements of robustness and accuracy. Given a range of NLP components of different depths of analysis that deliver compatible semantic representations, we can apply flexible integration methods: voting techniques, or the combination of partial results from shallow and deep systems (Copestake, 2003).</Paragraph>
    <Paragraph position="2"> To allow intersection and monotonic enrichment of the output representations of shallow systems on one extreme of the scale with the complete representations of deep analysis on the other, the specifications missing from the weakest system must be factored out of the most comprehensive deep representations. In the RMRS formalism, this concerns the following main aspects of semantic information: Argument encoding. A 'Parsons-style' notation accommodates the partiality of shallow systems with respect to argument identification. Instead of predicates with fixed arity, e.g. l4:on(e0,e,y), predicates and arguments are represented as independent elementary predications: on(l4,e0), ARG1(l4,e), ARG2(l4,y).</Paragraph>
    <Paragraph position="3"> This accounts for the uncertainty of argument identification in shallow grammars. Underspecification with respect to the type of an argument is modeled in terms of a hierarchy over disjunctive argument types: ARG1 &lt; ARG12, ARG2 &lt; ARG12, ARG12 &lt; ... &lt; ARGn.</Paragraph>
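    <Paragraph> The Parsons-style decomposition and the disjunctive argument-type hierarchy can be sketched as follows. This is an illustrative Python sketch under the assumption that elementary predications are simple tuples and types are encoded extensionally as sets; it is not the RMRS implementation itself, and all names are hypothetical.

```python
def decompose(label, pred, args):
    """Split a fixed-arity predication such as l4:on(e0,e,y) into
    independent elementary predications (Parsons-style encoding)."""
    eps = [(pred, label, args[0])]           # on(l4, e0)
    for i, arg in enumerate(args[1:], start=1):
        eps.append((f"ARG{i}", label, arg))  # ARG1(l4, e), ARG2(l4, e)
    return eps

# Hierarchy of disjunctive argument types: ARG1 and ARG2 are both
# subtypes of the underspecified ARG12 (hypothetical set encoding).
SUBSUMES = {
    "ARG12": {"ARG1", "ARG2", "ARG12"},
}

def compatible(underspecified, specific):
    """An underspecified argument type is compatible with its subtypes."""
    return specific in SUBSUMES.get(underspecified, {underspecified})
```

A shallow system can thus emit the underspecified ARG12 for an argument it cannot fully identify, and a deeper analysis can later monotonically refine it to ARG1 or ARG2.</Paragraph>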
    <Paragraph position="4"> Variable naming and equalities. Constraints for the equality of variables in elementary predications are added incrementally, to accommodate knowledge-poor systems such as PoS taggers, where the identity of the referential variables of, e.g., adjectives and nouns in potential NPs cannot be established, or chunkers, where the binding of arguments to predicates is only partially established.</Paragraph>
    <Paragraph position="5"> An example of corresponding MRS (1.a) and RMRS (1.b) representations illustrates these differences, cf. Copestake (2003).</Paragraph>
    <Paragraph position="6"> (1) Every fat cat sat on a mat</Paragraph>
    <Paragraph position="8"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 RMRS from Shallow Grammars
</SectionTitle>
    <Paragraph position="0"> We aim at a modular interface for RMRS construction that can be adapted to a wide range of existing shallow grammars, such as off-the-shelf chunk parsers or (non-)lexicalised PCFGs.</Paragraph>
    <Paragraph position="1"> Moreover, we aim at the construction of underspecified, but maximally constrained (i.e., resolved) RMRS representations from shallow grammars.</Paragraph>
    <Paragraph position="2"> A unification-based account. Chunk parsers and PCFG parsers for sentential structure do not, in general, provide the functional information that could be used for argument identification. While in languages like English argument identification is to a large extent structurally determined, in other languages arguments are (partially) identified by case marking.</Paragraph>
    <Paragraph position="3"> In case-marking languages, morphological agreement constraints can yield a high proportion of completely disambiguated constituents. Morphological disambiguation can thus achieve maximally constrained argument identification for shallow analyses. We therefore propose a unification-based approach to RMRS construction, in which agreement constraints perform morphological disambiguation for partial (i.e. underspecified) argument identification. Moreover, by interfacing shallow analysis with morphological processing we can infer important semantic features of referential and event variables, such as PNG and TENSE information. Thus, morphological processing is also beneficial for languages with structural argument identification.</Paragraph>
    <Paragraph position="4"> A reparsing architecture. In order to realise a modular interface to existing parsing systems, we follow a reparsing approach: RMRS construction takes as input the output structure of a shallow parser. We index the nodes of the parse tree and extract a set of rules and lexicon entries with corresponding node indices. Reparsing of the original input string according to this set of rules deterministically replays the original parse. In the reparsing process we apply RMRS construction principles.</Paragraph>
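    <Paragraph> The indexing step of the reparsing architecture can be sketched as follows, assuming the parser output is given as nested (category, children) tuples with words as string leaves; this encoding and all names are illustrative assumptions, not the actual parser interface.

```python
def index_tree(tree, node_id="1", mother=None, out=None):
    """Flatten a parse tree into TFS-like records that pair each node's
    own ID/CAT with its mother's M-ID/M-CAT, so that reparsing the
    input string can deterministically replay the original parse."""
    if out is None:
        out = []
    cat, children = tree
    rec = {"TYPE": "phrase" if isinstance(children, list) else "lex",
           "ID": node_id, "CAT": cat}
    if mother is not None:
        rec["M-ID"], rec["M-CAT"] = mother
    out.append(rec)
    if isinstance(children, list):
        # Daughter i of node "11" receives the index "11" + str(i).
        for i, child in enumerate(children, start=1):
            index_tree(child, node_id + str(i), (node_id, cat), out)
    return out
```

Applied to a tree for the Figure 2 example, this yields records such as [ID "112", CAT "ADJA", M-ID "11", M-CAT "NP"] for the adjective node.</Paragraph>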
    <Paragraph position="5"> Constraint-based RMRS construction. We define constraint-based principles for RMRS construction in a typed feature structure formalism. These constraints are applied to the input syntactic structures. In the reparsing step the constraints are resolved, to yield maximally specified RMRS representations.</Paragraph>
    <Paragraph position="6"> The RMRS construction principles are defined and processed in the SProUT processing platform (Drozdzynski et al., 2004). The SProUT system combines finite-state technology with unification-based processing. It allows the definition of finite-state transduction rules that apply to (sequences of) typed feature structures (TFS), as opposed to atomic symbols. The left-hand side of a transduction rule specifies a regular expression over TFS as a recognition pattern; the right-hand side specifies the output in terms of a typed feature structure. The system has been extended to cascaded processing, such that the output of one set of rule applications can provide the input for another set of rewrite rules. The system also allows several distinct rules to apply to the same input substring, as long as these different rules match the same (maximal) sequence of structures. The output structures defined by the individual rules can then be unified, by way of flexible interpreter settings. These advanced configurations allow us to state RMRS construction principles in a modular way.</Paragraph>
    <Paragraph position="7"> Figure 2 shows the indexed daughter-node TFS for the example sentence (... auf der Matte - A fat cat sat on the mat):
phrase &amp; [ID &amp;quot;11&amp;quot;, CAT &amp;quot;NP&amp;quot;, M-ID &amp;quot;1&amp;quot;, M-CAT &amp;quot;S&amp;quot;]
lex &amp; [ID &amp;quot;12&amp;quot;, CAT &amp;quot;VVFIN&amp;quot;, M-ID &amp;quot;1&amp;quot;, M-CAT &amp;quot;S&amp;quot;]
phrase &amp; [ID &amp;quot;13&amp;quot;, CAT &amp;quot;PP&amp;quot;, M-ID &amp;quot;1&amp;quot;, M-CAT &amp;quot;S&amp;quot;]
lex &amp; [ID &amp;quot;111&amp;quot;, CAT &amp;quot;ART&amp;quot;, M-ID &amp;quot;11&amp;quot;, M-CAT &amp;quot;NP&amp;quot;]
lex &amp; [ID &amp;quot;112&amp;quot;, CAT &amp;quot;ADJA&amp;quot;, M-ID &amp;quot;11&amp;quot;, M-CAT &amp;quot;NP&amp;quot;]
lex &amp; [ID &amp;quot;113&amp;quot;, CAT &amp;quot;NN&amp;quot;, M-ID &amp;quot;11&amp;quot;, M-CAT &amp;quot;NP&amp;quot;]
lex &amp; [ID &amp;quot;131&amp;quot;, CAT &amp;quot;APPR&amp;quot;, M-ID &amp;quot;13&amp;quot;, M-CAT &amp;quot;PP&amp;quot;]
lex &amp; [ID &amp;quot;132&amp;quot;, CAT &amp;quot;ART&amp;quot;, M-ID &amp;quot;13&amp;quot;, M-CAT &amp;quot;PP&amp;quot;]
lex &amp; [ID &amp;quot;133&amp;quot;, CAT &amp;quot;NN&amp;quot;, M-ID &amp;quot;13&amp;quot;, M-CAT &amp;quot;PP&amp;quot;]
Cascaded Reparsing. We extract information about phrase composition from the indexed input parse trees. For each local subtree, we extract the sequence of daughter nodes as TFS, recording for each node its identifier (ID) together with the identifier (M-ID) and category (M-CAT) of its mother node (cf. Figure 2). These records implicitly encode phrase-composition instructions that the cascaded system employs to guide phrase building and concurrent semantics construction.</Paragraph>
    <Paragraph position="8"> A general reparsing rule (cf. Figure 3) is applied to an input sequence of TFS for lexical or phrasal nodes and produces as output a TFS for the implicitly defined mother node. The rule specifies that for all nodes in the matched input sequence, their mother node identifier and category features (M-ID, M-CAT) must be identical, and defines the output (mother) node's local identifier and category feature (ID, CAT) by use of variable co-references (#var).</Paragraph>
    <Paragraph position="9"> Since the system obeys a longest-match strategy, the regular expression is constrained to apply to the same constituents as in the original parse tree.</Paragraph>
    <Paragraph position="10"> Cascaded reparsing first applies to the sequence of leaf nodes. The output node sequence is enriched with the phrase-building information from the original parse tree and is again input to the phrase-building and semantics construction rules. Thus, we define a cyclic cascade, where the output of one cascade is fed back in as input to the same rules. The cycle terminates when no phrase-building rule can be applied to the input, i.e. when the root category has been derived.</Paragraph>
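    <Paragraph> The cyclic cascade can be sketched as follows. This assumes the phrase-building information extracted from the indexed tree is given as a mapping from mother IDs to their category, daughter count, and own mother node; this is a deliberate simplification for illustration, not the SProUT rule formalism.

```python
from itertools import groupby

def step(nodes, spec):
    """One cascade: group adjacent nodes sharing an identical M-ID and,
    when the complete daughter sequence is present, replace the group
    by its mother node (mirroring the longest-match reparsing rule)."""
    out = []
    for mid, grp in groupby(nodes, key=lambda n: n.get("M-ID")):
        grp = list(grp)
        if mid in spec and len(grp) == spec[mid][1]:
            cat, _, mother = spec[mid]
            node = {"ID": mid, "CAT": cat}
            if mother is not None:
                node["M-ID"], node["M-CAT"] = mother
            out.append(node)
        else:
            out.extend(grp)  # incomplete group: fed through unchanged
    return out

def reparse(leaves, spec):
    """Feed the output of each cascade back in until a fixed point is
    reached, i.e. no phrase-building rule applies any more."""
    nodes = list(leaves)
    nxt = step(nodes, spec)
    while nxt != nodes:
        nodes, nxt = nxt, step(nxt, spec)
    return nodes
```

On the Figure 2 node sequence this first builds NP 11 and PP 13 while the finite verb is fed through, then combines all three daughters into S 1, at which point no rule applies and the cycle terminates.</Paragraph>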
    <Paragraph position="11"> Prior to rule application, the SProUT system performs morphological lookup on the input words (Krieger and Xu, 2003). Morphological information is modeled in a TFS hierarchy with disjunctive types that underspecify ambiguities of inflectional features such as case. We define very general principles for morpho-syntactic agreement, stating agreement between daughter and mother constituents individually for categories like determiner, adjective or noun (Figure 4). Since in our reparsing approach the constituents are pre-defined, the agreement projection principles can be stated independently for possible mother-daughter relations, instead of specifying complex precedence patterns for NPs. Defining morphological agreement independently for possibly occurring daughter constituents yields a few very general (disjunctive) projection principles that can also apply to &amp;quot;unseen&amp;quot; constituent sequences.</Paragraph>
    <Paragraph position="12"> The rule in Figure 4 again exploits the longest-match strategy to constrain application to the pre-defined constituents, by specifying coreferent M-ID features for all nodes in the rule's input sequence.</Paragraph>
    <Paragraph position="13"> In reparsing, the (possibly disjunctive) morphological types in the output structure of the individual rule applications are unified, yielding partially resolved inflectional features for the mother node.</Paragraph>
    <Paragraph position="14"> For NP11, e.g., we obtain CASE nom by unification of nom (from ART and ADJA) and nom-accdat (from NN). The resolved case value of the NP can be used for (underspecified) argument binding in RMRS construction.</Paragraph>
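    <Paragraph> The case resolution just described can be sketched by modelling disjunctive morphological types as sets of atomic case values, so that unification corresponds to set intersection. This extensional encoding is an assumption made for illustration; the system itself uses a TFS type hierarchy.

```python
# Hypothetical extensions of some disjunctive case types.
CASE_TYPES = {
    "nom": {"nom"},
    "acc": {"acc"},
    "nom-acc-dat": {"nom", "acc", "dat"},
}

def unify_case(contributions):
    """Intersect the extensions of all contributed case types; an empty
    intersection signals unification failure."""
    result = set.intersection(*(CASE_TYPES[c] for c in contributions))
    if not result:
        raise ValueError("unification failure")
    return result
```

For the NP of the running example, the determiner and adjective each contribute nom while the noun contributes the disjunctive nom-acc-dat, so the NP's case resolves to nom.</Paragraph>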
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Semantics Projection Principles for
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Shallow Grammars
</SectionTitle>
      <Paragraph position="0"> Lexical RMRS conditions. Lexical entries for RMRS construction are constrained by types for PoS classes, with class-specific elementary predications (EP) in RMRS.RELS, cf. Figure 5. RELS and CONS are defined as set-valued features instead of lists. This allows for modular content projection principles (see below). We distinguish different types of EPs: ep-rel, defining relation and label, ep-rstr and ep-body for quantifiers, with LB and RSTR/BODY features. Arguments are encoded as a type ep-arg, which expands to disjunctive subtypes ep-arg-1, ep-arg-12, ep-arg-23, ..., ep-arg-n.</Paragraph>
      <Paragraph position="1"> In SProUT, the unification of output structures with set-valued features is defined as set union. While the classical list representation would require multiple content rules for different numbers of daughters, the set representation allows us to state a single content principle: it applies to each individual daughter, and yields the union of the projected set elements as the semantic value of the mother constituent.</Paragraph>
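      <Paragraph> The single content principle can be sketched as follows, under the assumption that RELS is modelled as a Python set of hashable elementary predications; all names are illustrative, not the SProUT rule syntax.

```python
def project_content(daughters):
    """Apply the content principle to each daughter individually and
    collect the union of the projected RELS sets on the mother node,
    independently of how many daughters the phrase has."""
    mother_rels = set()
    for daughter in daughters:
        mother_rels = mother_rels | daughter["RELS"]
    return {"RELS": mother_rels}
```

Because set union is order- and arity-independent, one principle covers binary and flat n-ary constituents alike, whereas a list-valued RELS would need one append rule per daughter count.</Paragraph>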
      <Paragraph position="2"> Argument and variable binding. Management features (KEY, BIND-ARG) propagate values of labels and variables for argument binding. The maximally specific type ep-arg-x of the arguments to be bound is determined by special bind-arg principles that define morpho-syntactic constraints (case, passive).</Paragraph>
      <Paragraph position="3"> For languages with structural argument identification we can employ precedence constraints in the regular expression part of argument binding rules.</Paragraph>
      <Paragraph position="4"> Content projection from flat structures. A challenge for principle-based RMRS construction from shallow grammars is their flat syntactic structures. These do not, in general, obey the strictly binary branching assumed in HPSG (Flickinger et al., 2003). Constituents may also contain multiple heads (cf. the PP in Fig. 1). Finally, chunk parsers do not resolve phrasal attachment, and thus deliver discontinuous constituents that must be accounted for.</Paragraph>
      <Paragraph position="5"> With flat, non-binary structures, we need to assemble EP (ep-arg-x) conditions for argument binding for each potential argument constituent of a phrase. In the SProUT system, this can again be done without explicit list operations, by applying individual argument binding rules that project binding EP conditions for each potential argument to the RELS feature of the mother. Thus, similar to Figure 6, we can state general and modular mother-daughter principles for argument binding.</Paragraph>
      <Paragraph position="6"> For multiple-headed constituents, such as flat PPs, we use secondary KEY and BIND-ARG features. For argument binding with chunk parsers, where PP attachment is not resolved, we will generate in-group conditions that account for possible attachments.</Paragraph>
    </Section>
  </Section>
</Paper>