<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2017">
  <Title>Towards Interactive Text Understanding</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 MDA: A semantics-based document authoring system
</SectionTitle>
    <Paragraph position="0"> The MDA (Multilingual Document Authoring) system [Brun et al 2000], a descendant of Ranta's Grammatical Framework [Ranta 2002], is an instance of a text-mediated interactive natural language generation system, a notion introduced by [Power and Scott 1998] under the name of WYSIWYM. In such systems, an author gradually constructs a semantic representation, but rather than accessing the evolving representation directly, she interacts with a natural language text generated from it. Some regions of the text are active and correspond to still unspecified parts of the representation; each is associated with a menu presenting a collection of choices for extending the semantic representation. The choices are semantically explicit, so the resulting representation contains no ambiguities. The author thus has the feeling of interacting only with text, while in fact she is building a formal semantic object. One application of this approach is multilingual authoring: the author interacts with a text in her own language, but the internal representation can be used to generate reliable translations in other languages. Fig. 1 gives an overview of the MDA architecture and Fig. 2 is a screenshot of the</Paragraph>
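The WYSIWYM interaction loop described above can be illustrated with a minimal sketch. It assumes a toy grammar in which each semantic category maps to constructors with a text template and a list of child categories; the names `GRAMMAR`, `Node`, `realize`, `menu` and `choose` are hypothetical and are not MDA's actual API. Unfilled positions of the partial representation are rendered as bracketed active regions, and making a menu choice fills a hole and opens new holes for its children.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy grammar (not MDA's):
# category -> {constructor: (text template, child categories)}
GRAMMAR = {
    "Drug": {"drug": ("{0} is available as {1}.", ["Name", "Form"])},
    "Name": {"aspirin": ("Aspirin", []), "ibuprofen": ("Ibuprofen", [])},
    "Form": {"tablet": ("tablets", []), "solution": ("an oral solution", [])},
}

@dataclass
class Node:
    category: str                    # semantic type expected at this position
    choice: Optional[str] = None     # constructor picked from the menu; None = hole
    children: list["Node"] = field(default_factory=list)

def realize(node: Node) -> str:
    """Generate text; unspecified positions become bracketed active regions."""
    if node.choice is None:
        return f"[{node.category}]"
    template, _ = GRAMMAR[node.category][node.choice]
    return template.format(*(realize(c) for c in node.children))

def menu(node: Node) -> list[str]:
    """Choices offered when the author clicks the active region of a hole."""
    return list(GRAMMAR[node.category])

def choose(node: Node, constructor: str) -> None:
    """Extend the representation: fill the hole and open holes for its children."""
    _, child_cats = GRAMMAR[node.category][constructor]
    node.choice = constructor
    node.children = [Node(cat) for cat in child_cats]

doc = Node("Drug")
choose(doc, "drug")
print(realize(doc))            # [Name] is available as [Form].
choose(doc.children[0], "aspirin")
print(menu(doc.children[1]))   # ['tablet', 'solution']
print(realize(doc))            # Aspirin is available as [Form].
```

The author only ever sees the realized text; the tree of `Node` objects plays the role of the formal semantic object being built behind it.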
  </Section>
  <Section position="4" start_page="0" end_page="3" type="metho">
    <SectionTitle>
3 Interactive Text Understanding
</SectionTitle>
    <Paragraph position="0"> In the current MDA system, menu choices are ordered statically once and for all in the semantic grammar. However, consider the situation of an author producing a certain text while using some input document as an informal reference source. It would be quite natural to assume that the authoring system could use this document as a source of information in order to prime some of the menu choices.</Paragraph>
    <Paragraph position="1"> Thus, when authoring the description of a pharmaceutical drug, the presence of the words tablet and solution in the input document could serve to highlight the corresponding choices in the menu for the pharmaceutical form of the drug. This would be relatively simple to do, but one could go further: rank menu choices and assign them confidence weights according to textual and contextual hints found in the input document. When the confidence is sufficiently high, the authoring system could perform the choice automatically and produce a new portion of the output text, while the author retains the ability to accept or reject the system's suggestion. When the confidence is not high enough, the author's choice would still be sped up by displaying the choices ranked by confidence. This kind of functionality is what we call a text-mediated interactive text understanding system, or ITU system for short (see Fig. 3).</Paragraph>
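The priming idea (highlight, rank, and possibly auto-select menu choices from textual hints) can be sketched as follows. This is a deliberately crude illustration, not the ITU system itself: the cue-word table `CUES`, the functions `rank_choices` and `suggest`, and the 0.8 threshold are all hypothetical, and the "confidence" is simply each choice's share of the cue-word occurrences found in the input document.

```python
import re
from collections import Counter

# Hypothetical cue words per menu choice; a real ITU system would derive
# or learn such associations rather than hard-code them.
CUES = {
    "tablet": {"tablet", "tablets"},
    "solution": {"solution", "syrup"},
    "capsule": {"capsule", "capsules"},
}

def rank_choices(input_text: str, cues: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank menu choices by their share of cue-word occurrences in the input."""
    counts = Counter(re.findall(r"[a-z]+", input_text.lower()))
    hits = {choice: sum(counts[w] for w in words) for choice, words in cues.items()}
    total = sum(hits.values())
    # Crude confidence weights that sum to 1 whenever at least one cue occurs.
    scored = [(c, h / total if total else 0.0) for c, h in hits.items()]
    # Stable sort: ties keep the grammar's static menu order.
    return sorted(scored, key=lambda cw: cw[1], reverse=True)

def suggest(input_text: str, threshold: float = 0.8):
    """Return (automatic choice or None, re-ranked menu); the author may override."""
    ranked = rank_choices(input_text, CUES)
    best, conf = ranked[0]
    return (best if conf >= threshold else None), ranked

auto, ranked = suggest("Available as tablets or as an oral solution.")
print(auto)    # None
print(ranked)  # [('tablet', 0.5), ('solution', 0.5), ('capsule', 0.0)]
```

When the input mentions only tablets, the top choice reaches confidence 1.0 and is selected automatically; with mixed evidence, as above, the system falls back to presenting a re-ranked menu.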
    <Paragraph position="3"> In the current MDA system, the order of the choices listed in a menu never varies, but certain choices may be filtered out depending on the current authoring context; this filtering relies on unification constraints in the semantic grammar.</Paragraph>
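Context-dependent filtering by unification can be pictured with a simplified sketch. It assumes flat feature structures (attribute-value dictionaries with atomic values), which is much poorer than the unification used in a real semantic grammar; `unify`, `filter_menu` and the `FORMS` table are hypothetical. Note that filtering removes incompatible choices but leaves the static order of the remaining ones unchanged.

```python
from typing import Optional

FeatureStruct = dict[str, str]

def unify(a: FeatureStruct, b: FeatureStruct) -> Optional[FeatureStruct]:
    """Unify two flat feature structures; None signals a clash."""
    result = dict(a)
    for feat, val in b.items():
        if feat in result and result[feat] != val:
            return None  # conflicting atomic values: unification fails
        result[feat] = val
    return result

def filter_menu(context: FeatureStruct, choices: dict[str, FeatureStruct]) -> list[str]:
    """Keep the choices whose constraints unify with the context, in menu order."""
    return [name for name, feats in choices.items() if unify(context, feats) is not None]

# Hypothetical menu for the pharmaceutical form, with one constraint per choice.
FORMS = {
    "tablet": {"route": "oral"},
    "oral solution": {"route": "oral"},
    "injectable solution": {"route": "parenteral"},
}
print(filter_menu({"route": "oral"}, FORMS))  # ['tablet', 'oral solution']
```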
    <Paragraph position="4"> Note that we do not require the semantic representation built with an ITU system to be a complete representation of the input document; rather, it can be a structured description of some thematic aspects of that document. Similarly, the input document need not contain enough information for the system, or even the author, to "answer" certain menus; in that case, some active regions of the output text simply remain unspecified.</Paragraph>
    <Paragraph position="5"> We will now consider some directions to implement an ITU system.</Paragraph>
    <Paragraph position="6"> 4 From document normalization to ITU
A first route towards an ITU system is to extend ongoing work on document normalization [Max and Dymetman 2002, Max 2003]. The starting point is the following.</Paragraph>
    <Paragraph position="7"> Assume an MDA system is available for authoring a certain type of document (for instance, a certain class of drug leaflets), and suppose one is presented with a "legacy" document of the same type, that is, a document containing the same type of information but produced independently of the MDA system. Using the system, a human could attempt to "re-author" the content of the input legacy document, thereby obtaining a normalized version of it as well as an associated semantic representation.</Paragraph>
    <Paragraph position="8"> An attempt to automate the re-authoring process works as follows. Consider the virtual space of semantic representations enumerated by the MDA grammar. For each such representation, produce, through the standard MDA realization process, a more or less rough "descriptor" of what the input text should contain if its content corresponded to that semantic representation. Then define a similarity measure between this descriptor and the input text. Finally, perform an admissible heuristic search [Nilsson 1998] of the virtual space to find the semantic representation whose descriptor is most similar to the input text. This architecture can accommodate descriptors of varying sophistication: from bags of content words to be intersected with the input text, up to predicted "top-down" predicate-argument tuples to be matched with "bottom-up" tuples extracted from the input text through a rough information-extraction process.</Paragraph>
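The simplest end of this spectrum (bag-of-words descriptors intersected with the input text) combined with an admissible best-first search can be sketched as below. The toy space `SLOTS`, the functions `tokens`, `gain` and `best_semantics`, and the assumption that similarity is additive over slots are all hypothetical simplifications. With an additive score, summing the best possible gain of every remaining slot never underestimates what a partial structure can still achieve, which is what makes this A*-style search (in its maximizing form) return an optimal complete structure.

```python
import heapq
import itertools
import re

# Hypothetical toy space of semantic representations: fill each slot in order.
# Each choice's descriptor is a bag of content words (an assumption).
SLOTS = [
    ("form", {"tablet": {"tablet", "tablets"}, "solution": {"solution", "oral"}}),
    ("dose", {"daily": {"daily", "day"}, "weekly": {"weekly", "week"}}),
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def gain(words: set[str], text_words: set[str]) -> int:
    """Similarity contribution of one choice: size of the bag intersection."""
    return len(words.intersection(text_words))

def best_semantics(text: str):
    """Best-first (A*-style) search for the complete representation whose
    additive descriptor similarity with the input text is maximal."""
    text_words = tokens(text)
    # Optimistic bound per slot: the best gain any of its choices could add.
    slot_max = [max(gain(w, text_words) for w in ch.values()) for _, ch in SLOTS]
    rest = [sum(slot_max[i:]) for i in range(len(SLOTS) + 1)]
    tie = itertools.count()  # tie-breaker so heapq never compares the lists
    frontier = [(-rest[0], next(tie), 0, [])]  # (-upper bound, tie, score, choices)
    while frontier:
        _neg_bound, _, score, chosen = heapq.heappop(frontier)
        depth = len(chosen)
        if depth == len(SLOTS):
            # The bound is exact for complete structures and never underestimates
            # partial ones, so the first complete structure popped is optimal.
            return chosen, score
        _, choices = SLOTS[depth]
        for name, words in choices.items():
            s = score + gain(words, text_words)
            heapq.heappush(frontier, (-(s + rest[depth + 1]), next(tie), s, chosen + [name]))
    return None

print(best_semantics("Take one tablet daily with water."))  # (['tablet', 'daily'], 2)
```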
    <Paragraph position="9"> Up to now, this work has emphasized the automatic reconstruction of legacy documents more than interaction, but we have recently started to think about adapting the approach to ITU. The heuristic search mentioned above associates with each menu choice an estimate of the best similarity score that could be obtained by some complete semantic structure extending that choice. It is then possible to rank the choices according to that heuristic estimate (or some refinement of it obtained by deepening the search a few steps down the line), and to propose a re-ranked menu to the author. The MDA realization process used above to produce descriptors was initially designed to produce parallel texts in several languages, but it can easily be adapted to produce non-textual "renderings" of the semantic representations.</Paragraph>
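Re-ranking a single menu with the heuristic estimate can be sketched in a few lines. As before, the toy grammar `SLOTS` and the function `rank_menu` are hypothetical, and the estimate assumes an additive bag-of-words similarity: for each choice it adds the score accumulated so far, the choice's own gain, and the optimistic bound on what all later slots could still contribute. The refinement by deepening the search is not shown.

```python
import re

# Hypothetical toy grammar: ordered slots whose choices carry bag-of-words descriptors.
SLOTS = [
    ("form", {"tablet": {"tablet", "tablets"}, "solution": {"solution", "syrup"}}),
    ("dose", {"daily": {"daily", "day"}, "weekly": {"weekly", "week"}}),
]

def gain(words: set[str], text_words: set[str]) -> int:
    return len(words.intersection(text_words))

def rank_menu(slot: int, score_so_far: int, text: str) -> list[tuple[str, int]]:
    """Re-rank the menu of `slot` by a heuristic estimate of the best complete
    structure extending each choice: score so far + the choice's own gain +
    an optimistic bound on what all later slots could still add."""
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    later = sum(max(gain(w, text_words) for w in ch.values()) for _, ch in SLOTS[slot + 1:])
    _, choices = SLOTS[slot]
    estimates = [(name, score_so_far + gain(w, text_words) + later) for name, w in choices.items()]
    # Stable sort: choices with equal estimates keep their grammar order.
    return sorted(estimates, key=lambda e: e[1], reverse=True)

print(rank_menu(0, 0, "Dissolve the syrup in water once a week."))
# [('solution', 2), ('tablet', 1)]
```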
    <Paragraph position="10"> While we are currently pursuing this promising line of research because of its conceptual and algorithmic simplicity, it has some weaknesses.</Paragraph>
    <Paragraph position="11"> It relies on similarity scores between an input text and a descriptor that are defined in a somewhat ad hoc manner, it depends on parameters that are fixed a priori rather than learned through training, and its scores are difficult to associate with confidence levels that have a clear interpretation.</Paragraph>
    <Paragraph position="12"> One way of solving these problems is to move towards a more probabilistic approach, which has the combined advantages of resting on accepted principles and of having a well-developed learning theory. We finally turn our attention to existing work in this area that holds promise for improving ITU.</Paragraph>
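To illustrate what a more probabilistic approach could look like, here is a minimal sketch of a log-linear (maximum-entropy style) distribution over the choices of one menu, p(c | text) proportional to exp(Σ_k λ_k f_k(c, text)). The feature functions, the `WEIGHTS` table and `choice_distribution` are hypothetical, and the weights are set by hand; in practice they would be trained, and the resulting probabilities could then serve as confidence levels with a clear interpretation.

```python
import math
import re

# Hypothetical feature weights lambda_k; a real system would learn them,
# e.g. by maximizing conditional log-likelihood on annotated data.
WEIGHTS = {"cue_word_present": 2.0, "cue_word_count": 0.5}

def features(choice: str, text: str) -> dict[str, float]:
    """Hypothetical feature functions f_k(choice, text)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "cue_word_present": 1.0 if choice in words else 0.0,
        "cue_word_count": float(words.count(choice)),
    }

def choice_distribution(choices: list[str], text: str) -> dict[str, float]:
    """p(c | text) = exp(sum_k lambda_k f_k(c, text)) / Z(text)."""
    scores = {c: sum(WEIGHTS[k] * v for k, v in features(c, text).items()) for c in choices}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

dist = choice_distribution(["tablet", "solution", "capsule"], "one tablet twice daily")
print(round(dist["tablet"], 3))  # 0.859
```

A threshold on such a probability (for instance, select automatically only above 0.8) would then have a direct reading as the model's estimated chance of being right.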
  </Section>
  <Section position="5" start_page="3" end_page="5" type="metho">
    <SectionTitle>
5 Towards statistical ITU
</SectionTitle>
    <Paragraph position="0"> Recent research on the interactive statistical machine translation system TransType [Foster et al, 1997; Foster et al, 2002] is of special interest in relation to ITU. This system, outlined in Fig. 4, aims at helping a translator type her (unconstrained) translation of a source text by predicting sequences of characters that are likely to follow the characters already typed in the target text; this prediction is based on information present in the source text. The approach is similar to standard statistical MT, but instead of producing a single best translation, the system ranks several completion proposals according to a probabilistic confidence measure and uses this measure to optimize the length of the completions proposed to the translator for validation. Evaluations of the first version of TransType have already shown significant gains in terms of the number of keystrokes needed to produce a translation, and work is continuing to make the approach effective in real translation environments. Statistical MT initially used a noisy-channel approach [Brown et al. 1993], but [Och and Ney 2002] have recently introduced a more general framework based on the maximum-entropy principle, which looks promising in terms of flexibility and learnability. An interesting research thread is to use more linguistic structure in a statistical translation model [Yamada and Knight 2001]; this is relevant to ITU, since we need to handle structured semantic data.</Paragraph>
    <Paragraph position="1"> If we now compare Fig. 3 and Fig. 4, we see strong parallels between TransType and ITU: a language model enumerating word sequences vs. a grammar enumerating semantic structures; the source text vs. the input text as information sources; and the match between source text and target text vs. the match between input text and semantic structure.</Paragraph>
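The idea of using a confidence measure to decide how long a completion to propose can be illustrated with a deliberately simplified sketch. The user model is an assumption made for illustration and is not TransType's actual one: per-word correctness probabilities are treated as independent, a correct proposal of k words saves its characters, and reading the proposal costs a fixed amount per character. `best_completion_length` and `read_cost_per_char` are hypothetical names.

```python
def best_completion_length(words: list[str], word_probs: list[float],
                           read_cost_per_char: float = 0.2) -> int:
    """Pick the prefix length k of a proposed completion that maximizes a crude
    expected benefit: P(first k words all correct) * characters saved, minus
    the cost of reading the proposal. k = 0 means propose nothing."""
    best_k, best_gain = 0, 0.0
    p_correct = 1.0
    chars = 0
    for k, (w, p) in enumerate(zip(words, word_probs), start=1):
        p_correct *= p        # assumes independent per-word correctness
        chars += len(w) + 1   # the word plus the following space
        gain = p_correct * chars - read_cost_per_char * chars
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k

print(best_completion_length(["the", "patient", "should", "consult"],
                             [0.95, 0.9, 0.5, 0.4]))  # 2
```

Longer proposals save more keystrokes when right but are increasingly likely to be wrong; the sketch stops the proposal where that trade-off peaks.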
    <Paragraph position="2"> In TransType, the interaction is directly with the target text, while in ITU the interaction with the semantic structure is mediated through an output text realizing that structure. We can thus hope to bring some of the techniques developed for TransType to ITU, but some of the challenges are different: for instance, the semantic grammars in ITU cannot be trained on a directly observable corpus of texts.</Paragraph>
  </Section>
</Paper>