<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1009">
  <Title>Alternative Phrases and Natural Language Information Retrieval</Title>
  <Section position="4" start_page="0" end_page="8" type="metho">
    <SectionTitle>
3 Analysis
</SectionTitle>
    <Paragraph position="0"> I present a formal approach to alternative phrases that is wider in scope than the alternatives reviewed in Section 2 (although less detailed in some respects than von Fintel and Hoeksema's work).</Paragraph>
    <Section position="1" start_page="0" end_page="8" type="sub_section">
      <SectionTitle>
3.1 Presupposition and Assertion
</SectionTitle>
      <Paragraph position="0"> In my analysis of alternative phrases, I make use of the pragmatic view of presuppositions explored by Lewis (1979) and Stalnaker (1974) which, stated loosely, sees them as propositions that must be true for an utterance to make sense. (For an overview of presupposition, see Beaver (1997).) The semantics of lexical entries are separated into assertion and presupposition as in Stalnaker (1974) and Karttunen and Peters (1979). The idea is also used in Webber et al. (1999) to capture anaphoric (non-structural) links between discourse connectives and material derivable from previous discourse, and in Stone and Doran (1997) and Stone and Webber (1998) for natural language generation.</Paragraph>
      <Paragraph position="1"> Lexical entries are written in the following form, where the semantic parameters scope both the assertion and presuppositions: word'</Paragraph>
      <Paragraph position="3"/>
    </Section>
    <Section position="2" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
3.2 Alternative Sets
</SectionTitle>
      <Paragraph position="0"> The concept of alternative sets plays an important role in the semantics of alternative phrases.</Paragraph>
      <Paragraph position="1"> An alternative set is a set of propositions that differ with respect to how one or more arguments are filled. For example, the alternative set {like(mary, jen), like(mary, bob), ...} represents the entities that Mary likes. An early discussion of these structures is provided in Karttunen and Peters (1979), where an analysis is given for the focus particle even. Alternative sets are also used by Rooth (1985) and Rooth (1992) to develop a detailed account of focus, particularly with the focus particle only.</Paragraph>
      <Paragraph position="2"> My analysis approximates this set of properties as a pair consisting of a set of entities (e.g. {jen, bob, ...}) and the property they share (e.g.</Paragraph>
      <Paragraph position="3"> λx.like(mary, x)). My analyses of alternative phrases use the relation alts(p, q) which, intuitively, specifies that the two sets of entities denoted by p and q can be found together in at least one alternative set in the knowledge base. The description component of the alternative set (i.e.</Paragraph>
      <Paragraph position="4"> the property) need not be known. It is important to note that although here I focus on unifying such structures, I also make use of the fact that alts is a relation that is symmetric and reflexive, but not transitive.</Paragraph>
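      <Paragraph> The stated properties of alts can be checked on a toy knowledge base. The following Python sketch is only an illustration under assumptions: the AlternativeSet class, the KB contents, and the string used for the description are hypothetical and do not come from the analysis itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlternativeSet:
    """An alternative set approximated as (entities, shared property)."""
    entities: frozenset
    description: str  # label for the shared property; "" when it is not known


def alts(p, q, knowledge_base):
    """True if entity sets p and q occur together in at least one alternative set."""
    p, q = frozenset(p), frozenset(q)
    return any(
        p.issubset(a.entities) and q.issubset(a.entities)
        for a in knowledge_base
    )


# Hypothetical knowledge base, for illustration only.
KB = [
    AlternativeSet(frozenset({"jen", "bob"}), "lambda x. like(mary, x)"),
    AlternativeSet(frozenset({"bob", "sue"}), ""),  # description unknown
]

print(alts({"jen"}, {"bob"}, KB))  # symmetric: same answer as alts({"bob"}, {"jen"}, KB)
print(alts({"jen"}, {"sue"}, KB))  # jen and sue share no set, although each pairs with bob
```

In this toy setting, reflexivity holds for any entity that appears in the knowledge base, and the second call shows why transitivity fails: bob links jen and sue only through two different alternative sets.</Paragraph>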
      <Paragraph position="5"> The alternative phrases I have analyzed fall into two classes: those that assemble a set from elements and those that excise a set from a larger set (as in exceptive phrases). In either case, one particular set of elements is of interest, the figure. With assembly words, the figure is either admitted into the set or combined with a complement to form a set. With excision, the figure is explicitly excluded from the ground. The figure may derive from structurally-related constituents (as with besides), or it may be presupposed (as with other).</Paragraph>
    </Section>
    <Section position="3" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
3.3 A Grammar Formalism
</SectionTitle>
      <Paragraph position="0"> I implement my analyses with Combinatory Categorial Grammar (Steedman, 1996; Steedman, 2000). CCG is a lexicalized grammar that encodes both the syntactic and semantic properties of a word in the lexicon. For the analyses presented in this paper, standard Categorial Grammar suffices.</Paragraph>
      <Paragraph position="1"> A minor variation is that rather than having the basic categories N and NP, I simply use NP.</Paragraph>
      <Paragraph position="2"> Noun phrases with and without determiners are distinguished with the bare feature.</Paragraph>
    </Section>
    <Section position="4" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
3.4 An Example
</SectionTitle>
      <Paragraph position="0"> In this section, I provide an analysis of one syntactic form of other in order to illustrate the semantic technique described above. Discussion of alternate syntactic forms and other alternative markers can be found in Bierner (2001).</Paragraph>
      <Paragraph position="1"> The semantic analysis below defines other as an excision word that excludes the figure from the ground. The figure is a free variable that must be provided from the common ground or discourse, as is the case in (1).</Paragraph>
      <Paragraph position="3"> The analysis allows the derivation in Figure 1.</Paragraph>
      <Paragraph position="4"> other countries</Paragraph>
      <Paragraph position="6"> At this point, the semantics is dependent on the free variable f, the figure. This is reflected by the fact that, in isolation, other countries does not make sense. Although such anaphoric reference is difficult to resolve, in some constructions we can identify the figure without bringing full resolution techniques to bear, as we would have to in (1).</Paragraph>
      <Paragraph position="7"> Some of these constructions, as in Figure 2, are those that contain the word than, whose analysis is given below.</Paragraph>
      <Paragraph position="8"> than'</Paragraph>
      <Paragraph position="10"> The presupposition set in Figure 2 is the union of the presuppositions of other and than, as bound during the derivation. The remaining variable, f, can be determined solely from the derivation's presupposition set using the old AI planning heuristic "use existing objects" (Sacerdoti, 1977) to avoid inventing new objects when others are already available. In particular, we can unify alts(f, λx.browser(x) ∧ ¬f(x)) and alts(netscape, λx.browser(x) ∧ ¬f(x)), discovering that f, the figure, is netscape. This then instantiates the remaining presupposition, yielding ∀x.netscape(x) → browser(x), i.e. Netscape is a browser. Unifying logical forms to instantiate variables in this way follows the "interpretation as abduction" paradigm (Hobbs et al., 1993), where this merging is performed to exploit redundancy for "getting a minimal, and hence a best, interpretation." Similar analyses in terms of alternative sets have been developed for many other alternative phrases in Bierner (2001).</Paragraph>
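      <Paragraph> The unification step just described can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the analysis: the tuple encoding of logical forms, the Var class, and the function names are assumptions made for this example, and plain first-order syntactic unification (without an occurs check) stands in for whatever machinery is actually used.

```python
class Var:
    """A logic variable, identified by name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "?" + self.name


def walk(term, subst):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while isinstance(term, Var) and term.name in subst:
        term = subst[term.name]
    return term


def unify(a, b, subst=None):
    """Return a substitution that makes a and b equal, or None if they clash."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var) and isinstance(b, Var) and a.name == b.name:
        return subst
    if isinstance(a, Var):
        return {**subst, a.name: b}
    if isinstance(b, Var):
        return {**subst, b.name: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def substitute(term, subst):
    """Replace bound variables in term by their values."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(substitute(t, subst) for t in term)
    return term


f = Var("f")
# Hypothetical encoding of: lambda x. browser(x) AND NOT f(x)
ground = ("lam", "x", ("and", ("browser", "x"), ("not", (f, "x"))))
from_other = ("alts", f, ground)           # presupposition contributed by "other"
from_than = ("alts", "netscape", ground)   # presupposition contributed by "than"

bindings = unify(from_other, from_than)
remaining = ("forall", "x", ("implies", (f, "x"), ("browser", "x")))
instantiated = substitute(remaining, bindings)
```

Under this encoding, bindings maps f to netscape, and instantiated is the tuple form of the presupposition that Netscape is a browser.</Paragraph>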
      <Paragraph position="11"> In the next section I show that practical applications such as natural language search engines can benefit from appropriate approximations of this kind of analysis.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="8" end_page="8" type="metho">
    <SectionTitle>
4 Natural Language IR
</SectionTitle>
    <Paragraph position="0"> There are a variety of techniques for allowing natural language queries in information retrieval systems. The simplest approach is to remove the "function words" from the query and use the remaining words in a standard keyword search (Alta Vista). In more complex approaches, pattern matching (the EK search engine), parsing (Ask Jeeves), and machine learning (Zelle and Mooney, 1993) techniques can support the association of more appropriate keywords with a query.</Paragraph>
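    <Paragraph> To make the simplest approach concrete, here is a small Python sketch of function-word removal. The stop list and the keywords function are hypothetical and chosen only for this example; no particular search engine is being reproduced.

```python
# Hypothetical stop list; real systems use much longer lists.
FUNCTION_WORDS = {
    "what", "are", "is", "some", "the", "a", "an", "of", "in", "other", "than",
}


def keywords(query):
    """Lower-case the query, strip punctuation, and drop function words."""
    tokens = [t.strip("?.,!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in FUNCTION_WORDS]


print(keywords("What are some other web browsers than Netscape?"))
```

With this stop list, the words other and than are discarded and netscape survives as an ordinary keyword, so the exclusion expressed by the alternative phrase is no longer visible to the keyword search.</Paragraph>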
    <Paragraph position="1"> I will concentrate on the pattern matching technique of the Electric Knowledge search engine and show how a theory of alternative phrases can drastically improve results.</Paragraph>
    <Paragraph position="3"> [Figure 2: derivation for "other web browsers than Netscape"]</Paragraph>
    <Section position="1" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
4.1 The EK search engine's Operational Semantics
</SectionTitle>
      <Paragraph position="0"> The Electric Knowledge search engine uses pattern recognition to transform a natural language question into a series of increasingly general Boolean queries that can be used with standard back-end retrieval techniques. The question is filtered through a hierarchy of regular expressions, and hand-crafted rules are used to extract information from the question into Boolean expressions. The regular expression matching is aided by an ISA hierarchy such that generalizations of keywords can be captured. As mentioned in Section 1, the fact that the presuppositions of alternative phrases encode hyponym information can be useful in augmenting this aspect of systems like the EK search engine.</Paragraph>
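      <Paragraph> The general shape of such a pipeline can be illustrated with a Python sketch. Everything concrete here is an assumption made for illustration: the two patterns, the ISA entries, the three levels of generality, and the Boolean query syntax are hypothetical and are not taken from the Electric Knowledge system.

```python
import re

# Hypothetical ISA hierarchy: keyword -> more general term.
ISA = {"browsers": "software", "netscape": "browsers"}

# Hypothetical (pattern, rule) pairs, most specific first.
# Each rule extracts the content words from a successful match.
PATTERNS = [
    (re.compile(r"^what (?:is|are) (?:some |the )?(.+?)\??$", re.IGNORECASE),
     lambda m: m.group(1).lower().split()),
    (re.compile(r"^(.+?)\??$"),
     lambda m: m.group(1).lower().split()),
]


def generalize(word):
    """OR a keyword with its ISA parent when one is known."""
    parent = ISA.get(word)
    return f'("{word}" OR "{parent}")' if parent else f'"{word}"'


def boolean_queries(question):
    """Return Boolean queries, most specific first, for the first matching pattern."""
    for regex, rule in PATTERNS:
        match = regex.match(question.strip())
        if match:
            words = rule(match)
            return [
                '"' + " ".join(words) + '"',                 # exact phrase
                " AND ".join(f'"{w}"' for w in words),        # all keywords
                " AND ".join(generalize(w) for w in words),   # ISA-generalized
            ]
    return []


for query in boolean_queries("What are some web browsers?"):
    print(query)
```

A back end could try the queries in order and stop at the first one that returns enough documents; that control strategy is likewise an assumption of this sketch.</Paragraph>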
      <Paragraph position="1"> This technique suffers from the fact that, in order to be tractable, this set of patterns is limited and important information in the query can be lost. In particular, the Electric Knowledge search engine does not have patterns that attempt to associate alternative phrases with appropriate pieces of Boolean query.</Paragraph>
      <Paragraph position="2"> To overcome this, an appropriate approximation of the semantic result of my analysis that is compatible with the back end search system must be found. For the Electric Knowledge search engine (similar approaches are certainly possible for other NLIR systems), a hybrid query has been introduced to account for alternative phrases, which combines a natural language query with further restrictions added in a system-specific language.</Paragraph>
      <Paragraph position="3"> The syntax is shown in (7).</Paragraph>
      <Paragraph position="4">
(7) Query :|: ANSWER NOT NEAR (| word list)
(8) What are some web browsers? :|: ANSWER NOT NEAR (| netscape)
The natural language query is separated from the restrictions by the :|: symbol. The restrictions specify that the answer to the query must not be near certain words.</Paragraph>
      <Paragraph position="5"> The hybrid query in (8), for example, is a transformation of the original query What are some other web browsers than Netscape?. The EK search engine uses the natural language part of the query to initially locate possible answering documents. The rest of the query is used when gathering evidence for including a document in the final results. The EK search engine finds a location in the document that should answer the query and then compares it against the criteria appended to the end of the query. If it does not meet the criteria (that it not be near the word Netscape), another location is tried. If there are no more possible answers, the document is rejected.</Paragraph>
      <Paragraph position="6"> This is, of course, not exactly what the original query meant. However, it is superior to queries like "browsers" AND NOT "netscape", which reject all pages containing Netscape, even if they also contain other browsers. The evaluation in Section 5 shows that this operational semantics is sufficient to dramatically improve the results for queries with alternative phrases.</Paragraph>
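      <Paragraph> The difference between the location-by-location check and a document-level AND NOT can be seen in a short Python sketch. The token window used to decide nearness, the way candidate answer locations are supplied, and all function names are assumptions of this example; the text does not say how the EK search engine measures nearness.

```python
def near(tokens, position, words, window):
    """True if any restricted word lies within `window` tokens of `position`."""
    start = max(0, position - window)
    stop = position + window + 1
    return any(tok in words for tok in tokens[start:stop])


def first_valid_answer(tokens, candidates, not_near, window=3):
    """Return the first candidate location that meets the restriction, else None."""
    banned = {w.lower() for w in not_near}
    lowered = [t.lower() for t in tokens]
    for position in candidates:
        if not near(lowered, position, banned, window):
            return position
    return None  # no candidate survives, so the document is rejected


def boolean_and_not(tokens, keyword, excluded):
    """Document-level check: keyword present and excluded word absent."""
    lowered = [t.lower() for t in tokens]
    return keyword in lowered and excluded not in lowered


doc = "Netscape is a browser . Another popular browser is Opera".split()
candidates = [i for i, tok in enumerate(doc) if tok == "browser"]

print(first_valid_answer(doc, candidates, ["netscape"]))
print(boolean_and_not(doc, "browser", "netscape"))
```

In this example the first candidate location is too close to Netscape, so the second one is tried and accepted, whereas the document-level AND NOT check discards the whole document.</Paragraph>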
    </Section>
    <Section position="2" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
4.2 The Algorithm
</SectionTitle>
      <Paragraph position="0"> Instead of using the EK search engine's pattern matching on the initial question, the algorithm does a post-analysis of the syntactic structure produced by parsing the question with my analysis.</Paragraph>
      <Paragraph position="1"> The algorithm recursively descends the derivation, searching for semantic forms that result from alternative phrases. The information from the alternative phrase is removed and then appended to the end of the hybrid query in a different form. In Figure 2, for example, the information other than Netscape is removed, leaving only web browsers, to form the hybrid query in (8).</Paragraph>
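      <Paragraph> A simplified version of this recursive descent can be sketched in Python. The Node class and its role labels are hypothetical: they stand in for recognizing the semantic forms contributed by alternative phrases in a real derivation, and the sketch returns the two parts of the hybrid query as a pair rather than rendering them in the system-specific syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A derivation node; leaves carry a word, inner nodes carry children."""
    word: str = ""
    role: str = "plain"  # "plain", "marker" (e.g. other, than) or "figure"
    children: list = field(default_factory=list)


def split_query(node, kept=None, figures=None):
    """Descend the derivation, keeping plain words and collecting figures."""
    kept = [] if kept is None else kept
    figures = [] if figures is None else figures
    if node.children:
        for child in node.children:
            split_query(child, kept, figures)
    elif node.role == "figure":
        figures.append(node.word.lower())  # goes into the restriction part
    elif node.role == "plain":
        kept.append(node.word)             # stays in the natural language part
    # leaves with role "marker" are dropped entirely
    return " ".join(kept), figures


derivation = Node(children=[
    Node("What"), Node("are"), Node("some"),
    Node(children=[
        Node("other", "marker"), Node("web"), Node("browsers"),
        Node(children=[Node("than", "marker"), Node("Netscape", "figure")]),
    ]),
])

question, excluded = split_query(derivation)
```

Here question holds the natural language part with the alternative phrase removed, and excluded holds the words that would be appended as the restriction.</Paragraph>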
    </Section>
  </Section>
</Paper>