<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3001">
  <Title>results from a Wizard-of-Oz Experiment</Title>
  <Section position="4" start_page="1" end_page="3" type="intro">
    <SectionTitle>
2 Corpus and methodology
2.1 Experimental set-up
</SectionTitle>
    <Paragraph position="0"> In order to obtain a corpus of natural QA interactions, we designed a Wizard-of-Oz experiment.</Paragraph>
    <Paragraph position="1"> The experiment was set up in such a way that the exchanges between users and the information system would be as representative as possible of the interaction between users and QA systems. However, we chose an ontology database instead of a text-based closed-domain QA system, because simulating a real system required short response times.</Paragraph>
    <Paragraph position="2"> Thirty subjects took part in the experiment, which consisted of solving a task by querying LT-WORLD, an ontology containing information about language technology, in English. The modality of interaction was typing through a chat-like interface.</Paragraph>
    <Paragraph position="3"> Three different tasks were designed: two of them concentrated on information browsing, the other one on information gathering. In the first task, subjects had to find three traineeships at three different projects in three different institutions, each on a different topic, and obtain some information about the chosen projects, such as a contact address, a description, etc. In the second task, subjects had to find three conferences in the winter term and three conferences in the summer term, each on a different topic, and obtain some information on the chosen conferences, such as deadline, place, date, etc.</Paragraph>
    <Paragraph position="4"> Finally, the third task consisted of finding information for writing a report on European language technology in the last ten years. To this end, subjects had to obtain quantitative information on patents, organizations, conferences, etc.</Paragraph>
    <Paragraph position="5"> The Wizard was limited to very few types of responses. The main response was answering a question. In addition, she would provide intermediate information about the state of processing if the retrieval took too long. She could also make statements about the contents of the database when it did not contain the information asked for or when the user appeared confused about the structure of the domain. Finally, she could ask for clarification or more specificity when the question could not be understood. However, the Wizard was not allowed to take the initiative by offering information that was not explicitly asked for. Thus all actions of the Wizard were directly dependent on those of the user.</Paragraph>
    <Paragraph position="6"> As a result, we obtained a corpus of 33 logs (30 plus 3 pilot experiments) containing 125,534 words in 2,534 turns, 1,174 of which are user turns.</Paragraph>
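As a quick check on these figures, the following Python sketch derives the number of non-user turns, the share of user turns, and the mean turn length. It reads the reported counts as whole numbers (125534 words, 2534 turns, 1174 user turns); the variable names are illustrative and not part of the original study.

```python
# Corpus figures as reported in the text, read as whole numbers (an assumption).
TOTAL_WORDS = 125_534
TOTAL_TURNS = 2_534
USER_TURNS = 1_174

# Turns that were not produced by users.
non_user_turns = TOTAL_TURNS - USER_TURNS

# Share of user turns and mean number of words per turn.
user_share = USER_TURNS / TOTAL_TURNS
words_per_turn = TOTAL_WORDS / TOTAL_TURNS

print(f"Non-user turns: {non_user_turns}")
print(f"User share of turns: {user_share:.1%}")
print(f"Mean words per turn: {words_per_turn:.1f}")
```

The mean is taken over all turns in the 33 logs, so it mixes user questions with the system's responses and says nothing about either group on its own.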
    <Section position="1" start_page="1" end_page="3" type="sub_section">
      <SectionTitle>
2.2 Annotation scheme
</SectionTitle>
      <Paragraph position="0"> The corpus received a multi-layer annotation consisting of five levels. The levels of turns and part-of-speech were annotated automatically. The level of turns records information about the speaker and time stamp. For the other levels (the questions level, the utterances level, and the entities level) a specific annotation scheme was developed. For these, we only explain the aspects relevant for the present study.</Paragraph>
      <Paragraph position="1"> The questions level was conceived to keep track of the questions asked by the user which correspond to queries to the database. With the aim of annotating thematic relatedness between questions, we distinguished two main kinds of thematic relations: those holding between a question and a previous question, quest(ion)-to-quest(ion)-rel(ation), and those holding between a question and a previous answer, quest(ion)-to-answ(er)-rel(ation).</Paragraph>
      <Paragraph position="2"> Quest-to-quest-rels can be of the following types:
* refinement if the current question asks for the same type of entity as some previous question, but the restricting conditions are different, asking, thus, for a subset, superset or disjoint set of the same class.
(1) US: How many projects on language technologies are there right now?
US: How many have been done in the past?</Paragraph>
      <Paragraph position="3"> * theme-entity if the current question is about the same entity as some previous question.
(2) US: Where will the conference take place?
US: What is the dead-line for applicants?</Paragraph>
      <Paragraph position="4"> * theme-property if the current question asks for the same property as the immediately preceding question but for another entity.
(3) US: Dates of TALK project?
US: Dates of DEREKO?</Paragraph>
      <Paragraph position="5"> * paraphrase if the question is the rephrasing of some previous question.</Paragraph>
      <Paragraph position="6"> * overlap if the content of a question is subsumed by the content of some previous question.</Paragraph>
      <Paragraph position="7"> We distinguish the following quest-to-answ-rels:
* refinement if the current question asks for a subset of the entities given in the previous answer.
(4) LT: 3810.
US: How many of them do research on language technology?</Paragraph>
      <Paragraph position="8"> * theme if the current question asks about an entity first introduced in some previous answer.
(5) LT: Semaduct, ...
US: What language technology topics does the Semaduct project investigate?</Paragraph>
      <Paragraph position="9"> Although Chai and Jin (2004) only consider transitions among questions in dialogues about events, most of our relations have a correspondence with theirs. Refinement corresponds to their constraint refinement, theme-property to their participant-shift, and theme-entity to their topic exploration.</Paragraph>
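The relation inventory of the questions level can be pictured as a small data structure. The following Python sketch is only an illustration: the class and field names (QuestToQuestRel, QuestToAnswRel, QuestionAnnotation) are hypothetical and do not reflect the annotation format actually used for the corpus. It encodes the relations named in the text and annotates examples (1) and (4).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QuestToQuestRel(Enum):
    """Thematic relations between a question and a previous question."""
    REFINEMENT = "refinement"
    THEME_ENTITY = "theme-entity"
    THEME_PROPERTY = "theme-property"
    PARAPHRASE = "paraphrase"
    OVERLAP = "overlap"


class QuestToAnswRel(Enum):
    """Thematic relations between a question and a previous answer."""
    REFINEMENT = "refinement"
    THEME = "theme"


@dataclass
class QuestionAnnotation:
    """One annotated user question (hypothetical record layout)."""
    text: str
    quest_to_quest: Optional[QuestToQuestRel] = None
    quest_to_answ: Optional[QuestToAnswRel] = None


# Example (1): the second question keeps the entity type but changes the conditions.
example_1 = [
    QuestionAnnotation("How many projects on language technologies are there right now?"),
    QuestionAnnotation("How many have been done in the past?",
                       quest_to_quest=QuestToQuestRel.REFINEMENT),
]

# Example (4): the question asks for a subset of the entities in the previous answer.
example_4 = QuestionAnnotation(
    "How many of them do research on language technology?",
    quest_to_answ=QuestToAnswRel.REFINEMENT,
)

for q in example_1 + [example_4]:
    q2q = q.quest_to_quest.value if q.quest_to_quest else "-"
    q2a = q.quest_to_answ.value if q.quest_to_answ else "-"
    print(f"{q.text} | quest-to-quest: {q2q} | quest-to-answ: {q2a}")
```

Keeping the two relation kinds in separate enumerations mirrors the distinction drawn in the text, where refinement appears in both inventories with different definitions.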
      <Paragraph position="10"> At the utterances level, utterances are classified according to their speech act: question, answer, assertion, or request. Our annotation of discourse structure is identical in spirit to the one proposed by Ahrenberg et al. (1990). A segment is opened with a user question to the database and is closed with its corresponding answer or an assertion by the system. Clarification requests and their corresponding answers form segments which are embedded in other segments. Requests to wait and assertions about the processing of a question are also embedded in the segment opened by the question.</Paragraph>
      <Paragraph position="11"> Fragmentary utterances are annotated at this level. We distinguish between fragments with a full linguistic source, fragments with a partial source, and fragments showing a certain analogy with the source. The first group corresponds to fragments which are structurally identical to the source and can, thus, be resolved by substitution or extension.
(6) US: Are there any projects on spell checking in Europe in the year 2006?
US: And in the year 2005?
Fragments with a partial source implicitly refer to some entity previously introduced, but some inference must be done in order to resolve them.
(7) US: How is the contact for that project?
US: Homepage?
The last group is formed by fragments which show some kind of parallelism with the source but which cannot be resolved by substitution.</Paragraph>
      <Paragraph position="12"> (8) US: Which conferences are offered in this winter term in the subject of English language?
US: Any conferences concerning linguistics in general?
At the entities level, we distinguish the following types of reference to entities: identity or co-reference, subset/superset and bridging.</Paragraph>
      <Paragraph position="13"> Co-reference occurs when two or more expressions denote the same entity. Within this group we found the following types of implicit co-referring expressions, which involve different degrees of explicitness: elided NPs, anaphoric and deictic pronouns, deictic NPs, and co-referent definite NPs. Elided NPs are optional arguments, that is, they don't need to be in the surface form of the sentence, but are present in the semantic interpretation. In (9) there is an anaphoric pronoun and an elided NP, both referring to the conference Speech TEK West 2006.
(9) US: The Speech TEK West 2006, when does it take place?
LT: 2006-03-30 - 2006-04-01.
US: Until when can I hand in a paper [ ]?</Paragraph>
      <Paragraph position="14"> Bridging is a definite description which refers to an entity related to some entity in the focus of attention. Resolving bridging requires some inference in order to establish the connection between the two entities. Example (2) in subsection 2.2.1 contains an occurrence of bridging, where the dead-line is meant to be the dead-line of the conference currently under discussion.</Paragraph>
      <Paragraph position="15"> Finally, subset/superset reference takes place when a linguistic expression denotes a subset or superset of the set of entities denoted by some previous linguistic expression. Subset/superset reference is sometimes expressed through two interesting contextual phenomena: nominal ellipsis (footnote 3), also called semantic ellipsis, and one-NPs (footnote 4). Nominal ellipsis occurs within an NP: it is the noun that is missing and must be recovered from the context.
Footnote 3: Note, however, that nominal ellipsis does not always denote a subset; it can sometimes denote a disjoint set, or just lexical material which is omitted.
Footnote 4: One-NPs are very rare in our corpus, so we are not considering them in the present study.</Paragraph>
      <Paragraph position="16"> Here follows an example:
(10) US: Show me the three most important.</Paragraph>
    </Section>
  </Section>
</Paper>