<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1015">
  <Title>Semi-Automatic Recognition of Noun Modifier Relationships</Title>
  <Section position="7" start_page="99" end_page="101" type="evalu">
    <SectionTitle>
7 Evaluation
</SectionTitle>
    <Paragraph position="0"> We present the results of evaluating the NMR analyzer in the context of a large knowledge acquisition experiment (see Barker et al., 1998). The NMR analyzer is one part of a larger interactive semantic analysis system. The experiment evaluated the semantic analysis of Atkinson (1990). We refer to it as the small engines experiment. Other experiments have shown similar results.</Paragraph>
    <Paragraph position="1"> We consider three evaluation criteria. First, we evaluate the analyzer's ability to learn to make better suggestions to the user as more noun phrases are analyzed. Second, we evaluate its coverage by comparing the number of relationships assigned with the total number of such relationships in the text (i.e., the number it should have assigned). Third, we assess the burden that semi-automatic analysis places on the user.</Paragraph>
    <Section position="1" start_page="99" end_page="100" type="sub_section">
      <SectionTitle>
7.1 Improvement in System Performance
</SectionTitle>
      <Paragraph position="0"> Since the system starts with no previous noun phrase analyses, the user is responsible for supplying NMRs at the beginning of a session. To measure the rate of learning, we compare the cumulative number of assignments required from the user to the cumulative number of correct assignments suggested by the system.</Paragraph>
      <Paragraph position="1"> In the small engines experiment, 886 modifier-noun pairs were assigned an NMR. We consider the system's assignment correct when the correct label is among its suggestions. According to this definition, 608 of the 886 NMRs (69%) were assigned correctly by the system. For most  of these assignments (97.5%) the system offered a single suggestion. It had multiple (on average 3.3) suggestions only 22 times.</Paragraph>
      <Paragraph position="2"> Phrase (15): small gasoline engine There is a relationship between gasoline and gasoline_engine.</Paragraph>
      <Paragraph position="3"> Please enter a valid NMR label: inst Do you accept the NMR Instrument: gasoline is used in gasoline__engine Y_ There is a relationship between small and small_gasoline_engine.</Paragraph>
      <Paragraph position="4"> Please enter a valid NMR label: prop Do you accept the NMR Property: small_gasoline__engine is small Y Phrase (16): the repair of diesel engines There is a relationship between diesel and diesel_engine.</Paragraph>
      <Paragraph position="5">  NMR Analyzer's best suggestions for this input: (1) prop: diesel_engine is diesel (2) inst: diesel is used in diesel_engine  Please enter a number between 1 and 2: _2 Do you accept the NMR Instrument: diesel is used in diesel_engine Y There is a relationship between diesel_engine and repair.</Paragraph>
      <Paragraph position="6">  NMR Analyzer's best suggestions for this input: (1) agt: repairis performed by dieselengine (2) caus: diesel_engine causes repair (7) obj: diesel_engine is acted on by repair (12) top: repairis concerned with diesel_engine  Please enter a number between 1 and 12: 7 Do you accept the NMR Object: diesel_en~lin e is acted on by repair Y Phrase (17): diesel engine repair shop Do you accept the NMR Instrument: diesel is used in diesel_engine Y__ Do you accept the NMR Object: diesel_engine is acted on by diesel_engine_.repair Y There is a relationship between diesel_ engine_repair and diesel_enginerepair_shop. Please enter a valid NMR label: purp Do you accept the NMR Purpose: dieselengine_repair__shop is meant for dieseC engine_repair Y Phrase (18): an auto repair center Do you accept the NMR Object: auto is acted on by auto_repair Y Do you accept the NMR Purpose: auto_repair_ centeris meant for auto_repair Y Figure I: NMR analysis interaction for (15)-(18) Figure 2 shows the cumulative number of NMR assignments supplied by the user versus those determined correctly by the system. After about 100 assignments, the system was able to make the majority of assignments automatically. The curves in the figure show that the system learns to make better suggestions as more phrases are analyzed.</Paragraph>
    </Section>
    <Section position="2" start_page="100" end_page="100" type="sub_section">
      <SectionTitle>
7.2 NMR Coverage
</SectionTitle>
      <Paragraph position="0"> The NMR analyzer depends on a parser to find noun phrases in a text. If parsing is not 100% successful, the analyzer will not see all noun phrases in the input text. It is not feasible to find manually the total number of relationships in a text--even in one of only a few hundred sentences. To measure coverage, we sampled 100 modifier-noun pairs at random from the small engines text and found that 87 of them appeared in the analyzer's output. At 95% confidence, we can say that the system extracted between 79.0% and 92.2% of the modifier-noun relationships in the text.</Paragraph>
    </Section>
    <Section position="3" start_page="100" end_page="101" type="sub_section">
      <SectionTitle>
7.3 User Burden
</SectionTitle>
      <Paragraph position="0"> User burden is a fairly subjective criterion. To measure burden, we assigned an &amp;quot;onus&amp;quot; rating to each interaction during the small engines experiment. The onus is a number from 0 to 3.0 means that the correct NMR was obvious, whether suggested by the system or supplied by the user. 1 means that selecting an NMR required a few moments of reflection. A rating of 2 means that the interaction required serious thought, but we were  ultimately able to choose an NMR. 3 means that even after much contemplation, we were unable to agree on an NMR.</Paragraph>
      <Paragraph position="1"> The average user onus rating was 0.1 for NMR interactions in the small engines experiment. 808 of the 886 NMR assignments received an onus rating of 0; 71 had a rating of 1; 7 received a rating of 2. No interactions were rated onus level 3.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>