<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3301">
  <Title>The Semantics of a Definiendum Constrains both the Lexical Semantics and the Lexicosyntactic Patterns in the Definiens</Title>
  <Section position="9" start_page="4" end_page="5" type="evalu">
    <SectionTitle>
7 Results
</SectionTitle>
    <Paragraph position="0"> Our chi-square statistics show that for any pair of semantic types {SDT(X), SDT(Y)}, X ≠ Y, the distributions of SDef are statistically different at alpha = 0.05; that is, the semantic types of the defined terms correlate with the semantic types in the definitions. Our results also show that the syntactic patterns are distributed differently among the semantic types of the defined terms (alpha = 0.05). Many semantic types that appear in definitions are statistically correlated with the semantic types of the defined terms: the mean and standard deviation of the number of statistically correlated semantic types are 80.6 ± 35.4 (P &lt;&lt; 0.0001).</Paragraph>
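The pairwise comparison above amounts to a chi-square test of homogeneity on a 2 x k contingency table, with one row per SDT and one column per semantic type observed in SDef. The paper does not say which software it used, so the following is only a minimal pure-Python sketch under that reading; the counts in the usage example are invented. The helper chi2_sf (a name chosen here) computes the upper-tail probability from the series expansion of the regularized incomplete gamma function, which is adequate for moderate statistics like these.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom,
    computed as 1 - P(df/2, x/2) using the series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    if x == 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if 1e-15 * total > term:
            break  # remaining terms no longer change the sum
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_homogeneity(row_x, row_y):
    """Test whether two count vectors over the same semantic types
    (e.g. SDef counts for SDT(X) and SDT(Y)) share one distribution.
    Returns (statistic, degrees of freedom, p-value)."""
    rows = [row_x, row_y]
    col_totals = [cx + cy for cx, cy in zip(row_x, row_y)]
    grand = sum(col_totals)
    stat, used_cols = 0.0, 0
    for j, col in enumerate(col_totals):
        if col == 0:
            continue  # a semantic type absent from both rows adds nothing
        used_cols += 1
        for row in rows:
            expected = sum(row) * col / grand
            stat += (row[j] - expected) ** 2 / expected
    df = (len(rows) - 1) * (used_cols - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical SDef counts over four semantic types for two SDTs
stat, df, p = chi2_homogeneity([30, 10, 5, 5], [5, 10, 20, 15])
print(round(stat, 2), df, p)  # statistic 31.86 with df = 3
print("different at alpha = 0.05:", 0.05 > p)
```

With k semantic types the test has k - 1 degrees of freedom, so the same function applies unchanged to the much longer SDef vectors an actual SDT pair would produce.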
    <Paragraph position="1"> Figure 1 shows three SDTs ([Body Part, Organ, or Organ Component], [Disease or Syndrome], and [Organization]), each with the top five statistically correlated semantic types that appear in its definitions. In 112 cases (83.6%), SDT appears among the top five statistically correlated semantic types in SDef, and in 94 cases (70.1%), SDT is the top-ranked semantic type in SDef. These results indicate that if a defined term has semantic type SDT, the terms in its definition tend to have the same or related semantic types.</Paragraph>
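The two proportions above can be computed directly from each SDT's ranked list of correlated SDef types. This sketch assumes such ranked lists are available; the function name and the toy rankings over placeholder types T1..T6 are hypothetical. As a consistency check, both reported percentages fit a total of 134 cases (a figure inferred here, not stated in the text).

```python
def top_k_hit_rates(rankings, k=5):
    """rankings maps each SDT to its SDef semantic types sorted by decreasing
    chi-square. Returns (share of SDTs whose own type is in the top k,
    share of SDTs whose own type is ranked first)."""
    n = len(rankings)
    in_top_k = sum(1 for sdt, ranked in rankings.items() if sdt in ranked[:k])
    at_top = sum(1 for sdt, ranked in rankings.items() if ranked and ranked[0] == sdt)
    return in_top_k / n, at_top / n


# Hypothetical rankings over placeholder semantic types
toy = {
    "T1": ["T1", "T2", "T3", "T4", "T5", "T6"],  # own type ranked first
    "T2": ["T3", "T2", "T1", "T4", "T5"],        # own type ranked second
    "T3": ["T1", "T2", "T4", "T5", "T6", "T3"],  # own type ranked sixth
}
print(top_k_hit_rates(toy))  # two of three in the top five, one of three first

# The reported counts, assuming 134 cases in total
print(round(100 * 112 / 134, 1), round(100 * 94 / 134, 1))  # 83.6 70.1
```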
    <Paragraph position="2"> We examined the cases in which the semantic type of the defined term does not appear among the top five semantic types in its definitions. In all of those cases, the total number of definitions available for statistical analysis was too small to reach statistical significance. For example, when SDT is &quot;Entity&quot;, the minimum sample size required for SDef was 4.75, which exceeds the total number of definitions (i.e., 4). Consequently, some semantic types that are in fact correlated may go undetected because of insufficient sample size.</Paragraph>
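The text does not say how the minimum size of 4.75 was derived. As an assumption, the sketch below applies a common rule of thumb for the chi-square approximation (every expected cell count should be at least 5) to show how an SDT with very few definitions fails such a check. Function names and counts are hypothetical.

```python
def expected_counts(row_x, row_y):
    """Expected cell counts of a 2 x k table under homogeneity."""
    col_totals = [cx + cy for cx, cy in zip(row_x, row_y)]
    grand = sum(col_totals)
    return [[sum(row) * col / grand for col in col_totals] for row in (row_x, row_y)]


def chi2_approx_reliable(row_x, row_y, min_expected=5.0):
    """Rule-of-thumb check: every non-empty cell should have an expected
    count of at least min_expected."""
    cells = [e for row in expected_counts(row_x, row_y) for e in row if e > 0]
    return all(e >= min_expected for e in cells)


# Hypothetical counts: an SDT with only four definitions vs. a large SDT
print(chi2_approx_reliable([2, 1, 1, 0], [50, 60, 40, 50]))      # False
print(chi2_approx_reliable([30, 20, 25, 25], [35, 15, 30, 20]))  # True
```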
    <Paragraph position="3"> Our results also show that the lexicosyntactic patterns of definitional sentences are SDT-dependent.</Paragraph>
    <Paragraph position="4"> Our results show that many lexicosyntactic patterns that appear in definitions are statistically correlated with the semantic types of defined terms.</Paragraph>
    <Paragraph position="5"> The mean and standard deviation of the number of statistically correlated lexicosyntactic patterns are 1656.7 ± 1818.9 (P &lt;&lt; 0.0001). We also found that the more definitions an SDT has, the more statistically correlated lexicosyntactic patterns it has.</Paragraph>
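Both summaries in this paragraph can be reproduced from per-SDT counts: the mean and standard deviation of the number of correlated patterns, and the trend between the number of definitions and the number of patterns. The paper does not state whether it used a sample or population standard deviation, and it reports no correlation coefficient. This sketch therefore assumes the sample standard deviation and uses Spearman's rank correlation as one way to quantify the trend. All counts are invented.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    return [sum(1 for u in values if v > u) + (values.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-SDT counts
n_definitions = [4, 25, 60, 150, 400]
n_patterns = [3, 80, 500, 1200, 5000]

mean, sd = statistics.mean(n_patterns), statistics.stdev(n_patterns)
print(f"{mean:.1f} ± {sd:.1f} correlated patterns per SDT")
print(spearman(n_definitions, n_patterns))  # 1.0 for a perfectly monotone trend
```

A rank correlation suits heavily skewed counts such as these, where the standard deviation exceeds the mean.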
    <Paragraph position="6"> Figure 2 shows the top 10 lexicosyntactic patterns (ranked by chi-square statistics) captured by AutoSlog-TS for three SDTs: [Disease or Syndrome], [Body Part, Organ, or Organ Component], and [Organization]. Figure 3 shows the top 10 lexicosyntactic patterns as ranked by AutoSlog-TS, whose ranking incorporates pattern frequency (Riloff and Phillips 2004). Figure 4 lists the top 30 patterns common to all SDTs. Many of these common lexicosyntactic patterns (e.g., &quot;...known as...&quot;, &quot;...called&quot;, &quot;...include...&quot;) have also been identified by other research groups through either manual or semi-automatic pattern discovery (Blair-Goldensohn et al. 2004).</Paragraph>
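The difference between a chi-square ranking and a frequency-aware AutoSlog-TS ranking can be illustrated with a 2 x 2 chi-square score and the RlogF score commonly used to rank AutoSlog-TS patterns, RlogF = (F / N) * log2(F). Here F is a pattern's frequency in relevant texts and N is its frequency in all texts. This sketch assumes that the definitions of the target SDT play the role of the relevant texts; pattern names and counts are hypothetical. With these toy counts, chi-square ranks a pattern seen only once above a frequent one, whereas RlogF gives the singleton a score of zero because log2(1) = 0.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square (no continuity correction) for the 2 x 2 table
    [[pattern in target SDT, pattern in other SDTs],
     [target without pattern, others without pattern]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0


def rlogf(f_relevant, f_total):
    """Relevance rate F/N times log2(F); zero when the pattern never fires."""
    if f_relevant == 0:
        return 0.0
    return (f_relevant / f_total) * math.log2(f_relevant)


N_TARGET, N_OTHER = 10, 990  # hypothetical numbers of definitions
# (definitions of the target SDT with the pattern, other definitions with it)
patterns = {"rare_pattern": (1, 0), "frequent_pattern": (6, 44)}


def chi2_score(name):
    a, b = patterns[name]
    return chi2_2x2(a, b, N_TARGET - a, N_OTHER - b)


def rlogf_score(name):
    a, b = patterns[name]
    return rlogf(a, a + b)


print(sorted(patterns, key=chi2_score, reverse=True))   # ['rare_pattern', 'frequent_pattern']
print(sorted(patterns, key=rlogf_score, reverse=True))  # ['frequent_pattern', 'rare_pattern']
```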
  </Section>
  <Section position="10" start_page="5" end_page="6" type="evalu">
    <SectionTitle>
8 Discussion
</SectionTitle>
    <Paragraph position="0"> The statistical correlations between SDT and SDef may improve the performance of a definitional question answering system in at least two ways. First, the semantic types may be useful for word sense disambiguation. A simple application is to rank candidate definitional sentences by the distribution of the semantic types of the terms they contain, so as to capture the definition of a specific sense. For example, a biomedical definitional question answering system may exclude definitions of other senses of &quot;heart&quot; (e.g., the &quot;feeling&quot; sense in the sentence &quot;The locus of feelings and intuitions; 'in your heart you know it is true'; 'her story would melt your heart.'&quot;) if the semantic types of the terms in the sentence, other than &quot;heart&quot; itself, do not include [Body Part, Organ, or Organ Component].</Paragraph>
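Here is a minimal sketch of the sense-ranking idea. It assumes that each candidate sentence has already been annotated with the semantic types of its terms (how the annotations are produced is outside this sketch) and that a distribution of SDef semantic types is available for the target SDT. The scoring rule averages the SDef probability of each non-target term's type. That rule is one simple choice made for illustration, and the probabilities and annotations are hypothetical.

```python
def sense_score(term_types, target, sdef_dist):
    """Mean SDef probability of the semantic types of all terms except target.
    term_types: list of (term, semantic_type) pairs for one sentence."""
    others = [st for term, st in term_types if term != target]
    if not others:
        return 0.0
    return sum(sdef_dist.get(st, 0.0) for st in others) / len(others)


BPOOC = "Body Part, Organ, or Organ Component"
# Hypothetical SDef distribution for the SDT of "heart"
sdef_dist = {BPOOC: 0.40, "Tissue": 0.10, "Body Substance": 0.05}

candidates = {
    "anatomical": [("heart", BPOOC), ("blood vessels", BPOOC),
                   ("blood", "Body Substance"), ("muscle", "Tissue")],
    "feeling": [("heart", BPOOC), ("feelings", "Mental Process"),
                ("intuitions", "Mental Process")],
}
ranked = sorted(candidates,
                key=lambda k: sense_score(candidates[k], "heart", sdef_dist),
                reverse=True)
print(ranked)  # ['anatomical', 'feeling']
```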
    <Paragraph position="1"> Second, the semantic-type correlations may be used as features to exclude non-definitional sentences. For example, a biomedical definitional question answering system may exclude the non-definitional sentence &quot;Heart rate was unaffected by the drug&quot; because the semantic types of the terms in the sentence, other than &quot;heart&quot;, do not include [Body Part, Organ, or Organ Component]. SDT-dependent lexicosyntactic patterns may also enhance both the recall and the precision of a definitional question answering system. First, the large sets of lexicosyntactic patterns we generated automatically may expand the smaller sets of patterns reported by existing question answering systems. Second, SDT-dependent lexicosyntactic patterns may be used to capture definitions of terms with a given semantic type.</Paragraph>
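The exclusion rule in the heart example can be written as a one-line filter: keep a sentence only if some term other than the target carries the required semantic type. This sketch assumes pre-computed semantic-type annotations, and the annotations shown are hypothetical.

```python
def passes_type_filter(term_types, target, required_type):
    """True if any term other than target has required_type.
    term_types: list of (term, semantic_type) pairs for one sentence."""
    return any(st == required_type for term, st in term_types if term != target)


BPOOC = "Body Part, Organ, or Organ Component"
definitional = [("heart", BPOOC), ("blood vessels", BPOOC), ("blood", "Body Substance")]
non_definitional = [("heart", BPOOC), ("rate", "Quantitative Concept"),
                    ("drug", "Pharmacologic Substance")]

print(passes_type_filter(definitional, "heart", BPOOC))      # True
print(passes_type_filter(non_definitional, "heart", BPOOC))  # False
```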
    <Paragraph position="2"> The common lexicosyntactic patterns we identified (Figure 4) may be useful for a generic definitional question answering system. For example, such a system may implement the most common patterns to detect generic definitions, and implement SDT-specific patterns to detect definitions of terms with a particular SDT.</Paragraph>
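One way to realize this two-tier design is to try SDT-specific patterns first and fall back to the common patterns. In this sketch, the generic regular expressions loosely mirror three common patterns quoted in the results ("known as", "called", "include"). The disease-specific pattern is a hypothetical stand-in, since AutoSlog-TS patterns are not regular expressions.

```python
import re

# Surface approximations of common patterns cited in the text
GENERIC = [r"\bknown as\b", r"\bcalled\b", r"\binclude[sd]?\b"]
# Hypothetical SDT-specific pattern, for illustration only
SPECIFIC = {
    "Disease or Syndrome": [r"\bis an? (?:\w+ ){0,3}(?:disease|disorder|syndrome)\b"],
}


def match_definition(sentence, sdt=None):
    """Return 'specific', 'generic', or None for one candidate sentence."""
    text = sentence.lower()
    if any(re.search(p, text) for p in SPECIFIC.get(sdt, [])):
        return "specific"
    if any(re.search(p, text) for p in GENERIC):
        return "generic"
    return None


print(match_definition("Measles is a highly contagious viral disease.", "Disease or Syndrome"))
print(match_definition("Rubeola, also called measles, spreads easily."))
print(match_definition("Heart rate was unaffected by the drug"))
```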
    <Paragraph position="3"> One limitation of our work is that the lexicosyntactic patterns generated by AutoSlog-TS are confined to single clauses. This is a disadvantage because 1) lexicosyntactic patterns can extend beyond clauses (Cui et al. 2005) and 2) a definition frequently contains multiple lexicosyntactic patterns. In addition, many of the patterns may not be generalizable. For example, as shown in Figure 2, some of the top-ranked patterns identified by AutoSlog-TS (e.g., &quot;Subj_AuxVp_&lt;dobj&gt;_BE_ARMY&gt;&quot;) may be too specific to the text collection. The pattern-ranking method of AutoSlog-TS (shown in Figure 3) takes the frequency of a pattern into account and is therefore a better ranking method than the chi-square ranking (Figure 2).</Paragraph>
  </Section>
</Paper>