<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1204">
  <Title>Using Co-occurrence Statistics as an Information Source for Partial Parsing of Chinese</Title>
  <Section position="3" start_page="25" end_page="27" type="relat">
    <SectionTitle>
3 Related Work
</SectionTitle>
    <Paragraph position="0"> Statistical measures of association appfied to bigram co-occurrence counts have been used most extensively in terminology and collocation extraction. (Manning and Shfitze, 1999) contains a good introduction to this topic.</Paragraph>
    <Paragraph position="1"> (Kageura, 1999) is an especially good empirical comparison of the performance of several measures of association on a set of tasks in both terminology extraction and in morpheme splitting of Chinese character sequences. This latter tasks which can be seen as a very restricted form of parsing, has been treated in a body of interesting work, including (Sun, Shen and Tsou, 1998), (Lee, 1999) . This work has generally used vee/y simple heuristic control policies, such as repeatedly splitting at the point of lowest mutual information. The use of similar  approaches for general parsing received some early exploration (Brill, Magerman, Marcus and Santofini, 1990), (Magerman and Marcus, 1990), but this approach seems to have lost popularity. This may be because using co-occurrence statistics as a sole source of guidance may become insufficient as the object of parsing moves from the veery local structure of word splitting to the longer-distance dependencies of general parsing. The current work attempts to remedy this by using a general leafing device to balance co-occurrence statistics with other information to be integrated into a larger control policy.</Paragraph>
    <Section position="1" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
Conclusions and Future Work
</SectionTitle>
      <Paragraph position="0"> Our experiments show that simple statistical information gathered ~om the unprocessed surface structure of large-scale text has value in guiding parsing decisions. However, we feel that there is still a great deal of further advantage to be gained from this approach. Our next step will be to include co-oecu~ence information from a much larger corpus, containing on the order of 108 characters.</Paragraph>
      <Paragraph position="1"> We would also like to experiment with other definitions of co-occurrence. (Yuret, 1998) describes some very interesting work, in a different framework from ours, in which a parser using only co-occurrence mutual information was able to achieve a high precision but low recall when co-occurrence was defined as adjacent co-occurrence, and low precision but high recall when co-occurrence was defined as occurrence within the same sentence. We would like to experiment with ways of balancing these two measures.</Paragraph>
      <Paragraph position="2"> We also suspect that significant gal.~ are possible through a more sophisticated inclusion of the statistics in the decision making process. The current diseretization scheme is very simple, but there is ample empirical evidence that</Paragraph>
      <Paragraph position="4"> discrefization which takes into account target categories can significantly improve classification accuracy (Dougherty, Kohavi, and Sahami, 1995).</Paragraph>
      <Paragraph position="5"> The several articles we have cited which use exclusively co-occurrence information to predict constituent boundaries are very interesting for the simplicity of their control structures, but in one important way they are more complex than the current work: they make decisions by explicitly comparing the measures of association between different pairs of words. We predict that augmenting the feature set to allow our parser to be sensitive to this kind of information would be a very valuable extension.</Paragraph>
      <Paragraph position="6"> A related issue is the choice of learning methodology. The Winnow learner has served us well with its ability to handle very large feature sets, but it is weak in its ability to take advantage of the interaction between features.</Paragraph>
      <Paragraph position="7"> We would like to experiment with learning methods which do not suffer from this weakness, and with methods for automatic feature extraction which could supplement Winnow.</Paragraph>
      <Paragraph position="8"> We experimented with a nondeterministic control policy for the parser, using cost-front search to fred the most probable series of parsing decisions, but we found this not to be very useful. Over a series of comparative experiments, the non.deterministic control policy consistently raised precision by a small margin, lowered recall by a small margin, increased run times by an order of magnitude or more, and for about 10% of the test.set sentences exhausted system resources before finding any parse at all. We posit that these problems may in part be due to the fact that while the Winnow learner is otherwise quite well adapted for our purposes, its output is not intended to be interpreted probabilistically. In the future we intend to run parallel experiments with more probabilisticaUy oriented learners; we  are espeeiaUy interested in experimenting with a Maximum Entropy model.</Paragraph>
      <Paragraph position="9"> In the larger context, we plan to experiment with more sophisticated, model-based unsupervised learning methods, including clustering and beyond, and ways of providing their gathered knowledge to the parser, to make the fullest possible use of the vast wealth of un-annotated text available.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>