<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2089">
<Title>Tagging and Chunking with Bigrams</Title>
<Section position="5" start_page="618" end_page="618" type="concl">
<SectionTitle>4 Conclusions and Future Work</SectionTitle>
<Paragraph position="0">In this paper, we have presented a system for Tagging and Chunking based on an Integrated Language Model that uses a homogeneous formalism (finite-state machine) to combine different knowledge sources: lexical, syntactical and contextual models. It is feasible both in terms of performance and in terms of computational efficiency.</Paragraph>
<Paragraph position="1">All the models involved are learnt automatically from data, so the system is very flexible and portable, and changes in the reference language, lexical tags or other kinds of chunks can be made in a direct way.</Paragraph>
<Paragraph position="2">The tagging accuracy (96.9% using BIG and 96.8% using BIG-BIG) is higher than that of other similar approaches. This is because we have used the tag dictionary (including the test set in it) to restrict the possible tags for unknown words; this assumption obviously increases the tagging rates (we have not done a quantitative study of this factor).</Paragraph>
<Paragraph position="3">As we have mentioned above, the comparison with other approaches is difficult due, among other reasons, to the following ones: the definitions of base NP are not always the same, the sizes of the training and test sets are different, and the knowledge sources used in the learning process are also different. The precision for NP-chunking is similar to the other statistical approaches presented in Section 1, for both the integrated process (94.6%) and the sequential process using a tagger based on bigrams (94.9%). The recall rate is slightly lower than for some approaches using the integrated system (93.6%) and is similar for the sequential process (94.1%). When we used the sequential system taking an error-free input (IDEAL), the performance of the system obviously increased (95.5% precision and 94.7% recall). These results show the influence of tagging errors on the process.</Paragraph>
<Paragraph position="4">Nevertheless, we are studying why the results of the integrated process and the sequential process differ. We are testing how the introduction of some adjustment factors among the models, for weighting the different probability distributions, can improve the results.</Paragraph>
<Paragraph position="5">The models that we have used in this work are bigrams, but trigrams or any stochastic regular model could be used. In this respect, we have worked on more complex LMs, formalized as finite-state automata which are learnt using Grammatical Inference techniques. Also, our approach would benefit from the inclusion of lexical-contextual information into the LM.</Paragraph>
</Section>
</Paper>
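<!-- A minimal illustrative sketch (Python) of the scheme summarized above:
Viterbi decoding over a bigram tag model combined with a lexical model, with
a tag dictionary restricting the candidate tags of each word. The toy
probabilities and the names tag_dict, trans, emit and viterbi_tag are
assumptions for illustration only, not the authors' implementation.

import math

# word to candidate tags (hypothetical toy dictionary); the paper also uses
# the dictionary to restrict the possible tags of unknown words
tag_dict = {"the": ["DT"], "dog": ["NN"], "barks": ["VBZ", "NNS"]}

# contextual model: P(tag_i given tag_{i-1}), a bigram over tags
trans = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7,
         ("NN", "VBZ"): 0.5, ("NN", "NNS"): 0.1}

# lexical model: P(word given tag)
emit = {("DT", "the"): 0.9, ("NN", "dog"): 0.1,
        ("VBZ", "barks"): 0.05, ("NNS", "barks"): 0.01}

def viterbi_tag(words, all_tags=("DT", "NN", "VBZ", "NNS")):
    """Return the most probable tag sequence under the bigram model."""
    best = {"<s>": (0.0, [])}                    # tag: (log prob, best path)
    for w in words:
        cands = tag_dict.get(w, list(all_tags))  # word not in dictionary: any tag
        new = {}
        for t in cands:
            scored = [(lp + math.log(trans.get((prev, t), 1e-6)
                                     * emit.get((t, w), 1e-6)), path + [t])
                      for prev, (lp, path) in best.items()]
            new[t] = max(scored)
        best = new
    return max(best.values())[1]

print(viterbi_tag(["the", "dog", "barks"]))      # ['DT', 'NN', 'VBZ']
-->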
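<!-- The "adjustment factors ... for weighting the different probability
distributions" are not specified in this section; one common realization
(an assumption here, not necessarily the authors' formulation) is to raise
each component model to a tunable exponent before combining:

def weighted_score(p_context, p_lexical, alpha=1.0, beta=1.0):
    # log-linear combination; alpha and beta would be tuned on held-out data
    return (p_context ** alpha) * (p_lexical ** beta)

With alpha = beta = 1 this reduces to the plain product used in the sketch
above; increasing beta gives the lexical model more influence.
-->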