File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/94/c94-2173_concl.xml
Size: 2,788 bytes
Last Modified: 2025-10-06 13:57:13
<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2173"> <Title>PATTERN MATCHING IN THE TEXTRACT INFORMATION EXTRACTION SYSTEM</Title> <Section position="10" start_page="7067" end_page="7067" type="concl"> <SectionTitle> 7 CONCLUSION </SectionTitle> <Paragraph position="0"> In the Japanese microelectronics and corporate joint venture domains, TEXTRACT performed as well as the top-ranking official systems at the TIPSTER/MUC-5 system evaluation. (TEXTRACT's scores submitted to MUC-5 were unofficial. It was scored officially after the conference, and the official scores showed slight differences from the unofficial ones.) Although the performance of the pattern matching component itself remains to be evaluated, the high performance of TEXTRACT suggests that the pattern matcher worked well in extracting information from the text.</Paragraph> <Paragraph position="1"> The pattern matcher has not been tested on languages other than Japanese. It is expected to work for other languages with minor modifications, provided that the input is segmented into primitive words tagged with their parts of speech.</Paragraph> <Paragraph position="2"> The TEXTRACT Japanese microelectronics system was developed in only three weeks by one person. Despite its simplicity, it achieved high performance. This result also suggests that the pattern matching architecture is highly portable across similar domains in the same language, thus facilitating rapid system development. 
Developing and maintaining TEXTRACT's pattern-matching-based architecture is easier and less complex than developing and maintaining a full parsing system, as was experienced in the early stages of SHOGUN system development [Jacobs 93b].</Paragraph> <Paragraph position="3"> Corpus analysis took about half of the development time, since only a KWIC (Key Word In Context) list and a word frequency tool were used to acquire the concept-word lists and the template patterns. Good statistical corpus analysis tools should shorten the development time and promise higher performance. Such tools should not only collect patterns of interest together with their context, but also provide statistics showing how well the defined patterns work when they are applied in the system.</Paragraph> <Paragraph position="4"> At the MUC-5 meeting, the P&amp;R F-measure of one of the top-ranking systems was claimed to be close to human performance [Jacobs 93b]. For a pattern matching system to match human performance, the preprocessor must recognize the expressions to be extracted with nearly 100% accuracy, given that the other components simply merge information and generate output.</Paragraph> </Section></Paper>
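The corpus analysis described in the conclusion relied on a KWIC (Key Word In Context) list and a word frequency tool. The following is a minimal Python sketch of both, assuming (hypothetically) that the corpus has already been segmented into primitive words tagged with their parts of speech and is stored as lists of (word, pos) pairs. The function names word_frequencies and kwic, the context width, the tag labels, and the toy romanized tokens are illustrative assumptions, not TEXTRACT's actual tools or data.

```python
from collections import Counter

# Assumed input format (hypothetical): each sentence is a list of
# (word, part_of_speech) pairs produced by a segmenter and tagger.
Token = tuple[str, str]


def word_frequencies(sentences: list[list[Token]]) -> Counter:
    """Count how often each word occurs across all sentences."""
    return Counter(word for sentence in sentences for word, _pos in sentence)


def kwic(sentences: list[list[Token]], keyword: str,
         width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence of keyword.

    Context is at most `width` words on each side and never crosses a
    sentence boundary.
    """
    lines = []
    for sentence in sentences:
        words = [word for word, _pos in sentence]
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append((left, word, right))
    return lines


# Toy romanized corpus invented for illustration only; tags are hypothetical.
corpus = [
    [("Toshiba", "PN"), ("to", "P"), ("Siemens", "PN"), ("ga", "P"),
     ("goben", "N"), ("gaisha", "N"), ("o", "P"), ("setsuritsu", "N")],
    [("Fujitsu", "PN"), ("ga", "P"), ("goben", "N"), ("gaisha", "N"),
     ("o", "P"), ("setsuritsu", "N")],
]

# Print a right-aligned concordance, then the most frequent words.
for left, key, right in kwic(corpus, "goben", width=2):
    print(f"{left:>20}  [{key}]  {right}")
print(word_frequencies(corpus).most_common(3))
```

A concordance of this kind only shows candidate patterns with their context; the statistics the conclusion asks for (how well each defined pattern performs once applied) would require comparing pattern matches against answer keys, which this sketch does not attempt.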
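The P&R F-measure cited for the MUC-5 evaluation combines precision (P) and recall (R). In the usual van Rijsbergen form, F = (β² + 1)PR / (β²P + R), and P&R denotes the equally weighted case β = 1. The sketch below is a minimal Python illustration of that formula; the function name f_measure and the sample values are hypothetical and are not scores reported at MUC-5.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (van Rijsbergen's F).

    beta = 1.0 weights precision and recall equally, i.e. the P&R F-measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Degenerate case (e.g. precision and recall both zero):
        # return 0 instead of dividing by zero.
        return 0.0
    return (b2 + 1.0) * precision * recall / denom


# Hypothetical scores, not taken from the MUC-5 results.
print(round(f_measure(0.60, 0.50), 4))  # prints 0.5455
```

Because F is a harmonic mean, it can never exceed the larger of P and R; a system approaching human-level F therefore needs both high precision and high recall from every stage, including the preprocessor.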