<?xml version="1.0" standalone="yes"?> <Paper uid="M98-1021"> <Title>DATE TRAILER PP SS S P TEXTDOCID ... SLUGSTORYID PREAMBLENWORDS DOC</Title> <Section position="11" start_page="10" end_page="10" type="concl"> <SectionTitle> CONCLUSION </SectionTitle> <Paragraph position="0"> One of the design features of our system which sets it apart from other systems is that it is designed fully within the sgml paradigm: the system is composed from several tools which are connected via a pipeline with data encoded in sgml. This allows the same tool to apply di#0Berent strategies to di#0Berent parts of the texts using di#0Berent resources. The tools do not convert from sgml into an internal format and back, but operate at the sgml level.</Paragraph> <Paragraph position="1"> Our system does not rely heavily on lists or gazetteers but instead treats information from such lists as #5Clikely&quot; and concentrates on #0Cnding contexts in which such likely expressions are de#0Cnite. In fact, the #0Crst phase of the enamex analysis uses virtually no lists but still achieves substantial recall.</Paragraph> <Paragraph position="2"> The system is document centred. This means that at each stage the system makes decisions according to a con#0Cdence level that is speci#0Cc to that processing stage, and drawing on information from other parts of the document. The system is truly hybrid, applying symbolic rules and statistical partial matching techniques in an interleaved fashion.</Paragraph> <Paragraph position="3"> Unsurprisingly the major problem for the system were single capitalised words, mentioned just once or twice in the text and without suggestive contexts. In such a case the system could not apply contextual assignment, assignmentby analogy or lexical lookup.</Paragraph> <Paragraph position="4"> At the time we participated in the muc competition, our system was not particularly fast|it operated at about 8 words per second, taking around 3 hours to process the 100 articles. This has now considerably improved.</Paragraph> </Section> class="xml-element"></Paper>