File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/c96-2180_abstr.xml
Size: 1,076 bytes
Last Modified: 2025-10-06 13:48:41
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2180"> <Title>United Kingdom</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We describe a robust text-handling component, which can deal with free text in a wide range of formats and can successfully identify a wide range of phenomena, including chemical formulae, dates, numbers and proper nouns. The set of regular expressions used to capture numbers in written form (&quot;sechsundzwanzig&quot;) in German is given as an example. Proper noun &quot;candidates&quot; are identified by means of regular expressions, these being then rejected or accepted on the basis of run-time interaction with the user. This tagging component is integrated in a large-scale grammar development environment, and provides direct input to the grammatical analysis component of the system by means of &quot;lift&quot; rules which convert tagged text into partial linguistic structures.</Paragraph> </Section> </Paper>