File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/82/c82-1032_concl.xml
Size: 4,434 bytes
Last Modified: 2025-10-06 13:55:58
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1032"> <Title>ANALYSIS AND PROCESSING OF COMPACT TEXT</Title> <Section position="6" start_page="203" end_page="203" type="concl"> <SectionTitle> CONCLUSIONS </SectionTitle> <Paragraph position="0"> The computer results presented above, considered along with manual analysis of other document sets, lead to several major conclusions about the characteristics of compact text: I. Repetitive ungrammaticality is grammatical for the text set.</Paragraph> <Paragraph position="1"> Within a given set of data, there are recurrent ungrammatical constructions. These forms can be characterized and made a part of the parsing grammar. The departures from grammaticality are limited and can be related in a regular way to full sentence types in English.</Paragraph> <Paragraph position="2"> II. Word choice is quasi-grammatical.</Paragraph> <Paragraph position="3"> In repetitive single-topic text, word subclasses that are specific to the subject</Paragraph> </Section> <Section position="7" start_page="203" end_page="205" type="concl"> <SectionTitle> ANALYSIS AND PROCESSING OF COMPACT TEXT </SectionTitle> <Paragraph position="0"> Conjunctivae were pale.</Paragraph> <Paragraph position="1"> Appetite is good; She slept well; Eating well. Pain on dorsiflexion of left foot.</Paragraph> <Paragraph position="2"> Temp 99.6, Pulse 120, Respiration Rate 16, Weight 19.5 lbs.</Paragraph> <Paragraph position="3"> Temp normal.</Paragraph> <Paragraph position="4"> Low grade temp finally cleared.</Paragraph> <Paragraph position="5"> DTR's are normal.</Paragraph> <Paragraph position="6"> Heart murmur heard; Slight tenderness to touch, Liver palpable 6 cm.</Paragraph> <Paragraph position="7"> Meningitis; Has sickle cell disease. Patient began to vomit; Patient developed mild cold.</Paragraph> <Paragraph position="8"> She remained well; Pt had a complete recovery. Patient was active; Occasionally rubs hands. 1st admission to BH for meningitis. 
She was seen in Emergency Room because of a temp of 105.</Paragraph> <Paragraph position="9"> To be followed in hematology; Seen in 2) In many cases B-PART word, TEST word are deleted when reconstructable from context, e.g. CSF grew out pneumococcus = CSF culture grew out pneumococcus.</Paragraph> <Paragraph position="10"> 3) Mention of PATIENT is omitted in Patterns I-V. 4) Key: ( ) = optional element { } = choice of one among elements in braces 206 E. MARSH and N. SAGER matter are found in particular combinations. These patterns are so marked that deviations can be considered ungrammatical for the discourse. For example, in medical records, (16) would be possible, while (17) would not. (16) Patient admitted to hospital on 11/5/81. (17) *Meningitis admitted to hospital on 11/5/81. III. Deletions are reconstructable.</Paragraph> <Paragraph position="11"> Deletions are reconstructable on the basis of both syntax and regularity of sub-class patterning. It was seen above that deleted (reconstructable) elements are either function words known to be deleted in other English forms (e.g. be), or distinguished words of the sublanguage (e.g. ~). IV. Texts are convergent.</Paragraph> <Paragraph position="12"> While it would be improper to say &quot;when you've seen one, you've seen them all,&quot; compact texts within a given area are remarkably similar. In the set of eight documents referred to above, six generalized semantic patterns occurred in the first document processed. No new types were recognized in the remaining seven documents.</Paragraph> <Paragraph position="13"> On the lexical level, while new vocabulary is found in each new document (so-called &quot;seepage&quot;), this tends to taper off. In a prior study of journal articles, it was found that seepage after the 7th article remained at about 20%. 
In the processing of medical records, we found that, when changing from one medical sub-field to another, the new vocabulary in a set of documents containing 2200 distinct lexical items was 27%.</Paragraph> <Paragraph position="14"> The above four properties of compact text (grammaticality despite syntactic deviation, regular patterning of subject-specific vocabulary, recoverable deletions, and convergence as new texts are analyzed) make it possible to process the content of documents with syntactic procedures that operate on the full free text.</Paragraph> </Section> </Paper>