<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1009"> <Title>Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text</Title> <Section position="9" start_page="71" end_page="71" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> We presented experimental results showing that, on medical discharge summaries, local context contributes more to deidentification than dictionaries and global context do. These documents are characterized by incomplete, fragmented sentences and ad hoc language. They use a lot of jargon, often omit sentence subjects, contain entity names that may be misspelled or foreign, and can include entity names that are ambiguous between PHI and non-PHI. Similar documents exist in many domains; our experiments show that even on such challenging corpora, local context can be exploited to identify entities.</Paragraph> <Paragraph position="1"> Even a rudimentary statistical representation of local context, as captured by unigrams and bigrams of lemmatized keywords and part-of-speech tags, gives good results and outperforms more sophisticated approaches that rely on global context. The simplicity of this representation and the results obtained with it are particularly promising for the many tasks that require processing ungrammatical, fragmented text, where global context cannot be counted on.</Paragraph> </Section> </Paper>