<?xml version="1.0" standalone="yes"?> <Paper uid="W02-2013"> <Title>Named Entity Extraction with Conditional Markov Models and Classifiers</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Classification </SectionTitle> <Paragraph position="0"> The candidate phrases proposed by the extraction component are subsequently annotated with sort labels. The main advantage of dividing the task up this way is that we can take much more context into account when classifying phrases. For example, relevant features now include: the length of the phrase, the first/last k words in the phrase, the position of the phrase in the sentence, whether the words fútbol or liga were mentioned in the same sentence, etc. Such features would be awkward to incorporate into a single-phase approach that uses a Markov model to predict phrase tags at the same time as sort labels.</Paragraph> <Paragraph position="1"> We chose a fairly standard independent-feature (a.k.a. &quot;naive Bayes&quot;) model, mostly as a matter of convenience. Obviously any other classifier framework could have been used instead. For both languages we use as features the length of the phrase, its distance from the start of the sentence, the identity of the words inside the phrase viewed as a set of words (i.e., discarding positional information), the identity and other properties (including whether a word starts with an upper- or lower-case letter) of the first k and last k′ words in the phrase, and the identity and other properties of the word(s) preceding and following the phrase. The optimal parameter settings differ for Spanish and Dutch. For example, in Spanish the identities of the first k = 6 words are very important for classification performance, whereas long preceding or trailing contexts do not help much, if at all. For Dutch, the identities of words inside the phrase are less helpful (k = 3 is optimal), and more preceding and trailing context has to be used.
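As an illustration, the feature set and independent-feature model just described can be sketched as follows. This is a minimal sketch, not the authors' implementation: all function and feature names are hypothetical, and add-one smoothing stands in for whatever estimation the original system used.

```python
import math
from collections import Counter, defaultdict

def phrase_features(tokens, start, end, k=3):
    """Features for the candidate phrase tokens[start:end], in the spirit of
    the list above: phrase length, sentence position, the phrase as a set of
    words, the first/last k words with capitalization, and the surrounding
    words. All feature names here are illustrative."""
    phrase = tokens[start:end]
    feats = {
        "len": len(phrase),                                    # length of the phrase
        "dist": start,                                         # distance from sentence start
        "prev": tokens[start - 1] if start > 0 else "<S>",     # preceding word
        "next": tokens[end] if end < len(tokens) else "</S>",  # following word
    }
    for w in set(phrase):                  # phrase viewed as a set of words
        feats["bag=" + w.lower()] = True
    for i, w in enumerate(phrase[:k]):     # first k words and their case
        feats["first%d" % i] = w
        feats["first%d_cap" % i] = w[:1].isupper()
    for i, w in enumerate(phrase[-k:]):    # last k words and their case
        feats["last%d" % i] = w
        feats["last%d_cap" % i] = w[:1].isupper()
    return feats

class NaiveBayes:
    """Independent-feature ("naive Bayes") model with add-one smoothing."""

    def fit(self, X, y):
        self.classes = Counter(y)            # class frequencies (prior)
        self.counts = defaultdict(Counter)   # per-class (feature, value) counts
        for feats, label in zip(X, y):
            for item in feats.items():
                self.counts[label][item] += 1
        return self

    def predict(self, feats):
        total = sum(self.classes.values())
        def log_score(c):
            s = math.log(self.classes[c] / total)      # log prior
            for item in feats.items():                 # independent features
                s += math.log((self.counts[c][item] + 1) /
                              (self.classes[c] + 2))   # add-one smoothing
            return s
        return max(self.classes, key=log_score)
```

Training pairs each gold phrase's feature dictionary with its sort label; prediction simply picks the label with the highest smoothed log score.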
In addition, knowing whether a sentence (or, ideally, a news article) is about soccer was helpful for Spanish. A feature that tests for the presence of fútbol and a few semantically related words is the only aspect of the classification component that is particular to one language. Other language-specific information, e.g. names of Spanish provinces, did not turn out to be useful.</Paragraph> <Paragraph position="2"> Table 2 shows performance figures for the classification component on the raw development data.</Paragraph> <Paragraph position="3"> Equivalently, one can think of these results as the output of our classifiers applied to a perfect extraction component that makes no mistakes. We can already see for Spanish that performance is lowest for the sort MISC, which does not seem very homogeneous, and may perhaps best be chosen by default if no other class applies. Trying to predict MISC directly seems to be a misguided effort. This will become even clearer below when we look at the overall performance of our approach.</Paragraph> <Paragraph position="4"> Table 2: Precision, recall, and F-score of the classification component on the development data sets for the two languages used in this shared task.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Putting it all together </SectionTitle> <Paragraph position="0"> A theoretical problem with our task decomposition is how to train the classifiers used in the second phase. What they will eventually see as input is the output of the extraction component, which may contain mistakes, e.g., cases where the beginning or end of a phrase was mispredicted. Since we want to build and refine the classification component independently of the extraction component, we have to train the classifiers on the gold phrases in the labeled training data.
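This two-phase setup can be sketched as follows, assuming hypothetical names throughout: the classifier is trained on gold phrase spans from the labeled data, but at test time it labels whatever spans the extractor proposes, boundary errors included.

```python
def train_classifier_on_gold(labeled_sentences, trainer):
    """labeled_sentences: list of (tokens, [(start, end, sort_label), ...]).
    The second-phase classifier is trained on gold spans only, so it can be
    developed independently of the extraction component."""
    X, y = [], []
    for tokens, gold_phrases in labeled_sentences:
        for start, end, label in gold_phrases:  # gold spans, not extractor output
            X.append((tokens, start, end))
            y.append(label)
    return trainer(X, y)

def tag(tokens, extract, classify):
    """Phase 1 proposes candidate spans; phase 2 assigns a sort label to each
    proposed span, even when the extractor mispredicted a phrase boundary."""
    return [(s, e, classify(tokens, s, e)) for s, e in extract(tokens)]
```

Note the asymmetry this sketch makes explicit: training never shows the classifier a mangled span, while testing may, which is exactly the potential mismatch discussed next.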
It is not clear a priori that this kind of independent development comes without a performance penalty, since the second-phase classifiers were never shown examples of the truncated or badly mangled phrases produced by imperfections of the extraction component, which makes up the first phase of our approach. Based on the independence assumption behind the task decomposition, we would expect the overall performance on the Spanish development data set to be 0.8723 × 0.8217 ≈ 0.7168. As we can see from the actual results in Table 3, this is not very far from the observed performance. We conclude that independent development of the two components did not hurt overall performance.</Paragraph> </Section> </Paper>