<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3015"> <Title>Syntax-based Semi-Supervised Named Entity Tagging</Title> <Section position="5" start_page="57" end_page="57" type="metho"> <SectionTitle> 3 Named Entity Recognition </SectionTitle> <Paragraph position="0"> In this level, the system used a group of syntax-based rules to recognize and extract potential named entities from constituency and dependency parse trees. The rules are used to produce our training data; therefore they needed to have a narrow and precise coverage of each type of named entities to minimize the level of training noise.</Paragraph> <Paragraph position="1"> The processing starts from construction of constituency and dependency parse trees from the input text. Potential NEs are detected and extracted based on these syntactic rules.</Paragraph> <Section position="1" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 3.1 Constituency Parse Features </SectionTitle> <Paragraph position="0"> Replicating the study performed by Collins-Singer (1999), we used two constituency parse rules to extract a set of proper nouns (along with their associated contextual information). These two constituency rules extracted proper nouns within a noun phrase that contained an appositive phrase and a proper noun within a prepositional phrase.</Paragraph> </Section> <Section position="2" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 3.2 Dependency Parse Features </SectionTitle> <Paragraph position="0"> We observed that a proper noun acting as the sub-ject or the object of a sentence has a high probability of being a particular type of named entity.</Paragraph> <Paragraph position="1"> Thus, we expanded our syntactic analysis of the data into dependency parse of the text and extracted a set of proper nouns that act as the subjects or objects of the main verb. For each of the subjects and objects, we considered the maximum span noun phrase that included the modifiers of the subjects and objects in the dependency parse tree.</Paragraph> </Section> </Section> <Section position="6" start_page="57" end_page="58" type="metho"> <SectionTitle> 4 Named Entity Classification </SectionTitle> <Paragraph position="0"> In this level, the system assigns one of the 4 class labels (<PER>, <ORG>, <LOC>, <NONE>) to a given test NE. The NONE class is used for the expressions mistakenly extracted by syntactic features that were not a NE. We will discuss the form of the test NE in more details in section 5. The underlying model we consider is a Naive Bayes classifier; we train it with the Expectation-Maximization algorithm, an iterative parameter estimation procedure.</Paragraph> <Section position="1" start_page="57" end_page="58" type="sub_section"> <SectionTitle> 4.1 Features </SectionTitle> <Paragraph position="0"> We used the following syntactic and spelling features for the classification: Full NE Phrase.</Paragraph> <Paragraph position="1"> Individual word: This binary feature indicates the presence of a certain word in the NE.</Paragraph> <Paragraph position="2"> Punctuation pattern: The feature helps to distinguish those NEs that hold certain patterns of punctuations like (...) for U.S.A. or (&.) for A&M. All Capitalization: This binary feature is mainly useful for some of the NEs that have all capital letters. 
<Section position="6" start_page="57" end_page="58" type="metho">
<SectionTitle> 4 Named Entity Classification </SectionTitle>
<Paragraph position="0"> At this level, the system assigns one of four class labels (<PER>, <ORG>, <LOC>, <NONE>) to a given test NE. The NONE class is used for expressions that were mistakenly extracted by the syntactic rules and are not NEs. We discuss the form of the test NEs in more detail in Section 5. The underlying model is a Naive Bayes classifier, which we train with the Expectation-Maximization algorithm, an iterative parameter estimation procedure.</Paragraph>
<Section position="1" start_page="57" end_page="58" type="sub_section">
<SectionTitle> 4.1 Features </SectionTitle>
<Paragraph position="0"> We used the following syntactic and spelling features for the classification: Full NE phrase.</Paragraph>
<Paragraph position="1"> Individual word: This binary feature indicates the presence of a certain word in the NE.</Paragraph>
<Paragraph position="2"> Punctuation pattern: This feature helps to distinguish NEs that contain certain punctuation patterns, such as (...) for U.S.A. or (&.) for A&M. All capitalization: This binary feature is mainly useful for NEs written entirely in capital letters, such as AP, AFP, and CNN.</Paragraph>
<Paragraph position="3"> Constituency parse rule: This feature indicates which of the two constituency rules was used to extract the NE. Dependency parse rule: This feature indicates whether the NE is the subject or the object of the sentence. Except for the last two features, all features are spelling features extracted from the actual NE phrase. The constituency and dependency features come from the NE recognition phase (Section 3). Depending on the type of testing and training schema, an NE may have a value of 0 for the dependency or constituency feature, which indicates the absence of that feature in the recognition step.</Paragraph>
</Section>
<Section position="2" start_page="58" end_page="58" type="sub_section">
<SectionTitle> 4.2 Naive Bayes Classifier </SectionTitle>
<Paragraph position="0"> We used a Naive Bayes classifier in which each NE is represented by the set of syntactic and word-level features (with various distributions) described above. The individual words within the noun phrase are binary features. These, together with the other features, which have multinomial distributions, fit well with the Naive Bayes assumption that each feature is treated independently given the class value. To balance the effect of the large number of binary features on the final class probabilities, we used numerical techniques to transform some of the probabilities into log space.</Paragraph>
</Section>
<Section position="3" start_page="58" end_page="58" type="sub_section">
<SectionTitle> 4.3 Semi-supervised learning </SectionTitle>
<Paragraph position="0"> Similar to the work of Nigam et al. (1999) on document classification, we used the Expectation-Maximization (EM) algorithm along with our Naive Bayes classifier to form a semi-supervised learning framework. In this framework, the small labeled dataset is used to make the initial assignment of the parameters of the Naive Bayes classifier.</Paragraph>
<Paragraph position="1"> After this initialization step, in each iteration the Naive Bayes classifier classifies all of the unlabeled examples and updates its parameters based on the class probabilities of the unlabeled and labeled NE instances. This iterative procedure continues until the parameters reach a stable point.</Paragraph>
<Paragraph position="2"> Subsequently, the updated Naive Bayes classifier classifies the test instances for evaluation.</Paragraph>
</Section>
</Section>
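A compact sketch of this EM procedure, assuming scikit-learn's MultinomialNB over non-negative (count or binary) feature vectors and integer labels 0-3 for <PER>, <ORG>, <LOC>, and <NONE>; the paper does not prescribe a toolkit, so the function name, stopping criterion, and label encoding are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes=4, n_iter=20, tol=1e-3):
    """Semi-supervised Naive Bayes in the style of Nigam et al. (1999).

    Assumes y_lab contains every class label 0..n_classes-1 at least once,
    and that X_lab / X_unlab are dense non-negative feature matrices.
    """
    clf = MultinomialNB()
    clf.fit(X_lab, y_lab)                       # initialize from the labeled NEs
    for _ in range(n_iter):
        old_params = clf.feature_log_prob_.copy()
        # E-step: class posteriors for every unlabeled example.
        post = clf.predict_proba(X_unlab)       # shape (n_unlab, n_classes)
        # M-step: refit on labeled (hard) plus unlabeled (soft) counts.
        # Soft counts are emulated by repeating each unlabeled example once
        # per class, weighted by its posterior probability for that class.
        X_all = np.vstack([X_lab] + [X_unlab] * n_classes)
        y_all = np.concatenate(
            [y_lab] + [np.full(len(X_unlab), c) for c in range(n_classes)])
        w_all = np.concatenate(
            [np.ones(len(X_lab))] + [post[:, c] for c in range(n_classes)])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
        # Stop once the parameters reach a stable point.
        if np.max(np.abs(clf.feature_log_prob_ - old_params)) < tol:
            break
    return clf
```

Weighting the replicated unlabeled examples by their class posteriors is one standard way to realize the soft E-step counts without modifying the classifier's internals.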
<Section position="7" start_page="58" end_page="354" type="metho">
<SectionTitle> 5 Empirical Study </SectionTitle>
<Paragraph position="0"> Our study consists of a 9-way comparison involving three types of training features and three types of testing schemas.</Paragraph>
<Section position="1" start_page="58" end_page="58" type="sub_section">
<SectionTitle> 5.1 Data </SectionTitle>
<Paragraph position="0"> We used the data from the Automatic Content Extraction (ACE) entity detection track as our labeled (gold standard) data. For every NE that the syntactic rules extracted from an input sentence, we had to find a matching NE in the gold standard data and label the extracted NE with the correct NE class label. If the extracted NE did not match any of the gold standard NEs for the sentence, we labeled it with the <NONE> class label.</Paragraph>
<Paragraph position="1"> We also used the WSJ portion of the Penn Treebank as our unlabeled dataset and ran constituency and dependency analyses on it to extract a set of unlabeled named entities for the semi-supervised classification.</Paragraph>
</Section>
<Section position="2" start_page="58" end_page="354" type="sub_section">
<SectionTitle> 5.2 Evaluation </SectionTitle>
<Paragraph position="0"> In order to evaluate the effects of each group of syntactic features, we experimented with three different training strategies (using constituency rules, dependency rules, or a combination of both). We conducted the comparison study with three types of test data that represent three levels of coverage (recall) for the system: 1. Gold standard NEs: This test set contains instances taken directly from the ACE data and is therefore independent of the syntactic rules.</Paragraph>
<Paragraph position="1"> 2. Any single proper noun or sequence of proper nouns in the text: This is a heuristic for locating potential NEs so as to achieve the broadest coverage.</Paragraph>
<Paragraph position="2"> 3. NEs extracted from the text by the syntactic rules: This evaluation approach is similar to that of Collins and Singer. The main difference is that we have to match the extracted expressions to a pre-labeled gold standard from ACE rather than performing manual annotations ourselves.</Paragraph>
<Paragraph position="3"> All tests have been performed under a 5-fold cross-validation training-testing setup. Table 1 presents the accuracy of the NE classification and the size of the labeled data in the different training-testing configurations. The second line of each cell shows the size of the labeled training data, and the third line shows the size of the testing data. Each column presents the results for one type of syntactic feature used to extract NEs. Each row of the table presents one of the three testing schemas. We tested the statistical significance of each of the cross-row accuracy improvements against an alpha value of 0.1 and observed significant improvements in all of the testing schemas.</Paragraph>
<Paragraph position="4"> Our results suggest that dependency parsing features are reasonable extraction patterns, as their accuracy rates are competitive with the model based solely on constituency rules. Moreover, they make a good complement to the constituency rules proposed by Collins and Singer, since the accuracy rates of the union are higher than those of either model alone. As expected, all methods perform best when the test data are extracted in the same manner as the training examples. However, if the systems are given a well-formed named entity, the performance degradation is reasonably small, about 2% absolute for all training methods.</Paragraph>
<Paragraph position="5"> The performance is somewhat lower when classifying the very general test case of all proper nouns.</Paragraph>
</Section>
</Section>
</Paper>
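For completeness, a minimal sketch of the 5-fold cross-validation accuracy evaluation described in Section 5.2; X, y, and X_unlab are hypothetical feature matrices and labels, and em_naive_bayes refers to the semi-supervised trainer sketched above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(X, y, X_unlab, train_fn):
    """Average accuracy over a 5-fold cross-validation split of the labeled NEs."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = train_fn(X[train_idx], y[train_idx], X_unlab)
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))

# Example usage (with the EM-trained Naive Bayes sketched in Section 4.3):
# print(five_fold_accuracy(X, y, X_unlab, em_naive_bayes))
```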