<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1089"> <Title>Guessing Parts-of-Speech of Unknown Words Using Global Information</Title> <Section position="6" start_page="710" end_page="711" type="evalu"> <SectionTitle> 4 Related Work </SectionTitle>
<Paragraph position="0"> Several studies on the use of global information have been conducted, especially in named entity recognition, a task similar to POS guessing of unknown words. Chieu and Ng (2002) performed named entity recognition using global features in addition to local features. Their ME model-based method used global features such as "whether the word, when it first appears in a position other than the beginning of a sentence, is capitalized". These global features are static and can be handled in the same manner as local features, so Viterbi decoding could be used. The method is efficient but does not handle interactions between labels.</Paragraph>
<Paragraph position="1"> Finkel et al. (2005) proposed a method that incorporates non-local structure for information extraction. They exploited the label consistency of named entities: the property that named entities with the same lexical form tend to receive the same label. They defined two probabilistic models: a local model based on conditional random fields and a global model based on log-linear models. The final model was constructed by multiplying these two models, which can be seen as an unnormalized log-linear interpolation (Klakow, 1998) of the two models with equal weights. Their method considers interactions between labels across the whole document, using Gibbs sampling and simulated annealing for decoding. Our model is largely similar to theirs. However, in their method the parameters of the global model were estimated from relative frequencies of labels or selected by hand, whereas in our method the global model parameters are estimated from training data so as to fit the data according to the objective function.</Paragraph>
<Paragraph position="2"> One approach to incorporating global information in natural language processing is to exploit the consistency of labels, and this approach has been used in other tasks as well. Takamura et al. (2005) proposed a method based on spin models from physics for extracting the semantic orientations of words. In a spin model, each electron is in one of two states, up or down, and the model defines a probability distribution over the states. The states of electrons interact with each other, and neighboring electrons tend to have the same spin. In their method, the semantic orientations (positive or negative) of words are regarded as the states of spins, modeling the property that the semantic orientation of a word tends to agree with the orientations of the words in its gloss. The mean field approximation was used for inference.</Paragraph>
<Paragraph position="3"> Yarowsky (1995) studied a method for word sense disambiguation using unlabeled data. Although no probabilistic model was considered explicitly, the method exploits the label-consistency property known as "one sense per discourse" for unsupervised learning, together with the local property known as "one sense per collocation".</Paragraph>
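<Paragraph position="4"> As a concrete illustration of this family of label-consistency methods, the following Python sketch shows Gibbs sampling in which a local model is multiplied by a document-wide consistency factor, in the spirit of Finkel et al. (2005). It is a minimal sketch, not the implementation of any of the methods above: the label set, the scores, and the consistency weight of 0.5 are illustrative assumptions.

import math
import random

LABELS = ["PER", "ORG", "O"]

def local_score(token, label):
    # Stand-in for a trained local model (e.g., a CRF marginal);
    # the numbers here are assumed purely for illustration.
    table = {("Jordan", "PER"): 0.6, ("Jordan", "ORG"): 0.3, ("Jordan", "O"): 0.1}
    return table.get((token, label), 1.0 / len(LABELS))

def consistency_score(token, label, assignment, position):
    # Global factor: reward tokens with the same lexical form
    # elsewhere in the document sharing the candidate label.
    agree = sum(1 for j, (t, l) in enumerate(assignment)
                if j != position and t == token and l == label)
    return math.exp(0.5 * agree)  # 0.5 is an assumed consistency weight

def gibbs_decode(tokens, n_iters=200, seed=0):
    rng = random.Random(seed)
    assignment = [(t, rng.choice(LABELS)) for t in tokens]
    for _ in range(n_iters):
        for i, (token, _) in enumerate(assignment):
            # Resample label i from its conditional distribution,
            # proportional to local score times global consistency.
            weights = [local_score(token, l) * consistency_score(token, l, assignment, i)
                       for l in LABELS]
            r = rng.random() * sum(weights)
            acc = 0.0
            for l, w in zip(LABELS, weights):
                acc += w
                if acc >= r:
                    assignment[i] = (token, l)
                    break
    return assignment

print(gibbs_decode(["Jordan", "visited", "Jordan"]))

The consistency factor pulls the two occurrences of "Jordan" toward the same label, which is exactly the kind of interaction between labels that a Viterbi decoder over local features alone cannot capture. </Paragraph>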
<Paragraph position="5"> There also exist approaches that use global information without necessarily aiming at label consistency. Rosenfeld et al. (2001) proposed whole-sentence exponential language models, which calculate the probability of a sentence s as p(s) = (1/Z) p0(s) exp(Σi λi fi(s)), where Z is a normalizing constant, p0(s) is an initial distribution of s (any language model, such as a trigram model, can be used for this), and each fi(s) is a feature function with weight λi that can capture sentence-wide features. Note that if we regard fi,j(t) in our model (Equation (7)) as a feature function, Equation (8) has essentially the same form as the above model. Their models can incorporate arbitrary sentence-wide features, including syntactic features obtained by shallow parsers. They used Gibbs sampling and other sampling methods for inference, and model parameters were estimated from training data using the generalized iterative scaling algorithm combined with the sampling methods. Although they addressed the modeling of whole sentences, the method can be directly applied to the modeling of whole documents, which allows unlabeled data to be incorporated easily, as we have discussed. This approach, modeling wide-scope contexts with log-linear models and using sampling methods for inference, provides an expressive framework and can be applied to other tasks.</Paragraph>
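<Paragraph position="6"> To make the whole-sentence model concrete, the following Python sketch computes the unnormalized probability p0(s) exp(Σi λi fi(s)); the initial distribution, the features, and the weights are illustrative assumptions, not those of Rosenfeld et al. (2001). The normalizing constant Z is intractable to compute exactly, which is why sampling methods are needed for inference; the sketch therefore only compares unnormalized scores.

import math

def p0(sentence):
    # Stand-in initial distribution; in practice the probability
    # assigned by a trigram (or any other) language model.
    return 0.25 ** len(sentence)

# Each entry is (weight lambda_i, feature function f_i over the whole sentence).
FEATURES = [
    (0.8, lambda s: 1.0 if len(s) > 3 else 0.0),   # sentence-length feature
    (-0.4, lambda s: float(s.count("the"))),       # sentence-wide count feature
]

def unnormalized_p(sentence):
    score = sum(lam * f(sentence) for lam, f in FEATURES)
    return p0(sentence) * math.exp(score)

print(unnormalized_p("the cat sat on the mat".split()))
print(unnormalized_p("cat sat".split()))

Because the feature functions see the entire sentence at once, left-to-right dynamic programming no longer applies, mirroring the situation in the document-wide models discussed above. </Paragraph> </Section> </Paper>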