File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/05/p05-1075_concl.xml
Size: 4,297 bytes
Last Modified: 2025-10-06 13:54:42
<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1075"> <Title>A Nonparametric Method for Extraction of Candidate Phrasal Terms</Title> <Section position="6" start_page="610" end_page="611" type="concl"> <SectionTitle> 5 Schone and Jurafsky's results differ from Krenn &amp; </SectionTitle> <Paragraph position="0"> Evert (2001)'s results, which indicated that frequency performed better than the statistical measures in almost every case. However, Krenn and Evert's data consisted of n-grams preselected to fit particular collocational patterns. Frequency-based metrics seem to benefit particularly from linguistic prefiltering.</Paragraph> <Paragraph position="1"> Julius Caesar, Winston Churchill, potato chips, peanut butter, Frederick Douglass, Ronald Reagan, Tia Dolores, Don Quixote, cash register, Santa Claus. At ranks 3,000 to 3,010, the bigrams are: Ted Williams, surgical technicians, Buffalo Bill, drug dealer, Lise Meitner, Butch Cassidy, Sandra Cisneros, Trey Granger, senior prom, Ruta Skadi. At ranks 10,000 to 10,010, the bigrams are: egg beater, sperm cells, lowercase letters, methane gas, white settlers, training program, instantly recognizable, dried beef, television screens, vienna sausages. In short, the n-best list returned by the mutual rank ratio statistic appears to consist primarily of phrasal terms far down the list, even when N is as low as 5.
False positives are typically: (i) morphological variants of established phrases; (ii) bigrams that are part of longer phrases, such as cream sundae (from ice cream sundae); (iii) examples of highly productive constructions such as an artist, three categories or January 2.</Paragraph> <Paragraph position="2"> The results for trigrams are relatively sparse and thus less conclusive, but are consistent with the bigram results: the mutual rank ratio measure performs best, with top ranking elements consistently being phrasal terms.</Paragraph> <Paragraph position="3"> Comparison with the n-best lists for other metrics bears out the qualitative impression that the rank ratio is performing better at selecting phrasal terms even without filtering. The top ten bigrams for the true mutual information metric at N=5 are: a little, did not, this is, united states, new york, know what, a good, a long, a moment, a small. Ranks 3000 to 3010 are: waste time, heavily on, earlier than, daddy said, ethnic groups, tropical rain, felt sure, raw materials, gold medals, gold rush. Ranks 10,000 to 10,010 are: quite close, upstairs window, object is, lord god, private schools, nat turner, fire going, bering sea, little higher, got lots. The behavior is consistent with a known weakness of true mutual information, namely its tendency to overvalue frequent forms.</Paragraph> <Paragraph position="4"> Next, consider the n-best lists for log-likelihood at N=5. The top ten n-grams are: sheriff poulson, simon huggett, robin redbreast, eric torrosian, colonel hillandale, colonel sapp, nurse leatheran, st. 
catherines, karen torrio, jenny yonge. N-grams 3000 to 3010 are: comes then, stuff who, dinner get, captain see, tom see, couple get, fish see, picture go, building go, makes will, pointed way. N-grams 10000 to 10010 are: sayings is, writ this, llama on, undoing this, dwahro did, reno on, squirted on, hardens like, mora did, millicent is, vets did. Comparison thus seems to suggest that, if anything, the quality of the mutual rank ratio results is being understated by the evaluation metric, as the mutual rank ratio is returning a large number of phrasal terms in the higher portion of the n-best list that are absent from the gold standard.</Paragraph> <Paragraph position="5"> Conclusion This study has proposed a new method for measuring strength of lexical association for candidate phrasal terms based upon the use of Zipfian ranks over a frequency distribution combining n-grams of varying length. The method is related in general philosophy to Mutual Expectation, in that it assesses the strength of connection for each word to the combined phrase; it differs by adopting a nonparametric measure of strength of association. Evaluation indicates that this method may outperform standard lexical association measures, including mutual information, chi-squared, log-likelihood, and the T-score.</Paragraph> </Section> </Paper>