<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2603">
  <Title>Decomposition Kernels for Natural Language Processing</Title>
  <Section position="6" start_page="19" end_page="22" type="concl">
    <SectionTitle>
5 Kernels for word semantic ambiguity
</SectionTitle>
    <Paragraph position="0"> Parsing a natural language sentence often involves choosing between different syntactic structures that are equally admissible in the given grammar. One of the most studied ambiguities arises when deciding whether to attach a prepositional phrase to the noun phrase or to the verb phrase. An example could be: 1. eat salad with forks (attach to verb) 2. eat salad with tomatoes (attach to noun). Human readers usually resolve such ambiguities using their past experience and their knowledge of word meanings. Machine learning can simulate human experience by using corpora of disambiguated phrases to compute a decision on new cases. However, given the number of different words currently used in texts, no dataset could ever be large enough to learn from the words alone. Adding semantic information on the possible word meanings permits the learning of rules that apply to entire categories and generalize to all their member words.</Paragraph>
    <Section position="1" start_page="20" end_page="20" type="sub_section">
      <SectionTitle>
5.1 Adding Semantics with WordNet
</SectionTitle>
      <Paragraph position="0"> WordNet (Fellbaum, 1998) is an electronic lexical database of English words built and annotated by linguistic researchers. WordNet is an extensive and reliable source of semantic information that can be used to enrich the representation of a word. Each word is represented in the database by a group of synonym sets (synsets), with each synset corresponding to an individual linguistic concept.</Paragraph>
      <Paragraph position="1"> All the synsets contained in WordNet are linked by relations of various types. An important relation connects a synset to its hypernyms, i.e. its immediately broader concepts. The hypernym relation (and its opposite, the hyponym relation) defines a semantic hierarchy of synsets that can be represented as a directed acyclic graph. The different lexical categories (verbs, nouns, adjectives and adverbs) are contained in distinct hierarchies, and each hierarchy is rooted by many synsets.</Paragraph>
      <Paragraph position="2"> Several metrics have been devised to compute a similarity score between two words using WordNet. In the following we resort to a multiset version of the proximity measure used in (Siolas and d'Alche Buc, 2000), though more refined alternatives are also possible (for example using the conceptual density as in (Basili et al., 2005)). Given the acyclic nature of the semantic hierarchies, each synset can be represented by a group of paths that follow the hypernym relations and finish in one of the top-level concepts. Two paths can then be compared by counting how many steps from the roots they have in common. This number must then be normalized by dividing by the square root of the product of the path lengths. In this way one accounts for the unbalancing that arises from different parts of the hierarchies being differently detailed. Given two paths π and π′, let l and l′ be their lengths and n be the size of their common part; the resulting kernel is:</Paragraph>
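Spelled out, the kernel described in this paragraph reads as follows (a reconstruction from the surrounding prose; the exact original notation may differ):

```latex
k(\pi, \pi') = \frac{n}{\sqrt{l \, l'}}
```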
      <Paragraph position="4"> The demonstration that k is positive definite follows from the fact that n can be computed as a positive kernel k[?] by summing the exact match kernels between the corresponding positions in π and π′, seen as sequences of synset identifiers. The lengths l and l′ can then be evaluated as k[?](π,π) and k[?](π′,π′), and k is the resulting normalized version of k[?].</Paragraph>
      <Paragraph position="5"> The kernel between two synsets s and s′ can then be computed by the multi-set kernel (Gärtner et al., 2002a) between their corresponding paths.</Paragraph>
      <Paragraph position="6"> Synsets are organized into forty-five lexicographer files based on syntactic category and logical groupings. Additional information can be derived by comparing the identifiers l and l′ of the lexicographer files associated to s and s′. The resulting synset kernel is:</Paragraph>
      <Paragraph position="8"> where P is the set of paths originating from s and the exact match kernel δ(l,l′) is 1 if l = l′ and 0 otherwise. Finally, the kernel kω between two words is itself a multi-set kernel between the corresponding sets of synsets:</Paragraph>
      <Paragraph position="10"> where S are the synsets associated to the word ω.</Paragraph>
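As an illustration, the three kernels of this subsection can be sketched in Python over toy, hand-built hypernym paths (root-first lists of synset identifiers). The function names, the toy synsets, and the choice to combine the lexicographer-file match with the path multiset kernel by addition are assumptions made for illustration, not the paper's exact formulation.

```python
from math import sqrt

def path_kernel(p, q):
    """Normalized path kernel: length of the common prefix from the
    roots, divided by the square root of the product of the lengths."""
    n = 0
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return n / sqrt(len(p) * len(q))

def synset_kernel(paths_s, paths_t, lex_s, lex_t):
    """Multi-set kernel over the hypernym paths of two synsets, plus an
    exact-match kernel on lexicographer-file identifiers (the additive
    combination is an assumption)."""
    match = 1.0 if lex_s == lex_t else 0.0
    return match + sum(path_kernel(p, q) for p in paths_s for q in paths_t)

def word_kernel(synsets_a, synsets_b):
    """Multi-set kernel between the synsets of two words; each synset is
    a (list-of-paths, lexicographer-file) pair."""
    return sum(synset_kernel(ps, qs, la, lb)
               for ps, la in synsets_a for qs, lb in synsets_b)

# toy synsets: one hypernym path each, plus a lexicographer file id
fork = [([["entity", "artifact", "tool", "fork"]], "noun.artifact")]
spoon = [([["entity", "artifact", "tool", "spoon"]], "noun.artifact")]
tomato = [([["entity", "organism", "plant", "tomato"]], "noun.plant")]
```

On these toy paths, fork and spoon share three hypernym steps and a lexicographer file, so their kernel value exceeds that of fork and tomato, which share only the root.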
    </Section>
    <Section position="2" start_page="20" end_page="21" type="sub_section">
      <SectionTitle>
5.2 PP Attachment Experimental Results
</SectionTitle>
      <Paragraph position="0"> The experiments have been performed using the Wall Street Journal dataset described in (Ratnaparkhi et al., 1994). This dataset contains 20,800 training examples and 3,097 testing examples.</Paragraph>
      <Paragraph position="1"> Each phrase x in the dataset is reduced to a verb xv, its object noun xn1, and a prepositional phrase formed by a preposition xp and a noun xn2. The target is either V or N, according to whether the phrase is attached to the verb or to the noun. Data have been preprocessed by assigning to all the words their corresponding synsets. Additional meanings derived from specific synsets have been attached to the words as described in (Stetina and Nagao, 1997).</Paragraph>
      <Paragraph position="2"> The kernel between two phrases x and xprime is then computed by combining the kernels between single words using either the sum or the product.</Paragraph>
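A minimal sketch of this combination, assuming the 4-tuple (verb, noun, preposition, noun) representation described above and using a simple exact-match word kernel as a stand-in for the WordNet-based kernel of Section 5.1 (function names are illustrative):

```python
from functools import reduce
from operator import mul

def exact_match(w, wp):
    # stand-in word kernel; any positive definite word kernel,
    # e.g. the WordNet one, could be plugged in instead
    return 1.0 if w == wp else 0.0

def phrase_kernel(x, xp, word_kernel=exact_match, combine="sum"):
    """Combine the kernels between corresponding words of two
    (verb, noun1, preposition, noun2) tuples by sum or product."""
    ks = [word_kernel(w, wp) for w, wp in zip(x, xp)]
    return sum(ks) if combine == "sum" else reduce(mul, ks)

a = ("eat", "salad", "with", "forks")
b = ("eat", "salad", "with", "tomatoes")
```

With the exact-match word kernel, the product form is zero as soon as one word pair differs, while the sum form degrades gracefully, which illustrates why the two combinations can behave differently in the experiments.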
    </Section>
    <Section position="3" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
Results of the experiments are reported in Tab. 3
</SectionTitle>
      <Paragraph position="0"> for various kernel parameters: S or P denote whether the sum or the product of the kernels between words is used, W denotes that WordNet semantic information is added (otherwise the kernel between two words is just the exact match kernel), and L denotes that lexicographer file identifiers are used. An additional Gaussian kernel is used on top of Kpp. The C and g parameters are selected using an independent validation set. For each setting, accuracy, precision and recall values on the test data are reported, along with the standard deviation of the estimated binomial distribution of errors. The results demonstrate that semantic information can help in resolving PP ambiguities. A small difference exists between taking the product instead of the sum of word kernels, and an additional increase in the amount of information available to the learner is given by the use of lexicographer file identifiers.</Paragraph>
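Composing a Gaussian kernel with a base kernel is presumably done via the standard construction that uses the distance induced by the base kernel in feature space; a sketch under that assumption (the paper does not spell out the exact form here):

```python
from math import exp

def gaussian_on_kernel(k, gamma):
    """Wrap a base kernel k in a Gaussian: exp(-gamma * d(x, x')^2),
    where d is the feature-space distance induced by k, i.e.
    d^2 = k(x,x) + k(x',x') - 2 k(x,x')."""
    def kg(x, xp):
        d2 = k(x, x) + k(xp, xp) - 2 * k(x, xp)
        return exp(-gamma * d2)
    return kg
```

For example, wrapping a plain dot-product kernel yields the familiar RBF kernel; the same wrapper can sit on top of any Kpp-style phrase kernel.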
      <Paragraph position="1"> 6 Using declarative knowledge for NLP kernel integration
Data objects in NLP often require complex representations; suffice it to say that a sentence is naturally represented as a variable-length sequence of word tokens, and that shallow/deep parsers are reliably used to enrich these representations with links between words to form parse trees. Finally, additional complexity can be introduced by including semantic information. Various facets of this richness of representation have been extensively investigated, including the expressiveness of various grammar formalisms, the exploitation of lexical representations (e.g. verb subcategorization, semantic tagging), and the use of machine-readable sources of generic or specialized knowledge (dictionaries, thesauri, domain-specific ontologies). All these efforts are capable of addressing language-specific sub-problems, but their integration into a coherent framework is a difficult feat. Recent ideas for constructing kernel functions starting from logical representations may offer an appealing solution. Gärtner et al. (2002) have proposed a framework for defining kernels on higher-order logic individuals. Cumby and Roth (2003) used description logics to represent knowledge jointly with propositionalization for defining a kernel function. Frasconi et al. (2004) proposed kernels for handling supervised learning in a setting similar to that of inductive logic programming, where data is represented as a collection of facts and background knowledge by a declarative program in first-order logic. In this section, we briefly review this approach and suggest a possible way of exploiting it for the integration of different sources of knowledge that may be available in NLP.</Paragraph>
    </Section>
    <Section position="4" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
6.1 Declarative Kernels
</SectionTitle>
      <Paragraph position="0"> The definition of decomposition kernels as reported in Section 2 is very general and covers almost all kernels for discrete structured data developed in the literature so far. Different kernels are designed by defining the relation decomposing an example into its "parts" and specifying kernels for individual parts. In (Frasconi et al., 2004) we proposed a systematic approach to such design, consisting in formally defining a relation by the set of axioms it must satisfy. We relied on mereotopology (Varzi, 1996), i.e. the theory of parts and places, in order to give a formal definition of the intuitive concepts of parthood and connection. The formalization of mereotopological relations makes it possible to automatically deduce instances of such relations on the data, by exploiting the background knowledge which is typically available on structured domains. In (Frasconi et al., 2004) we introduced declarative kernels (DK) as a set of kernels working on mereotopological relations, such as that of proper parthood ([?]P) or more complex relations based on connected parts.</Paragraph>
      <Paragraph position="1"> A typed syntax for objects was introduced in order to provide additional flexibility in defining kernels on instances of the given relation. A basic kernel on parts KP was defined as follows:</Paragraph>
      <Paragraph position="3"> where δT matches objects of the same type and k is a kernel over object attributes.</Paragraph>
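A plausible reconstruction of Eq. (10) from this description, summing over pairs of proper parts (notation is a best guess; the subscript P denotes proper parthood):

```latex
K_P(x, x') \;=\; \sum_{s \,\sqsubset_P\, x}\;\; \sum_{s' \,\sqsubset_P\, x'} \delta_T(s, s')\, k(s, s')
```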
      <Paragraph position="4">  Declarative kernels were tested in (Frasconi et al., 2004) on a number of domains with promising results, including a biomedical information extraction task (Goadrich et al., 2004) aimed at detecting protein-localization relationships within Medline abstracts. A DK on parts as the one defined in Eq. (10) outperformed the state-of-the-art ILP-based systems Aleph and Gleaner (Goadrich et al., 2004) on this information extraction task, and required about three orders of magnitude less training time.</Paragraph>
    </Section>
    <Section position="5" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
6.2 Weighted Decomposition Declarative
Kernels
</SectionTitle>
      <Paragraph position="0"> Declarative kernels can be combined with WDK in a rather straightforward way, thus taking the advantages of both methods. A simple approach is that of using proper parthood in place of selectors, and topology to recover the context of each proper part. A weighted decomposition declarative kernel (WD2K) of this kind would be defined as in Eq. (10), simply adding to the attribute kernel k a context kernel that compares the surroundings of a pair of objects, as defined by the topology relation, using some aggregate kernel such as PPK or HIK (see Section 3). Note that such a definition extends WDK by adding recursion to the concept of comparison by selector, and DK by adding contexts to the kernel between parts. Multiple contexts can easily be introduced by employing different notions of topology, provided they are consistent with the mereotopological axioms. As an example, if objects are words in a textual document, we can define l-connection as the relation for which two words are l-connected if they occur within the text with at most l words in between, and obtain growingly large contexts by increasing l. Moreover, an extended representation of words, such as the one employing WordNet semantic information, could easily be plugged in by including a kernel for synsets, such as that in Section 5.1, into the kernel k on word attributes. Additional relations could easily be formalized in order to exploit specific linguistic knowledge: a causal relation would make it possible to distinguish between preceding and following context, so as to take word order into consideration; an underlap relation, associating two objects that are parts of the same super-object (i.e.</Paragraph>
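A sketch of the l-connection relation on word positions and the growing contexts it induces (function names and the exact counting convention, at most l intervening words, are illustrative assumptions):

```python
def l_connected(i, j, l):
    """Two distinct word positions are l-connected if at most l
    words lie between them in the text."""
    return i != j and abs(i - j) - 1 <= l

def context(words, i, l):
    """Context of the word at position i: all words l-connected to it.
    Increasing l yields growingly large contexts."""
    return [w for j, w in enumerate(words)
            if j != i and l_connected(i, j, l)]

words = "eat salad with tomatoes".split()
```

For instance, the context of "eat" grows word by word as l increases from 0 to 2, which is exactly the behaviour the paragraph above describes.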
      <Paragraph position="1"> pre-terminals dominated by the same non-terminal node), would be able to express commanding notions. The promising results obtained with declarative kernels (where only very simple lexical information was used), together with the ease of declaratively integrating arbitrary kernels on specific parts, are all encouraging signs that boost our confidence in this line of research.</Paragraph>
    </Section>
  </Section>
</Paper>