<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0503"> <Title>Using Morphology and Syntax Together in Unsupervised Learning</Title> <Section position="7" start_page="25" end_page="25" type="concl"> <SectionTitle> 6 Results and discussion </SectionTitle> <Paragraph position="0"> Table 1 presents the description length, broken into its component terms (see (3)), for the baseline case and for the alternative analyses resulting from our algorithm. The table shows the total description length of the model, as well as the individual terms: the signature term, DL(s); the suffix term, DL(F); the lexical categories term, DL(P); the total morphology, DL(M); and the compressed length of the data, DL(D). We present results for two iterations at each of four threshold values (th = 0.8, 1.0, 1.2, 1.5) using our collapsing algorithm.</Paragraph> <Paragraph position="1"> Table 2 presents the description length derived from random collapsing, in a fashion parallel to Table 1. We show the results for only one iteration of random collapsing, since the first iteration already shows a substantial increase in description length.</Paragraph> <Paragraph position="2"> Figure 1 and Figure 2 present graphically the total description length from Tables 1 and 2 respectively. The reader will see that all collapsing of signatures leads to a shortening of the description length of the morphology per se, and an increase in the compressed length of the data. This is an inevitable formal consequence of the MDL-style model used here. The empirical question that we care about is whether the combined description length increases or decreases. We find that when the signatures are collapsed in the way that we propose, the combined description length decreases, leading us to conclude that this is, overall, a superior linguistic description of the data. On the other hand, when signatures are collapsed randomly, the combined description length increases. 
This makes sense; randomly decreasing the formal simplicity of the grammatical description should not improve the overall analysis. Only an increase in the formal simplicity of a grammar that is grammatically sensible should have this property. Since our goal is to develop an algorithm that is completely data-driven and can operate in an unsupervised fashion, we take this evidence as supporting the appropriateness of our algorithm as a means of collapsing signatures in a grammatically and empirically reasonable way. We conclude that the collapsing of signatures on the basis of similarity of context vectors of signature transforms (in a space consisting of high-frequency words and signature transforms) provides us with a useful and significant step towards solving the signature collapsing problem. In the context of the broader project, we will be able to use signature transforms as a more effective means for projecting lexical categories in an unsupervised way.</Paragraph> <Paragraph position="3"> As Table 1 shows, we achieve a decrease of up to 30% in the number of signatures through our proposed collapse. We are currently exploring ways to increase this value through powers of the adjacency matrix of the signature graph.</Paragraph> <Paragraph position="4"> In other work in progress, we explore the equally important signature purity problem in graph-theoretic terms: we split ambiguous signature transforms into separate categories when we can determine that the edges connecting left-context features and right-context features can be resolved into two sets (corresponding to the distinct categories of the transform) whose left features have no (or little) overlap and whose right features have no (or little) overlap. We employ the notion of minimum cut of a weighted graph to detect this situation.</Paragraph> </Section> </Paper>