File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/n03-2015_concl.xml
Size: 2,118 bytes
Last Modified: 2025-10-06 13:53:29
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2015"> <Title>Unsupervised Learning of Morphology for English and Inuktitut</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> HubMorph achieves the same performance as Linguistica on the words in Tom Sawyer. It does so with a general technique based on building a hub-automaton.</Paragraph> <Paragraph position="1"> In addition to being simple, HubMorph can be generalized to deal with more complex morphologies.</Paragraph> <Paragraph position="2"> We have applied HubMorph to Inuktitut to segment words such as ikajuqtaulauqsimajunga (&quot;I was helped in the recent past&quot;, ikajuq-tau-lauq-sima-junga). The path in a hub-automaton for most Inuktitut words would have many hubs, because the words have many divisions.</Paragraph> <Paragraph position="3"> Currently, there are many limitations. The search for hubs in the middle of words is very difficult and requires merging nodes to induce new words. Inducing new words is necessary because Inuktitut theoretically has billions of words and only a small fraction of them occurs in our source (the Hansards of Nunavut, Canada).</Paragraph> <Paragraph position="4"> Also, because each word has many morphemes, it is difficult to correctly detect the divisions between roots and suffixes. In general, there are no prefixes in Inuktitut, only infixes and suffixes.</Paragraph> <Paragraph position="5"> Finally, there are many dialects of Inuktitut and many spelling variations. In general, the written language is phonetic and the spelling reflects all the variations in speech.</Paragraph> <Paragraph position="6"> When HubMorph performs unsupervised learning of Inuktitut roots, it achieves a precision of 31.8% and a recall of 8.1%. 
It will be necessary to learn more of the infixes and suffixes to improve these scores.</Paragraph> <Paragraph position="7"> We believe that hub-automata will be the basis of a general solution for Indo-European languages as well as for Inuktitut.</Paragraph> </Section> </Paper>