<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1203">
  <Title>A Natural Language Processing Infrastructure for Turkish</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Applications Based on TOY
</SectionTitle>
    <Paragraph position="0"> In this section, we will present some applications that were developed using the TOY infrastructure.</Paragraph>
    <Paragraph position="1"> Each subsection will briefly explain the application, the TOY components used, and the modifications made to the infrastructure.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Conversational agent - TOYagent
</SectionTitle>
      <Paragraph position="0"> Smith (1994) classifies the dialogue styles that the computer can adopt during human-computer interaction into four modes, depending on the degree of control that the computer has over the dialogue: directive, suggestive, declarative, and passive.</Paragraph>
      <Paragraph position="1"> TOYagent's original approach mostly suits the passive mode, where the user has complete control and the computer passively acknowledges user statements, providing information only in response to direct user requests.</Paragraph>
      <Paragraph position="2"> TOYagent (Demir 2003) enables users to make on-line additions to the lexicon without the need to know Prolog. When faced with a word that it is unable to parse morphologically, TOYagent engages in a (mostly menu-driven) subdialogue with the user to identify the root, category, and morphophonemic properties of the word, and adds the appropriate entries to the lexicon. The meanings of these new words can be incorporated into the system by the logic program synthesis facility, which enables the user to provide natural language descriptions for new predicates in terms of existing predicates. These descriptions are automatically converted to Prolog clauses and added to the knowledge base of the program for future use.</Paragraph>
      <Paragraph position="3"> The original dialogue algorithm embedded in TOYagent can be summarized as follows:
1. Read a sentence. (This may cause a &quot;word learning&quot; subdialogue if one or more words in the sentence cannot be parsed by the morphological analyzer.)
2. Analyze the sentence using the DCG parser, resolving anaphors if necessary. If the syntactic parse is unsuccessful, report this to the user and GOTO 1.</Paragraph>
      <Paragraph position="4"> 3. If the mood is &quot;statement&quot;, then the user is making a declarative statement; use the built-in theorem prover to try to prove the logical formula corresponding to the sentence. (In the following, all the &quot;canned&quot; responses are in Turkish, of course.) There are two possibilities:
   a. If the formula can be proven using the current contents of the knowledge base, the information contained in the sentence is already there; respond with &quot;Thanks, I know that&quot;.
   b. If Prolog fails to prove the formula with its current knowledge, then negate the formula and try to prove this negation. There are two possibilities:
      i. If this new formula can be proven using the current contents of the knowledge base, the information contained in the sentence contradicts what we already know; respond with &quot;I do not think so&quot;.
      ii. If Prolog fails to prove this new formula with its current knowledge, create the necessary discourse and event markers and assert the Prolog clauses representing the input sentence to the knowledge base, responding with &quot;Thanks for the information&quot;.</Paragraph>
      <Paragraph position="5"> 4. If the mood is &quot;yes_no_question&quot;, the user has asked a yes-no question; use the built-in theorem prover to try to prove the sentence's logical formula. There are two possibilities:
   a. If the formula can be proven using the current contents of the knowledge base, respond with &quot;Yes&quot;.
   b. If Prolog fails to prove the formula with its current knowledge, then negate the formula and try to prove this negation. There are two possibilities:
      i. If this new formula can be proven using the current contents of the knowledge base, respond with &quot;No&quot;.
      ii. If Prolog fails to prove this new formula with its current knowledge, respond with &quot;I do not know.&quot;
5. If the mood is &quot;wh_question&quot;, the user has asked a wh-question; use the built-in theorem prover on the sentence's logical formula. The associated program of each question word scans the knowledge base and produces the relevant answer. The answer can be printed out directly, or, if required, in the form of a grammatical sentence generated by a procedure that first prepares a new logical formula from the produced knowledge items and then uses the syntax and morphology components to form the statement corresponding to this formula. GOTO 1.</Paragraph>
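      <Paragraph> The response selection in steps 3 and 4 can be sketched in Python as follows. This is only an illustrative sketch, not TOYagent's Prolog implementation: the tuple representation of formulas, the function names negate, prove, and respond, and the set-based knowledge base are all assumptions, and the theorem prover is replaced by a plain membership test. The English responses stand in for the Turkish canned responses.
<![CDATA[
```python
# Hypothetical sketch: formulas are tuples, and "proving" is a plain set lookup
# that stands in for TOYagent's Prolog theorem prover.

def negate(formula):
    """Return the negation of a formula, cancelling a double negation."""
    if formula[0] == "not":
        return formula[1]
    return ("not", formula)


def prove(formula, kb):
    """Toy prover: a formula counts as proven only if it is stored in the knowledge base."""
    return formula in kb


def respond(mood, formula, kb):
    """Choose a reply following steps 3 and 4 of the dialogue algorithm."""
    if mood == "statement":
        if prove(formula, kb):
            return "Thanks, I know that"
        if prove(negate(formula), kb):
            return "I do not think so"
        kb.add(formula)  # new and not contradicted: assert it
        return "Thanks for the information"
    if mood == "yes_no_question":
        if prove(formula, kb):
            return "Yes"
        if prove(negate(formula), kb):
            return "No"
        return "I do not know"
    raise ValueError("unsupported mood: " + mood)


kb = set()
fact = ("mother", "ayse")
print(respond("statement", fact, kb))            # new information is asserted
print(respond("statement", fact, kb))            # the same fact is now known
print(respond("statement", ("not", fact), kb))   # contradicts the stored fact
print(respond("yes_no_question", ("tall", "ayse"), kb))  # neither provable nor refutable
```
]]>
The three-way outcome (proven, negation proven, neither) is what lets the agent answer &quot;I do not know&quot; instead of treating failure to prove as falsity.</Paragraph>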
      <Paragraph position="6"> The following conversation fragments, in which user entries are shown in boldface, illustrate several aspects of TOYagent. (The English translations are not part of TOYagent's input-output, and have been added &quot;manually.&quot;) As an example of &quot;online&quot; learning of lexical entries, we deleted the word &quot;ana&quot; (&quot;mother&quot;) from the lexicon, and carried out the following dialogue with the program:
Her ana güzeldir. (Every mother is beautiful)
&quot;ana&quot; kelimesini bilmiyorum. (I do not know the word &quot;ana&quot;)
Kelimenin kökü nedir? (What is the stem of this word?)
1: a 2: an 3: ana
Hangisi: 3. (Please enter: 3.)</Paragraph>
      <Paragraph position="7"> &quot;ana&quot; kelimesinin tipi nedir? (What is the type of the word &quot;ana&quot;?)
1: Cins isim (1: Common noun) 2: Sıfat (2: Adjective) 3: Özel isim (3: Proper noun)
Hangisi: 3. (Please enter: 3)</Paragraph>
      <Paragraph position="8"> Sözlüğe eklendi. (It has been added to the lexicon)
&quot;ana&quot; ne demek? Anlatır mısın? (Could you explain the word &quot;ana&quot;?)
Çocuğu olan bir bayan anadır. (A female who has a child is a mother)
Teşekkürler, öğrendim. (Thanks for the information)
Since the unknown word could have appeared in an inflected form in the input sentence, TOYagent first asks about the actual stem. A Prolog rule corresponding to the relevant universally quantified logic formula is prepared and asserted for each of the user's sentences read in this dialogue. The last line is the program's response to the original input sentence.
Ayşe bir anadır. (Ayşe is a mother)
Teşekkürler, öğrendim. (Thanks for the information)
Ayşe güzel midir? (Is Ayşe beautiful?)
Evet. (Yes)
Neden? (Why?)
Her ana güzeldir. (Every mother is beautiful) Ayşe anadır. (Ayşe is a mother)
The affirmative answer requires stepping through the implication translated to Prolog during the processing of the original user entry. The question &quot;why?&quot; is answered by translating the Prolog formulas used for the previous answer back to sentence form.</Paragraph>
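      <Paragraph> The way a &quot;why?&quot; answer can be read off the proof of the previous answer may be illustrated with a small Python sketch. It is not TOYagent's mechanism: the rule format (head predicate, body predicate), the function prove_with_trace, and the ASCII spelling &quot;ayse&quot; are assumptions, and only single-premise rules over one individual are handled.
<![CDATA[
```python
# Hypothetical miniature knowledge base mirroring the dialogue:
# "every mother is beautiful" and "Ayse is a mother".
RULES = [("beautiful", "mother")]   # (head, body): body(X) implies head(X)
FACTS = {("mother", "ayse")}


def prove_with_trace(goal, facts, rules):
    """Return the rules and facts used to prove goal, or None if it cannot be proven.

    Assumes the rule set is acyclic; cyclic rules would recurse forever.
    """
    if goal in facts:
        return [("fact", goal)]
    predicate, individual = goal
    for head, body in rules:
        if head == predicate:
            sub_trace = prove_with_trace((body, individual), facts, rules)
            if sub_trace is not None:
                return [("rule", head, body)] + sub_trace
    return None


trace = prove_with_trace(("beautiful", "ayse"), FACTS, RULES)
for step in trace:
    print(step)  # each step could be translated back into a sentence
```
]]>
Keeping the trace alongside the yes/no result is what makes the later explanation possible; a prover that returned only a truth value would have nothing to translate back into sentences.</Paragraph>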
      <Paragraph position="9"> Definite noun phrases are treated as anaphors:
Canan küçük bir çocuk. (Canan is a small child)
Teşekkürler, öğrendim. (Thanks for the information)
Çocuk kahvaltıda zeytin yedi. (The child ate olives for breakfast)
Teşekkürler, öğrendim. (Thanks for the information)
Kim zeytin yedi? (Who ate olives?)
Canan zeytin yedi. (Canan ate olives)
Kaç kişi kahvaltıda zeytin yedi? (How many people ate olives for breakfast?)
Bir kişi kahvaltıda zeytin yedi. (One person ate olives for breakfast)
The definite noun phrase in the second user entry (&quot;the child&quot;) is matched to the most recently mentioned child. As mentioned earlier, question words have small Prolog programs corresponding to them. The answer extracted from the knowledge base is presented in the form of a grammatically correct sentence. (The fact that every child is also a person is one of the commonsense items that have been preencoded in the knowledge base.)
A rudimentary capability of commonsense reasoning about time is implemented: the &quot;time&quot; argument in verb predicates has a substructure with slots for the beginning and ending points of the interval corresponding to the event. (In the present version, only a small subset of the verbal lexicon entries have their time subslots manually encoded for this purpose.) Hours are used as the unit interval.
Kemal küçük bir çocuk. Bütün küçük çocuklar 10 saat uyurlar. (Kemal is a small child. All small children sleep for 10 hours)
Teşekkürler, öğrendim. (Thanks for the information)
Kemal saat 23'te uyudu. (Kemal fell asleep at 23 hours)
Teşekkürler, öğrendim. (Thanks for the information)
Kemal ne zaman uyudu? (When did Kemal fall asleep?)
Kemal yirmiüçte uyudu. (Kemal fell asleep at twenty three)
Kemal ne zaman uyandı? (When did Kemal wake up?)
Kemal dokuzda uyandı. (Kemal woke up at nine)
Note that the program is able to do the &quot;modulo 24&quot; calculation required for producing the appropriate answer.</Paragraph>
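      <Paragraph> The &quot;modulo 24&quot; calculation for the sleeping example can be written out as a short Python sketch. The dictionary with begin and end keys is only an assumed stand-in for the time substructure of the verb predicates, and the function name sleep_interval is hypothetical.
<![CDATA[
```python
HOURS_PER_DAY = 24  # hours are the unit interval


def sleep_interval(begin_hour, duration_hours):
    """Build an interval with begin and end slots on a 24-hour clock."""
    return {
        "begin": begin_hour % HOURS_PER_DAY,
        "end": (begin_hour + duration_hours) % HOURS_PER_DAY,
    }


# Kemal fell asleep at 23 and, being a small child, sleeps for 10 hours.
kemal_sleep = sleep_interval(23, 10)
print(kemal_sleep["end"])  # prints 9, since (23 + 10) mod 24 = 9
```
]]>
</Paragraph>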
      <Paragraph position="10"> To find pronominal references in the absence of gender information, the semantic network is utilized. In the following excerpt, the pronoun &quot;o&quot; (&quot;he/she/it&quot;) is correctly deduced to correspond to &quot;çay&quot; (&quot;tea&quot;), since the network does not allow &quot;Kemal&quot;, a human name, to be the agent of the word &quot;bit-&quot; (&quot;to be consumed entirely&quot;), which can have only inanimate material in that role.</Paragraph>
      <Paragraph position="11"> Kemal kahvaltıda ne içti? (What did Kemal drink for breakfast?)
Bilmiyorum. (I do not know)
Kemal çay içti ise o bitmiştir. (If Kemal drank tea, (he/she/it) must have been consumed entirely)
Teşekkürler, öğrendim. (Thanks for the information)
Kemal çay içti. (Kemal drank tea)
Teşekkürler, öğrendim. (Thanks for the information)
Çay bitmiş midir? (Has the tea been consumed entirely?)
Evet. (Yes)
The latest release of TOYagent (Öğün 2003) is able to manage conversations with multiple agents, can adopt different &quot;attitudes&quot; about whether to believe what a user says depending on the user's profile, and has the capability of detecting and pointing out inconsistencies among the statements made by different users. This version also supports an optional &quot;inquisitive&quot; dialogue mode, where the computer questions the user about the values of currently empty slots in the verb predicates corresponding to previous user statements.</Paragraph>
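      <Paragraph> The pronoun resolution described earlier in this subsection, where the verb's role restriction rules out a human antecedent, can be approximated by the following Python sketch. The two lookup tables, their class labels, and the function resolve_pronoun are assumptions made for illustration; they are a much simplified stand-in for TOY's semantic network, and the Turkish words are spelled in ASCII.
<![CDATA[
```python
# Hypothetical semantic classes of candidate antecedents.
SEMANTIC_CLASS = {"kemal": "human", "cay": "inanimate"}

# Hypothetical role restriction: the agent of "bit-" must be inanimate material.
AGENT_RESTRICTION = {"bit-": "inanimate"}


def resolve_pronoun(verb, candidates):
    """Return the most recently mentioned candidate that the verb accepts as its agent."""
    required = AGENT_RESTRICTION[verb]
    for noun in reversed(candidates):  # most recent mention first
        if SEMANTIC_CLASS.get(noun) == required:
            return noun
    return None


# Candidates in order of mention in "Kemal cay icti ise o bitmistir."
print(resolve_pronoun("bit-", ["kemal", "cay"]))  # prints cay
```
]]>
</Paragraph>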
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Turkish Natural Language Interface for SQL Queries (NALAN-TS)
</SectionTitle>
      <Paragraph position="0"> NALAN-TS (Maden, Demir and Özcan 2003) is a Turkish natural language query interface for SQL databases, consisting of a syntactic parser, semantic analyzer, meaning extractor, SQL constructor and executor. It is a dictionary-based application and includes Turkish and database dictionaries.</Paragraph>
      <Paragraph position="1"> Figure 6. NALAN-TS Flow Diagram.</Paragraph>
      <Paragraph position="2"> The shaded modules in Figure 6 were taken completely from the TOY infrastructure, except for a few modifications like the addition of new Turkish syntax rules and a different format for the semantic representation of the words in the dictionary. TOY's knowledge base interface is taken as the basis by NALAN-TS.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Turkish Speaking Assistant - TUSA
</SectionTitle>
      <Paragraph position="0"> TUSA (Şeker 2003) is a natural language interface for an online personal calendar. The morphological analyzer/generator of TOY was taken as a basis in this project with modifications made for utilizing.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Generating Java Class Skeleton Using a Natural Language Interface - TUJA
</SectionTitle>
      <Paragraph position="0"> TUJA (Özcan, Şeker and Karadeniz 2004) is a natural language interface for generating Java source code and creating an object-oriented semantic network. This program uses TOY's morphological analyzer/generator as the starting point.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 Other Applications
</SectionTitle>
      <Paragraph position="0"> Ballhysa (2000) used TOY to produce a prototypical sentence-level translator between Albanian, English, and Turkish. (To our knowledge, this is the first NLP work ever done on Albanian.) Dutağacı (2002) used the morphological component to tag a Turkish corpus of nearly ten million words to collect statistics, and compared the performance of an N-gram model of speech recognition based on morphemes with those based on words or syllables. Tekeli (2002) made use of the word-level components to build an &quot;ELIZA-like&quot; (Covington 1994) dialogue program which caricatures Fatih Terim, a famous soccer coach and an idiosyncratic Turkish speaker.</Paragraph>
      <Paragraph position="1"> The program's &quot;bag of tricks&quot; includes coming up with rhyming responses to user sentences. Bilsel (2000) developed a &quot;poem expert&quot; for analyzing Turkish folk poems for their rhyme and meter properties, a demanding task which is part of the high-school curriculum in Turkey.</Paragraph>
      <Paragraph position="2"> Conclusion. Our work on TOY is continuing on many fronts: the DCG component is currently being extended to cover both a bigger subset of Turkish syntax and some types of ungrammatical sentences. We hope that TOY will be useful in the development of many other applications in the near future.</Paragraph>
    </Section>
  </Section>
</Paper>