<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0313"> <Title>Experiences about Compound Dictionary on Computer Networks</Title> <Section position="3" start_page="0" end_page="113" type="metho"> <SectionTitle> 2. IMPLEMENTATION OVERVIEW </SectionTitle> <Paragraph position="0"> The experimental on-line dictionary system implemented at NTT Research Laboratories is called Avenue. It is used daily by researchers, most of whom are Japanese. Avenue has one central server machine and many client machines. About 3000 researchers are able to access the system. More than 500 of them have used it, and it currently averages more than 100 requests per hour during the day. More than two hundred machines have been connected to the server. Ten to twenty machines usually connect to the server during the day. All of the information is stored on the central server.</Paragraph> <Paragraph position="1"> Avenue consists of a Japanese-Japanese dictionary, an English-Japanese dictionary, an information science dictionary, a computer jargon dictionary, an acronym dictionary, and an office telephone directory. We are currently working on adding more dictionaries, such as a Japanese-English dictionary, an English-English dictionary, and a thesaurus.</Paragraph> <Paragraph position="2"> All of the source information is converted into a uniform format. The server reads all of the dictionaries and then builds a combined index. Since this index is held in memory, only a few disk operations are required to handle a request, so the server responds quickly. There are three ways to access the server: through a remote shell command, through Emacs, and through HyperCard. UNIX users usually use the Emacs interface, and Macintosh users usually use the HyperCard interface. These interfaces are built as extensions of existing software.</Paragraph> <Paragraph position="3"> The remote shell provides a simple command line interface. 
Though it does not provide sophisticated functionality, it has one important characteristic: it does not require any installation. This feature is very important for gaining new users.</Paragraph> <Paragraph position="4"> The Emacs interface is more sophisticated. When the user presses one key, the word at the cursor is selected, a dedicated window appears, and the meanings are displayed. No mouse action or retyping is required. The interface produces outlined text if the word exists in multiple dictionaries, so users can quickly see which dictionaries contain the word. They can then explore the meanings in detail by using familiar Emacs commands. The HyperCard interface provides similar functions.</Paragraph> </Section> <Section position="4" start_page="113" end_page="115" type="metho"> <SectionTitle> 3. EXAMPLES </SectionTitle> <Paragraph position="0"> Avenue simultaneously presents information from various dictionaries. The example in Fig. 1 shows the information provided for &quot;abc.&quot; The underlined part is user input. The first list contains words that begin with ABC. &quot;ABCL/I&quot; is the name of a programming language and is an entry in the information science dictionary. &quot;ABC_Powers&quot; is an entry in the English-Japanese dictionary. The third candidate, &quot;ABC&quot; followed by Japanese characters, is an entry in the Japanese-Japanese dictionary. Japanese sometimes uses English letters for imported words.</Paragraph> <Paragraph position="1"> word: abc ABCL/I; ABC_Powers; ABC[Japanese text]; --- eiwa: ABC ABC, A.B.C. American Bowling Congress; American Broadcasting Company; Australian Broadcasting Commission.</Paragraph> <Paragraph position="2"> --- eiwa: ABC ABC [e'ibi':si':] ([Japanese text] ABC's, ABCs) n. 1. [Japanese text] 2. [Japanese text] --- kojien: ABC [Japanese text] [ABC] (Argentine) [Japanese text] (Brazil) [Japanese text] (Chile) [Japanese text] --- acron: ABC ABC - American Broadcasting Company word: Fig. 1. 
Example for &quot;abc&quot;.</Paragraph> <Paragraph position="3"> In this example, three dictionaries have an entry for the word &quot;ABC.&quot; They are the English-Japanese (eiwa), Japanese-Japanese (kojien), and acronym (acron) dictionaries. The entry is in the Japanese-Japanese dictionary because, as in English, Japanese people use it to denote &quot;the first step.&quot; They also sometimes use it to denote Argentina, Brazil, and Chile. The latter description does not appear in English dictionaries.</Paragraph> <Paragraph position="4"> Another example is shown in Fig. 2. This example shows an effect derived from a field-specific dictionary. When we enter the word &quot;amoeba,&quot; Avenue shows that it is also a computer jargon term. The jargon dictionary has 1532 entries. Though this number is relatively small, the combination with other dictionaries is useful, because it lets users learn that some words have specialized meanings. Since the English-Japanese dictionary and the special dictionary have the same interface, users can obtain various kinds of information in a uniform manner. word: amoeba amoeba; ameba; --- kojien: amoeba [Japanese text] [amoeba] [Japanese text] --- computer jargon: amoeba amoeba: /@-mee'b@/ n. Humorous term for the Commodore Amiga personal computer. --- eiwa: amoeba a*moe*ba [_mi':b_] ([Japanese text], -bae [-bi:], -bas) n. [Japanese text], [Japanese text]. word: アメーバ [Japanese text]; [Japanese text]; --- kojien: アメーバ [Japanese text] [amoeba.ameba] [Japanese text] --- waei: アメーバ</Paragraph> <Paragraph position="6"> The second word entry in Fig. 2 is &quot;アメーバ&quot;, which is how amoeba is written in Japanese. It appears in the English-Japanese dictionary (eiwa). It is a loanword and is read &quot;ame:ba.&quot; Japanese readers understand that this word comes from a foreign language, since it is written in the character set used for loanwords. However, the word conveys nothing about the small creature except its pronunciation. 
Since there are many loanwords in Japanese, we have to consult a Japanese-Japanese dictionary (kojien) to get detailed information. The Japanese-Japanese dictionary shows that the amoeba has only one cell and is less than 0.2 mm in size.</Paragraph> <Paragraph position="7"> In the example shown in Fig. 3, Avenue responds to the words &quot;Albert,&quot; &quot;Einstein,&quot; and &quot;Albert Einstein&quot; and presents information about Albert Einstein. The user gets information for both the given name and the family name. This is a result of combining dictionaries, since one dictionary has only the word &quot;Albert&quot; and another has only the word &quot;Einstein.&quot; Fig. 3. Example for Einstein, Albert and Albert Einstein.</Paragraph> <Paragraph position="8"> Avenue is more likely to find a word than a single dictionary is. Users find this to be very important; they seem to feel as if it contains complete information. They use Avenue even if they are not sure whether it contains the word.</Paragraph> </Section> <Section position="5" start_page="115" end_page="116" type="metho"> <SectionTitle> 4. USER INTERFACES </SectionTitle> <Paragraph position="0"> The user interface is a key element in gaining users. It is difficult to determine what kind of interface is good for users, so clear policies are necessary to design a user interface. We tried not to change the way users use the computer. We therefore use several existing systems for the user interface: the Rsh command, Emacs, and HyperCard.</Paragraph> <Paragraph position="1"> The Rsh command is an existing UNIX command, as we have already explained. Since it is a standard network command, no installation is required and documentation is always available.</Paragraph> <Paragraph position="2"> A user only has to know the server's name to start using our system.</Paragraph> <Paragraph position="3"> Furthermore, users can combine the Rsh command with other commands in the standard manner. 
Since information goes through standard input and standard output, users can easily write additional programs to format the output. If the service were to ask the user to log on to another computer, this additional programming would become more troublesome.</Paragraph> <Paragraph position="4"> After using the &quot;Rsh&quot; command for a while, most users find retyping the word cumbersome and become annoyed with excessive output. When the output does not fit onto one screen, the user has to suspend output in order to read all of it. The Emacs program and the HyperCard stack overcome these two problems, retyping and excessive output.</Paragraph> <Paragraph position="5"> The Emacs program picks up the word at the cursor. At any time, one keystroke will initiate dictionary access for that word. The HyperCard stack picks up a selected region. In both cases there is no need for typing.</Paragraph> <Paragraph position="6"> The Emacs program displays the information in outline mode. If the output is long, the detailed information is hidden by the interface program. Users can explore the hidden parts after they scan all of the headings. The HyperCard stack has a dictionary preference list and cursor movement buttons. Users can arrange the order of the dictionaries and even ignore some of them. They can also move backward and forward, dictionary by dictionary, using dedicated buttons on the stack.</Paragraph> </Section> <Section position="6" start_page="116" end_page="116" type="metho"> <SectionTitle> 5. RECORDING USER BEHAVIOR </SectionTitle> <Paragraph position="0"> It is important for an information system to record each user's behavior: who uses what.</Paragraph> <Paragraph position="1"> Dedicated interface programs usually solve this problem. However, they introduce another problem: installation. Our observations show that users tend to continue using printed dictionaries if software installation is required at the user's site. 
A special trick is needed to record user names and their requests when users will not install related software.</Paragraph> <Paragraph position="2"> UNIX has a standard command called &quot;Rsh&quot; or &quot;Remsh.&quot; It executes commands on a remote machine. This command sends the user's name when it requests a job to be run on another machine. Rsh's protocol is designed so that a regular user cannot disguise himself as another user, even if he builds his own network programs. The problem with &quot;Rsh&quot; is that it requires strict registration in order to ensure system security. If a new user had to register before using Avenue, he would refrain from using it.</Paragraph> <Paragraph position="3"> The problem of installation and registration was solved by creating a modified server. After our modification, it responds to everyone, but limits the commands that can be run. Since &quot;Rsh&quot; provides user identification, it is easy for the modified server to record who uses what. There is no installation or modification at the user site. Only the central server has a special program.</Paragraph> <Paragraph position="4"> From the user's point of view, this method lets new machines and new users access the dictionaries without registering. The only thing a user has to know is the name of the server machine. From the operator's point of view, a record of user behavior is obtained without installation or registration.</Paragraph> </Section> <Section position="7" start_page="116" end_page="117" type="metho"> <SectionTitle> 6. PROBLEMS FOUND FROM ACCESS RECORD </SectionTitle> <Paragraph position="0"> We analyzed Avenue's access record to find problems. We assume that a user has encountered a problem when he makes several requests within a short period, that is, within several minutes. 
We therefore picked out those places in the record where a user repeatedly accessed Avenue.</Paragraph> <Paragraph position="1"> We then entered the same words so that we could see what the user actually got. We thus identified five common problems.</Paragraph> <Paragraph position="2"> Problem (1): The user needs a variation of the given word.</Paragraph> <Paragraph position="3"> If the word is a headword, it will be in the candidate list from Avenue. If it is not a headword, the user must guess the spelling. Sometimes the user enters a Japanese word to get some hints.</Paragraph> <Paragraph position="4"> Problem (2): The user needs an idiom.</Paragraph> <Paragraph position="5"> Idioms usually appear among the definitions, not as headwords. It is difficult to find the correct headword for a given idiom. Furthermore, the dictionary may be inconsistent. For example, &quot;~to&quot; may be used in one place, while &quot;~ to,&quot; which has a space, is used in another place. Problem (3): The user needs an example.</Paragraph> <Paragraph position="6"> After finding an English word in the Japanese-English dictionary, a user frequently consults the English-English dictionary to get an example. When the entry does not contain an example sentence, the user sometimes starts entering relatively simple words, hoping to find some examples. When the word is relatively rare, this search for examples happens more often. Problem (4): The user cannot enter the character. Japanese characters are hard to read and harder to enter, since Japanese uses thousands of Chinese characters (Kanji) and many other characters. It often happens that a user can understand the meaning of a character, but cannot pronounce it. Unless the character can be pronounced, it is very hard to input the character into the computer. Users sometimes enter words related in meaning in order to obtain the character. 
Once it is obtained, the user enters the desired word using cut and paste.</Paragraph> <Paragraph position="7"> Problem (5): The user is not sure of the correct spelling of the word. When the spelling is uncertain, the user will often enter words that have similar spellings. If the correct one is not found, Japanese words are often entered.</Paragraph> <Paragraph position="8"> Problems (1), (2), and (3) indicate the need for additional dictionaries: a thesaurus, an idiom dictionary, and a corpus of English. It is significant that this finding comes from the actual record of usage.</Paragraph> <Paragraph position="9"> Problem (4) reflects a problem in handling Japanese. Currently, Japanese characters are converted from pronunciation to characters when they are input to a computer. If the character cannot be pronounced, it is very hard to enter the character. Though this is an obvious problem, we had failed to recognize it because we had unconsciously taken it for granted. Though this problem is not specific to Avenue, it is important to know that our users actually have this problem.</Paragraph> <Paragraph position="10"> Problem (5) means that users sometimes fail to specify what they want to know. This is a common problem in information retrieval. Though we do not have a good idea for overcoming it, we can recognize it based on actual usage.</Paragraph> <Paragraph position="11"> Several problems have been identified by focusing on repetitive access from one user. It is important for us to be aware of the problems so that we can improve the system. Some problems are due to the lack of certain dictionaries. We have thus identified a specific improvement that needs to be made.</Paragraph> </Section> <Section position="8" start_page="117" end_page="118" type="metho"> <SectionTitle> 7. COOPERATION WITH PUBLISHERS </SectionTitle> <Paragraph position="0"> Cooperation with publishers is essential in operating network dictionary systems. 
Since these systems compete with printed dictionaries, cooperation is a rather delicate issue. Fortunately, publishers are searching for new ways of publishing. For example, they are initiating CD-ROM publication. Network systems are another future form of publication.</Paragraph> <Paragraph position="1"> The recording mechanism is a key to making cooperation possible. With it, the users can be identified, along with their number of uses. It also gives publishers valuable information for revising their dictionaries. For example, the record shows which words may be candidates for addition. Publishers have agreed to provide their information to us in return for a fee and a complete record of user activity.</Paragraph> <Paragraph position="2"> 8. OTHER PROBLEMS AND FUTURE WORK One problem for future work is that, though many entries may appear for one word, each may have a different format. This sometimes makes the information hard to read. Since the information originally comes from printed dictionaries, there is some variance in format. It is rather difficult to reformat all of them. Although the Avenue interface has a mechanism to add a formatting program for each dictionary, it is troublesome to write such a program for all dictionaries.</Paragraph> <Paragraph position="3"> Furthermore, it is hard to write a program that will produce a clean and neat format. Another problem is that too much information may sometimes be given for one word. If it does not fit on one screen, it is difficult to find the needed information. Although the interface has outline control and cursor movement control, which reduce the trouble, this will become a more severe problem as the number of information sources increases.</Paragraph> <Paragraph position="4"> Dictionary preference is another technical issue. Users will prefer different sets of dictionaries, depending on their speciality. Computer engineers and linguists consult different dictionaries. 
Furthermore, a person's preference may change over time as their interests change; the same person may be a computer engineer at one time and a linguist at another. Avenue currently provides only a one-dimensional list of dictionaries. If there are many information sources, a one-dimensional list may be too limited for many users. A more flexible and powerful mechanism is needed to specify the relations among dictionaries.</Paragraph> </Section> </Paper>