<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1513"> <Title>Hdrug. A Flexible and Extendible Development Environment for Natural Language Processing.</Title> <Section position="5" start_page="94" end_page="95" type="metho"> <SectionTitle> * Type Signature </SectionTitle> <Paragraph position="0"> For example, the subcat_principle/3 relation is displayed as in Figure 3.</Paragraph> <Paragraph position="1"> In this example a left-corner (lc_mixtus) parser, a head-corner (hc9_mixtus) parser, an inactive chart parser (inact_p) and a bottom-up active chart parser (bu) were compared on a test set of 5000 word graphs. Timings are in milliseconds and the input size is the number of transitions in the word graph. Note that in this example the parsers only parse the best path through the word graph. The left-corner and head-corner parsers perform this task much faster than the other two: their average CPU times reach at most 500 milliseconds, whereas the chart-based parsers require up to 8000 milliseconds on average.</Paragraph> </Section> <Section position="6" start_page="95" end_page="96" type="metho"> <SectionTitle> 4 OVIS </SectionTitle> <Paragraph position="0"> The NWO Priority Programme Language and Speech Technology is a research programme aimed at the development of spoken language information systems. Its immediate goal is to develop a demonstrator of a public transport information system that operates over ordinary telephone lines.</Paragraph> <Paragraph position="1"> This demonstrator is called OVIS, Openbaar Vervoer Informatie Systeem (Public Transport Information System). The language of the system is Dutch. Refer to (Boves et al., 1995; van Noord et al., 1996) for further information on this Programme.</Paragraph> <Paragraph position="2"> The natural language understanding component of OVIS analyses the output of the speech recognizer (a word graph) and passes this analysis to the dialogue manager (as an update expression). 
Word graphs are weighted acyclic finite-state automata which compactly represent the hypotheses of a speech recognizer. Each path through the word graph is a possible analysis of the user utterance; weights indicate the confidence of the speech recognizer.</Paragraph> <Paragraph position="3"> The relation between such word graphs and update expressions is defined by means of a Definite Clause Grammar (DCG) of Dutch. This DCG and a number of parsers have been developed with the Hdrug system. The functionality of Hdrug has been used to compare the efficiency of the different parsers on sets of sentences and word graphs. For example, upon loading a specific set of such word graphs, the system can be asked to parse each of the word graphs with a specified subset of the available parsers, and to display information concerning parse times and memory usage for each of those parsers.</Paragraph> <Paragraph position="4"> For example, figure 5 is the result of a test run of 5000 word graphs for four different parsers. For slower parsers it is useful to implement a time-out to make sure that test sets can be processed within a reasonable amount of time. In such cases the mean CPU time is not meaningful, since the actual CPU time of an input that hits the time-out is unknown; therefore, it is also possible to obtain a graph displaying the percentage of inputs that can be completed within a given amount of CPU time. This is supported in Hdrug as well; an example is given in figure 6, which compares a head-corner parser hc, a left-corner parser lc, an inactive chart parser, an active chart parser, a bottom-up Earley parser bu-earley and an LR parser lr_cyk. Note that in this example the parsers parse all paths through the word graph. For this particular test set the head-corner parser performs best: as can be seen in the graph, it processes 96% of the input word graphs within 200 milliseconds. 
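The percentage graph can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not Hdrug's own implementation: the function name completion_percentages and the convention of recording a timed-out input as None are hypothetical. Timed-out inputs stay in the denominator but are never counted as completed, which keeps the measure well defined in exactly the situation where the mean CPU time is not.

```python
from typing import List, Optional, Sequence


def completion_percentages(
    cpu_times_ms: Sequence[Optional[float]],
    thresholds_ms: Sequence[float],
) -> List[float]:
    """Return, for each threshold, the percentage of inputs completed within it.

    cpu_times_ms holds one entry per input; None marks a (hypothetical)
    timed-out input whose real CPU time is unknown.
    """
    total = len(cpu_times_ms)
    if total == 0:
        raise ValueError("empty test set")
    # Timed-out inputs stay in the denominator but are never counted as done.
    finished = [t for t in cpu_times_ms if t is not None]
    percentages = []
    for limit in thresholds_ms:
        done = sum(1 for t in finished if limit >= t)
        percentages.append(100.0 * done / total)
    return percentages


# Hypothetical timings in milliseconds; the last input hit the time-out.
times = [120.0, 80.0, 450.0, 190.0, None]
print(completion_percentages(times, [100, 200, 500]))  # [20.0, 60.0, 80.0]
```

Sweeping the threshold over a range of values yields the cumulative curve. Averaging only over the finished entries instead would hide exactly the slow inputs that the time-out cut off.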
Similar support is provided for analysing a given test set of sentences with respect to input size and to the number of readings assigned.</Paragraph> <Paragraph position="5"> The functionality of Hdrug has been extended in various ways for the OVIS application. For example, a procedure has been implemented which generates random sentences, as a means to find errors in the grammar. The menu bar is extended with a new menu button which provides an interface to this new feature. Incorporating such new features in the user interface is very straightforward. Furthermore, similar to the VIEW menu of Ale, it is also possible to obtain visualisations of data structures such as lexical entries and grammar rules. This menu also provides an interface for the visualisation of word graphs by piping these word graphs to either the VCG (Sander, 1995) or the dotty (Koutsofios and North, 1994) graph drawing tool.</Paragraph> <Paragraph position="6"> Apart from adding new menu buttons, it is also easy to add items to existing pull-down menus. For example, in OVIS we are interested not only in the speed of the parser, but also in its accuracy. A component has been implemented which measures word accuracy, sentence accuracy and concept accuracy by comparing the results of analysis with a given annotation. This functionality is available through a number of new items on the TEST-SUITE menu.</Paragraph> <Paragraph position="7"> If a test suite has been loaded, then we can use this component to measure the word accuracy and sentence accuracy of a number of different analysis methods. Information is displayed in a window which is updated periodically (the interval can be set by the user). Figure 7 shows such an information window.</Paragraph> <Paragraph position="8"> be defined by means of a TCL script. 
The integration of such extensions with the Hdrug user interface is trivial.</Paragraph> </Section> <Section position="7" start_page="96" end_page="96" type="metho"> <SectionTitle> 5 Final remarks </SectionTitle> <Paragraph position="0"> The main characteristics of Hdrug are its extendibility and flexibility. We believe that if such systems are to be useful for computational linguists, then these two criteria are of extreme importance.</Paragraph> </Section> </Paper>