<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3085">
  <Title>An Architectural Overview of NAS</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ARCHITECTURAL OVERVIEW
</SectionTitle>
    <Paragraph position="0"> The architecture of NAS is modular, consisting of several main subsystems, viz., a set of preprocessors and template filters, a parser, a lexicon, a semantic interpreter, a set of concept bases, and an indexer (Figure 1). The system is transportable in that it can be interfaced to different news streams and indexing vocabularies. A preprocessor receives a news stream, which can come from a satellite dish link, a direct line, or a text file, and identifies the beginning and ending of stories in addition to their titles, sentences, and words. Since the format of each news feed, e.g., Reuters or Kyodo, is distinct from the others, a single preprocessor will accept only one news feed. For rigidly formatted articles that are numerical or non-textual in form, a template filter, which is an indexing component of low-level routines, categorizes them from the title while deriving specifics from the body of the story.</Paragraph>
    <Paragraph position="1">  indexed by the template filter. The system outputs the company name with a descriptor "3rd Quarter Earnings," as well as the current and cumulative earnings or losses.</Paragraph>
    <Paragraph position="2"> In contrast, textual stories require grammatical processing, and these are sent to the parser and semantic interpreter. The parser, which relies on the principles of Government-Binding (GB) Theory (Chomsky 1981), outputs the predicate-argument structure of each sentence of a story. In doing so, the parser identifies empty categories, viz., PROs, traces, and variables, and thematic relations, and resolves antecedent-anaphor and pronominal bindings. The parser is interfaced to a lexicon of over 17,000 items that was developed by analyzing strings (words) from a newswire. The size of the lexicon is sufficient for news processing.</Paragraph>
    <Paragraph position="3"> The semantic interpreter maps the grammatical structures onto conceptual representations, or filters, stored in a concept base. For instance, a representation for "Product Introduction" is given in (2). This structure denotes that a product introduction is one where a company introduces (or, synonymously, releases) a product. In short, it is a list of typed nodes. A report will be characterized as a product introduction story if it contains a sentence some of whose grammatical components (e.g., agent, predicate, theme) can be associated with the corresponding nodes of (2). Suppose, for example, a news item reports (3) Alpha Corp said it plans to release a new ... The parser binds the pronominal it, the agent of plans, to Alpha Corp, the subject or agent of the matrix clause. The parser also detects an empty category, viz., PRO, in the embedded sentence (proposition) with release as the verb and binds the pronominal to it.</Paragraph>
    <Paragraph position="4"> Since bound arguments share the same semantic features, the semantic interpreter determines that the agent of release in (4) is of type COMPANY. In other words, PRO inherits the property of COMPANY from the agent of the matrix sentence via the intermediate pronominal it. It also determines that the predicate release is synonymous with introduce and that the theme workstation is a product. With the arguments typed and the membership of the predicate within a synonym class known, the semantic interpreter can match the corresponding nodes of the most deeply embedded clause of (4) with (2), and thus determines that the sentence is about a product introduction. Associated with each conceptual filter is a set of indexing procedures that are invoked by the indexing mechanism when a conceptual filter is satisfied. These functions are integrated with databases containing the indexing vocabulary, and they identify specific information about a story, including company, personal, and product names, and descriptors indicating specific relationships. Figure 3 illustrates the corporate and personal name identification capabilities and the level of "understanding" as reflected by the subheadings.</Paragraph>
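    <Paragraph position="5"> The matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NAS implementation: the filter layout, the two lookup tables, and all function and variable names are hypothetical. Only the grammatical roles (agent, predicate, theme), the COMPANY type, the introduce/release synonymy, and the binding chain from PRO through it to Alpha Corp are taken from the text.

```python
# Hypothetical sketch of conceptual-filter matching; not the original NAS code.

# A conceptual filter as a list of typed nodes (assumed layout).
PRODUCT_INTRODUCTION = {
    "agent": "COMPANY",
    "predicate": "INTRODUCE",
    "theme": "PRODUCT",
}

# Assumed lookup tables standing in for the lexicon and concept bases.
SEMANTIC_TYPES = {"Alpha Corp": "COMPANY", "workstation": "PRODUCT"}
SYNONYM_CLASSES = {"introduce": "INTRODUCE", "release": "INTRODUCE"}


def resolve(argument, bindings):
    """Follow binding links (e.g. PRO -> it -> Alpha Corp) to the antecedent."""
    seen = set()
    while argument in bindings and argument not in seen:
        seen.add(argument)
        argument = bindings[argument]
    return argument


def matches(clause, bindings, concept_filter):
    """Return True if every typed node of the filter is satisfied by the clause."""
    agent = resolve(clause["agent"], bindings)
    theme = resolve(clause["theme"], bindings)
    return (
        SEMANTIC_TYPES.get(agent) == concept_filter["agent"]
        and SYNONYM_CLASSES.get(clause["predicate"]) == concept_filter["predicate"]
        and SEMANTIC_TYPES.get(theme) == concept_filter["theme"]
    )


# Most deeply embedded clause of the example sentence, with PRO as its agent.
embedded_clause = {"agent": "PRO", "predicate": "release", "theme": "workstation"}
bindings = {"PRO": "it", "it": "Alpha Corp"}

is_product_introduction = matches(embedded_clause, bindings, PRODUCT_INTRODUCTION)
print(is_product_introduction)
```

    Because the bound arguments are resolved before typing, PRO receives the COMPANY type of its antecedent, which is the inheritance the paragraph above describes. A real system would invoke the indexing procedures attached to the filter once such a match succeeds.</Paragraph>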
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
BENCHMARKS
</SectionTitle>
    <Paragraph position="0"> Formal benchmarks have been established based on news from Reuters. On a Symbolics 3640, NAS can process an entire day of news (500-600 stories/day) in 35-40 minutes and can assign indexes to approximately 75% of the stories. (The planned goal of at least 85% coverage is certainly achievable.) Accuracy was judged extremely high by a group of independent indexers and editors. Quality could not be judged quantitatively due to the complexity and subjectivity of the indexing terms and procedures.</Paragraph>
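    <Paragraph position="1"> As a rough reading of these figures, the Python sketch below derives the average processing time per story and the number of stories indexed per day. The function names and the pairing of range endpoints are assumptions made for illustration. Only the story counts, the run times, and the 75% coverage come from the text.

```python
# Back-of-the-envelope arithmetic on the reported benchmark figures.

def seconds_per_story(minutes, stories):
    """Average wall-clock seconds spent on one story."""
    return minutes * 60 / stories


def indexed_stories(stories, coverage):
    """Number of stories that are assigned indexes at the given coverage."""
    return round(stories * coverage)


# Fastest case: shortest run over the largest daily volume (assumed pairing).
best = seconds_per_story(35, 600)
# Slowest case: longest run over the smallest daily volume (assumed pairing).
worst = seconds_per_story(40, 500)

print(f"{best:.1f} to {worst:.1f} seconds per story")
print(indexed_stories(500, 0.75), "to", indexed_stories(600, 0.75), "stories indexed per day")
```

    Under these assumptions the reported figures correspond to a few seconds of processing per story on the Symbolics 3640.</Paragraph>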
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FUTURE DIRECTIONS
</SectionTitle>
    <Paragraph position="0"> In addition to continual extensions to the various components of the system, the design of an interface from NAS to deductive databases has begun. The development of this extension would enable databases to be generated automatically, with indexes stored as logical relations, thereby permitting retrieval or alerting capabilities based on explicit as well as implicit or inferred information.</Paragraph>
    <Paragraph position="1"> A NOTE ON THE IMPLEMENTATION: NAS was developed in ZetaLisp on Symbolics workstations. It has been converted to Common Lisp and runs on MacIvory and Macintosh computers.</Paragraph>
  </Section>
</Paper>