<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1057"> <Title>Text Analysis and Knowledge Extraction</Title> <Section position="1" start_page="0" end_page="242" type="metho"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Text understanding and knowledge extraction have been actively studied by many researchers. The authors have also studied a method of extracting structured information from texts without a global text analysis. That method is applicable to comparatively short texts, such as a patent claim clause or the abstract of a technical paper.</Paragraph> <Paragraph position="1"> This paper outlines a method of extracting knowledge from longer texts, which require a global text analysis. The texts considered are expository texts, or explanation texts.</Paragraph> <Paragraph position="2"> Expository texts, as the term is used here, are texts that have various hierarchical headings, such as a title, a heading for each section and sometimes an abstract.</Paragraph> <Paragraph position="3"> Under this definition, most texts, including technical papers, reports and newspapers, are expository. Texts of this kind disclose their main knowledge in a top-down manner and indicate not only where an attribute value is located in the text but also several key points of the content. This property contrasts with novels and stories, in which an unexpected development of the plot is preferred.</Paragraph> <Paragraph position="4"> This paper focuses on these characteristics of expository texts. It describes a method of analyzing a text by referring to the information contained in its intersentential relations and headings, and then efficiently extracting requested knowledge, such as a summary, from it.</Paragraph> <Paragraph position="5"> 2. 
Analysis of intersentential relations The global sentential analysis uses the information contained in the intersentential relations and the headings of a text, and combines bottom-up and top-down processing. Many linguists have proposed various kinds of intersentential relations. Referring to these proposals, we tentatively classify intersentential relations into about eight types: detail, additional, parallel, rephrase, example, temporal succession, causal and reasoning relations, as described in the following subsections.</Paragraph> <Paragraph position="6"> Detail relations If a term t2 is the topic term of a sentence S2 and is also a complementary term of the topic term t1 in the preceding sentence S1, as shown in Expr.(1), S2 is called the detail of S1. $S_1\colon (\mathrm{PRED}\colon p_1,\ K_{11}\colon \underline{\underline{t_1}},\ K_{12}\colon t_2,\ K_{r1}\colon t_{r1})$ $S_2\colon (\mathrm{PRED}\colon p_2,\ K_{21}\colon \underline{\underline{t_2}},\ K_{r2}\colon t_{r2})$ (1) $S_3\colon \ldots$ Here $K\colon t$ represents a pair of a case label and a term, and a term with a double underline denotes a topic.</Paragraph> <Paragraph position="7"> The sentence level of S1 relative to that of S2 depends on the properties of the sentence S3 following S2 and on the relations among the terms contained in sentences S1, S2 and S3. 
If S3 is connected to S1 more closely than to S2, for example if S3 has the topic term t1 of S1 as its topic, S1 is taken to be the principal sentence and the sentence level of S2 is lower than that of S1.</Paragraph> <Paragraph position="8"> On the other hand, if S1 is an introductory sentence for the term t2 and the matters related to t2 are described in the sentences following S1, or if t2 is the global topic of the section, S2 is considered the principal sentence.</Paragraph> <Paragraph position="9"> The global topic, whether it is an attribute name or an attribute value, can easily be identified without reading through the whole text, by inspecting the heading of the section, the title and the like.</Paragraph> <Paragraph position="10"> If the term t2 in S1 is a kind of pronoun, such as &quot;in the following ones&quot; or &quot;as follows&quot;, S2 is set at the same level as S1. At the summarization stage, the system tries to shorten the part consisting of S1 and S2 by replacing the pronoun t2 in S1 with the main content given in S2, namely the main part consisting of $t_{r2}$ and $p_2$.</Paragraph> <Paragraph position="11"> [Example 1] (a) S1: SGS receives an ordered triple from a user. S2: The triple's form is category, input-frames, conditions on the sentence.</Paragraph> <Paragraph position="12"> S3: SGS regards the ordered triple as a goal.</Paragraph> <Paragraph position="13"> S2 describes the content of the term &quot;ordered triple&quot; in S1, and S3 has the topic term &quot;SGS&quot; of S1. Hence S2 is the detail of S1, and S1 is the principal sentence. 
(b) S1: In this section, the overview of LFG is described.</Paragraph> <Paragraph position="14"> S2: LFG is an extension of context-free grammar and has the following two structures.</Paragraph> <Paragraph position="15"> S3: One is a c-structure, which represents the surface word and phrase configurations, and the other is an f-structure ......</Paragraph> <Paragraph position="16"> S1 is an introductory sentence for the term &quot;LFG&quot;, which is the global topic of a section taken from a text. S2 contains a kind of pronoun, &quot;the following two structures&quot;, whose contents are described in S3.</Paragraph> <Paragraph position="17"> Hence, S2 is the principal sentence and the sentence level of S3 is the same as that of S2. Rephrase relations and example relations are special cases of detail relations. These intersentential relations between sentences S1 and S2 can be identified by referring to their sentential constructions and to sentence-modifying adverbials such as &quot;in other words&quot; and &quot;for example&quot;. In an expository text, the principal sentence of such a pair is, in most cases, S1.</Paragraph> <Paragraph position="18"> Additional relations If the current sentence has the same sentential topic t1 as the preceding</Paragraph> <Paragraph position="19"> sentences and describes other attributes or functions of that topic, the current sentence is called an additional sentence to the preceding sentences. The sentential form of the relation is</Paragraph> <Paragraph position="21"> generally assumed to be the same, except when the global topic is placed in the predicate part of them. Additional relations can also be considered to hold among various sentential groups of the same level, 
such as chapters, sections or paragraphs, under a global topic contained in a title.</Paragraph> <Paragraph position="22"> Other sentential relations There are other intersentential relations, which are roughly classified into serial relations and concurrent (or extended parallel) relations.</Paragraph> <Paragraph position="23"> A serial relation, such as a temporal succession, causal or reasoning relation, keeps the same physical location of focus or the same logical object, while there is a time shift or a shift of a logical inference step between adjacent sentential groups.</Paragraph> <Paragraph position="24"> A concurrent relation keeps the same time instant of event occurrence or the same stage of logical inference, while there is a distance or a spatial positional shift between the physical or logical objects described in the adjacent sentential groups.</Paragraph> <Paragraph position="25"> In these relations, the level number of a sentence relative to the adjacent sentential groups is assigned in a similar way to that for the detail or additional relations, by referring to the intersentential relations and the global topics. Usually, the difference between the principal sentence level and the adjacent sentence level is kept within one level.</Paragraph> <Paragraph position="26"> As seen above, a sentence or sentential group has an intersentential relation to some adjacent sentences or sentential groups. The intersentential relation between adjacent sentences is similar to the relation between adjacent words or word groups combined through the rewriting rules of a sentence. The intersentential relations are classified into two classes. One is a relation, such as the detail relation, which holds between a principal sentence and an auxiliary or modifying sentential group at a lower level than the principal sentence, as shown in Fig.1(a). 
The other is a juxtaposition relation, such as the additional relation, which usually holds among several coherent sentences at the same level, as shown in Fig.1(b).</Paragraph> <Paragraph position="27"> [Fig.1: (a) nodes n1, n2; (b) nodes n1, n2, ..., nm] In these diagrams, a leaf node represents a sentence of the text and an intermediate node denotes a representative sentence of its direct descendants, or their principal parts. A name r attached to an arc bridging several branches denotes an intersentential relation.</Paragraph> <Paragraph position="28"> 3. Text analysis An expository text has a title and consists of several sections. The title shows the main topics of the text. The heading of each section shows the local topics of that section and constitutes attributes of the main topics.</Paragraph> <Paragraph position="29"> Each main section sometimes has an introductory remark followed by the main part. The content of the main part is almost entirely covered by the subframe predetermined by the heading and the title.</Paragraph> <Paragraph position="30"> The global cohesion of a section is ensured by a relation in which each main part of the section shares some items of the same subframe with the other main parts.</Paragraph> <Paragraph position="31"> Based on this view of text construction, text analysis is performed after each sentence has been parsed. First, each pronoun is replaced by its antecedent noun with the aid of anaphora analysis. Then, the intermediate expression of each sentence is transformed into a normal form in which each topic term is inherited together with its double-underline mark. The expressions to be normalized include object-apposition expressions, object-component expressions, predicate-cause expressions, expressions having a term that consists of a case label, and others. 
After normalization, the topic part and the content of each sentence are first identified. Second, the intersentential relations between adjacent sentences are identified nondeterministically, based on the two classes of intersentential relations described in Section 2. Third, the main sentence is identified by referring to the intersentential relations and the heading of the section, under the main topics of the title. A lower-level sentence is indented as a modifier of the main sentence. Knowledge of the specific field is sometimes required to better understand the relations among the main sentential groups and the various headings of the text; for this purpose, a case frame of a knowledge base for the specific field is provided, in which each slot is filled with the most general term in that field. Fourth, a subframe name is prefixed to each main sentential group by referring to the category of the main predicate term of the main sentence and to the subframe designated by the heading of the section and the title of the text. The basic subframe names for describing actions and physical objects are, for example, FUNCTION, COMPOSITION and PROPERTY.</Paragraph> <Paragraph position="32"> As seen above, the main work of text analysis is to identify the main sentential groups and to assign to them a standard attribute name of a subframe in a specified field. These frames and attribute names are used as keys for the specific field, so that the knowledge contained in texts can be stored and retrieved efficiently.</Paragraph> <Paragraph position="33"> The next example of text analysis is taken from a technical paper on language processing.</Paragraph> <Paragraph position="34"> [Example 2] Title: A natural language understanding system for data management Heading of Section: Generating English sentences Heading of Subsection: The selector (1) The selector's main job is to construct a graph relevant to the input statement. 
(2) In constructing this graph, the selector first copies the portion of the semantic net which is to be output. (3) It then uses inverse mapping functions to produce a more surface, but still case-grammar-based, representation of the information to be output. (4) Inverse mapping functions map the numeric representation for dates to a more surface one. (5) The selector next constructs modality lists and chooses a surface ordering rule (SOR) for each verb of the resulting structure. (6) SORs specify the order of the syntactic cases associated with a particular verb to be output.</Paragraph> <Paragraph position="35"> In the above text, the intersentential relations and the levels of the sentences are identified, and the label of a subframe is prefixed to each sentence, as shown in Fig.2(a) and (b).</Paragraph> <Paragraph position="36"> Fig.2(b) The composition of the text. A special symbol denotes a term that is prefixed to the subframe containing the corresponding mark and is modified by that subframe.</Paragraph> <Paragraph position="37"> 4. Generation of answering sentences for queries In this section, sentence generation, or text generation, for answering a request is described briefly. Text generation is the inverse process of text analysis, and is inseparable from text analysis in the sense that it provides a basic idea of how to construct a text for the given information to be represented. A given query is parsed and its intermediate expression is constructed. Then the required information is retrieved and transformed into a surface expression in the following steps: (1) The intermediate expressions related to the main topics of the query are extracted, in the order of their level relative to the query, from the analyzed text or from the database storing it, guided by the frame labels and other heading information as well as by the index of the terms contained in the text. 
The level of a description in the text is used to select the knowledge source to be extracted.</Paragraph> <Paragraph position="38"> (2) The intermediate expressions are rearranged into a coherent and readable order, for example the order in which the events occur, and an answer sequence is constructed.</Paragraph> <Paragraph position="39"> (3) Under a given length bound, the answer sequence is grouped or segmented into several parts, and sentential topics are selected to be expanded into surface expressions.</Paragraph> <Paragraph position="40"> (4) The sentential form of each segment is chosen from among phrase, simple, complex and compound surface expressions by referring to the sentential topic.</Paragraph> <Paragraph position="41"> The summary of the text given in Example 2 is generated from the analysis results shown in Fig.2(b) by applying steps 2, 3 and 4. Fig.3 shows two summaries constructed from the descriptions of the text up to level 1 and up to level 3, where the parts enclosed in brackets are generated from the descriptions at level 3.</Paragraph> <Paragraph position="42"> Level 1: The selector constructs a graph relevant to the input statement.</Paragraph> <Paragraph position="43"> Level 3: The selector constructs a graph relevant to the input statement. In the construction, the selector performs the following processes. First, the selector copies the portion of the semantic net. Then, it produces a more surface but case-grammar-based representation with inverse mapping functions [which map a numeric representation to a more surface one]. Finally, it constructs modality lists and chooses a surface ordering rule [which specifies the order of syntactic cases] for each verb. Fig.3 Generated summaries</Paragraph> </Section> <Section position="2" start_page="242" end_page="242" type="metho"> <SectionTitle> 5. 
Conclusion </SectionTitle> <Paragraph position="0"> An experimental system is under construction, based on the structured-information extraction system we constructed previously. This paper focuses on the content suggested by the headings and the intersentential structures, and assigns a sentence level to each sentence. The problems of ellipsis and of restoring known structures on the basis of syntax and special field knowledge are not considered here.</Paragraph> <Paragraph position="1"> However, there seem to be no serious problems in many specific fields when the system operates in an interactive mode with users.</Paragraph> </Section></Paper>