File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1038_metho.xml
Size: 17,922 bytes
Last Modified: 2025-10-06 14:11:23
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1038"> <Title>ON A SEMANTIC MODEL FOR MULTI-LINGUAL PARAPHRASING</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> DG (Declarative Generation) Model and also reports preliminary linguistic generation </SectionTitle> <Paragraph position="0"> experiments done on the system JASS (Japanese Synthesis System), developed under the DG framework.</Paragraph> <Paragraph position="1"> While there have already been a few generation systems, such as BABEL (N. M. Goldman) and MUMBLE (D. D. McDonald), which aim to resolve lexical selection using special procedures or discrimination networks, the DG model can paraphrase semantic information written in the CFL frame language into target sentences using its built-in semantic inference capability. Conceptually, the model is divided into two logical phases: MU, the syntax generation phase, which includes lexical selection and syntax selection, and TLG, the surface structure generation phase, which includes transformation and morphological generation. For lexical selection, the model uses a semantic dictionary and a lexical dictionary, both written in CFL, so the functional requirements are limited to the semantic inference capabilities found in CFL. Importantly, these capabilities have already been used in the contextual understanding of languages.</Paragraph> <Paragraph position="2"> The main characteristics of the DG model are as follows.</Paragraph> <Paragraph position="3"> 1) It presents a new way of performing semantic-directed lexical selection and syntax selection using frame inference capability. 2) Because all knowledge required for paraphrasing is modular, the model greatly reduces knowledge base management costs. 3) It is generally independent of target languages, since the contents written in CFL and the built-in inference capability are thoroughly independent of languages. 
Accordingly, the DG logic functions are easily adaptable to the paraphrasing function of any natural language understanding system or semantic-directed mechanical translation system.</Paragraph> </Section> <Section position="4" start_page="0" end_page="239" type="metho"> <SectionTitle> 2. DG generation model basis </SectionTitle> <Paragraph position="0"> The main purpose behind proposing the DG paraphrasing model lies in formally bridging the qualitative gap between interlingua and surface structures. The inputs are a sentence style indicator, a generation control, and a set of instance semantic depictions. In the first phase, MU, the input semantic depictions are transformed into a syntax structure using a semantic dictionary, a lexical dictionary and syntax generation rules, under the control of the sentence style indicator and the generation control. In the second phase, TLG, the syntax structure is transformed into a surface string structure by a series of surface structure generation rules.</Paragraph> <Paragraph position="1"> CFL is a frame-based representation schema whose representation units, called depictions, correspond to both a dictionary entry and a semantic description. It has embedded semantic-directed inference capabilities, which function to transform input semantic descriptions into a structure with morphological and syntactic information.</Paragraph> <Paragraph position="2"> Depictions describe sentential semantics based on Fillmore's case theory. Figure 1 shows the simplest examples of depictions. In these examples, depictions are classified into two categories: schema in Figure 1a and instance in Figure 1b. A schema is distinguished from an instance by the fact that all instance depictees (depiction names), except for distinct names, are postfixed by a distinct number. 
From a pragmatic viewpoint, schemata compose a semantic dictionary in long-term memory, while instances describe concrete events and descriptions in short-term memory.</Paragraph> <Paragraph position="4"> Fig. 1 Examples of semantic depictions. Fig. 1a describes the abstract spatial transportation (*TRANS), where the attribute descriptions correspond to case frame descriptions. Filling ACTOR and INSTR with *PETER and *CAR.001, respectively, instantiates the schema of Fig. 1a and produces the instance of Fig. 1b, &quot;Peter drives&quot;.</Paragraph> <Paragraph position="5"> In CFL, two kinds of inference functions form the basis for logico-semantic lexical selection.</Paragraph> <Paragraph position="6"> Type 1. An implication test function acting on a combination of either an attribute value and an attribute condition, or one attribute condition and another attribute condition. Type 2. An association test function between depictions.</Paragraph> <Paragraph position="7"> Here, an attribute condition is written as a Boolean formula over semantic depictions. Type 2 functions, which are realized by integrating Type 1 functions, can play a role in determining whether or not one semantic depiction is semantically identical to another. In natural language understanding, this facility is frequently used to determine referents. Functional Description: FD. The FD schema, which is an n-ary tree structure, is used to describe syntax structures, syntax generation rules and surface structure generation rules. Figure 7 shows a list-form representation of an FD structure, where the root node is DISCOURSE. The intermediate nodes in this framework are labeled with grammatical markers or case markers. The main merit of this kind of tree structure is that any leaf value or substructure can be identified by its distinct path from the root node. 
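The path-addressing property just described can be illustrated with a minimal Python sketch. It is not the paper's implementation: the nested-dictionary encoding, the sample tree and the helper names fd_get and fd_paths are all assumptions made for illustration, and only the labels DISCOURSE, EVENT, SUB, ACTOR, INSTR and the depictee names are borrowed from the text.

```python
# Hypothetical FD tree: inner nodes are grammatical or case markers,
# leaves are string values, depictees or numerals (sample data only).
FD = {
    "DISCOURSE": {
        "EVENT": {
            "VERB": {"LEX": "drive"},
            "SUB": {"ACTOR": "*PETER"},
            "MOD": {"INSTR": "*CAR.001"},
        }
    }
}


def fd_get(tree, path):
    """Return the leaf or substructure reached by following path from the root."""
    node = tree
    for label in path:
        node = node[label]  # raises KeyError if the path does not exist
    return node


def fd_paths(tree, prefix=()):
    """Enumerate (path, leaf) pairs; each leaf is reached by exactly one path."""
    for label, child in tree.items():
        if isinstance(child, dict):
            yield from fd_paths(child, prefix + (label,))
        else:
            yield prefix + (label,), child
```

Under these assumptions, fd_get(FD, ("DISCOURSE", "EVENT", "SUB", "ACTOR")) retrieves the depictee stored at that path, and fd_paths(FD) lists every leaf together with its path from the root.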
Leaves hold three kinds of values: string values, depictees and numerals.</Paragraph> <Paragraph position="8"> The following two sections explicate the inherent mechanism, following the course of the linguistic paraphrasing process.</Paragraph> <Paragraph position="9"> 3. Syntax Generation - its knowledge and processing. The MU mechanism is formalized as the iterative invocation of two primitive operations, Match and Unify. The Match function adds morphological and local syntactic information to semantic depictions using a lexical dictionary and a semantic dictionary. After Match, the Unify function modifies and extends the given sentence style indicator (called the intermediate syntax structure) by applying syntax generation rules to the structure obtained by Match. In the initial stage, Match is applied to the semantic depiction specified by a depictee in a leaf of the FD sentence style indicator (see Figure 4).</Paragraph> <Paragraph position="10"> Match operation. Lexical depictions, which are the lexical dictionary entries themselves, have a $-prefixed depictee and play a primary role in mapping a semantic depiction onto a morphological and syntactic structure. The lexical depiction format is basically the same as that of a semantic depiction, with some extensions. As shown in Figure 2, the attribute names in a lexical depiction have such forms as SUB (=ACTOR), TOLOC (=TOPLACE), VERB=LEX, etc., which are divided into two categories: 1. X (=Y), 2. U=Z. The following must hold for a successful Match of lexical and semantic depictions.</Paragraph> <Paragraph position="11"> i) Y must exist as an attribute name in a semantic depiction.</Paragraph> <Paragraph position="12"> ii) X=Y (transformed in Match) and U=Z must be partial paths in a final syntax structure. Now, assume that a semantic depiction *A (for example, *TRANS.001 in Fig. 1) is given. 
The process first tries to find a lexical depiction $B (for example, $DRIVE in Fig. 2), one of whose ancestors has a depiction name (depictee) identical, except for the prefix, to that of one of the ancestors of the given semantic depiction.</Paragraph> <Paragraph position="14"> Fig. 2 A lexical depiction. Fig. 3 A Match result. If no such lexical depictions are found, MU terminates. If such depictions are found, the following steps are taken.</Paragraph> <Paragraph position="15"> First, for each attribute name X_i (=Y_i), the attribute value or condition for attribute name Y_i in the semantic depiction is tested to determine whether it implies the attribute condition of X_i (=Y_i). If Y_i does not exist in the semantic depiction, or if the test fails, Match tries the next lexical depiction.</Paragraph> <Paragraph position="16"> Each Y_i value or condition is set as the X_i (=Y_i) value or condition if and only if all X_i (=Y_i) attributes satisfy the above test and the tests for all Y_i case markers in the semantic depiction are completed. Otherwise, Match continues to search for a suitable lexical depiction.</Paragraph> <Paragraph position="17"> Second, for each attribute Y_i in the semantic depiction that was not tested above, Match adds a new MOD=Y_i attribute with the value or condition of Y_i.</Paragraph> <Paragraph position="18"> Consequently, the result appears directly in the selected lexical depiction. The lexical depiction $DRIVE in Fig. 3 is an example of a Match result, which comes from the semantic depiction *TRANS.001 in Fig. 1b and the lexical depiction $DRIVE in Fig. 2. 
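The Match steps above can be sketched in Python as follows. This is a hedged illustration under stated assumptions rather than the CFL implementation: the dictionary layout of depictions, the field names ancestors, mapped and fixed, the toy ISA table and the implies function (a stand-in for the Type 1 implication test) are all hypothetical, and the contents of $DRIVE are invented because Fig. 2 is not reproduced here.

```python
def strip_prefix(depictee):
    """Drop the '*' (semantic) or '$' (lexical) prefix of a depictee."""
    return depictee.lstrip("*$")


def match(semantic, lexicon, implies):
    """Return a Match result for one semantic depiction, or None if none fits."""
    sem_ancestors = {strip_prefix(a) for a in semantic["ancestors"]}
    attrs = semantic["attrs"]
    for lex in lexicon:
        lex_ancestors = {strip_prefix(a) for a in lex["ancestors"]}
        if sem_ancestors.isdisjoint(lex_ancestors):
            continue  # no shared ancestor name: not a candidate
        mapped = lex["mapped"]  # {X: (Y, condition)} for X (=Y) attributes
        # Every Y must exist and its value must imply the condition on X.
        ok = all(
            y in attrs and implies(attrs[y], cond)
            for (y, cond) in mapped.values()
        )
        if not ok:
            continue  # try the next lexical depiction
        result = dict(lex["fixed"])  # U=Z attributes such as VERB=LEX
        for x, (y, _cond) in mapped.items():
            result[x + "=" + y] = attrs[y]
        tested = {y for (y, _cond) in mapped.values()}
        for y, value in attrs.items():
            if y not in tested:
                result["MOD=" + y] = value  # untested attributes become MOD=Y
        return {"depictee": lex["depictee"], "attrs": result}
    return None  # no lexical depiction found: MU terminates


# Hypothetical sample data, loosely modelled on Fig. 1b and Fig. 2.
TRANS_001 = {
    "depictee": "*TRANS.001",
    "ancestors": ["*TRANS"],
    "attrs": {"ACTOR": "*PETER", "INSTR": "*CAR.001"},
}
DRIVE = {
    "depictee": "$DRIVE",
    "ancestors": ["$TRANS"],
    "mapped": {"SUB": ("ACTOR", "HUMAN")},
    "fixed": {"VERB=LEX": "drive"},
}
ISA = {"*PETER": "HUMAN", "*CAR.001": "VEHICLE"}


def implies(value, condition):
    """Toy implication test: the class of the value must equal the condition."""
    return ISA.get(value) == condition


result = match(TRANS_001, [DRIVE], implies)
```

With this sample data, result carries the attributes VERB=LEX, SUB=ACTOR and MOD=INSTR. A real implication test would draw on the semantic dictionary's inference capability instead of a flat lookup table.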
Thus, the Match result has morphological information about DRIVE and local information about the surface and semantic case structure induced by $DRIVE.</Paragraph> <Section position="1" start_page="239" end_page="239" type="sub_section"> <SectionTitle> Unify Operation </SectionTitle> <Paragraph position="0"> In general, a depictee under a path in a sentence style indicator or intermediate syntax structure must be transformed into one conforming to legitimate syntax structures. [242 K. MURAKI] A syntax structure generation rule determines such legitimate structures according to the condition along the path. There may be several such permissible structures, so Unify must select the one appropriate to the lexical depiction obtained by Match.</Paragraph> <Paragraph position="1"> Figure 4 shows a simple sentence style indicator, which specifies that the instance depictee *TRANS.001 must be transformed into the syntax structure appropriate to the path. The FD syntax generation rule is shown in Figure 5a. The rule specifies that an instance semantic depictee just below the partial path &lt;EVENT&gt; can have the syntax structure specified by the right-hand side of the rule. Figure 5b is a slightly extended form, although it has basically the same function as the former. It includes the variables !GR, !SR and !CASE, each of which has a distinct domain. For example, !GR (Grammatical Roles) can bind an element of the set {SUB, DOB, IOB, etc.}, and !SR (Semantic Roles) can bind an element of {ACTOR, OBJECT, INSTR, etc.}. !CASE has a domain of Japanese postpositions {GA (surface CASE for SUB), WO (surface CASE for OBJECT), etc.}.</Paragraph> <Paragraph position="2"> The FD syntax generation rule means that an instance depictee specified by a partial path that is an instance of &lt;!GR=!SR&gt; can be transformed into the structure indicated by the right-hand side of the rule, as long as the depictee is prefixed by (-!CASE). 
A variable such as !CASE on the right-hand side is replaced by its value if the rule is successfully applied. Now, assume a lexical depictee A obtained by Match under a path &lt;a_1=a_2=...=a_n&gt;.</Paragraph> <Paragraph position="3"> Generally, generation rules {R_i} exist with path specifications &lt;a_j=...=a_n&gt;, 1 ≤ j ≤ n. Unify fails if no R_i is found. Each candidate generation rule R_i is unified with depiction A in turn, starting from the rule with the longest path specification, until a sound generation rule is found. Successful Unify is defined as follows. Let the attribute name set for depiction A be B = {&lt;b_1=b_2=...=b_j&gt;}, and let the set of all partial paths in rule R_i be C = {&lt;c_1=c_2=...=c_x&gt;}. i) For each &lt;b_1=...=b_j&gt; ∈ B, there exists a &lt;c_1=c_2=...=b_1=...=b_k&gt; ∈ C, 1 ≤ k ≤ j. Each attribute value of &lt;b_1=...=b_j&gt; is set as the value of the extended path &lt;c_1=c_2=...=b_1=...=b_j&gt; (equivalently, &lt;c_1=c_2=...=c_x=b_{k+1}=...=b_j&gt;).</Paragraph> <Paragraph position="4"> ii) All attribute values in depiction A must be assigned to the appropriate paths.</Paragraph> <Paragraph position="5"> If a rule R_i proves unsatisfiable, a new one is tried. If no other candidate is found, the generation process fails and terminates.</Paragraph> <Paragraph position="6"> Thus, given a sentence style indicator (or an intermediate syntax structure), Match is applied to a semantic depictee and Unify to the lexical depiction resulting from Match. After this one primitive cycle, a new intermediate syntax structure is produced, which has more detailed morphological and syntactic information than the previous one. Application of these two primitive operations continues until no instance semantic depictees remain in the intermediate syntax structure.</Paragraph> <Paragraph position="7"> 4. 
Surface Structure Generation - its knowledge and processing. The TLG model for surface structure generation is defined on a set of pattern-directed production rules written in extended FD structures, each of which specifies a source structure on the left-hand side and a target structure on the right.</Paragraph> <Paragraph position="8"> The transformations required for surface structure generation can be roughly classified into two sub-classes. One is for global transformations, such as voice transformation, nominalization, adjectivation, etc. The other is mainly for morphological generation concerning tense, inflexion, gender, etc. In general, these two sub-classes have an inherent application ordering. This holds not only between the two sub-classes but also among the members of the former. To support flexible control of rule application, the TLG model is defined on an adaptive production system, in which rules are grouped into categories and the rule application order is determined by the tags of rules and categories.</Paragraph> <Paragraph position="9"> Figure 6 exemplifies a voice-transformation rule. Every rule has a rule number, a matching pattern including variables, a Boolean formula, a pattern program and a tag.</Paragraph> <Paragraph position="10"> Variables prefixed with $ or # in an FD matching pattern can bind a substructure or a path, respectively. The Boolean formula is a LISP S-expression over the variables in the matching pattern; a value of T signifies that the rule application conditions have been satisfied. A pattern program is basically an FD structure with embedded functions such as (FUNC ENT ($Z)). In this example, FUNC indicates that ENT is a one-place function with $Z as its argument. The last part of every rule is a tag, which is a pointer to subsequent target categories.</Paragraph> <Paragraph position="11"> Additionally, there are also tags in each category. 
A rule tag specifies that control jumps to the category named by the tag if the rule is applied. Control goes to the next rule in the same category if the tag is nil or the rule fails. A category tag, on the other hand, specifies that control jumps to the named category if none of the rules in the category can be applied. If a category tag is nil and none of the rules in the category can be applied, control returns to the caller.</Paragraph> </Section> </Section> <Section position="5" start_page="239" end_page="239" type="metho"> <SectionTitle> 5. Sentence Style Selection </SectionTitle> <Paragraph position="0"> Sentence style selection is a most difficult problem in linguistic generation. In the DG model, the input sentence style indicator, the generation control and the surface structure generation rules contribute directly to sentence style selection.</Paragraph> <Paragraph position="1"> The sentence style indicator roughly guides the style into which input depictions are paraphrased, by placing the instance semantic depictees in the FD structure values. Values other than instance semantic depictees also determine how these depictees are paraphrased by the application of surface structure generation rules, because the values can influence the invocation of these generation rules. Consequently, sentence style selection must be accomplished while satisfying the contextual requirements of paraphrasing.</Paragraph> <Paragraph position="2"> Generation controls hold several kinds of information, which can be consulted by the Boolean formulas and embedded functions in surface structure generation rules. Accordingly, the generation control or, more precisely, the surface structure generation rules can play a great role in determining which sentence style will be selected. The Japanese newspapers is composed of about 160 semantic depictions and 130 lexical depictions. Rules are composed of #0 syntax generation rules and 50 surface structure generation rules. 
The latter are classified into 6 categories: GLOBal, CONJunctive, CORE, PHRASE, LOCAL and MORPHological.</Paragraph> <Paragraph position="3"> In addition, seven kinds of embedded functions for surface structure generation rules are utilized, mainly for morphological generation.</Paragraph> <Paragraph position="4"> Figure 7 gives a portion of a simple example of an FD syntax structure obtained by the MU process of the JASS generation system. It is transformed from the event in which 34 SAI</Paragraph> </Section> <Section position="6" start_page="239" end_page="239" type="metho"> <SectionTitle> NO TAKUSHI UNTENSHU GA KUROI KURAUN WO NUSUMU is described, </SectionTitle> <Paragraph position="0"> using an input description set. It means that a 34-year-old taxi driver X steals a large black luxury car.</Paragraph> <Paragraph position="1"> Fig. 7 A portion of a syntax structure for an example sentence</Paragraph> </Section> </Paper>