MWEs as Non-propositional Content Indicators 
Kosho Shudo, Toshifumi Tanabe, Masahito Takahashi

 and Kenji Yoshimura 
     Fukuoka University                      

Kurume Institute of Technology                       
8-19-1, Nanakuma, Fukuoka,                   66-2228, Kamitsu, Kurume,  
814-0180 JAPAN                                    830-0052 JAPAN 
{shudo,tanabe,yosimura}@tl.fukuoka-u.ac.jp  taka@cc.kurume-it.ac.jp 
 
Abstract 
We report that a proper employment of MWEs 
concerned enables us to put forth a tractable 
framework, which is based on a multiple 
nesting of semantic operations, for the 
processing of non-inferential, Non-
propositional Contents (NPCs) of natural 
Japanese sentences. Our framework is 
characterized by its broad syntactic and 
semantic coverage, enabling us to deal with 
multiply composite modalities and their 
semantic/pragmatic similarity. Also, the 
relationship between indirect (Searle, 1975) 
and direct speech, and equations peculiar to 
modal logic and its family (Mally, 1926; Prior, 
1967) are treated in the similarity paradigm. 
1 Introduction 
While proper treatment of the Propositional 
Content (PC) of a sentence is undoubtedly 
important in natural language processing (NLP), 
the Non-propositional Content (NPC) also plays a 
critical role in tasks such as discourse 
understanding, dialogue modeling, detecting 
speaker’s intension. We refer generically to the 
information which is provided by auxiliaries, 
adverbs, sentence-final particles or specific 
predicative forms in Japanese sentences as NPC. It 
is concerned with notions such as polarity, tense, 
aspect, voice, modality, and illocutionary act, 
which incorporate temporal, contingent, subjective, 
epistemic or attitudinal information into the PC.   
Though the inferential NPC e.g., implicature 
(Grice, 1975), has been discussed in semantics or 
pragmatics, it lies beyond the state-of-the-art 
technology of NLP. Besides, no systematic attempt 
to connect linguistic forms in the sentence with the 
non-inferential NPCs has been reported in NLP 
community. In this paper, we present a framework 
for the treatment of NPC of a sentence on the basis 
of the extensive, proper employment of multiword 
expressions (MWEs) indicating the NPCs in 
Japanese. In Japanese, which is a so-called SOV 
language, NPCs are typically indicated in the V-
final position by auxiliaries, particles and their 
various alternative multiword expressions. We 
have extracted extensively these expressions from 
large-scale Japanese linguistic data. We refer to 
these, including auxiliaries and ending-particles, as 
NPC indicators (NPCIs). The number of NPCIs 
amounts to 1,500, whereas that of auxiliaries and 
ending-particles is about 50, which is apparently 
insufficient for practical NLP tasks.  
Our model leads to dealing not only with some of 
illocutionary acts (Austin, 1962) but also with the 
logical operations peculiar to the family of modal 
logic, i.e., deontic (Mally, 1926) and temporal 
logic (Prior, 1967). 
We also present, in this paper, the idea of the 
similarity among NPCs within our framework. 
This is essential for text retrieval, paraphrasing, 
document summarization, example-based MT, etc. 
Some of the indirect speech acts (Searle, 1975) and 
axioms proper to the family of modal logic are 
treated formally in the similarity paradigm. 
In Section 2, we introduce an overview of our 
ongoing MWE resource development for general 
Japanese language processing. In Section 3, we 
introduce a framework for the treatment of NPC. A 
set of primitive functions to compose NPC is 
explained in Section 4. In Section 5, first, the 
relationship between the framework and Japanese 
syntax, and second, methods to identify NPCs of 
Japanese sentences and to apply them to a 
translation task are described. In Section 6, we 
formalize the similarity among NPCs within the 
framework. In Section 7, we present conclusions 
and comment on future work. 
2 Background MWE Resources 
The authors have been concerned with how to 
select atomic expressions of the sentence 
construction in NLP based on the semantic 
compositionality. Morphosyntactically, this 
problem is also serious for the processing of the 
agglutinative, space-free language like Japanese. 
Our research on this subject started in ‘70s by 
extracting manually multiword expressions as 
MWEs from large-scale Japanese linguistic data in 
the general domain. We estimate that the amount of 
data examined is 200,000 sentences. 
Second ACL Workshop on Multiword Expressions: Integrating Processing, July 2004, pp. 32-39
In this Section, we present an overview of our 
ongoing development of Japanese MWE resources. 
We have extracted multiword expressions that take 
at least one of the following three features;  
 
f
1
: idiomaticity (semantic non-decomposability),  
f
2
: lexical rigidity (non-separability),  
f
3
: statistical boundness.  
 
The expression which causes the difficulty in 
composing its overall meaning from normal 
meanings of component words has f
1
.
1
  f
2
 includes 
the feature to allow other words to cut in between 
the component words. The expression whose 
components are bound each other with high 
conditional probability has f
3
. Each multiword 
expression selected as a MWE was endowed with a 
binary-valued basic triplet (f
1
, f
2
, f
3
). For example, 
an idiomatic, separable and not-statistically-bound 
expression, “ ·� ·
��  hone·wo·oru” ‘make an 
effort (lit. break bone)’ is endowed with (1,0,0) and 
compositional, separable and statistically-bound 
expression, “Ylb� ·��  gussuri·nemuru” 
‘sleep soundly’, with (0,0,1). A dot ‘·’ denotes a 
conventional word-boundary, hereafter. 
Fixed expressions, decomposable idioms, 
institutionalized phrases, syntactically-idiomatic 
phrases, light verb constructions discussed in (Sag 
et al., 2002) and proverbs might correspond 
roughly to the triplets, (1,1,0), (1,0,0), (0,0,1), 
(0,x,1), (0,x,1) and (1,1,1), respectively. 
MWEs, whose number amounts to 64,800 at 
present, are classified by their overall, grammatical 
functions as follows. Examples with a triplet and 
the current number of expressions are also given in 
the following. Compound nouns and proper nouns 
are excluded in the present study. 
 
Conceptual MWEs:  
nominal<10,000>:“ 
z · w · 
  aka·no·tanin” 
(1,1,0) ‘complete stranger (lit. red stranger)’; 
“� ·w ·�
`  turu·no·hitokoe” (1,1,0) ‘the voice 
of authority (lit. one note of crane)’ ;  etc. 
verbal-nominal<1,700>:“ ��M · }V 
morai·naki” (1,1,0)  ‘weeping in sympathy (lit. 
received crying)’; “ ��� · ��  
rappa·nomi”(1,1,0) ‘drinking direct from the 
bottle (lit. trumpet drink)’;  etc. 
verbal<34,000>: “ T�· �� 
kami·simeru ”(1,1,0)  ‘chew well (lit. bite and 
fasten)’; “� ·g�� ni·tumeru” (1,1,0) ‘boil 
down (lit. boil and pack in)’;  etc. 
adjectival<4,300>: “ V�· l�M 
okorip·poi ”(0,1,0)  ‘irritable (lit. anger-ish)’; 
                                                      
1
 At present f
1
 and presumably f
2 
will not be decided by any statistical method. 
“�� ·
M  chuui·bukai” (1,1,0) ‘careful (lit. 
deep in caution)’;  etc. 
adjectival-nominal<2,000>:“�� ·w ·	4��   
ikkan·no·owari” (1,1,0) ‘the very end (lit. the 
end of a roll)’; “ �	{V · �� 
sujigaki·doori”(0,1,0) ‘as just planned (lit. just 
as a plot)’;  etc. 
adverbial<5,200>: “ qX · b� · q  
waruku·suru·to” (1,1,0) ‘if the worst happens (lit. 
if it worsens)’;  “ Olq� · q   
uttori·to”(0,1,0)‘abstractedly’;  etc. 
adnominal<2,600>: “j ·w ·�M  taai·no·nai” 
(1,0,1)‘inconsiderable (lit. with no altruism)’; 
“ �{ ·h�   danko·taru”(0,1,0) ‘firm’; etc. 
connective<300>: “ fw · AL   sono·kekka” 
(1,1,0) ‘consequently (lit. the result)’;  “f� ·x ·
^o ·SV   sore·ha·sate·oki” (1,1,1) ‘by the way 
(lit. setting it aside)’; etc. 
proverb-sentential<1,300>:“ xU · y · s�  
isoga·ba·maware” (1,1,1) ‘Make haste slowly. 
(lit. go round if it is in a hurry.)’; “	_� ·� ·� ·
�Q ·c  shunmin·akatuki·wo·oboe·zu” (1,1,1) ‘In 
spring one sleeps a sleep that knows no dawn.’; 
etc. 
proverb-sentential-incomplete<900>: “� ·x ·> ·
T�   yamai·ha·ki·kara” (1,1,0) ‘Fancy may kill 
or more. (lit. Illness is brought from one’s 
feeling.)’; “  · w · � · t · �� 
uma·no·mimi·ni·nenbutu” (1,1,1) ‘A nod is as 
good as a wink to blind horse. (lit. buddhist’s 
invocation to the ear of a horse)’ ; etc. 
 
Functional MWEs: 
relation-indicator(RI)<1,000>:“ t · mM · o  
ni·tui·te” (1,1,0) ‘about (lit. in touch with)’;  “t ·
�l ·o  ni·yot·te” (1,1,0)  ‘by (lit. depending 
on)’; “q ·q� ·t  to·tomo·ni” (1,1,0) ‘with (lit. 
accompanied with)’; “t ·SZ�  ni·okeru” 
(1,1,0) ‘in’ ,‘on  (lit. placed in)’;  etc. 
NPCI<1,500>: See Section 4. 
 
Nominals listed above are those marked with a 
triplet (1,1,x). We exclude compound nouns with 
(0,0,x) and proper nouns, whose number amounts 
to quite large, in this study. They should be treated 
in some other way in NLP. A treatment of those 
compound nouns for Japanese language processing 
is reported in (Miyazaki et al., 1993). 
Formally, the triplet is expanded in the lexicon to a 
partly multi-valued 7-tuple (f
1
, f
2
, f
3
, f
4
, f
5
, f
6
, f
7
). 
The augmented features are as follows; 
 
f
4
: grammatical class (shown above) 
f
5
: syntactical, original internal-structure 
f
6
: morphosyntactical variation: (m
1
, m
2
, ... , m
9
) 
m
1
: possibility to be modified by adnominal 
m
2
: possibility to be modified by appredicative 
m
3
: auxiliaries insertable in between its words  
m
4
: particles insertable in between its words 
m
5
: deletable particles 
m
6
: particles by which those in it are replaced 
m
7
: constituents which can be reordered 
m
8
: possibility to be nominalized by inversion 
m
9
: possibility to be passivized 
f
7
: estimated relative frequency 
 
f
6
 was adopted to ensure the flexibility of MWEs, 
while controlling the number of headings.  
Thus, our lexicon is not simply a list of MWEs but 
designed as a resource proliferous to a total variety 
of idiosyncratic expressions. (Shudo et al., 1980, 
1988; Shudo, 1989; Yasutake et al., 1997). 
The present study focuses on a set of NPCIs and its 
relationship to the non-propositional structure of 
natural sentences. Some of our multiword NPCIs 
are treated in the general, rewriting framework for 
MT in (Shirai et al., 1993). 
3 Non-propositional Structures (NPSs) 
Let us consider the meaning of a sentence; 
 
(1) “t ·x ·f\ ·t ·�� ·�Vp ·sTl ·h  
kare·ha·soko·ni·iru·bekide·nakat·ta” ‘He should 
not have been there’,  
 
where a verb  “�� iru” ‘be’  is followed by 
three auxiliaries, “�Vi   bekida” ‘should’, “sM  
nai”   ‘not’ and “ h  ta” ‘-ed’ which mean 
obligation, negation and past-tense, respectively, in 
the sentence-final position
2
. According to the 
occurrences of them, the solely literal paraphrase 
of (1) would be something like; 
 
(2) “t ·x ·f\ ·t ·�� ·�Vi ·q ·MO ·\q ·x ·  
sTl ·h  
kare·ha·soko·ni·iru·bekida·to·iu·koto·ha·nakat·ta
” ‘It was not necessary for him to be there’, 
 
However, this reading is not correct for (1). Rather, 
in contrast, its regular reading should be something 
like; 
 
(3) “t ·U ·f\ ·t ·� ·h ·w ·x ·�cM  
kare·ga·soko·ni·i·ta·no·ha·mazui” ‘It is evaluated 
in the negative that he was there’, 
 
By the way, it will be reasonable to think sentences 
                                                      
2
 “�Vi bekida” and “sM   nai ”  are inflected as  “�Vp bekide” and “s
Tl nakat”, respectively, in (1). 
(2) and (3) share a kernel sentence  “t ·U ·f\ ·
t ·��  kare·ga·soko·ni·iru” ‘He is there’, into 
which NPCs are incorporated successively, i.e., 
first - obligation, second - negation, third - past-
tense, in the case of (2), and first - past-tense, 
second - speaker’s-negative-evaluation, in the case 
of (3). Moreover, each stage of this incorporation 
would be regarded as mapping the utterance’s 
meaning from one to another, in parallel with a 
syntactic form being mapped from one to another.  
Hence, by introducing Non-propositional Primitive 
Functions (NPFs), e.g., OBLIGATION
2
, 
NEGATION
1, 
PAST-TENSE, and NEG-EVAL, we 
can explain the Non-propositional Structure (NPS) 
of (2) as; 
 
(4)PAST-TENSE [NEGATION
1
 
[OBLIGATION
2
[“ t · U · f\ · t · ��  
kare·ga·soko·ni·iru” ‘He is there’] ] ] 
 
and NPS of (3), hence, of (1) as, 
 
(5)NEG-EVAL[PAST-TENSE[“t ·U ·f\ ·t ·�
�  kare·ga·soko·ni·iru” ‘He is there’] ].
3
 
 
Here, a problem is that (4) is wrong for (1). In 
order to cope with this, while adopting a MWE, 
“�Vp ·sTl ·h bekide·nakat·ta” as a NPCI 
with a triplet (1,0,0) which has a composite NPF, 
NEG-EVAL[PAST-TENSE[x]]
4
, we have designed 
our segmenter to prefer a longer segment by the 
least-cost evaluation.
 
 
It should be noted that a composite of NPFs like 
this could be associated with a single NPCI. 
5
 This 
is caused by its idiomaticity, i.e., by the difficulty 
in decomposing it into semantically consistent sub-
forms. 
Investigating a reasonably sized set of Japanese 
linguistic data, keeping the strategy exemplified 
above in mind, revealed that NPS of a natural 
Japanese sentence can be generally formulated as a 
nested functional form; 
 
(6) M
n
[M
n-1
…[M
2
[M
1
[S]]]…],  
 
where S is a propositional, kernel sentence; M
i
   
(1� i� n), a NPF. In the following, we use the 
                                                      
3
 We use lower-suffixes to distinguish NPFs by the subtle differences in 
meaning, degree, etc. 
4
 Another choice could be, first, to adopt a shorter MWE, “�Vp ·sM  bekide· 
nai” ‘should not’ as a NPCI indicating PROHIBITION
2
, second, to build a NPS, 
PAST-TENSE[PROHIBITION
2
[“t ·U ·f\ ·t ·��  kare·ga·soko·ni·iru” ‘He 
is there’]], and last, to apply the following similarity rule in order to obtain (5), 
unless it yields the overgeneralization; 
PAST-TENSE[PROHIBITION
2
[x]] /� NEG-EVAL[PAST-TENSE[x]].   
The similarity rules are discussed in Section 6. 
5
 Another typical example is“�M  mai” which is a single auxiliary but has the 
meaning of ‘will not’, i.e., GUESS
2
[NEGATION
1
[x]]. 
notation for a composite function,  
M
n
◦ M
n-1
…◦ M
2
◦ M
1
, where M
n
◦ M
n-1
…◦ M
2
◦ M
1
[S]
 
= 
M
n
[M
n-1
…[M
2
[M
1
[S]]]…]. 
4 NPCIs, NPFs 
We have settled a set of 150 basic NPFs by 
classifying 1,500 NPCIs which had been extracted 
from the large-scale data. After manually 
extracting them, the data has been continuously 
checked and updated by comparing with various 
dictionaries and linguistic literature such as 
(Morita et al.,1989).  
They are subclassified as follows, though the 
boundaries between subclasses are partly subtle. It 
should be noted that some NPCIs are semantically 
ambiguous, being included in different subclasses 
below. Examples of NPCIs and the number of 
NPFs are given in brackets, in the following list. 
 
F
1
:polarity <3>: 
NEGATION
1
(“sM  nai” ‘not’ ; “w ·p ·x ·sM
no·de·ha·nai”(1,0,0) ‘not’ ; etc.), 
NEGATION
2
(“ q · MO ·  · p · x · sM
to·iu·wake·de·ha·nai”(1,0,0) ‘not’ ; etc.),etc. 
F
2
:tense <1>: 
PAST-TENSE(“h  ta” V-ed ; “i  da” V-ed)  
F
3
:aspect-observational <9>: 
IMMEDI-AFT-TERMINATING (“h ·q\� ·i
ta·tokoro·da”(1,1,0) ‘have just V-en’;  “h ·yT
� ·w ·q\� ·i ta·bakari·no·tokoro·da” (1,1,0)  
‘have just V-en’ ; etc.), 
IMMEDI-BEF-BEGINING(“O ·q ·` ·o ·M�   
u·to·si·te·iru” (1,0,0)  ‘be about to’; “�O ·q ·` ·
o ·M� you·to·si·te·iru”   (1,0,0) ‘be about to’ ; 
etc.), 
PROGRESSING(“o ·M� te·iru” (1,0,0)  ‘be V-
ing’;  “mm ·K� tutu·aru”(1,1,0) ‘be V-ing’;  
etc.), etc. 
F
4
:aspect-action <8>: 
INCHOATIVE(“xa��  hajimeru” ‘begin to’;  
“ ib dasu” ‘begin to’; etc.), 
TERMINATIVE(“S��  owaru” ‘finish V-
ing’ ; “SQ�  oeru” ‘finish V-ing’ ; etc.), 
CONTINUATIVE(“Z�  tuzukeru” ‘continue 
to’ ;  “��Q�   nagaraeru” ‘continue to’;  etc), 
etc. 
F
5
:voice <10>: 
PASSIVE(“��   reru” ‘be V-en’ ;  “���  
rareru” ‘be V-en’), 
CAUSATIVE(“d�   seru” ‘make…V…’ ;  “^
d�   saseru” ‘make…V…’), 
PAS-SUFFERING(“��   reru” ‘have…V-en’ ; 
“���   rareru” ‘have…V-en’ ;  etc.), 
PAS-BENE-TAKING
1
 (“ o · ��O
te·morau”(1,0,0)‘ask …V’; “ o · MhiX
te·itadaku” (1,0,0) ‘ask… V’ ;  etc.),  
BENE-TAKING(“o ·X�� te·kureru” (1,0,0) 
‘V... for (someone)...’ ; etc.), etc.  
F
6
:politeness-operator <3>: 
POLITENESS
1
 (“�b   masu” ; etc.) ,etc. 
F
7
:predicate-suffix <30>: 
TRIAL(“o ·�� te·miru” (1,0,0) ‘try to’ ;  
etc.),etc. 
F
8
:modality <60>: 
NEG-EVAL(“�V ·p ·sM beki·de·nai” (1,0,0) 
‘should not’ ; “w ·x ·�X ·sM no·ha·yoku·nai” 
(1,0,0) ‘should not’ ; etc.),  
OBLIGATION
2
(“�A ·U ·K� hituyou·ga·aru ” 
(1,0,0)  ‘need’, “�Vi  bekida” ‘should’, etc. ),  
OBLIGATION
1
(“ sZ� · y · s� · sM
nakere·ba·nara·nai” (1,1,1) ‘have to’ ;  etc.), 
PROHIBITION(“ o · x · s� · sM
te·ha·nara·nai”(1,0,1) ‘should not’, etc.), 
CAPABILITY(“��  uru” ; “\q ·U ·pV�   
koto·ga·dekiru”(1,0,0) ‘be able to’; etc.), 
GUESS
1
(“O   u” ‘will’), etc. 
F
9
:illocutionary-act <28>: 
IMPERATIVE(imperative-form of verb 
‘imperative form’),  
INTERROGATIVE (“ T ka” ‘interrogative 
form’;  “w ·T  no·ka”(1,1,0) ‘interrogative 
form’ ;  etc.),  PROHIBITIVE(“ s  na” 
‘Don’t... ’), PERMISSIVE(“ o · �M (1,1,0)  
te·yoi”  ‘You may...’ ; “o ·� ·T�� ·sM   
te·mo·kamawa·nai”(1,0,0) ‘You may...’  ; etc.), 
REQUESTING(“ o · X�   te·kure”(1,1,0) 
‘Please...’; “o ·�`M  te·hosii”(1,1,0) ‘I want 
you to...’ ;  etc.), etc. 
5 Treatment of NPSs 
5.1 Sentence-final Structure in Japanese 
Employing MWEs as NPCIs enabled us to 
describe the outermost structure of a Japanese 
sentence by the following production rules; 
 
(7) S
0
→ BP
*
·PRED,
 
(8) S
i
→ S
i-1
·m
i
, (1� i� n), 
 
where S
0 
denotes a kernel sentence; BP, a basic 
phrase called bunsetsu; PRED, a predicate of the 
kernel sentence; S
i
, a sentence, m
i, 
a NPCI and a 
symbol ‘*’, closure operator on the concatenation, 
‘·’. In the following, we use predicative parts, 
PRED· m
1
· m
2 
· ··· · m
n
 instead of full sentences, for 
simplicity. 
Our morphology model was developed so as to fit 
for the general semantic processing, adopting 
MWEs. It is a probabilistic finite automaton with 
150 states that prescribes minutely the internal 
structure of each BP and the predicative part. We 
leave its detail to (Shudo et al., 1980). 
5.2 Identifying NPS 
Based on our morphological analyzer, we have 
developed a segmenter (SEG) that segments the 
input predicative part into a PRED and each NPCI, 
and a NPS-constructor (NPSC) that constructs 
NPSs. For example, an input;  
 
(9) “ ��sZ�ys�sMi�O
yomanakerebanaranaidarou”  ‘will have to 
read’   
 
is first segmented into 
 
(10) “�� /·sZ� ·y ·s� ·sM /·i� ·O  
yoma/·nakere·ba·nara·nai/·daro·u ” 
 
by SEG. Here, a slash ‘/’ denotes a segment-
boundary identified by SEG. Then, NPSC 
evaluates a function nps defined below. 
 
(11) nps(S
0
)=S
0
, 
nps(S
0
/m
1
/m
2
.…/m
i
)=M
i
k
[…M
i
2
[M
i
1
[nps(S
0
/ 
m
1
/m
2
.…/m
i-1
)]]],(1� i� n), 
 
where M
i
k
[…M
i
2
[M
i
1
[x]]] is a NPF (if k =1) or a 
composite of NPFs (if k� 2) associated with m
i
. 
Hence, the computation of nps for (10) is; 
 
(12) nps(“�� /·sZ� ·y ·s� ·sM /·i� ·O
yoma/·nakere·ba·nara·nai/·daro·u”) 
=GUESS
2
[nps(“�� /·sZ� ·y ·s� ·sM  
yoma/·nakere·ba·nara·nai” ‘have to read’)] 
=GUESS
2
[OBLIGATION
1
[nps(“��   yomu” 
‘read’)]] 
=GUESS
2
[OBLIGATION
1
[“ ��  yomu”  
‘read’]],  
 
where GUESS
2
 and OBLIGATION
1
 are associated 
with “i� ·O daro·u” ‘will’ and “sZ� ·y ·s� ·
sM nakere·ba·nara·nai ”  ‘have to’, respectively.
  
In order to examine the adequacy and sufficiency 
of NPFs, we evaluated outputs of NPSC for 4,083 
input predicative parts, which had been taken 
randomly as a test set from newspaper articles and 
segmented by SEG. It produced a recall of 97.4% 
and a precision, 41.8%. The score of the recall 
seems to imply the sufficiency of the set of NPFs 
and NPCIs. Relatively low score of the precision is 
due to the system’s over-generation caused by the 
semantic ambiguities of NPCIs. Among various 
measures to be taken, firstly, semantic constraints 
to control the composition operation ‘◦ ’ may be 
effective to produce a better precision. The 
complete disambiguation measure is left to future 
work. 
5.3 Application to J/E Machine Translation 
We introduce here another experimental system, 
referred to as ENGL, whose input is the NPS of a 
sentence and whose output is its English forms, to 
demonstrate the usefulness of our formalism. 
ENGL simply realizes NPFs within English syntax.  
We assumed each NPF for English could be 
accomplished by applying rewriting rules of two 
types; i) V �  x · V
v 
· y and ii) S�  x · S
v 
· y , where 
V is a verb or an auxiliary; V
v
 is V, a null string, or 
a variant of V; S, a sentence; S
v
, a variant of S; and 
x, y, a null string or a string of specific words.  
Basically, a single rewriting rule is applied for a 
single NPF. However, occasionally, a NPF requires 
several rules to be applied successively. Also we 
may have no NPCI corresponding to a given NPF 
within the target language.  For example, 
POLITENESS, which is common in colloquial 
Japanese, has mostly no NPCI in English. 
For example, the computation for (12) is 
 
GUESS
2
 [OBLIGATION
1
 
[“�� yomu”]] 
= GUESS
2
 [OBLIGATION
1 
[read]] 
= GUESS
2
 [have to · read] 
=will · have to · read, 
 
where the rewriting rules associated with 
NECESSITY
1
 and GUESS
2
 are V� have to · V
root
 
and V� will · V
root
 , respectively. 
We give four more I/O examples  In (14), the 
instantaneous aspect of aruki uhajimeru ; begin 
walk-ing excludes the possibility of the 
interpretations, PROGRESSING
1
, 
PROGRESSING
2
 and STATE-OF-THINGS of 
teiru, which remain in (13) or (15). This is because 
the system deals with concatenatability rules based 
on aspect features of the predicate.  (ENGL simply 
denotes the verb’s inflected form by -ed, -en, etc.)  
 
(13) nps(“�� /·p ·M�  ; manan/·de·iru” ) 
=
1 
PROGRESSING
1
[study]= be study-ing, 
=
2 
PROGRESSING
2
[study]= have be-en study-
ing, 
=
3 
COMPLETED
1
[study]=have study-en 
 
(14)nps(“ 2V /· �� · o · M� ; 
aruki/·hajime·te·iru”) 
=COMPLETED
1
[INCHOATIVE[walk]]= have 
begin-en walk-ing 
 
(15)nps(“j` /·o ·M�  ; aisi/·te·iru” ) 
= STATE-OF-THINGS[love] = love 
 
(16)nps(“�T` /·o ·� /·o ·� ·�M /·w ·p`� /·
O /· T  ; 
ugokasi/·te·mi/·te·mo·yoi/·no·desho/·u/·ka”) 
=NTERROGATIVE[GUESS
1
[DECLARATION
[PERMISSIVE[TRIAL[move]]]]] 
= Will it be allowed that...try to move....? 
 
A small-scale experiment, for 300 NPSs extracted 
from sentences in technical papers has shown that 
ENGL produced a precision of 86% and a recall, 
80%. While these relatively high scores implies the 
fundamental validity of the NPF framework, more 
extensive tests will be required to make more 
reliable evaluation for the general domain, since 
technical papers tend to have less-complicated 
NPFs. In addition, further correction and 
refinement of synthesis rules for English will be 
necessary to obtain higher scores.  
6 Similarity between NPSs 
In this section, we show that our framework for the 
NPS description can be used properly to formalize 
some semantic or pragmatic relationship between 
non-propositionalized sentences. 
6.1 Logical Rules 
First, we discuss, here, the logical similarity 
relation, /�  � ((� F
i
)
*
)
2
,  (1� i� 8), which seems 
crutial for NLP tasks such as text retrieval or 
paraphrasing.
6
 We prefer the term, ‘similarity’ to 
‘equivalence’ here since it should be based on truth 
values taken in ‘most situations’, or in some 
‘similar’ worlds.
 7
 
There are basic rules such as; 
 
(17) NEG-EVAL◦  NEGATION
1
  
/� OBLIGATION
2
 
(18) NEGATION
1
◦ PERMISSION 
/� PROHIBITION,  
(19) NEGATION
1,2
◦ NEGATION
1,2
 
/�   (identity function), 
(20) N◦ /�   ◦ N /� N for
�
N� (� F
i
)
* 
, (1� i� 8),  
(21) POLITENESS /�   , 
  
(17) asserts that, for example, an utterance, “He 
has to go there.”  is similar to “It is evaluated in 
the negative that he does not go there.”. Besides 
these basic rules, there is a set of logically notable 
rules. For example, from the observation that “�
M /· o · yT� · M� /·  · p · x · sM   
                                                      
6
 While the NPF in F
i
, (1� i� 7) produces a truth conditional sentence, the NPF 
in F
9 
does not. The NPF in F
8
 produces a truth conditional sentence, unless it is 
used for the speaker’s epistemic judgment. 
7
 But we do not enter further theoretical arguments here. 
hatarai/·te·bakari·iru/·wake·de·ha·nai”    ‘do not 
always work’  is similar to  “�T ·sM /·� ·U ·K
�   hataraka·nai/·toki·ga·aru” ‘It happens 
occasionally that…do not work’ the following rule 
will be induced; 
 
(22) NEGATION
2
◦ HIGHEST-FREQUENCY 
/� LOW-FREQUENCY◦ NEGATION
1
. 
 
Also, “ �T /· sX /· o · � · �M 
hataraka/·naku/·te·mo·yoi” ‘need not work’; ‘It is 
allowed that...do not work…’ and “�T /·sZ� ·
y · s� · sM /· � · x · sM 
hataraka/·nakere·ba·nara·nai/·koto·ha·nai” ‘It is 
not obligatory that …work…’, will induce a rule; 
 
(23) PERMISSION◦ NEGATION
1
   
/� NEGATION
1
◦ OBLIGATION. 
 
These rules can be generalized as (24), (24’) by 
introducing a ‘duality’ function, d defined below; 
 
 
        M, d(N)                          d(M), N 
----------------------------------------------------------- 
POSSIBILITY                     NECESSITY, 
HIGHEST-PROBABILITY, 
                                     HIGHEST-CERTAINTY 
LOW-FREQUENCY   HIGHEST-FREQUENCY, 
                                      HIGHEST-USUALITY 
PERMISSION                    OBLIGATION, 
                                  HIGHEST-INEVITABILITY  
----------------------------------------------------------- 
 
(24) NEGATION
1,2
◦ M /�  d(M) ◦ NEGATION
1,2
, 
 
(24’) M /� NEGATION
1,2
◦  d(M) ◦ NEGATION
1,2
. 
 
We show two more examples; 
 
(22’) HIGHEST-FREQUENCY◦ NEGATION
1
 
/� NEGATION
2
◦ LOW-FREQUENCY. 
nps(“ �T /· sM · p · yT� · M�  
hataraka/·nai·de·bakari·iru” ‘It is always 
that…do not work…’ ) /� nps(“�X /·\q ·U ·K
� /· q · x · tQ· sM 
hataraku/·koto·ga·aru/·to·ha·ie·nai”  ‘It does not 
happen that…sometimes work…’ ). 
 
(23’) OBLIGATION◦ NEGATION
1
 
/� NEGATION
1
◦ PERMISSION. 
nps(“ �M /· o · x · s� · sM 
hatarai/·te·ha·nara·nai” ‘must not work’ )  
/� nps(“�M /·o ·�M /·q ·MO ·� ·x ·sM   
hatarai/·te·yoi/·to·iu·koto·ha·nai” ‘It is not 
permissible that…work…’ ). 
 
Rule (24) corresponds to the axiom, ��%��
�% , in modal logic and its variants, e.g., deontic 
(Mally, 1926) or temporal (Prior, 1967) logic, 
where �  and �  are the necessity and possibility 
operator, respectively. 
6.2 Pragmatic Rules 
The similarity relation among the speaker’s 
attitude or intention toward the hearer is defined as 
a set, �� { (a,b) | a,b /;  (F
1
� F
2
…� F
9
)
*
 �  ((
�
i, 
1� i� l� f
i
/; F
9
)� (
�
j,  1� j� m� g
j
/; F
9
)), where 
a=f
1
◦ f
2
…◦ f
l
, b=g
1
◦ g
2
…◦ g
m
}. 
Some of the indirect speech acts (Searle, 1975) can 
be formulated as the similarity within our 
framework. Examples of the rules and their 
instances are; 
 
(25) REQUESTING  
� INTERROGATIVE◦ NEGATION
1
 
◦ CAPABILITY, 
� INTERROGATIVE◦ CAPABILITY, 
� POLITENESS◦ IMPERATIVE, 
� INTERROGATIVE◦ NEGATION
1
 
◦ BENE-TAKING, 
� INTERROGATIVE◦ NEGATION
1
 
◦ CAPABILITY◦ PASS-BENE-TAKING, 
� DESIRING◦ PASS-BENE-TAKING, 
� DESIRING◦ PASSIVE. 
nps(“_ /·o ·X�   mi/·te·kure”  ‘Look at …’ ), 
� nps(“_� /·\q ·U ·	ZR /·sM /·T   
miru/·koto·ga·deki/·nai/·ka”  ‘Can’t you look 
at …?’ ), 
� nps(“_� /·\q ·U ·	ZR� /·T  
miru/·koto·ga·dekiru/·ka” ‘Can you look at 
…?’  ), 
� nps(“_ /·s^M   mi/·nasai”  ‘Please look at 
…’ ),  
� nps(“ _ /· o · X� /· sM /· T   
mi/·te·kure/·nai/·ka”  ‘Don’t you look at … for 
me …?’ ), 
� nps(“ _ /· o · �� /· Q /· sM /· T   
mi/·te·mora/·e/·nai/·ka”  ‘Can’t I have you 
look at… for me…?’ ), 
� nps(“_ /·o ·��M /·hM   mi/·te·morai/·tai ” 
‘I want you to look at … for me…’ ),
 
� nps(“_ /·�� /·hM   mi/·rare/·tai”  ‘I want 
you to look at …’ ). 
 
With respect to prohibition, invitation, 
permission and assertion, we have; 
 
(26) PROHIBITIVE 
� PROHIBITION, 
� NEGATION
1
◦ CAPABILITY. 
nps(“�� /·s   hairu/·na”  ‘Do not enter…’ ) 
� nps(“ �l /· o · x · s� · sM   
hait/·te·ha·nara·nai”  ‘You must not enter…’ ) , 
� nps(“ �� /· � · U · 	ZR /· sM    
hairu/·koto·ga·deki/·nai”  ‘You can not 
enter…’ ),  
 
(27)INVITING 
� INTERROGATIVE◦  INVITING, 
� INTERROGATIVE◦ NEGATIVE
1
. 
nps(“	��� /·O   tabeyo/·u”  ‘Let’s eat…’ ) 
� nps(“	��� /·O /·T   tabeyo/·u/·ka” 
      ‘Will you eat…?’  ), 
� nps(“	�� /·sM /·T   tabe/·nai/·ka” 
     ‘Don’t you eat…?’  ). 
 
(28)PERMISSIVE 
� POSSIBILITY. 
nps(“� /·o ·�M   ki/·te·yoi”  ‘You may wear…’ ) 
� nps(“ �� /· \q · U · 	ZR�  
kiru/·koto·ga·dekiru”  ‘You can wear…’ ). 
 
(29)ASSERTING◦ PAST-TENSE◦  NEGATION
1
 
� INTERROGATIVE◦  PAST-TENSE 
nps(“	�� /·sTl /·h /·�  tabe/·nakat/·ta/·yo”;  
‘I did not eat... ’ ), 
� nps(“	�� /·h /·TM  tabe/·ta/·kai”; ‘Did I 
eat ...? ’ ). 
 
 
We have obtained approximately 30 pragmatic 
rules concerned with the NPCIs in Japanese. In the 
realistic tasks of NLP, application of these rules 
should be controlled by rather complicated 
conditions settled for each of them. For example, 
conditions for rules (25) ~ (28) will include that 
the agent of their complement sentence should be 
the second person, and for (29), the first.  Although 
the principle underlying these rules were discussed 
in a lot of literature, e.g., felicity condition in 
(Searle, 1975), etc., the whole picture has not been 
clarified for computational usage. 
7 Conclusions 
We have shown that as far as the non-inferential, 
Non-Propositional Content (NPC) in Japanese 
sentence is concerned, its semantic 
compositionality can be secured, provided 
sentence-final MWEs are adopted properly as 
NPCIs. Although the functional treatment of NPCs 
is not particularly new in the theoretical domain, 
our model is characterized by its broad 
syntactic/semantic coverage and its tractability in 
NLP. It connects syntax with semantics by actually 
defining 150 non-propositional functions (NPFs) 
for 1500 NPC indicators through a large-scale 
empirical study. The similarity equations presented 
here might lead to some formal system of 
‘calculations’ on the set of NPFs, which might be 
available for NLP in future.   
The syntactic coverage of our semantic/pragmatic 
model will surely be further broadened by 
investigating non-final parts of Japanese sentences. 
This research should focus on the sentence 
embedment whose main verb is epistemic or 
performative (Austin, 1962), and adverbs that take 
part in indicating NPCs.  
While the list of NPFs introduced in this paper will 
provide, we believe, a basis for analyzing the NPC 
of natural sentences, it might be possible, or rather 
necessary for particular task, to refine NPFs by 
enriching them with case-elements, more detailed 
degrees or subtle differences in meaning, etc.  
We have not solved the problem of semantically 
disambiguating each NPCI. Further, we know little 
about the language-dependency, consistency of the 
similarity rules. The language-dependency of NPS 
is interesting from the viewpoint of machine 
translation or comparative pragmatics. The 
frameworks presented here could hopefully 
provide tools for those comparative studies. 

References  
John L. Austin. 1962. How to Do Things with 
Words. Oxford U.P.  
H. Paul Grice. 1975. Logic and Conversation. In P. 
Cole and J. L. Morgan, editors, Syntax and 
Semantics Vol. 3, Speech Acts: 41-57. Academic 
Press.  
Ernst Mally. 1926. Grundgesetze des Sollens: 
Elemente der Logik des Willens. Universitäts-
Buchhandlung: viii+85.  
Masahiro Miyazaki, Satoru Ikehara and Akio 
Yokoo. 1993. Combined Word Retrieval for 
Bilingual Dictionary Based on the Analysis of 
Compound Words. Trans. IPSJ 34-4: 743-754. 
(in Japanese) 
Yoshiyuki Morita and Masae Matsuki. 1989. 
Expression Pattern of Japanese. ALC Press. (in 
Japanese) 
Arthur Prior. 1967. Past, Present and Future. 
Clarendon press, Oxford.  
Iwan A. Sag, Timothy Baldwin, Francis Bond, Ann 
Copestake and Dan Flickinger. 2002. Multiword 
Expressions: A Pain in the Neck for NLP. The 
Proc. of the 3rd CICLING: 1-15. 
John R. Searle. 1975. Indirect Speech Acts. In P. 
Cole and J. L. Morgan, editors, Syntax and 
Semantics Vol. 3, Speech Acts: 59-82, Academic 
Press.   
Satoshi Shirai, Satoru Ikehara and Tsukasa  
Kawaoka. 1993. Effects of Automatic Rewriting 
of Source Language within a Japanese to 
English MT System. Fifth International 
Conference on Theoretical and Methodological 
Issues in Machine Translation: TMI-93: 226-239  
Kosho Shudo, Toshiko Narahara and Sho Yoshida. 
1980. Morphological Aspect of Japanese 
Language Processing. The Proc. of the 8th 
COLING: 1-8.  
Kosho Shudo, Kenji Yoshimura, Mitsuno Takeuti  
and Kenzo Tsuda. 1988. On the Idiomatic 
Expressions in Japanese Language – An 
Approach through the Close and Extensive 
Investigation of Non-standard Usage of Words – 
IPSJ SIG Notes, NL-66-1: 1-7. (in Japanese) 
Kosho Shudo. 1989. Fixed Collocations. Ministry 
of Education, Science, Sports and Culture, 
Grant-in-Aid for Scientific Research, 63101005. 
(in Japanese) 
Masako Yasutake, Yasuo Koyama, Kenji 
Yoshimura, Kosho Shudo. 1997. Fixed-
collocations and Their Permissible Variants. The 
Proc. of the 3rd Annual Meeting of ANLP: 449-
452. (in Japanese) 
