Revision 1 as of 2004-07-21 20:02:17
Idiom Implementation
This page describes how to implement idioms in a DELPH-IN grammar, taking [wiki:JacyTop Jacy] as an example. The basic idea is that the [wiki:LkbTop LKB] checks whether all the constituents (that is, the PREDs) of an idiom appear in the MRS of a parsed sentence.
Overview of LKB's idiom detection mechanism
The LKB implements idiom detection on top of its machine translation (transfer) mechanism.
The idiom detection mechanism is invoked after a sentence has been parsed. It then proceeds as follows:
1. It examines whether the sentence is specified as [IDIOM +]. If it is not, no further processing is invoked.
2. If the sentence is [IDIOM +], the mechanism consults idioms.mtr (together with mtr.tdl), which lists the idioms, to see whether the sentence contains all the constituents of an idiom. If so, the sentence is accepted and certified as containing an idiom. Conversely, if the sentence is [IDIOM +] but does not contain all the constituents of an idiom, it is rejected.
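The two steps above can be sketched as a toy model in Python. This is only an illustration of the decision logic, not LKB code: the function name `accept_parse`, the boolean stand-in for [IDIOM +], the set-based idiom table, and the literal PRED names `_tatsu_v_rel` and `_butai_n_rel` are assumptions made for the example. It checks PRED membership only and ignores how the arguments are linked.

```python
def accept_parse(idiom_flag, preds, idioms):
    """Decide whether a parse is kept.

    idiom_flag -- True when the root is [IDIOM +] (hypothetical stand-in)
    preds      -- set of PRED names found in the parse's MRS
    idioms     -- list of sets, each holding the PREDs of one listed idiom
    """
    if not idiom_flag:
        # Step 1: not [IDIOM +], so no idiom check is run at all.
        return True
    # Step 2: keep the parse only if some listed idiom is fully present.
    return any(idiom <= preds for idiom in idioms)


# One idiom, using the PRED names from idioms.mtr.
IDIOMS = [{"_tatsu_v_i_rel", "_yaku_n_rel"}]

# Literal reading (hypothetical PREDs): no check, accepted.
literal = accept_parse(False, {"_tatsu_v_rel", "_butai_n_rel"}, IDIOMS)
# Idiomatic verb together with yaku: accepted.
complete = accept_parse(True, {"_tatsu_v_i_rel", "_yaku_n_rel"}, IDIOMS)
# Idiomatic verb without yaku: rejected.
incomplete = accept_parse(True, {"_tatsu_v_i_rel", "_butai_n_rel"}, IDIOMS)
```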
Example
Imagine that you want to implement the Japanese idiom yaku-ni tatsu (part-DAT stand) "useful".
Each constituent of the idiom can also be used independently with its literal meaning, but this particular combination has only the idiomatic meaning.
Below are examples of tatsu with its literal meaning (a) and with its idiomatic meaning (b).
a. Ken-ga butai-ni tatsu. (Ken-NOM stage-DAT stand) "Ken appears on stage."
b. Ken-ga yaku-ni tatsu. (Ken-NOM part-DAT stand) "Ken is useful."
Now suppose that you want both the literal tatsu (tatsu_lit, hereafter) and the idiomatic tatsu (tatsu_idiom) in your grammar, and that you want tatsu_idiom to be licensed only when all the other constituents of the idiom yaku-ni tatsu appear in the sentence.
Implementation
You need the following configuration:
* List all idioms in idioms.mtr (using the rule types defined in mtr.tdl).
* For each idiom, add one of its constituents to the lexicon file. In the case of yaku-ni tatsu, tatsu is newly entered into the lexicon as tatsu_idiom, so the lexicon contains both tatsu_lit and tatsu_idiom. Note that the newly entered lexical item for an idiom introduces [IDIOM +] into the feature structure of the sentence.
* Configure the relevant rules in your grammar so that [IDIOM +] is passed up through the syntactic structure. This configuration is grammar-dependent.
* Configure roots.tdl, script, user-fns.lsp, and globals.lsp to invoke the idiom detection mechanism.
idioms.mtr and mtr.tdl
Here is an example of how to list idioms in idioms.mtr (and mtr.tdl).
Below is the case of yaku-ni tatsu. Note that yaku-ni is the ARG2 of tatsu.
idioms.mtr
yaku+ni+tatsu := np_v_idiom_mtr &
[ INPUT.RELS <! [ PRED "_tatsu_v_i_rel" ],
                [ PRED "_yaku_n_rel" ] !> ].
mtr.tdl
np_v_idiom_mtr := monotonic_mtr &
[ INPUT.RELS <! [ LBL handle, ARG0 event, ARG1 ref-ind, ARG2 ref-ind & #arg2 ],
                [ LBL handle, ARG0 #arg2 ] !>,
  OUTPUT.RELS <! !> ].
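The #arg2 tag in np_v_idiom_mtr ties the verb's ARG2 to the noun's ARG0, so the rule only fires when yaku really is the argument of tatsu. A rough Python sketch of that matching step follows. It assumes, as a simplification, that a matched pair is simply removed from RELS (the rule's OUTPUT.RELS is empty); the `EP` class, `match_np_v_idiom`, the variable names, and `named_rel` as a filler predicate are hypothetical and not part of the LKB.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EP:
    """A stripped-down elementary predication (hypothetical model)."""
    pred: str
    arg0: str
    arg2: Optional[str] = None


def match_np_v_idiom(rels: List[EP], verb_pred: str, noun_pred: str):
    """Return RELS without the matched verb/noun pair, or None if no match.

    A match needs a verb EP and a noun EP whose ARG0 is the verb's ARG2,
    mirroring the #arg2 coindexation in np_v_idiom_mtr.
    """
    for verb in rels:
        if verb.pred != verb_pred or verb.arg2 is None:
            continue
        for noun in rels:
            if noun.pred == noun_pred and noun.arg0 == verb.arg2:
                return [ep for ep in rels if ep is not verb and ep is not noun]
    return None


# "Ken-ga yaku-ni tatsu": yaku (x2) is the ARG2 of tatsu.
linked = [
    EP("named_rel", "x1"),
    EP("_yaku_n_rel", "x2"),
    EP("_tatsu_v_i_rel", "e3", arg2="x2"),
]
# Same PREDs, but yaku is not the verb's ARG2, so the rule must not fire.
unlinked = [
    EP("_yaku_n_rel", "x9"),
    EP("_tatsu_v_i_rel", "e3", arg2="x2"),
]

remaining = match_np_v_idiom(linked, "_tatsu_v_i_rel", "_yaku_n_rel")
no_match = match_np_v_idiom(unlinked, "_tatsu_v_i_rel", "_yaku_n_rel")
```

In this model, a successful match leaves no idiom predicate behind, while a failed match leaves `_tatsu_v_i_rel` in place.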
The Lexicon
Here is the lexical entry for tatsu_idiom.
tatsu_idiom := v2-c-stem-lex &
[ SYNSEM.LKEYS [ KEYREL [ PRED '_tatsu_v_i_rel ] ],
  ORTH <! "立つ" !>,
  IDIOM + ].
Note that v2-c-stem-lex is the lexical type for a transitive verb that takes a dative argument, and that the entry itself specifies [IDIOM +].
Configuration of Relevant Rules
The following changes make the grammar pass [IDIOM +] up from daughters to mothers. Most of them are specific to Jacy.
matrix.tdl
sign := basic-sign &
[ SYNSEM synsem,
  ARGS list,
  INFLECTED bool,
  ROOT bool,
  IDIOM bool ].
Note that the IDIOM feature is introduced.
lex-rule := phrase-or-lexrule & word-or-lexrule &
[ IDIOM #idiom,
  NEEDS-AFFIX bool,
  SYNSEM.LOCAL.CONT.RELS [ LIST #first, LAST #last ],
  DTR #dtr & word-or-lexrule &
      [ SYNSEM.LOCAL.CONT.RELS [ LIST #first, LAST #middle ],
        ALTS #alts,
        IDIOM #idiom ],
  C-CONT.RELS [ LIST #middle, LAST #last ],
  ALTS #alts,
  ARGS < #dtr > ].
Note [IDIOM #idiom], which identifies the IDIOM value of the rule's output with that of its daughter.
basic-unary-phrase := phrase &
[ STEM #stem,
  IDIOM #idiom,
  SYNSEM.LOCAL.CONT [ RELS [ LIST #first, LAST #last ],
                      HCONS [ LIST #scfirst, LAST #sclast ] ],
  C-CONT [ RELS [ LIST #middle, LAST #last ],
           HCONS [ LIST #scmiddle, LAST #sclast ] ],
  ARGS < sign & [ STEM #stem,
                  SYNSEM.LOCAL local &
                      [ CONT [ RELS [ LIST #first, LAST #middle ],
                               HCONS [ LIST #scfirst, LAST #scmiddle ] ] ],
                  ROOT -,
                  IDIOM #idiom ] > ].
Note [IDIOM #idiom].
basic-binary-phrase := phrase &
[ IDIOM #idiom,
  SYNSEM.LOCAL.CONT [ RELS [ LIST #first, LAST #last ],
                      HCONS [ LIST #scfirst, LAST #sclast ] ],
  C-CONT [ RELS [ LIST #middle2, LAST #last ],
           HCONS [ LIST #scmiddle2, LAST #sclast ] ],
  ARGS < sign & [ IDIOM #idiom,
                  SYNSEM.LOCAL local &
                      [ CONT [ RELS [ LIST #first, LAST #middle1 ],
                               HCONS [ LIST #scfirst, LAST #scmiddle1 ] ] ],
                  ROOT - ],
         sign & [ IDIOM #idiom,
                  SYNSEM.LOCAL local &
                      [ CONT [ RELS [ LIST #middle1, LAST #middle2 ],
                               HCONS [ LIST #scmiddle1, LAST #scmiddle2 ] ] ],
                  ROOT - ] > ].
Note [IDIOM #idiom].
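Because the same #idiom tag appears on the mother and on every daughter, the IDIOM values involved have to unify. The following Python sketch shows what that implies for a binary phrase. It assumes that bool has exactly the two subtypes + and - and models the three values as plain strings; `unify_idiom` and `binary_phrase_idiom` are hypothetical names, not LKB functions.

```python
def unify_idiom(a, b):
    """Unify two IDIOM values drawn from 'bool', '+', '-'.

    'bool' is the underspecified supertype; '+' and '-' are incompatible.
    Returns the unified value, or None when unification fails.
    """
    if a == "bool":
        return b
    if b == "bool":
        return a
    return a if a == b else None


def binary_phrase_idiom(left, right):
    """IDIOM value of a binary phrase whose mother and both daughters
    share one reentrancy, as in basic-binary-phrase."""
    return unify_idiom(left, right)


# An underspecified word combining with tatsu_idiom yields [IDIOM +].
plus = binary_phrase_idiom("bool", "+")
# Two underspecified daughters stay underspecified.
open_value = binary_phrase_idiom("bool", "bool")
# A daughter fixed to '-' cannot combine with an [IDIOM +] daughter.
clash = binary_phrase_idiom("-", "+")
```

Under this reading, words that should combine freely with tatsu_idiom have to leave IDIOM underspecified as bool rather than fix it to -.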
fundamentals.tdl
lexical_sign-rule := lexical_sign & phrase-or-lexrule &
[ IDIOM #idiom,
  ARGS < [ IDIOM #idiom ] > ].
rule-types.tdl
unary-type-super := phrasal_sign &
[ IDIOM #idiom,
  SYNSEM [ LOCAL [ CTXT [ C-INDICES [ ADDRESSEE #add,
                                      SPEAKER #sp ],
                          EMPATHY [ EMPER #sp,
                                    EMPEE #emp ] ],
                   CONT [ HOOK #hook,
                          RELS [ LIST #list, LAST #last ],
                          HCONS [ LIST #sclist, LAST #sclast ] ] ],
           NON-LOCAL #nonloc ],
  ORTH #stem,
  C-CONT mrs & [ HOOK #hook,
                 RELS diff-list & [ LIST #list, LAST #middle ],
                 HCONS diff-list & [ LIST #sclist, LAST #scmiddle ] ],
  ARGS < [ IDIOM #idiom,
           SYNSEM [ LOCAL [ CTXT [ C-INDICES [ ADDRESSEE #add,
                                               SPEAKER #sp ],
                                   EMPATHY [ EMPER #sp,
                                             EMPEE #emp ] ],
                            CONT [ RELS [ LIST #middle, LAST #last ],
                                   HCONS [ LIST #scmiddle, LAST #sclast ] ] ],
                    NON-LOCAL #nonloc ],
           ORTH #stem ] > ].
Note [IDIOM #idiom].
word2word-rule := j-sign & phrase-or-lexrule &
[ IDIOM #idiom,
  SYNSEM [ LOCAL [ CAT [ HEAD #head ],
                   BAR #bar,
                   CONT [ HOOK #hook,
                          RELS diff-list & [ LIST #list, LAST #last ],
                          HCONS [ LIST #hclist, LAST #hclast ] ],
                   CTXT #ctxt ],
           NON-LOCAL #nonloc,
           MODIFIED.PERIPH #per ],
  ORTH #stem,
  INFLECTED +,
  J-NEEDS-AFFIX #aff,
  LMORPH-BIND-TYPE #lmorph,
  RMORPH-BIND-TYPE #rmorph,
  C-CONT [ RELS [ LIST #middle, LAST #last ],
           HCONS [ LIST #hcmiddle, LAST #hclast ] ],
  ARGS < [ IDIOM #idiom,
           SYNSEM [ LOCAL [ CAT [ HEAD #head ],
                            BAR #bar,
                            CONT [ HOOK #hook,
                                   RELS [ LIST #list, LAST #middle ],
                                   HCONS [ LIST #hclist, LAST #hcmiddle ] ],
                            CTXT #ctxt ],
                    MODIFIED.PERIPH #per,
                    NON-LOCAL #nonloc ],
           INFLECTED +,
           ORTH #stem,
           J-NEEDS-AFFIX #aff,
           LMORPH-BIND-TYPE #lmorph,
           RMORPH-BIND-TYPE #rmorph ] > ].
Note [IDIOM #idiom].
roots.tdl, script, user-fns.lsp, and globals.lsp
Copy the following definitions into the corresponding files.
roots.tdl
; Used to determine on which candidate root edges to not apply the idiom checks
; (for efficiency)
root_non_idiom := sign &
[ IDIOM - ].
script
(read-tdl-type-files-aux
 (list (lkb-pathname (parent-directory) "mtr.tdl")))

(mt:read-transfer-rules
 (list (lkb-pathname (parent-directory) "idioms.mtr"))
 "Idiom Tests"
 :filter nil :task :idiom)
user-fns.lsp
(defun idiom-complete-p (tdfs)
  (let* ((mrs (and (tdfs-p tdfs)
                   (mrs::extract-mrs-from-fs (tdfs-indef tdfs))))
         (transfers (and (mrs::psoa-p mrs)
                         (mt:transfer-mrs mrs :task :idiom))))
    (loop
        for transfer in transfers
        for mrs = (mt::edge-mrs transfer)
        thereis (loop
                    for ep in (mrs:psoa-liszt mrs)
                    when (idiom-rel-p ep) return nil
                    finally (return t)))))

(eval-when #+:ansi-eval-when (:load-toplevel :execute)
           #-:ansi-eval-when (load eval)
  (setf *additional-root-condition* #'idiom-complete-p))
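Read in Python terms, the check above succeeds when at least one transfer result contains no idiom relation. The sketch below mirrors that loop structure on plain lists of PRED strings. It is a model only: `idiom_rel_p` here is a stand-in that assumes idiom predicates can be recognized by the `_i_rel` suffix seen in `_tatsu_v_i_rel`, which may not match the LKB's own `idiom-rel-p`, and `named_rel` is a filler predicate chosen for the example.

```python
def idiom_rel_p(pred):
    """Stand-in test for an idiom relation (assumed '_i_rel' suffix)."""
    return pred.endswith("_i_rel")


def idiom_complete_p(transfer_outputs):
    """True if some transfer output has no idiom relation left in it.

    transfer_outputs -- list of RELS lists, one per transfer result,
                        each RELS list holding PRED strings.
    """
    return any(
        not any(idiom_rel_p(pred) for pred in rels)
        for rels in transfer_outputs
    )


# The idiom rule consumed both EPs: nothing idiomatic remains.
consumed = idiom_complete_p([["named_rel"]])
# The idiomatic verb was never matched by a rule: incomplete idiom.
leftover = idiom_complete_p([["named_rel", "_tatsu_v_i_rel"]])
# One complete result among several is enough.
mixed = idiom_complete_p([["_tatsu_v_i_rel"], ["named_rel"]])
```

As in the Lisp loop, an empty list of transfer results makes the check fail.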
globals.lsp
(defparameter *non-idiom-root* 'root_non_idiom)