Revision 2 as of 2006-06-02 03:08:14


Idiom Implementation

This page describes how to implement idioms in a DELPH-IN grammar, taking [wiki:JacyTop Jacy] as an example. The basic idea is that the [wiki:LkbTop LKB] checks whether all the constituents (or PREDs) of an idiom appear in the MRS output for a sentence.

Overview of LKB's idiom detection mechanism

The LKB has an idiom detection mechanism that is built on its machine translation mechanism.

The idiom detection mechanism is invoked after a sentence has been parsed:

  1. It examines whether the sentence is specified as [IDIOM +]. If it is not, no further processing takes place.

  2. If the sentence is [IDIOM +], the mechanism consults idioms.mtr (and mtr.tdl), which lists the idioms, to see whether the sentence contains all the constituents of an idiom. If it does, the sentence is accepted and certified as containing an idiom. Conversely, if the sentence is [IDIOM +] but does not contain all the constituents of an idiom, it is rejected.

Example

Imagine that you want to implement a Japanese idiom, yaku-ni tatsu (part-DAT stand) "useful".

Each constituent of the idiom can be used independently with its literal meaning, but this particular combination has only the idiomatic meaning.

Below are examples of tatsu used with its literal meaning (1) and with its idiomatic meaning (2).

    1. Ken-ga butai-ni tatsu. (Ken-NOM stage-DAT stand) "Ken appears on stage."

    2. Ken-ga yaku-ni tatsu. (Ken-NOM part-DAT stand) "Ken is useful."

Now suppose that you want both the literal tatsu (hereafter tatsu_lit) and the idiomatic tatsu (tatsu_idiom) in your grammar, and that tatsu_idiom should appear only when all the other constituents of the idiom yaku-ni tatsu (part-DAT stand) "useful" appear in the sentence.
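The check from the overview, applied to this example, can be sketched as a small Python model. This is an illustration only, not LKB code: the function name accept_root, the reduction of a sentence to a set of PRED names, and the PREDs "_tatsu_v_rel" and "_butai_n_rel" are assumptions made for the sketch.

```python
# Toy model of the two-step idiom check (not LKB code).
# Assumption: a parsed sentence is reduced to its IDIOM value and the set of
# PRED names in its MRS; each idiom is the set of PREDs it requires.

IDIOMS = {
    "yaku+ni+tatsu": {"_tatsu_v_i_rel", "_yaku_n_rel"},
}


def accept_root(idiom_flag, preds, idioms=IDIOMS):
    """Return True if the analysis should be accepted as a root."""
    # Step 1: analyses that are not [IDIOM +] skip the idiom check entirely.
    if idiom_flag != "+":
        return True
    # Step 2: an [IDIOM +] analysis must contain every PRED of some idiom.
    return any(required <= preds for required in idioms.values())


# Literal reading (hypothetical PRED names): not [IDIOM +], so accepted.
literal = accept_root("-", {"_tatsu_v_rel", "_butai_n_rel"})
# Idiomatic reading with all constituents present: accepted.
complete = accept_root("+", {"_tatsu_v_i_rel", "_yaku_n_rel"})
# Idiomatic tatsu without yaku: rejected.
incomplete = accept_root("+", {"_tatsu_v_i_rel", "_butai_n_rel"})
```

In this model the literal reading of sentence 1 and the idiomatic reading of sentence 2 are both accepted, while an analysis that uses tatsu_idiom with butai is rejected.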

Implementation

You need the following configuration.

idioms.mtr and mtr.tdl

Here is an example of how to list idioms in idioms.mtr (and mtr.tdl).

Below is the case of yaku-ni tatsu. Note that yaku-ni is the ARG2 of tatsu.

idioms.mtr

yaku+ni+tatsu := np_v_idiom_mtr &
  [ INPUT.RELS <! [ PRED "_tatsu_v_i_rel" ],
                  [ PRED "_yaku_n_rel" ] !> ].

mtr.tdl

np_v_idiom_mtr := monotonic_mtr &
  [ INPUT.RELS <! [ LBL handle,
                    ARG0 event,
                    ARG1 ref-ind,
                    ARG2 ref-ind & #arg2 ],
                  [ LBL handle,
                    ARG0 #arg2 ] !>,
    OUTPUT.RELS <! !> ].
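The constraint that the rule type adds on top of the PRED names is the coindexation of the verb's ARG2 with the noun's ARG0. A minimal Python sketch of that condition follows. It is a toy model, not the LKB transfer matcher: the dictionary representation, the function name matches_np_v_idiom, and the variable names (e2, x4, x6, x9) are assumptions.

```python
# Toy model of what np_v_idiom_mtr requires (not LKB code).
# Assumption: each elementary predication (EP) is a dict holding a PRED and
# its argument variables as plain strings such as "e2" or "x6".


def matches_np_v_idiom(eps, verb_pred, noun_pred):
    """True if some verb EP's ARG2 is the ARG0 of some noun EP."""
    for verb in eps:
        if verb.get("PRED") != verb_pred or "ARG2" not in verb:
            continue
        for noun in eps:
            # The shared tag #arg2 in the rule type corresponds to this equality.
            if noun.get("PRED") == noun_pred and noun.get("ARG0") == verb["ARG2"]:
                return True
    return False


# yaku (x6) fills the ARG2 of idiomatic tatsu.
idiomatic = [
    {"PRED": "_yaku_n_rel", "ARG0": "x6"},
    {"PRED": "_tatsu_v_i_rel", "ARG0": "e2", "ARG1": "x4", "ARG2": "x6"},
]
# Hypothetical MRS where yaku is present but is not the ARG2 of tatsu.
unrelated = [
    {"PRED": "_yaku_n_rel", "ARG0": "x6"},
    {"PRED": "_tatsu_v_i_rel", "ARG0": "e2", "ARG1": "x4", "ARG2": "x9"},
]

ok = matches_np_v_idiom(idiomatic, "_tatsu_v_i_rel", "_yaku_n_rel")
bad = matches_np_v_idiom(unrelated, "_tatsu_v_i_rel", "_yaku_n_rel")
```

The second case shows why listing the PREDs alone is not enough: both relations are present, but the rule does not match because the argument is not shared.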

The Lexicon

Here is the lexical entry for tatsu_idiom.

tatsu_idiom := v2-c-stem-lex &
 [SYNSEM.LKEYS [KEYREL [PRED '_tatsu_v_i_rel]],
  ORTH <! "立つ" !>,
  IDIOM +].

Note that v2-c-stem-lex is the lexical type for a transitive verb that takes a dative argument, and that this lexical entry introduces [IDIOM +].

Configuration of Relevant Rules

The following changes configure the grammar so that [IDIOM +] is passed up from the lexicon into the syntax. Most of them are Jacy-dependent.

matrix.tdl

sign := basic-sign &
  [ SYNSEM synsem,
    ARGS list,
    INFLECTED bool,
    ROOT bool,
    IDIOM bool].

Note that the IDIOM feature is introduced here, on sign.

lex-rule := phrase-or-lexrule & word-or-lexrule &
  [ IDIOM #idiom,
    NEEDS-AFFIX bool,
    SYNSEM.LOCAL.CONT.RELS [ LIST #first,
                             LAST #last ],
    DTR #dtr & word-or-lexrule &
        [ SYNSEM.LOCAL.CONT.RELS [ LIST #first,
                                   LAST #middle ],
          ALTS #alts,
          IDIOM #idiom],
    C-CONT.RELS [ LIST #middle,
                  LAST #last ],
    ALTS #alts,
    ARGS < #dtr > ].

Note [IDIOM #idiom], which identifies the IDIOM value of the rule with that of its daughter.

basic-unary-phrase := phrase &
  [ STEM #stem,
    IDIOM #idiom,
    SYNSEM.LOCAL.CONT [ RELS [ LIST #first,
                               LAST #last ],
                        HCONS [ LIST #scfirst,
                                LAST #sclast ] ],
    C-CONT [ RELS [ LIST #middle,
                    LAST #last ],
             HCONS [ LIST #scmiddle,
                     LAST #sclast ] ],
    ARGS < sign & [ STEM #stem,
                    SYNSEM.LOCAL local &
                                 [ CONT [ RELS [ LIST #first,
                                                 LAST #middle ],
                                          HCONS [ LIST #scfirst,
                                                  LAST #scmiddle ] ] ],
                    ROOT -,
                    IDIOM #idiom] > ].

Note [IDIOM #idiom].

basic-binary-phrase := phrase &
  [ IDIOM #idiom,
    SYNSEM.LOCAL.CONT [ RELS [ LIST #first,
                               LAST #last ],
                        HCONS [ LIST #scfirst,
                                LAST #sclast ] ],
    C-CONT [ RELS [ LIST #middle2,
                    LAST #last ],
             HCONS [ LIST #scmiddle2,
                     LAST #sclast ] ],
    ARGS < sign & [ IDIOM #idiom,
                    SYNSEM.LOCAL local &
                                 [ CONT [ RELS [ LIST #first,
                                                 LAST #middle1 ],
                                          HCONS [ LIST #scfirst,
                                                  LAST #scmiddle1 ] ] ],
                    ROOT - ],
           sign & [ IDIOM #idiom,
                    SYNSEM.LOCAL local &
                                 [ CONT [ RELS [ LIST #middle1,
                                                 LAST #middle2 ],
                                          HCONS [ LIST #scmiddle1,
                                                  LAST #scmiddle2 ] ] ],
                    ROOT - ] > ].

Note [IDIOM #idiom], which is shared by the mother and both daughters.
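The effect of sharing one IDIOM value between a binary phrase and its daughters can be illustrated with a toy unification over bool values. This Python sketch is not part of the grammar. It assumes that IDIOM takes the values "+", "-", or the underspecified "bool", and that lexical entries other than idiom entries leave IDIOM underspecified, which this page does not state explicitly.

```python
# Toy model of how a shared IDIOM value behaves under unification.
# Assumption: IDIOM takes the values "+", "-", or the underspecified "bool".


def unify_bool(a, b):
    """Unify two bool values; return None when they clash."""
    if a == "bool":
        return b
    if b == "bool" or a == b:
        return a
    return None  # "+" and "-" are incompatible


def mother_idiom(left, right):
    """IDIOM value of a binary phrase whose daughters share #idiom."""
    return unify_bool(left, right)


# An underspecified daughter combined with tatsu_idiom yields [IDIOM +].
percolated = mother_idiom("bool", "+")
# Two underspecified daughters leave the mother underspecified.
unspecified = mother_idiom("bool", "bool")
# Daughters specified "+" and "-" cannot combine under this constraint.
clash = mother_idiom("+", "-")
```

Under these assumptions a single [IDIOM +] word is enough to mark every phrase that contains it, up to the root.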

fundamentals.tdl

lexical_sign-rule := lexical_sign & phrase-or-lexrule &
 [IDIOM #idiom,
  ARGS <[IDIOM #idiom]>].

rule-types.tdl

unary-type-super :=   phrasal_sign &
               [IDIOM #idiom,
                SYNSEM [LOCAL [CTXT [C-INDICES [ADDRESSEE #add,
                                                SPEAKER #sp],
                                     EMPATHY [EMPER #sp,
                                              EMPEE #emp]],
                               CONT [HOOK #hook,
                                     RELS [LIST #list,
                                            LAST #last],
                                     HCONS [LIST #sclist,
                                             LAST #sclast]]],
                        NON-LOCAL #nonloc],
                ORTH #stem,
               C-CONT mrs & [HOOK #hook,
                              RELS diff-list & [LIST #list,
                                                   LAST #middle],
                              HCONS diff-list & [LIST #sclist,
                                                    LAST #scmiddle]],
                ARGS <  [IDIOM #idiom,
                         SYNSEM [LOCAL [CTXT [C-INDICES [ADDRESSEE #add,
                                                                   SPEAKER #sp],
                                                           EMPATHY [EMPER #sp,
                                                                    EMPEE #emp]],
                                                     CONT [RELS [LIST #middle,
                                                                  LAST #last],
                                                           HCONS [LIST #scmiddle,
                                                                   LAST #sclast]]],
                                              NON-LOCAL #nonloc],
                                      ORTH #stem] >].

Note [IDIOM #idiom].

word2word-rule := j-sign & phrase-or-lexrule &
            [IDIOM #idiom,
             SYNSEM [LOCAL [CAT [HEAD #head],
                            BAR #bar,
                            CONT [HOOK #hook,
                                  RELS diff-list & [LIST #list,
                                                    LAST #last],
                                  HCONS [LIST #hclist,
                                          LAST #hclast]],
                            CTXT #ctxt],
                     NON-LOCAL #nonloc,
                     MODIFIED.PERIPH #per],
             ORTH #stem,
             INFLECTED +,
             J-NEEDS-AFFIX #aff,
             LMORPH-BIND-TYPE #lmorph,
             RMORPH-BIND-TYPE #rmorph,
             C-CONT [RELS [LIST #middle,
                            LAST #last],
                     HCONS [LIST #hcmiddle,
                              LAST #hclast]],
             ARGS <[IDIOM #idiom,
                    SYNSEM [LOCAL [CAT [HEAD #head],
                                   BAR #bar,
                                   CONT [HOOK #hook,
                                         RELS [LIST #list,
                                                LAST #middle],
                                         HCONS [LIST #hclist,
                                                 LAST #hcmiddle]],
                                   CTXT #ctxt],
                            MODIFIED.PERIPH #per,
                            NON-LOCAL #nonloc],
                    INFLECTED +,
                    ORTH #stem,
                    J-NEEDS-AFFIX #aff,
                    LMORPH-BIND-TYPE #lmorph,
                    RMORPH-BIND-TYPE #rmorph]>].

Note [IDIOM #idiom].

roots.tdl, script, user-fns.lsp, and globals.lsp

Copy and paste the following definitions into the corresponding files.

roots.tdl

; Used to determine on which candidate root edges to not apply the idiom checks
; (for efficiency)
root_non_idiom := sign &
  [ IDIOM - ].

script

(read-tdl-type-files-aux
 (list (lkb-pathname (parent-directory) "mtr.tdl")))

(mt:read-transfer-rules 
 (list
  (lkb-pathname (parent-directory) "idioms.mtr"))
 "Idiom Tests"
 :filter nil :task :idiom)

user-fns.lsp

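(defun idiom-complete-p (tdfs)
  ;; Extract the MRS from the feature structure and run it through the
  ;; transfer rules that were loaded for the :idiom task.
  (let* ((mrs (and (tdfs-p tdfs)
                   (mrs::extract-mrs-from-fs (tdfs-indef tdfs))))
         (transfers (and (mrs::psoa-p mrs)
                         (mt:transfer-mrs mrs :task :idiom))))
    ;; Succeed if at least one transfer output contains no idiom relation.
    ;; idiom-rel-p is not defined on this page.
    (loop
        for transfer in transfers
        for mrs = (mt::edge-mrs transfer)
        thereis (loop
                    for ep in (mrs:psoa-liszt mrs)
                    when (idiom-rel-p ep) return nil
                    finally (return t)))))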

(eval-when #+:ansi-eval-when (:load-toplevel :execute)
           #-:ansi-eval-when (load eval)
  (setf *additional-root-condition* #'idiom-complete-p))
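The logic of idiom-complete-p can be mirrored in a small Python model. This is a sketch under stated assumptions, not the LKB implementation: an MRS is reduced to a list of PRED strings, the transfer step is modelled as producing a single output, and the test for an idiom relation (a PRED ending in "_i_rel") stands in for idiom-rel-p, whose definition is not shown on this page.

```python
# Toy model of idiom-complete-p (not the LKB implementation).
# Assumptions: an MRS is a list of PRED strings, an idiom rule is the list of
# PREDs it consumes, and an "idiom relation" is any PRED ending in "_i_rel".

IDIOM_RULES = [["_tatsu_v_i_rel", "_yaku_n_rel"]]


def is_idiom_rel(pred):
    # Stand-in for idiom-rel-p, whose definition is not shown on this page.
    return pred.endswith("_i_rel")


def apply_rules(preds, rules=IDIOM_RULES):
    """Remove the PREDs of every rule whose input is fully present."""
    remaining = list(preds)
    for rule in rules:
        if all(p in remaining for p in rule):
            for p in rule:
                remaining.remove(p)  # OUTPUT.RELS is empty: input is consumed
    return remaining


def idiom_complete(preds):
    """True if no idiom relation survives after the idiom rules apply."""
    return not any(is_idiom_rel(p) for p in apply_rules(preds))


complete = idiom_complete(["_yaku_n_rel", "_tatsu_v_i_rel"])
incomplete = idiom_complete(["_butai_n_rel", "_tatsu_v_i_rel"])
```

Because the rules have an empty OUTPUT.RELS, a fully matched idiom leaves no idiom relation behind, whereas an unmatched "_tatsu_v_i_rel" survives and causes the root to be rejected.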

globals.lsp

(defparameter *non-idiom-root*
    'root_non_idiom )