
SaarlandMweDiscussion

FrancisBond edited this page Aug 1, 2013 · 1 revision

Discussion of MWEs, inspired by Ann's participation in PARSEME.

We have started a new page on MWEs (MweTop) to which we will link various relevant things.

What do we do now

  • things with spaces

    • some interfacing with morphology
  • things made into a single predicate

    • look up
  • things recognized as larger things with idiom matching

    • determiner-less PPs: in hospital

      • also occurs outside/slightly genericy
      • semi-lexicalized
    • idiom thingies: keep tabs on

      • some work at NTU/CSLI on possessive idioms

      • note: not marked as a unit in the MRS output
      • supported by LKB and ACE

    • different types of idiomaticity: detless_pp vs flexible idioms

    • we have paraphrase rules for many of these

      • but not perfect: out of your tiny mind
  • how does this interface with chart mapping/tokenization?

  • what about the idiomatic/non-idiomatic distinction?

    • we don't enforce it perfectly
  • we maybe have more examples of MWEs with structure than anyone else

    • although we don't have as many examples as e.g. in wordnet
  • SRG: words with spaces, verb+particle, idioms (e.g. take into account)

  • Matrix: no idioms (FCB: there is documentation on the wiki)

  • NorSource: not yet

  • Burger: some types for verb+complement

  • Jacy: all kinds, even documentation

  • Hegram: nothing

  • MCG: nothing

    • Chengyu (four character idioms)
      • treat them as non-compositional
      • NTU has a list of these with some more information (with help from Mike and Ning)
      • there are also non-Chengyu idioms
  • we can have both internal and external modification (for some idioms)

    • the cat kicked all nine buckets (Mike)
  • a lot of regional use

  • treat proverbial the same as fucking (can go anywhere)

  • in general adding MWEs adds ambiguity so we tend not to add them

    • if they help in parse-selection it would be worth putting them in

    • even very common things like Thank you and good morning
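As an illustration of the "things with spaces" strategy listed above, here is a minimal sketch of merging multiword entries into single tokens before parsing. It is not how the ERG, LKB or ACE implement this; the lexicon contents and the function name `merge_words_with_spaces` are hypothetical. It also commits greedily to the merged reading, whereas a grammar would normally keep both the merged and the word-by-word analysis, which is exactly the added ambiguity noted above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: token sequence -> name of a single lexical entry.
MWE_LEXICON: Dict[Tuple[str, ...], str] = {
    ("ad", "hoc"): "ad_hoc",
    ("of", "course"): "of_course",
    ("good", "morning"): "good_morning",
}


def merge_words_with_spaces(tokens: List[str]) -> List[str]:
    """Greedy longest-match merge of multiword lexicon entries."""
    max_len = max(len(key) for key in MWE_LEXICON)
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in MWE_LEXICON:
                merged.append(MWE_LEXICON[candidate])
                i += n
                break
        else:
            # No multiword entry starts here: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged
```

Calling `merge_words_with_spaces("Good morning , this is ad hoc".split())` turns the two multiword sequences into the single tokens `good_morning` and `ad_hoc` and leaves the other tokens untouched.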

Things we don't have an account for:

  • institutionalized phrases: traffic light/traffic signal

  • light verbs/light verby idioms: give a rat's arse [about]

  • proverbs: how do we handle these? (a stitch in time saves nine)

    • interesting cross-lingually
    • often contain frozen bits of older grammars
  • fixed foreign phrases (que sera sera)

    • interesting to see if there are differences in flexibility between old English vs foreign
  • NPIs are on the edge of this phenomenon

  • things like you may wish to -> you should (post-process)

  • if you like: currently words-with-spaces in ERG
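The post-processing idea mentioned above (you may wish to -> you should) could be sketched as a small table of string rewrite rules applied to the output text. This is only an assumption about what such a step might look like; the rule table `REWRITE_RULES` and the function `post_process` are hypothetical names, and the sketch works on surface strings rather than on MRS.

```python
import re
from typing import List, Tuple

# Hypothetical rule table: (pattern, replacement) pairs applied in order.
REWRITE_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\byou may wish to\b", re.IGNORECASE), "you should"),
]


def post_process(text: str) -> str:
    """Apply every rewrite rule to the text, in table order."""
    for pattern, replacement in REWRITE_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Note that the replacement is inserted verbatim, so a sentence-initial "You may wish to" comes out as lower-case "you should"; a fuller version would need to restore capitalization.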

Other projects

  • MWEs with structure in wordnets
  • Lots of work in Japan, e.g. on idiom/literal (Chikara)