MatrixTDB is the regression test facility for the Grammar Matrix and the Matrix customization system. It allows us to create gold standard tsdb++ profiles on demand for language types defined in choices files.

High Level Overview

There are three main things you might want to do with MatrixTDB: add new strings, add a language type, or extract a profile for a language type. The first two put data in; the third gets data out. Each of these high-level tasks breaks down into smaller sub-tasks. The breakdown into sub-tasks is displayed here, while the Detailed Processes section of this page breaks each of those down into smaller steps.

Adding New Strings Breakdown

Adding A Language Type Breakdown

Extracting a Profile Breakdown

Detailed Processes

This section describes step-by-step instructions on how to perform various tasks and sub-tasks with MatrixTDB.

Database Dump

If you're not sure of the effect of what you are about to do, you may want to make a dump of the database so that the data can be quickly restored if what you do doesn't go the way you want. To do this:

Note: This won't actually back up the data per se, but it will create a (very large) file full of SQL statements that can be used to restore the database to its state at the time of the dump.

Restore From Database Dump

If you want to revert the database to a previous point:

Create A Source Profile

Source profiles (sometimes also called 'original source profiles') are what are used to bring the big, hairy mrs semantics into the database. To create one:

Import A Source Profile

When you have an [incr_tsdb()] profile that was created by processing a flat file of items, you can use it to import a source profile into the database. To do so:

Add Permutes

At this point you will have imported a profile with its harvester strings. But a single harvester string gives rise to potentially millions of other possible strings with the same semantics. Specifically, each harvester string gives rise to seed strings, which are then permuted and added to the database as strings to be run through specific filters. (Earlier versions of MatrixTDB added all permutations, which were run through universal filters and then specific filters, but more recently only those string/semantic tag combos that pass all universal filters are added to the database.)

Seed strings are stored in a canonical form: words in alphabetic order, followed by prefixes in alphabetic order, followed by suffixes in alphabetic order. The permutations are then every possible permutation of the words, with every possible permutation of prefixes and suffixes on every word in every one of those permutations. Seed strings are generated from harvester strings by the stringmods in stringmod.py.

Here is how to generate all the permutations for an imported original source profile:
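
As a rough illustration of the canonical form and the permutation scheme described in this section (it is not the procedure MatrixTDB itself runs, and not a substitute for the steps above), the idea can be sketched in a few lines of Python. The function names, the hyphen notation for attaching affixes, and the example tokens n1, iv, and neg are all assumptions made for this sketch.

```python
from itertools import permutations, product

def canonical_form(words, prefixes, suffixes):
    """Words, then prefixes, then suffixes, each group in alphabetic order."""
    return " ".join(sorted(words) + sorted(prefixes) + sorted(suffixes))

def permute_seed(words, prefixes, suffixes):
    """Enumerate every word order with every placement of every affix.

    Assumption: a prefix is written 'affix-word' and a suffix 'word-affix'.
    """
    affixes = [("prefix", a) for a in prefixes] + [("suffix", a) for a in suffixes]
    results = set()
    for order in permutations(words):
        # The order in which affixes are attached matters when two affixes
        # of the same kind land on the same word, so try every order.
        for affix_order in permutations(affixes):
            # Each affix may attach to any word position.
            for hosts in product(range(len(order)), repeat=len(affixes)):
                tokens = list(order)
                for (kind, affix), host in zip(affix_order, hosts):
                    if kind == "prefix":
                        tokens[host] = affix + "-" + tokens[host]
                    else:
                        tokens[host] = tokens[host] + "-" + affix
                results.add(" ".join(tokens))
    return results

seed = canonical_form(["n1", "iv"], ["neg"], [])
strings = permute_seed(["n1", "iv"], ["neg"], [])
```

Even in this toy version the number of strings grows factorially with the number of words and affixes, which is why only the permutations that pass all universal filters are stored.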

Run Specific Filters

At this point you will have inserted every permutation that passes all universal filters for your source profile into the item_tsdb, parse, and result tables. Next we need to record how the string/mrs combos fare when run through specific filters, so that we can generate profiles for language types based on the results of those runs. Here's how:

Import A Language Type

Generate A Profile

Evaluate A Generated Profile In [incr_tsdb()]

Write A Stringmod

In stringmod.py, the subclasses of the StringMod class define ways in which a harvester string can be modified to create a seed string. All the stringmod instantiations that are applied to harvester strings are defined in the string_mods list near the end of the file. To create a new stringmod, you need to instantiate a new StringMod subclass and put it in that string_mods list. Note that every possible combination of the mods in the string_mods list is run to create seeds from harvester strings. For example, there is a mod that adds the word p-nom and another that adds the word p-acc. If these were the only two modifications, each harvester string would result in four seed strings: one with p-nom, one with p-acc, one with both, and one with neither.
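
The four-seed example can be reproduced with a minimal sketch that applies every subset of a list of mods. The helper names add_word and seeds_from_harvester, and the harvester string "n1 iv", are hypothetical and stand in for whatever stringmod.py actually does.

```python
from itertools import combinations

def add_word(word):
    """Return a hypothetical mod that adds `word` to a list of words."""
    def mod(words):
        return words + [word]
    return mod

# Stand-in for the string_mods list; the real list holds StringMod instances.
string_mods = [add_word("p-nom"), add_word("p-acc")]

def seeds_from_harvester(harvester, mods):
    """Apply every subset of `mods`, including the empty one, to `harvester`."""
    seeds = set()
    for size in range(len(mods) + 1):
        for subset in combinations(mods, size):
            words = harvester.split()
            for mod in subset:
                words = mod(words)
            # Words are kept in alphabetic order, as in the canonical form.
            seeds.add(" ".join(sorted(words)))
    return seeds

seeds = seeds_from_harvester("n1 iv", string_mods)
```

With n mods in the list there are 2^n subsets, so each new mod can double the number of seed strings per harvester string.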

To create a new instance of a StringMod subclass, you need to determine the subclass to instantiate as well as how to set the member values. Here is an overview of the subclasses:

StringModAddWord - adds a word to the string

StringModAddAff - adds an affix to the string (both as a prefix and, in another string, as a suffix)

StringModDropWord - removes a word from a string if present

StringModChangeWord - changes a word in a string to another word. Not currently used.

Once you've chosen the subclass to instantiate, call its constructor in the string_mods list, setting its arguments as appropriate:

mrs_id_list - a list of mrs_tags to which this stringmod applies. Common groupings of these tags are defined in g.py.

word1 - for AddWord, AddAff, or DropWord, this is the word to add, the affix to add, or the word to drop. For ChangeWord it is the word to change.

word2 - for ChangeWord only, this is the word to replace word1 with
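
To make the constructor arguments concrete, here is a minimal sketch using stand-in classes. Only the class names and the argument names mrs_id_list, word1, and word2 come from the description above; the argument order, the modify and applies_to methods, and the tags example_tag_1 and example_tag_2 are assumptions, and the real definitions in stringmod.py may differ.

```python
class StringMod:
    """Hypothetical stand-in for the base class in stringmod.py."""
    def __init__(self, mrs_id_list, word1, word2=None):
        self.mrs_id_list = mrs_id_list  # mrs tags this mod applies to
        self.word1 = word1              # word or affix to add, drop, or change
        self.word2 = word2              # replacement word (ChangeWord only)

    def applies_to(self, mrs_tag):
        return mrs_tag in self.mrs_id_list

class StringModAddWord(StringMod):
    def modify(self, words):
        return words + [self.word1]

class StringModDropWord(StringMod):
    def modify(self, words):
        # Removing a word that is absent leaves the list unchanged.
        return [w for w in words if w != self.word1]

class StringModChangeWord(StringMod):
    def modify(self, words):
        return [self.word2 if w == self.word1 else w for w in words]

# Made-up tags; real groupings of mrs tags are defined in g.py.
example_tags = ["example_tag_1", "example_tag_2"]

string_mods = [
    StringModAddWord(example_tags, "p-nom"),
    StringModDropWord(example_tags, "det"),
]
```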

Write A Filter

When you write a filter, first consider whether it is a universal filter or a specific filter. A universal filter is something that can rule out a string regardless of language type. A specific filter is something that can rule out a string but is dependent on language type. For example, a specific filter might rule out any sentence where the subject or object follows the verb in an sov language type.
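
The sov example can be sketched as a function that only gives a verdict when the language type has the relevant setting. Everything here is illustrative: the feature name word-order, the dictionary representation of a language type, and the tokens n1, n2 (subject, object) and tv (verb) are assumptions, not the format MatrixTDB actually uses.

```python
import re

def sov_specific_filter(string, language_type):
    """Return 'pass' or 'fail', or None when the filter is not relevant."""
    # A specific filter is only relevant to language types with this setting.
    if language_type.get("word-order") != "sov":
        return None
    # Fail if a subject or object appears anywhere after the verb.
    if re.search(r"\btv\b.*\bn[12]\b", string):
        return "fail"
    return "pass"

sov = {"word-order": "sov"}
svo = {"word-order": "svo"}
verdicts = [
    sov_specific_filter("n1 n2 tv", sov),
    sov_specific_filter("n1 tv n2", sov),
    sov_specific_filter("n1 tv n2", svo),
]
```

A universal filter would differ only in that it takes no language type and always gives a verdict.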

To create a new filter, you need to create a new instance of a Filter subclass in either u_filters.filter_list or s_filters.filter_list.

To create a new instance of a Filter subclass, you need to determine the subclass to instantiate as well as how to set the member values. Here is an overview of the subclasses and how they interact with their re1 and re2 regular expression members. These classes are defined in filters.py:

FalseFilter - always returns fail

MatchFilter - passes strings that contain re1, fails all others

NotMatchFilter - fails strings that contain re1, passes all others

IfFilter - fails strings that contain re1 but not re2, passes all others

IfNotFilter - fails strings that contain both re1 and re2, passes all others

OrFilter - passes strings that contain either re1, re2, or both, fails all others

Other existing subclasses have been deprecated due to redundancies and should not be used, though more may be created later.
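
The pass/fail behaviour of these subclasses can be summarised as plain functions. This is a sketch of the logic only, not the code in filters.py: it assumes that "contains re1" means re.search finds a match anywhere in the string, and it uses True for pass and False for fail.

```python
import re

def contains(pattern, string):
    """True if the regular expression matches anywhere in the string."""
    return re.search(pattern, string) is not None

def false_filter(string):
    return False  # always fails

def match_filter(string, re1):
    return contains(re1, string)

def not_match_filter(string, re1):
    return not contains(re1, string)

def if_filter(string, re1, re2):
    # Fails only when re1 is present and re2 is absent.
    return not (contains(re1, string) and not contains(re2, string))

def if_not_filter(string, re1, re2):
    # Fails only when both re1 and re2 are present.
    return not (contains(re1, string) and contains(re2, string))

def or_filter(string, re1, re2):
    # Passes when at least one of re1, re2 is present.
    return contains(re1, string) or contains(re2, string)
```

Seen this way, IfFilter is logical implication (re1 implies re2) and IfNotFilter is mutual exclusion, which may help when deciding which subclass fits a new constraint.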

After setting the Filter subclass and the regular expression members re1 and re2, if appropriate, set the other members of the Filter:

name - Just make sure it's unique among all filters

mrs_id_list - This is a list of the mrs tags that this filter should apply to. For example, a filter that ensures that neg appears as a prefix or a suffix if inflectional negation is obligatory would only apply to sentences with negative semantics. You can either create your own list of strings that are mrs tags or use one of the many lists created in g.py.

comment - Describe what the Filter does

fv - Only relevant to specific filters. The 'fv' stands for 'feature/value' and this is a formatted list of features and values that have to be set in a language type for a filter to be relevant to that language type. It has many aspects to it:

Definitions

item - Almost always used to mean a string representing a sentence. Sometimes used specifically to mean the item file in a profile or its MatrixTDB table counterpart, item_tsdb.

item/parse/result - A phrase used to mean a pairing of a sentence and its meaning

profile - Refers to a [incr_tsdb()] profile

result - Has two meanings. The first relates to the result file in a [incr_tsdb()] profile and means the string/mrs pair that is a sentence and its meaning. The second is the outcome (e.g., pass, fail) of running a result in the first sense through a filter.

source profile - Specifically refers to a profile that was created by processing a list of items in order to get the mrs semantics into MatrixTDB

MatrixTDBProcedures (last edited 2011-10-08 21:12:16 by localhost)
