TSQL: Test Suite Query Language
TSQL is generally used for selecting rows or columns of data from test suites themselves, although it also has faculties for inspecting or setting [incr tsdb()] variables and for inserting data into test suites. Some examples of select queries include:
Find the item IDs and grammatical sentences that didn't parse:
select i-id i-input where i-wf = 1 and readings = 0
Pair input sentences with their MRS outputs:
select i-input mrs
Find sentences with fewer than 10 words:
select i-input where i-length < 10
1 Query := ( Info | Set | Retrieve | Insert ) ( '.' | $ ) 2 Info := 'info' ( 'all' | 'relations' | Relation | TsdbConstant | TsdbVariable ) 3 Set := 'set' TsdbVariable ( Integer | String | ':on' | ':off' ) 4 Retrieve := ( 'retrieve' | 'select' ) SelectBody 5 SelectBody := ( Attribute+ | '*' ) 6 [ 'from' Relation+ ] 7 [ 'where' Disjunction ] 8 [ 'report' FormatString ] 9 Insert := 'insert' 'into' Relation [ Attribute+ ] 10 'values' ( Integer | String | DateTime )+ 11 12 TsdbConstant := 'home' 13 | 'tsdb_home' 14 | 'relations-file' 15 | 'tsdb_relations_file' 16 | 'data-path' 17 | 'tsdb_data_path' 18 TsdbVariable := 'result-path' 19 | 'tsdb_result_path' 20 | 'result-prefix' 21 | 'tsdb_result_prefix' 22 | 'max-results' 23 | 'tsdb_max_results' 24 | 'uniquely-project' 25 | 'tsdb_uniquely_project' 26 27 Relation := Identifier 28 Attribute := Identifier 29 30 Disjunction := Conjunction ( '|' | '||' | 'or' Conjunction )* 31 Conjunction := Condition ( '&' | '&&' | 'and' Condition)* 32 Condition := Attribute ( '=' | '==' | '!=' | '~' | '!~' ) String 33 | Attribute ( '=' | '==' | '!=' | '<' | '>' | '<=' | '>=' ) ( Integer | DateTime ) 34 | ( '!' | 'not' ) Condition 35 | '(' Disjunction ')' 36 37 FormatString := String 38 39 Integer := /[+-]?[0-9]+/ 40 Digit := /[0-9]+] 41 42 String := /"([^"\\]|\\.)*"/" 43 | /'([^'\\]|\\.)*'/ 44 45 DateTime := Date [ ( Time | '(' Time ')' ) ] 46 | [ ':' ] 'today' 47 | [ ':' ] 'now' 48 Date := FullYear Month [ '-' Day ] 49 | [ Day '-' ] Month '-' Year 50 FullYear := Digit Digit Digit Digit 51 Year := [ Digit Digit ] Digit Digit 52 Month := [ Digit ] Digit 53 | MonthName 54 MonthName := 'jan' | 'feb' | 'mar' | 'apr' | 'may' | 'jun' 55 | 'jul' | 'aug' | 'sep' | 'oct' | 'nov' | 'dec' 56 Day := [ Digit ] Digit 57 Time := Digit Digit ':' Digit Digit [ ':' Digit Digit ] 58 59 Identifier := Character ( Character | Digit | '-' | '_' )+ 60 Character := /[a-zA-Z]+/
Notes on the Syntax
Different publications and implementations have some differences in the syntax. Some of those differences are listed here, excepting obvious bugs in the BNFs, as well as some general notes about the query language.
Some literals in the syntax (e.g., select, info, or from) are case-insensitive, but relation (table) names, attributes (column names), and some other literals (e.g., all, relations, tsdb_home, etc.) are case-sensitive. The names of months in dates are case-insensitive.
The TSNLP version of TSQL does not define or accept YYYY-MM-DD dates, nor does it define MonthName forms (although it appears to accept them). Note that MonthName may be locale-dependent.
FormatString does not appear to be defined or even described anywhere in prior literature. From my experiments, it seems to be a printf string with format specifiers (%s, %d, or %i) used to insert column values (though it does not appear to matter which is used; e.g., %d could be used for a :string value, etc.). For instance, a query like:
select i-id i-input i-date
...could use up to three format specifiers (any more may trigger an error), such as:
If fewer than three format specifiers are used, the remaining data is output in [incr tsdb()] table format.
Order of Operations in Conditions
The TSNLP definition of TSQL syntax does not specify an order of operations for boolean operations on conditions. In the BNF above, I specifically scoped disjunctions over conjunctions, which seems to follow the behaviour of LOGON's tsdb -query utility. Thus, the following query would match items with i-id = 10 whether or not its i-input matched [Dd]og:
select i-input where i-id = 10 or i-id = 20 and i-input ~ "[Dd]og"
Parentheses can be used to override this order:
select i-input where (i-id = 10 or i-id = 20) and i-input ~ "[Dd]og"
The PyDelphin implementation of TSQL as of 2018-10-05 only includes select queries without report clauses. It differs from the LOGON tsdb -query utility in some ways:
select * requires a from clause
select * from item result does not also include columns from parse (the "intervening" table joined in order to join item and result)
select * from item where readings > 0 does not include columns from parse (where readings is defined)
PyDelphin also adds a couple features:
Attributes in the projection (column list) and where clause of the query may optionally use qualified names to specify both the table and column (e.g., result.parse-id)
Multiple where clauses may be used which act as a conjunction scoped over disjunctions. Thus, the following is the same as the parenthesized form in the "Order of Operations in Conditions" section above.
select i-input where i-id = 10 or i-id = 20 where i-input ~ "[Dd]og"
Multiple where clauses are useful for assembling queries. A user may add an additional global constraint by appending a new where clause without having to worry about altering the order of operations of existing conditions.
1. adding support for left/right/inner joins. See https://github.com/delph-in/pydelphin/issues/321.