Week 23

Monitoring

Milen attended a USIT-internal workshop on monitoring and came back with great ideas for how to set up alerts when critical LAP services are in jeopardy; we still hope to see at least an initial set of monitors implemented before the summer break.

Job Resources

Milen & oe just put the new scheme for per-tool job defaults and custom resource specifications into production. The USIT-specific drmaa_usit.py now resides outside of the Galaxy tree, and the resource specifications from tools/resources.xml and tools/jobs.xml appear to be taking effect. One way of testing these is via the new ‘driver’ tool in the ‘test’ category, which simply invokes the top-level LAP driver and makes it report the run-time environment, e.g. LAP and SLURM environment variables.

OBT Action Plan

We (milen and eman) have met and decided on an action plan for investigating the errors in OBT caused by malformed and unusual input. The goal of this plan is to pinpoint failures in one or more of the following components:

1) MTag 2) MTag wrapper 3) Rule Based Disambiguator 4) Rule Based Disambiguator wrapper 5) Statistical Disambiguator 6) Statistical Disambiguator wrapper.

To do so we have decided to use the Storting data and parts of the AVIS corpus as test data. We have also discussed the possibility of using Norwegian Wikipedia as a source of more unusual text input, but decided to postpone this step until all of the test infrastructure is in place and we have processed the other corpora. The action plan consists of the following steps:

1) Export to CG3 the input and output of each of the aforementioned components. 2) Log the output and error messages corresponding to each step. 3) Customize the lap script to produce more verbose output of relevant information.
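For step 2, a minimal sketch of the intended per-step logging could look as follows; the component commands are placeholders, not the real OBT invocations, and each step's output is simply fed to the next step:

import subprocess

# Placeholder commands for the components under test (see the list above).
COMPONENTS = [
    ("mtag", ["mtag"]),
    ("rule_disambiguator", ["rule-disambiguator"]),
    ("statistical_disambiguator", ["statistical-disambiguator"]),
]

def run_and_log(name, command, input_path):
    # Capture stdout and stderr of one component in per-step log files.
    with open(input_path, "rb") as stdin, \
         open(name + ".out", "wb") as out, \
         open(name + ".err", "wb") as err:
        return subprocess.call(command, stdin=stdin, stdout=out, stderr=err)

def debug_pipeline(input_path):
    current = input_path
    for name, command in COMPONENTS:
        status = run_and_log(name, command, current)
        print("{0}: exit status {1}".format(name, status))
        current = name + ".out"  # feed each component's output to the next one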

Week 24

Eman and Milen worked jointly on the OBT debugging sketched in week 23. We ran a full OBT workflow on some 15,000 input files from the Storting dataset and were able to find a new feature of OBT: for closing parentheses at the end of a sentence, the POS tag occupies the slot usually reserved for features, tricking LAP into thinking that the end-of-sentence symbol "<<<" is a POS and breaking sentence / token alignment.

LAP is now patched to work with this feature, with several improvements both in the OBT wrappers and the library (see commit messages for details).

We also started discussing avenues for overcoming the single-MongoDB bottleneck in LAP/Galaxy proper. For our backdoor LAP runs, we have now decided to launch N MongoDB instances (for instance 10) to improve performance. Milen is launching a new batch of jobs as I write this; we are hopeful we won't discover new OBT features on Monday.
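A rough sketch of the idea, assuming each backdoor run talks to its own standalone mongod on a dedicated port and data directory (this is not our deployment script; paths and ports are made up):

import os
import subprocess

def start_mongods(n=10, base_port=27020, base_dir="/tmp/lap-mongo"):
    # Start n independent mongod instances, each with its own port and dbpath.
    procs = []
    for i in range(n):
        dbpath = os.path.join(base_dir, "db{0}".format(i))
        os.makedirs(dbpath)  # assumes the directories do not exist yet
        procs.append(subprocess.Popen(
            ["mongod", "--port", str(base_port + i), "--dbpath", dbpath]))
    return procs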

Week 25

During this week the efforts were concentrated on three lines of work: 1) OBT debugging; 2) implementing monitoring tools; 3) visualization.

OBT Action Plan

We have finished the OBT testing and concluded the following:

1) OBT does not correctly report the EOS tag when the POS tag is missing. A bug fix for this problem was implemented and tested; it will not be committed to SVN until the other problem is fixed as well.

2) OBT stumbles when processing the symbol U+0085. A fix for this problem is not yet implemented; Milen and Stephan decided that we should replace it with the CR symbol before giving the text to OBT as input.
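A minimal sketch of the agreed pre-processing step (replacing U+0085, the NEL character, with a carriage return before the text reaches OBT):

def sanitize_for_obt(text):
    # Replace the NEL character (U+0085) with CR, as agreed, before handing the text to OBT.
    return text.replace(u"\u0085", u"\r")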

Overall we are satisfied with OBT's robustness. The first bug was the cause of most of the errors in the Storting corpus: on 15,000 examples it appeared around 300 times, mainly due to the nature of the text. The second appeared seldom, only 4 times out of 200,000 examples.

Monitoring Services

With the assistance of the USIT-gid group we have managed to set up the monitoring infrastructure on at.hpc and lap-test.hpc.

Milen has created a template called "Monitor LAP" for monitoring a LAP installation. Inside this template we have created an item called "Galaxy Server" that reports the presence of the Galaxy service by requesting output on port 8080 of the corresponding machine. Accordingly, a trigger "Galaxy Status" was set to raise an error in case the item reports that the server is not responding, and an email notification "Galaxy Server is Down" is sent in case of failures. The whole procedure was tested on lap-test. The work here proceeds according to the plan set in the Monitoring task.
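Conceptually, the "Galaxy Server" item boils down to a check like the following sketch (the host name is an example, and the real check is performed by Zabbix, not by this script):

import socket

def galaxy_is_up(host="lap-test.hpc", port=8080, timeout=5):
    # The item passes if something accepts a TCP connection on the Galaxy port.
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except (socket.error, OSError):
        return False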

Visualization

The visualization of receipts has moved forward.

1) We have integrated brat in the Galaxy infrastructure

2) We have created a template page file that resides in the library/python/lap/visualization/ folder.

3) We have ported various CSS style scripts from the WESearch infrastructure.

4) We have visualized the morphology using BRAT.

5) We have decided to use CONLL-X style brat visualization for the morphology and dependency layers.

6) We have decided to use a similar style for creating the CONLL-X export of the structures in the Mongo DB.

7) Fixed various bugs in lafdb.py. The bugs were in legacy code and did not influence LAP performance.

The action plan for this task was created in the RT thread Receipt Visualization.

Week 26-27

Milen was still on vacation. I assume less effort was dedicated to the project during this time.

Week 28

Stephan and Eman on vacation. The graphical representation of the tickets was completed during this period. Though still up for testing, most of the code was finished during this week. Some considerations:

If we consider this version a good approximation we can push it to production.

Week 29

During this week Milen worked on service monitoring, a diff and some code review. Stephan and Eman on vacation.

Service Monitoring

We have implemented the following monitoring items: 1) Apache; 2) Galaxy; 3) Gold; 4) Memcached (Memcookie); 5) Mongo; and 6) the mount of /projects.

The first five are checked by requesting their presence on their assigned ports. I have thought about some functional tests and started implementing them. The mount test is done by accessing the following file:

/projects/lap/production/galaxy/database/info.txt
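In other words, the mount check amounts to something like the following sketch (the real check runs inside Zabbix; this is only an illustration):

def projects_is_mounted(path="/projects/lap/production/galaxy/database/info.txt"):
    # The check passes if the marker file on the mounted file system can be opened and read.
    try:
        with open(path) as handle:
            handle.read()
        return True
    except (IOError, OSError):
        return False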

I have investigated the issue of automatically restarting the services by sending commands to the server. This is standard Zabbix functionality, but it is disabled by default by the Zabbix administrators for security reasons; when I tried to enable it on lap-test, the admin daemon turned it off. Overall I consider this a good safeguard: if a person gains access to the web interface, they can remotely control the server through this functionality. If we decide that it is a feature worth having, I will explore other options.

Diff

I have made another diff between the version of Galaxy on at.hpc and vanilla Galaxy. The details are in the email with the subject "Diff of Changes in Galaxy".

AT.HPC Reboot

On Wednesday 20/07 I executed a reboot of at.hpc to verify the fitness of our service scripts. The reboot went smoothly; all the services came back online without problems.

Code Consolidation

I have looked at the task of unifying the code of the export (tsv.py) and the visualization (receipt.py). Though they essentially perform a similar task, I found that sufficient differences exist not to merge them into common code for now. The main differences are: 1) visualization uses CONLL-U, the export CONLL-X; 2) handling of the POS tags; 3) handling of the head column (CONLL-X uses _, CONLL-U uses 0); 4) stopping visualization after the 5th structure. The unification can happen, but it would complicate the code too much at this time. This will change once the final version of the visualization is in place.

Week 30

Stephan and Eman on vacation. No major events this week. Milen has done some code review and finally fixed the issue with OBT and the malicious character U+0085. The USIT-specific Galaxy code is now under git. Note to self: I must grant Stephan access to it.

Nikolay was installing Lifeportal with Galaxy 16.01 on the new machine and I witnessed a bit of his struggle with it. Some notes:

* The 10.110.0.* network is not required for access to Abel; what you need is for the host to see nilshenrik (nh.hpc).
* Galaxy 16.01 was a major release with a lot of changes and a lot of problems (Python path, new DRMAA implementation). I would stay away from it in the future; if we upgrade Galaxy, I would go for the later releases from this year.

On the monitoring side, I have made some functional tests (not only checking whether the service is up, but whether it responds correctly) that will be added to the others.

Week 31

Stephan and Eman are back from vacation. Milen has done some code review. The production version of the system was patched with the OBT changes. Some coding changes were made to the receipt visualization. A web request check was added to the monitoring interface. The week was full of discussions about what to do:

*) We have decided to retire the development server ps.hpc (old production) and, once the new production comes online in September, replace it with a new installation.
*) In the meantime, ps.hpc will be used by Eman and Milen for tests aimed at understanding the performance limitations of collections.
*) A basic review of the visualization was performed. It was decided to visualize everything with brat except sentences (maybe). No decision was taken on what to visualize instead of POS when tokens are the only annotation. When lemmas are available and POS tags are not, the lemmas will be visualized.

Week 32

Emanuele has been working on language identification.

Milen has worked on the graphical representation:

Milen has also started studying the resa aligning code following a performance issue reported by Stephan in: RT Ticket

Week 33

Milen has continued his work on visualization. The new version of brat with per-element style visualization is now integrated. The performance issue of resa that we mentioned last week was investigated at the beginning of this week and postponed as not a priority.

eman:

Week 34

eman:

milen:

Milen has completed the visualization (except the dependency relations). The new visualization is based on traversing the LAF graph. The procedure is the following:

A day was wasted in discovering the bug mentioned earlier by Eman, but I can safely declare that I now have a full understanding of the LAF objects.

Week 35

Eman:

Milen:

Week 36

This week the LAP group had a visitor: Richard Eckart de Castilho, who presented the world of DKPro to us. DKPro is a community of projects focusing on re-usable Natural Language Processing software. We had a two-day meeting during which we discussed cooperation between LAP and DKPro. We established the following:

Week 37 & 38

This week our effort was focused on two major tasks:

1) Preparation for public launch of the LAP portal. 2) Integration of DKPro in LAP.

The work is summarized as follows:

Eman:

Week 39

This week we worked on the following activities:

Week 40

This week we worked on the following activities:

Week 41

Milen completed the integration of LXF into the DKPro codebase:

Milen has also worked on the RT Ticket. The following language identifier was used and produced reasonable results:

de.tudarmstadt.ukp.dkpro.core.langdetect.LangDetectLanguageIdentifier

Week 42

Milen has created a presentation for the Gruppe for datafangst og samlingsforvaltning (DS). Their interest was in our usage of Mongo inside LAP. Milen has also downloaded and started studying the metawrapper provided by Richard.

Week 43

Milen has worked on testing additional DKPro tools and investigated their representation. He created a test for StanfordLemmatizer in the LAP DKPro script and fixed a bug in the DKPRO2LXF representation in LAP. Milen has started studying Groovy, a JVM-based scripting language used by Richard in the dkpro-meta project.

Week 44

Milen has worked on the meta representation, creating the first dump of the metadata for DKPro tools using the dkpro-meta project provided by Richard.

Eman: * Design and implementation of a word cloud-generator tool

Week 45

Milen has worked on the meta representation. Some testing of the new export tool was done; conclusions:

Stephan and Milen met and discussed the metadata representation of DKPro and decided that we will temporarily use Milen's solution for handling dependencies:

Week 46

Milen created a new dump of the metadata and sent a report to Richard (Richard responded on Sunday).

Mongo performance. Milen looked at Mongo performance in the current version. The default Mongo profiler was used. It works in the following way:

1) The profiler is activated on the database. 2) It logs all queries that took longer than 100 milliseconds to execute.
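For reference, turning the profiler on and reading back the slow operations can be done from pymongo roughly as follows (the database name is an example):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["lap"]  # example database name

# Profiling level 1: record only operations slower than slowms (here 100 ms).
db.command("profile", 1, slowms=100)

# Slow operations end up in the capped system.profile collection.
for entry in db["system.profile"].find().sort("millis", -1).limit(10):
    print("{0} {1} {2}ms".format(entry.get("op"), entry.get("ns"), entry.get("millis")))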

Milen has exported the 277,625-token corpus with the profiler on, using lap-test. The conclusions are the following:

Few queries went above this threshold. I would attribute their slowness more to garbage collection than to genuine performance problems. A new variant of the export tool should be created that does not query the database so often; Milen is preparing a new version of the export tool that loads all the data into memory.

Recurrent dependency analysis. Milen has investigated the recurrent analysis and created a small patch to the receipt visualization to enable it. A working version is on lap-test. The DKPro interface delivered the analysis correctly, and lafdb did not have problems storing it. Test sentence:

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.

A new export format is required for such analyses, as discussed the previous week with Stephan.

Milen has started studying the metadata in DKPro but at the moment is rather confused by it. He will try to formulate a decent set of questions for Richard today.

Milen has fixed a bug in the drmaa_usit script that did not correctly handle user requests for resources above the allowed limit. We should probably make this limit visible in some way.

Week 47

Milen has worked on the DKPro metadata. Based on the discussion with Stephan from week 45, the tools were separated into directories with the corresponding jars.

Milen also worked on the TSV export. Milen created a common graph traversal API, which is used by both the visualization interface and the new headless TSV export. Two implementations of this interface were added: 1) direct Mongo access using queries; 2) memory access, where all the data from the receipt is loaded into memory and stored in hashes. Preliminary but inconclusive performance tests were made.
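The real classes live in the LAP library; the following is only an illustrative sketch of the two backends behind the common traversal API (class, collection and field names are assumptions):

class MongoBackend(object):
    """Fetch nodes on demand with individual MongoDB queries."""

    def __init__(self, db, receipt_id):
        self.nodes = db["nodes"]          # assumed collection name
        self.receipt_id = receipt_id

    def node(self, node_id):
        return self.nodes.find_one({"receipt": self.receipt_id, "_id": node_id})


class MemoryBackend(object):
    """Load every node of the receipt once, then answer lookups from a dict."""

    def __init__(self, db, receipt_id):
        self.cache = {doc["_id"]: doc
                      for doc in db["nodes"].find({"receipt": self.receipt_id})}

    def node(self, node_id):
        return self.cache.get(node_id)

# Example wiring (connection details are placeholders):
#   db = pymongo.MongoClient("localhost", 27017)["lap"]
#   backend = MemoryBackend(db, receipt_id)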

Week 48

Milen has studied in more detail how models are created in DKPro. Milen has created metadata and a model for Norwegian in UDPipe. The change was committed, together with some bug fixes, to the DKPro git and approved by Richard. Milen started building a compile script for DKPro tools into the LAP tree.

Week 49

Milen has finished the compile script for DKPro tools. Milen has prepared and submitted a pull request to DKPro with bug fixes to the DKPro build scripts for the TreeTagger models.

Week 50

Milen has worked on some refactoring of the UDPipe model and on updates to the compile script to accommodate the recent changes in dkpro-meta.

============================================================================================================================================================

Happy 2017

============================================================================================================================================================

Week 3

Milen has worked on the DKPro metadata. An extension was made that associates a model with more than one class. This increased the number of correctly recognized tool/model pairs.

With the updated meta scripts, Milen has managed to make the link between dkpro_splitter-tokenizer and the DKPro tools. The prototype is running on lap-test. The compile script runs smoothly, though a download of the Groovy binary package is needed, as the one provided by RedHat 6 did not behave in the expected manner and Milen did not manage to make it behave. Milen will study the ways of Maven-Groovy interaction in order to find a way to avoid the RH6 Groovy.

The new Notur project N9452k for LAP is configured on lap-test. FYI, the lap user is given access to the project based on a set of hack scripts that make an exception for it. The procedure is similar for the galaxy user (the Lifeportal user), but the galaxy user is configured using a somewhat more complicated set of scripts, as it runs some of the user projects.

Milen has also investigated the trans object available for visualization of receipts. The trans object, as pointed out by Stephan, contains various data (I assume even all of it). The names of the tools corresponding to tool_ids were extracted and updated in the visualization.

Week 4 and Week 5

Fixed issues in the DKPro scripts and worked on the issues described in the previous weeks.

Week 6

There has been a new version of the database schema in the making, developed by eman in a private branch. We would like to merge these changes back and put them into production during the scheduled Abel maintenance on February 27 and 28.

* Eman has identified the necessary change in calling PyMongo and merged it into the trunk.
* Milen has upgraded MongoDB on ‘lap-test’ to the WiredTiger storage engine.
* Eman completed library revisions and updates to all tool wrappers in the branch.
* Milen has made the changes required to visualization and exporting.
* All changes are put together in a version running on lap-test.
* Consider development status on a separate ticket for receipt revisions.

Week 7

Milen has worked on the DKPro integration. More than 30 new tools were installed and tested on lap-test. The following issues came into view:

1) TreeTagger is not working because the kernel in RHEL6 is too old. Maybe I can find an older version or recompile the TreeTagger binary if I find a source.

2) Tools for the following languages are used:

en no sv fi da de es it bg nl el hu ru sh pl pt ro cs hr sa tr sk

3) Tools like the UDPipe POS tagger provide both coarse tags and simple tags. At the moment we do not use the coarse tags. I assume some of this is needed for the parser, although it worked without it. I propose to add the extra annotation to the morphology annotation.

4) ClearNlpParser and the UDPipe parser require that the tagger has produced lemmas as well. At the moment the data that we pass to DKPro is the morphology node; if the lemma is in a separate node, it will not be passed. I know from the metadata that these tools require lemmas, so I can make the wrappers with special treatment for these two parsers. We must discuss.

5) What are we going to do with the UdpipeSegmenter? In DKPro it is not finished, as it does not produce offsets. Should I try to implement some aligner in DKPro? The tool is assigned to me by Richard, so we are supposed to propose a solution sooner or later.

Week 8

A preliminary version of the new receipt was implemented by Milen. It was a bit pointless, as we later decided to change it, so ... anyway. Some improvements were made to the tools from the previous days, mainly bug testing. A new version of the export tool was officially installed on the production server; the new version does not require head annotation on the dependency nodes and shares the same library with the visualization interface. Milen, Stephan and Emanuele met. It was decided to overhaul the receipts to support multiple annotations from the same tool. More details later.

Week 9

Milen worked on DKPro. A new version of the DKPro UdpipeSegmenter wrapper was created, inspired by some of Richard's code. The code for now resides in Milen's repository, waiting for the rest of the DKPro changes. Milen started working on the reporting tool.

Week 10

Milen installed Nikolay's reporting tool and started the Galaxy reporting tools on lap-test. Both of them worked seamlessly. Some time was spent trying to configure them correctly, as it was particularly tricky to match the versions of PostgreSQL, Python and pgsql. Milen tried to create his own reporting server; the idea is to reuse the reporting code so as to use the Galaxy library for accessing information about jobs and users.

Week 11

Milen has switched his efforts from the reporting server to the real Galaxy API (https://docs.galaxyproject.org/en/latest/lib/galaxy.webapps.galaxy.api.html). The API allows you to access basic information from Galaxy through instantiation of objects and GET requests.
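A minimal sketch of such a request against a standard Galaxy instance (URL and API key are placeholders):

import requests

GALAXY_URL = "http://localhost:8080"
API_KEY = "replace-with-admin-api-key"

# Plain GET request against the jobs endpoint of the Galaxy web API.
response = requests.get(GALAXY_URL + "/api/jobs", params={"key": API_KEY})
response.raise_for_status()
for job in response.json():
    print("{0} {1} {2}".format(job.get("id"), job.get("tool_id"), job.get("state")))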

Week 12-14

Milen has worked on the development of a script for reporting. The script is installed in /home/laportal/report/generate-report.py

The script has the following options:

usage: reporting script [-h] [--start START] [--end END] [--date DATE]

optional arguments:

--config - Configuration file that gives the basic parameters: api_key and galaxy_key. The first key corresponds to the key of the administrator that will run the script. The second key corresponds to the key used by Galaxy to encrypt the ids returned by the API; we need it to align the job ids returned by the API with the job ids stored in Gold. There is a default config file stored in /home/laportal/report/config.ini; this file is accessible only to the laportal user.

--output - Specifies the file to which the report is written. If none is specified, standard output is used.

The period for which a report is generated is determined by the --start, --end and --date options (together with --week and --month). If --start and --end are specified, the report is generated for the explicit period defined by them; otherwise --week or --month selects the period around the reference date given by --date.
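A rough sketch of this period resolution (not the script's actual code), assuming --week and --month are passed as strings like 'true':

import argparse
import datetime

def parse_date(value):
    return datetime.datetime.strptime(value, "%Y-%m-%d").date() if value else None

def resolve_period(args):
    today = datetime.date.today()
    explicit_end = parse_date(args.end)
    if args.start:
        # Explicit period wins; the end defaults to today.
        return parse_date(args.start), explicit_end or today
    ref = parse_date(args.date) or today
    if args.week:
        start = ref - datetime.timedelta(days=ref.weekday())
        return start, explicit_end or start + datetime.timedelta(days=6)
    if args.month:
        start = ref.replace(day=1)
        month_end = (start + datetime.timedelta(days=32)).replace(day=1) - datetime.timedelta(days=1)
        return start, explicit_end or month_end
    return ref, explicit_end or ref

parser = argparse.ArgumentParser(prog="reporting script")
for flag in ("--start", "--end", "--date", "--week", "--month"):
    parser.add_argument(flag, default="")
print(resolve_period(parser.parse_args()))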

Example Queries:

'python generate_report.py' - generates the report for today.

'python generate_report.py --week=true' - generates the report for the current week

'python generate_report.py --month=true' - generates the report for the current month

'python generate_report.py --week=true --date=2017-03-1' - generates the report for the week that contains 1st of March 2017.

'python generate_report.py --month=true --date=2017-03-1' - generates the report for the month that contains 1st of March 2017.

'python generate_report.py --start=2017-03-1 --end=2017-03-5' - generates the report for the period from 1st of March 2017 to 5th of March 2017.

'python generate_report.py --month=true --end=2017-03-15' - generates the report for the period from 1st of March 2017 to 15th of March 2017.

Week 15

Small refinements of the report script were made, as well as some minor tweaks on at.hpc to fix bugs:

1) The newly introduced export script did not function correctly on OBT.

2) The visualization interface did not correctly display a message when the job was in progress.

The patch for both bugs is applied on at.hpc; the fixes should still be included in version control.

Week 16

Milen worked on integrating the new version of UDPipe into DKPro.

Week 17-19

Milen continued to work on the reporting system. The new script was separated into two scripts:

The reporting/fail script:

usage: reporting script [-h] [--start START] [--end END] [--date DATE]

optional arguments:

period:

reference:

If the script is used with the fail option, it will give a report of the failed jobs. The period for which the report is generated is either selected by a start date and an end date (the default end date is the day the script is called), or by a period relative to a reference date (the default reference date is the day the script is called). The output of the report script is a JSON file that contains a set of all users and their activity in the reference period. Example data for a user is the following JSON fragment:

"Gunn.Lyse@nsd.no": {

}

The record contains the basic information about when the user was created and when their last activity was performed. The information about the SLURM jobs performed by the user is stored in the gold part of the fragment. N.B. this is the comprehensive list of all jobs submitted by the user; it is stored only for logging purposes and can be removed in the future. The jobs part of the fragment is separated into local and SLURM jobs; only the jobs relevant to the period selected by the user of the reporting script appear here.

If the reporting script is used with the --fail option, it will output a list of failed jobs for the selected period. The output is similar to the default one, but the jobs have an additional "dump" field that contains the output/error log of the job.

The summary script works on the output of the report script. It generates text summaries about the tools and users in the selected period.

usage: summary script [-h] [--start START] [--end END] [--date DATE] [--month]

positional arguments:

optional arguments:

period:

reference:

The script has the same period/reference specification for the period of interest. The idea is that the reporting script generates information for an extended period, for example the last year, and the summary script then gives a summary for each month/week, etc. The summary script gives information about the users and the tools used in the reference period. An example activity report is the following text fragment:

id: 1
email: oe@ifi.uio.no
last login: 2017-03-10T16:10:45.127999
created on: 2016-09-23T12:31:27.121080
This is a new user!
For project gx_default
Used time: 9:25:46 (68 jobs)
Remaining time: 416 days, 6:34:14
Local Jobs:

Remote Jobs:


The jobs are grouped by tool and separated into local and remote jobs. The remote jobs are organized by project; in LAP we have only the default project 'gx_default'. There are two reports for the project: one from the Gold jobs and one from the jobs in the period according to Galaxy. This is the reason why the two numbers are different. In the attached file they also differ because some of the jobs in the Gold database come from ps.hpc.

An example of a summary for a tool is the following text fragment:

tool: obt_disambiguator
jobs: 119
users: 9
Local Jobs:

Remote Jobs:

The usage of the tool is summarized over all the jobs recorded for all the users. N.B. the two times reported for the remote jobs correspond to the total time used; this shows that users often wait longer for their job to start than the job execution itself takes.

