Overview

The LOGON Benchmarking Suite is a collection of scripts to execute and time various tasks that are representative of the LOGON MT system (and the larger collection of DELPH-IN NLP tools). The suite is intended as a cross-platform performance meter.

Benchmarking Tasks

The benchmark suite comprises the following tasks:

To date, there are no tasks that are heavy on (disk) I/O, nor are we executing any large-core processes. The above were designed so as to run on relatively dated equipment with small-ish amounts of physical memory. However, we should probably add a large-core SVM run (for select nodes) and possibly also something reading and writing a disk-based (AllegroCache) BTree of at least several gigabytes. Finally, it is tempting to include end-to-end processing through the LOGON pipeline, say on the base test suite, because that will exercise context switching heavily (and is likely to benefit from hyperthreading and multi-core CPUs).

Running the Suite

Execution of individual tasks, parallelization, and general control are all implemented in a simple shell script. To run the complete benchmark suite (see above), use:

  $LOGONROOT/uio/test/run

Alternatively, the script accepts a command-line option --count to enable parallelization and an option --32 to run 32-bit binaries on 64-bit kernels; furthermore, a task identifier can be provided as the final command-line argument. To execute a four-way run of the 32-bit tadm task, for example, use:

  $LOGONROOT/uio/test/run --32 --count 4 tadm

By default, each task is executed four times, using a round-robin strategy: the repetitions of any given task are interleaved with the execution of the other tasks. Upon completion of each task, its output is validated against the expected result, and the log file is marked invalid in case validation fails.
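
The round-robin scheme with per-run validation can be sketched roughly as follows. This is only an illustration, not the actual run script: run_task, round_robin, the file naming, and the per-run status lines in the log are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; names and file layout are hypothetical,
# not taken from $LOGONROOT/uio/test/run.

# Stand-in for a real benchmark task: prints deterministic output.
run_task() {
  echo "result of $1"
}

# round_robin ROUNDS DIR TASK...
# Runs every task once per round, so that repetitions of one task are
# interleaved with runs of the others, and validates each output.
round_robin() {
  rounds=$1; shift
  dir=$1; shift
  round=1
  while [ "$round" -le "$rounds" ]; do
    for task in "$@"; do
      out="$dir/$task.$round.out"
      run_task "$task" > "$out"
      # Compare against the expected result; flag the run on mismatch.
      if cmp -s "$out" "$dir/$task.expected"; then
        echo "$task $round valid" >> "$dir/log"
      else
        echo "$task $round INVALID" >> "$dir/log"
      fi
    done
    round=$((round + 1))
  done
}

# Demo: task "a" matches its expected output, task "b" does not.
demo=$(mktemp -d)
echo "result of a" > "$demo/a.expected"
echo "something else" > "$demo/b.expected"
round_robin 4 "$demo" a b
cat "$demo/log"
rm -rf "$demo"
```

Putting the round counter in the outer loop and the task list in the inner loop is what produces the interleaving; swapping the two loops would run all four repetitions of a task back to back.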

The default parallelization tests will execute all tasks (though skipping the 32-bit runs on 64-bit installations) again, running two, four, eight, and so on, processes (executing the same task) concurrently. The maximum parallelization factor is determined from /proc/cpuinfo as the number of CPUs reported by the kernel. Thus, for a dual-CPU (single-core) node with hyperthreading enabled, parallel tests will execute both two-way and four-way runs.
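
As a rough sketch of how the parallelization factors could be derived, the following counts logical CPUs and doubles the factor up to that limit. The helper names count_cpus and parallel_factors are hypothetical, and matching lines that start with "processor" is an assumption about the /proc/cpuinfo layout that holds on common Linux platforms but not necessarily on all architectures.

```shell
#!/bin/sh
# Illustrative sketch only; helper names are hypothetical.

# Count logical CPUs as reported by the kernel (Linux-specific).
# An alternative file may be passed as $1, which is handy for testing.
count_cpus() {
  grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print the factors 2, 4, 8, ... up to and including the given maximum.
parallel_factors() {
  max=$1
  n=2
  while [ "$n" -le "$max" ]; do
    echo "$n"
    n=$((n * 2))
  done
}

# Dual-CPU node with hyperthreading: the kernel reports four CPUs,
# so this prints 2 and 4, one per line.
parallel_factors 4
```

On a live node, the limit would come from the kernel, e.g. parallel_factors "$(count_cpus)". How the real script treats CPU counts that are not a power of two is not stated above; this sketch simply stops at the largest power of two not exceeding the count.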

Benchmarking Results

All timing results are in wall-clock seconds, i.e., reflecting real time for the completion of each task. Execution times reported are averaged over four runs for each task. Typically, the benchmarking suite should be run on otherwise unloaded nodes, so as to make sure that real time for task execution is not affected by other, simultaneous demands on the node.

Following are timing results for parallel execution, again averaged over four runs per task. In this set-up, the wall-clock time for a parallel task is counted until the completion of all sub-tasks.
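
The measurement just described (wall-clock time until the last sub-task completes, averaged over several runs) might look roughly like the sketch below. The function names time_parallel and average_runs are hypothetical, and the sketch assumes a date command that supports +%s (as GNU and BSD date do), which limits resolution to whole seconds; the actual script may measure time differently.

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.

# time_parallel N CMD...
# Launch N concurrent copies of CMD and print the elapsed wall-clock
# seconds until the last copy has finished.
time_parallel() {
  n=$1; shift
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" > /dev/null 2>&1 &
    i=$((i + 1))
  done
  wait   # returns only once every background sub-task has exited
  end=$(date +%s)
  echo $((end - start))
}

# average_runs RUNS N CMD...
# Repeat the N-way parallel run RUNS times and print the mean time.
average_runs() {
  runs=$1; shift
  total=0
  r=0
  while [ "$r" -lt "$runs" ]; do
    t=$(time_parallel "$@")
    total=$((total + t))
    r=$((r + 1))
  done
  awk -v t="$total" -v r="$runs" 'BEGIN { printf "%.2f\n", t / r }'
}

# Demo: a two-way run of a one-second placeholder command.
time_parallel 2 sleep 1
```

For instance, average_runs 4 2 sleep 1 would report the mean over four two-way runs. Because the clock is stopped only after wait returns, a single slow sub-task determines the reported time for the whole parallel run.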

Finally, looking further at a somewhat unusual hardware configuration:

Discussion

Acknowledgements

LogonTest/BenchmarkingSuite (last edited 2011-10-08 21:12:10 by localhost)
