BerndKiefer edited this page May 16, 2006 · 45 revisions

Overview

The PET system for efficient processing of unification-based grammars is an industrial strength implementation of the typed feature structure formalism used in [http://www.delph-in.net DELPH-IN] grammars. PET reads the exact same source files (modulo some configuration options) as the [http://www.delph-in.net/lkb/ LKB] grammar development environment and produces identical results. In a nutshell, PET can be viewed as a high-efficiency batch processing and application delivery engine, while the LKB mainly targets interactive grammar development.

PET was originally developed by UlrichCallmeier at DFKI GmbH and Saarland University, and some of its design is documented in his [http://www.coli.uni-sb.de/~uc/thesis/thesis.pdf 2001 MSc thesis]. The software subsequently served to build a commercial email auto-response product (by YY Technologies, Mountain View, CA). It was also ported to Windows NT, generally `hardened' (eliminating memory leaks, increasing robustness to exceptional situations, etc.), and extended in functionality and interfaces (including Unicode support, unknown word support, server and API library modes, lattice input, and initial MRS support); most of this work was done by Ulrich with help from StephanOepen and BerndKiefer (of DFKI). As part of the EU-funded [http://www.project-deepthought.net/ Deep Thought] project, Ulrich and Stephan later added support for subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs), facilities to rank alternate parses according to a statistical (Maximum Entropy) parse selection model (which, typically, one would obtain using the [http://www.delph-in.net/redwoods Redwoods] tools and a hand-constructed treebank), and the ability to compile in the (Common-Lisp) MRS code base also used in the LKB, thus enabling output of (R)MRSs in various standard formats.

Towards the end of 2003, Ulrich retired from active PET development, and Bernd has since been the main developer (with occasional help from others, specifically FrederikFouvry of Saarland University and Stephan). PET has seen a range of substantial additions in functionality since, including the ability to add (leaf) types at run-time, output fragmentary analysis hypotheses in case of parse failures, and an XML-based input format that generalizes the lattice-oriented YY input mode.

Obtaining PET

The different branches of PET source code that existed until recently have now been merged into one main branch. Instead of the old Perforce repository, there is now a Trac development site at [http://pet.opendfki.de]. There you can find the most recent tarball distribution as well as the current state of the development tree. Instructions on how to access the Subversion repository directly using a Subversion client can also be found there.

Please use the Trac page to report bugs and feature requests via the integrated ticketing system.

Compiling and Installing PET

As of February 2005, there are no binary distributions of PET, and users should expect to compile their own executable files from source code (with the one exception of the flop binary for the older main and oe branches). Current PET development is carried out exclusively on Linux (x86.32) environments, hence most (reasonably) recent Linux distributions should work well. PET ports for Solaris (sparc, using gcc) and Windows (x86.32, using either Cygwin or Borland C++) used to be supported, and in principle any platform for which a suitable C++ compiler is available (and for which external libraries used by PET exist) should allow successful compilation. Your mileage may vary.

In order to compile PET with complete functionality, a number of external packages (PetDependencies) need to be installed; in general, see the documentation for each of these packages, but some coarse instructions on versions that are known to work are available from the PetDependencies page. Compiling without some of these packages should also be possible (giving up, for example, Unicode support, [incr tsdb()] integration, or the embedded MRS code), although these configurations have not been tested for quite some time (see the compile-time parameters -DICU, -DTSDBAPI, and -DECL).

The current main source branch uses GNU autoconf and hence requires less manual configuration than earlier versions. Call the configure script as follows:

 ./configure --with-tsdb=~/lkb --with-eclmrs=/usr/local --with-icu=/usr/local --with-xml=/usr/local --with-mrsdir=~/lkb/

The configure script should tell you whether all the necessary libraries can be found, and then you can compile flop and cheap with make. There is no need for a separate make mrs.

For the oe branch, in order to compile the cheap parser executable, editing the Makefile in the cheap/ sub-directory will be needed. Near the top of the file are three variables that need to be adjusted to reflect the installation directories for PetDependencies packages, e.g.

  #
  # site-specific directory settings
  #
  ICUROOT = /usr/local
  ECLROOT = /usr/local
  LKBROOT = /home/oe/src/delphin/lkb

Once these settings correspond to the local directory structure, the following should re-build the compiled library of MRS (Common-Lisp) code from the LKB (within the LKB directory tree, e.g. in lkb/lib/linux.x86.32 on Linux x86.32) and generate a link rmrs.h (in the current directory, i.e. `cheap/' of the PET tree), pointing to the auto-generated header file for the library:

  make mrs

Watch compilation messages (from ECL and gcc) carefully and confirm that the library and header file were correctly built before moving on to compiling cheap itself:

  make depend
  make

The latter step should result in a dynamically-linked binary cheap that implements the PET run-time parser.

Running PET

The PET software has been used in a range of projects (and one commercial product), using grammars of several languages. There is a relatively large number of options and run-time parameters that allow customization of PET behavior to various tasks. Maybe the biggest factor of variation is in (a) how input to the cheap parser is prepared for PET-internal processing and in (b) what form analysis results are output (or returned to the caller) after parsing; these are discussed on separate PetInput and PetOutput pages, respectively. Many other aspects of PET run-time behavior can be controlled using command-line options (see the PetOptions page), given to the flop or cheap binaries upon invocation, and grammar-specific settings (see the PetParameters page), supplied in TDL syntax as part of each grammar. Finally, when using PET as a processing client to the [incr tsdb()] [http://www.delph-in.net/itsdb/ profiler], some of the options and parameters are controlled from within the [incr tsdb()] environment.

Tips and Tricks

The PET build process attempts to set appropriate mmap settings for your architecture. However, this automation is not always successful. If on running flop or cheap you get an error message like

alloc: no space (up = b7f35000d, down = b7f35000d)
terminate called after throwing an instance of 'tError'
Aborted

then you should try changing your mmap settings, followed by recompilation. If you look in common/chunk-alloc.cpp, you will find a section like:

#define _MMAP_ANONYMOUS
#define _CORE_LOW  0x50000000
#define _CORE_HIGH 0xbf429fff
#define _MMAP_DOWN

Simply removing the line

#define _MMAP_DOWN

works on an IBM T41 laptop running SuSE 9.3. But trial and error may be necessary!

Mmap errors are likely kernel-specific, rather than tied to a particular Linux distribution. The above SuSE 9.3 setting also works under Ubuntu 5.10 with kernel >= 2.6.10 and has been tested on an IBM Thinkpad T42 and a Dell Precision 4100.
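
When experimenting with these settings, it can help to check what the kernel actually does with an address hint before recompiling PET. The following is only a rough sketch and not PET code: probe_mapping is a hypothetical helper, and it assumes (the page does not say so explicitly) that _CORE_LOW and _CORE_HIGH bound the address range the allocator tries to map. It requests an anonymous private mapping near a hint and reports the address granted.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical probe, not part of PET: request an anonymous, private,
// read/write mapping of `length` bytes near `hint` and return the address
// the kernel actually granted. Returns 0 if the mapping fails.
// Without MAP_FIXED the hint is advisory, so the result may differ from it.
std::uintptr_t probe_mapping(std::uintptr_t hint, std::size_t length) {
    void* p = mmap(reinterpret_cast<void*>(hint), length,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return 0;
    }
    std::uintptr_t granted = reinterpret_cast<std::uintptr_t>(p);
    munmap(p, length);  // release the region again; this is only a probe
    return granted;
}
```

Calling probe_mapping(0x50000000, 1 << 20) and comparing the result with the hint shows whether that region is available on the current kernel; a result far from the hint suggests the compiled-in range may not suit the machine.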

PET Bugs

  • PET 0.99.11 + ECL 0.9h Compiler Warnings (GCC 4.0.2, Ubuntu 5.10)

32-bit machine

dag-tomabechi.cpp: In function ‘void dag_print_rec_safe(FILE*, dag_node*, int, bool, int)’:
dag-tomabechi.cpp:1573: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1574: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp: In function ‘dag_node* dag_expand_rec(dag_node*)’:
dag-tomabechi.cpp:1710: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp: In function ‘bool dag_valid_rec(dag_node*)’:
dag-tomabechi.cpp:1744: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1752: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1769: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘long int’

petmrs.c: In function ‘ecl_decode_string’:
petmrs.c:36: warning: pointer targets in return differ in signedness

64-bit machine

flop.cpp: In function ‘void mem_checkpoint(char*)’:
flop.cpp:72: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
flop.cpp:72: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’

full-form.cpp: In function ‘void read_morph(std::string)’:
full-form.cpp:202: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’

../common/chunk-alloc.cpp: In member function ‘void chunk_allocator::print_check()’:
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘size_t’
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void* chunk_allocator::_core_alloc(int)’:
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 5 has type ‘size_t’
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 6 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’

morph.cpp: In member function ‘void morph_lettersets::print(FILE*)’:
morph.cpp:502: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void morph_subrule::print(FILE*)’:
morph.cpp:600: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void trie_node::print(FILE*, int)’:
morph.cpp:674: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void morph_trie::print(FILE*)’:
morph.cpp:852: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void tMorphAnalyzer::print(FILE*)’:
morph.cpp:1022: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’

dag-tomabechi.cpp: In function ‘void dag_print_rec_safe(FILE*, dag_node*, int, bool, int)’:
dag-tomabechi.cpp:1573: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1574: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp: In function ‘dag_node* dag_expand_rec(dag_node*)’:
dag-tomabechi.cpp:1710: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp: In function ‘bool dag_valid_rec(dag_node*)’:
dag-tomabechi.cpp:1744: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1752: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1769: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’

qc.cpp:88: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘size_t’
qc.cpp:102: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:105: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:140: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp: In function ‘void compute_qc_sets(FILE*, const char*, std::map<list_int*, int, list_int_compare, std::allocator<std::pair<list_int* const, int> > >&, double)’:
qc.cpp:211: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:313: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’

../common/chunk-alloc.cpp: In member function ‘void chunk_allocator::print_check()’:
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘size_t’
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void* chunk_allocator::_core_alloc(int)’:
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 5 has type ‘size_t’
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 6 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’
  • Can't find ecl.h on some 64-bit machines
  • Can't find LKB source on some 64-bit machines
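
The format warnings listed under the first item share one cause: a size_t or long value is passed where the format string promises an int or unsigned int. The sketch below shows the usual remedy, assuming a C99-capable printf implementation; the helper names are hypothetical and do not appear in the PET sources, and the actual call sites in PET may warrant a different fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helpers, not part of PET. Each one pairs a value type with
// the printf length modifier that matches it on 32-bit and 64-bit targets.

// size_t as hexadecimal: use "%zx" instead of "%x".
std::string hex_of_size(std::size_t value) {
    char buf[2 * sizeof(std::size_t) + 1];  // two hex digits per byte + NUL
    int n = std::snprintf(buf, sizeof buf, "%zx", value);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// size_t as decimal: use "%zu" instead of "%d".
std::string dec_of_size(std::size_t value) {
    char buf[3 * sizeof(std::size_t) + 1];  // generous bound on decimal digits
    int n = std::snprintf(buf, sizeof buf, "%zu", value);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// long as hexadecimal: cast to unsigned long and use "%lx".
std::string hex_of_long(long value) {
    char buf[2 * sizeof(unsigned long) + 1];
    int n = std::snprintf(buf, sizeof buf, "%lx",
                          static_cast<unsigned long>(value));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

For example, hex_of_size(255) yields "ff" regardless of whether size_t is 32 or 64 bits wide, which is the property the plain `%x` calls lack.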