StephanOepen edited this page Sep 20, 2007 · 45 revisions


Overview

The PET system for efficient processing of unification-based grammars is an industrial-strength implementation of the typed feature structure formalism used in [http://www.delph-in.net DELPH-IN] grammars. PET reads exactly the same source files (modulo some configuration options) as the [http://www.delph-in.net/lkb/ LKB] grammar development environment and produces identical results. In a nutshell, PET can be viewed as a high-efficiency batch processing and application delivery engine, while the LKB mainly targets interactive grammar development.

Some features of PET include:

  • unknown word support, instantiating generic lexical entries at run-time
  • subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs)
  • parse ranking according to a statistical parse selection model
  • compilation of the (Common-Lisp) MRS LKB code base, enabling output of (R)MRSs
  • output of fragmentary analysis hypotheses in case of parse failures
  • lattice input (yy-mode)
  • a variety of XML-based input formats that generalize the lattice-oriented YY input mode

When installed, PET comprises the executable files cheap (bottom-up chart parser), flop (the grammar compiler) and fspp (tokenizer).

Obtaining PET

The branches of the PET source code that existed until recently have now been merged into one main branch. The old Perforce repository has been replaced by a trac development site at [http://pet.opendfki.de]. There you can find the most recent tarball distribution as well as the current state of the development tree, along with instructions on how to access the Subversion repository directly using a Subversion client.

Please use the [http://pet.opendfki.de trac] page to report bugs and feature requests via the integrated ticketing system. There is also an ongoing discussion on a PET API that is intended to replace the cheap binary one day; cf. FeforPetApi.

Compiling and Installing PET

As of February 2005, there are no binary distributions of PET, and users should expect to compile their own executable files from source code. Current PET development is carried out exclusively on Linux (x86.32) environments, hence most (reasonably) recent Linux distributions should work well. PET ports for Solaris (sparc, using gcc) and Windows (x86.32, using either Cygwin or Borland C++) used to be supported, and in principle any platform for which a suitable C++ compiler is available (and for which external libraries used by PET exist) should allow successful compilation. Your mileage may vary.

In order to compile PET with complete functionality, a number of external packages (PetDependencies) need to be installed. In general, see the documentation for each of these packages; some coarse instructions on versions that are known to work are available from the PetDependencies page. Compiling without some of these packages should also be possible (giving up, for example, Unicode support, [incr tsdb()] integration, or the embedded MRS code), although these configurations have not been tested for quite some time. See the configuration parameters --with-icu, --with-ecl, --with-tsdb, etc.

The current main source branch uses GNU autoconf and hence requires less manual configuration than earlier versions. If you use a freshly checked out version from the svn repository, you should call

aclocal && automake -a && autoconf

in the directory containing configure.ac.

If you have the boost, xerces-c, icu and ecl packages installed in system default locations, you should call the configure script as follows:

 ./configure --with-tsdb=~/lkb --with-xml --enable-qccomp

The last two options are disabled by default, which is why they are listed here. The configure script should tell you whether all the necessary libraries can be found; you can then compile flop and cheap with make. There is no need for a separate make mrs.
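The steps above can be collected into one shell function. This is only a sketch: the function name build_pet and the variables PET_SRC and LKB_DIR are made up for illustration, and the configure options are simply the ones quoted above. It refuses to run unless it finds configure.ac, and it stops at the first step that fails.

```shell
# Sketch: bootstrap, configure and build PET from a fresh svn checkout.
# PET_SRC and LKB_DIR are hypothetical names, not part of PET itself.
build_pet() (
    # The body runs in a subshell, so the cd does not affect the caller.
    cd "${PET_SRC:-.}" || exit 1
    if [ ! -f configure.ac ]; then
        echo "configure.ac not found in $(pwd)" >&2
        exit 1
    fi
    # $HOME is used instead of ~ because the shell does not expand a
    # tilde that appears in the middle of an option argument.
    aclocal && automake -a && autoconf &&
    ./configure --with-tsdb="${LKB_DIR:-$HOME/lkb}" --with-xml --enable-qccomp &&
    make
)
```

A call might look like PET_SRC=$HOME/src/pet build_pet, where the path is again only an example.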

For the oe branch, compiling the cheap parser executable requires editing the Makefile in the cheap/ sub-directory. Near the top of the file are three variables that need to be adjusted to reflect the installation directories for PetDependencies packages, e.g.

  #
  # site-specific directory settings
  #
  ICUROOT = /usr/local
  ECLROOT = /usr/local
  LKBROOT = /home/oe/src/delphin/lkb

Once these settings correspond to the local directory structure, the following should re-build the compiled library of MRS (Common-Lisp) code from the LKB (within the LKB directory tree, e.g. in lkb/lib/linux.x86.32 on Linux x86.32) and generate a link rmrs.h (in the current directory, i.e. cheap/ of the PET tree), pointing to the auto-generated header file for the library:

  make mrs

Watch compilation messages (from ECL and gcc) carefully and confirm that the library and header file were correctly built before moving on to compiling cheap itself:

  make depend
  make

The latter step should result in a dynamically-linked binary cheap that implements the PET run-time parser.
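For the check between make mrs and make depend, one small part can be automated: confirming that rmrs.h exists as a link and that its target is present. The sketch below assumes the link is a symbolic link; the function name check_rmrs_header is made up for illustration. It does not inspect the library itself, so the compilation messages still need to be read.

```shell
# Sketch: verify that rmrs.h is a symbolic link with an existing target.
# Pass the cheap/ directory as the argument; defaults to the current one.
check_rmrs_header() {
    dir="${1:-.}"
    # -L is true for a symbolic link; -e follows the link, so it is
    # false when the link is dangling.
    if [ -L "$dir/rmrs.h" ] && [ -e "$dir/rmrs.h" ]; then
        echo "rmrs.h ok"
    else
        echo "rmrs.h missing or dangling in $dir" >&2
        return 1
    fi
}
```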

As of December 2006, a patch is necessary in order to use the PET svn repository version with the latest version of LKB. See the following thread in the developers mailing list:

[http://lists.delph-in.net/archive/developers/2006/000691.html]

Compiling a grammar

Before a grammar can be used with PET, its source files (for example english.tdl for the ERG grammar) need to be compiled with flop:

 flop english.tdl

This command generates the compiled grammar english.grm.
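In a script, it can be useful to confirm that the compiled grammar was actually produced. The following is a minimal sketch: the function name compile_grammar is made up, and it assumes that flop writes the .grm file next to the .tdl file when run from the grammar's directory, as the english.tdl / english.grm example suggests.

```shell
# Sketch: run flop on a grammar and check that the .grm file appeared.
compile_grammar() {
    tdl="$1"
    if [ ! -f "$tdl" ]; then
        echo "no such grammar file: $tdl" >&2
        return 1
    fi
    dir=$(dirname "$tdl")
    base=$(basename "$tdl")
    # Run flop from the grammar's own directory (in a subshell).
    ( cd "$dir" && flop "$base" ) || return 1
    grm="$dir/${base%.tdl}.grm"
    # Assumption: the output is named after the input, with a .grm suffix.
    if [ -s "$grm" ]; then
        echo "compiled grammar: $grm"
    else
        echo "flop finished but $grm is missing or empty" >&2
        return 1
    fi
}
```

For the ERG example above, this would be called as compile_grammar english.tdl.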

Running PET

The PET software has been used in a range of projects (and one commercial product), using grammars of several languages. There is a relatively large number of options and run-time parameters that allow customization of PET behavior to various tasks. Maybe the biggest factor of variation is in (a) how input to the cheap parser is prepared for PET-internal processing and in (b) what form analysis results are output (or returned to the caller) after parsing; these are discussed on separate PetInput and PetOutput pages, respectively. Many other aspects of PET run-time behavior can be controlled using command-line options (see the PetOptions page), given to the flop or cheap binaries upon invocation, and grammar-specific settings (see the PetParameters page), supplied in TDL syntax as part of each grammar. Finally, when using PET as a processing client to the [incr tsdb()] [http://www.delph-in.net/itsdb/ profiler], some of the options and parameters are controlled from within the [incr tsdb()] environment.

Tips and Tricks

The PET build process attempts to choose appropriate mmap settings for your architecture. However, this automation is not always successful. If running flop or cheap produces an error message like

alloc: no space (up = b7f35000d, down = b7f35000d)
terminate called after throwing an instance of 'tError'
Aborted

then you should try changing your mmap settings, followed by recompilation. If you look in common/chunk-alloc.cpp, you will find a section like:

#define _MMAP_ANONYMOUS
#define _CORE_LOW  0x50000000
#define _CORE_HIGH 0xbf429fff
#define _MMAP_DOWN

Simply removing the line

#define _MMAP_DOWN

works on an IBM T41 laptop running SuSE 9.3. But trial and error may be necessary!

Mmap errors are likely kernel-specific rather than tied to a particular Linux distribution. The above SuSE 9.3 setting also works under Ubuntu 5.10 with kernel >= 2.6.10 and has been tested on an IBM Thinkpad T42 and a Dell Precision 4100.
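The edit described above can be scripted so that it is easy to repeat and to undo. The sketch below comments the line out instead of deleting it, which has the same effect for the preprocessor, and keeps a backup with an .orig suffix. The function name disable_mmap_down is made up for illustration. If the file contains more than one such section, every matching line is commented out, so review the result before recompiling.

```shell
# Sketch: comment out "#define _MMAP_DOWN" in the given source file,
# keeping the unmodified file as <file>.orig.
disable_mmap_down() {
    src="$1"
    cp "$src" "$src.orig" &&
    sed 's|^#define _MMAP_DOWN[[:space:]]*$|/* #define _MMAP_DOWN */|' \
        "$src.orig" > "$src"
}
```

From the top of the PET tree this would be run as disable_mmap_down common/chunk-alloc.cpp, followed by recompilation. Copying the .orig file back restores the previous setting.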

History

PET was originally developed by UlrichCallmeier at DFKI GmbH and Saarland University, and some of its design is documented in his [http://www.coli.uni-sb.de/~uc/thesis/thesis.pdf 2001 MSc thesis]. The software was subsequently used to build a commercial email auto-response product (by YY Technologies, Mountain View, CA), ported to Windows NT, generally `hardened' (eliminating memory leaks, increasing robustness to exceptional situations, etc.), and extended in functionality and interfaces (including Unicode support, unknown word support, server and API library modes, lattice input, and initial MRS support); most of this work was done by Ulrich with help from StephanOepen and BerndKiefer (of DFKI). As part of the EU-funded [http://www.project-deepthought.net/ Deep Thought] project, Ulrich and Stephan later added support for subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs), facilities to rank alternate parses according to a statistical (Maximum Entropy) parse selection model (which, typically, one would obtain using the [http://www.delph-in.net/redwoods Redwoods] tools and a hand-constructed treebank), and the ability to compile in the (Common-Lisp) MRS code base also used in the LKB, thus enabling output of (R)MRSs in various standard formats.

Towards the end of 2003, Ulrich retired from active PET development, and Bernd has since been the main developer (with occasional help from others, specifically FrederikFouvry of Saarland University and Stephan). PET has seen a range of substantial additions in functionality since, including the ability to add (leaf) types at run-time, output fragmentary analysis hypotheses in case of parse failures, and an XML-based input format that generalizes the lattice-oriented YY input mode.

In 2006 YiZhang (Saarland University) added the ability to do [wiki:PetSelectiveUnpacking selective unpacking], greatly decreasing the memory consumption for n-best parsing.
