PetTop
The PET system for efficient processing of unification-based grammars is an industrial strength implementation of the typed feature structure formalism used in [http://www.delph-in.net DELPH-IN] grammars. PET reads the exact same source files (modulo some configuration options) as the [http://www.delph-in.net/lkb/ LKB] grammar development environment and produces identical results. In a nutshell, PET can be viewed as a high-efficiency batch processing and application delivery engine, while the LKB mainly targets interactive grammar development.
PET was originally developed by UlrichCallmeier at DFKI GmbH and Saarland University, and some of its design is documented in his [http://www.coli.uni-sb.de/~uc/thesis/thesis.pdf 2001 MSc thesis]. The software subsequently served to build a commercial email auto-response product (by YY Technologies, Mountain View, CA). It was also ported to Windows NT, generally 'hardened' (eliminating memory leaks, increasing robustness to exceptional situations, etc.), and extended in functionality and interfaces (including Unicode support, unknown word support, server and API library modes, lattice input, and initial MRS support); most of this work was done by Ulrich with help from StephanOepen and BerndKiefer (of DFKI). As part of the EU-funded [http://www.project-deepthought.net/ Deep Thought] project, Ulrich and Stephan later added support for subsumption-based ambiguity factoring (giving a significant improvement in parsing efficiency for long inputs), facilities to rank alternate parses according to a statistical (Maximum Entropy) parse selection model (which, typically, one would obtain using the [http://www.delph-in.net/redwoods Redwoods] tools and a hand-constructed treebank), and the ability to compile in the (Common-Lisp) MRS code base also used in the LKB, thus enabling output of (R)MRSs in various standard formats.
Towards the end of 2003, Ulrich retired from active PET development, and Bernd has since been the main developer (with occasional help from others, specifically FrederikFouvry of Saarland University and Stephan). PET has seen a range of substantial additions in functionality since, including the ability to add (leaf) types at run-time, output fragmentary analysis hypotheses in case of parse failures, and an XML-based input format that generalizes the lattice-oriented YY input mode.
As of February 2005, PET distributions still somewhat reflect its development history, and there are three branches of source code (in principle) available to users: the [http://www.dfki.de/~kiefer/pet-0.99.11.tar.gz main] branch (maintained by Bernd) is the current head revision and includes all of the latest code additions; a branch called stable preserves the state of the world as of November 2003 (for the transitory phase); and a branch called oe (maintained by Stephan) provides some conservative extensions over stable but does not include any of the more recent additions. Source [http://www.delph-in.net/ftp/pet/ snapshots] of the latter two branches are available, but users are encouraged to migrate to the main revision. We plan to further consolidate branches and eventually have all PET users work off the head revision. For the time being, development of the German and Japanese grammars is against the main version of PET already, while the ERG continues to be developed on the oe branch for a little while; although, in principle, all versions implement the same formalism, individual grammars may take advantage of facilities only available in specific PET revisions. Hence, we suggest users choose appropriately, but encourage all new users (of grammars other than the ERG) to start off from the main branch. The descriptions of individual revisions provide more detailed information on PetEvolution at the source code level.
As of February 2005, there are no binary distributions of PET, and users should expect to compile their own executable files from source code (with the one exception of the flop binary for the older main and oe branches). Current PET development is carried out exclusively on Linux (x86.32) environments, hence most (reasonably) recent Linux distributions should work well. PET ports for Solaris (sparc, using gcc) and Windows (x86.32, using either CygWin or Borland C++) used to be supported, and in principle any platform for which a suitable C++ compiler is available (and for which the external libraries used by PET exist) should allow successful compilation. Your mileage may vary.
In order to compile PET with complete functionality, a number of external packages (PetDependencies) need to be installed; in general, see the documentation for each of these packages, but some coarse instructions on versions that are known to work are available from the PetDependencies page. Compiling without some of these packages should also be possible (giving up, for example, Unicode support, [incr tsdb()] integration, or the embedded MRS code), although these configurations have not been tested for quite some time (see the compile-time parameters -DICU, -DTSDBAPI, and -DECL).
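Each -D flag named here defines a preprocessor macro of the same name. As a rough illustration of how such flags typically gate optional code, here is a minimal sketch. It assumes the usual `#ifdef` pattern, which this page does not confirm for PET, and the function `compiled_features` is hypothetical and not part of the PET sources.

```cpp
#include <string>

// Hypothetical helper, not part of PET: reports which optional components
// were switched on at compile time through -DICU, -DTSDBAPI, or -DECL.
// The feature labels follow the order in which the page lists them.
std::string compiled_features() {
  std::string out;
#ifdef ICU
  out += "unicode ";   // Unicode support
#endif
#ifdef TSDBAPI
  out += "tsdb ";      // [incr tsdb()] integration
#endif
#ifdef ECL
  out += "mrs ";       // embedded (Common-Lisp) MRS code
#endif
  return out.empty() ? "none" : out;
}
```

Built without any of the three flags, the function returns "none"; adding, say, -DICU to the compiler command line would make the first branch part of the compiled program.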
The current main source branch uses GNU autoconf and hence requires less manual configuration than earlier versions. Call the configure script as follows:
./configure --with-tsdb=~/lkb --with-eclmrs=/usr/local --with-icu=/usr/local --with-xml=/usr/local --with-mrsdir=~/lkb/
Configure should tell you whether all the necessary libraries can be found, and then you can compile flop and cheap with make. There is no need for a separate make mrs.
For the oe branch, in order to compile the cheap parser executable, you will need to edit the Makefile in the cheap/ sub-directory. Near the top of the file are three variables that need to be adjusted to reflect the installation directories for PetDependencies packages, e.g.
#
# site-specific directory settings
#
ICUROOT = /usr/local
ECLROOT = /usr/local
LKBROOT = /home/oe/src/delphin/lkb
Once these settings correspond to the local directory structure, the following should re-build the compiled library of MRS (Common-Lisp) code from the LKB (within the LKB directory tree, e.g. in lkb/lib/linux.x86.32 on Linux x86.32) and generate a link rmrs.h (in the current directory, i.e. cheap/ of the PET tree), pointing to the auto-generated header file for the library:
make mrs
Watch compilation messages (from ECL and gcc) carefully and confirm that the library and header file were correctly built before moving on to compiling cheap itself:
make depend
make
The latter step should result in a dynamically-linked binary cheap that implements the PET run-time parser.
The PET software has been used in a range of projects (and one commercial product), using grammars of several languages. There is a relatively large number of options and run-time parameters that allow customization of PET behavior to various tasks. Maybe the biggest factor of variation is in (a) how input to the cheap parser is prepared for PET-internal processing and in (b) what form analysis results are output (or returned to the caller) after parsing; these are discussed on separate PetInput and PetOutput pages, respectively. Many other aspects of PET run-time behavior can be controlled using command-line options (see the PetOptions page), given to the flop or cheap binaries upon invocation, and grammar-specific settings (see the PetParameters page), supplied in TDL syntax as part of each grammar. Finally, when using PET as a processing client to the [incr tsdb()] [http://www.delph-in.net/itsdb/ profiler], some of the options and parameters are controlled from within the [incr tsdb()] environment.
The PET build process attempts to set appropriate mmap settings for your architecture. However, this automation is not always successful. If on running flop or cheap you get an error message like
alloc: no space (up = b7f35000d, down = b7f35000d)
terminate called after throwing an instance of 'tError'
Aborted
then you should try changing your mmap settings, followed by recompilation. If you look in common/chunk-alloc.cpp, you will find a section like:
#define _MMAP_ANONYMOUS
#define _CORE_LOW 0x50000000
#define _CORE_HIGH 0xbf429fff
#define _MMAP_DOWN
The settings below were successful on an IBM T41 laptop running SuSE 9.3. But trial and error may be necessary!
#define _MMAP_ANONYMOUS
#define _CORE_LOW 0x50000000
#define _CORE_HIGH 0xbff00000
#define _MMAP_DOWN
Mmap errors are likely kernel-specific, rather than tied to a particular Linux distribution. The above SuSE 9.3 settings also work under Ubuntu 5.10 with kernel >= 2.6.10 and have been tested on an IBM Thinkpad T42 and a Dell Precision 4100.
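When trying alternative values by trial and error, it can help to see how much address space a given pair of settings spans. The sketch below is only arithmetic on the two constants. It assumes that _CORE_LOW and _CORE_HIGH bound the address window used by the allocator, which the names suggest but this page does not state, and the helper names are hypothetical and not part of PET.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper, not part of PET: size in bytes of the address
// window between a _CORE_LOW and a _CORE_HIGH value (taken as high - low).
std::uint64_t core_window_bytes(std::uint64_t low, std::uint64_t high) {
  assert(low < high);  // a usable window needs low strictly below high
  return high - low;
}

// The same window in whole MiB, rounded down.
std::uint64_t core_window_mib(std::uint64_t low, std::uint64_t high) {
  return core_window_bytes(low, high) >> 20;
}

// Print one candidate setting. The explicit casts to unsigned long long
// keep the format string valid on both 32-bit and 64-bit builds.
void print_core_window(const char *label, std::uint64_t low,
                       std::uint64_t high) {
  std::printf("%s: 0x%llx-0x%llx (%llu MiB)\n", label,
              static_cast<unsigned long long>(low),
              static_cast<unsigned long long>(high),
              static_cast<unsigned long long>(core_window_mib(low, high)));
}
```

With the values shown above, the first block (0x50000000 to 0xbf429fff) spans 1780 MiB and the SuSE 9.3 block (0x50000000 to 0xbff00000) spans 1791 MiB, so the two settings differ only slightly in where the upper bound sits.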
- PET 0.99.11 + ECL 0.9h Compiler Warnings (GCC 4.0.2, Ubuntu 5.10)
32-bit machine
dag-tomabechi.cpp: In function ‘void dag_print_rec_safe(FILE*, dag_node*, int, bool, int)’:
dag-tomabechi.cpp:1573: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1574: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp: In function ‘dag_node* dag_expand_rec(dag_node*)’:
dag-tomabechi.cpp:1710: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp: In function ‘bool dag_valid_rec(dag_node*)’:
dag-tomabechi.cpp:1744: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1752: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘long int’
dag-tomabechi.cpp:1769: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘long int’
petmrs.c: In function ‘ecl_decode_string’:
petmrs.c:36: warning: pointer targets in return differ in signedness
64-bit machine
flop.cpp: In function ‘void mem_checkpoint(char*)’:
flop.cpp:72: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
flop.cpp:72: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘long unsigned int’
full-form.cpp: In function ‘void read_morph(std::string)’:
full-form.cpp:202: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void chunk_allocator::print_check()’:
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘size_t’
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void* chunk_allocator::_core_alloc(int)’:
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 5 has type ‘size_t’
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 6 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’
morph.cpp: In member function ‘void morph_lettersets::print(FILE*)’:
morph.cpp:502: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void morph_subrule::print(FILE*)’:
morph.cpp:600: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void trie_node::print(FILE*, int)’:
morph.cpp:674: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void morph_trie::print(FILE*)’:
morph.cpp:852: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
morph.cpp: In member function ‘void tMorphAnalyzer::print(FILE*)’:
morph.cpp:1022: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp: In function ‘void dag_print_rec_safe(FILE*, dag_node*, int, bool, int)’:
dag-tomabechi.cpp:1573: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1574: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp: In function ‘dag_node* dag_expand_rec(dag_node*)’:
dag-tomabechi.cpp:1710: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp: In function ‘bool dag_valid_rec(dag_node*)’:
dag-tomabechi.cpp:1744: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1752: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
dag-tomabechi.cpp:1769: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’
qc.cpp:88: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘size_t’
qc.cpp:102: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:105: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:140: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp: In function ‘void compute_qc_sets(FILE*, const char*, std::map<list_int*, int, list_int_compare, std::allocator<std::pair<list_int* const, int> > >&, double)’:
qc.cpp:211: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
qc.cpp:313: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void chunk_allocator::print_check()’:
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘size_t’
../common/chunk-alloc.cpp:139: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp: In member function ‘void* chunk_allocator::_core_alloc(int)’:
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 5 has type ‘size_t’
../common/chunk-alloc.cpp:288: warning: format ‘%x’ expects type ‘unsigned int’, but argument 6 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 3 has type ‘size_t’
../common/chunk-alloc.cpp:312: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘size_t’
- Can't find ecl.h on some 64-bit machines
- Can't find LKB source on some 64-bit machines
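The format warnings in the compiler logs above all have the same cause: a printf-style conversion such as %x or %d that does not match the type of its argument (long int or size_t). A minimal sketch of matching conversions follows. The helper names are hypothetical and are not PET code, and the z length modifier assumes a C library with C99-style printf support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helpers (not PET code) whose format strings match their
// argument types on both 32-bit and 64-bit targets.

// size_t in hex: the 'z' length modifier matches size_t exactly.
std::string hex_size(std::size_t n) {
  char buf[32];
  int len = std::snprintf(buf, sizeof buf, "%zx", n);
  assert(len > 0 && len < static_cast<int>(sizeof buf));
  return buf;
}

// size_t in decimal: %zu instead of %d.
std::string dec_size(std::size_t n) {
  char buf[32];
  int len = std::snprintf(buf, sizeof buf, "%zu", n);
  assert(len > 0 && len < static_cast<int>(sizeof buf));
  return buf;
}

// long in hex: %lx expects unsigned long, so convert explicitly.
std::string hex_long(long v) {
  char buf[32];
  int len = std::snprintf(buf, sizeof buf, "%lx",
                          static_cast<unsigned long>(v));
  assert(len > 0 && len < static_cast<int>(sizeof buf));
  return buf;
}
```

On a 32-bit build the mismatched conversions happen to have the same width as their arguments, so the output is usually still correct; on a 64-bit build size_t and long are wider than int, which is why the 64-bit log is so much longer.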