Release Sparv Pipeline v4.0.0 · spraakbanken/sparv-pipeline

Read the documentation: https://spraakbanken.gu.se/sparv/docs

Changelog:

This version contains a complete make-over of the Sparv pipeline!
- Everything is written in Python now (no more Makefiles or bash code).
- Increased platform independence
- This facilitates creating new modules, debugging, and maintenance.
Easier installation process, Sparv is now on pypi!
- New plugin system facilitates installation of Sparv plugins (like FreeLing).
New format for corpus config files
- The new format is yaml which is easier to write and more human readable than makefiles.
- There is a command-line wizard which helps you create corpus config files.
- You no longer have to specify XML elements and attributes that should be kept from the original files. The XML
  parser now parses all existing elements and their attributes by default. Their original names will be kept and
  included in the export (unless you explicitely override this behaviour in the corpus config).
Improved interface
- New command line interface with help messages
- Better feedback with progress bar instead of illegible log output (log output is still available though)
- More helpful error messages
New corpus import and export formats
- Import of plain text files
- Export to csv (a user-friendly, non-technical column format)
- Export to (Språkbanken Text version of) CoNNL-U format
- Export to corpus statistics (word frequency lists)
Updated models and tools for processing Swedish corpora
- Sparv now uses Stanza with newly trained models and higher accuracy for POS-tagging and dependency parsing on
  Swedish texts.
Better support for annotating other (i.e. non-Swedish) languages
- Integrated Stanford Parser for English analysis (POS-tags, baseforms, dependency parsing, named-entity recognition).
- Added named-entity recognition for FreeLing languages.
- If a language is supported by different annotation tools, you can now choose which tool to use.
Improved code modularity
- Increased independence between modules and language models
- This facilitates adding new annotation modules and import/export formats.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sparv Pipeline v4.0.0