Tokenizer¶ ↑

DESCRIPTION¶ ↑

A simple multilingual tokenizer – a linguistic tool intended to split a text into tokens for NLP tasks. This tool provides a CLI and a library for linguistic tokenization which is an anavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher level processing goals.

Use it for tokenization of German, English and French texts.

INSTALLATION¶ ↑

Tokenizer is provided as a .gem package. Simply install it via RubyGems.

To install tokenizer issue the following command:

$ gem install tokenizer

If you want to do a system wide installation, do this as root (possibly using sudo).

Alternatively use your Gemfile for dependency management.

SYNOPSIS¶ ↑

You can use Tokenizer in two ways.

As a command line tool:

$ echo 'Hi, ich gehe in die Schule!. | tokenize

As a library for embedded tokenization:

$ require 'tokenizer'
$ de_tokenizer = Tokenizer::Tokenizer.new
$ de_tokenizer.tokenize('Ich gehe in die Schule!')
$ => ["Ich", "gehe", "in", "die", "Schule", "!"]

See documentation in the Tokenizer::Tokenizer class for details on particular methods.

SUPPORT¶ ↑

If you have question, bug reports or any suggestions, please drop me an email :) Any help is deeply appreciated!

CHANGELOG¶ ↑

For details on future plan and working progress see CHANGELOG.rdoc.

CAUTION¶ ↑

This library is work in process! Though the interface is mostly complete, you might face some not implemented features.

Please contact me with your suggestions, bug reports and feature requests.

LICENSE¶ ↑

Tokenizer is a copyrighted software by Andrei Beliankou, 2011-

You may use, redistribute and change it under the terms provided in the LICENSE.rdoc file.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
bin		bin
lib		lib
test		test
.gitignore		.gitignore
.rvmrc		.rvmrc
.yardopts		.yardopts
CHANGELOG.rdoc		CHANGELOG.rdoc
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
LICENSE.rdoc		LICENSE.rdoc
README.rdoc		README.rdoc
Rakefile		Rakefile
tokenizer.gemspec		tokenizer.gemspec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tokenizer¶ ↑

DESCRIPTION¶ ↑

INSTALLATION¶ ↑

SYNOPSIS¶ ↑

SUPPORT¶ ↑

CHANGELOG¶ ↑

CAUTION¶ ↑

LICENSE¶ ↑

About

Releases

Packages

License

daalft/tokenizer

Folders and files

Latest commit

History

Repository files navigation

Tokenizer¶ ↑

DESCRIPTION¶ ↑

INSTALLATION¶ ↑

SYNOPSIS¶ ↑

SUPPORT¶ ↑

CHANGELOG¶ ↑

CAUTION¶ ↑

LICENSE¶ ↑

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages