Merge pull request #87 from bact/add-wheels-list
Add binary wheels table
bact authored Nov 11, 2024
2 parents 6322f30 + 7f82ba6 commit a64f59d
Showing 3 changed files with 146 additions and 52 deletions.
50 changes: 25 additions & 25 deletions README.md
@@ -25,14 +25,14 @@ pip install nlpo3
## Table of contents

- [Features](#features)
- [Dictionary file](#dictionary-file)
- [Usage](#usage)
- [Use](#use)
- [Node.js binding](#nodejs-binding)
- [Python binding](#python-binding)
- [Rust library](#rust-library)
- [Command-line interface](#command-line-interface)
- [Dictionary](#dictionary)
- [Build](#build)
- [Development](#development)
- [Develop](#develop)
- [License](#license)

## Features
@@ -48,25 +48,7 @@ pip install nlpo3
[tcc]: https://dl.acm.org/doi/10.1145/355214.355225
[benchmark]: ./nlpo3-python/notebooks/nlpo3_segment_benchmarks.ipynb

## Dictionary file

- In the interest of library size, nlpO3 does not assume what dictionary the
  user would like to use, and it does not come with a dictionary.
- A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt][dict-pythainlp] from [PyThaiNLP][pythainlp]
    - ~62,000 words
    - CC0-1.0
  - [word break dictionary][dict-libthai] from [libthai][libthai]
    - consists of dictionaries in different categories, with a make script
    - LGPL-2.1

[pythainlp]: https://github.com/PyThaiNLP/pythainlp
[libthai]: https://github.com/tlwg/libthai/
[dict-pythainlp]: https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt
[dict-libthai]: https://github.com/tlwg/libthai/tree/master/data

## Usage
## Use

### Node.js binding

@@ -151,6 +133,24 @@ echo "ฉันกินข้าว" | nlpo3 segment

See more at [nlpo3-cli](./nlpo3-cli/).

### Dictionary

- In the interest of library size, nlpO3 does not assume what dictionary the
  user would like to use, and it does not come with a dictionary.
- A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt][dict-pythainlp] from [PyThaiNLP][pythainlp]
    - ~62,000 words
    - CC0-1.0
  - [word break dictionary][dict-libthai] from [libthai][libthai]
    - consists of dictionaries in different categories, with a make script
    - LGPL-2.1

[pythainlp]: https://github.com/PyThaiNLP/pythainlp
[libthai]: https://github.com/tlwg/libthai/
[dict-pythainlp]: https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt
[dict-libthai]: https://github.com/tlwg/libthai/tree/master/data

## Build

### Requirements
@@ -179,13 +179,13 @@ cargo build --release

Check `target/` for build artifacts.

## Development
## Develop

Development document:
### Development document

- [Notes on custom string](src/NOTE_ON_STRING.md)

Issues:
### Issues

- Please report issues at <https://github.com/PyThaiNLP/nlpo3/issues>

146 changes: 119 additions & 27 deletions nlpo3-python/README.md
@@ -11,6 +11,22 @@ SPDX-License-Identifier: Apache-2.0

Python binding for nlpO3, a Thai natural language processing library in Rust.

To install:

```bash
pip install nlpo3
```

## Table of Contents

- [Features](#features)
- [Use](#use)
- [Dictionary](#dictionary)
- [Build](#build)
- [Issues](#issues)
- [License](#license)
- [Binary wheels](#binary-wheels)

## Features

- Thai word tokenizer
@@ -24,31 +40,7 @@ Python binding for nlpO3, a Thai natural language processing library in Rust.
[tcc]: https://dl.acm.org/doi/10.1145/355214.355225
[benchmark]: ./notebooks/nlpo3_segment_benchmarks.ipynb

## Dictionary file

- In the interest of library size, nlpO3 does not assume what dictionary the
  user would like to use, and it does not come with a dictionary.
- A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt][dict-pythainlp] from [PyThaiNLP][pythainlp]
    - ~62,000 words
    - CC0-1.0
  - [word break dictionary][dict-libthai] from [libthai][libthai]
    - consists of dictionaries in different categories, with a make script
    - LGPL-2.1

[pythainlp]: https://github.com/PyThaiNLP/pythainlp
[libthai]: https://github.com/tlwg/libthai/
[dict-pythainlp]: https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt
[dict-libthai]: https://github.com/tlwg/libthai/tree/master/data

## Install

```bash
pip install nlpo3
```

## Usage
## Use

Load the file `path/to/dict.file` into memory
and assign the name `dict_name` to it.
@@ -83,6 +75,24 @@ for text with lots of ambiguous word boundaries:
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
```

### Dictionary

- In the interest of library size, nlpO3 does not assume what dictionary the
  user would like to use, and it does not come with a dictionary.
- A dictionary is needed for the dictionary-based word tokenizer.
- For a tokenization dictionary, try
  - [words_th.txt][dict-pythainlp] from [PyThaiNLP][pythainlp]
    - ~62,000 words
    - CC0-1.0
  - [word break dictionary][dict-libthai] from [libthai][libthai]
    - consists of dictionaries in different categories, with a make script
    - LGPL-2.1

[pythainlp]: https://github.com/PyThaiNLP/pythainlp
[libthai]: https://github.com/tlwg/libthai/
[dict-pythainlp]: https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt
[dict-libthai]: https://github.com/tlwg/libthai/tree/master/data
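
Because nlpO3 does not bundle a dictionary, a word list has to be prepared before it is loaded. The following is a minimal sketch with hypothetical helpers (`merge_word_lists` and `write_word_list` are not part of nlpO3). It assumes the tokenizer reads a UTF-8 text file with one word per line, the same layout as PyThaiNLP's `words_th.txt`, and shows how such files could be cleaned and merged using only the Python standard library.

```python
from pathlib import Path
from typing import Iterable, List


def merge_word_lists(*paths: str) -> List[str]:
    """Read one-word-per-line UTF-8 files and return the sorted union of words.

    Hypothetical helper, not part of nlpO3.
    """
    words = set()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        # Strip whitespace around each line and skip blank lines
        words.update(line.strip() for line in text.splitlines() if line.strip())
    return sorted(words)


def write_word_list(words: Iterable[str], path: str) -> List[str]:
    """Write a cleaned word list to `path`, one word per line, UTF-8 encoded.

    Hypothetical helper; assumes the tokenizer expects the same plain-text,
    one-word-per-line layout as PyThaiNLP's words_th.txt.
    """
    # Trim whitespace, drop empty entries and duplicates, then sort so the
    # file is stable and easy to review (the order is only a convenience)
    cleaned = sorted({w.strip() for w in words if w.strip()})
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

For example, `write_word_list(merge_word_lists("words_th.txt", "extra_words.txt"), "my_dict.txt")` would combine PyThaiNLP's list with a file of additional words (both input names are placeholders). The resulting file can then be used in place of `path/to/dict.file`. Sorting is a convenience for reviewable files; this sketch does not assume that nlpO3 requires any particular line order.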

## Build

### Requirements
@@ -111,9 +121,9 @@ To install a wheel from a local directory:
pip install dist/nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl
```

## Test
### Test

To run the Python unit test:
To run a Python unit test:

```bash
cd tests
@@ -129,3 +139,85 @@ Please report issues at <https://github.com/PyThaiNLP/nlpo3/issues>
nlpO3 Python binding is copyrighted by its authors
and licensed under terms of the Apache Software License 2.0 (Apache-2.0).
See file [LICENSE](./LICENSE) for details.

## Binary wheels

A pre-built binary package is available from [PyPI][pypi] for these platforms:

[pypi]: https://pypi.org/project/nlpo3/

|Python|OS|Architecture|Has binary wheel?|
|-|-|-|-|
|3.13|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.12|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.11|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.10|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.9|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.8|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|3.7|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
||musllinux|x86_64||
|PyPy 3.10|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
|PyPy 3.9|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
|PyPy 3.8|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
|PyPy 3.7|Windows|x86||
||Windows|AMD64||
||macOS|x86_64||
||macOS|arm64||
||manylinux|x86_64||
||manylinux|i686||
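
A stdlib-only sketch can help identify which row of the table matches a given environment. It reports the interpreter's implementation, language version, OS family, and architecture using the table's vocabulary. This is a heuristic and does not reproduce pip's wheel-tag logic. It assumes that any non-glibc Linux is musl-based. It also reports pointer width separately, because a 32-bit interpreter (the x86 or i686 rows) on a 64-bit OS can still report the 64-bit machine type. For an authoritative list of the tags pip considers compatible, `pip debug --verbose` is the better check.

```python
import platform
import struct
import sys
from typing import Dict


def interpreter_summary() -> Dict[str, str]:
    """Describe the running interpreter in the vocabulary of the wheel table.

    Heuristic sketch only: this does not reproduce pip's wheel-tag matching.
    """
    # "CPython" or "PyPy" (other implementations are not listed in the table)
    implementation = platform.python_implementation()
    # Python language version, e.g. "3.11"; for PyPy, the version it implements
    version = "{}.{}".format(sys.version_info[0], sys.version_info[1])

    system = platform.system()  # e.g. "Windows", "Darwin" or "Linux"
    if system == "Darwin":
        os_name = "macOS"
    elif system == "Linux":
        # Assumption: glibc-based Linux maps to "manylinux"; any other libc
        # (e.g. musl on Alpine) is reported as "musllinux".
        libc, _ = platform.libc_ver()
        os_name = "manylinux" if libc == "glibc" else "musllinux"
    else:
        os_name = system

    # Machine type reported by the OS, e.g. "AMD64", "x86_64" or "arm64".
    # A 32-bit interpreter on a 64-bit OS can still report the 64-bit type,
    # so the interpreter's pointer width is returned as well (32 -> x86/i686).
    machine = platform.machine()
    bits = str(struct.calcsize("P") * 8)

    return {
        "implementation": implementation,
        "python": version,
        "os": os_name,
        "machine": machine,
        "bits": bits,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print("{}: {}".format(key, value))
```

If the reported combination has no matching row, pip will typically fall back to the source distribution, which needs a Rust toolchain to build.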
2 changes: 2 additions & 0 deletions nlpo3-python/pyproject.toml
@@ -25,6 +25,8 @@ classifiers = [
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Natural Language :: Thai",