QCRI Farasa package for tokenizing Arabic text.
QCRI FARASA. Latest release: you can check out the latest version from the GitHub repository: https://github.com/Qatar-Computing-Research-Institute/Farasa
Package content, files, folders
1 - Command line:
Linux/Mac OS
export FarasaDataDir=<FarasaData>/
java -Dfile.encoding=UTF-8 -jar dist/Farasa.jar -i InputFile -o OutputFile
WINDOWS
set FarasaDataDir=<FARASADATADIR>/
java -Dfile.encoding=UTF-8 -jar dist/Farasa.jar -i InputFile -o OutputFile
Parameters:
Farasa.sh|Farasa.bat <--help|-h> [--input|-i inputFile] [--output|-o outputFile]
* options:
* --help, -h     display help information
* --input, -i    inputFile
* --output, -o   outputFile
Example:
FarasaDataDir=<FarasaDataDirectory>/ java -Dfile.encoding=UTF-8 -jar dist/Farasa.jar < testfile.txt
On Windows, you may need to set the data directory explicitly before running:
set FarasaDataDir=<FARASADATADIR>/
java -Dfile.encoding=UTF-8 -jar dist/Farasa.jar < testfile.txt
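The same invocation can also be launched from a Java program. The following is only a minimal sketch, not part of the Farasa distribution: the class name FarasaRunner and the methods buildCommand and run are hypothetical names chosen for illustration. It uses only the options documented above (-i, -o), the -Dfile.encoding=UTF-8 setting, and the FarasaDataDir environment variable, and assumes that a java executable is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not shipped with Farasa): wraps the documented command line.
final class FarasaRunner {

    // Builds: java -Dfile.encoding=UTF-8 -jar <jarPath> -i <inputFile> -o <outputFile>
    public static List<String> buildCommand(String jarPath, String inputFile, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dfile.encoding=UTF-8");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-i");
        cmd.add(inputFile);
        cmd.add("-o");
        cmd.add(outputFile);
        return cmd;
    }

    // Starts Farasa as a child process with FarasaDataDir set in its environment
    // and returns the exit code of that process.
    public static int run(String dataDir, String jarPath, String inputFile, String outputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(jarPath, inputFile, outputFile));
        pb.environment().put("FarasaDataDir", dataDir);
        pb.inheritIO(); // show Farasa's own messages on this console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: FarasaRunner <FarasaDataDir> <Farasa.jar> <inputFile> <outputFile>");
            System.exit(2);
        }
        System.exit(run(args[0], args[1], args[2], args[3]));
    }
}
```

Setting the variable through ProcessBuilder.environment() affects only the child process, so the same code works on Linux, Mac OS and Windows without a separate set or export step.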
Build the jar:
ant jar
Deploy the package to another directory:
ant deploy -Do=<Dest Dir>
If you have any problems or questions, please contact kdarwish@qf.org.qa, aabdelali@qf.org.qa or hmubarak@qf.org.qa
Project page, with the latest news and downloads: http://alt.qcri.org/tools/farasa
Latest version of the source: https://github.com/Qatar-Computing-Research-Institute/Farasa
The QCRI FARASA package for tokenizing Arabic text is being made
public for research purposes only.
For non-research use, please contact:
Kareem M. Darwish <kdarwish@qf.org.qa>
Hamdy Mubarak <hmubarak@qf.org.qa>
Ahmed Abdelali <aabdelali@qf.org.qa>
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Copyright 2015 QCRI