Assem's Arabic Light Stemmer is a Snowball-based stemming algorithm for Arabic, aimed mainly at improving search.

HeshamSaleh/arabicstemmer

Assem's Arabic Stemmer

This is an Arabic stemming algorithm written in the Snowball framework language. It offers light stemming and text normalization.

@article{Chelli2018,
author = "Assem Chelli",
title = "{Assem's Arabic Stemmer}",
year = "2018",
month = "11",
url = "https://figshare.com/articles/Assem_s_Arabic_Stemmer/7295690",
doi = "10.6084/m9.figshare.7295690.v1"
}

This is a sample of results:

| Word | Light Stemmer | Root-Based Stemmer |
|---|---|---|
| طفل | طفل | طفل |
| اطفال | اطفال | طفل |
| الاطفال | اطفال | طفل |
| اطفالكم | اطفال | طفل |
| فأطفالكم | اطفال | طفل |
| اطفالهم | اطفال | طفل |
| والاطفال | اطفال | طفل |
| فاطفالهم | اطفال | طفل |
| وطفل | طفل | طفل |
| الطفولة | طفول | طفل |
| والطفلتين | طفل | طفل |
| طفلتان | طفل | طفل |
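
To make the idea of light stemming in the sample results more concrete, here is a toy Python sketch that normalizes hamza forms of alef and then strips at most one prefix and one suffix. It is not the Snowball algorithm from this repository: the names `PREFIXES`, `SUFFIXES`, `MIN_STEM_LEN`, `normalize` and `light_stem` are hypothetical, and the affix lists are a small illustrative assumption chosen to cover the sample words only.

```python
import re

# Illustrative affix lists (an assumption, not the repository's rules).
# Longer affixes come first so that "وال" is tried before "ال" or "و".
PREFIXES = ("وال", "فال", "بال", "كال", "ال", "و", "ف")
SUFFIXES = ("تين", "تان", "كم", "هم", "ها", "ان", "ين", "ة")
MIN_STEM_LEN = 3  # never strip an affix if fewer letters would remain


def normalize(word):
    """Map the hamza-carrying alef forms to a bare alef."""
    return re.sub("[أإآ]", "ا", word)


def light_stem(word):
    """Strip at most one prefix and one suffix from a normalized word."""
    word = normalize(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for sample in ("الاطفال", "فأطفالكم", "الطفولة", "والطفلتين"):
        print(sample, light_stem(sample))
```

On the sample words this sketch reproduces the Light Stemmer column, but a real light stemmer needs many more rules and guards than a fixed affix list.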

Requirements:

The dependencies are already attached as git submodules, so just run:

$ git submodule update --init --recursive

Build:

$ make build

Run:

  • Light Stemmer
$ make run
الطالب
طالب
  • Root-Based Stemmer
$ make run_root
الطالب
طلب

Test:

The tests run against the snowball-data Arabic sample and measure speed, grouping factor, and precision.

$ make test
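
The README does not define "grouping factor". As a rough illustration only, the sketch below assumes it means the number of distinct input words divided by the number of distinct stems they map to, so a higher value means more word forms are merged. The function name `grouping_factor` and this definition are assumptions, not taken from the repository's test code; the word/stem pairs come from the sample results table.

```python
def grouping_factor(pairs):
    """Distinct words divided by distinct stems (assumed definition)."""
    words = {word for word, _ in pairs}
    stems = {stem for _, stem in pairs}
    return len(words) / len(stems)


# (word, light stem) pairs copied from the sample results table.
LIGHT_SAMPLE = [
    ("طفل", "طفل"),
    ("اطفال", "اطفال"),
    ("الاطفال", "اطفال"),
    ("اطفالكم", "اطفال"),
    ("فأطفالكم", "اطفال"),
    ("اطفالهم", "اطفال"),
    ("والاطفال", "اطفال"),
    ("فاطفالهم", "اطفال"),
    ("وطفل", "طفل"),
    ("الطفولة", "طفول"),
    ("والطفلتين", "طفل"),
    ("طفلتان", "طفل"),
]

# The root-based column maps every sample word to the same root.
ROOT_SAMPLE = [(word, "طفل") for word, _ in LIGHT_SAMPLE]

if __name__ == "__main__":
    print(grouping_factor(LIGHT_SAMPLE))  # 12 words / 3 stems -> 4.0
    print(grouping_factor(ROOT_SAMPLE))   # 12 words / 1 stem  -> 12.0
```

Under this assumed definition, the root-based stemmer groups the sample more aggressively than the light stemmer, which matches what the table shows.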

Distributions:

  • Generate light stemmer distributions for the available languages:
$ make dist
