Egyptian / Modern Standard Arabic language identification system

motazsaad/egy-arb-dialect-id

Comparable Wikipedia Corpus

Comparable Wikipedia Corpus (aligned documents)

Corpus extracts from 20-01-2017 Wikipedia dumps

License: CC BY-SA 4.0

This corpus was aligned with WikiDocsAligner.

Language pairs list (20-01-2017):

  • Arabic-Egyptian

Other language pairs will be included in the future.

Corpus Information

              Arabic Wikipedia    Egyptian Wikipedia
documents     10,197              10,197
words         8,397,154           1,543,516
vocabulary    740,055             215,659
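
The three figures above (documents, words, vocabulary) could be recomputed along the following lines. This is only a minimal sketch: the function name `corpus_stats`, the whitespace tokenization, and the in-memory list of texts are all assumptions made for illustration, and the tokenization actually used to produce the published counts is not stated in this README, so results may differ.

```python
from collections import Counter


def corpus_stats(documents):
    """Return (document count, word count, vocabulary size).

    `documents` is any iterable of document texts. Words are split on
    whitespace, which is an assumption and not necessarily the
    tokenization used for the published figures.
    """
    vocab = Counter()
    n_docs = 0
    for text in documents:
        n_docs += 1
        vocab.update(text.split())
    n_words = sum(vocab.values())  # total tokens across all documents
    return n_docs, n_words, len(vocab)  # len(vocab) = distinct word types


# Tiny illustrative sample, not taken from the corpus.
sample = ["القاهرة عاصمة مصر", "مصر دولة عربية"]
print(corpus_stats(sample))
```

Running the same function over the Arabic and the Egyptian side separately would give one column of the table each.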

To cite this resource:

Motaz Saad and Basem Alijla (2017). WikiDocsAligner: an off-the-shelf Wikipedia Documents Alignment Tool. In The Second Palestinian International Conference on Information and Communication Technology (PICICT 2017).
