ArabicSearchEngine ARS

Summary of Work Done

This project went through two phases. The first was designing and implementing an experimental system based on the Latent Semantic Indexing (LSI) model.
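As a rough illustration of how retrieval in an LSI model works, the sketch below folds a query into the reduced latent space and compares it with a document vector by cosine similarity. It assumes a truncated SVD of the term-document matrix has already been computed; the names `foldQuery`, `cosine`, `Uk`, and `sigma` are hypothetical and are not taken from the ARS sources.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Uk[term][dimension]

// Fold a term-space query q into the k-dimensional latent space using the
// standard LSI formula q_k = Sigma_k^{-1} * U_k^T * q.
// Uk is the (terms x k) matrix of left singular vectors and sigma holds the
// k largest singular values (assumed non-zero).
Vec foldQuery(const Mat& Uk, const Vec& sigma, const Vec& q) {
    const std::size_t k = sigma.size();
    Vec qk(k, 0.0);
    for (std::size_t j = 0; j < k; ++j) {
        for (std::size_t i = 0; i < q.size(); ++i) {
            qk[j] += Uk[i][j] * q[i];
        }
        qk[j] /= sigma[j];
    }
    return qk;
}

// Cosine similarity between two latent-space vectors of equal length.
// Returns 0 when either vector has zero length.
double cosine(const Vec& a, const Vec& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Documents would then be ranked by the cosine between the folded query and each document's latent-space vector, highest first.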

The second was measuring the retrieval performance of this system when applied to the Arabic language and trying to improve it. This involved identifying the problems encountered and addressing them with computational linguistics techniques.

An experimental IR system (ARS) was designed and implemented based on the LSI model. It was the first time the LSI retrieval model had been applied to Arabic. To measure the impact of adding linguistic techniques to the LSI model, three experiments were conducted. In each, the index size was calculated and the retrieval performance was measured using precision, recall, and Van Rijsbergen's combined measure.

The first experiment used the core system (i.e. the LSI model only, without any linguistic features). In this experiment, the index occupied a total of 7.69 MB of disk space. Retrieval showed high precision but low recall, meaning that only a small number of the relevant documents were retrieved. Two problems arose: inflection and synonymy. The system achieved poor retrieval results where these two problems were concerned. Regarding query length, the retrieval performance of the system degraded gracefully as the query length increased.

In the second experiment, an attempt was made to overcome the problem of inflection by adding morphological features to the system using the morphological-normalization technique. In this experiment, the index size decreased to 31% of its size in the first experiment. Extending the system with morphological normalization of keywords yielded high levels of both precision and recall. Regarding query length, the retrieval performance of the system decreased as the query length increased, especially for the precision measure.

In the third experiment, an attempt was made to overcome the problem of word synonymy by adding semantic features to the system using the semantic-normalization technique. In this experiment, the index size decreased to 55% of its size in the first experiment. Compared to the second experiment, extending the system with semantic normalization of keywords led to an increase in recall (i.e. more relevant documents were retrieved) at the cost of an insubstantial average decrease in precision. Regarding query length, the retrieval performance of the system decreased as the query length increased, especially for the precision measure.

Finally, the model was evaluated in the light of the results of the three experiments.
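To make the idea behind the second experiment concrete, the sketch below shows one simple way inflected Arabic keywords can be mapped to a shared index form. It is an assumption for illustration only: a light-stemming rule set that strips the definite article and a few common suffixes. The summary above does not describe the morphological-normalization technique that ARS actually uses, and the function name `normalizeWord` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical light normalizer (not the ARS implementation): strips the
// definite article and one common suffix, keeping at least three letters.
std::u32string normalizeWord(std::u32string w) {
    static const std::u32string article = U"\u0627\u0644";  // ال
    if (w.size() > article.size() + 2 &&
        w.compare(0, article.size(), article) == 0) {
        w.erase(0, article.size());
    }
    static const std::vector<std::u32string> suffixes = {
        U"\u0627\u062A",  // ات (feminine plural)
        U"\u0648\u0646",  // ون (masculine plural, nominative)
        U"\u064A\u0646",  // ين (masculine plural, oblique)
        U"\u0629",        // ة  (ta marbuta)
    };
    for (const auto& s : suffixes) {
        if (w.size() > s.size() + 2 &&
            w.compare(w.size() - s.size(), s.size(), s) == 0) {
            w.erase(w.size() - s.size());
            break;  // remove at most one suffix
        }
    }
    return w;
}
```

With these rules, المعلمون ("the teachers") and معلمين ("teachers", oblique case) both reduce to معلم, so they share a single index term. Collapsing inflected forms in this way is one plausible reason both the index size and the recall problem shrink once normalization is added.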


Mo-Hamail/ArabicSearchEngine

Folders and files

Latest commit

History

Repository files navigation

ArabicSearchEngine ARS

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages