The Moogle! search engine

What problem does this software solve?

Given a query, return the most relevant files from the Content folder.

Requirements

.NET 6.0

Portability

It should work on Windows, but this has not been tested.

Quick guide

Start the server:

make dev

Open http://localhost:5000/ in your browser
Write something in the search bar
Click on 'Buscar' (Search)
???
Profit

FAQ

Taking too long

The first query takes between 10 seconds and 1 minute, depending on your hardware, because Moogle! does its startup work during that query. Subsequent queries are faster.

Can I put X file in Content? Will Moogle! load it?

By default, Moogle! only loads .cs, .txt, and .py files. You can add any other extension by manually editing the following line in MoogleServer/Settings.cs:

static public string[] PATTERNS = new string[]

Can I add new files after the first query?

Yes. New files are loaded within 5 minutes.

Why after 5 minutes?

Checking the Content folder for new files on every query would be too slow, so Moogle! only checks periodically.
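The idea of checking only every few minutes can be sketched as follows. This is an illustration in Python, not the project's C# code: the `ThrottledScanner` class, its parameters, and the injected `clock` are all hypothetical names chosen for the example.

```python
import time

RESCAN_INTERVAL_SECONDS = 5 * 60  # the 5-minute window mentioned in this FAQ


class ThrottledScanner:
    """Runs an expensive scan at most once per interval (hypothetical helper)."""

    def __init__(self, scan, interval=RESCAN_INTERVAL_SECONDS, clock=time.monotonic):
        self._scan = scan          # callable that lists the files in Content
        self._interval = interval  # minimum number of seconds between scans
        self._clock = clock        # injectable so the behavior is easy to test
        self._last_scan = None
        self._files = []

    def files(self):
        """Return the known files, rescanning only if the interval has elapsed."""
        now = self._clock()
        if self._last_scan is None or now - self._last_scan >= self._interval:
            self._files = self._scan()
            self._last_scan = now
        return self._files
```

Every query can call `files()`; only the calls that arrive after the interval has elapsed pay the cost of scanning the folder again.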

It just shows a black screen after I try to search for [word]!

The word does not appear in any of the files.

Some weird "json" files just appeared out of nowhere

Those are cache files. After the first query Moogle! saves all the data it requires in those files to speed up searches. Feel free to delete them.
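A minimal sketch of this kind of cache, written in Python for illustration only. The function name `load_or_build`, the cache path, and the shape of the cached data are assumptions; the actual files Moogle! writes may be organized differently.

```python
import json
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the cached data if the file exists, otherwise build and save it."""
    path = Path(cache_path)
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    data = build()  # the slow step, for example indexing every file
    with path.open("w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data
```

In this sketch, deleting the cache file is harmless: the next call simply runs the slow `build` step again and writes a fresh file.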

What is tf-idf and why does Moogle! use it?

One of the most naive criteria for ranking files against a set of words is to count how many times those words appear in each text. However, common words like "the" or "is" get inflated scores even though they are not really relevant for our needs. A smart way to discount those words is to measure how common a word is across all files and scale its count in each file down by that value. That is the idea behind tf-idf (term frequency, inverse document frequency). With this value we have a more sensible criterion for sorting the results of a query.
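To make the idea concrete, here is a small Python sketch of the textbook tf-idf formula, where tf is the share of a file taken up by a word and idf is log(N / df). It is an assumption that this matches the exact variant Moogle! implements; the function name `tf_idf` and the input format are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map each file name to {word: tf-idf score}.

    `documents` maps a file name to the list of words it contains.
    """
    total = len(documents)
    # Document frequency: in how many files does each word appear?
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        scores[name] = {
            # tf = share of the file taken by the word; idf = log(N / df)
            word: (count / len(words)) * math.log(total / doc_freq[word])
            for word, count in counts.items()
        }
    return scores
```

With two files containing "the cat" and "the dog", the word "the" appears in every file, so its idf is log(2 / 2) = 0 and it contributes nothing to the ranking, while "cat" and "dog" keep a positive score.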

I noticed that when I make a typo Moogle! gives me a suggestion. How does it work?

The code is ported almost verbatim from Python's difflib module. Basically, we iterate through all the words. For each word, we recursively find the substrings it has in common with the query word, count the characters in all of those substrings, and calculate the similarity from that count. The main differences from the Python version are that we use recursion (because it looks cleaner this way), we don't check for "junk" characters (a feature of the difflib module), and we skip some handling for edge cases that are not that relevant for our needs (optimization for very long strings, joining substrings, etc.). I deemed those unnecessarily complex for our use case.
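The recursive matching described above can be sketched in Python like this. It is a simplified illustration of the same idea (find the longest common substring, recurse on what is left on each side, then compute 2 * matches / total length), not a copy of Difflib.cs or of difflib itself. The names `longest_match`, `matched_chars`, `similarity`, and `suggest` are hypothetical.

```python
def longest_match(a, b):
    """Return (i, j, size) so that a[i:i+size] == b[j:j+size] is the longest match."""
    best_i = best_j = best_size = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_size:
                    best_size = cur[j]
                    best_i, best_j = i - cur[j], j - cur[j]
        prev = cur
    return best_i, best_j, best_size


def matched_chars(a, b):
    """Count matching characters, recursing to the left and right of the best match."""
    i, j, size = longest_match(a, b)
    if size == 0:
        return 0
    left = matched_chars(a[:i], b[:j])
    right = matched_chars(a[i + size:], b[j + size:])
    return size + left + right


def similarity(a, b):
    """Return a ratio in [0, 1]: twice the matched characters over the total length."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * matched_chars(a, b) / total


def suggest(word, vocabulary):
    """Return the most similar known word, or None if the vocabulary is empty."""
    return max(vocabulary, key=lambda candidate: similarity(word, candidate), default=None)
```

For example, `suggest("serch", ["search", "server", "profit"])` picks "search": the two words share the substrings "se" and "rch", which gives 5 matched characters out of 11 in total.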

How does it work?

Startup (during first query)

Moogle! finds all valid files inside the Content folder.
Once all the filenames are stored, Moogle! counts the words in each file and associates those counts with the file.
It calculates the tf-idf of each word and keeps only the 5 files with the highest tf-idf score for that word.
It dumps the cache.
It parses the query and gets the candidate files: up to 5 per word of the query.
It calculates the total score of each candidate (the sum of the file's tf-idf for each matching word of the query) and finds a highlight of the text to show to the user.
It sorts the items and hands them to the front-end (adding a suggestion if required).
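The indexing and ranking steps above can be sketched in Python as follows. This is an illustration under assumptions, not the Ranker class itself: the names `build_index` and `search` are hypothetical, the tf-idf scores are taken as already computed, and the query is parsed by simply lowercasing it and splitting on whitespace. Highlights and suggestions are left out.

```python
import heapq
from collections import defaultdict

TOP_FILES_PER_WORD = 5  # the "top 5" limit described in the steps above


def build_index(scores):
    """Keep, for each word, only its best-scoring files.

    `scores` maps a file name to {word: tf-idf score}.
    """
    by_word = defaultdict(dict)
    for name, word_scores in scores.items():
        for word, score in word_scores.items():
            by_word[word][name] = score
    return {
        word: dict(heapq.nlargest(TOP_FILES_PER_WORD, files.items(), key=lambda item: item[1]))
        for word, files in by_word.items()
    }


def search(index, query):
    """Rank candidate files by the sum of their scores for each query word."""
    totals = defaultdict(float)
    for word in query.lower().split():  # assumed parsing: lowercase, split on spaces
        for name, score in index.get(word, {}).items():
            totals[name] += score
    # Highest total first; each item is a (file name, total score) pair.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Keeping only a few files per word bounds the number of candidates a query has to score, at the cost of ignoring files that rank lower for every word in the query.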

How is the project organized?

The logic is inside the MoogleEngine folder.

Moogle.cs: Wrapper around all the features we need; it sends the results of the query to the front-end
DataStructures.cs: Contains the classes that store data, along with the utility functions they need. Here you will find the Ranker class, which holds most of the core features Moogle! uses
Difflib.cs: C# port of Python's difflib module
Settings.cs: self-explanatory

What is Report.html.pdf?

Detailed documentation explaining technical details of the implementation of the Moogle! search engine.
