DocSim: Document Similarity Analysis Tool

A Python tool that scores the similarity between text documents by representing each document as a vector (Vector Space Modeling) and comparing the vectors with Cosine Similarity.

Project Structure

The repository is organized to demonstrate both the initial logic and the improved modular version:

main.py: The primary entry point for executing the analysis.
preprocess.py: Dedicated module for text normalization and tokenization.
similarity.py: Core logic for vector generation and cosine calculations.
file_loader.py: Utility module for handling document input operations.
utils.py: General-purpose helper functions.
old_version.py: The initial single-script implementation (Version 1.0).

Technical Stack

Language: Python 3.x
Methodology: Vector Space Modeling (VSM)
Metric: Cosine Similarity
Domain: Natural Language Processing (NLP)
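
To make the methodology and metric above concrete, here is a minimal sketch of cosine similarity over raw term-count vectors. It is an illustration under stated assumptions, not the code in similarity.py: the function name `cosine_similarity`, the use of plain term counts (rather than, say, TF-IDF weights), and whitespace tokenization are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts.

    Hypothetical helper for illustration; the project's own weighting
    scheme may differ.
    """
    vec_a = Counter(tokens_a)
    vec_b = Counter(tokens_b)

    # Dot product only needs the terms both documents share.
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[term] * vec_b[term] for term in shared_terms)

    # Euclidean length of each count vector.
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))

    # An empty document has no direction; treat it as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two three-word documents sharing two words.
score = cosine_similarity("the cat sat".split(), "the cat ran".split())
print(round(score, 3))
```

With non-negative counts the score falls between 0 (no shared terms) and 1 (same term proportions), so it does not depend on document length.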

Key Features

Modular Architecture: Code is decoupled into specific modules for better readability and maintenance.
Preprocessing Pipeline: Normalizes and tokenizes raw text before vectors are built, so that similarity scores reflect the content of the documents rather than formatting differences.
Clean Code: Applies structured programming principles so the logic is easy to follow.
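
As a rough illustration of what a normalization and tokenization step can look like, the sketch below lowercases the input and keeps only alphanumeric word tokens. The function name `preprocess` and these specific rules are assumptions for the example; the actual preprocess.py may apply different or additional steps.

```python
import re


def preprocess(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Hypothetical example; not taken from the project's preprocess.py.
    """
    # Lowercasing makes "Hello" and "hello" count as the same term;
    # the regex drops punctuation and whitespace between tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = preprocess("Hello, World! Hello again.")
print(tokens)
```

Token lists produced this way can then be counted into term vectors for comparison.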

How to Use

Clone the repository to your local environment.
Run the program using:
```
python main.py
```
