Developed a plagiarism detection tool that uses the Longest Common Substring (LCS) algorithm to find word sequences shared between files, counting only matches longer than three words to reduce false positives.
Key Features:
- File Processing: Reads files word-by-word, ignoring punctuation, blank lines, and references to avoid false positives.
- Dynamic Programming: Uses a dynamic programming implementation of Longest Common Substring (LCS) to identify common substrings.
- Plagiarism Calculation: Calculates the percentage of plagiarism, flagging files with 30% or more similarity.
- Command Line Interface: Implements a Make command for easy file path input.
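
The features above can be illustrated with a minimal Python sketch. It is not the tool's actual code. The function names, the tokenizing regular expression, and the scoring formula (matched words divided by total words in the checked file) are assumptions made for illustration, as is reading "longer than three words" as at least four words. Skipping reference sections is left out. The sketch fills the longest-common-substring table (the length of the common run of words ending at each pair of positions) and counts every word that falls inside a sufficiently long shared run.

```python
import re

MIN_WORDS = 4          # assumption: "longer than three words" means four or more
FLAG_THRESHOLD = 30.0  # percent, as stated in the description


def tokenize(text):
    """Lowercase the text and split it into words; punctuation and blank lines vanish."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matched_word_count(a, b, min_words=MIN_WORDS):
    """Count words of `a` that lie in a run of at least `min_words` words shared with `b`."""
    covered = [False] * len(a)
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Length of the common run ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] == min_words:
                    # The run just became long enough: mark all of its words.
                    for k in range(i - min_words, i):
                        covered[k] = True
                elif curr[j] > min_words:
                    # The run keeps growing: mark only the newest word.
                    covered[i - 1] = True
        prev = curr
    return sum(covered)


def plagiarism_percent(text_a, text_b):
    """Hypothetical score: share of words in text_a found in long runs shared with text_b."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a:
        return 0.0
    return 100.0 * matched_word_count(a, b) / len(a)


def is_flagged(text_a, text_b, threshold=FLAG_THRESHOLD):
    """Flag the pair when the score reaches the threshold."""
    return plagiarism_percent(text_a, text_b) >= threshold
```

Keeping only two rows of the table holds memory at O(len(b)) while the running time stays O(len(a) × len(b)). With this scoring, a shared three-word run such as "over the lazy" contributes nothing, while a four-word run counts in full.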