Optimized the StringShinglingTool
SasheVuchkov committed Jan 17, 2022
1 parent bb20542 commit bc5d4eb
Showing 11 changed files with 27 additions and 75 deletions.
6 changes: 3 additions & 3 deletions __tests__/functional/NearDuplicatesFinder.func.ts
@@ -52,10 +52,10 @@ describe("Testing NearDuplicateFinder class", () => {
     const expected = {
       review5: [
         [1, "review6"],
-        [0.9430284857571214, "review136"],
+        [0.9333333333333333, "review136"],
       ],
-      review6: [[0.9430284857571214, "review136"]],
-      review9: [[0.8916129032258064, "review81"]],
+      review6: [[0.9333333333333333, "review136"]],
+      review81: [[0.8853503184713376, "review9"]],
     };

     const finder = makeDuplicatesFinderWithMocks({
2 changes: 1 addition & 1 deletion __tests__/unit/NearDuplicatesFinder.spec.ts
@@ -25,7 +25,7 @@ describe("Testing NearDuplicateFinder class", () => {
         "Like The Rings of The Lord, but with pink parrots",
         "Like The Rings of The Lord, but with pink poodles",
       ],
-      { text0: [[0.7647058823529411, "text1"]] },
+      { text0: [[0.6666666666666666, "text1"]] },
     ],
     [
       "Test case: Totally different identical texts (score=0)",
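
Note on the changed scores: the expected similarities in these fixtures move because the shingle sets feeding the comparison are different after this commit. The diff does not show the scoring code, so the following is only a minimal sketch that assumes a Jaccard-style set similarity (shared shingles divided by all distinct shingles). The `jaccard` helper and the toy shingle lists are hypothetical and are not taken from the library or its fixtures.

```typescript
// Hypothetical helper for illustration; the commit does not show how
// the finder actually scores a pair of documents.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);

  // Count shingles that appear in both sets.
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) {
      shared += 1;
    }
  });

  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Toy shingle sets, not taken from the test fixtures.
const left = ["ab", "cd", "ef"];
const right = ["ab", "cd", "gh"];
console.log(jaccard(left, right)); // 2 shared / 4 distinct = 0.5
```

With fewer, coarser shingles per document, a single differing word changes a larger fraction of the set, which is one plausible reason the expected values drop.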
36 changes: 2 additions & 34 deletions __tests__/unit/ShinglingTool/StringShinglingTool.spec.ts
@@ -11,7 +11,7 @@ describe("Testing ShinglingTool/StringShinglingTool class", () => {
     [
       "Test case: String with length that is bigger than the shingle length",
       "Not so long ",
-      ["Not so", "ot so ", "t so l", " so lo", "so lon", "o long", " long "],
+      ["Not so", " long "],
     ],
     [
       "Test case: String with length that is equal the shingle length",
@@ -27,39 +27,7 @@ describe("Testing ShinglingTool/StringShinglingTool class", () => {
     [
       "Test case: String with non ascii symbols",
       "Като игра на тронове, ама във ваната",
-      [
-        "Като и",
-        "ато иг",
-        "то игр",
-        "о игра",
-        " игра ",
-        "игра н",
-        "гра на",
-        "ра на ",
-        "а на т",
-        " на тр",
-        "на тро",
-        "а трон",
-        " троно",
-        "тронов",
-        "ронове",
-        "онове,",
-        "нове, ",
-        "ове, а",
-        "ве, ам",
-        "е, ама",
-        ", ама ",
-        " ама в",
-        "ама въ",
-        "ма във",
-        "а във ",
-        " във в",
-        "във ва",
-        "ъв ван",
-        "в вана",
-        " ванат",
-        "ваната",
-      ],
+      ["Като и", "гра на", " троно", "ве, ам", "а във ", "ваната"],
     ],
   ];
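
Note on the new expectations: the old fixtures list every window obtained by sliding one character at a time, while the new ones list consecutive, non-overlapping chunks. The sketch below reproduces both lists for a shingle length of 6, which is what the fixtures imply. `overlappingShingles` and `chunkShingles` are hypothetical helper names written for this note, not the library's API, and splitting with `split("")` is an assumption about how the tool turns a string into items.

```typescript
// Hypothetical helpers for illustration; not part of the library.

// Old behaviour: slide a fixed-size window one character at a time.
function overlappingShingles(text: string, size: number): string[] {
  // split("") works on UTF-16 code units, which is enough for the
  // Cyrillic sample because all of its characters are in the BMP.
  const chars = text.split("");
  const shingles: string[] = [];
  for (let start = 0; start + size <= chars.length; start += 1) {
    shingles.push(chars.slice(start, start + size).join(""));
  }
  return shingles;
}

// New behaviour: advance by a whole shingle on each step.
function chunkShingles(text: string, size: number): string[] {
  const chars = text.split("");
  const shingles: string[] = [];
  for (let start = 0; start < chars.length; start += size) {
    // slice() clamps the end index, so a short tail is kept as-is.
    shingles.push(chars.slice(start, start + size).join(""));
  }
  return shingles;
}

const sample = "Not so long ";
console.log(overlappingShingles(sample, 6).length); // 7 windows
console.log(chunkShingles(sample, 6).length); // 2 chunks: "Not so" and " long "
```

For a text of n characters and shingle length k (with n at least k), the first approach yields n - k + 1 shingles and the second yields ceil(n / k), which is why the 36-character Cyrillic fixture shrinks from 31 entries to 6.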

6 changes: 3 additions & 3 deletions dist/__tests__/functional/NearDuplicatesFinder.func.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion dist/__tests__/functional/NearDuplicatesFinder.func.js.map

2 changes: 1 addition & 1 deletion dist/__tests__/unit/NearDuplicatesFinder.spec.js

27 changes: 1 addition & 26 deletions dist/__tests__/unit/ShinglingTool/StringShinglingTool.spec.js

7 changes: 5 additions & 2 deletions dist/src/ShinglingTool/StringShinglingTool.js

2 changes: 1 addition & 1 deletion dist/src/ShinglingTool/StringShinglingTool.js.map

10 changes: 8 additions & 2 deletions src/ShinglingTool/StringShinglingTool.ts
@@ -26,8 +26,14 @@ export default class StringShinglingTool extends BaseShinglingTool {
         items.slice(startPosition, endPosition).join("")
       );
       callback(docId, shingle);
-      startPosition += 1;
-      endPosition += 1;
+      startPosition += this.shingleSize;
+      endPosition =
+        endPosition + this.shingleSize > items.length
+          ? items.length
+          : endPosition + this.shingleSize;
+      if (startPosition >= endPosition) {
+        break;
+      }
     }
   }
 }
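
Note on the stepping logic: the hunk above only shows the tail of the loop, so the following is a standalone sketch of how the new stepping appears to behave, not the class itself. The function name `stepShingles`, the initial values of `startPosition` and `endPosition`, the `while (true)` loop, and the empty-input guard are all assumptions added to make the sketch runnable; only the three stepping statements mirror the diff.

```typescript
// Hypothetical standalone version of the stepping shown in the diff.
// Loop setup and the guard are assumptions; they are not in the hunk.
function stepShingles(items: string[], shingleSize: number): string[] {
  const shingles: string[] = [];
  if (items.length === 0 || shingleSize <= 0) {
    return shingles; // guard added for the sketch only
  }

  let startPosition = 0;
  let endPosition = Math.min(shingleSize, items.length);

  while (true) {
    shingles.push(items.slice(startPosition, endPosition).join(""));

    // Same stepping as the commit: jump a full shingle ahead and
    // clamp the end of the window to the end of the input.
    startPosition += shingleSize;
    endPosition =
      endPosition + shingleSize > items.length
        ? items.length
        : endPosition + shingleSize;

    // Once the start reaches the clamped end, nothing is left to emit.
    if (startPosition >= endPosition) {
      break;
    }
  }
  return shingles;
}

// An 8-character input with shingle size 6 leaves a 2-character tail.
console.log(stepShingles("abcdefgh".split(""), 6)); // "abcdef", then "gh"
```

Under these assumptions, an input whose length is not a multiple of the shingle size ends with one shorter shingle, because the clamp keeps the window open until `startPosition` passes the end.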
