details about dataset HPPT, about levenshtein distance #1

Open
RiverTre opened this issue Sep 2, 2023 · 3 comments

Comments

@RiverTre

RiverTre commented Sep 2, 2023

Hi,
Could you share the method or code you used to generate the Levenshtein distance?
For example, in /ChatGPT-Detection-PR-HPPT/tree/main/Dataset/HPPT/val.json, I can find Levenshtein distance values, but why are they in [0,1]? Did you use a normalized distance? How did you compute the Levenshtein distance?
Thanks.

@RiverTre
Author

RiverTre commented Sep 2, 2023

@Clement1290

@Clement1290
Collaborator

Clement1290 commented Sep 2, 2023

It was based on the project https://github.com/life4/textdistance, and we used the textdistance package to calculate the normalized Levenshtein distance.
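
For readers who want to reproduce values in [0, 1] without installing anything, here is a minimal pure-Python sketch of a normalized Levenshtein distance. It assumes the normalization divides the raw edit distance by the length of the longer input, which is my understanding of what textdistance does. The function names `levenshtein` and `normalized_levenshtein` are illustrative and not taken from the repository, and whether the dataset values were computed on characters or on word tokens is not stated in this thread.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1.

    Works on any pair of sequences (strings, or lists of tokens).
    """
    # Keep the shorter sequence in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Assumed normalization: raw distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty inputs are identical
    return levenshtein(a, b) / longest
```

For example, `normalized_levenshtein("kitten", "sitting")` is 3/7, roughly 0.43, since three edits are needed and the longer string has seven characters.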

@RiverTre
Author

Thanks! I am exploring this research further, and I am using:

!pip install rapidfuzz
from rapidfuzz.distance import DamerauLevenshtein
DamerauLevenshtein.normalized_distance(s1,s2)

But the output differs from the values in the data files. Would you be available to check it? I am confused and need help.
The documentation is here: https://maxbachmann.github.io/RapidFuzz/Usage/distance/Levenshtein.html
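
A possible reason for the mismatch, offered as a guess and not a confirmed diagnosis: `DamerauLevenshtein` also counts a swap of two adjacent symbols as a single edit, while plain Levenshtein needs two edits for it, so the two metrics can disagree on the same pair. The sketch below uses a hypothetical helper `edit_distance` and the restricted (optimal string alignment) form of the transposition rule. That form is simpler than the full Damerau-Levenshtein algorithm and is not RapidFuzz's implementation; it only illustrates the difference.

```python
def edit_distance(a, b, transpositions=False):
    """Edit distance between two sequences.

    transpositions=False gives plain Levenshtein.
    transpositions=True additionally allows swapping two adjacent items
    for a cost of 1 (restricted / optimal string alignment variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every item of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every item of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent swap counted as one edit
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized(a, b, transpositions=False):
    """Distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b, transpositions) / longest if longest else 0.0
```

With "form" and "from", `normalized` gives 0.5 without transpositions and 0.25 with them. The linked documentation page describes `Levenshtein`, so switching the import to `rapidfuzz.distance.Levenshtein` and calling its `normalized_distance` may be the closer comparison to the textdistance values.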
