Recognition of Text-Captcha images, and analysis of breaking it with various techniques

Pushkar1853/Captcha-Breaker-project

Captcha-Breaker-project


A project on recognising text-based CAPTCHA images and breaking them.

Text-based CAPTCHA has become one of the most popular methods for preventing bot attacks. With the rapid development of deep learning techniques, many new methods to break text-based CAPTCHAs have been developed in recent years. However, a holistic and uniform investigation and comparison of these attacks’ effects is lacking due to inconsistent choices of model structures, training datasets, and evaluation metrics.

Types of Text-based Captcha:

Some methods of breaking them:

Dataset sample images:

Label : 00AQ59V0x5

Label : 0A3A28oY8H

Web app: app.py

Final notebook : notebook

Source code and scripts: src and scripts

Note: The final notebook shows how the model works and how the CTC loss decreases during training.

CTC Loss definition:

Connectionist Temporal Classification loss, or CTC loss, is designed for tasks that need an alignment between two sequences when that alignment is hard to obtain, e.g. aligning each character of a transcript to its location in an audio file. It computes a loss between a continuous (unsegmented) time series and a target sequence by summing the probabilities of all possible alignments of the input to the target. The resulting loss value is differentiable with respect to each input node.
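To make the "summing over alignments" idea concrete, here is a minimal sketch of the CTC forward computation in plain Python. It is an illustration only, not the code used in this project's notebook: the function name `ctc_loss`, the choice of index 0 for the blank symbol, and the toy probabilities are all assumptions. It also works directly with probabilities, which is fine for tiny inputs; practical implementations work in log space to avoid underflow.

```python
import math


def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (illustrative sketch).

    probs:  list of T rows; row t is a probability distribution over the
            vocabulary at time step t (assumed already normalised).
    target: list of label indices, without blanks.
    blank:  index of the blank symbol (assumed to be 0 here).
    """
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments that have
    # consumed the input so far and currently sit at position s of `ext`.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping the blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    if likelihood == 0.0:
        return math.inf  # target cannot be produced from T time steps
    return -math.log(likelihood)


# Toy example: 2 time steps, vocabulary {0: blank, 1: "a"}, target "a".
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_loss(uniform, [1])
```

In the toy example, three of the four possible paths collapse to "a" (`a a`, `a blank`, `blank a`), each with probability 0.25, so the loss equals `-log(0.75)`.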

CTC Loss on the dataset:

Website deployed:

https://huggingface.co/spaces/PushkarA07/Captcha-breaker-project

Papers followed:
