This repository has been archived by the owner on Oct 20, 2024. It is now read-only.

Added a new captcha type: dots_and_chars
dots_and_chars's score: 96.0% (48/50)
Uses sharp and pure-JavaScript CV algorithms, with no dependency on OpenCV
PillarsZhang committed May 11, 2021
1 parent a4882ef commit 8fedc3b
Showing 62 changed files with 1,163 additions and 65 deletions.
26 changes: 0 additions & 26 deletions .vscode/launch.json

This file was deleted.

45 changes: 22 additions & 23 deletions README.md
@@ -3,10 +3,10 @@

<p align="center"><b>Captcha recognition using CV (OpenCV) and OCR (Tesseract)</b></p>

simplest | grids_and_equations | ...
:-: | :-: | :-:
<img src="./docs/img/simplest.jpg" height="20" alt="simplest" align=center> | <img src="./docs/img/grids_and_equations.jpg" height="20" alt="grids_and_equations" align=center> | ...
2348 | 2x6=? | ...
simplest | grids_and_equations | dots_and_chars | ...
:-: | :-: | :-: | :-:
<img src="./docs/img/simplest.jpg" height="20" alt="simplest" align=center> | <img src="./docs/img/grids_and_equations.jpg" height="20" alt="grids_and_equations" align=center> | <img src="./docs/img/dots_and_chars.gif" height="20" alt="dots_and_chars" align=center> | ...
2348 | 2x6=? | 7RVO | ...

## Quick Start

@@ -17,6 +17,8 @@ simplest | grids_and_equations | ...
npm i opencv4nodejs -g
```

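Recognition for the third captcha type is instead implemented with [sharp](https://github.com/lovell/sharp) and pure-JavaScript CV algorithms. This makes it easy to run on a Raspberry Pi, but it is much slower than the first two.

The second supporting module, for Tesseract, is [tesseract.js](https://github.com/naptha/tesseract.js)

Install directly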
@@ -58,13 +60,16 @@ var mode = "simplest";
## Development

### Supported
simplest | grids_and_equations
:-: | :-:
<img src="./docs/img/simplest.jpg" height="20" alt="simplest" align=center> | <img src="./docs/img/grids_and_equations.jpg" height="20" alt="grids_and_equations" align=center>
simplest | grids_and_equations | dots_and_chars
:-: | :-: | :-:
<img src="./docs/img/simplest.jpg" height="20" alt="simplest" align=center> | <img src="./docs/img/grids_and_equations.jpg" height="20" alt="grids_and_equations" align=center> | <img src="./docs/img/dots_and_chars.gif" height="20" alt="dots_and_chars" align=center>
2348 | 2x6=? | 7RVO

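### Adding support

Each folder under codes corresponds to the name of a captcha type (named as you like). You can follow the existing templates and API to create a new recognition library for other kinds of captchas. Please make good use of the [opencv4nodejs](https://github.com/justadudewhohacks/opencv4nodejs) and [tesseract.js](https://github.com/naptha/tesseract.js) documentation:
Each folder under codes corresponds to the name of a captcha type (named as you like). You can follow the existing templates and API to create a new recognition library for other kinds of captchas.

### Reference docs and additional notes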

- opencv4nodejs
- Github | https://github.com/justadudewhohacks/opencv4nodejs
@@ -73,18 +78,12 @@ Each folder under codes corresponds to the name of a captcha type (named as you like); you can
- Homepage | https://tesseract.projectnaptha.com/
- Github | https://github.com/naptha/tesseract.js
- API | https://github.com/naptha/tesseract.js#docs

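In addition, the wealth of OpenCV material for C++ / Python is also very helpful; the corresponding functions can mostly be found in the [opencv4nodejs API documentation](https://justadudewhohacks.github.io/opencv4nodejs/docs/Mat/)

## Maintainers

- [Pillars Zhang](https://github.com/PillarsZhang)

## Thanks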

- [opencv4nodejs](https://github.com/justadudewhohacks/opencv4nodejs)
- [tesseract.js](https://github.com/naptha/tesseract.js)

## License

- [MIT](https://opensource.org/licenses/MIT)
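- Recognition of the third captcha type would improve if the OCR were trained on it
- sharp
- Github | https://github.com/lovell/sharp
- API | https://sharp.pixelplumbing.com/api-constructor
- A gripe: it offers only basic image-editing features, and the experience is not ideal; some bugs need workarounds
- ./lib/fakeOpenCV
- A personal reimplementation of some image algorithms, modeled on OpenCV

The wealth of OpenCV material for C++ / Python is also very helpful; the corresponding functions can mostly be found in the [opencv4nodejs API documentation](https://justadudewhohacks.github.io/opencv4nodejs/docs/Mat/)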
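
The README notes mention that ./lib/fakeOpenCV reimplements some image algorithms in pure JavaScript, and sharp_cv.js calls `fakeOpenCV.dilate` with a 3x3 element of ones. The source of that library is not part of this diff, so the following is only a minimal sketch of what a pure-JavaScript morphological dilation could look like. The function name, the row-array image layout, the 0/1 pixel convention, and the zero border handling are all assumptions, not the library's actual API.

```javascript
// Hypothetical helper, NOT the actual lib/fakeOpenCV code.
// `img` is an array of rows; every pixel is 0 or 1 (assumed layout).
// `element` is the structuring element, also an array of rows of 0/1.
function dilate(img, element) {
  const rows = img.length;
  const cols = img[0].length;
  const eRows = element.length;
  const eCols = element[0].length;
  const anchorY = Math.floor(eRows / 2);
  const anchorX = Math.floor(eCols / 2);
  const out = img.map((row) => row.slice().fill(0));

  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < cols; x++) {
      let hit = 0;
      // A pixel becomes 1 if any pixel under the element is 1.
      for (let j = 0; j < eRows && !hit; j++) {
        for (let i = 0; i < eCols && !hit; i++) {
          if (!element[j][i]) continue;
          const yy = y + j - anchorY;
          const xx = x + i - anchorX;
          // Pixels outside the image count as 0 (an assumption).
          if (yy >= 0 && yy < rows && xx >= 0 && xx < cols && img[yy][xx]) {
            hit = 1;
          }
        }
      }
      out[y][x] = hit;
    }
  }
  return out;
}

const ones3x3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]];
const image = [
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 1, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
];
const dilated = dilate(image, ones3x3);
// The single set pixel grows into a 3x3 block.
console.log(dilated.map((row) => row.join("")).join("\n"));
```

The quadruple loop is simple rather than fast, which is in line with the README's remark that the pure-JavaScript path is much slower than the OpenCV-based ones.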
Binary file added codes/dots_and_chars/examples/0Y9Q.gif
Binary file added codes/dots_and_chars/examples/0YWF.gif
Binary file added codes/dots_and_chars/examples/25HW.gif
Binary file added codes/dots_and_chars/examples/3999.gif
Binary file added codes/dots_and_chars/examples/3PFH.gif
Binary file added codes/dots_and_chars/examples/4CE1.gif
Binary file added codes/dots_and_chars/examples/54JY.gif
Binary file added codes/dots_and_chars/examples/6J2G.gif
Binary file added codes/dots_and_chars/examples/7FYI.gif
Binary file added codes/dots_and_chars/examples/7RVO.gif
Binary file added codes/dots_and_chars/examples/8A5N.gif
Binary file added codes/dots_and_chars/examples/8OL0.gif
Binary file added codes/dots_and_chars/examples/9W71.gif
Binary file added codes/dots_and_chars/examples/9Y7P.gif
Binary file added codes/dots_and_chars/examples/AHLG.gif
Binary file added codes/dots_and_chars/examples/BOB4.gif
Binary file added codes/dots_and_chars/examples/C0L5.gif
Binary file added codes/dots_and_chars/examples/DWJR.gif
Binary file added codes/dots_and_chars/examples/E7RM.gif
Binary file added codes/dots_and_chars/examples/EMZ8.gif
Binary file added codes/dots_and_chars/examples/ERTK.gif
Binary file added codes/dots_and_chars/examples/FEW3.gif
Binary file added codes/dots_and_chars/examples/FQT5.gif
Binary file added codes/dots_and_chars/examples/FR1O.gif
Binary file added codes/dots_and_chars/examples/I4WQ.gif
Binary file added codes/dots_and_chars/examples/I638.gif
Binary file added codes/dots_and_chars/examples/I8YO.gif
Binary file added codes/dots_and_chars/examples/KXGW.gif
Binary file added codes/dots_and_chars/examples/KY0A.gif
Binary file added codes/dots_and_chars/examples/LIK0.gif
Binary file added codes/dots_and_chars/examples/NM5E.gif
Binary file added codes/dots_and_chars/examples/OSE7.gif
Binary file added codes/dots_and_chars/examples/P57C.gif
Binary file added codes/dots_and_chars/examples/PCX0.gif
Binary file added codes/dots_and_chars/examples/R796.gif
Binary file added codes/dots_and_chars/examples/RYM1.gif
Binary file added codes/dots_and_chars/examples/S1XY.gif
Binary file added codes/dots_and_chars/examples/SJV0.gif
Binary file added codes/dots_and_chars/examples/TEPW.gif
Binary file added codes/dots_and_chars/examples/TTKO.gif
Binary file added codes/dots_and_chars/examples/WIL3.gif
Binary file added codes/dots_and_chars/examples/XGUU.gif
Binary file added codes/dots_and_chars/examples/XOTT.gif
Binary file added codes/dots_and_chars/examples/XP1R.gif
Binary file added codes/dots_and_chars/examples/XQKA.gif
Binary file added codes/dots_and_chars/examples/XSVG.gif
Binary file added codes/dots_and_chars/examples/Y1VI.gif
Binary file added codes/dots_and_chars/examples/YTCT.gif
Binary file added codes/dots_and_chars/examples/Z9Z7.gif
Binary file added codes/dots_and_chars/examples/ZUCR.gif
46 changes: 46 additions & 0 deletions codes/dots_and_chars/index.js
@@ -0,0 +1,46 @@
const tesseract_ocr = require("./tesseract_ocr");
const sharp_cv = require("./sharp_cv");

var ocr;
const cv = sharp_cv;

class dots_and_chars {
    recognize = async (image) => {
        var timeBegin = Date.now();
        // The caller sets global.debugFlag (see judge_and_test.js); reading it
        // through `global` avoids a ReferenceError when it was never set.
        const debugFlag = global.debugFlag;

        // Segment the captcha into one image buffer per character.
        var cvResult = await cv(image);
        if (debugFlag) {
            let cvDebugInfo = { result: "length=" + cvResult.result.length, time: cvResult.time };
            console.log("cv", cvDebugInfo);
        }

        // Queue OCR for every character image and wait for all results.
        var charPromise = cvResult.result.map((value) => ocr.recognize(value));
        var charList = await Promise.all(charPromise);

        // 1/I and 0/O are decided by the height/width ratio of the character
        // block rather than by the OCR output alone.
        charList.forEach((value, index) => {
            const { w, h } = cvResult.marks[index];
            const ratio = h / w;
            if (['1', 'I'].includes(value.result)) {
                if (debugFlag) console.log(`index: ${index}, char: ${value.result}, w: ${w}, h: ${h}, h/w: ${ratio}`);
                value.result = ratio < 2.2 ? '1' : 'I';
            }
            if (['0', 'O'].includes(value.result)) {
                if (debugFlag) console.log(`index: ${index}, char: ${value.result}, w: ${w}, h: ${h}, h/w: ${ratio}`);
                value.result = ratio < 1.1 ? 'O' : '0';
            }
        });

        if (debugFlag) console.log(charList);
        var chars = charList.map((value) => value.result);
        return { result: chars.join(''), time: Date.now() - timeBegin };
    }

    init = async (workers = 4) => {
        ocr = new tesseract_ocr(workers);
        await ocr.init();
    }
}

module.exports = new dots_and_chars();
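
The aspect-ratio rule in `recognize` can be isolated into a small pure function, which makes the two thresholds easier to test and tune. The sketch below reuses the 2.2 and 1.1 thresholds from index.js; the function name `disambiguate` and its signature are hypothetical and do not exist in the repository.

```javascript
// Hypothetical helper; thresholds 2.2 and 1.1 come from codes/dots_and_chars/index.js.
// `mark` is a bounding box with width `w` and height `h` in pixels.
function disambiguate(char, mark) {
  const ratio = mark.h / mark.w;
  // Below 2.2 the block is read as the digit 1, otherwise as the letter I.
  if (char === "1" || char === "I") return ratio < 2.2 ? "1" : "I";
  // Below 1.1 the block is read as the letter O, otherwise as the digit 0.
  if (char === "0" || char === "O") return ratio < 1.1 ? "O" : "0";
  return char;
}

console.log(disambiguate("I", { w: 10, h: 20 })); // ratio 2.0 -> "1"
console.log(disambiguate("1", { w: 6, h: 21 }));  // ratio 3.5 -> "I"
console.log(disambiguate("0", { w: 20, h: 20 })); // ratio 1.0 -> "O"
console.log(disambiguate("O", { w: 15, h: 21 })); // ratio 1.4 -> "0"
console.log(disambiguate("R", { w: 18, h: 21 })); // unchanged -> "R"
```

The OCR result only selects which pair is in play; the final character within the pair depends entirely on the shape of the segmented block.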
169 changes: 169 additions & 0 deletions codes/dots_and_chars/sharp_cv.js
@@ -0,0 +1,169 @@
const cv = require('sharp');
const fakeOpenCV = require("../../lib/fakeOpenCV");
const fs = require("fs");

const main = async (file) => {
    var timeBegin = Date.now();
    var image = [];
    image[0] = cv(file);
    var meta = await image[0].metadata();
    //console.log(meta);
    var size = {width: meta.width, height: meta.height};

    // Crop a 1-pixel border
    image[1] = image[0].clone().extract({ left: 1, top: 1, width: meta.width-2, height: meta.height-2})
    size.width -= 2;
    size.height -= 2;
    //await showImage(image[1], 'cropped');

    // Greyscale
    image[2] = image[1].clone().greyscale();
    //debugFlag && (await showImage(image[2], 'greyscale'));

    // Binarize
    image[3] = image[2].clone().threshold(255).toColourspace('b-w');
    debugFlag && (await showImage(image[3], 'binarized'));

    // Hole filling: invert, convolve with a cross-shaped kernel, re-threshold, invert back
    image[5] = image[3].clone().linear(-1, 255).convolve({
        width: 3,
        height: 3,
        kernel: [
            0, 1/4, 0,
            1/4, 1, 1/4,
            0, 1/4, 0
        ],
    })
    image[5] = await superDeepCopySharp(image[5]);
    image[5] = image[5].threshold(128).toColourspace('b-w').linear(-1, 255);
    debugFlag && (await showImage(image[5], 'hole_filling'));

    // Morphological dilation (pure JavaScript, via fakeOpenCV)
    var buffer = await image[5].clone().raw().toBuffer();
    var element = fakeOpenCV.math.ones(3, 3);
    var imageMath = fakeOpenCV.transformMath({buffer, size})
    var res = fakeOpenCV.dilate(imageMath, element);
    var resBuffer = fakeOpenCV.transformBuffer(res);
    image[4] = await cv(resBuffer.buffer, { raw: { width: resBuffer.size.width, height:resBuffer.size.height, channels: 1 }});
    debugFlag && (await showImage(image[4], 'dilated'));

    // Connected components
    var res2 = fakeOpenCV.connectedComponents(fakeOpenCV.inverse(res));

    // Segmentation and preview
    var points = [];
    var index2 = 0;
    var marks = [];
    res2.marks.forEach((value, index) => {
        var pos = value.position;
        // Record the rectangle outline (for the preview) and the block's metrics.
        function setPoints(index2, pos){
            points[index2] = [
                [pos.y, pos.x],
                [pos.y + pos.h - 1, pos.x],
                [pos.y + pos.h - 1, pos.x + pos.w - 1],
                [pos.y, pos.x + pos.w - 1],
                [pos.y, pos.x]
            ];
            marks[index2] = pos;
            marks[index2].area = pos.w * pos.h;
            marks[index2].cX = Math.floor(pos.x + pos.w / 2);
            marks[index2].cY = Math.floor(pos.y + pos.h / 2);
        }
        // Split overly wide blocks in half (likely two touching characters)
        if (pos.w > pos.h * 1.5){
            pos = {x: pos.x, y: pos.y, w: Math.floor(pos.w / 2), h: pos.h};
            setPoints(index2++, pos);
            pos = {x: pos.x + pos.w, y: pos.y, w: pos.w, h: pos.h};
            setPoints(index2++, pos);
        } else{
            setPoints(index2++, pos);
        }
    });
    //points = [[[0, 0], [0, 20], [10, 20], [10, 0], [0, 0]]]
    var res3 = fakeOpenCV.rectangle(res, points);
    var resBuffer = fakeOpenCV.transformBuffer(res3);
    image[5] = cv(resBuffer.buffer, { raw: { width: resBuffer.size.width, height:resBuffer.size.height, channels: 1 }});
    //debugFlag && (await showImage(image[5], 'segmented'));
    //console.log(res2.marks);

    // Extract the character blocks: keep the 4 largest, ordered left to right
    var imageList = [];
    marks.sort((a, b) => b.area - a.area);
    marks.splice(4);
    marks.sort((a, b) => a.cX - b.cX);
    // Pad every block to a centered 45x50 canvas with a white background
    var w = 45, h = 50;
    for (let i = 0; i < marks.length; i++){
        let imageExt = image[4]
            .clone()
            .extract({ left: marks[i].x, top: marks[i].y, width: marks[i].w, height: marks[i].h })
            .extend({
                top: Math.floor((h - marks[i].h) / 2),
                bottom: h - Math.floor((h - marks[i].h) / 2) - marks[i].h,
                left: Math.floor((w - marks[i].w) / 2),
                right: w - Math.floor((w - marks[i].w) / 2) - marks[i].w,
                background: "white"
            });
        let imageJPEG = await imageExt.jpeg({
            quality: 100,
            chromaSubsampling: '4:4:4'
        }).toBuffer();
        imageList[i] = imageJPEG;
        debugFlag && (await showImage(imageExt, i));
    }
    return { result: imageList, time: Date.now() - timeBegin, marks };
}

// Debug helper: upscale an intermediate image, write it to a temp directory and open it.
const showImage = async (image, name) => {
    var scale = 10;
    var imageCopy = await superDeepCopySharp(image);
    var meta = await imageCopy.metadata();
    try {
        // Awaiting the pipeline means `await showImage(...)` returns only after the file is written.
        var data = await imageCopy
            .clone()
            .resize({
                width: meta.width * scale,
                kernel: cv.kernel.nearest
            })
            .jpeg({
                quality: 100,
                chromaSubsampling: '4:4:4'
            })
            .toBuffer();
        var tmp = require('tmp');
        var tmpobj = tmp.dirSync({prefix: 'sharp_' });
        debugFlag && console.log('Dir: ', tmpobj.name);
        var join = require("path").join;
        var exec = require('child_process').exec;
        var tempJPEGPath = join(tmpobj.name, `${name}.jpg`);
        fs.writeFileSync(tempJPEGPath, data);
        // explorer.exe opens the file in the default viewer; Windows only.
        exec(`explorer.exe "${tempJPEGPath}"`);
    } catch (err) {
        console.error(err);
    }
}

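// Render the pipeline to raw pixels and wrap them in a fresh sharp instance,
// so that later operations start from the already-processed image.
const superDeepCopySharp = async (image) => {
    var { data, info } = await image.raw().toBuffer({ resolveWithObject: true });
    // Respect byteOffset/length in case the Buffer is a view into a larger ArrayBuffer.
    var pixelArray = new Uint8ClampedArray(data.buffer, data.byteOffset, data.length);
    var { width, height, channels } = info;
    return cv(pixelArray, { raw: { width, height, channels } });
}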

module.exports = main;
// Module-local debug switch; set it to true to run the sample below.
var debugFlag = false;


//debugFlag = true;
if (debugFlag) {
    //let code = "TTKO";
    //let code = "54JY";
    //let code = "7RVO";
    //let code = "R796";
    //let code = "XP1R";
    //let code = "XSVG";
    //let code = "XQKA";
    let code = "I8YO";
    let file = fs.readFileSync(`./examples/${code}.gif`);
    main(file)
        .then(res => console.log(res))
        .catch(err => {console.error(err)});
};
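
The block-selection arithmetic in sharp_cv.js (split wide blocks, keep the four largest, sort left to right, pad to 45x50) does not depend on sharp and can be sketched as pure functions. The constants 1.5, 4, 45 and 50 are taken from the code above; the function names `splitWide`, `pickCharBoxes` and `centerPadding`, and the sample boxes, are hypothetical and only serve as an illustration.

```javascript
// Hypothetical helpers mirroring the arithmetic in sharp_cv.js; not part of the repository.

// A block wider than 1.5x its height is cut into two halves of floor(w / 2) each.
function splitWide(pos) {
  if (pos.w > pos.h * 1.5) {
    const half = Math.floor(pos.w / 2);
    return [
      { x: pos.x, y: pos.y, w: half, h: pos.h },
      { x: pos.x + half, y: pos.y, w: half, h: pos.h },
    ];
  }
  return [{ ...pos }];
}

// Keep the `count` largest boxes by area, then order them left to right by center x.
function pickCharBoxes(boxes, count = 4) {
  return boxes
    .flatMap(splitWide)
    .map((b) => ({ ...b, area: b.w * b.h, cX: Math.floor(b.x + b.w / 2) }))
    .sort((a, b) => b.area - a.area)
    .slice(0, count)
    .sort((a, b) => a.cX - b.cX);
}

// Padding that centers a box on a w x h canvas; any odd pixel goes to bottom/right.
function centerPadding(box, w = 45, h = 50) {
  const top = Math.floor((h - box.h) / 2);
  const left = Math.floor((w - box.w) / 2);
  return { top, bottom: h - top - box.h, left, right: w - left - box.w };
}

const boxes = [
  { x: 60, y: 5, w: 20, h: 30 }, // single character
  { x: 2, y: 4, w: 50, h: 30 },  // wide block: 50 > 30 * 1.5, so it is split
  { x: 90, y: 6, w: 18, h: 28 }, // single character
  { x: 85, y: 1, w: 3, h: 3 },   // small speck, dropped as the smallest box
];
const picked = pickCharBoxes(boxes);
console.log(picked.map((b) => b.x));   // [ 2, 27, 60, 90 ]
console.log(centerPadding(picked[0])); // { top: 10, bottom: 10, left: 10, right: 10 }
```

Note that `centerPadding` returns negative values for a box larger than the canvas, so a caller passing the result to an image library would need to guard against that case.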
38 changes: 38 additions & 0 deletions codes/dots_and_chars/tesseract_ocr.js
@@ -0,0 +1,38 @@
const { createWorker, createScheduler, PSM, OEM } = require('tesseract.js');
// One scheduler shared by every tesseract_ocr instance in this module.
const scheduler = createScheduler();

class tesseract_ocr {
    constructor(workers) {
        this.workersNum = workers;
    }
    // Create the workers one by one and register each with the scheduler.
    init = async () => {
        for (var i = 0; i < this.workersNum; i++) {
            const worker = createWorker();
            await worker.load();
            await worker.loadLanguage('eng');
            await worker.initialize('eng');
            await worker.setParameters({
                // Only digits and upper-case letters, one character per image.
                tessedit_char_whitelist: '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                tessedit_pageseg_mode: PSM.SINGLE_CHAR,
                tessedit_ocr_engine_mode: OEM.TESSERACT_LSTM_COMBINED,
                //tessedit_ocr_engine_mode: OEM.LSTM_ONLY,
            });
            scheduler.addWorker(worker);
            console.log(`${i + 1}/${this.workersNum} worker(s) initialized`);
        }
    };
    recognize = async (image) => {
        var timeBegin = Date.now();
        const { data: { text } } = await scheduler.addJob('recognize', image);
        // Strip every newline, not just the first one.
        return { result: text.replace(/\n/g, ""), time: Date.now() - timeBegin };
    };
    terminate = async () => {
        await scheduler.terminate();
    };
    getQueueLen = async () => {
        var len = await scheduler.getQueueLen();
        return len;
    };
}

module.exports = tesseract_ocr;
Binary file added docs/img/dots_and_chars.gif
3 changes: 2 additions & 1 deletion judge_and_test.js
@@ -9,14 +9,15 @@ global.debugFlag = 1;
var modeList = {
"simplest" : (ans, rightAns) => ans.result == rightAns,
"grids_and_equations" : (ans, rightAns) => ans.equation.slice(0, 3) == rightAns,
"dots_and_chars" : (ans, rightAns) => ans.result == rightAns
};

(async () => {
var modeI = 0;
for(let mode in modeList){
let cvocr = new cvocrModule(mode);
console.log(`--- ${++modeI}. ${mode} ---\n`);
await cvocr.init(2, 1);
await cvocr.init(4, 2);

let examplePath = path.join(__dirname, "codes", mode, "examples");
let files = fs.readdirSync(examplePath);
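
The rest of judge_and_test.js is collapsed in this view, so its reporting code is not visible. As a rough sketch of how a score line such as "96.0% (48/50)" from the commit message could be produced, the following assumes that each example file is named after its correct answer (for instance 7RVO.gif), which matches the README table but is still an assumption. The helpers `expectedFromFile` and `score` are hypothetical names.

```javascript
const path = require("path");

// Hypothetical helper: derive the expected answer from an example file name,
// assuming files are named after the text they show (e.g. "7RVO.gif" -> "7RVO").
function expectedFromFile(file) {
  return path.basename(file, path.extname(file));
}

// Hypothetical helper: format accuracy as "<percent>% (<right>/<total>)".
function score(results) {
  const right = results.filter((r) => r.answer === r.expected).length;
  const percent = ((right / results.length) * 100).toFixed(1);
  return `${percent}% (${right}/${results.length})`;
}

// Made-up sample data: 50 runs, 2 of which confuse the letter O with the digit 0.
const results = Array.from({ length: 50 }, (_, i) => ({
  expected: expectedFromFile("examples/7RVO.gif"),
  answer: i < 48 ? "7RVO" : "7RV0",
}));
console.log(score(results)); // 96.0% (48/50)
```

The sample data is invented to reproduce the reported figure; it says nothing about which of the 50 real examples were misread.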