Annotation and pipeline for lexicon free custom dataset #410

Closed
AumXIV opened this issue Jul 14, 2020 · 10 comments
Comments

@AumXIV

AumXIV commented Jul 14, 2020

Hello, I’m working on license plate recognition, which is lexicon-free. I understand that the Synth90k dataset is lexicon-based. My annotation is shown below. How can I use this dataset with CRNN?

image

I wonder if the Chinese recognition example could help, but I can’t download the dataset. Does the Chinese dataset use the same pipeline as Synth90k, and the same tfrecord format?

image

I can also generate char_dict_th.json and ord_map_th.json from my char_dict_th.txt by using local_utils/establish_char_dict.py. But ord_map_th.json contains the number 32 repeatedly, and I don’t know whether it can be used like that. Please give me some suggestions. Thanks. :)

image
image image image

@MaybeShewill-CV
Owner

@AumXIV
1. If you want to use a new dataset, you can remove ./data/char_dict/*.json. The char_dict.json and ord_map.json files will be generated automatically when you make the tfrecords.
2. The repetition in ord_map.json may be due to repetition in your char_dict.txt file. The ord_map.json file records each character's ord number. :)
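A minimal sketch for checking a character list for repeated entries before regenerating the JSON files. It assumes one character per line, which is an assumption about char_dict.txt and is not confirmed in this thread; `find_duplicate_chars` is a hypothetical helper, not part of the repository. Python's `ord(" ")` is 32, so a repeated 32 could come from space characters in the list, though that is only one possible cause.

```python
from collections import Counter


def find_duplicate_chars(lines):
    """Return {entry: count} for entries that appear more than once.

    Assumes one character per line (an assumption about the file layout).
    Only the line terminator is stripped, so a line holding a single
    space is kept as ' ' rather than collapsing to an empty string.
    """
    entries = [line.rstrip("\r\n") for line in lines]
    counts = Counter(entries)
    return {entry: n for entry, n in counts.items() if n > 1}


# Hypothetical sample content; replace with the lines of your own file,
# e.g. open("char_dict_th.txt", encoding="utf-8").readlines().
sample_lines = ["A\n", "B\n", " \n", "ก\n", " \n"]

for entry, n in find_duplicate_chars(sample_lines).items():
    codes = [ord(ch) for ch in entry]
    print(repr(entry), "ord:", codes, "appears", n, "times")
```

Removing the repeated entries from the text file and regenerating the JSON files should then give each ord number once.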

@AumXIV
Author

AumXIV commented Jul 14, 2020

Okay, thanks. I tried generating the tfrecords and some errors appeared. :)
image

@MaybeShewill-CV
Owner

@AumXIV The label index should be a number sequence. You should follow the Synth90k dataset format to restructure your dataset. For the dataset format, #354 may help.

@AumXIV
Author

AumXIV commented Jul 15, 2020

So you mean I must have a lexicon file? What if the lexicon file just contains the label of each image?
image image
Would it look like this?
image

@MaybeShewill-CV
Owner

@AumXIV The second column in the annotation file should be the row index in the lexicon file. :)

@AumXIV
Author

AumXIV commented Jul 15, 2020

From
image
To
image

Right? :)

@MaybeShewill-CV
Owner

@AumXIV Right. But I'm not sure whether the row index begins with 0 or 1. You may want to check it yourself. :)

@AumXIV
Author

AumXIV commented Jul 16, 2020

Okay, now I can generate the tfrecords and train the model. Thanks a lot. :) ><

AumXIV closed this as completed Jul 16, 2020

@GarhwalRifles

@AumXIV @MaybeShewill-CV So the row index begins with 0, right?

@XinyuDu

XinyuDu commented Dec 21, 2021

> @AumXIV @MaybeShewill-CV So the row index begins with 0, right?

Yes, the row index begins with 0.
You can find label index 0 in the annotation.txt of the Synth90k dataset:
./2919/3/160_0_0.jpg 0
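
The conversion discussed in this thread (a lexicon file with one label per row, and an annotation file whose second column is the 0-based row index into that lexicon) could be sketched roughly as follows. The function name, image paths, and plate strings are hypothetical, and writing the two lists to files is left out; check the result against what the repository's tfrecord script expects.

```python
def build_lexicon_and_annotation(samples):
    """Build lexicon rows and annotation lines from (image_path, label) pairs.

    Each annotation line has the form '<image_path> <row index>', where the
    row index is the 0-based position of the label in the lexicon list.
    Labels that occur more than once share a single lexicon row.
    """
    lexicon = []
    row_of_label = {}
    annotation = []
    for image_path, label in samples:
        if label not in row_of_label:
            row_of_label[label] = len(lexicon)  # first row gets index 0
            lexicon.append(label)
        annotation.append(f"{image_path} {row_of_label[label]}")
    return lexicon, annotation


# Hypothetical samples for illustration only.
samples = [
    ("./plates/0001.jpg", "ABC1234"),
    ("./plates/0002.jpg", "XYZ789"),
    ("./plates/0003.jpg", "ABC1234"),
]

lexicon, annotation = build_lexicon_and_annotation(samples)
print("\n".join(lexicon))      # one label per row
print("\n".join(annotation))   # image path followed by the lexicon row index
```

Reusing one lexicon row for repeated labels is a design choice of this sketch; giving every image its own row, as proposed earlier in the thread, also satisfies the index rule.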
