
RecursionError: maximum recursion depth exceeded in comparison #16

Open
hp0716 opened this issue Sep 2, 2024 · 10 comments
hp0716 commented Sep 2, 2024

Hello, what should I pay attention to when using create_lmdb_dataset.py to construct the dataset? After generating the .mdb files, running eval_rec_all_ch.py raises the following error:
[screenshot: RecursionError traceback]

hp0716 (Author) commented Sep 2, 2024

When I create the LMDB data, do I need to modify the config file as follows?

Eval:
  dataset:
    name: LMDBDataSet
    ds_width: True
    ...
  sampler:
    name: MultiScaleSampler

Topdu (Owner) commented Sep 2, 2024

This is caused by data preprocessing failing. Please check where the preprocessing goes wrong in ratio_dataset.py, at the call outs = transform(data, self.ops[:-1]).

hp0716 (Author) commented Sep 3, 2024

I think my main problem is that I don't know how to construct my own dataset. Should I use create_lmdb_dataset.py? My data consists of Chinese addresses; what specific adjustments need to be made?

hp0716 (Author) commented Sep 3, 2024

Because I had a problem creating the dataset, transform always returns None. How can I solve this?

Topdu (Owner) commented Sep 3, 2024

if __name__ == '__main__':
    # Root directory that the image paths in the label files are relative to.
    data_dir = './Union14M-L/'
    label_file_list = [
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_challenging.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_easy.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_hard.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_medium.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_normal.jsonl.txt'
    ]
    # Output root; one LMDB is written per label file.
    save_path_root = './Union14M-L-LMDB-Filtered/'

    for data_list in label_file_list:
        # Name each LMDB directory after its label file, e.g. .../filter_train_easy/.
        save_path = save_path_root + data_list.split('/')[-1].split('.')[0] + '/'
        os.makedirs(save_path, exist_ok=True)
        print(save_path)
        train_data_list = get_datalist(data_dir, data_list, 800)

        createDataset(train_data_list, save_path)

You need to modify data_dir, label_file_list, and save_path_root in tools/create_lmdb_dataset.py. Each line of the label file should have the form img_dir_name + \t + label, for example:
full_images/COCOTextV2_append/imgs/img_107271_0.jpg LAYI SILK
full_images/COCOTextV2_append/imgs/img_107339_2.jpg PumUoo
full_images/COCOTextV2_append/imgs/img_107339_7.jpg GA
where data_dir + img_dir_name is a readable path to the image.
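For reference, here is a minimal sketch (a hypothetical helper, not part of the repo; the paths below are placeholders) that validates a label file before building the LMDB, checking that every line is tab-separated and that data_dir + img_dir_name points to a readable file:

import os

def check_label_file(data_dir, label_file):
    """Report label-file lines that would break LMDB creation."""
    bad = []
    with open(label_file, 'r', encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            parts = line.rstrip('\n').split('\t')
            if len(parts) != 2:
                bad.append((lineno, 'line is not img_dir_name<TAB>label'))
                continue
            img_rel, label = parts
            if not label:
                bad.append((lineno, 'empty label'))
            if not os.path.isfile(os.path.join(data_dir, img_rel)):
                bad.append((lineno, 'image not found: ' + img_rel))
    for lineno, reason in bad:
        print(f'line {lineno}: {reason}')
    return not bad

# Example (hypothetical paths):
# check_label_file('./my_data/', './my_data/train_label.txt')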

Topdu (Owner) commented Sep 3, 2024

> Because I had a problem creating the dataset, transform always returns None. How can I solve this?

This is because the data preprocessing failed. You can print the current op when data comes back as None:

def transform(data, ops=None):
    """transform."""
    if ops is None:
        ops = []
    for op in ops:
        data = op(data)
        if data is None:
            print(op)
            return None
    return data

Then find that op's code and try to locate the exact reason it returns None. This is usually because the label fails a check (for example, its length exceeds max_text_length) or because the image cannot be read.
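To illustrate the usual failure mode, here is a minimal sketch of such an op (a hypothetical class, not the repo's actual code): a label-check step that drops a sample by returning None, which transform() then propagates:

class CheckLabelLength:
    """Hypothetical op: rejects samples whose label is empty or too long."""

    def __init__(self, max_text_length=25):
        self.max_text_length = max_text_length

    def __call__(self, data):
        label = data['label']
        # Returning None here is what makes transform() return None upstream.
        if len(label) == 0 or len(label) > self.max_text_length:
            return None
        return data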

hp0716 (Author) commented Sep 4, 2024

Thanks for the explanation. The problem was indeed labels exceeding the maximum length. However, the evaluation accuracy is all zeros. Do I still need to adjust the dataset or the parameters?

hp0716 (Author) commented Sep 4, 2024

The test code above runs correctly, but the recognition accuracy is very low. I want to train on my own address dataset, generating data on the fly during training. Is it mandatory to convert the images and labels to LMDB format before training?

Topdu (Owner) commented Sep 4, 2024

RatioDataset only supports loading data in LMDB format; you can also modify it to load data in customized formats.
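As a starting point, here is a minimal sketch (a hypothetical class, not the repo's RatioDataset) of a PyTorch dataset that reads img_dir_name<TAB>label lines directly instead of LMDB; the ratio-based multi-scale sampling in ratio_dataset.py would still need to be adapted on top of this:

import os
from PIL import Image
from torch.utils.data import Dataset

class SimpleTextLineDataset(Dataset):
    """Hypothetical dataset: reads (image path, label) pairs from a text file."""

    def __init__(self, data_dir, label_file, transforms=None):
        self.data_dir = data_dir
        self.transforms = transforms
        with open(label_file, 'r', encoding='utf-8') as f:
            # Each line: img_dir_name<TAB>label
            self.samples = [line.rstrip('\n').split('\t', 1) for line in f]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_rel, label = self.samples[idx]
        img = Image.open(os.path.join(self.data_dir, img_rel)).convert('RGB')
        if self.transforms is not None:
            img = self.transforms(img)
        return img, label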

Topdu (Owner) commented Sep 7, 2024

> Thanks for the explanation. The problem was indeed labels exceeding the maximum length. However, the evaluation accuracy is all zeros. Do I still need to adjust the dataset or the parameters?

Analyzing the dataset helps determine the best parameters, such as the maximum text length and the aspect-ratio distribution of the images.
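As a sketch of such an analysis (assuming the LMDB layout written by create_lmdb_dataset.py, i.e. the keys num-samples, label-%09d, and image-%09d; the path in the usage example is a placeholder), this summarizes the label-length and aspect-ratio distributions to guide settings such as max_text_length and the multi-scale sizes:

import io

import lmdb
import numpy as np
from PIL import Image

def analyze_lmdb(lmdb_path):
    """Print label-length and aspect-ratio statistics for one LMDB dataset."""
    env = lmdb.open(lmdb_path, readonly=True, lock=False, readahead=False)
    lengths, ratios = [], []
    with env.begin() as txn:
        num = int(txn.get('num-samples'.encode()))
        for i in range(1, num + 1):
            label = txn.get(f'label-{i:09d}'.encode()).decode('utf-8')
            img = Image.open(io.BytesIO(txn.get(f'image-{i:09d}'.encode())))
            lengths.append(len(label))
            ratios.append(img.width / img.height)
    print('samples:', num)
    print('label length p50/p95/max:',
          np.percentile(lengths, [50, 95]).tolist(), max(lengths))
    print('aspect ratio p50/p95/max:',
          np.percentile(ratios, [50, 95]).tolist(), max(ratios))

# Example (placeholder path):
# analyze_lmdb('./Union14M-L-LMDB-Filtered/filter_train_easy/')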
