A bug in tokenization_codegen.py #8

CBellaris · 2024-03-29T19:49:11Z

When running the code, the following error might be encountered:

File "HKU-DASC7606-A2\tokenization_codegen.py", line 203, in get_vocab
return dict(self.encoder, **self.added_tokens_encoder)
AttributeError: 'CodeGenTokenizer' object has no attribute 'encoder'

It appears that get_vocab() is called while the parent class constructor is running, at which point self.encoder has not yet been initialized. To address this, the following code should be moved so that it executes before super().__init__():

with open(vocab_file, encoding="utf-8") as vocab_handle:
    self.encoder = json.load(vocab_handle)

This change avoids the error, but I don't know whether it is the right fix.
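
The initialization-order problem described above can be reproduced in isolation. The following is a minimal sketch, not the actual transformers or CodeGenTokenizer code: BaseTokenizer, BrokenTokenizer, FixedTokenizer, and try_build are hypothetical names, and it assumes only that the parent constructor calls get_vocab() during its own initialization.

```python
class BaseTokenizer:
    """Hypothetical stand-in for the parent tokenizer class."""

    def __init__(self):
        # Assumption: the parent constructor calls get_vocab() while it runs.
        self.vocab_size_at_init = len(self.get_vocab())


class BrokenTokenizer(BaseTokenizer):
    def __init__(self, vocab):
        # get_vocab() is invoked inside super().__init__(),
        # before self.encoder has been assigned.
        super().__init__()
        self.encoder = dict(vocab)

    def get_vocab(self):
        return dict(self.encoder)


class FixedTokenizer(BaseTokenizer):
    def __init__(self, vocab):
        # Assign self.encoder first, then run the parent constructor.
        self.encoder = dict(vocab)
        super().__init__()

    def get_vocab(self):
        return dict(self.encoder)


def try_build(cls, vocab):
    """Return (instance, None) on success or (None, error message) on failure."""
    try:
        return cls(vocab), None
    except AttributeError as exc:
        return None, str(exc)
```

Calling try_build(BrokenTokenizer, {"a": 0}) returns an error message about the missing encoder attribute, while try_build(FixedTokenizer, {"a": 0}) succeeds. This only illustrates why the reordering avoids the error; it says nothing about whether reordering is preferable to changing the installed library version.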


hzngadieu · 2024-04-02T03:17:25Z

This bug is caused by an incompatible version of transformers. Run:
pip install transformers==4.31.0
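
To check which version is actually installed in the current environment, something like the following could be used. This is a sketch using the standard-library importlib.metadata module; installed_version and matches_pin are hypothetical helper names, and the exact-string comparison is a simplifying assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True only if the installed version equals the pinned version string."""
    return installed_version(package) == pinned
```

For example, matches_pin("transformers", "4.31.0") reports whether the environment has the version suggested above.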
