A bug in tokenization_codegen.py #8

Open
CBellaris opened this issue Mar 29, 2024 · 1 comment

CBellaris commented Mar 29, 2024

When running the code, the following error might be encountered:

File "HKU-DASC7606-A2\tokenization_codegen.py", line 203, in get_vocab
return dict(self.encoder, **self.added_tokens_encoder)
AttributeError: 'CodeGenTokenizer' object has no attribute 'encoder'

It appears that get_vocab() is called while the parent class constructor is still executing, at which point self.encoder has not yet been initialized. To address this, move the following code so that it runs before super().__init__():

with open(vocab_file, encoding="utf-8") as vocab_handle:
    self.encoder = json.load(vocab_handle)

This change avoids the error, but I don't know whether it is the right fix.
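The initialization-order problem described above can be reproduced without transformers. The following is a minimal sketch, assuming a parent constructor that calls get_vocab() during setup; BaseTokenizer, BrokenTokenizer, FixedTokenizer, and write_demo_vocab are hypothetical stand-ins for illustration, not the real library classes or the assignment's code.

```python
import json
import os
import tempfile


class BaseTokenizer:
    """Hypothetical parent class; an assumption, not the transformers API."""

    def __init__(self):
        # Assumption: the parent constructor queries the vocabulary during setup.
        self.vocab_size_at_init = len(self.get_vocab())

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(BaseTokenizer):
    def __init__(self, vocab_file):
        # get_vocab() runs inside this call, before self.encoder exists.
        super().__init__()
        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)
        self.added_tokens_encoder = {}

    def get_vocab(self):
        return dict(self.encoder, **self.added_tokens_encoder)


class FixedTokenizer(BaseTokenizer):
    def __init__(self, vocab_file):
        # Load the vocabulary first so get_vocab() works during super().__init__().
        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)
        self.added_tokens_encoder = {}
        super().__init__()

    def get_vocab(self):
        return dict(self.encoder, **self.added_tokens_encoder)


def write_demo_vocab(vocab):
    """Write a small vocabulary to a temporary JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(vocab, handle)
    return path


if __name__ == "__main__":
    path = write_demo_vocab({"hello": 0, "world": 1})
    try:
        BrokenTokenizer(path)
    except AttributeError as exc:
        print("broken order:", exc)
    print("fixed order:", FixedTokenizer(path).get_vocab())
    os.remove(path)
```

In this sketch the only difference between the two subclasses is whether the vocabulary is loaded before or after super().__init__(), which mirrors the reordering proposed above.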

hzngadieu commented

This error is caused by an incompatible version of transformers. Installing version 4.31.0 resolves it:
pip install transformers==4.31.0
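To confirm which version is actually installed in the active environment, a check along these lines may help. This is a sketch using the standard library's importlib.metadata (Python 3.8+); the helper names installed_version and matches_pin are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True only if the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # "4.31.0" is the version suggested in this thread.
    print("installed:", installed_version("transformers"))
    print("matches pin:", matches_pin("transformers", "4.31.0"))
```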
