Different Vocab Size Between Tokenizer and Model's Word Embedding Layer #33

louisowen6 · 2021-07-23T09:37:15Z

Expected Behavior

The tokenizer's vocab size and the first dimension (number of rows) of BERT's word embedding layer should be the same.

Actual Behavior

The tokenizer's vocab size and the first dimension (number of rows) of BERT's word embedding layer are not the same.

Load the model: model = AutoModel.from_pretrained('indobenchmark/indobert-base-p1')
Print the model: print(model)

Load the tokenizer: tokenizer = AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')
Print the length of the tokenizer: print(len(tokenizer))
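The steps above can be condensed into one script that prints the two numbers side by side instead of reading the embedding shape out of `print(model)`. This is only a sketch: the helper names `vocab_size_report` and `load_and_report` are made up for illustration, and it assumes the model exposes `get_input_embeddings()` and `config.vocab_size`, as Hugging Face BERT models do.

```python
def vocab_size_report(tokenizer, model):
    """Return (tokenizer length, embedding rows, config vocab size).

    Hypothetical helper, not part of transformers or IndoBERT.
    """
    # len(tokenizer) counts the base vocabulary plus any added tokens.
    tokenizer_size = len(tokenizer)
    # The word embedding matrix has shape (num_embeddings, hidden_size).
    embedding_rows = model.get_input_embeddings().weight.shape[0]
    # The value the model was configured with.
    config_size = model.config.vocab_size
    return tokenizer_size, embedding_rows, config_size


def load_and_report(name):
    """Load tokenizer and model by name and print the three sizes."""
    # Imported here so the helper above stays usable without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    tok_size, emb_rows, cfg_size = vocab_size_report(tokenizer, model)
    print(f"len(tokenizer)         = {tok_size}")
    print(f"word embedding rows    = {emb_rows}")
    print(f"model.config.vocab_size = {cfg_size}")
    print("match" if tok_size == emb_rows else "MISMATCH")
```

Calling `load_and_report('indobenchmark/indobert-base-p1')` should reproduce the mismatch described in this issue; it downloads the checkpoint on first use.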
