compute_word_piece_vocabulary #2034

Closed
42Cummer opened this issue Jan 2, 2025 · 3 comments

42Cummer commented Jan 2, 2025

Describe the bug

if not isinstance(data, (list, tf.data.Dataset)): #from source code
AttributeError: 'NoneType' object has no attribute 'data'

This happens because the compute_word_piece_vocabulary function is not importing TensorFlow correctly, which leaves tf = None (refer to source code lines 4-11 and 117).

To Reproduce

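from keras_hub.tokenizers import compute_word_piece_vocabulary

def train_word_piece(dataset, vocab_size, reserved_tokens):
    # Drop the labels and keep only the text for vocabulary training.
    word_piece_ds = dataset.unbatch().map(lambda x, y: x)
    vocab = compute_word_piece_vocabulary(
        data=word_piece_ds.batch(1000).prefetch(2),
        vocabulary_size=vocab_size,
        reserved_tokens=reserved_tokens,
    )
    return vocab

# train_dataset, VOCAB_SIZE, and reserved_tokens are defined in the linked example.
train_word_piece(train_dataset, VOCAB_SIZE, reserved_tokens)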

Code from https://keras.io/examples/nlp/fnet_classification_with_keras_hub/

Expected behavior
Function executes without error.

Additional context
TensorFlow version 2.17.1
KerasHub version 0.18.1

Maybe remove lines 4-11 in source?

@mattdangerw
Member

I suspect there's an issue with your tensorflow or tensorflow_text install. Can you try running this?

import tensorflow
import tensorflow_text

print(tensorflow.__version__)
print(tensorflow_text.__version__)

Lines 4-11 are there to allow keras-hub to function without a tensorflow install. All our preprocessing (including this function) uses tf.data and tensorflow-text (even on jax and torch), so this feature is really only for power users who want to run all preprocessing in a separate Python environment from training/inference.

In your case, it's most likely an issue with your tensorflow or tensorflow-text installation.
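
To make the failure mode concrete, here is a minimal sketch of an optional-import pattern of the kind described above. It is not the keras-hub source: the helper `optional_import`, the function `is_supported_input`, and the deliberately nonexistent module name are all hypothetical, chosen so the snippet runs with or without TensorFlow installed. It shows how a failed import can leave `tf` as `None` and surface later as the AttributeError from the report.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical module name that does not exist, standing in for a
# missing or broken tensorflow / tensorflow_text install.
tf = optional_import("tensorflow_not_installed_example")


def is_supported_input(data):
    # Same shape as the check quoted in the report: `tf.data` is looked up
    # before isinstance runs, so a None `tf` fails right here.
    return isinstance(data, (list, tf.data.Dataset))


try:
    is_supported_input(["some text"])
except AttributeError as err:
    message = str(err)
    print(message)
```

The error appears far from the import that actually failed, which is why running the two plain imports directly is the quicker diagnostic.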

@mattdangerw
Member

There is a valid bug here though. For other library symbols that need tensorflow text we print a helpful error message if the library is missing or otherwise failing to import. We are missing that for compute_word_piece_vocabulary. I'll add it now!

Thanks
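
As a rough illustration of the kind of guard described in this comment, the sketch below raises a descriptive ImportError before any optional module is used. The names `assert_deps_available` and `compute_vocabulary_sketch` are hypothetical and do not reflect the actual keras-hub fix; the failed imports are simulated by setting the module variables to `None`.

```python
def assert_deps_available(symbol_name, **modules):
    """Raise a descriptive ImportError if any optional module is None."""
    missing = sorted(name for name, module in modules.items() if module is None)
    if missing:
        raise ImportError(
            f"{symbol_name} requires {', '.join(missing)}. "
            "Check that the packages are installed and import without errors."
        )


# Simulate a failed optional import (assumption for this sketch).
tf = None
tf_text = None


def compute_vocabulary_sketch(data):
    # Fail early with a clear message instead of an AttributeError on None.
    assert_deps_available(
        "compute_vocabulary_sketch", tensorflow=tf, tensorflow_text=tf_text
    )
    return isinstance(data, (list, tf.data.Dataset))


try:
    compute_vocabulary_sketch(["some text"])
except ImportError as err:
    guard_message = str(err)
    print(guard_message)
```

With a guard like this, the user sees which packages to check rather than a `'NoneType' object has no attribute 'data'` error.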

@42Cummer
Author

42Cummer commented Jan 9, 2025

Oh I see now, thanks for clarifying! But that error message would be helpful!

42Cummer closed this as completed Jan 9, 2025