How to export Tokenizers? #2015

Open
@fdtomasi

Description

I am encountering issues when exporting text tokenizers as part of a tf.Graph so they can be served with TF Serving.

To Reproduce

import tensorflow as tf
import keras
from keras_nlp.models import GPT2CausalLMPreprocessor

# from_preset returns the full preprocessor, which wraps the GPT-2 tokenizer
# and its vocabulary lookup tables
tokenizer = GPT2CausalLMPreprocessor.from_preset("gpt2_base_en")
tokenizer.build(None)

export_archive = keras.export.ExportArchive()
export_archive.track(tokenizer)
export_archive.add_endpoint(
    name="generate",
    fn=lambda x: tokenizer(x)[0],  # keep only the first element of the preprocessor's output
    input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.string, name="inputs")
    ],
)
export_archive.write_out("test/export")

I expect this to export without errors, but instead I get the following:

AssertionError: Tried to export a function which references an 'untracked' resource. TensorFlow objects (e.g. tf.Variable) captured by functions must be 'tracked' by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly. See the information below:
	Function name = b'__inference_signature_wrapper_<lambda>_56462'
	Captured Tensor = <ResourceHandle(name="table_49686", device="/job:localhost/replica:0/task:0/device:CPU:0", container="localhost", type="tensorflow::lookup::LookupInterface", dtype and shapes : "[  ]")>
	Trackable referencing this tensor = <tensorflow.python.ops.lookup_ops.MutableHashTable object at 0x7f87081371f0>
	Internal Tensor = Tensor("56442:0", shape=(), dtype=resource)

I am explicitly tracking the tokenizer because, according to https://keras.io/api/models/model_saving_apis/export/#track-method, this is required when the endpoint uses lookup tables, but it does not seem to be enough.
I am using keras_hub == 0.17.0, keras == 3.7.0, and tensorflow == 2.18.0.
Thanks!
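For reference, the underlying TensorFlow requirement can be demonstrated in plain TF without keras_nlp: a lookup table is a tracked resource only once it is assigned to an attribute of a tracked object (e.g. a tf.Module), after which a function capturing it can be saved. A minimal sketch (the Exporter class name and export path are hypothetical, chosen just for illustration):

```python
import tensorflow as tf

class Exporter(tf.Module):
    def __init__(self):
        super().__init__()
        # Assigning the table to an attribute of this tf.Module makes it a
        # tracked resource, so tf.saved_model.save can serialize functions
        # that capture it without the "untracked resource" AssertionError.
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                keys=tf.constant(["hello", "world"]),
                values=tf.constant([0, 1], dtype=tf.int64),
            ),
            default_value=-1,
        )

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def lookup(self, x):
        return self.table.lookup(x)

module = Exporter()
tf.saved_model.save(module, "/tmp/lookup_export")
restored = tf.saved_model.load("/tmp/lookup_export")
print(restored.lookup(tf.constant(["world", "unknown"])).numpy())  # [ 1 -1]
```

With the keras_nlp preprocessor, the equivalent tracking appears not to happen through ExportArchive.track alone, which is why I am filing this issue.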

Metadata

Labels

type:Bug (Something isn't working)
