Postprocessing hangs with multiple return sequences #18

Open
onadegibert opened this issue Oct 8, 2024 · 1 comment

@onadegibert

Hello,

I am running the IndicTrans2 models from Hugging Face and need to return multiple translations per source sentence (e.g., 8). With the code provided in the repository's README.md, whenever I set the num_return_sequences parameter above 1, the postprocessing step hangs indefinitely without any error message.

Expected behavior: I expect the postprocessing step to handle multiple return sequences and provide the output without hanging.

Actual behavior: When increasing num_return_sequences, the postprocessing step hangs, and there is no further message or output.

Here is the code I’m using (adapted from the README.md):

import torch

from IndicTransToolkit import IndicProcessor
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import time

ip = IndicProcessor(inference=True)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indictrans2-en-indic-dist-200M", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/indictrans2-en-indic-dist-200M", trust_remote_code=True)

sentences = [
    "This is a test sentence.",
    "This is another longer different test sentence.",
    "Please send an SMS to 9876543210 and an email on newemail123@xyz.com by 15th October, 2023.",
]

batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
batch = tokenizer(batch, padding="longest", truncation=True, max_length=256, return_tensors="pt")

num_return_sequences = 2
print(f"num_return_sequences == {num_return_sequences}")
with torch.inference_mode():
    outputs = model.generate(**batch, num_beams=5, num_return_sequences=num_return_sequences, max_length=256)

with tokenizer.as_target_tokenizer():
    # This scoping is absolutely necessary, as it will instruct the tokenizer to tokenize using the target vocabulary.
    # Failure to use this scoping will result in gibberish/unexpected predictions as the output will be de-tokenized with the source vocabulary instead.
    outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)

start = time.time()
print("Starting posprocessing")
outputs = ip.postprocess_batch(outputs, lang="hin_Deva")
end = time.time()
total_time = end - start
print(f"Postprocessing took {total_time:.2f} seconds")
print(outputs)
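
For what it's worth, the hang seems to coincide with the decoded list being longer than the batch that was preprocessed; a quick check right after the batch_decode call shows the mismatch (the counts simply follow from generate returning num_return_sequences hypotheses per input):

# Inserted right after the batch_decode call above: with 3 source sentences
# and num_return_sequences = 2, batch_decode yields 3 * 2 = 6 strings, while
# preprocess_batch was called with only 3 sentences.
print(len(sentences), num_return_sequences, len(outputs))  # prints: 3 2 6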

I'm using Python 3.10.12, with the following libraries installed:

  • torch==2.4.1
  • transformers==4.45.2

Thanks!

@VarunGumma
Owner

Hi @onadegibert, thank you for reaching out. The processor is not designed to handle multiple return sequences from generate: as of now, the post-processing assumes a 1-to-1 mapping between inputs and outputs when patching the placeholders. We will try to incorporate this request in the next release. If you have a fix before then, please feel free to open a PR.
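
Until that lands, one possible workaround (a minimal, untested sketch; it assumes postprocess_batch restores placeholders positionally, in the same order the sentences were preprocessed, which I have not verified against the toolkit internals) is to preprocess each sentence once per requested hypothesis, so that the decoded outputs line up 1-to-1 with the preprocessed batch:

import torch

from IndicTransToolkit import IndicProcessor
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

num_return_sequences = 2
sentences = [
    "This is a test sentence.",
    "This is another longer different test sentence.",
]

ip = IndicProcessor(inference=True)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indictrans2-en-indic-dist-200M", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/indictrans2-en-indic-dist-200M", trust_remote_code=True)

# Preprocess each sentence once per requested hypothesis, so the processor
# keeps one placeholder entry per decoded output (1-to-1 mapping preserved).
repeated = [s for s in sentences for _ in range(num_return_sequences)]
pre = ip.preprocess_batch(repeated, src_lang="eng_Latn", tgt_lang="hin_Deva")

# Tokenize only one copy of each preprocessed sentence for generation.
batch = tokenizer(pre[::num_return_sequences], padding="longest", truncation=True, max_length=256, return_tensors="pt")

with torch.inference_mode():
    outputs = model.generate(**batch, num_beams=5, num_return_sequences=num_return_sequences, max_length=256)

with tokenizer.as_target_tokenizer():
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)

# generate() returns hypotheses grouped per input (all of input 0's sequences
# first, then input 1's, ...), which matches the order of the repeated
# preprocessed list above.
translations = ip.postprocess_batch(decoded, lang="hin_Deva")
print(translations)

Whether the placeholder restoration really is positional is the key assumption here, so treat this as a starting point rather than a drop-in fix.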
