Copilot AI commented Dec 5, 2025

Adds a usage pattern demonstrating how to run ML models like Faster Whisper using Daft's class-based UDF system.

Changes

  • New file: usage_patterns/udfs/udf_faster_whisper.py - Complete example showing:

    • @daft.cls() decorator so the model is initialized once and reused across rows
    • @daft.method() with custom DataType schemas for structured outputs
    • daft.File handling for audio inputs, transcribed with VAD filtering enabled
  • README.md: Added the new pattern to the docs and fixed incorrect paths for existing UDF examples
  • Makefile: Added the new example to the test-udfs target
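The "initialize once, reuse across rows" idea behind the class-based UDF can be sketched in plain Python, without Daft. Every name here (`CountingModel`, `Transcriber`) is a hypothetical stand-in rather than part of the PR, and how Daft actually schedules and instantiates the decorated class is not modeled:

```python
class CountingModel:
    """Hypothetical stand-in for an expensive model such as WhisperModel."""

    loads = 0  # counts how many times the "model" was constructed

    def __init__(self):
        CountingModel.loads += 1

    def predict(self, path: str) -> str:
        return f"transcript of {path}"


class Transcriber:
    """Plain class mirroring the shape of the decorated UDF class."""

    def __init__(self):
        # Expensive setup happens here, once per instance.
        self.model = CountingModel()

    def transcribe(self, path: str) -> dict:
        # Per-row work reuses the already-loaded model.
        return {"transcript": self.model.predict(path)}


transcriber = Transcriber()
results = [transcriber.transcribe(p) for p in ["a.mp3", "b.mp3", "c.mp3"]]
```

The model is constructed a single time even though three rows are processed, which is the cost the class-based pattern is meant to avoid paying per row.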

Example

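# Imports inferred from the names used below; the PR excerpt omits them.
from dataclasses import asdict

import daft
from daft import col
from daft.functions import file, unnest
from faster_whisper import BatchedInferencePipeline, WhisperModel

# TranscriptionResult is presumably a custom daft DataType defined elsewhere in the example file.
@daft.cls()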
class FasterWhisperTranscriber:
    def __init__(self, model="distil-large-v3", compute_type="float32", device="auto"):
        self.whisper_model = WhisperModel(model, compute_type=compute_type, device=device)
        self.pipe = BatchedInferencePipeline(self.whisper_model)

    @daft.method(return_dtype=TranscriptionResult)
    def transcribe(self, audio_file: daft.File, batch_size: int = 16):
        with audio_file.to_tempfile() as tmp:
            segments_iter, info = self.pipe.transcribe(str(tmp.name), vad_filter=True, ...)
            return {"transcript": text, "segments": segments, "info": asdict(info)}

# Usage
transcriber = FasterWhisperTranscriber()
df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/audio/*.mp3")
    .with_column("audio_file", file(col("path")))
    .with_column("result", transcriber.transcribe(col("audio_file")))
    .select("path", unnest(col("result")))
)
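The excerpt above leaves out how `text` and `segments` are built from `segments_iter`. One possible way to assemble the returned dictionary is sketched below. `FakeSegment`, `FakeInfo`, and `build_result` are hypothetical stand-ins, not the PR's code, and the real faster-whisper segment and info objects carry more fields than shown here:

```python
from dataclasses import asdict, dataclass


@dataclass
class FakeSegment:
    """Hypothetical stand-in for one transcribed segment."""
    start: float
    end: float
    text: str


@dataclass
class FakeInfo:
    """Hypothetical stand-in for the transcription info object."""
    language: str
    duration: float


def build_result(segments_iter, info) -> dict:
    # Materialize the iterator once; it cannot be consumed a second time.
    segments = [asdict(s) for s in segments_iter]
    # Join segment texts into one transcript, trimming per-segment whitespace.
    text = " ".join(s["text"].strip() for s in segments)
    return {"transcript": text, "segments": segments, "info": asdict(info)}


segs = iter([FakeSegment(0.0, 1.5, " Hello"), FakeSegment(1.5, 3.0, " world.")])
result = build_result(segs, FakeInfo("en", 3.0))
```

Converting each segment to a plain dictionary keeps the result compatible with a struct-like return type, assuming the declared schema lists the same field names.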
Original prompt

Build me a usage pattern for running this model on daft udfs


Copilot AI self-assigned this Dec 5, 2025
Copilot AI and others added 2 commits December 5, 2025 07:40
Co-authored-by: everettVT <145285237+everettVT@users.noreply.github.com>
Co-authored-by: everettVT <145285237+everettVT@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add usage pattern for running model on daft udfs" to "Add Faster Whisper UDF usage pattern for audio transcription" Dec 5, 2025
Copilot AI requested a review from everettVT December 5, 2025 07:44