Add Faster Whisper UDF usage pattern for audio transcription #18

Copilot · 2025-12-05T07:35:46Z

Adds a usage pattern demonstrating how to run ML models like Faster Whisper using Daft's class-based UDF system.

Changes

- New file: usage_patterns/udfs/udf_faster_whisper.py, a complete example showing:
  - @daft.cls() decorator so the model is initialized once and reused across rows
  - @daft.method() with custom DataType schemas for structured outputs
  - daft.File handling for audio inputs, transcribed with VAD filtering
- README.md: added the new pattern to the docs and fixed incorrect paths for existing UDF examples
- Makefile: added the new example to the test-udfs target

Example

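from dataclasses import asdict

import daft
from faster_whisper import BatchedInferencePipeline, WhisperModel

# TranscriptionResult is the custom DataType schema from the full example (not shown here).
@daft.cls()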
class FasterWhisperTranscriber:
    def __init__(self, model="distil-large-v3", compute_type="float32", device="auto"):
        self.whisper_model = WhisperModel(model, compute_type=compute_type, device=device)
        self.pipe = BatchedInferencePipeline(self.whisper_model)

    @daft.method(return_dtype=TranscriptionResult)
    def transcribe(self, audio_file: daft.File, batch_size: int = 16):
        with audio_file.to_tempfile() as tmp:
            segments_iter, info = self.pipe.transcribe(str(tmp.name), vad_filter=True, ...)
            return {"transcript": text, "segments": segments, "info": asdict(info)}

# Usage (assumes `from daft import col` and `from daft.functions import file, unnest`)
transcriber = FasterWhisperTranscriber()
df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/audio/*.mp3")
    .with_column("audio_file", file(col("path")))
    .with_column("result", transcriber.transcribe(col("audio_file")))
    .select("path", unnest(col("result")))
)
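
The snippet above elides how `text` and `segments` are derived from `segments_iter`. One way this step could look is sketched below in plain Python, with no Daft or Faster Whisper dependency. `Segment`, `Info`, and `build_result` are hypothetical stand-ins introduced only for illustration; they are not the faster-whisper types and not necessarily what the example file does. The sketch assumes the iterator is lazy and should be consumed exactly once.

```python
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    start: float
    end: float
    text: str


@dataclass
class Info:
    """Hypothetical stand-in for the transcription metadata object."""
    language: str
    duration: float


def build_result(segments_iter: Iterable[Segment], info: Info) -> dict:
    # Walk the (assumed lazy) iterator exactly once and keep plain dicts,
    # so the result can map onto a struct-like return type.
    segments = [asdict(seg) for seg in segments_iter]
    # Join the per-segment text into a single transcript string.
    text = " ".join(seg["text"].strip() for seg in segments)
    return {"transcript": text, "segments": segments, "info": asdict(info)}


demo = build_result(
    iter([Segment(0.0, 1.2, " Hello there."), Segment(1.2, 2.5, " General Kenobi.")]),
    Info(language="en", duration=2.5),
)
print(demo["transcript"])  # prints "Hello there. General Kenobi."
```

Materializing the segments into a list before building the transcript avoids iterating the generator twice, which would otherwise yield an empty second pass.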

Original prompt

Build me a usage pattern for running this model on daft udfs

Co-authored-by: everettVT <145285237+everettVT@users.noreply.github.com>

Copilot AI self-assigned this Dec 5, 2025

Initial plan

0f725e5

Copilot AI assigned everettVT Dec 5, 2025

Copilot started work on behalf of everettVT on December 5, 2025 07:36

Copilot AI and others added 2 commits December 5, 2025 07:40

Add Faster Whisper UDF usage pattern for audio transcription

a552e48

Co-authored-by: everettVT <145285237+everettVT@users.noreply.github.com>

Address code review: avoid redundant model inference in summary

28ae08d

Co-authored-by: everettVT <145285237+everettVT@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Add usage pattern for running model on daft udfs" to "Add Faster Whisper UDF usage pattern for audio transcription" on Dec 5, 2025

Copilot AI requested a review from everettVT December 5, 2025 07:44

Copilot finished work on behalf of everettVT December 5, 2025 07:44

2 participants
