
Conversation

the-david-oy
Contributor

This pull request adds support for synthetic video inputs. This is best suited for small synthetic videos, since larger videos introduce too much latency due to the size of the memory transfer. For larger videos, we will want to support links to videos that the server can reference directly.

This was tested against a local instance of Cosmos. It uses an extension of the OpenAI chat API supported by Cosmos; I do not believe there is an OpenAI-standard video API yet.

Sample command:

aiperf profile \
  --model-names nvidia/cosmos-reason1-7b \
  --endpoint-type chat \
  --video-width 512 \
  --video-height 288 \
  --video-duration 5.0 \
  --video-fps 4 \
  --video-synth-type moving_shapes \
  --prompt-input-tokens-mean 50 \
  --num-dataset-entries 1 \
  --request-rate 2.0 --request-count 50
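Since there is no OpenAI-standard video API, the request shape is an assumption: the sketch below builds a chat payload embedding the synthetic video as a base64 `data:` URI via a `video_url` content part, mirroring the standard `image_url` convention. The field names are illustrative and the actual Cosmos extension schema may differ.

```python
import base64


def make_video_chat_payload(model: str, prompt: str, mp4_bytes: bytes) -> dict:
    """Build an OpenAI-chat-style request embedding a synthetic video.

    NOTE: "video_url" is an assumed extension field (mirroring the standard
    "image_url" content part); the actual Cosmos schema may differ.
    """
    b64 = base64.b64encode(mp4_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "video_url",
                        "video_url": {"url": f"data:video/mp4;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```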

Screenshot: [attached: run output, captured 2025-09-29]

@the-david-oy the-david-oy self-assigned this Sep 30, 2025

coderabbitai bot commented Sep 30, 2025

Important

Review skipped

Draft detected.



@the-david-oy the-david-oy changed the title draft: synthetic video support feat: synthetic video support Sep 30, 2025

codecov bot commented Sep 30, 2025

Codecov Report

❌ Patch coverage is 30.73770% with 169 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| aiperf/dataset/generator/video.py | 13.96% | 154 Missing ⚠️ |
| aiperf/dataset/composer/synthetic.py | 35.71% | 7 Missing and 2 partials ⚠️ |
| aiperf/clients/openai/openai_chat.py | 0.00% | 5 Missing ⚠️ |
| aiperf/common/enums/endpoints_enums.py | 75.00% | 1 Missing ⚠️ |


os.unlink(temp_path)
raise e

def _create_mp4_with_ffmpeg(self, frames: list[Image.Image]) -> str:
Contributor


The following code can be heavily optimized by passing raw frames via stdin to the ffmpeg subprocess and reading the encoded video back out via stdout, all using pipes. Here is some code I generated so you can get the idea:

    # assumes: self.config.fps (int), self.config.format.value == "mp4"
    # assumes: all frames are RGB and the same size
    # requires: import asyncio, base64, contextlib
    #           from typing import List; from PIL import Image

    async def _create_mp4_with_ffmpeg(self, frames: List[Image.Image]) -> str:
        """
        Create MP4 data using ffmpeg over stdin/stdout with asyncio.
        Returns a data: URI with base64-encoded MP4 bytes.
        """
        if not frames:
            raise ValueError("No frames provided")

        width, height = frames[0].size

        cmd = [
            "ffmpeg",
            "-loglevel", "error",
            "-y",
            "-f", "rawvideo",
            "-pix_fmt", "rgb24",
            "-s", f"{width}x{height}",
            "-r", str(self.config.fps),
            "-i", "pipe:0",
            "-an",
            "-c:v", "libx264",
            "-preset", "veryfast",
            "-pix_fmt", "yuv420p",
            "-movflags", "frag_keyframe+empty_moov",
            "-f", "mp4",
            "pipe:1",
        ]

        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

        try:
            for im in frames:
                frame_bytes = im.tobytes()
                proc.stdin.write(frame_bytes)
                await proc.stdin.drain()

            proc.stdin.close()

            stdout, stderr = await proc.communicate()

            if proc.returncode != 0:
                raise RuntimeError(f"ffmpeg failed with code {proc.returncode}: {stderr.decode('utf-8', errors='ignore')}")

            base64_data = base64.b64encode(stdout).decode("utf-8")
            return f"data:video/{self.config.format.value};base64,{base64_data}"

        finally:
            if proc.returncode is None:
                proc.kill()
                with contextlib.suppress(Exception):
                    await proc.wait()

Contributor Author

@the-david-oy the-david-oy Sep 30, 2025


Thanks! I was trying to generate a POC that worked for benchmarking but had planned to iterate on it. I like this for efficiency. What's your rationale for making this async vs sync? I think it could go either way, but our synthetic data generation right now is synchronous.
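For what it's worth, an async encoder can still be driven from synchronous generation code. A minimal sketch (the coroutine here is a stand-in for an async ffmpeg encode, not the actual method):

```python
import asyncio


async def encode_video() -> str:
    # stand-in for an async ffmpeg encode such as _create_mp4_with_ffmpeg
    await asyncio.sleep(0)
    return "data:video/mp4;base64,AAAA"


def generate_synchronously() -> str:
    # bridge: run the coroutine to completion from synchronous code
    return asyncio.run(encode_video())
```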

Contributor


Async all the things! Nah, no real reason; I just assumed, since most of the codebase is async. Asyncio code can still be used to produce deterministic results when done properly.

In the future I think a move to async may be beneficial as we start adding heavier and heavier workloads. It's also a good idea to be able to provide async progress reporting back to the user.

It's not for speed benefits, though; that would need multiple processes. Except in the case of ffmpeg, which is already a separate process.
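If the generator stays synchronous for now, the same piped approach works with a blocking `subprocess.run`. A sketch under the same assumptions as the async version (raw RGB24 frames, all the same size, ffmpeg on PATH):

```python
import base64
import subprocess


def build_ffmpeg_cmd(width: int, height: int, fps: int) -> list[str]:
    # same rawvideo-in / fragmented-MP4-out arguments as the async version
    return [
        "ffmpeg", "-loglevel", "error", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "pipe:0",
        "-an", "-c:v", "libx264", "-preset", "veryfast",
        "-pix_fmt", "yuv420p",
        "-movflags", "frag_keyframe+empty_moov",
        "-f", "mp4", "pipe:1",
    ]


def encode_frames_sync(raw_rgb: bytes, width: int, height: int, fps: int) -> str:
    # blocking variant: feed all frame bytes at once, read the encoded MP4 back
    result = subprocess.run(
        build_ffmpeg_cmd(width, height, fps),
        input=raw_rgb, capture_output=True, check=True,
    )
    return "data:video/mp4;base64," + base64.b64encode(result.stdout).decode("utf-8")
```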
