feat: synthetic video support #315
Conversation
aiperf/dataset/generator/video.py (Outdated)

            os.unlink(temp_path)
            raise e

    def _create_mp4_with_ffmpeg(self, frames: list[Image.Image]) -> str:
The following code can be heavily optimized by passing raw frames via stdin to the ffmpeg subprocess and reading the encoded output back via stdout, all using pipes. Here is some code I generated so you can get the idea:
# assumes: self.config.fps (int), self.config.format.value == "mp4"
# assumes: all frames are RGB and the same size
import asyncio
import base64
import contextlib

from PIL import Image


async def _create_mp4_with_ffmpeg(self, frames: list[Image.Image]) -> str:
    """
    Create MP4 data using ffmpeg over stdin/stdout with asyncio.
    Returns a data: URI with base64-encoded MP4 bytes.
    """
    if not frames:
        raise ValueError("No frames provided")

    width, height = frames[0].size
    cmd = [
        "ffmpeg",
        "-loglevel", "error",
        "-y",
        # Input: raw RGB frames streamed over stdin.
        "-f", "rawvideo",
        "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}",
        "-r", str(self.config.fps),
        "-i", "pipe:0",
        # Output: fragmented MP4 written to stdout (no seekable file required).
        "-an",
        "-c:v", "libx264",
        "-preset", "veryfast",
        "-pix_fmt", "yuv420p",
        "-movflags", "frag_keyframe+empty_moov",
        "-f", "mp4",
        "pipe:1",
    ]
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # Note: for very large outputs, reading stdout concurrently with these
        # writes would avoid potential pipe back-pressure.
        for im in frames:
            frame_bytes = im.tobytes()
            proc.stdin.write(frame_bytes)
            await proc.stdin.drain()
        proc.stdin.close()
        stdout, stderr = await proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(
                f"ffmpeg failed with code {proc.returncode}: "
                f"{stderr.decode('utf-8', errors='ignore')}"
            )
        base64_data = base64.b64encode(stdout).decode("utf-8")
        return f"data:video/{self.config.format.value};base64,{base64_data}"
    finally:
        # Make sure ffmpeg does not outlive an exception in the write loop.
        if proc.returncode is None:
            proc.kill()
            with contextlib.suppress(Exception):
                await proc.wait()
Thanks! I was trying to generate a POC that worked for benchmarking but had planned to iterate on it. I like this for efficiency. What's your rationale for making this async vs sync? I think it could go either way, but our synthetic data generation right now is synchronous.
Async all the things! Nah, no real reason, I just assumed async since most of our stuff is async. Asyncio code can still be used to produce deterministic results when done the proper way.
In the future I think a move to async may be beneficial as we start adding heavier and heavier things. It's a good idea to be able to provide async progress reporting back to the user.
However, it's definitely not for speed benefits; that needs multiple processes, except in the case of ffmpeg, which is already a separate process.
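For reference, here is a minimal synchronous sketch of the same pipe-based approach (the method name is hypothetical, and the same self.config fields as above are assumed; subprocess.run drains stdout while feeding stdin, so the pipe-deadlock concern does not apply here):

import base64
import subprocess

from PIL import Image


def _create_mp4_with_ffmpeg_sync(self, frames: list[Image.Image]) -> str:
    """Synchronous variant: pipe raw RGB frames to ffmpeg and read MP4 bytes back."""
    if not frames:
        raise ValueError("No frames provided")

    width, height = frames[0].size
    cmd = [
        "ffmpeg", "-loglevel", "error", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(self.config.fps),
        "-i", "pipe:0",
        "-an", "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-movflags", "frag_keyframe+empty_moov", "-f", "mp4", "pipe:1",
    ]
    # subprocess.run feeds stdin and collects stdout/stderr for us.
    result = subprocess.run(
        cmd,
        input=b"".join(im.tobytes() for im in frames),
        capture_output=True,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"ffmpeg failed with code {result.returncode}: "
            f"{result.stderr.decode('utf-8', errors='ignore')}"
        )
    base64_data = base64.b64encode(result.stdout).decode("utf-8")
    return f"data:video/{self.config.format.value};base64,{base64_data}"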
This pull request adds support for synthetic video generation. It is best suited to small synthetic videos, as larger videos introduce too much latency due to the size of the memory transfer. For larger videos, we will want to support links to videos that the server can reference directly.
This was tested against a local instance of Cosmos. It uses an extension of the OpenAI chat API supported by Cosmos; I do not believe there is an OpenAI-standard video API yet.
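For context, here is a rough sketch of what a chat request with an embedded synthetic video might look like; the video_url content part is an assumption based on common OpenAI-compatible extensions, and the exact field Cosmos expects may differ:

# Hypothetical request body; "video_url" is an assumed extension field,
# not part of the official OpenAI chat API.
payload = {
    "model": "nvidia/cosmos-reason1-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this clip."},
                {
                    "type": "video_url",
                    "video_url": {"url": "data:video/mp4;base64,..."},
                },
            ],
        }
    ],
}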
Sample command:
aiperf profile \
  --model-names nvidia/cosmos-reason1-7b \
  --endpoint-type chat \
  --video-width 512 \
  --video-height 288 \
  --video-duration 5.0 \
  --video-fps 4 \
  --video-synth-type moving_shapes \
  --prompt-input-tokens-mean 50 \
  --num-dataset-entries 1 \
  --request-rate 2.0 \
  --request-count 50
Screenshot:
