Error when parsing Video #222

@diogoviannaaraujo

I can get output for images, but when I try the same on a video I get the following error:

(.venv) diogoviannaaraujo@vogoiddesk beholder % python -m mlx_vlm.video_generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Describe this video" --video ./video1.mp4 --max-pixels 224 224 --fps 1.0
This is a beta version of the video understanding. It may not work as expected.
<frozen runpy>:128: RuntimeWarning: 'mlx_vlm.video_generate' found in sys.modules after import of package 'mlx_vlm', but prior to execution of 'mlx_vlm.video_generate'; this may result in unpredictable behaviour
This is a beta version of the video understanding. It may not work as expected.
Loading model: mlx-community/Qwen2-VL-2B-Instruct-4bit
Fetching 11 files: 100%|███████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 141092.80it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
numpy reader: video_path=./video1.mp4, total_frames=3585, video_fps=15.0, time=0.000s
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/diogoviannaaraujo/Projects/beholder/.venv/lib/python3.12/site-packages/mlx_vlm/video_generate.py", line 602, in <module>
    main()
  File "/Users/diogoviannaaraujo/Projects/beholder/.venv/lib/python3.12/site-packages/mlx_vlm/video_generate.py", line 507, in main
    image_inputs, video_inputs, fps = process_vision_info(messages, True)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diogoviannaaraujo/Projects/beholder/.venv/lib/python3.12/site-packages/mlx_vlm/video_generate.py", line 333, in process_vision_info
    video_input, video_sample_fps = fetch_video(
                                    ^^^^^^^^^^^^
  File "/Users/diogoviannaaraujo/Projects/beholder/.venv/lib/python3.12/site-packages/mlx_vlm/video_generate.py", line 245, in fetch_video
    if max_pixels_supposed > max_pixels:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'list' and 'int'
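
The last frame of the traceback compares a list with an int. Below is a minimal sketch of one way this can happen. It assumes, without confirmation from the `mlx_vlm` source, that `--max-pixels` is parsed as two integers, so `224 224` reaches `fetch_video` as the list `[224, 224]`. `normalize_max_pixels` is a hypothetical helper, not part of the library; it shows one possible way to collapse the pair into a single pixel budget before the comparison.

```python
import argparse


def normalize_max_pixels(value):
    """Hypothetical helper: turn a [height, width] pair into one pixel count."""
    if isinstance(value, (list, tuple)):
        if len(value) != 2:
            raise ValueError("expected exactly two values: height and width")
        return int(value[0]) * int(value[1])
    return int(value)


# Assumption: the CLI declares the flag roughly like this.
parser = argparse.ArgumentParser()
parser.add_argument("--max-pixels", type=int, nargs=2)
args = parser.parse_args(["--max-pixels", "224", "224"])

# With nargs=2, argparse stores a list, so comparing it to an int fails.
try:
    args.max_pixels > 1000
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Collapsing the pair to a single integer makes the comparison valid.
budget = normalize_max_pixels(args.max_pixels)
print(budget, budget > 1000)
```

If this assumption holds, the comparison only works once the value is a single integer, whether that conversion happens in the caller or inside the function that performs the check.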
