Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

captioning + dataset preparation + inference + improvements #34

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

sayakpaul
Copy link
Collaborator

@sayakpaul sayakpaul commented Oct 15, 2024

Known gotchas:

After I ran launch.sh, I got:

ls -lh $output_dir/*.txt

...
-rw-rw-r-- 1 sayak sayak  596 oct.  15 11:42 video-dataset-disney/d3af95bf8f9cce29e5c99e839e630c59_caption.txt
-rw-rw-r-- 1 sayak sayak  596 oct.  15 11:42 video-dataset-disney/d8a33062ef6a446c168acb06bf45b77d_caption.txt
-rw-rw-r-- 1 sayak sayak   55 oct.  15 11:42 video-dataset-disney/ec80a7b9cfb3740e5cdf9300fdc8d8a9_caption.txt
-rw-rw-r-- 1 sayak sayak  647 oct.  15 11:42 video-dataset-disney/f6a95cd4397b6a102e82becdcd05d585_caption.txt
-rw-rw-r-- 1 sayak sayak  591 oct.  15 11:42 video-dataset-disney/fe0efaedf8c47812bff8da9951f77975_caption.txt
-rw-rw-r-- 1 sayak sayak  414 oct.  15 11:42 video-dataset-disney/fec5a5184b05acafc7904cd419cbb5a3_caption.txt

@sayakpaul sayakpaul requested a review from a-r-r-o-w October 15, 2024 09:49
@sayakpaul sayakpaul marked this pull request as draft October 15, 2024 15:30
@sayakpaul sayakpaul marked this pull request as ready for review October 17, 2024 10:22
@sayakpaul
Copy link
Collaborator Author

@a-r-r-o-w this is ready for reviews.

@a-r-r-o-w
Copy link
Owner

awesome, testing now!

@a-r-r-o-w
Copy link
Owner

@a-r-r-o-w
Copy link
Owner

I get the following error when launching. Any idea why?

stacktrace
(nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation/video_recaptioning$ ./launch.sh
WARNING 10-17 12:43:21 cuda.py:76] Detected different devices in the system:
WARNING 10-17 12:43:21 cuda.py:76] NVIDIA A100-SXM4-80GB
WARNING 10-17 12:43:21 cuda.py:76] NVIDIA A100-SXM4-80GB
WARNING 10-17 12:43:21 cuda.py:76] NVIDIA A100-SXM4-80GB
WARNING 10-17 12:43:21 cuda.py:76] NVIDIA DGX Display
WARNING 10-17 12:43:21 cuda.py:76] NVIDIA A100-SXM4-80GB
WARNING 10-17 12:43:21 cuda.py:76] Please make sure to set `CUDA_DEVICE_ORDER=PCI_BUS_ID` to avoid unexpected behavior.
/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
You are using a model of type qwen2_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
INFO 10-17 12:43:30 config.py:887] Defaulting to use mp for distributed inference
INFO 10-17 12:43:30 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='Qwen/Qwen2-VL-2B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2-VL-2B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=Qwen/Qwen2-VL-2B-Instruct, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
WARNING 10-17 12:43:31 multiproc_gpu_executor.py:53] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 10-17 12:43:31 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:32 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
INFO 10-17 12:43:34 utils.py:1008] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:34 utils.py:1008] Found nccl from library libnccl.so.2
INFO 10-17 12:43:34 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:34 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:35 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/aryan/.cache/vllm/gpu_p2p_access_cache_for_2,3.json
INFO 10-17 12:43:35 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/aryan/.cache/vllm/gpu_p2p_access_cache_for_2,3.json
INFO 10-17 12:43:35 shm_broadcast.py:241] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer=<vllm.distributed.device_communicators.shm_broadcast.ShmRingBuffer object at 0x7f6928dde860>, local_subscribe_port=47105, remote_subscribe_port=None)
INFO 10-17 12:43:35 model_runner.py:1060] Starting to load model Qwen/Qwen2-VL-2B-Instruct...
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:35 model_runner.py:1060] Starting to load model Qwen/Qwen2-VL-2B-Instruct...
INFO 10-17 12:43:35 weight_utils.py:243] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=554434) INFO 10-17 12:43:35 weight_utils.py:243] Using model weights format ['*.safetensors']
[rank0]: Traceback (most recent call last):
[rank0]:   File "/raid/aryan/cogvideox-distillation/video_recaptioning/recaption.py", line 121, in <module>
[rank0]:     fire.Fire(main)
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
[rank0]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
[rank0]:     component, remaining_args = _CallAndUpdateTrace(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
[rank0]:     component = fn(*varargs, **kwargs)
[rank0]:   File "/raid/aryan/cogvideox-distillation/video_recaptioning/recaption.py", line 89, in main
[rank0]:     vllm_engine, sampling_params = load_model(
[rank0]:   File "/raid/aryan/cogvideox-distillation/video_recaptioning/recaption.py", line 69, in load_model
[rank0]:     vllm_engine = LLM(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 177, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 574, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 335, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 111, in _init_executor
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
[rank0]:     driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1062, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 402, in load_model
[rank0]:     model.load_weights(self._get_all_weights(model_config, model))
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1126, in load_weights
[rank0]:     for name, loaded_weight in weights:
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 377, in _get_all_weights
[rank0]:     yield from self._get_weights_iterator(primary_weights)
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 336, in _get_weights_iterator
[rank0]:     hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 292, in _prepare_weights
[rank0]:     hf_folder = download_weights_from_hf(
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 246, in download_weights_from_hf
[rank0]:     with get_lock(model_name_or_path, cache_dir):
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/filelock/_api.py", line 297, in __enter__
[rank0]:     self.acquire()
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/filelock/_api.py", line 255, in acquire
[rank0]:     self._acquire()
[rank0]:   File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/filelock/_unix.py", line 39, in _acquire
[rank0]:     fd = os.open(self.lock_file, open_flags, self._context.mode)
[rank0]: PermissionError: [Errno 13] Permission denied: '/tmp/31ada50e7aab2c9a44d651c511a187e63ec6fd0d7ba11408530179c4c8ea1eb2Qwen-Qwen2-VL-2B-Instruct.lock'
INFO 10-17 12:43:35 multiproc_worker_utils.py:121] Killing local vLLM worker processes
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x0000563edffa8600)

Current thread 0x00007f69dc88c740 (most recent call first):
  <no Python frame>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, PIL._imaging, psutil._psutil_linux, psutil._psutil_posix, msgspec._core, sentencepiece._sentencepiece, PIL._imagingft, jaxlib.cpu_feature_guard, regex._regex, msgpack._cmsgpack, google.protobuf.pyext._message, setproctitle, uvloop.loop, ray._raylet, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, zmq.backend.cython._zmq (total: 46)
/home/aryan/.pyenv/versions/3.10.14/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
./launch.sh: line 11: 554077 Aborted                 (core dumped) python recaption.py --root_dir="/raid/aryan/video-dataset-tom-and-jerry" --output_dir="/raid/aryan/video-dataset-tom-and-jerry-recaptioned" --num_devices=2 --max_num_frames=8 --max_tokens=120 --num_data_workers=4 --batch_size=2 --prompt="Please describe the content of this video in as much detail as possible, including the objects, scenery, animals, characters, and camera movements within the video. Do not include '\n' in your response. Please start the description with the video content directly. Please describe the content of the video and the changes that occur, in chronological order." --num_artifact_workers=4

@sayakpaul
Copy link
Collaborator Author

It's a permission denied error, what can I do to sort your permissions? :P

@a-r-r-o-w
Copy link
Owner

Seems like you're the owner of the lock file on DGX. Not sure why they haven't implemented per-user tmp directories, because this problem is so common when using same cache folders for multiple users lol

image

@sayakpaul
Copy link
Collaborator Author

I can lift it either because I don't have sudo. Better to check in infra.

@a-r-r-o-w
Copy link
Owner

I'm looking through the vLLM docs to see if they have an environment variable I could configure to use different cache dir, but if not then will ask in infra. Thanks!

@a-r-r-o-w
Copy link
Owner

Seems like adding a download_dir param fixes it with multiple users. I'll add the relevant changes and optional arg to the script: vllm-project/vllm#2675

@a-r-r-o-w
Copy link
Owner

I will try a few models to see what works best as default. I personally preferred the outputs of MiniCPM a lot, but will also give Qwen 7b a try. Currently, getting descriptions like:

In these frames, we see an animated scene involving a dog character. The dog is crouching down in front of a wooden dresser using a weapon. The weapon appears to be a sword, and the dog’s expression suggests that they are in the middle of making a move or about to strike. The scene is dynamic, showing the dog in a poised position as they execute the attack or защиты.This is an encoded image sequence of an animated character peacefully hanging in an industrial setting. The character is depicted in a suspended position, seemingly balancing against pipes and metal infrastructure. The environment appears to be confined spaces with industrial fixtures, suggesting a engineering or industrial context.In the given sequence of frames, we see an animated scene featuring two characters associated with a renowned animated series. They are placed in a setting with a brown wall and green floor.
1. At the top of the frame, there is Tom, a character frequently seen in the series, leading in a playful manner. He is dressed in a red dress jacket and surrounded by polka-dotted socks. His head is adorned with a porkpie hat, and his facial expression suggests a cheerful and jovial demeanor.

2. Below him, a character dressed in a blue outfit can be observed.The frames depict a scene of two animated characters sitting on a bench. The character on the left is a cat wearing a cowboy hat and holding a paper bag, while the character on the right is a cow. They appear to be talking or engaged in conversation. The background suggests a makeshift cabin or shed, indicating a rural or desert setting.The scene depicts a close-up of a person sitting in what appears to be a car or a vehicle. The person is seated on a seat, and the background shows a white curtain hanging behind them, possibly indicating that the vehicle is indoors. The overall atmosphere is indoor and cozy, conveying a sense of comfort and relaxation.In the presented sequence of frames, a character wearing white gloves comes into view. The character's white gloves have relaxed and ungrip betting in a peculiar way. The palm-side of the human finger and pointer finger are facing upward with fingers sticking straight out for a distance that seems asymptomatic. Both the hand's back and the fingers' heads are oriented parallel to each other in a high-angle manner, raising the combined formation in a peculiar fashion. The whole posture and positioning of the gloves may signify surprise or surprise-breathing or diahorrea like/unlike roleplay. Other hands purely in casual colourthis set of frames(2,90),(986,613) are quick and energetic.The frames in the video depict an animated scene involving a cat character who appears to be bundles wrapped around timbers or a similar object. The cat is wearing red trousers, white tennis shoes, and seems to be in a festive or playful mood. Surrounding the cat is a similarly dressed character and an ocean scene. The cat holds one of the bundles in its left hand, and it seems to be engaged in a lighthearted task, possibly navigating the bundles to reach a destination or dealing with them as part of a playful activity.a gray cat around a wall, dancing near the sill.The sequence depicts a view from a ship's Wildlife Pool, showcasing a swimming pool surrounded by a diving board with climbing stairs on both sides. The swimming pool has both a shallow end and a deep end, and is flanked by railings that provide an elevated diving ramp for participants. The diving board is adjacent to the pool, and the railing curve along the side of the deep end provides a protective barrier.In the beginning, a cute red ax is visually engaging punches the blue oar in a chilling blue water.Overall, the frames depict a scene of a white boat sailing under a bridge. The scene appears to be calm, with no任何人 attach themselves easilyThis is an animated scene with a curious mouse named Jerry. The mouse is standing in front of a wooden wall, displaying a mix of surprise, confusion, and possibly disbelief. The mouse has large, expressive ears and is wearing a black bow tie, adding a playful element to its expression. The background includes a room with a内阁的 wood and a wooden shelf, which adds to the setting.A cat with a fish tail appears in front of a bathroom shower, pero Suddenly, the cat falls backward and dyes the water behind it light blue.this set of frames.(2,2),(994,990) describes where there is a chair standing by the wall.This set of frames depicts a cartoon scene involving two characters. The character on the left is a anthropomorphic cat named Tom, who is wearing a gray coat and has a frying pan around his neck. He is holding a baseball bat in his right hand. The character on the right is a small brown dog named Jerry, who is wearing a blue coat and has a plastic dog toy in its mouth.

Does not seem to be respecting the 120 token limit set when lauching 🤔

@sayakpaul
Copy link
Collaborator Author

I will try a few models to see what works best as default.

Take note of the following from #34 (comment)

Not all models listed in https://docs.vllm.ai/en/latest/models/supported_models.html will have the same format for doing video captioning. Change the recaption.py as needed to suit your needs.

Does not seem to be respecting the 120 token limit set when lauching 🤔

That is likely a issue for vllm then.

Configuration-wise, we can experiment but I expected code-related comments as the first set of comments.

@a-r-r-o-w a-r-r-o-w changed the title [feat] distributed captioning with Qwen captioning + dataset preparation + inference + improvements Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants