No checkpoint path provided. Initializing from scratch #18

Closed
xiaodongww opened this issue Dec 23, 2024 · 2 comments

Comments

@xiaodongww

Hi, when I cache the dataset, the log shows "No checkpoint path provided. Initializing from scratch. [repeated 6x across cluster]". However, I have downloaded the ResNet model and set bkb_path as follows:

# transfuser_config.py line 18 and line 19
    bkb_path: str = "ckpts/resnet34.a1_in1k/pytorch_model.bin"
    plan_anchor_path: str = "ckpts/kmeans_navsim_traj_20.npy"
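
Relative paths like these are normally resolved against the working directory of the process, unless the code resolves them some other way. A minimal sketch for checking that both files are visible from the directory the script is launched in; the helper name `resolve_config_path` is hypothetical and not part of the repository:

```python
from pathlib import Path

# Values copied from the config snippet above.
BKB_PATH = "ckpts/resnet34.a1_in1k/pytorch_model.bin"
PLAN_ANCHOR_PATH = "ckpts/kmeans_navsim_traj_20.npy"


def resolve_config_path(path_str, base_dir=None):
    """Return (path, exists) for a config path.

    Relative paths are joined onto base_dir, which defaults to the current
    working directory. Hypothetical helper, not DiffusionDrive code.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = Path(path_str)
    if not path.is_absolute():
        path = base / path
    return path, path.is_file()


if __name__ == "__main__":
    for name, value in [("bkb_path", BKB_PATH),
                        ("plan_anchor_path", PLAN_ANCHOR_PATH)]:
        resolved, exists = resolve_config_path(value)
        status = "found" if exists else "MISSING"
        print(f"{name}: {resolved} -> {status}")
```

Running this from the same directory as the caching command reports whether each file is found there.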

Here is the full log:

❯ python navsim/planning/script/run_dataset_caching.py agent=diffusiondrive_agent experiment_name=training_diffusiondrive_agent train_test_split=mini | tee temp.log
[2024-12-23 16:09:50,352][__main__][INFO] - Global Seed set to 0
Seed set to 0
[2024-12-23 16:09:50,358][__main__][INFO] - Building Worker
[2024-12-23 16:09:50,943][navsim.planning.utils.multithreading.worker_ray_no_torch][INFO] - Not using GPU in ray
[2024-12-23 16:09:50,944][navsim.planning.utils.multithreading.worker_ray_no_torch][INFO] - Starting ray local!
2024-12-23 16:09:52,172	INFO worker.py:1821 -- Started a local Ray instance.
[2024-12-23 16:09:52,988][nuplan.planning.utils.multithreading.worker_pool][INFO] - Worker: RayDistributedNoTorch
[2024-12-23 16:09:52,988][nuplan.planning.utils.multithreading.worker_pool][INFO] - Number of nodes: 1
Number of CPUs per node: 32
Number of GPUs per node: 0
Number of threads across all nodes: 32
[2024-12-23 16:09:52,989][__main__][INFO] - Building SceneLoader
Loading logs: 100%|█████████████████████████████████████████████████| 64/64 [00:04<00:00, 14.57it/s]
[2024-12-23 16:09:57,399][__main__][INFO] - Extracted 3615 scenarios for training/validation dataset
Ray objects:   0%|                                                           | 0/32 [00:00<?, ?it/s]Loading logs:   0%|          | 0/2 [00:00<?, ?it/s]
Loading logs: 100%|██████████| 2/2 [00:00<00:00, 33.41it/s]
Loading logs: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s]
Caching Dataset:   0%|          | 0/70 [00:00<?, ?it/s]
Loading logs:  50%|█████     | 1/2 [00:00<00:00,  7.27it/s]
Loading logs: 100%|██████████| 2/2 [00:00<00:00, 15.97it/s]
Caching Dataset:   1%|          | 1/124 [00:01<02:48,  1.37s/it]
Caching Dataset:   2%|| 2/124 [00:01<01:25,  1.43it/s]
Loading logs:   0%|          | 0/2 [00:00<?, ?it/s] [repeated 24x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
Loading logs: 100%|██████████| 2/2 [00:00<00:00, 10.13it/s] [repeated 5x across cluster]
Caching Dataset:   0%|          | 0/132 [00:00<?, ?it/s] [repeated 24x across cluster]
Loading logs:  50%|█████     | 1/2 [00:00<00:00,  8.02it/s] [repeated 4x across cluster]
Loading logs: 100%|██████████| 2/2 [00:00<00:00,  8.51it/s] [repeated 18x across cluster]
Caching Dataset:   6%|| 9/140 [00:06<01:05,  1.99it/s] [repeated 157x across cluster]
Loading logs:   0%|          | 0/1 [00:00<?, ?it/s] [repeated 7x across cluster]
Loading logs: 100%|██████████| 1/1 [00:00<00:00, 18.35it/s] [repeated 2x across cluster]
Caching Dataset:   0%|          | 0/57 [00:00<?, ?it/s] [repeated 7x across cluster]
Loading logs:  50%|█████     | 1/2 [00:00<00:00,  8.89it/s]
Loading logs: 100%|██████████| 2/2 [00:00<00:00, 13.84it/s] [repeated 4x across cluster]
Caching Dataset:  20%|█▉        | 22/111 [00:11<00:38,  2.29it/s] [repeated 329x across cluster]
Caching Dataset:  22%|██▏       | 31/140 [00:16<00:50,  2.17it/s] [repeated 337x across cluster]
Caching Dataset:  54%|█████▍    | 41/76 [00:21<00:16,  2.12it/s] [repeated 356x across cluster]
Caching Dataset:  37%|███▋      | 52/140 [00:26<00:42,  2.08it/s] [repeated 346x across cluster]
Caching Dataset:  93%|█████████▎| 53/57 [00:23<00:01,  2.56it/s]
Caching Dataset:  95%|█████████▍| 54/57 [00:24<00:01,  2.54it/s]
Caching Dataset:  96%|█████████▋| 55/57 [00:24<00:00,  2.47it/s]
Caching Dataset:  98%|█████████▊| 56/57 [00:25<00:00,  2.53it/s]
Caching Dataset: 100%|██████████| 57/57 [00:25<00:00,  2.23it/s]
Ray objects:   3%|█▌                                                 | 1/32 [00:32<16:42, 32.34s/it]Caching Dataset:  39%|███▉      | 52/132 [00:27<00:34,  2.29it/s] [repeated 346x across cluster]
Ray objects:   9%|████▊                                              | 3/32 [00:35<04:01,  8.34s/it]Caching Dataset:  98%|█████████▊| 63/64 [00:31<00:00,  2.15it/s] [repeated 11x across cluster]
Caching Dataset: 100%|██████████| 64/64 [00:32<00:00,  1.97it/s] [repeated 2x across cluster]
Caching Dataset:  70%|███████   | 78/111 [00:36<00:13,  2.45it/s] [repeated 321x across cluster]
Ray objects:  12%|██████▍                                            | 4/32 [00:38<02:56,  6.29s/it]Caching Dataset:  99%|█████████▊| 75/76 [00:37<00:00,  2.21it/s] [repeated 6x across cluster]
Caching Dataset: 100%|██████████| 76/76 [00:37<00:00,  2.00it/s]
Caching Dataset:  72%|███████▏  | 89/124 [00:41<00:21,  1.67it/s] [repeated 253x across cluster]
Caching Dataset:  92%|█████████▏| 88/96 [00:42<00:03,  2.34it/s]
Caching Dataset:  93%|█████████▎| 89/96 [00:43<00:03,  2.32it/s]
Caching Dataset:  68%|██████▊   | 79/117 [00:41<00:17,  2.12it/s] [repeated 296x across cluster]
Caching Dataset: 100%|██████████| 96/96 [00:46<00:00,  2.08it/s]
Ray objects:  19%|█████████▌                                         | 6/32 [00:50<02:26,  5.64s/it]Caching Dataset:  98%|█████████▊| 104/106 [00:50<00:00,  2.54it/s] [repeated 63x across cluster]
Ray objects:  34%|█████████████████▏                                | 11/32 [00:52<00:24,  1.15s/it]Caching Dataset:  88%|████████▊ | 92/105 [00:46<00:04,  2.66it/s] [repeated 279x across cluster]
Ray objects:  41%|████████████████████▎                             | 13/32 [00:53<00:16,  1.12it/s]Caching Dataset: 100%|██████████| 110/110 [00:51<00:00,  2.14it/s] [repeated 8x across cluster]
Ray objects:  56%|████████████████████████████▏                     | 18/32 [00:56<00:06,  2.18it/s]Caching Dataset:  95%|█████████▌| 116/122 [00:51<00:01,  4.48it/s] [repeated 83x across cluster]
Caching Dataset:  86%|████████▌ | 113/132 [00:52<00:05,  3.64it/s] [repeated 229x across cluster]
Ray objects:  81%|████████████████████████████████████████▋         | 26/32 [00:59<00:01,  3.41it/s]Caching Dataset: 100%|██████████| 125/125 [00:55<00:00,  2.25it/s] [repeated 13x across cluster]
Ray objects: 100%|██████████████████████████████████████████████████| 32/32 [01:01<00:00,  1.92s/it]
(wrapped_fn pid=79839) No checkpoint path provided. Initializing from scratch.
(wrapped_fn pid=79857) No checkpoint path provided. Initializing from scratch. [repeated 25x across cluster]
[2024-12-23 16:11:04,712][__main__][INFO] - Finished caching 3615 scenarios for training/validation dataset
Caching Dataset:  99%|█████████▉| 144/145 [00:57<00:00,  7.19it/s] [repeated 133x across cluster]
Caching Dataset:  91%|█████████ | 132/145 [00:55<00:02,  5.62it/s] [repeated 30x across cluster]
Caching Dataset: 100%|██████████| 145/145 [00:57<00:00,  2.50it/s] [repeated 6x across cluster]
(wrapped_fn pid=79860) No checkpoint path provided. Initializing from scratch. [repeated 6x across cluster]

I am wondering why it still reports that no checkpoint was provided. Could you please give me some advice?

@varunjammula

It is not an error. For dataset caching, the DiffusionDrive agent is loaded with its checkpoint set to null.

@LegendBC
Member

This is fine for dataset caching. You just need to make sure the initialization is correct in the training stage.
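
The two replies above suggest that the message refers to the agent-level checkpoint rather than to bkb_path. A minimal sketch of that distinction, assuming the agent takes an optional checkpoint path separate from the backbone path; `SketchConfig`, `SketchAgent`, and the path `some/agent.ckpt` are hypothetical and not the repository's actual classes or files:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    # Backbone weights path, as in the config snippet from the question.
    bkb_path: str = "ckpts/resnet34.a1_in1k/pytorch_model.bin"


class SketchAgent:
    """Hypothetical stand-in for an agent; not the DiffusionDrive class."""

    def __init__(self, config: SketchConfig,
                 checkpoint_path: Optional[str] = None):
        self.config = config
        self.checkpoint_path = checkpoint_path

    def describe_initialization(self) -> str:
        # The message depends only on the agent-level checkpoint,
        # not on config.bkb_path.
        if self.checkpoint_path is None:
            return "No checkpoint path provided. Initializing from scratch."
        return f"Loading agent checkpoint from {self.checkpoint_path}"


if __name__ == "__main__":
    # Dataset caching: agent built with a null checkpoint, bkb_path still set.
    caching_agent = SketchAgent(SketchConfig())
    print(caching_agent.describe_initialization())

    # A later stage might pass an explicit (hypothetical) checkpoint path.
    other_agent = SketchAgent(SketchConfig(), checkpoint_path="some/agent.ckpt")
    print(other_agent.describe_initialization())
```

In this sketch the message appears whenever the agent checkpoint is null, even though bkb_path is set, which matches the behaviour seen in the caching log.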
