Issues on training custom datasets #106
Please make yourself familiar with the COCO annotation format. It is not designed for video or tracking data; hence, we had to add those fields to the ground truth.
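For reference, here is a trimmed sketch of what such an extended annotation file can look like. The field names follow this thread; the authoritative schema is whatever `src/generate_coco_from_mot.py` actually writes, so treat this as an illustration only:

```python
# Sketch of a COCO-style annotation dict extended with tracking fields.
# Check src/generate_coco_from_mot.py for the authoritative schema.
example = {
    "sequences": ["MOT17-02", "MOT17-04"],  # names of all video sequences
    "images": [{
        "id": 0,                    # global, continuous over the whole split
        "file_name": "MOT17-02/img1/000001.jpg",
        "frame_id": 0,              # frame index within its own sequence
        "seq_length": 600,          # total number of frames in the sequence
        "first_frame_image_id": 0,  # global id of the sequence's first frame
    }],
    "annotations": [{
        "id": 0,
        "image_id": 0,
        "category_id": 1,
        "bbox": [912.0, 484.0, 97.0, 109.0],  # COCO-style x, y, w, h
        "track_id": 0,              # object identity across frames
        "seq": "MOT17-02",          # sequence the annotation belongs to
    }],
}
```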
Thanks for your answer; now I understand what these parameters stand for. However, another problem occurs in the very first epoch of training:

```
Traceback (most recent call last):
  File "src/train.py", line 356, in <module>
    train(args)
  File "src/train.py", line 283, in train
    visualizers['train'], args)
  File "/root/trackformer/src/trackformer/engine.py", line 119, in train_one_epoch
    for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, epoch)):
  File "/root/trackformer/src/trackformer/util/misc.py", line 230, in log_every
    for obj in iterable:
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/trackformer/src/trackformer/datasets/mot.py", line 52, in __getitem__
    frame_id = self.coco.imgs[idx]['frame_id']
KeyError: 76
```

To track this down, I printed `self.coco.imgs` before the error was raised, and I think I see what is wrong there.
The image ids should be continuous, yes. How would you initialize them otherwise? You go through all images in the dataset and then give each image an id. These ids are global and not tied to a particular video sequence.
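A minimal sketch of that initialization, assuming the frames of each sequence are held in a dict (the names here are illustrative, not from the repository):

```python
# Hypothetical sketch: assign global, continuous image ids across all
# sequences, while frame_id restarts at 0 within each sequence.
sequences = {
    "seq_a": ["000001.jpg", "000002.jpg", "000003.jpg"],
    "seq_b": ["000001.jpg", "000002.jpg"],
}

images = []
img_id = 0
for seq_name, frames in sequences.items():
    first_frame_image_id = img_id
    for frame_id, file_name in enumerate(frames):
        images.append({
            "id": img_id,                                 # global, continuous
            "file_name": f"{seq_name}/{file_name}",
            "frame_id": frame_id,                         # per-sequence index
            "seq_length": len(frames),
            "first_frame_image_id": first_frame_image_id,
        })
        img_id += 1
```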
Got it. I initialized them with continuous numbers before splitting them into train and val sets. Does that mean the "id" parameter should be continuous, while "frame_id" is not required to be?
See trackformer/src/generate_coco_from_mot.py, line 183 (commit e1dbc25). In combination with the MOT GT files, you can derive all of this from there.
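For context, the standard MOTChallenge ground-truth layout already contains everything those fields need. A rough sketch of how they could be derived (an illustration under that assumption, not the repository's actual implementation):

```python
import configparser
import csv
import os

def read_mot_sequence(seq_dir):
    """Sketch: pull tracking fields from a standard MOTChallenge sequence.

    seq_dir is expected to contain seqinfo.ini and gt/gt.txt.
    """
    # seqLength comes straight from seqinfo.ini.
    ini = configparser.ConfigParser()
    ini.read(os.path.join(seq_dir, "seqinfo.ini"))
    seq_length = int(ini["Sequence"]["seqLength"])

    # gt.txt rows: frame, track_id, left, top, width, height, conf, class, vis
    boxes = []
    with open(os.path.join(seq_dir, "gt", "gt.txt")) as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            bbox = [float(v) for v in row[2:6]]
            boxes.append({"frame_id": frame - 1,  # MOT frames are 1-based
                          "track_id": track_id,
                          "bbox": bbox})
    return seq_length, boxes
```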
OK, got it. I've refreshed all of the IDs in both train and val annotation files with continuous numbers. However, I'm still running into this:

```
Traceback (most recent call last):
  File "src/train.py", line 356, in <module>
    train(args)
  File "src/train.py", line 283, in train
    visualizers['train'], args)
  File "/root/trackformer/src/trackformer/engine.py", line 119, in train_one_epoch
    for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, epoch)):
  File "/root/trackformer/src/trackformer/util/misc.py", line 230, in log_every
    for obj in iterable:
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/trackformer/src/trackformer/datasets/mot.py", line 61, in __getitem__
    prev_img, prev_target = self._getitem_from_id(prev_image_id, random_state)
  File "/root/trackformer/src/trackformer/datasets/coco.py", line 59, in _getitem_from_id
    img, target = super(CocoDetection, self).__getitem__(image_id)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torchvision/datasets/coco.py", line 44, in __getitem__
    id = self.ids[index]
IndexError: list index out of range
```
Please make yourself familiar with the code. Understand how this error can occur and then debug your indices. There is most definitely a simple explanation for this. |
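One detail worth knowing while debugging this: torchvision's `CocoDetection` resolves a dataset index through its `self.ids` list, and the traceback above shows TrackFormer feeding an image id (`prev_image_id`) back in as such an index, so each split effectively needs ids running from 0 to N-1. A hypothetical sanity check along those lines (not part of the repository; the paths are placeholders):

```python
import json

def check_image_ids(ann_file):
    """Verify that the image ids in a split form the exact range 0..N-1,
    which indexing self.ids by an image id silently assumes."""
    with open(ann_file) as f:
        ids = sorted(img["id"] for img in json.load(f)["images"])
    assert ids == list(range(len(ids))), (
        f"{ann_file}: ids are not a contiguous 0-based range "
        f"(min={ids[0]}, max={ids[-1]}, count={len(ids)})"
    )

check_image_ids("data/my_dataset/annotations/train.json")
check_image_ids("data/my_dataset/annotations/val.json")
```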
OK, after looking up how the MOT20 dataset is formed in this framework, I now understand how to build my own custom dataset. I filled the "sequences" parameter in both train and val annotations with custom-defined names, as well as the "seq" parameter in every annotation. However, another error occurs after an epoch has ended:

```
Traceback (most recent call last):
  File "src/train.py", line 356, in <module>
    train(args)
  File "src/train.py", line 301, in train
    output_dir, visualizers['val'], args, epoch)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/trackformer/src/trackformer/engine.py", line 324, in evaluate
    run = ex.run(config_updates=config_updates)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "/root/trackformer/src/track.py", line 109, in main
    dataset_name, root_dir=data_root_dir, img_transform=img_transform)
  File "/root/trackformer/src/trackformer/datasets/tracking/factory.py", line 59, in __init__
    assert dataset in DATASETS, f"[!] Dataset not found: {dataset}"
AssertionError: [!] Dataset not found: N03TCf00
```

Therefore, how should I define the names of the sequences in my custom dataset?
If you rename your dataset you must add it to the dataset factory. Please try to solve these issues on your own by making yourself familiar with the code and only ask for help here as a last resort. |
After several attempts, I've added my custom dataset to factory.py in this way:

```python
for split in ['N03TCf00', 'N07TCj00', 'N10TCj00', 'N10TCj01']:
    DATASETS[split] = (lambda kwargs: [DemoSequence(**kwargs), ])
```

And I've made sure that the "sequences" parameter in both train.json and val.json, as well as the "seq" parameter in every annotation of these JSON files, is set correctly. However, evaluation now fails like this:

```
WARNING - submitit - Added new config entry: "obj_detector_model.img_transform"
WARNING - submitit - Added new config entry: "obj_detector_model.model"
WARNING - submitit - Added new config entry: "obj_detector_model.post.bbox"
WARNING - submitit - Changed type of config entry "seed" from int to NoneType
WARNING - submitit - Changed type of config entry "dataset_name" from str to DogmaticList
WARNING - submitit.track - No observers have been added to this run
INFO - submitit.track - Running command 'main'
INFO - submitit.track - Started
INFO - submitit.main - ------------------
INFO - submitit.main - TRACK SEQ: data
0it [00:00, ?it/s]
INFO - submitit.main - NUM TRACKS: 0 ReIDs: 0
INFO - submitit.main - RUNTIME: 0.00 s
INFO - submitit.main - NO GT AVAILBLE
INFO - submitit.main - RUNTIME ALL SEQS (w/o EVAL or IMG WRITE): 0.00 s for 0 frames (0.00 Hz)
INFO - submitit.track - Result: []
INFO - submitit.track - Completed after 0:00:00
Traceback (most recent call last):
  File "src/train.py", line 356, in <module>
    train(args)
  File "src/train.py", line 301, in train
    output_dir, visualizers['val'], args, epoch)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/trackformer/src/trackformer/engine.py", line 333, in evaluate
    mot_accums, seqs)
  File "/root/trackformer/src/trackformer/util/track_utils.py", line 411, in evaluate_mot_accums
    generate_overall=generate_overall,)
  File "/root/miniconda3/envs/tf/lib/python3.7/site-packages/motmetrics/metrics.py", line 271, in compute_many
    assert names is None or len(names) == len(dfs)
AssertionError
```

I've reviewed the related code and tried printing out parameters like "names" and "dfs", but I'm still confused by this error.
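For what it's worth, that assertion in motmetrics only checks that the list of names matches the list of accumulators one-to-one. The log above shows 0 frames tracked and no GT found, so the accumulator list is presumably empty while the sequence-name list is not. A standalone sketch of the invariant using plain motmetrics (illustrative, not TrackFormer's code):

```python
import motmetrics as mm

# One accumulator per sequence that actually has ground truth.
accs, names = [], []
for seq_name, has_gt in [("seq_a", True), ("seq_b", False)]:
    if not has_gt:
        continue  # skipping a sequence without also skipping its name
                  # would trip the len(names) == len(dfs) assertion
    acc = mm.MOTAccumulator(auto_id=True)
    # update() takes gt ids, hypothesis ids, and a gt-x-hyp distance matrix
    acc.update([1], [1], [[0.1]])
    accs.append(acc)
    names.append(seq_name)

mh = mm.metrics.create()
summary = mh.compute_many(accs, metrics=["num_frames", "mota"],
                          names=names, generate_overall=True)
print(summary)
```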
|
I am currently attempting to train TrackFormer on my own MOT dataset of 131 images. Before starting, I arranged all of the images and annotation files in the structure listed in TRAIN.md, and then ran this command:
And it returns:
To be honest, I don't actually know what the parameters "seq_length", "frame_id", and "first_frame_image_id" in the annotation files mean, which is what caused this error, so I'm here to ask for help.
My environment is listed below: