Patchcore fails to train with a large dataset #802
-
First of all, PatchCore only needs to be trained for a single epoch. The model uses a fixed backbone with frozen weights to extract features from the training images, which are collected in a memory bank. At inference time, the features of the inference images are compared to the features in the memory bank. So the training does not involve fine-tuning of the neural network weights, and training for multiple epochs will not improve results.

With very large datasets you might run into memory issues within the first epoch. That's because PatchCore has to store the extracted features from all images in the training set in memory in order to create the memory bank. This means that the memory use of PatchCore scales with the size of the dataset. For larger datasets you might have to consider subsampling your dataset or using a more memory-efficient algorithm.
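If you want to go the subsampling route, a minimal sketch of one way to do it is below. It simply copies a random subset of the normal training images into a new folder that you would then point your training config at. The paths, file extensions, and subset size are placeholders for illustration, not part of anomalib's API.

```python
"""Sketch: randomly subsample a large set of normal training images.

The directory layout and sample size below are hypothetical; adjust them
to match your own dataset before using this.
"""
import random
import shutil
from pathlib import Path


def subsample_dataset(src_dir: str, dst_dir: str, n_samples: int = 200, seed: int = 0) -> None:
    """Copy a random subset of images from src_dir into dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Collect image files; extend the suffix set if your dataset uses other formats.
    images = sorted(p for p in src.iterdir() if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".bmp"})

    random.seed(seed)
    subset = random.sample(images, min(n_samples, len(images)))

    for path in subset:
        shutil.copy2(path, dst / path.name)


if __name__ == "__main__":
    # Hypothetical paths for a "good"-only training folder.
    subsample_dataset("datasets/pcb/train/good", "datasets/pcb_subset/train/good", n_samples=200)
```

Since the memory bank grows with the number of training images, reducing the number of images this way directly reduces the peak memory needed to build it.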
-
Hello, I have a PCB dataset with about 1000 good images (image size set to 224x224). When training with PatchCore, the training crashes after 2 epochs with the error
CUDA out of memory...
(similar to #434), so I had to trim the dataset to around 200 images to get it to work, which obviously affected the training. I then tried training on the same dataset with FastFlow, and it trained successfully with all 1000 images.
So now I'm just curious: is there a reason why PatchCore fails with a larger number of images?