FAQ on Face recognition

Marcel Klehr edited this page Oct 10, 2023 · 1 revision

I let it run through my entire photo library before assigning names to clusters. I found some people were consistently confused, like my 3 kids. Now I have clusters with thousands of images that are all mixed together, and the only way to resolve this is to manually disambiguate the kids for every. single. photo. I get the impression that if I had started with a sample of, say, 100 photos, I could have set up those initial face clusters to give the algorithm something to work with and gotten a better result. Or maybe not?

Basically, you have two options. A) Run face detection over all photos first and run clustering afterward. This way the clustering can use all available information, but if it goes wrong for a face, it goes wrong for a lot of photos. B) Run face detection and clustering in batches and clean up the results manually after each batch. This may give less accurate clusters, because the algorithm only ever sees a portion of the information, but it builds in a more frequent feedback loop for correcting its mistakes.
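To make the trade-off concrete, here is a minimal toy sketch. It assumes one-dimensional made-up "descriptors" and a simple single-linkage clustering with a hypothetical `threshold`; it is not the clustering algorithm the app actually uses. It only illustrates how a batch that is missing an in-between photo can split one person into two clusters, while clustering everything at once joins them.

```python
from math import dist


def cluster(descriptors, threshold):
    """Toy single-linkage clustering (illustrative only).

    Two faces end up in the same cluster if they are connected by a
    chain of faces that are each closer than `threshold` to the next.
    Returns one cluster label per descriptor.
    """
    parent = list(range(len(descriptors)))  # every face starts alone

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(descriptors)):
        for j in range(i + 1, len(descriptors)):
            if dist(descriptors[i], descriptors[j]) < threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    return [find(i) for i in range(len(descriptors))]


# Made-up descriptors of ONE person; the third photo sits "between" the others.
faces = [(0.0,), (0.8,), (0.4,)]
THRESHOLD = 0.5  # hypothetical value

# Option A: cluster everything at once. The in-between photo links all three.
all_at_once = cluster(faces, THRESHOLD)

# Option B: the first batch only contains the two distant photos,
# so they land in separate clusters until someone merges them by hand.
first_batch = cluster(faces[:2], THRESHOLD)

print(len(set(all_at_once)), "cluster(s) when clustering all photos")
print(len(set(first_batch)), "cluster(s) in the first batch alone")
```

In option B the split is cheap to fix because it shows up after a small batch; in option A a comparable mistake would only surface after the whole library has been processed.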

I doubt either approach is generally much better than the other. The clustering algorithm is already as fine-tuned as possible, so when clustering goes wrong, it is unlikely to be the result of a wrong cluster assignment. It is more likely the result of a wrong or less expressive face descriptor, which has to be corrected for in both approaches. The current face descriptor model is known to produce less expressive descriptors for kids, which is why they are often wrongly clustered.
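As a rough illustration of what "less expressive" means here, the following sketch uses made-up 2D points in place of real face descriptors (which have far more dimensions). The helper `cleanly_separable` is hypothetical and only checks whether every same-person pair is closer than every cross-person pair. When that fails, no distance threshold can keep the two people apart, no matter whether clustering runs on everything or in batches.

```python
from itertools import combinations, product
from math import dist


def cleanly_separable(person_a, person_b):
    """True if every same-person pair is closer than every cross-person pair."""
    widest_within = max(
        dist(p, q)
        for group in (person_a, person_b)
        for p, q in combinations(group, 2)
    )
    closest_between = min(dist(p, q) for p, q in product(person_a, person_b))
    return widest_within < closest_between


# Made-up numbers: expressive descriptors keep two adults far apart ...
adult_a = [(0.0, 0.0), (0.1, 0.0)]
adult_b = [(1.0, 1.0), (1.1, 1.0)]

# ... while less expressive descriptors let two kids overlap.
kid_a = [(0.0, 0.0), (0.3, 0.0)]
kid_b = [(0.2, 0.1), (0.5, 0.1)]

print("adults separable:", cleanly_separable(adult_a, adult_b))
print("kids separable:", cleanly_separable(kid_a, kid_b))
```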