Deduplicate Kobo submissions using face pictures.
Note
Terms of Service: usage of dedupliface is permitted only
- for humanitarian programs involving the registration of people,
- to prevent duplicate registrations, whether caused by error or fraud,
- when no proof of legal identity is held by people assisted,
- when duplicates are validated by humanitarian workers, who ultimately decide if a person should (not) be included in a program,
- in combination with KoboToolbox.
Collection of face pictures and their use in dedupliface must be done in accordance with the IFRC Data Protection Policy.
The high-level workflow is:
- Create a Kobo form with a question of type Photo, with which you collect face pictures.
- Connect the Kobo form with dedupliface using Kobo REST Services.
- When a new submission is uploaded to Kobo, an encrypted numerical representation of the face, a.k.a. an embedding, is saved in a dedicated vector database. The encryption key is unique to the Kobo form.
- Dedupliface checks which faces in the vector database are duplicates and stores this information in the Kobo database.
- Delete the encrypted embeddings from the vector database, for extra safety.
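The duplicate check in the workflow above can be pictured as a similarity comparison between embeddings. The following is only a minimal sketch of that idea, not dedupliface's actual implementation: the use of cosine similarity, the function names, the toy 3-dimensional vectors, and the 0.8 threshold are all assumptions made for illustration (in dedupliface the comparison is delegated to the vector database).

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.8):
    """Return pairs of submission IDs whose embeddings are at least
    `threshold` similar. The threshold is a hypothetical value."""
    pairs = []
    for (id_a, emb_a), (id_b, emb_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(emb_a, emb_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy embeddings keyed by hypothetical submission IDs.
# Real face embeddings have far more dimensions.
embeddings = {
    "sub-1": [0.9, 0.1, 0.0],
    "sub-2": [0.88, 0.12, 0.01],
    "sub-3": [0.0, 0.2, 0.95],
}

print(find_duplicate_pairs(embeddings))
```

Any pair returned by such a check is only a candidate: as stated in the Terms of Service, humanitarian workers validate duplicates and make the final decision.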
- Define which question in the Kobo form is used to get face pictures.
- Define which question in the Kobo form is used to mark duplicates (can be hidden in the form itself).
- Register a new Kobo REST Service and give it a descriptive name.
- Insert as Endpoint URL https://dedupliface.azurewebsites.net/add-face
- Add under Custom HTTP Headers:
  - In Name add koboasset and in Value the ID of your Kobo form (asset)
  - In Name add kobotoken and in Value your Kobo API token (see how to get one)
  - In Name add kobofield and in Value the name of the question used for face pictures
- Upload all submissions to Kobo
- Make a POST request to https://dedupliface.azurewebsites.net/find-duplicate-faces through the Swagger UI or whatever tool you prefer.
  - Specify koboasset and kobotoken in the headers, as before
  - Specify kobofield and kobovalue in the request body, where kobofield is the name of the question used for marking duplicates and kobovalue is the value that marks a duplicate (e.g. yes)
- Your duplicate submissions will now be marked as such in KoboToolbox.
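For those who prefer a script over the Swagger UI, the POST request described above could be built roughly as follows. This is a sketch under assumptions: the request body is assumed to be JSON, the helper name build_find_duplicates_request is hypothetical, and the asset ID, token, and field name are placeholders to replace with your own values. The request is only constructed here; sending it is left commented out.

```python
import json
import urllib.request

URL = "https://dedupliface.azurewebsites.net/find-duplicate-faces"


def build_find_duplicates_request(asset_id, token, field, value):
    """Build (but do not send) the POST request.

    Assumption: the API accepts a JSON body with kobofield and kobovalue.
    """
    headers = {
        "koboasset": asset_id,   # ID of the Kobo form (asset)
        "kobotoken": token,      # Kobo API token
        "Content-Type": "application/json",
    }
    body = json.dumps({"kobofield": field, "kobovalue": value}).encode("utf-8")
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


# Placeholder values, replace with your own.
request = build_find_duplicates_request(
    asset_id="YOUR_ASSET_ID",
    token="YOUR_KOBO_TOKEN",
    field="duplicate",  # hypothetical name of the question that marks duplicates
    value="yes",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keep the Kobo API token out of version control, for example by reading it from an environment variable instead of hard-coding it.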
Synopsis: a dockerized Python API that checks if face pictures in Kobo are duplicates.
Based on FastAPI and facenet-pytorch. Stores and queries face embeddings with a dedicated vector database, Azure AI Search. Uses Poetry for dependency management.
Encrypts face embeddings with two keys, one global and one unique to each Kobo form.
Create the .env file for local environment variables
cp example.env .env
and edit the variables accordingly.
Then, with Uvicorn:
poetry install
uvicorn main:app --reload
or with Docker:
docker compose up --detach
- Create an App Service Plan Premium v3 P2V3 or above.
- Create an App Service Web App with the following settings:
  - Publish: Docker Container
  - Operating System: Linux
  - Region: the same as the App Service Plan