Derek Merck derek_merck@brown.edu
Brown University and Rhode Island Hospital
Winter 2018
Source: https://www.github.com/derekmerck/DIANA
Documentation: https://diana.readthedocs.io
Unique, deterministic study ids, pseudonyms, and pseudodobs for all!
- Python 3.6
- python-dateutil
To use it as a Python library:
>>> import GUIDMint
>>> mint = GUIDMint.PseudoMint()
>>> mint.mint_guid( "MERCK^DEREK^L" )
u'AYJOAUVBBT54F6TP'
Multiple algorithms ('mints') are available. The md5 mint simply hashes the name parameter to create a name and id, and generates an approximate date of birth.
$ curl "localhost:5000/guid/md5/pseudo_id?value=MERCK^DEREK^L"
{ "dob": "1966-03-16", "gender": "U", "guid": "392ec5209964bfad", "name": "392ec5209964bfad"}
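As a rough illustration of the md5 mint, the sketch below hashes a value and keeps a hex prefix for use as both guid and name. The function name md5_guid_sketch and the 16-character truncation are assumptions chosen to mirror the length of the sample response; this text does not describe the exact encoding, so the output is not expected to reproduce the sample value.

```python
import hashlib


def md5_guid_sketch(value: str, length: int = 16) -> str:
    """Hypothetical helper: md5-hash the value and keep a hex prefix."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    # Assumption: 16 hex characters, matching the length of the sample guid.
    return digest[:length]


guid = md5_guid_sketch("MERCK^DEREK^L")
record = {"guid": guid, "name": guid}  # the md5 mint reuses the hash as the name
```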
Other mint classes can be created by overriding basic functionality and then easily plugged into the architecture.
A GUID mint must generate a unique, reproducible tag for any consistent set of object-specific variables.
Generation method:
- A value parameter is passed in; depending on the available data, this may be a patient name, an MRN, a subject ID, or any unique combination of those elements along with gender and dob
- The sha256 hash of the value is computed and the result is encoded into base32
- If the first three characters are not alphabetic, the value is rehashed until they are (for pseudonym generation)
- By default only the 64-bit prefix is used and any padding symbols are stripped.
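The steps above can be sketched roughly as follows. The function name mint_guid_sketch, the prefix_bytes parameter, and the choice to rehash the raw digest are assumptions made for illustration; the actual mint may order or implement these steps differently, so the output is not expected to match the sample GUIDs shown elsewhere in this document.

```python
import base64
import hashlib


def mint_guid_sketch(value: str, prefix_bytes: int = 8) -> str:
    """Hypothetical helper: sha256 -> base32, rehashing until the first
    three characters are alphabetic, keeping a 64-bit prefix by default."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    while True:
        # Encode only the prefix and strip the base32 '=' padding.
        candidate = base64.b32encode(digest[:prefix_bytes]).decode("ascii").rstrip("=")
        if candidate[:3].isalpha():
            return candidate
        # Assumption: "rehashing" means hashing the previous digest again.
        digest = hashlib.sha256(digest).digest()
```

Because every step is deterministic, the same value always yields the same GUID, and the alphabetic first three characters leave room for deriving name initials.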
It is often useful to replace the subject name with something more natural than a GUID.
Any string beginning with at least 3 (capitalized) alphabetic characters can be used to reproducibly generate a "John Doe" style placeholder name in DICOM patient name format (last^first^middle). This is very useful for alphabetizing subject name lists similarly to their ID while still allowing anonymized data sets to be referenced by memorable names.
Generation method:
- A guid parameter is required and gender (M, F, U) is optional (defaults to U)
- Using the guid as a random seed, a gender-appropriate first name and a gender-neutral family name are selected uniformly from name lists taken from the US census
- The result is returned in DICOM patient name format.
$ curl "localhost:5000/guid/pseudonym/pseudo_id?value=MERCK^DEREK^L&gender=M"
{"dob": "1956-02-03", "gender": "M", "guid": "MLSUJGK22EKMCMBX", "name": "MEMS^LIONEL^S"}
$ curl "localhost:5000/guid/pseudonym/pseudo_id?value=MERCK^DEREK^L&gender=F"
{"dob": "1961-03-20", "gender": "F", "guid": "IRF4WKGJGW36GQKJ", "name": "IACOPINO^RANDA^F"}
Note that each (value, gender, dob) tuple will result in a unique ID!
The default name map can be easily replaced to match your fancy (Shakespearean names, astronauts, children's book authors). And with slight modification, a DICOM patient name with up to 5 elements could be generated (i.e., in last^first^middle^prefix^suffix format).
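A minimal sketch of pseudonym generation with a replaceable name map is shown below. In the sample responses above, the initials of each name match the first three characters of its guid (MLS... gives MEMS^LIONEL^S), so the sketch assumes the guid selects the initials and seeds the choice among matching names. The function pseudonym_sketch and the miniature name lists are hypothetical stand-ins for the real census-derived map, and treating U as "either gendered list" is also an assumption.

```python
import random

# Hypothetical miniature name map, for illustration only. The real mint
# draws from much larger US census lists.
LAST_NAMES = ["MEMS", "MOORE", "IACOPINO", "IVES"]
FIRST_NAMES = {
    "M": ["LIONEL", "LEON", "RALPH", "RUFUS"],
    "F": ["LAURA", "LENA", "RANDA", "RUTH"],
}


def pseudonym_sketch(guid: str, gender: str = "U") -> str:
    """Hypothetical helper: map a guid to a last^first^middle placeholder."""
    if len(guid) < 3 or not guid[:3].isalpha():
        raise ValueError("guid must begin with three alphabetic characters")
    # Assumption: 'U' draws from both gendered first-name lists.
    firsts = FIRST_NAMES.get(gender, FIRST_NAMES["M"] + FIRST_NAMES["F"])
    rng = random.Random(guid)  # seeding with the guid makes the choice reproducible
    # With this tiny map, initials that have no matching name raise IndexError.
    last = rng.choice([n for n in LAST_NAMES if n[0] == guid[0]])
    first = rng.choice([n for n in firsts if n[0] == guid[1]])
    return f"{last}^{first}^{guid[2]}"
```

Swapping in a different name map only requires replacing the two lists; the seeding and formatting logic stays the same.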
As with pseudonyms, it can be useful to maintain a valid date-of-birth (dob) in de-identified metadata. Using a GUID as a seed, any dob can be mapped to a random nearby date for a nearly-age-preserving anonymization strategy. This is useful for keeping an approximate patient age available in a data browser.
Generation method:
- A dob parameter in %Y-%m-%d format and a guid parameter are required
- Using the guid as a random seed, a random integer between -165 and +165 is selected
- The original dob + the random delta in days is returned
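The date shift described above can be sketched in a few lines. The function name pseudo_dob_sketch and the max_days parameter are hypothetical, and the sketch assumes the bounds are inclusive; the actual mint may seed its random generator differently, so the shifted dates will not necessarily match the sample responses.

```python
import random
from datetime import datetime, timedelta


def pseudo_dob_sketch(dob: str, guid: str, max_days: int = 165) -> str:
    """Hypothetical helper: shift a %Y-%m-%d dob by a guid-seeded offset."""
    original = datetime.strptime(dob, "%Y-%m-%d").date()
    rng = random.Random(guid)  # same guid, same offset
    # Assumption: the range -165..+165 is inclusive at both ends.
    delta = rng.randint(-max_days, max_days)
    return (original + timedelta(days=delta)).isoformat()
```

Since the offset is at most about five and a half months in either direction, the subject's age in years changes by at most one.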
A pseudo-id is merely an alias for generating a GUID, pseudonym, and pseudo-dob from a subject name/id/mrn, gender, and dob.
Generation method:
- An initial value parameter is required; either a dob parameter in %Y-%m-%d format or an age parameter is optional (age defaults to a uniform random value between 19 and 65); and a gender parameter (M, F, U) is optional (defaults to U)
- If age is given, it is converted to a dob estimate using dob = now() - 365.25 * age (in days)
- A guid is computed using the concatenation value|dob|gender as a seed (thus, the guid is not the same as the guid hash of only the initial value)
- A pseudonym and pseudodob are computed as above
- The guid and the new name and dob are returned
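The age-to-dob conversion and the seed concatenation can be sketched as below. The function name pseudo_id_seed_sketch and its today parameter (added so the result is reproducible in an example) are hypothetical. The sketch deliberately leaves out the random default age and requires either dob or age.

```python
from datetime import date, timedelta
from typing import Optional


def pseudo_id_seed_sketch(
    value: str,
    dob: Optional[str] = None,
    age: Optional[float] = None,
    gender: str = "U",
    today: Optional[date] = None,
) -> str:
    """Hypothetical helper: build the value|dob|gender seed string."""
    if dob is None:
        if age is None:
            # Simplification: the random default age (19 to 65) is omitted here.
            raise ValueError("this sketch needs either dob or age")
        today = today or date.today()
        # dob = now() - 365.25 * age, with the product taken in days
        dob = (today - timedelta(days=365.25 * age)).isoformat()
    return "|".join([value, dob, gender])
```

For instance, pseudo_id_seed_sketch("MERCK^DEREK^L", age=40, gender="M", today=date(2018, 1, 1)) yields "MERCK^DEREK^L|1978-01-01|M". Because gender and dob are part of the seed, changing either one produces a different guid for the same value.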
- Inspired in part by the NDAR and FITBIR GUID schema.
- Placeholder names inspired by the Docker names generator