Format Preserving Encryption Masking

With the ff1 mask, it is possible to mask data by encryption while preserving its original format.

FF1 is a format-preserving block cipher algorithm recommended by NIST [1].

As of March 2021, FF1 is the only suitable FPE algorithm [2].

The motivation for an FPE mask in PIMO is to meet the requirement of re-identifying the original data (pseudonymisation) as defined by the GDPR in Europe [3].

Encryption example

Consider the following stream of objects to mask:

data.jsonl

{"siret": "01234567891234"}
{"siret": "12345678912340"}
{"siret": "23456789123401"}

The siret column is always a 14-digit string. This can be masked by FPE with the following configuration.

masking.yml

version: "1"
masking:
  - selector:
      jsonpath: "siret"
    mask:
      # use of the FF1 mask
      ff1:
        # radix 10 specifies that only the 10 digits are used in the output format
        radix: 10
        # name of the environment variable containing the base64-encoded secret key (note: key length must be 128, 192, or 256 bits)
        keyFromEnv: "FF1_ENCRYPTION_KEY"

Here is the result of applying the above configuration.


NOTE

All command lines are listed in demo.sh.


$ # we first need to set the secret key to use with the proper variable name and encoding
$ export FF1_ENCRYPTION_KEY=$(echo -n "secret12secret12" | base64)

$ cat data.jsonl | pimo
{"siret":"96415668837614"}
{"siret":"94015424363597"}
{"siret":"31043158804356"}
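
The demo key above is a 16-character ASCII string, which decodes to 128 bits. Outside of a demo, a random key is usually preferable. The following Python sketch is not part of PIMO, and the helper names `generate_key` and `key_bits` are hypothetical. It only illustrates how a key of a valid size could be generated and base64-encoded for the variable named by keyFromEnv.

```python
import base64
import secrets

# Valid key sizes in bytes, i.e. 128, 192 or 256 bits.
VALID_KEY_SIZES = (16, 24, 32)


def generate_key(size: int = 32) -> str:
    """Return a random key of `size` bytes, base64-encoded (hypothetical helper)."""
    if size not in VALID_KEY_SIZES:
        raise ValueError(f"key size must be one of {VALID_KEY_SIZES} bytes, got {size}")
    return base64.b64encode(secrets.token_bytes(size)).decode("ascii")


def key_bits(encoded: str) -> int:
    """Return the length in bits of a base64-encoded key."""
    return len(base64.b64decode(encoded, validate=True)) * 8


if __name__ == "__main__":
    # Print a fresh 256-bit key, ready to be exported as an environment variable.
    print(generate_key())
```

The printed value could then be exported as FF1_ENCRYPTION_KEY in place of the demo key. The same key must be kept for decryption.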

Radix parameter explanation

FF1 uses a fixed domain definition (list of all allowed characters in an output encrypted string).

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

The radix parameter determines which part of the domain definition is actually used. For example, a radix of 10 produces values containing only digits (the first 10 characters of the full domain definition).

Therefore, the value of radix must be less than or equal to 62. Values of 0 and 1 are also invalid.
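
The relationship between the radix and the allowed characters can be sketched in a few lines of Python. This is an illustration of the rule described above, not PIMO's implementation; the names `alphabet` and `fits` are hypothetical.

```python
# Full domain definition, as listed above (62 characters).
DOMAIN = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def alphabet(radix: int) -> str:
    """Return the characters usable with the given radix (hypothetical helper)."""
    if not 2 <= radix <= len(DOMAIN):
        raise ValueError(f"radix must be between 2 and {len(DOMAIN)}, got {radix}")
    return DOMAIN[:radix]


def fits(value: str, radix: int) -> bool:
    """Tell whether every character of `value` belongs to the radix alphabet."""
    allowed = set(alphabet(radix))
    return all(char in allowed for char in value)


print(alphabet(10))                 # prints 0123456789
print(alphabet(16))                 # prints 0123456789abcdef
print(fits("01234567891234", 10))   # prints True
print(fits("12ab", 10))             # prints False
```

A check like `fits` can help choose the smallest radix that covers the input data before writing the mask configuration.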

Decryption example

To re-identify the original data, use the same mask as in the encryption example, but enable the decrypt option:

masking-decrypt.yml

version: "1"
masking:
  - selector:
      jsonpath: "siret"
    mask:
      # use of the FF1 mask
      ff1:
        # important: use the same radix parameter as for encryption (values will be decrypted incorrectly if 11 is used for example)
        radix: 10
        # use the same secret key as for encryption
        keyFromEnv: "FF1_ENCRYPTION_KEY"
        # activate decryption
        decrypt: true

Here is the result of applying the above configuration to the encrypted stream.


NOTE

All command lines are listed in demo.sh.


$ # the same key is re-used for encryption and decryption
$ export FF1_ENCRYPTION_KEY=$(echo -n "secret12secret12" | base64)

$ # the encrypted stream is generated by the same command line as before: cat data.jsonl | pimo
$ # the decryption is done by the second part: pimo -c masking-decrypt.yml
$ cat data.jsonl | pimo | pimo -c masking-decrypt.yml
{"siret":"01234567891234"}
{"siret":"12345678912340"}
{"siret":"23456789123401"}

Increase security with varying tweak parameter

The tweak is an optional parameter that reduces the attack surface by using a different value for each record. It can be seen as an extension of the secret key that changes with each record but does not necessarily have to be kept secret.

Note that each tweak is required to re-identify (decrypt) the data, and it must be possible to match each tweak to exactly the same record as in the encryption step.

Note also that, with a random tweak on each record, collisions can occur in the output stream: two different original values may be masked to the same value. This cannot happen if no tweak is used or if the tweak is constant. However, such collisions in masked data are not a problem for re-identification of the original data, since each record is decrypted with its own tweak.

The tweaks can already be present in the data or can be generated by PIMO, as in the following example:

masking-tweak.yml

version: "1"
masking:
  # add a tweakfield on each record of the jsonl stream
  - selector:
      jsonpath: "tweakfield"
    mask:
      add: ""
  # give the tweakfield an 8-character random value
  - selector:
      jsonpath: "tweakfield"
    mask:
      regex: "[a-zA-Z0-9]{8}"
  - selector:
      jsonpath: "siret"
    mask:
      ff1:
        radix: 10
        keyFromEnv: "FF1_ENCRYPTION_KEY"
        # FF1 will use the value of the tweakfield column as a tweak parameter
        tweakField: "tweakfield"

Here is the result of applying the above configuration to the original stream.


NOTE

All command lines are listed in demo.sh.


$ # the same key is re-used for encryption and decryption
$ export FF1_ENCRYPTION_KEY=$(echo -n "secret12secret12" | base64)

$ cat data.jsonl | pimo -c masking-tweak.yml
{"siret":"19309267052199","tweakfield":"gY6SpkUA"}
{"siret":"84107001872814","tweakfield":"l3rIYUkm"}
{"siret":"26786954568342","tweakfield":"P9k0XCRk"}
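
Since every tweak is needed for decryption, it can be worth checking that each masked record still carries its tweak before attempting to re-identify the data. The Python sketch below is one possible check and is not part of PIMO. The function name `records_missing_tweak` is hypothetical, and treating an empty string as a missing tweak is an assumption of this sketch.

```python
import json


def records_missing_tweak(lines, tweak_field="tweakfield"):
    """Return the 1-based positions of records with no usable tweak.

    Assumption of this sketch: an absent, non-string or empty value
    in `tweak_field` counts as a missing tweak.
    """
    missing = []
    for position, line in enumerate(lines, start=1):
        record = json.loads(line)
        tweak = record.get(tweak_field)
        if not isinstance(tweak, str) or tweak == "":
            missing.append(position)
    return missing


sample = [
    '{"siret":"19309267052199","tweakfield":"gY6SpkUA"}',
    '{"siret":"84107001872814"}',
]
print(records_missing_tweak(sample))  # prints [2]
```

The field name is passed as a parameter so that it can match whatever name is configured in tweakField.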

Note on security of FF1 algorithm

To be considered secure, the domain size of the cipher must be at least 1 000 000 [4].

The domain size is given by the following formula:

$$ds = \mathrm{radix}^{\mathrm{len}}$$

With:

  • radix: the radix chosen by configuration in the mask definition
  • len: (minimum) length of the data to encrypt

Applied to the current examples, the domain size is:

$$ds = 10^{14} = 100\,000\,000\,000\,000$$

This is far above the minimum and would be considered very secure.
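
The formula can also be used the other way round, to find the shortest input length that reaches the minimum domain size for a given radix. The following Python sketch illustrates this arithmetic; the names `domain_size` and `min_length` are hypothetical and not part of PIMO.

```python
# Minimum domain size stated above.
MIN_DOMAIN_SIZE = 1_000_000


def domain_size(radix: int, length: int) -> int:
    """Return ds = radix ** length."""
    return radix ** length


def min_length(radix: int) -> int:
    """Return the smallest length whose domain size reaches MIN_DOMAIN_SIZE."""
    if radix < 2:
        raise ValueError("radix must be at least 2")
    length = 1
    while domain_size(radix, length) < MIN_DOMAIN_SIZE:
        length += 1
    return length


print(domain_size(10, 14))  # prints 100000000000000
print(min_length(10))       # prints 6
print(min_length(62))       # prints 4
```

With a radix of 10, for instance, values shorter than 6 digits would fall below the minimum domain size.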

Footnotes

[1] “Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption”, Morris Dworkin, for NIST, March 2016.

[2] “Recent Cryptanalysis of FF3”, NIST Website, 12 April 2017.

[3] “Pseudonymization” in Wikipedia, Wikimedia Foundation, 26 January 2021.

[4] “Methods for Format-Preserving Encryption: NIST Requests Public Comments on Draft Special Publication 800-38G Revision 1”, NIST Website, 28 February 2019.