[BUG]Incorrect Behavior of Obfuscate Processor with Predefined Pattern "%{CREDIT_CARD_NUMBER}" #4340

anudasari20 · 2024-03-26T22:56:03Z

Describe the bug
The issue arises when utilizing the predefined pattern "%{CREDIT_CARD_NUMBER}" with the obfuscate processor in the OSI pipeline. The expected behavior is for the processor to exclusively mask credit card information within logs while leaving non-personally identifiable information (non-PII) fields untouched. However, in our current environment, we have observed that the obfuscate processor is erroneously masking non-PII fields such as trackingId and sdsStayGuid. This unintended behavior complicates troubleshooting efforts for application teams as critical data points become obscured.

Attaching some screenshots where the data has been masked:

Expected behavior
When employing the patterns configuration option, users expect seamless integration with a predefined set of obfuscation patterns for common fields. Specifically, the obfuscate processor should seamlessly implement the predefined pattern "%{CREDIT_CARD_NUMBER}" without encountering errors. It is imperative that this processor selectively masks only credit card values within logs, while abstaining from obscuring any other field values that may resemble credit card patterns.

The trackingId values should not be masked, as shown in this screenshot:

Resolution:
To rectify this issue, the implementation of the obfuscate processor requires refinement. The processor should be updated to accurately discern and mask solely credit card numbers within logs, adhering strictly to the predefined "%{CREDIT_CARD_NUMBER}" pattern. This necessitates a thorough review and potential adjustment of the pattern matching algorithm employed by the processor. Furthermore, comprehensive testing is essential to validate the updated processor's efficacy across diverse log scenarios, ensuring that it effectively safeguards credit card information while preserving the integrity of non-PII fields.

Steps to Reproduce:

1. Configure the obfuscate processor within the OSI pipeline, utilizing the predefined pattern "%{CREDIT_CARD_NUMBER}".
2. Analyze logs containing a mixture of credit card numbers and non-PII fields.
3. Observe whether non-PII fields are erroneously masked alongside credit card numbers, impeding the troubleshooting process for application teams.

Example configuration

- obfuscate:
        source: 'data'
        patterns:
          - '%{CREDIT_CARD_NUMBER}'
        action:
          mask:
            mask_character: "&"
            mask_character_length: 10

Environment (please complete the following information):

OS: Amazon EC2 - Linux/UNIX
Version : AML 2.0

dlvenable · 2024-04-02T19:37:56Z

Pattern:

data-prepper/data-prepper-plugins/obfuscate-processor/src/main/java/org/opensearch/dataprepper/plugins/processor/obfuscation/CommonPattern.java

Line 12 in b7c63bc

CREDIT_CARD_NUMBER("(\\d[ -]*?){13,16}"),
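
To see why identifiers that are not card numbers can be masked, here is a minimal Java sketch that applies the pattern quoted above to a made-up hyphenated ID. The class name, the `mask` helper, and the sample ID are assumptions for illustration only. The fixed-length replacement imitates the `mask` action from the example configuration and is not the processor's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CreditCardPatternDemo {
    // The predefined pattern as quoted from CommonPattern.java in this thread.
    static final Pattern CREDIT_CARD_NUMBER = Pattern.compile("(\\d[ -]*?){13,16}");

    // Hypothetical helper: replace every match with a fixed-length mask.
    static String mask(String input, String maskCharacter, int maskLength) {
        String replacement = Matcher.quoteReplacement(maskCharacter.repeat(maskLength));
        return CREDIT_CARD_NUMBER.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // A genuine 16-digit test number is masked, as intended.
        System.out.println(mask("4111111111111111", "&", 10));        // &&&&&&&&&&

        // A hypothetical hyphenated ID is masked too: "[ -]*?" lets the match
        // run across hyphens, so the first 16 digits count as one "card number".
        System.out.println(mask("20240326-1234-5678-9012", "&", 10)); // &&&&&&&&&&-9012
    }
}
```

Because the pattern only counts digits and tolerates spaces or hyphens between them, any field with 13 or more digits in such a layout is a candidate match, whether or not it is a card number.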

Utkarsh-Aga · 2024-04-08T06:21:12Z

Hello @dlvenable,
Just wanted to check: would modifying the current pattern "(\\d[ -]*?){13,16}" to "\\b(?:\\d[ -]*?){13,16}\\b" help in this particular scenario?

Utkarsh-Aga · 2024-04-16T09:56:53Z

I tested the scenario on my end and observed the following:

Using Pattern - `(\\d[ -]*?){13,16}`

| Input Data | Output Data |
| --- | --- |
| fd55555069-e7a9-11ee4111111111111111 | fd55555069-e7a9-11ee########## |
| 4111111111111111 | ########## |
| fd55555069-e7a9-11ee-91 | fd55555069-e7a9-11ee-91 |

Using Pattern - `\\b(?:\\d[ -]*?){13,16}\\b`

| Input Data | Output Data |
| --- | --- |
| fd55555069-e7a9-11ee4111111111111111 | fd55555069-e7a9-11ee4111111111111111 |
| 4111111111111111 | ########## |
| fd55555069-e7a9-11ee-91 | fd55555069-e7a9-11ee-91 |

So, based on the above, I feel that we can update the CREDIT_CARD_NUMBER pattern from `(\\d[ -]*?){13,16}` to `\\b(?:\\d[ -]*?){13,16}\\b`.
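
The comparison above can be reproduced with plain `java.util.regex`, outside Data Prepper. This is only a sketch: the class and method names are made up, and the ten-character `#` mask is assumed from the outputs shown in the tables rather than taken from the processor's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ObfuscationPatternComparison {
    static final String CURRENT_PATTERN = "(\\d[ -]*?){13,16}";
    static final String PROPOSED_PATTERN = "\\b(?:\\d[ -]*?){13,16}\\b";

    // Hypothetical helper: replace each match of the regex with ten '#' characters.
    static String mask(String regex, String input) {
        String replacement = Matcher.quoteReplacement("#".repeat(10));
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "fd55555069-e7a9-11ee4111111111111111",
            "4111111111111111",
            "fd55555069-e7a9-11ee-91"
        };
        for (String input : inputs) {
            // Print the input followed by the result under each pattern.
            System.out.println(input
                + "\t" + mask(CURRENT_PATTERN, input)
                + "\t" + mask(PROPOSED_PATTERN, input));
        }
    }
}
```

The only row that differs is the first one: with `\b`, the 16 digits glued to `11ee` no longer start at a word boundary, so they are left alone.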

@dlvenable - Any comments on this?

dlvenable · 2024-04-24T16:21:44Z

@Utkarsh-Aga, thank you for looking into this.

It seems the root of your solution is to add the word boundary (\b). But what if the card number is concatenated with other text?

e.g.

visa4111111111111111

or

creditcard4111111111111111

I believe this would not match.

One option would be to add a configuration in the obfuscate processor itself to allow for word boundaries (e.g. single_word_only). Then any pattern could have this setting.

- obfuscate:
        source: "log"
        target: "new_log"
        single_word_only: true
        patterns:
          - '%{CREDIT_CARD_NUMBER}'
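
A rough Java sketch of the trade-off behind this option follows. It assumes, purely for illustration, that `single_word_only: true` behaves like wrapping the configured pattern in `\b(?:` ... `)\b`; the real implementation may differ, and the class, method names, and ten-character `#` mask are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SingleWordOnlySketch {
    // The predefined pattern as quoted earlier in this thread.
    static final String CREDIT_CARD_NUMBER = "(\\d[ -]*?){13,16}";

    // Assumption: the option wraps any pattern in word boundaries when enabled.
    static Pattern compile(String regex, boolean singleWordOnly) {
        return Pattern.compile(singleWordOnly ? "\\b(?:" + regex + ")\\b" : regex);
    }

    // Hypothetical helper: mask every match with ten '#' characters.
    static String obfuscate(String input, boolean singleWordOnly) {
        String replacement = Matcher.quoteReplacement("#".repeat(10));
        return compile(CREDIT_CARD_NUMBER, singleWordOnly).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Concatenated card number: masked only when word boundaries are NOT required.
        System.out.println(obfuscate("visa4111111111111111", false)); // visa##########
        System.out.println(obfuscate("visa4111111111111111", true));  // visa4111111111111111

        // Standalone card number: masked in both modes.
        System.out.println(obfuscate("card 4111 1111 1111 1111 ok", true)); // card ########## ok
    }
}
```

Keeping the switch at the processor level rather than inside the pattern lets each pipeline choose: stricter matching to protect IDs, or looser matching to catch concatenated card numbers.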

dlvenable · 2024-05-14T00:21:23Z

The solution for this will be to use single_word_only: true starting in Data Prepper 2.8.

dlvenable · 2024-05-16T16:37:31Z

We are backporting this to 2.8 to include in that release.

dlvenable · 2024-05-16T16:37:44Z

#4550

anudasari20 added bug Something isn't working untriaged labels Mar 26, 2024

dlvenable added this to the v2.7.1 milestone Apr 2, 2024

dlvenable removed the untriaged label Apr 2, 2024

dlvenable modified the milestones: v2.7.1, v2.8 Apr 2, 2024

dlvenable removed this from the v2.8 milestone Apr 16, 2024

Utkarsh-Aga mentioned this issue Apr 30, 2024

Adding 'single_word_only' option to obfuscate processor #4476

Merged

4 tasks

dlvenable added this to the v2.8 milestone May 14, 2024

dlvenable closed this as completed May 14, 2024

dlvenable modified the milestones: v2.8, v2.9 May 15, 2024

dlvenable mentioned this issue May 16, 2024

Release Notes for version 2.8 #4538

Merged

3 tasks
