Guide to perform PII Scrubbing using SigNoz
Overview
PII scrubbing removes sensitive personal information from data to protect privacy and comply with regulations like GDPR and CCPA. SigNoz simplifies this process for your applications using OpenTelemetry. This guide explains how to implement PII scrubbing in your data pipeline before sending information to SigNoz.
Prerequisite
- SigNoz Cloud account
- Logs in log file
Send application logs to SigNoz
You need to configure OpenTelemetry Collector to send your logs in log files to SigNoz, checkout this documentation for detailed instructions on how to do this.
Process of Scrubbing PII Data
OpenTelemetry Collector offers a powerful transform processor to filter and modify sensitive data before it reaches SigNoz. We'll use this processor to perform PII scrubbing on our logs.
Sample log
Let's use the following sample log line to demonstrate the PII scrubbing process:
{"timestamp": "2024-07-05T12:34:56Z", "severity": "INFO", "message": "User john.doe@example.com (SSN: 123-45-6789) made a purchase with credit card 4111-1111-1111-1111", "user_id": "12345", "purchase_amount": 99.99}
This log contains several pieces of PII that we want to scrub: an email address, a social security number (SSN), and a credit card number inside the log body.
Configuring the Transform Processor
To scrub this PII, we'll use the transform processor in our OpenTelemetry Collector configuration. Trasform processor makes use of Regular Expressions for scrubbing your sensitive data. Here's how we can set it up:
processors:
...
transform:
log_statements:
- context: log
statements:
# Scrub email address
- replace_pattern(body, "(\\w+)@(\\w+)\\.(\\w+)", "****@$2.$3")
# Scrub SSN
- replace_pattern(body, "\\d{3}-\\d{2}-\\d{4}", "***-**-****")
# Scrub credit card number
- replace_pattern(body, "\\d{4}-\\d{4}-\\d{4}-\\d{4}", "****-****-****-****")
Let's break down what each statement does:
- The first replace_pattern statement looks for email addresses and replaces the username part with asterisks.
- The second statement finds patterns matching SSNs (###-##-####) and replaces them entirely with asterisks.
- The third statement identifies credit card number patterns and replaces all digits with asterisks.
Applying the Processor
To apply this processor to your logs pipeline, add it to the processors list in your pipeline configuration:
service:
pipelines:
...
logs:
receivers: [otlp, filelog/app]
processors: [batch, transform]
exporters: [otlp]
Result
After applying this configuration and restarting your OpenTelemetry Collector, the sample log line would be transformed to:
{
"timestamp": "2024-07-05T12:34:56Z",
"severity": "INFO",
"message": "User ****@. (SSN: ***-**-****) made a purchase with credit card ****-****-****-****",
"user_id": "12345",
"purchase_amount": 99.99
}
As you can see, the email address, SSN, and credit card number have been scrubbed, protecting this sensitive information before it reaches SigNoz. Remember to adjust the regular expressions in the replace_pattern
statements as needed to match the specific formats of PII in your logs. You can add more statements to handle additional types of sensitive data that may appear in your logs.
To know more about the capabilities of trasform processsor, checkout this documentation.
Get Help
If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.
If you are a SigNoz Cloud user, please use in product chat support located at the bottom right corner of your SigNoz instance or contact us at cloud-support@signoz.io.