Guide to perform PII Scrubbing using SigNoz

Overview

PII scrubbing removes sensitive personal information from data to protect privacy and comply with regulations like GDPR and CCPA. SigNoz simplifies this process for your applications using OpenTelemetry. This guide explains how to implement PII scrubbing in your data pipeline before sending information to SigNoz.

Prerequisite

Send application logs to SigNoz

You need to configure OpenTelemetry Collector to send your logs in log files to SigNoz, checkout this documentation for detailed instructions on how to do this.

Process of Scrubbing PII Data

OpenTelemetry Collector offers a powerful transform processor to filter and modify sensitive data before it reaches SigNoz. We'll use this processor to perform PII scrubbing on our logs.

Sample log

Let's use the following sample log line to demonstrate the PII scrubbing process:

{"timestamp": "2024-07-05T12:34:56Z", "severity": "INFO", "message": "User john.doe@example.com (SSN: 123-45-6789) made a purchase with credit card 4111-1111-1111-1111", "user_id": "12345", "purchase_amount": 99.99}

This log contains several pieces of PII that we want to scrub: an email address, a social security number (SSN), and a credit card number inside the log body.

Configuring the Transform Processor

To scrub this PII, we'll use the transform processor in our OpenTelemetry Collector configuration. Trasform processor makes use of Regular Expressions for scrubbing your sensitive data. Here's how we can set it up:

processors:
  ...
  transform:
    log_statements:
      - context: log
        statements:
          # Scrub email address
          - replace_pattern(body, "(\\w+)@(\\w+)\\.(\\w+)", "****@$2.$3")
          # Scrub SSN
          - replace_pattern(body, "\\d{3}-\\d{2}-\\d{4}", "***-**-****")
          # Scrub credit card number
          - replace_pattern(body, "\\d{4}-\\d{4}-\\d{4}-\\d{4}", "****-****-****-****")

Let's break down what each statement does:

  • The first replace_pattern statement looks for email addresses and replaces the username part with asterisks.
  • The second statement finds patterns matching SSNs (###-##-####) and replaces them entirely with asterisks.
  • The third statement identifies credit card number patterns and replaces all digits with asterisks.

Applying the Processor

To apply this processor to your logs pipeline, add it to the processors list in your pipeline configuration:

service:
  pipelines:
    ...
    logs:
      receivers: [otlp, filelog/app]
      processors: [batch, transform]
      exporters: [otlp]

Result

After applying this configuration and restarting your OpenTelemetry Collector, the sample log line would be transformed to:

{
  "timestamp": "2024-07-05T12:34:56Z",
  "severity": "INFO",
  "message": "User ****@. (SSN: ***-**-****) made a purchase with credit card ****-****-****-****",
  "user_id": "12345",
  "purchase_amount": 99.99
}
Sample log output with PII data scrubbed out
Sample log output with PII data scrubbed out

As you can see, the email address, SSN, and credit card number have been scrubbed, protecting this sensitive information before it reaches SigNoz. Remember to adjust the regular expressions in the replace_pattern statements as needed to match the specific formats of PII in your logs. You can add more statements to handle additional types of sensitive data that may appear in your logs.

To know more about the capabilities of trasform processsor, checkout this documentation.

Get Help

If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.

If you are a SigNoz Cloud user, please use in product chat support located at the bottom right corner of your SigNoz instance or contact us at cloud-support@signoz.io.

Was this page helpful?