

Google Cloud’s Data Loss Prevention (DLP) API helps organizations detect, classify, and redact sensitive information, such as personally identifiable information (PII) and credit card numbers, in applications running on Google Cloud Platform (GCP). This post shows how to integrate the DLP API into your application.
The DLP API lets you scan text, images, and data stored in GCP repositories for privacy-sensitive content. It provides over 120 built-in detectors (infoTypes), along with custom detectors defined via regular expressions, dictionaries, or custom rules. After detection, you can redact, mask, tokenize, or otherwise transform the sensitive content.
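As a rough illustration of the customization options, the sketch below builds an inspection config that combines one built-in detector with a regex-based and a dictionary-based custom detector. The infoType names `EMPLOYEE_ID` and `PROJECT_CODENAME`, the pattern, the word list, and the helper `build_custom_inspect_config` are all hypothetical; the resulting dict is meant to be passed as `inspect_config` to the client library's `inspect_content` call.

```python
def build_custom_inspect_config(id_pattern, codenames):
    """Return an inspect_config dict mixing built-in and custom detectors."""
    return {
        # Built-in detector shipped with the DLP API.
        "info_types": [{"name": "EMAIL_ADDRESS"}],
        "custom_info_types": [
            {
                # Hypothetical regex detector for an internal ID format.
                "info_type": {"name": "EMPLOYEE_ID"},
                "regex": {"pattern": id_pattern},
            },
            {
                # Hypothetical dictionary detector for a fixed word list.
                "info_type": {"name": "PROJECT_CODENAME"},
                "dictionary": {"word_list": {"words": list(codenames)}},
            },
        ],
        # Ask the API to return the matched text with each finding.
        "include_quote": True,
    }


config = build_custom_inspect_config(r"EMP-\d{6}", ["bluebird", "nightowl"])
```

Keeping the config in a plain function makes it easy to review and version alongside the application code.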
- Detect and redact PII in user data before storage.
- Mask credit card numbers in logs or results.
- Tokenize health records for compliance.
- Secure data moving into BigQuery, GCS, or other GCP services.
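For the log-masking use case, one possible approach is a character mask that hides all but the last few digits of a card number. This is a minimal sketch that assumes 16-digit card numbers; the helper name `build_card_mask_transformation` and the choice of `#` as the masking character are illustrative, not prescribed by the API.

```python
def build_card_mask_transformation(visible_digits=4):
    """Build a transformation that masks the leading digits of card numbers."""
    return {
        "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
        "primitive_transformation": {
            "character_mask_config": {
                "masking_character": "#",
                # Assumption: 16-digit cards, so the rest stays visible.
                "number_to_mask": 16 - visible_digits,
                # Leave hyphens in place instead of masking them.
                "characters_to_ignore": [{"characters_to_skip": "-"}],
            }
        },
    }


# Shape expected by the deidentify_config field of deidentify_content.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [build_card_mask_transformation()]
    }
}
```

Passed as `deidentify_config` to `deidentify_content`, this is intended to turn a value such as `4111-1111-1111-1111` into a masked form that keeps only the last four digits readable.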
Activate the DLP API from the GCP Console Marketplace or via gcloud CLI:
gcloud services enable dlp.googleapis.com
You can authenticate using API keys or service accounts:
- API Keys: Suitable for some endpoints, especially for apps not tied to a specific user.
- Service Account: Preferred for server-to-server processing.
Create a service account and grant it the roles/dlp.user role. Generate and download its JSON key file.
Set the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
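Before making any API calls, it can help to confirm that the variable points at a usable key file. The following is a small sketch of such a check; the function name `check_service_account_key` is hypothetical, and it only inspects the `type` and `client_email` fields that service account key files contain.

```python
import json
import os


def check_service_account_key(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key's client_email, or raise RuntimeError with a clear reason."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Service account key files declare their type explicitly.
    if key.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key file")
    return key.get("client_email")
```

Calling `check_service_account_key()` at startup surfaces a misconfigured path early, rather than on the first DLP request.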
DLP API provides client libraries for multiple languages. Example for Python:
pip install google-cloud-dlp
Here’s a simple Python example to detect and redact PII in text:
from google.cloud import dlp_v2

# Create the DLP client (uses GOOGLE_APPLICATION_CREDENTIALS)
dlp = dlp_v2.DlpServiceClient()
project = "your-gcp-project-id"
parent = f"projects/{project}"
item = {"value": "User name: John Doe, Email: john@example.com, Card: 4111-1111-1111-1111"}
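inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
    "include_quote": True,
}
response = dlp.inspect_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "item": item,
    }
)
# Print each finding's infoType and the matched text
for finding in response.result.findings:
    print(finding.info_type.name, finding.quote)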
# Redact sensitive info using DLP API: replace each finding with a fixed placeholder
redact_config = [
    {
        "info_types": [{"name": "EMAIL_ADDRESS"}],
        "primitive_transformation": {
            "replace_config": {"new_value": {"string_value": "[EMAIL]"}}
        },
    },
    {
        "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
        "primitive_transformation": {
            "replace_config": {"new_value": {"string_value": "[CARD]"}}
        },
    },
]
redacted_response = dlp.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": {
            "info_type_transformations": {"transformations": redact_config}
        },
        "item": item,
        "inspect_config": inspect_config,
    }
)
print(redacted_response.item.value)
This code detects sensitive information in item["value"] and redacts it, so the text can be cleaned before it is stored.
You can also invoke the DLP API from other GCP workflows:
- Batch Processing: Use DLP with Dataflow or Dataproc for batch jobs on files in Cloud Storage.
- Real-Time Processing: Trigger DLP inspections on streaming data using Cloud Functions or Pub/Sub.
- Scheduled Scanning: Set up triggers/cron jobs for regular scans of BigQuery tables or GCS buckets.
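To make the scheduled-scanning option more concrete, here is a sketch of a job trigger that rescans a Cloud Storage bucket on a fixed period. The helper names `build_gcs_scan_trigger` and `create_scan_trigger`, the bucket name, and the display name are hypothetical. The second function assumes the `google-cloud-dlp` package is installed and credentials are configured, and it is not called here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def build_gcs_scan_trigger(bucket, info_type_names, period_days=1):
    """Build a job trigger dict that rescans a bucket every period_days days."""
    inspect_job = {
        "storage_config": {
            # Non-recursive scan of the objects at the top level of the bucket.
            "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/*"}}
        },
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names]
        },
    }
    schedule = {
        "recurrence_period_duration": {"seconds": period_days * SECONDS_PER_DAY}
    }
    return {
        "inspect_job": inspect_job,
        "display_name": f"Scheduled scan of {bucket}",
        "triggers": [{"schedule": schedule}],
    }


def create_scan_trigger(project_id, job_trigger):
    """Register the trigger with the DLP API (requires network and credentials)."""
    from google.cloud import dlp_v2  # imported lazily so the builder stays offline

    client = dlp_v2.DlpServiceClient()
    # Mark the trigger as active when it is created.
    job_trigger = dict(job_trigger, status=dlp_v2.JobTrigger.Status.HEALTHY)
    return client.create_job_trigger(
        request={"parent": f"projects/{project_id}", "job_trigger": job_trigger}
    )


trigger = build_gcs_scan_trigger("my-upload-bucket", ["EMAIL_ADDRESS"], period_days=7)
```

Separating the pure config builder from the API call keeps the schedule and scope reviewable without touching the network.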
- Store API keys and credentials securely.
- Use job triggers and batch processing for large datasets.
- Export classification and inspection results for compliance audits.
- Tune detectors and masking rules to your organization’s needs.
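One way to keep inspection results for audits is to attach a save-findings action to an inspection job so that findings are written to a BigQuery table. The sketch below only builds the configuration; the project, dataset, table, and bucket names are placeholders, and the helper `build_save_findings_action` is hypothetical.

```python
def build_save_findings_action(project_id, dataset_id, table_id):
    """Build an action that writes job findings to a BigQuery table."""
    return {
        "save_findings": {
            "output_config": {
                "table": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                }
            }
        }
    }


# Placeholder names; replace with your own project, dataset, table, and bucket.
inspect_job = {
    "storage_config": {
        "cloud_storage_options": {"file_set": {"url": "gs://my-upload-bucket/*"}}
    },
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    "actions": [build_save_findings_action("my-project", "dlp_audit", "findings")],
}
# The dict can then be submitted with the client's create_dlp_job method:
# dlp.create_dlp_job(request={"parent": parent, "inspect_job": inspect_job})
```

Writing findings to a table gives auditors a queryable record without re-running the scan.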
The Google Cloud DLP API brings sensitive data protection and compliance support to apps running on GCP. Whether you’re scanning user-submitted files or logs, the API lets you detect, redact, or tokenize sensitive data before exposure or storage. Scale with batch jobs, integrate in real time, and maintain privacy with advanced detection capabilities.
Resources: GCP Documentation of DLP
Source Credit: https://medium.com/google-cloud/implementing-google-cloud-dlp-api-in-your-gcp-application-f84fe6cb8ad7?source=rss—-e52cf94d98af—4