The BigQuery Connector for SAP SLT version 2.9 and later offers enhanced flexibility for replicating SAP data into BigQuery. While continuing to support the established legacy streaming method, this version introduces a powerful new option: leveraging Google Cloud Pub/Sub for Change Data Capture (CDC), designed to work seamlessly with BigQuery’s native CDC capabilities. This new feature streamlines the path to real-time analytics by minimizing the need for complex post-load data merging.
The Evolution of SAP to BigQuery Streaming
The BigQuery Connector for SAP SLT has traditionally used the legacy BigQuery streaming API. This method appends every change (insert, update, or delete) from SAP as a new row in the BigQuery table, producing a complete historical log of all transactions that is invaluable for audit trails and point-in-time analysis. To obtain a current-state view, users typically implement additional deduplication logic, often orchestrated with Cloud Composer and implemented with BigQuery DML statements such as MERGE, which consolidate the changes recorded for each key.
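The deduplication step can be sketched in a few lines of Python. This minimal example assumes a change log whose rows carry a primary key, an increasing sequence value, and an operation flag. These column names are hypothetical and are not the connector's actual schema. The example applies the same "latest change per key wins, deletes drop the key" rule that a MERGE or ROW_NUMBER()-based query would apply in BigQuery.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical change-log row: (primary_key, sequence, operation, payload).
# operation is "I" (insert), "U" (update) or "D" (delete).
ChangeRow = Tuple[str, int, str, Dict[str, Any]]


def current_state(log: Iterable[ChangeRow]) -> Dict[str, Dict[str, Any]]:
    """Collapse an append-only change log into one row per key."""
    latest: Dict[str, ChangeRow] = {}
    for row in log:
        key, seq = row[0], row[1]
        # Keep only the most recent change for each key, whatever the read order.
        if key not in latest or seq > latest[key][1]:
            latest[key] = row
    # Keys whose most recent change is a delete are absent from the current state.
    return {key: row[3] for key, row in latest.items() if row[2] != "D"}


audit_log: List[ChangeRow] = [
    ("MAT-1", 1, "I", {"qty": 10}),
    ("MAT-2", 2, "I", {"qty": 5}),
    ("MAT-1", 3, "U", {"qty": 12}),
    ("MAT-2", 4, "D", {}),
]
print(current_state(audit_log))  # {'MAT-1': {'qty': 12}}
```

Because the sketch compares sequence values rather than relying on arrival order, it returns the same result if the log is read in a different order. In BigQuery this consolidation has to be rerun, or maintained incrementally, as new rows stream in. That recurring overhead is what the Pub/Sub CDC mode aims to remove.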
What’s New in Version 2.9: Real-Time CDC with Pub/Sub
Version 2.9 introduces an alternative, event-driven architecture using Pub/Sub, available alongside the legacy streaming option:
- SAP SLT Captures Changes: SAP LT Replication Server (SLT) captures inserts, updates, and deletes from the source SAP tables.
- Publish to Pub/Sub: When configured for this new mode, the connector transforms the changes into Avro-compliant JSON messages that carry the key CDC fields _CHANGE_TYPE (UPSERT or DELETE) and _CHANGE_SEQUENCE_NUMBER (used for ordering), and publishes them to a Google Cloud Pub/Sub topic.
- BigQuery Subscription: A BigQuery subscription attached to the topic writes the messages directly into the target BigQuery table, so no separate pull or push consumer has to be operated.
- Native BigQuery CDC: This subscription uses the BigQuery Storage Write API, which natively processes the CDC metadata to apply changes directly to the target BigQuery table as true upsert and delete operations. A primary key on the BigQuery table is required.
- Error Handling: Messages that cannot be written to BigQuery can be forwarded to a dead-letter topic (a dead-letter queue, or DLQ) for inspection and reprocessing, improving reliability.
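To make the message format and its effect concrete, here is a minimal Python sketch. It builds JSON payloads carrying the two CDC fields named above and then simulates, in memory, how upserts and deletes keyed on a primary key and ordered by _CHANGE_SEQUENCE_NUMBER converge to a current-state table. The helper names, the MATNR example key, and the sample values are hypothetical. The sequence-number parsing assumes BigQuery's format of hexadecimal sections separated by "/". The simulation illustrates the semantics only; it is not BigQuery's implementation or the connector's actual output. Publishing to a real topic would go through the Pub/Sub client library and is not shown.

```python
import json
from typing import Any, Dict, Tuple

Row = Dict[str, Any]


def build_cdc_message(row: Row, change_type: str, sequence_number: str) -> bytes:
    """Serialize one change as a JSON payload carrying BigQuery CDC fields."""
    if change_type not in ("UPSERT", "DELETE"):
        raise ValueError(f"unsupported _CHANGE_TYPE: {change_type}")
    message = dict(row)
    message["_CHANGE_TYPE"] = change_type
    message["_CHANGE_SEQUENCE_NUMBER"] = sequence_number
    return json.dumps(message, sort_keys=True).encode("utf-8")


def sequence_key(sequence_number: str) -> Tuple[int, ...]:
    # Assumed format: hexadecimal sections separated by "/", compared left to right.
    # This sketch assumes every message uses the same number of sections.
    return tuple(int(part, 16) for part in sequence_number.split("/"))


def apply_cdc(table: Dict[Any, Row], versions: Dict[Any, Tuple[int, ...]],
              payload: bytes, primary_key: str) -> None:
    """Apply one message to an in-memory table, simulating upsert/delete semantics."""
    message = json.loads(payload)
    key = message[primary_key]
    seq = sequence_key(message.pop("_CHANGE_SEQUENCE_NUMBER"))
    change_type = message.pop("_CHANGE_TYPE")
    # Ignore changes older than (or equal to) the latest one already applied.
    if key in versions and seq <= versions[key]:
        return
    versions[key] = seq  # kept even after a delete, so stale upserts stay ignored
    if change_type == "DELETE":
        table.pop(key, None)
    else:
        table[key] = message  # in this sketch an upsert replaces the whole row


table: Dict[Any, Row] = {}
versions: Dict[Any, Tuple[int, ...]] = {}
msgs = [
    build_cdc_message({"MATNR": "MAT-1", "qty": 10}, "UPSERT", "A"),
    build_cdc_message({"MATNR": "MAT-1", "qty": 12}, "UPSERT", "C"),
    build_cdc_message({"MATNR": "MAT-1", "qty": 11}, "UPSERT", "B"),  # arrives late
    build_cdc_message({"MATNR": "MAT-2", "qty": 5}, "UPSERT", "D"),
    build_cdc_message({"MATNR": "MAT-2"}, "DELETE", "E"),
]
for payload in msgs:
    apply_cdc(table, versions, payload, primary_key="MATNR")
print(table)  # {'MAT-1': {'MATNR': 'MAT-1', 'qty': 12}}
```

The late-arriving change with sequence number B is discarded because C was already applied. This is the role _CHANGE_SEQUENCE_NUMBER plays when messages are delivered out of order or more than once.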
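Architecture (CDC with Pub/Sub): SAP source tables → SAP SLT with the BigQuery Connector for SAP → Pub/Sub topic → BigQuery subscription (Storage Write API) → target BigQuery table, with undeliverable messages routed to a dead-letter topic.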
Key Advantages of the New Pub/Sub Approach:
- Native CDC: Direct application of upserts and deletes for a current-state view.
- Reduced Deduplication Complexity: Minimizes the need for separate merge processes for real-time analytics.
- Cost Efficiency: Potential reduction in BigQuery costs by avoiding frequent, large-scale DML operations.
- Real-Time Insights: Data is updated in BigQuery in near real-time.
- Enhanced Reliability & Scalability: Builds on Pub/Sub’s durable, scalable messaging service.
- Simplified Management: The connector helps create and manage the Google Cloud artifacts that this setup requires.
Choosing the Right Replication Method in v2.9:
- Legacy Streaming API: Recommended when a full, append-only audit log of all changes is required in the BigQuery table. This provides a complete history, with current-state views derived through subsequent querying or processing.
- New CDC with Pub/Sub: Ideal for use cases needing a near real-time, consolidated ‘current-state’ view of the SAP data directly in the target BigQuery table, optimal for immediate analysis and operational reporting.
Both modes can be configured within the same SLT environment, potentially for different tables or use cases.
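Mixing the two modes comes down to a per-table decision. The following minimal Python sketch shows one hypothetical way to record that plan and, for the tables using Pub/Sub CDC, to generate the BigQuery DDL that declares the required primary key (BigQuery primary keys are declared NOT ENFORCED). The plan structure, the dataset name, and the table choices are illustrative assumptions. The connector itself is configured on the SAP side, and if it creates the target tables for you, this DDL may be unnecessary.

```python
from enum import Enum
from typing import Dict, List, Sequence


class Mode(Enum):
    LEGACY_STREAMING = "legacy"  # append-only audit log of every change
    PUBSUB_CDC = "pubsub_cdc"    # current-state table maintained by BigQuery CDC


# Hypothetical replication plan for a single SLT environment.
PLAN: Dict[str, Mode] = {
    "BKPF": Mode.LEGACY_STREAMING,  # e.g. keep the full change history for audits
    "MARA": Mode.PUBSUB_CDC,        # e.g. current-state material master for reporting
}

# Key columns of the SAP tables replicated with CDC (MARA is keyed by MANDT, MATNR).
KEYS: Dict[str, List[str]] = {"MARA": ["MANDT", "MATNR"]}


def primary_key_ddl(dataset: str, table: str, key_columns: Sequence[str]) -> str:
    """Return BigQuery DDL declaring an unenforced primary key for a CDC target table."""
    if not key_columns:
        raise ValueError(f"CDC table {table} needs at least one key column")
    columns = ", ".join(key_columns)
    return f"ALTER TABLE `{dataset}.{table}` ADD PRIMARY KEY ({columns}) NOT ENFORCED;"


def cdc_setup_statements(dataset: str, plan: Dict[str, Mode],
                         keys: Dict[str, List[str]]) -> List[str]:
    """Generate primary-key DDL only for tables routed through Pub/Sub CDC."""
    return [primary_key_ddl(dataset, table, keys.get(table, []))
            for table, mode in plan.items() if mode is Mode.PUBSUB_CDC]


for statement in cdc_setup_statements("sap_replica", PLAN, KEYS):
    print(statement)
# ALTER TABLE `sap_replica.MARA` ADD PRIMARY KEY (MANDT, MATNR) NOT ENFORCED;
```

Keeping the plan explicit makes it easy to see which tables produce an audit log and which produce a current-state view.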
Resources & Getting Started:
Upgrade to BigQuery Connector for SAP SLT v2.9 or later to utilize these flexible replication options.
Source Credit: https://medium.com/google-cloud/enhancing-sap-data-integration-real-time-cdc-with-bigquery-connector-for-sap-slt-v2-9-e66198772a09