Great questions on implementation details. Let me share the complete architecture and lessons learned:
Data Volume & Performance:
We have 380K active customer records in SAP. The initial full extract took 45 minutes to run through validation and load into staging. Daily incremental updates average 2,500-3,500 modified records (roughly a 0.7-0.9% change rate) and complete in 8-12 minutes end-to-end, including the Tableau extract refresh. The incremental approach was crucial - a nightly full refresh would take too long and would impact SAP system performance during the early morning batch window.
Automated ETL Architecture:
SAP Data Services Job Flow:
- Step 1: Query SAP change documents (CDHDR/CDPOS tables) for customer changes since last run
- Step 2: Extract full customer records from KNA1 (general data), KNVV (sales area data), KNVP (partner functions)
- Step 3: Apply data validation rules (detailed below)
- Step 4: Load validated records to staging database table
- Step 5: Call Python script to trigger Tableau extract refresh via REST API
- Step 6: Send completion notification with statistics
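Conceptually the job is a strict sequence with a hard gate in front of the Tableau refresh: if extraction, validation, or the staging load fails, the refresh never fires. A minimal Python sketch of just that control flow, where each callable is a placeholder standing in for the corresponding Data Services component (the function names are illustrative, not our actual job objects):

# Control-flow sketch only; each callable stands in for a Data Services job component.
from typing import Callable, Sequence, Tuple

def run_customer_pipeline(
    extract_changes: Callable[[], Sequence[dict]],            # Steps 1-2: CDHDR lookup + master data extract
    validate: Callable[[Sequence[dict]], Tuple[list, list]],  # Step 3: returns (valid, rejected) records
    load_staging: Callable[[list], None],                     # Step 4: load validated rows to staging
    log_exceptions: Callable[[list], None],                   # rejected rows go to the exception table
    refresh_tableau: Callable[[], None],                      # Step 5: trigger extract refresh via REST API
    notify: Callable[[int, int], None],                       # Step 6: completion notification with counts
) -> None:
    records = extract_changes()
    valid, rejected = validate(records)
    load_staging(valid)
    log_exceptions(rejected)
    refresh_tableau()   # only reached when extract/validate/load completed without raising
    notify(len(valid), len(rejected))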
Data Validation Rules (Critical for Master Data):
Completeness Validations:
- Customer number (KUNNR) and name (NAME1) must be populated
- Sales organization (VKORG) must be a valid, active sales organization
- Company code (BUKRS) must exist in T001 table
- Credit limit (KLIMK) must be numeric and non-negative
Format Validations:
- Country code (LAND1) must match ISO 3166-1 alpha-2
- Postal code format must match country postal rules (regex patterns)
- Email address (SMTP_ADDR) must pass RFC 5322 validation
- Phone numbers standardized to E.164 format
Business Rule Validations:
- Credit limit must be within approved range for customer classification (A/B/C tier)
- Sales territory (BZIRK) must exist in TVBZ table and be active
- Payment terms (ZTERM) must be valid in T052 table
- Customer account group (KTOKD) must allow sales order creation
Records failing validation are logged to exception table with specific error codes. BI team reviews exceptions daily and works with SAP master data team to correct source data.
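To make the rule style concrete, here is a minimal Python sketch of a few of these checks, assuming records arrive as plain dicts keyed by SAP field name; the country list, regex, and error codes are simplified stand-ins, not our production Data Services rules:

import re

# Small sample of ISO 3166-1 alpha-2 codes; production validates against the full list.
VALID_COUNTRIES = {"US", "DE", "GB", "FR", "CN", "IN"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # pragmatic check, not full RFC 5322

def validate_customer(rec: dict) -> list[str]:
    """Return a list of error codes; an empty list means the record passes."""
    errors = []
    if not rec.get("KUNNR") or not rec.get("NAME1"):
        errors.append("E001_MISSING_NUMBER_OR_NAME")     # completeness
    if rec.get("LAND1") not in VALID_COUNTRIES:
        errors.append("E010_BAD_COUNTRY")                # format: ISO 3166-1 alpha-2
    if rec.get("SMTP_ADDR") and not EMAIL_RE.match(rec["SMTP_ADDR"]):
        errors.append("E011_BAD_EMAIL")                  # format: email address
    try:
        if float(rec.get("KLIMK", 0)) < 0:
            errors.append("E020_NEGATIVE_CREDIT_LIMIT")  # completeness/business rule
    except (TypeError, ValueError):
        errors.append("E021_CREDIT_LIMIT_NOT_NUMERIC")
    return errors

Records with a non-empty error list are the ones written to the exception table with their codes.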
Incremental Update Strategy:
Using SAP change documents provides field-level change tracking:
SELECT DISTINCT cd.objectid AS customer_id, cd.udate, cd.utime
FROM cdhdr cd
WHERE cd.objectclas = 'DEBI' -- change document object for customer master
  AND ( cd.udate > [last_run_date]
        OR (cd.udate = [last_run_date] AND cd.utime >= [last_run_time]) )
-- no TCODE filter, so mass, workflow, and interface changes are captured too
This captures all customer modifications including:
- Direct changes via XD02 (central) or VD02 (sales view)
- Mass changes via LSMW or batch input
- Workflow-triggered updates
- Integration updates from other systems
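The watermark itself is just a pair of values persisted between runs. A sketch of how it could be tracked in the staging database, assuming a hypothetical etl_control table (job_name, last_run_date, last_run_time) and a standard DB-API connection - this shows the idea, not the exact mechanism inside our Data Services job:

# Hypothetical watermark handling for the incremental CDHDR query.
# Assumes an etl_control(job_name, last_run_date, last_run_time) table in staging;
# parameter placeholder style ("?") varies by database driver.

def get_watermark(conn, job_name="customer_master"):
    cur = conn.cursor()
    cur.execute(
        "SELECT last_run_date, last_run_time FROM etl_control WHERE job_name = ?",
        (job_name,),
    )
    return cur.fetchone()  # e.g. ('20240115', '031500') in SAP DATS/TIMS format

def set_watermark(conn, run_date, run_time, job_name="customer_master"):
    cur = conn.cursor()
    cur.execute(
        "UPDATE etl_control SET last_run_date = ?, last_run_time = ? WHERE job_name = ?",
        (run_date, run_time, job_name),
    )
    conn.commit()

Advancing the watermark only after a successful load and refresh means a failed night is simply re-picked-up on the next execution.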
Handling Deleted Customers:
We keep soft-deleted customers in Tableau with status indicators:
- Extract includes the deletion flag (LOEVM) and central posting block (SPERR) fields
- Tableau calculated field: [Customer Status] = IF [Deletion Flag] = 'X' THEN 'Deleted' ELSE 'Active' END
- Default dashboard filters show Active only, but users can include Deleted for historical analysis
- This preserves historical sales reporting accuracy when customers are deleted after orders were placed
Customer Hierarchy & Territory Changes:
We maintain change history using a Type 2 slowly changing dimension (a simplified sketch follows this list):
- Staging table includes effective_from and effective_to dates
- When sales territory changes, old record gets effective_to = change_date
- New record inserted with effective_from = change_date, effective_to = '9999-12-31'
- Tableau dashboards join on customer_id and report_date BETWEEN effective_from AND effective_to
- This allows accurate historical reporting: “What was this customer’s territory in Q3 2023?”
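A simplified sketch of the expire-and-insert step for a territory change, with table and column names following the description above (dim_customer, sales_territory, effective_from, effective_to are illustrative; the actual staging objects may differ):

# Type 2 SCD: close the current row, then insert the new version.
def apply_territory_change(conn, customer_id, new_territory, change_date):
    cur = conn.cursor()
    # Expire the currently active row for this customer
    cur.execute(
        """UPDATE dim_customer
              SET effective_to = ?
            WHERE customer_id = ? AND effective_to = '9999-12-31'""",
        (change_date, customer_id),
    )
    # Insert the new version, open-ended until the next change
    cur.execute(
        """INSERT INTO dim_customer (customer_id, sales_territory, effective_from, effective_to)
           VALUES (?, ?, ?, '9999-12-31')""",
        (customer_id, new_territory, change_date),
    )
    conn.commit()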
Daily Tableau Extract Refresh:
Triggered via REST API from SAP Data Services:
# Pseudocode - Key implementation steps:
1. Authenticate to Tableau Server with service account credentials
2. Get datasource ID for 'SAP Customer Master' from REST API
3. POST request to /api/3.x/sites/{site-id}/datasources/{ds-id}/refresh
4. Poll refresh status every 30 seconds until complete
5. Log completion time and row counts to monitoring database
# See documentation: Tableau REST API Reference - Datasource Methods
Refresh completes in 3-5 minutes for incremental updates. The REST API approach ensures Tableau refresh only starts after SAP ETL validation succeeds, eliminating timing gaps.
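For readers who prefer a library over hand-rolled REST calls, the same sign-in / refresh / poll flow can be sketched with the tableauserverclient Python package; the server URL, site, and credentials below are placeholders, and our production job uses the raw endpoints described above:

# Illustrative alternative using tableauserverclient; URL, site, and credentials are placeholders.
import tableauserverclient as TSC

def refresh_customer_extract():
    auth = TSC.TableauAuth("svc_bi_refresh", "********", site_id="sales")   # service account
    server = TSC.Server("https://tableau.example.com", use_server_version=True)
    with server.auth.sign_in(auth):
        # Look up the published datasource by name
        opts = TSC.RequestOptions()
        opts.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                                   TSC.RequestOptions.Operator.Equals,
                                   "SAP Customer Master"))
        datasources, _ = server.datasources.get(opts)
        ds = datasources[0]
        # Kick off the asynchronous refresh and block until the job finishes
        job = server.datasources.refresh(ds)
        server.jobs.wait_for_job(job)   # raises if the refresh job fails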
Data Quality Monitoring Dashboard:
We built a Tableau dashboard specifically for monitoring pipeline health:
- Validation failure trends by rule type (completeness, format, business rules)
- Top 10 customers with recurring validation failures
- Extract refresh timing and row count trends
- SAP change document volume by day (spike detection)
- Exception queue aging report (records awaiting manual review)
This dashboard is reviewed daily by BI team and weekly by SAP master data governance team. It surfaced data quality issues in SAP that were previously invisible.
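As one example of how the panels are fed, the validation-failure trend can come from a simple aggregate over the exception table; the exception_log schema below is an assumption for illustration, since the real exception table is specific to our staging database:

# Illustrative source query for the validation-failure trend panel.
# exception_log(run_date, rule_type, error_code, customer_id) is an assumed schema.
FAILURE_TREND_SQL = """
    SELECT run_date,
           rule_type,                 -- completeness / format / business_rule
           COUNT(*) AS failure_count
      FROM exception_log
     GROUP BY run_date, rule_type
     ORDER BY run_date, rule_type
"""

Plotting failure_count by run_date and rule_type gives the trend view; the other panels follow the same pattern against the control and staging tables.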
Business Impact Results:
- Sales reporting accuracy improved from 87% to 99.4% (measured by correct territory assignment)
- BI team time savings: 8 hours/week manual export work eliminated
- Sales managers now have confidence in dashboard data; adoption increased 40%
- Data quality issues in SAP identified and resolved proactively (validation failure rate dropped from 2.3% to 0.4% over 6 months)
- Customer service improved: accurate customer data enables better territory routing and credit decisions
Lessons Learned:
- Start with comprehensive data validation - don’t assume SAP master data is clean
- Incremental updates are essential for large data volumes and daily refresh requirements
- Type 2 slowly changing dimensions add complexity but are critical for accurate historical reporting
- Programmatic Tableau refresh via REST API is more reliable than scheduled refreshes
- Data quality monitoring dashboard is essential for maintaining long-term pipeline health
- Partner with SAP master data team - automated pipeline exposes upstream data quality issues that need collaborative resolution
This implementation can be adapted for other SAP master data objects (materials, vendors, GL accounts) using the same architectural patterns. The key is balancing automation with data quality controls to ensure Tableau dashboards show accurate, trusted business data.