Automated customer master data sync from SAP ERP to Tableau - daily extract refresh workflow

Sharing our implementation of automated customer master data sync from SAP ERP to Tableau Server 2023.2. We needed reliable daily updates of customer records for sales reporting dashboards used by 150+ sales managers.

Previously, the BI team manually exported customer data from SAP weekly, causing reports to show outdated customer attributes (address changes, credit limits, sales territories). This resulted in inaccurate territory assignments and missed sales opportunities.

We built an automated ETL pipeline using SAP Data Services that extracts customer master records nightly, validates data quality, and refreshes Tableau extracts on schedule. The solution handles incremental updates, data validation rules, and automatic error notifications. Sales reporting accuracy improved significantly, and the BI team saved 8 hours per week of manual data export work.

One challenge we faced with SAP customer master sync was handling deleted customers. SAP doesn’t truly delete records; it marks them with deletion flags. How did you handle this in your Tableau extracts? Do you filter out deleted customers, or keep them with a status flag for historical reporting? Also curious about your approach to customer hierarchy changes when a customer gets reassigned to a different sales territory.

This is exactly the type of automation every organization with SAP should implement. Can you share more details on the data validation rules you built into the pipeline? We’re planning something similar but struggling to define what validations are critical for customer master data before it hits Tableau. Also curious about your incremental update strategy - are you using change pointers in SAP or timestamp-based detection?

We trigger Tableau extract refresh programmatically from the final step of our SAP Data Services job using Tableau Server REST API. This ensures the refresh only starts after ETL validation completes successfully. The Data Services job calls a Python script that authenticates to Tableau Server and initiates the extract refresh task. We monitor the refresh status and send email alerts if it fails. This tight integration eliminated the timing gap issue you mentioned.

For data validation, we implemented three layers: 1) Completeness checks (required fields like customer name, sales org, and credit limit must be populated), 2) Format validation (postal codes match the country format, email addresses have valid syntax), 3) Business rule validation (credit limit within approved ranges, sales territory matches region mapping). Records failing validation go to an exception queue for manual review.

For incremental updates, we use SAP’s change document tables (CDHDR/CDPOS) to identify modified customer records since the last extract. This is more reliable than timestamps because it captures all field-level changes. The ETL job queries change documents from the last successful run timestamp, extracts only the modified customer IDs, then does a full record pull for those customers.

How did you handle the Tableau extract refresh orchestration? Did you trigger the extract refresh from SAP Data Services after ETL completes, or use Tableau Server’s native scheduling? We’ve had issues with extract refresh timing - sometimes the ETL job completes but Tableau refresh hasn’t started yet, causing a gap where dashboards show stale data.

Impressive implementation. What’s your data volume and refresh time? We have about 500K customer records in SAP and are considering a similar approach. Also, did you implement any data quality dashboards in Tableau to monitor the health of the automated pipeline? Would be useful to track validation failure rates and identify data quality issues in SAP upstream.

Great questions on implementation details. Let me share the complete architecture and lessons learned:

Data Volume & Performance: We have 380K active customer records in SAP. The initial full extract took 45 minutes to process through validation and load to staging. Daily incremental updates average 2,500-3,500 modified records (about a 0.7% change rate), completing in 8-12 minutes end-to-end including the Tableau extract refresh. The incremental approach was crucial - a nightly full refresh would take too long and impact SAP system performance during the early morning batch window.

Automated ETL Architecture:

  1. SAP Data Services Job Flow:

    • Step 1: Query SAP change documents (CDHDR/CDPOS tables) for customer changes since last run
    • Step 2: Extract full customer records from KNA1 (general data), KNVV (sales area data), KNVP (partner functions)
    • Step 3: Apply data validation rules (detailed below)
    • Step 4: Load validated records to staging database table
    • Step 5: Call Python script to trigger Tableau extract refresh via REST API
    • Step 6: Send completion notification with statistics
  2. Data Validation Rules (Critical for Master Data):

Completeness Validations:

  • Customer number (KUNNR) and name (NAME1) must be populated
  • Sales organization (VKORG) must be valid active org
  • Company code (BUKRS) must exist in T001 table
  • Credit limit (KLIMK) must be numeric and non-negative

Format Validations:

  • Country code (LAND1) must match ISO 3166-1 alpha-2
  • Postal code format must match country postal rules (regex patterns)
  • Email address (SMTP_ADDR) must pass RFC 5322 validation
  • Phone numbers standardized to E.164 format

Business Rule Validations:

  • Credit limit must be within approved range for customer classification (A/B/C tier)
  • Sales territory (BZIRK) must exist in TVBZ table and be active
  • Payment terms (ZTERM) must be valid in T052 table
  • Customer account group (KTOKD) must allow sales order creation

Records failing validation are logged to an exception table with specific error codes. The BI team reviews exceptions daily and works with the SAP master data team to correct the source data.
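
To make the three layers concrete, here is a minimal Python sketch of how such rules can be expressed (an illustration, not our actual Data Services transforms; the tier limits, regex patterns, and the PSTLZ/TIER field names are placeholder assumptions):

import re

# Illustrative only: tier limits and regex patterns are placeholders, not our
# production configuration; PSTLZ (postal code) and TIER are assumed fields.
TIER_CREDIT_LIMITS = {"A": 1_000_000, "B": 250_000, "C": 50_000}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified check, not full RFC 5322
POSTAL_RE = {"DE": re.compile(r"^\d{5}$"), "US": re.compile(r"^\d{5}(-\d{4})?$")}

def validate_customer(rec: dict) -> list[str]:
    """Return a list of error codes; an empty list means the record passes."""
    errors = []
    # Layer 1: completeness
    for field in ("KUNNR", "NAME1", "VKORG", "BUKRS"):
        if not rec.get(field):
            errors.append(f"MISSING_{field}")
    # Layer 2: format
    postal_re = POSTAL_RE.get(rec.get("LAND1", ""))
    if postal_re and not postal_re.match(rec.get("PSTLZ", "")):
        errors.append("BAD_POSTAL_CODE")
    if rec.get("SMTP_ADDR") and not EMAIL_RE.match(rec["SMTP_ADDR"]):
        errors.append("BAD_EMAIL")
    # Layer 3: business rules
    max_limit = TIER_CREDIT_LIMITS.get(rec.get("TIER", ""), 0)
    if float(rec.get("KLIMK") or 0) > max_limit:
        errors.append("CREDIT_LIMIT_OUT_OF_RANGE")
    return errors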

  3. Incremental Update Strategy:

Using SAP change documents provides field-level change tracking:


SELECT DISTINCT cd.objectid AS customer_id, cd.udate, cd.utime
FROM cdhdr cd
WHERE cd.objectclas = 'DEBI'  -- Change document object for customer master
-- No TCODE filter, so mass, workflow, and interface changes are also captured
AND ( cd.udate > [last_run_date]
      OR ( cd.udate = [last_run_date] AND cd.utime >= [last_run_time] ) )

This captures all customer modifications including:

  • Direct changes via XD02 (central) or VD02 (sales view)
  • Mass changes via LSMW or batch input
  • Workflow-triggered updates
  • Integration updates from other systems
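
For reference, here is a hedged sketch of how that watermark query can be parameterized from the Python layer (assumptions: a DB-API connection with qmark-style parameters that can read the change document table, and dates/times in SAP's YYYYMMDD / HHMMSS character formats):

# Sketch only: `conn` is any DB-API 2.0 connection (qmark paramstyle) that can
# read the change document table; date/time use SAP's YYYYMMDD / HHMMSS strings.
def get_changed_customer_ids(conn, last_run_date: str, last_run_time: str) -> list[str]:
    sql = """
        SELECT DISTINCT cd.objectid
        FROM cdhdr cd
        WHERE cd.objectclas = 'DEBI'
          AND ( cd.udate > ?
                OR (cd.udate = ? AND cd.utime >= ?) )
    """
    cur = conn.cursor()
    cur.execute(sql, (last_run_date, last_run_date, last_run_time))
    return [row[0] for row in cur.fetchall()]  # IDs for the full record pull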
  4. Handling Deleted Customers: We keep soft-deleted customers in Tableau with status indicators:

  • Extract includes the deletion flag (LOEVM) and central posting block (SPERR) fields
  • Tableau calculated field: [Customer Status] = IF [Deletion Flag] = 'X' THEN 'Deleted' ELSE 'Active' END
  • Default dashboard filters show Active only, but users can include Deleted for historical analysis
  • This preserves historical sales reporting accuracy when customers are deleted after orders were placed
  5. Customer Hierarchy & Territory Changes: We maintain change history using a Type 2 slowly changing dimension:

  • Staging table includes effective_from and effective_to dates
  • When the sales territory changes, the old record gets effective_to = change_date
  • New record inserted with effective_from = change_date, effective_to = '9999-12-31'
  • Tableau dashboards join on customer_id and report_date BETWEEN effective_from AND effective_to
  • This allows accurate historical reporting: “What was this customer’s territory in Q3 2023?”
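
As a minimal illustration of the Type 2 update logic (the real work happens in our staging database; the in-memory list below simply stands in for the staging table, and apply_territory_change is a hypothetical helper name):

from datetime import date

OPEN_END = date(9999, 12, 31)  # "current version" marker, mirrors '9999-12-31'

# Minimal Type 2 SCD sketch: `dimension` stands in for the staging table;
# in production this is an UPDATE plus an INSERT.
def apply_territory_change(dimension: list[dict], customer_id: str,
                           new_territory: str, change_date: date) -> None:
    for row in dimension:
        if row["customer_id"] == customer_id and row["effective_to"] == OPEN_END:
            if row["sales_territory"] == new_territory:
                return                         # no change, keep current version
            row["effective_to"] = change_date  # close the old version
            break
    dimension.append({
        "customer_id": customer_id,
        "sales_territory": new_territory,
        "effective_from": change_date,
        "effective_to": OPEN_END,              # open-ended current record
    })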
  6. Daily Tableau Extract Refresh: Triggered via REST API from SAP Data Services:

# Pseudocode - Key implementation steps:
1. Authenticate to Tableau Server with service account credentials
2. Get datasource ID for 'SAP Customer Master' from REST API
3. POST request to /api/3.x/sites/{site-id}/datasources/{ds-id}/refresh
4. Poll refresh status every 30 seconds until complete
5. Log completion time and row counts to monitoring database
# See documentation: Tableau REST API Reference - Datasource Methods

Refresh completes in 3-5 minutes for incremental updates. The REST API approach ensures the Tableau refresh only starts after SAP ETL validation succeeds, eliminating timing gaps.
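
For anyone who wants to replicate the refresh step, here is roughly what it looks like with the tableauserverclient library (a sketch, not our exact script; the server URL, token name, site, and datasource name are placeholders):

import tableauserverclient as TSC

# Placeholders: URL, personal access token, site, and datasource name.
auth = TSC.PersonalAccessTokenAuth("etl-refresh-token", "<token-secret>", site_id="sales")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # Look up the published datasource by name
    opts = TSC.RequestOptions()
    opts.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                               TSC.RequestOptions.Operator.Equals,
                               "SAP Customer Master"))
    datasources, _ = server.datasources.get(opts)
    ds = datasources[0]

    # Kick off the extract refresh and block until the async job finishes;
    # wait_for_job raises if the refresh job fails or is cancelled.
    job = server.datasources.refresh(ds)
    job = server.jobs.wait_for_job(job)
    print(f"Refresh finished: job {job.id}, finish code {job.finish_code}")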

  7. Data Quality Monitoring Dashboard: We built a Tableau dashboard specifically for monitoring pipeline health:

  • Validation failure trends by rule type (completeness, format, business rules)
  • Top 10 customers with recurring validation failures
  • Extract refresh timing and row count trends
  • SAP change document volume by day (spike detection)
  • Exception queue aging report (records awaiting manual review)

This dashboard is reviewed daily by the BI team and weekly by the SAP master data governance team. It surfaced data quality issues in SAP that were previously invisible.
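
The plumbing behind the dashboard is simple: the last ETL step writes one summary row per run to a stats table that the workbook reads. A rough sketch (the run_stats table, its columns, and the qmark-style DB-API connection are assumptions, not our exact schema):

from datetime import datetime

# Sketch: one summary row per pipeline run; run_stats is a hypothetical table.
def log_run_stats(conn, run_start: datetime, rows_extracted: int,
                  rows_loaded: int, failures_by_rule: dict) -> None:
    cur = conn.cursor()
    cur.execute(
        """INSERT INTO run_stats
           (run_start, run_end, rows_extracted, rows_loaded,
            completeness_failures, format_failures, business_rule_failures)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (run_start, datetime.now(), rows_extracted, rows_loaded,
         failures_by_rule.get("completeness", 0),
         failures_by_rule.get("format", 0),
         failures_by_rule.get("business_rule", 0)),
    )
    conn.commit()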

Business Impact Results:

  • Sales reporting accuracy improved from 87% to 99.4% (measured by correct territory assignment)
  • BI team time savings: 8 hours/week manual export work eliminated
  • Sales managers now have confidence in the dashboard data, and adoption increased 40%
  • Data quality issues in SAP identified and resolved proactively (validation failure rate dropped from 2.3% to 0.4% over 6 months)
  • Customer service improved: accurate customer data enables better territory routing and credit decisions

Lessons Learned:

  1. Start with comprehensive data validation - don’t assume SAP master data is clean
  2. Incremental updates are essential for large data volumes and daily refresh requirements
  3. Type 2 slowly changing dimensions add complexity but are critical for accurate historical reporting
  4. Programmatic Tableau refresh via REST API is more reliable than scheduled refreshes
  5. Data quality monitoring dashboard is essential for maintaining long-term pipeline health
  6. Partner with SAP master data team - automated pipeline exposes upstream data quality issues that need collaborative resolution

This implementation can be adapted for other SAP master data objects (materials, vendors, GL accounts) using the same architectural patterns. The key is balancing automation with data quality controls to ensure Tableau dashboards show accurate, trusted business data.