Great questions - let me walk through the complete implementation:
AWS Glue Zero-ETL SAP OData Replication:
We created a Glue connection to SAP’s OData endpoint with these settings:
ConnectionType: ODATA
URL: https://sap-prod.company.com:8000/sap/opu/odata/sap/API_MATERIAL_STOCK_SRV
AuthType: OAuth2
PollingInterval: 300 seconds
The Zero-ETL integration monitors SAP tables MARA (material master), MARD (storage location data), and MARC (plant data). Delta detection uses SAP’s LAEDA (last change date) field. Initial full load took 4 hours for 2.3M SKUs, then incremental syncs run every 5 minutes averaging 200-500 changed records.
Key configuration: Enable parallel processing in Glue job settings (DPU=10) to handle peak change volumes during business hours.
Redshift Serverless Data Warehouse:
We provisioned a Redshift Serverless namespace with 128 RPU base capacity. The Zero-ETL integration creates staging tables automatically:
sap_inventory.material_master
sap_inventory.stock_by_location
sap_inventory.plant_data
We then created aggregated views for common queries:
- inventory_summary_by_plant
- low_stock_alerts (items below reorder point)
- inventory_aging_analysis
Redshift Serverless auto-scales to 512 RPU during morning report generation (8-10 AM), then scales down. This elasticity saves 60% vs. provisioned clusters.
Bedrock Knowledge Base Indexing:
This was the most innovative part. We use Bedrock Knowledge Base with OpenSearch Serverless as the vector store:
- Created Lambda function triggered by Redshift data API whenever aggregate views update
- Lambda extracts inventory summaries and generates embeddings using Bedrock Titan
- Embeddings stored in OpenSearch with metadata (SKU, location, quantity, status)
- Business users query via natural language: “Show me critical stock items in Texas warehouses”
- Bedrock translates to semantic search, retrieves relevant vectors, then generates SQL for Redshift
The Knowledge Base doesn’t query Redshift directly - it uses embeddings for semantic matching, then generates precise SQL based on matched context.
IAM Least Privilege Configuration:
We implemented strict role separation:
GlueServiceRole: Read access to SAP OData, Write to Redshift staging
RedshiftServiceRole: Read from S3 (for Glue), Write to data warehouse
LambdaExecutionRole: Query Redshift data API, Invoke Bedrock, Write to OpenSearch
BusinessUserRole: Query Redshift views only (no raw table access)
All roles use condition keys to enforce resource-level permissions. For example, Glue can only write to sap_inventory schema, not other Redshift schemas.
VPC Private Connectivity Setup:
Security was critical - SAP is in on-premises data center:
- AWS Direct Connect (1 Gbps) links on-prem to AWS VPC
- Glue connections run in private subnets with VPC endpoints
- Redshift Serverless deployed in private subnet, no public access
- VPC endpoints for S3, Glue, Bedrock, OpenSearch (all traffic stays on AWS backbone)
- Security groups restrict Glue to SAP IP range only
- Network ACLs enforce additional layer of defense
No data traverses public internet. All communication encrypted with TLS 1.3.
Business Value & Use Cases:
The 70% improvement enabled several new capabilities:
- Real-time Stock Allocation: Sales can check availability during customer calls instead of waiting for overnight reports
- Predictive Replenishment: ML models on fresh data predict stockouts 3 days ahead vs. 1 day with old system
- Natural Language Analytics: Executives ask questions in plain English: “Which products have highest turnover in Q4?” - no SQL needed
- Cross-DC Optimization: Real-time view across all 23 DCs enables dynamic inventory balancing, reducing emergency transfers by 40%
- Supplier Collaboration: Near-real-time data shared with key suppliers via secure API for vendor-managed inventory
ROI: Project cost $180K (8 weeks × 2 engineers + AWS services). Annual savings: $520K from reduced emergency shipments, better inventory turns, and eliminated legacy data warehouse licenses.
The natural language interface was the game-changer for adoption. Business users who never touched SQL now run their own analyses. Report requests to IT dropped 65%.
Happy to share more details on specific components if helpful!