We’re architecting a new reporting solution for our ERP system with 15TB of historical data in S3 (Parquet format). Business users need to run ad-hoc queries and scheduled reports through Tableau. Our team is debating between Athena serverless queries versus a Redshift managed cluster. The reporting workload is unpredictable - sometimes 200 queries/day, other times just 20 queries/day. Most queries scan 50-500GB of data across multiple years of financial and operational data. What are the real-world tradeoffs between these two approaches for ERP analytics workloads?
Run the actual cost math for your specific workload. If you’re scanning 50-500GB per query with 200 queries/day average, that’s 20-100TB scanned daily = $100-500/day in Athena ($3000-15000/month). A right-sized Redshift cluster (dc2.8xlarge 2-node) costs $9600/month with reserved pricing and handles unlimited queries. Redshift becomes cheaper above 2TB/day of scan volume. Also factor in Athena’s query result caching - repeated queries are free, which benefits dashboard refresh patterns.
The cost variability is concerning with Athena - $500/day on high-volume days adds up quickly. But I’m also worried about Redshift’s concurrency limits and the operational overhead. How do both solutions handle concurrent Tableau users refreshing dashboards? We have 40 business analysts who might all refresh reports Monday morning.
These perspectives are helpful. It sounds like the decision hinges on consistent vs sporadic usage patterns. Given our variable query load and the need for interactive Tableau performance, maybe Redshift Spectrum is the sweet spot? Keep hot data in Redshift, query cold historical data in S3?
Tableau integration works with both, but the experience differs significantly. Redshift provides better sub-second response for dashboard loads because data is pre-loaded and indexed. Athena queries have 2-3 second startup latency which feels sluggish in interactive dashboards. However, Tableau can cache results from either source. For your 40 concurrent analysts scenario, consider Redshift Spectrum as a hybrid - you get Redshift’s query performance and concurrency management while leaving cold data in S3 to control costs.
The unpredictable query volume you mentioned is exactly where Athena shines. With 200 queries scanning 500GB each, that’s 100TB scanned = $500/day in Athena. But on low-volume days with 20 queries, you’re only paying $50. A Redshift cluster costs $360/month whether you use it or not. Athena’s pay-per-query model makes more sense for sporadic workloads. Plus, you avoid cluster management overhead entirely - no resizing, no vacuum operations, no distribution key tuning.