We implemented an automated data archiving solution that moves historical ERP data from Cloud SQL to Cloud Storage, achieving significant cost savings and improved query performance.
Our Cloud SQL for PostgreSQL instance had accumulated 5+ years of transactional data and was still growing rapidly. Monthly costs exceeded $3,200 for a db-n1-highmem-8 instance, and query performance degraded as tables grew beyond 50M rows. We needed to retain the data for compliance but rarely accessed records older than 18 months.
The automated archiving pipeline runs nightly via Cloud Scheduler, identifying records older than 18 months, exporting them to Cloud Storage in Parquet format, and purging them from Cloud SQL. We downsized to db-n1-highmem-4, reducing monthly costs to $1,900. Active query performance improved by 60% thanks to smaller table scans, and archive storage runs under $100/month for 2TB of historical data.
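To make the purge step concrete, here is a minimal sketch of how it can work, assuming a hypothetical sales_orders table with id and created_at columns (not our actual schema); the delete only runs after that night's export has been confirmed in Cloud Storage.

```python
# Minimal sketch of the nightly purge step, assuming a hypothetical
# sales_orders table with id and created_at columns; run it only after
# the night's export to Cloud Storage has been verified.
import psycopg2

def purge_archived_rows(conn_params: dict, batch_size: int = 10_000) -> None:
    """Delete rows older than 18 months in small batches to limit lock time."""
    with psycopg2.connect(**conn_params) as conn:
        with conn.cursor() as cur:
            while True:
                cur.execute(
                    """
                    DELETE FROM sales_orders
                    WHERE id IN (
                        SELECT id FROM sales_orders
                        WHERE created_at < now() - interval '18 months'
                        LIMIT %s
                    )
                    """,
                    (batch_size,),
                )
                conn.commit()
                if cur.rowcount == 0:
                    break
```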
This is exactly the kind of optimization we’re looking to implement. What tool did you use for the extraction and archiving process? We’re evaluating Cloud Functions vs Dataflow for a similar use case with Oracle data.
We went with Cloud Functions for orchestration and the Cloud SQL export API for the heavy lifting. The function triggers daily and calls the Cloud SQL Admin API's export endpoint (the same operation gcloud sql export wraps) to dump tables directly to Cloud Storage in CSV format; a second function then converts the CSVs to Parquet using pandas. For your Oracle migration, you might need Dataflow if the transformations are complex, but Cloud Functions worked perfectly for our straightforward archival needs. Because the export runs server-side, it handles large tables efficiently without memory issues in the functions themselves.
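To give you an idea, here's a rough sketch of the two functions; the project, instance, bucket, and table names are placeholders, not our real identifiers.

```python
# Rough sketch of the two functions, with placeholder project, instance,
# bucket, and table names (none of these are real identifiers).
import pandas as pd
from google.cloud import storage
from googleapiclient import discovery

PROJECT = "my-project"          # placeholder
INSTANCE = "erp-postgres"       # placeholder
ARCHIVE_BUCKET = "erp-archive"  # placeholder

def trigger_export(request):
    """HTTP/scheduler-triggered function: start a server-side CSV export."""
    sqladmin = discovery.build("sqladmin", "v1beta4")
    body = {
        "exportContext": {
            "fileType": "CSV",
            "uri": f"gs://{ARCHIVE_BUCKET}/exports/sales_orders.csv",
            "databases": ["erp"],
            "csvExportOptions": {
                "selectQuery": (
                    "SELECT * FROM sales_orders "
                    "WHERE created_at < now() - interval '18 months'"
                )
            },
        }
    }
    operation = (
        sqladmin.instances()
        .export(project=PROJECT, instance=INSTANCE, body=body)
        .execute()
    )
    return operation["name"]  # poll this operation to confirm completion

def csv_to_parquet(event, context):
    """Cloud Storage finalize-triggered function: convert exported CSV to Parquet."""
    if not event["name"].endswith(".csv"):
        return
    bucket = storage.Client().bucket(event["bucket"])
    local_csv = "/tmp/" + event["name"].rsplit("/", 1)[-1]
    bucket.blob(event["name"]).download_to_filename(local_csv)

    df = pd.read_csv(local_csv)
    local_parquet = local_csv[:-4] + ".parquet"
    df.to_parquet(local_parquet, index=False)  # requires pyarrow
    bucket.blob(event["name"][:-4] + ".parquet").upload_from_filename(local_parquet)
```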
How did you handle the compliance and audit requirements? Do you have a mechanism to query archived data when needed, or is it purely cold storage?
Great question. We maintain a metadata table in Cloud SQL with archive locations and date ranges. For audits, we load specific archived files into BigQuery external tables on-demand. This gives us SQL query capability without permanent storage costs. The Parquet format makes this fast and cost-effective. We’ve run three audit queries in six months, each costing under $2 in BigQuery processing. For compliance, we set 7-year retention policies on the Cloud Storage buckets and enabled object versioning.
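As a rough sketch, an on-demand audit query looks something like this with the BigQuery Python client; the GCS path, table alias, and column names below are placeholders rather than our actual setup.

```python
# Rough sketch of an on-demand audit query: define a temporary external
# table over archived Parquet files and query it with BigQuery. The GCS
# path, table alias, and column names here are placeholders.
from google.cloud import bigquery

def query_archive(gcs_uri: str, sql: str):
    """Run a query against archived Parquet data without loading it into BigQuery."""
    client = bigquery.Client()

    external_config = bigquery.ExternalConfig("PARQUET")
    external_config.source_uris = [gcs_uri]

    job_config = bigquery.QueryJobConfig(
        table_definitions={"archive": external_config}  # referenced as "archive" in the SQL
    )
    return list(client.query(sql, job_config=job_config).result())

# Example: pull one customer's historical orders out of a 2019 archive file.
rows = query_archive(
    "gs://erp-archive/exports/sales_orders-2019-*.parquet",
    "SELECT order_id, order_total FROM archive WHERE customer_id = 4217",
)
```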