Enabling VPC Flow Logs to audit network traffic for PCI compliance in a multi-account AWS environment

Sharing our implementation of VPC Flow Logs for PCI DSS audit requirements. Our payment processing environment needed complete network traffic visibility, with immutable audit trails and query capabilities for security investigations.

We implemented centralized VPC Flow Logs aggregation across 12 VPCs in our cardholder data environment (CDE). Key requirements were S3 Object Lock for immutability (to prevent tampering) and Athena for audit queries during quarterly PCI assessments. Our QSA specifically asked us to prove that all network traffic to and from the CDE is logged and queryable.

Setup took about 3 weeks including testing with our QSA. Documenting the architecture and lessons learned for others facing similar PCI audit requirements.

How are you handling Athena query performance? We tried a similar setup but queries were taking 5+ minutes. Did you use the Glue Data Catalog or manual table definitions? Any optimization tips for faster audit queries?

Here’s our complete implementation architecture that addresses VPC Flow Logs aggregation, S3 object lock for immutability, and Athena for audit queries:

VPC Flow Logs Aggregation Setup: Enabled VPC Flow Logs on all 12 VPCs in our cardholder data environment, with a custom format to capture the fields required by PCI auditors. Flow logs publish directly to a centralized S3 bucket in us-east-1 with the prefix structure vpc-flow-logs/year=YYYY/month=MM/day=DD/vpc-id=vpc-xxxxx/. This partitioning strategy is crucial for efficient Athena queries, because a query that filters on date and VPC only reads the matching prefixes.

Configuration applied to each VPC:

  • Traffic Type: ALL (accepted and rejected traffic)
  • Destination: S3 bucket with server-side encryption (SSE-S3)
  • Log Format: Custom format including srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status
  • Max Aggregation Interval: 1 minute for near real-time visibility
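
For anyone scripting this, here is a minimal sketch of the per-VPC configuration above expressed as arguments to boto3's ec2.create_flow_logs. The function names, the VPC ID, and the bucket ARN are placeholders I made up for illustration, not what the original poster ran. SSE-S3 is a setting on the bucket itself, so it does not appear in the request.

```python
# Field list copied from the custom log format in the post.
FLOW_LOG_FIELDS = [
    "srcaddr", "dstaddr", "srcport", "dstport", "protocol",
    "packets", "bytes", "start", "end", "action", "log-status",
]


def build_flow_log_request(vpc_id: str, bucket_arn: str) -> dict:
    """Return keyword arguments for ec2.create_flow_logs for one VPC."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",            # accepted and rejected traffic
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,    # hypothetical bucket/prefix ARN
        "LogFormat": " ".join("${%s}" % f for f in FLOW_LOG_FIELDS),
        "MaxAggregationInterval": 60,    # seconds, i.e. 1 minute
    }


def enable_flow_logs(vpc_id: str, bucket_arn: str, ec2=None) -> dict:
    """Create the flow log; builds a boto3 client when none is passed in."""
    if ec2 is None:
        import boto3  # imported lazily so the builder works without AWS access
        ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**build_flow_log_request(vpc_id, bucket_arn))
```

Keeping the request builder separate from the API call makes it easy to review or diff the exact settings for all 12 VPCs before anything is applied.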

S3 Object Lock for Immutability: The centralized S3 bucket configuration ensures immutability for PCI compliance:

  • Object Lock enabled in compliance mode (cannot be disabled, even by root account)
  • Default retention period: 7 years to exceed PCI requirement
  • Bucket versioning enabled (required for Object Lock)
  • MFA Delete enabled for additional protection
  • Bucket policy denies s3:DeleteObject and s3:DeleteObjectVersion for all principals
  • Cross-region replication to backup bucket (also with Object Lock) in us-west-2
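
A rough sketch of how the Object Lock default retention and the deny-delete bucket policy from the list above could be expressed in Python. The bucket name and helper names are hypothetical. The dictionaries follow the shapes that boto3's put_object_lock_configuration and put_bucket_policy accept, but treat this as a starting point under those assumptions rather than our exact configuration.

```python
import json


def object_lock_configuration(years: int = 7) -> dict:
    """Default-retention rule in the shape put_object_lock_configuration expects."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


def deny_delete_policy(bucket_name: str) -> str:
    """Bucket policy JSON that denies object deletes for every principal."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyObjectDeletes",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    }
    return json.dumps(policy, indent=2)
```

These would be applied with something like s3.put_object_lock_configuration(Bucket=name, ObjectLockConfiguration=object_lock_configuration()) and s3.put_bucket_policy(Bucket=name, Policy=deny_delete_policy(name)). Test the retention period on a scratch bucket first, since a compliance-mode retention cannot be shortened afterwards.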

S3 Lifecycle policy transitions:

  • 0-90 days: S3 Standard
  • 91-365 days: S3 Standard-IA
  • 366 days+: S3 Glacier Deep Archive

This tiered approach reduced storage costs by 65% while maintaining immutability and compliance.
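
The tiering above could be written as a lifecycle rule roughly like this. It is a sketch only: the rule ID and prefix are placeholders, and the helper that maps an object's age to a tier simply mirrors the day ranges in the list (S3 itself evaluates transitions on its own schedule, so the real cut-over can lag slightly).

```python
def lifecycle_configuration(prefix: str = "vpc-flow-logs/") -> dict:
    """Rule set in the shape put_bucket_lifecycle_configuration expects."""
    return {
        "Rules": [{
            "ID": "flow-log-tiering",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }]
    }


def storage_class_for_age(age_days: int) -> str:
    """Tier an object of this age should sit in, per the day ranges in the post."""
    if age_days <= 90:
        return "STANDARD"
    if age_days <= 365:
        return "STANDARD_IA"
    return "DEEP_ARCHIVE"
```

Transitions only change the storage class. They do not delete or rewrite objects, so they do not conflict with Object Lock retention.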

Athena for Audit Queries: Implemented automated Glue Data Catalog setup:

  • Glue Crawler runs daily to discover new partitions (year/month/day/vpc-id)
  • Created Glue ETL job that converts raw VPC Flow Logs to Parquet format with Snappy compression
  • Parquet conversion runs nightly and reduces data scanned by Athena by 85%
  • Separate Athena workgroup for compliance team with query result encryption

Common audit queries saved as named queries:

  1. All rejected traffic to CDE subnets (identifies potential attacks)
  2. Traffic from specific IP to payment application servers (incident investigation)
  3. Unusual port access patterns (anomaly detection)
  4. Data transfer volumes by source/destination (data exfiltration detection)
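
To give a feel for the first saved query, here is a hedged sketch of a helper that builds the SQL. The table name, the vpc_id partition column name, the string-typed partition columns, and the CDE address prefix are all assumptions for illustration. Adjust them to whatever your Glue table actually exposes.

```python
from datetime import date


def rejected_traffic_query(table: str, day: date, vpc_id: str,
                           cde_prefix: str) -> str:
    """Athena SQL for rejected flows into a CDE address range on one day."""
    return (
        "SELECT srcaddr, dstaddr, dstport, protocol, COUNT(*) AS attempts\n"
        f"FROM {table}\n"
        # Partition predicates come first so Athena reads one day of one VPC.
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'\n"
        f"  AND vpc_id = '{vpc_id}'\n"
        "  AND action = 'REJECT'\n"
        f"  AND dstaddr LIKE '{cde_prefix}%'\n"
        "GROUP BY srcaddr, dstaddr, dstport, protocol\n"
        "ORDER BY attempts DESC"
    )


print(rejected_traffic_query("flow_logs", date(2024, 3, 5), "vpc-0abc", "10.20."))
```

The values are interpolated directly into the SQL text, so this is only suitable for trusted, validated inputs such as a fixed list of VPC IDs. The important part is that every query filters on the partition columns, which is what keeps the scanned data small.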

Query performance: Average audit query completes in 8-12 seconds scanning 2-5GB of data.

PCI Audit Evidence Collection: During quarterly PCI assessments, QSA auditors use Athena to:

  • Verify all network traffic to/from CDE is logged (Requirement 10.2.7)
  • Investigate security incidents by querying specific timeframes
  • Validate network segmentation by confirming no unauthorized traffic flows
  • Generate reports showing rejected connection attempts

Created Athena views that pre-filter for CDE-relevant traffic, making it easy for auditors to run queries without understanding complex VPC Flow Log schema.
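
A pre-filtered view along those lines might be generated like this. It is a sketch under assumptions: the view and table names are made up, and it identifies CDE traffic by simple address prefixes, which only works if your CDE subnets line up with octet boundaries.

```python
def cde_view_ddl(view: str, table: str, cde_prefixes: list[str]) -> str:
    """CREATE OR REPLACE VIEW statement keeping flows that touch a CDE range."""
    if not cde_prefixes:
        raise ValueError("at least one CDE prefix is required")
    clauses = [
        f"srcaddr LIKE '{p}%' OR dstaddr LIKE '{p}%'" for p in cde_prefixes
    ]
    predicate = "\n   OR ".join(clauses)
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT *\nFROM {table}\n"
        f"WHERE {predicate}"
    )
```

Because the view selects every column, the partition columns stay available, so auditors can still add a date filter and get partition pruning.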

Automation and Monitoring:

  • CloudWatch Events rule triggers a Lambda function when a new VPC is created, automatically enabling Flow Logs
  • SNS notification if Flow Log delivery to S3 fails
  • CloudWatch dashboard showing log delivery metrics and S3 bucket size
  • Monthly cost allocation report for flow logs infrastructure
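
The auto-enable Lambda from the first bullet could look roughly like the sketch below. It assumes the rule matches the CloudTrail-recorded CreateVpc API call, so the new VPC ID sits under detail.responseElements.vpc.vpcId. The bucket ARN is a placeholder, and the optional ec2 argument exists only so the handler can be exercised without AWS access.

```python
def vpc_id_from_event(event: dict) -> str:
    """Pull the new VPC ID out of a CloudTrail-sourced CreateVpc event."""
    return event["detail"]["responseElements"]["vpc"]["vpcId"]


def handler(event, context, ec2=None,
            bucket_arn="arn:aws:s3:::example-flow-logs/vpc-flow-logs/"):
    """Enable ALL-traffic flow logs to S3 for the VPC named in the event."""
    if ec2 is None:
        import boto3  # available by default in the Lambda Python runtime
        ec2 = boto3.client("ec2")
    vpc_id = vpc_id_from_event(event)
    response = ec2.create_flow_logs(
        ResourceIds=[vpc_id],
        ResourceType="VPC",
        TrafficType="ALL",
        LogDestinationType="s3",
        LogDestination=bucket_arn,   # hypothetical central bucket
        MaxAggregationInterval=60,
    )
    return {"vpcId": vpc_id, "flowLogIds": response.get("FlowLogIds", [])}
```

In a multi-account setup the function also needs permission to write to the central bucket, which is handled by IAM and the bucket policy rather than in the code.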

Cost Analysis: Monthly costs for 12 VPCs with ~800GB flow logs:

  • VPC Flow Logs publishing: $360 (12 VPCs × $30/VPC)
  • S3 storage (tiered): $285
  • Cross-region replication: $120
  • Glue Crawler and ETL: $85
  • Athena queries: $40 (during audit periods)

Total: ~$890/month
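
For anyone who wants to plug in their own numbers, the breakdown above adds up as follows. The figures are the ones from the list. The dictionary layout is just for illustration.

```python
monthly_costs = {
    "VPC Flow Logs publishing": 12 * 30,  # 12 VPCs x $30/VPC
    "S3 storage (tiered)": 285,
    "Cross-region replication": 120,
    "Glue Crawler and ETL": 85,
    "Athena queries": 40,                 # during audit periods
}

total = sum(monthly_costs.values())
publishing_share = monthly_costs["VPC Flow Logs publishing"] / total

print(f"total: ${total}/month")                        # total: $890/month
print(f"publishing share: {publishing_share:.0%}")     # publishing share: 40%
```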

Lessons Learned:

  1. Enable Flow Logs early - historical data cannot be retroactively generated
  2. Parquet conversion is essential for reasonable Athena query costs and performance
  3. Object Lock compliance mode requires careful planning - retention cannot be shortened
  4. Custom log format provides better audit evidence than default format
  5. Partition projection in Athena eliminates need for Glue Crawler in some scenarios
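
On point 5, a sketch of the table properties that partition projection would need for the prefix layout described earlier. The bucket name, VPC IDs, year range, and the vpc_id column name are assumptions. The property keys are the standard Athena projection settings.

```python
def projection_properties(bucket: str, vpc_ids: list[str],
                          first_year: int, last_year: int) -> dict:
    """Athena partition projection settings for year/month/day/vpc-id prefixes."""
    return {
        "projection.enabled": "true",
        "projection.year.type": "integer",
        "projection.year.range": f"{first_year},{last_year}",
        "projection.month.type": "integer",
        "projection.month.range": "1,12",
        "projection.month.digits": "2",       # zero-padded, e.g. 03
        "projection.day.type": "integer",
        "projection.day.range": "1,31",
        "projection.day.digits": "2",
        "projection.vpc_id.type": "enum",
        "projection.vpc_id.values": ",".join(vpc_ids),
        "storage.location.template": (
            f"s3://{bucket}/vpc-flow-logs/"
            "year=${year}/month=${month}/day=${day}/vpc-id=${vpc_id}/"
        ),
    }


def tblproperties_clause(props: dict) -> str:
    """Render the settings as a TBLPROPERTIES clause for CREATE/ALTER TABLE."""
    pairs = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"TBLPROPERTIES (\n  {pairs}\n)"
```

With projection enabled, Athena computes partition locations from the template at query time, so new days and VPCs become queryable without waiting for the daily crawler run. The trade-off is that the enum list of VPC IDs has to be kept up to date.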

PCI Audit Outcome: Our QSA validated that the implementation fully satisfies PCI DSS Requirements 10.2.7 (audit trail for all access to network resources and cardholder data) and 10.5 (secure audit trails). The immutability provided by S3 Object Lock and the queryability provided by Athena significantly streamlined the audit process. What previously took 2 days of manual log analysis now takes 2 hours of running pre-built Athena queries.

This architecture successfully addresses all three focus areas: centralized VPC Flow Logs aggregation provides comprehensive network visibility, S3 Object Lock ensures immutability for compliance evidence, and Athena enables efficient audit queries for PCI assessments.

Performance optimization is all about partitioning. Use a Glue crawler to automatically detect partitions by date. Convert logs to Parquet format using a Glue ETL job - this reduced our query times from minutes to seconds. Also create views in Athena for common audit queries so the QSA doesn’t need to write complex SQL every time.

We enabled ALL traffic logging - no sampling or filtering. PCI auditors want complete visibility. Volume is significant (about 800GB/month across all VPCs) but S3 storage costs are manageable. We used S3 Intelligent-Tiering to automatically move older logs to cheaper storage tiers. The key was the partitioning strategy in Athena: logs are organized by date and VPC ID, so queries only scan relevant data. A typical audit query scans less than 5GB even when looking at a full day’s traffic.

This is exactly what PCI Requirement 10.3 needs - comprehensive network audit trails. How did you handle the log volume? VPC Flow Logs can generate massive amounts of data in high-traffic environments. Did you implement any filtering or sampling?

Object Lock configuration is critical. We initially used governance mode but our QSA required compliance mode for PCI evidence. Also make sure your bucket policy prevents any deletion or modification. Did you set up a separate S3 bucket for flow logs or combine them with other audit logs?

Separate dedicated bucket for VPC Flow Logs with Object Lock in compliance mode and 7-year retention. The bucket policy denies all delete operations and restricts access to the security team only. Cross-region replication is enabled to a backup region for disaster recovery. Our QSA was satisfied with this immutability setup. We also implemented CloudTrail logging on the S3 bucket itself to track any access attempts.