Serverless batch processing with OCI Functions and Object Storage for ML inference on compute-intensive datasets

We recently migrated our nightly batch processing workload from traditional compute instances to a fully serverless architecture using OCI Functions and Object Storage. The use case involved processing large CSV files (500MB-2GB) that vendors upload daily to our landing bucket.

The implementation leverages Object Storage event triggers to automatically invoke functions when new files arrive. We packaged our Python-based processing logic into container images deployed as OCI Functions, which parse the CSVs, validate data, and write results to a database.
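
In sketch form, the handler looks something like this. It's simplified and not our exact production code: `validate_and_write` is a hypothetical stand-in for our validation and database logic, and the payload fields shown are the ones the Object Storage create event carries.

```python
import csv
import io
import json

import oci
from fdk import response


def validate_and_write(row: dict) -> None:
    """Hypothetical stand-in for the real schema validation + DB insert."""
    if not row:
        raise ValueError("empty row")


def handler(ctx, data: io.BytesIO = None):
    # The Events payload tells us which object landed where.
    details = json.loads(data.getvalue())["data"]
    namespace = details["additionalDetails"]["namespace"]
    bucket = details["additionalDetails"]["bucketName"]
    object_name = details["resourceName"]

    # Resource principal auth: the function's dynamic group needs
    # read access to the landing bucket.
    signer = oci.auth.signers.get_resource_principals_signer()
    storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

    # Stream the CSV rather than buffering the whole file in memory.
    obj = storage.get_object(namespace, bucket, object_name)
    reader = csv.DictReader(io.TextIOWrapper(obj.data.raw, encoding="utf-8"))

    rows = 0
    for row in reader:
        validate_and_write(row)
        rows += 1

    return response.Response(
        ctx, response_data=json.dumps({"object": object_name, "rows": rows})
    )
```

Streaming the body is what keeps memory flat even at the 2GB end of our file size range.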

The serverless scaling has been impressive: functions spin up as files arrive, process in parallel, and shut down automatically. We’ve eliminated manual scheduling and significantly reduced operational overhead. Average processing time dropped from 45 minutes to 12 minutes, largely due to parallel execution.

Happy to share our architecture patterns and lessons learned for anyone considering similar serverless batch implementations.

Also curious about monitoring. With traditional VMs we had standard OS metrics and logs. How do you monitor function performance and troubleshoot failures in this serverless setup? Are you using OCI Logging and Monitoring services, or something else?

The chunking approach is smart. How are you managing the Object Storage event triggers? We’ve had issues with duplicate events in the past when using cloud events. Do you have any deduplication logic in your functions?

Good catch - we did implement idempotency handling. Each function checks a processing status table at the start, using the object name and upload timestamp as a composite key. If a record already exists with status 'processing' or 'completed', the function exits gracefully. We also use Object Storage tags to mark processed files.
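
Here's a simplified sketch of one race-safe way to do that claim step: a unique constraint over the composite key turns check-then-insert into a single atomic insert, so two concurrent invocations can't both pass the check. Table name and driver (python-oracledb) are illustrative; adapt to your own database.

```python
import oracledb  # illustrative driver; any DB-API driver works the same way


def try_claim(conn, object_name: str, upload_time: str) -> bool:
    """Claim (object_name, upload_time) for processing, or back off.

    Assumes a unique constraint over the composite key, so a duplicate
    event delivery trips the constraint instead of racing a separate
    SELECT-then-INSERT check.
    """
    try:
        with conn.cursor() as cur:
            cur.execute(
                """INSERT INTO file_processing_status
                       (object_name, upload_time, status)
                   VALUES (:1, :2, 'processing')""",
                [object_name, upload_time],
            )
        conn.commit()
        return True  # we own this file; proceed with parsing
    except oracledb.IntegrityError:
        conn.rollback()
        return False  # already 'processing' or 'completed'; exit gracefully
```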

For the event triggers themselves, we configured Events Service rules that filter on ObjectCreated events with a prefix match for our landing bucket path. The rule invokes the function through an action. We haven’t seen duplicate processing since adding the idempotency checks, though we do occasionally see duplicate event deliveries logged.
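
For reference, the rule condition is plain JSON along these lines (bucket name and object prefix are placeholders, and the wildcard is what gives us the prefix match):

```python
import json

# Illustrative Events rule condition: match object-create events in the
# landing bucket whose name falls under the incoming/ prefix. Paste the
# resulting JSON into the rule's condition in advanced mode.
condition = json.dumps(
    {
        "eventType": ["com.oraclecloud.objectstorage.createobject"],
        "data": {
            "additionalDetails": {"bucketName": "vendor-landing"},
            "resourceName": "incoming/*.csv",
        },
    },
    indent=2,
)
print(condition)
```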

This sounds like exactly what we need! We’re currently running cron jobs on compute instances for similar file processing. A few questions: How did you handle the container packaging for OCI Functions? Did you use the Fn Project CLI or Docker directly? Also, what’s the maximum execution time you’ve seen for a single function invocation?