Benchmark Purpose: Evaluate long-running pipeline stability and measure resource utilization with Spark Master.
Pipeline Window: 2025-06-11 22:08:46 to 2025-06-13 15:23:17
Total Records Processed: 2,020,000
Batch Size: 10,000 records
API Page Size: 30 records
ETL Host Instance: EC2 t3.large (7.56 GB usable RAM)
Peak RAM Usage: ~6 GB (consistent throughout the run)
Dockerized Infrastructure: Yes
Total Docker Containers: 10
Total Duration: 1 day, 17 hours, 14 minutes, 31 seconds
Data Source: Public EC2-hosted API (Ubuntu) - Free Tier
Data Target: PostgreSQL 15 (Aiven)
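As a sanity check on the first record, the wall-clock window can be recomputed from the start and end timestamps and compared with the reported Total Duration. The same values also give the number of ETL batches and API page requests the run implies. This is a minimal sketch that assumes every batch except possibly the last is full and that the API always returns full pages of 30 records. The batch and page counts are derived figures, not values logged by the pipeline.

```python
import math
from datetime import datetime, timedelta

# Values copied from the first benchmark record (with Spark Master).
START = datetime(2025, 6, 11, 22, 8, 46)
END = datetime(2025, 6, 13, 15, 23, 17)
RECORDS = 2_020_000
BATCH_SIZE = 10_000
PAGE_SIZE = 30

# Wall-clock time between the first and last timestamp.
elapsed = END - START
# Reported total duration: 1 day, 17 hours, 14 minutes, 31 seconds.
reported = timedelta(days=1, hours=17, minutes=14, seconds=31)

# Assumption: only the final batch/page may be partially filled.
batches = math.ceil(RECORDS / BATCH_SIZE)
api_pages = math.ceil(RECORDS / PAGE_SIZE)

print(elapsed, elapsed == reported)
print(f"{batches} batches, {api_pages} API pages")
```

The timestamps and the reported duration agree (148,471 seconds). At 30 records per page, the run implies on the order of 67,000 API requests. That makes the page size, rather than the batch size, the likely driver of request volume against the Free Tier API.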

Benchmark Purpose: Test short-job ETL performance without Spark Master.
Pipeline Window: 2025-06-16 19:50:34 to 2025-06-17 00:46:27
Total Records Processed: 900,000
Batch Size: 100,000 records
API Page Size: 30 records
ETL Host Instance: EC2 t3.large (7.56 GB usable RAM)
Peak RAM Usage: ~5 GB (consistent throughout the run)
Dockerized Infrastructure: Yes
Total Docker Containers: 6
Total Duration: 4 hours, 55 minutes, 53 seconds
Data Source: Public EC2-hosted API (Ubuntu) - Free Tier
Data Target: PostgreSQL 15 (Aiven)
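To put the two runs side by side, average end-to-end throughput can be derived from each record's processed-record count and reported Total Duration. This is a rough sketch. It divides records by the whole wall-clock window, so it includes any API wait time and startup overhead. The function name `records_per_second` is illustrative only.

```python
from datetime import timedelta

# (records processed, reported total duration) from the two benchmark records.
RUNS = {
    "with Spark Master": (
        2_020_000,
        timedelta(days=1, hours=17, minutes=14, seconds=31),
    ),
    "without Spark Master": (
        900_000,
        timedelta(hours=4, minutes=55, seconds=53),
    ),
}


def records_per_second(records: int, duration: timedelta) -> float:
    """Average throughput over the full wall-clock window of a run."""
    return records / duration.total_seconds()


for name, (records, duration) in RUNS.items():
    print(f"{name}: {records_per_second(records, duration):.2f} records/s")
```

These figures work out to roughly 13.6 and 50.7 records per second. The runs also differ in batch size (10,000 vs. 100,000), container count (10 vs. 6), and total volume, so the gap should not be attributed to Spark Master alone.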