Introduction
Agentic AI enables businesses to unlock massive amounts of value for their customers. In some cases, that means solving well-understood engineering problems so that agents can focus on the harder learning problems. For example, Aampe enables businesses to provision and manage an agentic learning agent for every one of its customers. Those agents need to process massive volumes of events in real time while keeping costs under control.
To make that possible, our agents operate on top of an ELT (Extract, Load, Transform) pipeline that scales to billions of events and users without sacrificing performance or cost efficiency. In this post, we'll dive into our architecture, our cost optimization strategies, and how we've achieved this scale.
The Challenge: Scale Meets Cost Efficiency
When processing events at scale, four primary costs dominate:
Ingress Costs: The cost of data entering cloud services (often $0 initially, but can add up)
Egress Costs: The cost of data leaving customer systems (can be significant)
Storage Costs: The cost of storing and querying that data over time
Compute Costs: The cost of processing and transforming data
Our Architecture
Apache Beam: Distributed Processing at Scale
We leverage Apache Beam running on Google Cloud Dataflow to process events in a distributed, scalable manner. This allows us to:
Process multiple customers in parallel: Each customer's data pipeline runs independently
Handle varying data volumes: From small customers with thousands of events to enterprise customers with millions
Scale workers dynamically: Configure worker counts (10-100 workers) based on data volume and customer needs
Process multiple file formats: Parquet, NDJSON, Avro, CSV - all handled efficiently
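As a rough illustration, here is a minimal sketch of what one such Beam job can look like. The project, bucket, and table names are illustrative, not Aampe's actual configuration, and the real pipelines configure many more options (formats, schemas, per-customer autoscaling limits).

```python
# Minimal sketch of a batch Beam job on Dataflow (all names are illustrative).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="aampe-ingestion",                  # hypothetical project ID
    region="us-central1",
    temp_location="gs://customer-staging/tmp",  # per-customer staging bucket
    max_num_workers=100,                        # autoscaling cap, tuned per customer volume
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadParquet" >> beam.io.ReadFromParquet("gs://customer-bucket/events/*.parquet")
        | "Normalize" >> beam.Map(lambda row: {**row, "source": "gcs"})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "aampe-ingestion:staging.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            # staging table is pre-created with partitioning and clustering (see below)
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```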
Multi-Source Ingestion
Our pipeline supports ingestion from multiple sources:
S3 buckets (with cross-account role assumption)
Google Cloud Storage
Snowflake (direct query-based extraction)
BigQuery (cross-project data pulls)
Streaming sources (via staging tables)
This flexibility allows us to work with customers regardless of their existing data infrastructure.
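To make this concrete, a hypothetical per-customer source configuration might look like the sketch below. The structure and field names are illustrative only, not Aampe's actual schema.

```python
# Hypothetical per-customer source configuration (field names are illustrative).
CUSTOMER_SOURCES = {
    "customer_a": {
        "type": "s3",
        "bucket": "customer-a-events",
        "role_arn": "arn:aws:iam::123456789012:role/aampe-reader",  # cross-account role
        "format": "parquet",
    },
    "customer_b": {
        "type": "bigquery",
        "table": "customer-b-project.analytics.events",
        "billing_project": "aampe-ingestion",  # queries billed to Aampe (see below)
    },
    "customer_c": {
        "type": "snowflake",
        "query": "SELECT * FROM events WHERE event_date >= :start_date",
        "data_change_window_days": 1,          # incremental pull window
    },
}
```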
Cost Optimization Strategies
1. Eliminating Data Duplication: Direct Access Pattern
Customer Service Account Pattern: When ingesting from customer GCS buckets, we use customer service accounts to read data directly without copying it to our buckets first:
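The sketch below shows roughly what that direct read path looks like, assuming a read-only service account key provided by the customer. The bucket, prefix, and key-file names are illustrative.

```python
# Sketch: read directly from a customer-owned GCS bucket with a customer-provided,
# read-only service account, instead of copying objects into an Aampe bucket first.
from google.cloud import storage
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    "customer_reader_sa.json",  # illustrative key file for a read-only customer SA
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"],
)
client = storage.Client(project="customer-project", credentials=creds)

bucket = client.bucket("customer-events-bucket")
for blob in bucket.list_blobs(prefix="events/dt=2024-01-01/"):
    data = blob.download_as_bytes()  # processed in the pipeline, never re-stored by us
    ...
```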
Benefits:
No data duplication: We read directly from customer buckets, avoiding the need to copy data to Aampe's buckets
Reduced storage costs: We don't maintain duplicate copies of customer data in our buckets
Faster processing: Eliminates the intermediate copy step
Note: While GCS ingress is free, the real savings come from avoiding duplicate storage
Cost Impact:
Traditional approach: Copying 1TB from a customer bucket to an Aampe bucket adds ~$23/month in storage for the duplicate data
Direct access: $0 in duplicate storage, since we read the customer's objects in place
Additional Benefits:
No ingress operations: While GCS ingress is free, we avoid unnecessary data transfer operations
Allows us to pull only a subset of the data, as described in the next section
2. Minimizing Egress Costs: Incremental Data Pulling
Incremental Extraction: Instead of pulling full tables every day, we use configurable time windows to pull only recent data. This applies to Snowflake, S3, and other data sources where customers still pay egress costs:
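As a simplified illustration, the extraction query can be built from a data_change_window parameter along the lines of the sketch below (the table and column names are illustrative):

```python
# Sketch: build an incremental extraction query from a data_change_window parameter.
from datetime import date, timedelta

def build_extraction_query(table: str, data_change_window: int, first_run: bool) -> str:
    """Full scan on the first run, then only the last `data_change_window` days."""
    if first_run:
        return f"SELECT * FROM {table}"
    start = date.today() - timedelta(days=data_change_window)
    # Filtering on the date/partition column lets the source prune partitions too.
    return f"SELECT * FROM {table} WHERE event_date >= '{start.isoformat()}'"

query = build_extraction_query("analytics.events", data_change_window=1, first_run=False)
```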
Key Features:
Configurable windows: data_change_window parameter (typically 0-7 days)
Partition-aware queries: Only query relevant partitions
Full load on first run: Initial load pulls everything, subsequent runs are incremental
Applies to all sources: Snowflake, S3, GCS, and other external data sources
Important Note:
For Snowflake, S3, and other sources: Customers still pay egress costs, but we minimize them through incremental pulls
For BigQuery sources: Egress is free (see Billing Project Pattern section below) because BigQuery-to-BigQuery transfers within the same region have no egress charges
Cost Impact for Snowflake/S3 Sources:
Full table pull: Pulling a 1TB table costs ~$120 in egress each time (at $0.12/GB)
Incremental pull (7-day window): Pulling only the most recent ~50GB of data costs ~$6 each time
Example: A customer with a 500GB events table in Snowflake:
Daily full pull: 500GB × 30 days = 15TB/month = ~$1,800/month in egress
Daily incremental (1 day): ~17GB/day × 30 = 510GB/month = ~$61/month
For BigQuery Sources: Egress costs are $0 because BigQuery-to-BigQuery transfers within the same region are free, and we use the billing project pattern (see next section).
3. Billing Project Pattern: Absorbing Query Costs
Aampe as Billing Project: When pulling data from customer BigQuery projects, we use customer credentials but set Aampe's project as the billing project:
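A minimal sketch of this pattern with the google-cloud-bigquery client is shown below. The project, dataset, and key-file names are illustrative, and it assumes the customer's service account has been granted permission to run jobs in Aampe's project.

```python
# Sketch of the billing-project pattern: authenticate as the customer's service
# account, read the customer's tables, but bill query processing to Aampe's project.
from google.cloud import bigquery
from google.oauth2 import service_account

customer_creds = service_account.Credentials.from_service_account_file("customer_bq_sa.json")

client = bigquery.Client(
    project="aampe-ingestion",    # billing project: query costs land here, not on the customer
    credentials=customer_creds,   # data access is still governed by the customer's IAM
)

job = client.query(
    "SELECT * FROM `customer-project.analytics.events` "
    "WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)"
)
rows = job.result()
```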
How It Works:
We authenticate using customer's service account credentials
We query customer's BigQuery tables (with their permission)
But the query costs are billed to Aampe's project, not the customer's
Benefits for Customers:
Zero query costs: Customers don't pay for queries we run on their data
Zero egress costs: BigQuery-to-BigQuery transfers within the same region are free (unlike Snowflake/S3 where egress costs apply)
Transparent pricing: Customers only see storage costs, not processing costs
Key Distinction:
BigQuery sources: $0 egress costs (free BQ-to-BQ transfers) + $0 query costs (billed to Aampe)
Snowflake/S3 sources: Customers pay egress costs, but we minimize them through incremental pulls (95%+ savings)
Cost Impact: For a customer whose daily pipeline queries scan ~1TB in total:
Customer pays: $0 (queries billed to Aampe)
Traditional approach: At BigQuery on-demand pricing (roughly $5 per TB scanned), the customer would pay ~$5/day = ~$150/month in query costs
Customer savings: ~$150/month
Note: This pattern works because:
We have proper IAM permissions via customer service accounts
The billing project is just an accounting mechanism - data access is still controlled by customer IAM
4. Strategic BigQuery Partitioning
Daily Time Partitioning: Every table in our system uses daily partitioning, which provides several benefits:
Query Cost Reduction: Queries can scan only relevant partitions instead of entire tables
Efficient Data Management: Easy to drop old partitions or backfill specific date ranges
Better Performance: BigQuery can parallelize queries across partitions
Impact: For a table with 180 days of data, a query targeting a single day scans only 0.55% of the data, resulting in ~99.45% cost reduction for date-filtered queries.
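For illustration, this is roughly how a daily-partitioned staging table can be declared with the BigQuery client; the table name and schema are illustrative.

```python
# Sketch: create a daily-partitioned staging table (names and schema are illustrative).
from google.cloud import bigquery

client = bigquery.Client(project="aampe-ingestion")

table = bigquery.Table(
    "aampe-ingestion.staging.customer_a_events",
    schema=[
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("event_name", "STRING"),
        bigquery.SchemaField("event_timestamp", "TIMESTAMP"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_timestamp",  # one partition per day of event time
)
client.create_table(table, exists_ok=True)
```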
5. Intelligent Clustering
We strategically cluster tables on frequently queried fields:
Event tables: Clustered on event_name, event_type, or user_id
Attribute tables: Clustered on user_id for fast user lookups
Impact: Clustering can reduce query costs by 50-90% for queries that filter on clustered columns, as BigQuery can skip entire blocks of data.
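As a sketch, clustering can be declared at table creation or added later through the same client; the table and column names below are illustrative.

```python
# Sketch: add clustering on frequently filtered columns to an existing staging table.
from google.cloud import bigquery

client = bigquery.Client(project="aampe-ingestion")

table = client.get_table("aampe-ingestion.staging.customer_a_events")
table.clustering_fields = ["event_name", "user_id"]  # most common filter columns
client.update_table(table, ["clustering_fields"])

# Queries that filter on these columns can now skip entire blocks, e.g.:
#   SELECT user_id
#   FROM `aampe-ingestion.staging.customer_a_events`
#   WHERE event_name = 'purchase'
#     AND DATE(event_timestamp) = CURRENT_DATE()
```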
6. Automated Data Lifecycle Management
180-Day Expiration Policy: All staging tables automatically expire after 180 days:
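One way to enforce this is a dataset-level default that every new staging table inherits; the sketch below assumes an illustrative dataset name.

```python
# Sketch: set a 180-day default expiration on the staging dataset so every new
# table inherits it automatically (dataset name is illustrative).
from google.cloud import bigquery

client = bigquery.Client(project="aampe-ingestion")

dataset = client.get_dataset("aampe-ingestion.staging")
dataset.default_table_expiration_ms = 180 * 24 * 60 * 60 * 1000  # 180 days
client.update_dataset(dataset, ["default_table_expiration_ms"])
```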
Benefits:
Automatic cleanup: No manual intervention needed
Cost control: Prevents unbounded storage growth
Compliance: Helps meet data retention requirements
Cost Impact: For a customer generating 1TB/month, data older than 180 days would otherwise keep accumulating, adding roughly $20/month in storage cost for every additional month retained. The expiration policy caps storage at a rolling 180-day window and prevents unbounded growth.
Scaling Patterns
Per-Customer Isolation
Each customer has:
Dedicated service accounts: Isolated IAM and permissions
Separate staging buckets: Per-customer GCS buckets for Dataflow staging
Independent pipelines: Customer-specific DAGs that can scale independently
This architecture allows us to:
Scale individual customers without affecting others
Apply customer-specific optimizations
Handle failures in isolation
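As a rough sketch of how per-customer pipelines can be generated from a single factory (Airflow-style; the customer list, schedule, and task body are illustrative):

```python
# Sketch: one independent DAG per customer, generated from a simple factory loop.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

CUSTOMERS = ["customer_a", "customer_b", "customer_c"]  # illustrative

def run_ingestion(customer: str, **_):
    # In the real pipeline this would launch the customer's Dataflow job
    # with customer-specific configuration.
    print(f"Ingesting for {customer}")

for customer in CUSTOMERS:
    with DAG(
        dag_id=f"ingestion_{customer}",  # isolated, customer-specific DAG
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="ingest_events",
            python_callable=run_ingestion,
            op_kwargs={"customer": customer},
        )
    globals()[dag.dag_id] = dag  # register each generated DAG with the scheduler
```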
Parallel Processing Architecture
Our ingestion pipeline processes multiple data types in parallel:
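A simplified Beam sketch of this fan-out is shown below; the paths and formats are illustrative, and in the real pipeline each branch writes to its own staging table.

```python
# Sketch: one Beam pipeline fanning out into independent branches per data type.
import json
import apache_beam as beam

with beam.Pipeline() as p:
    # Branch 1: event data arrives as Parquet.
    _ = (
        p
        | "ReadEvents" >> beam.io.ReadFromParquet("gs://customer-bucket/events/*.parquet")
        | "TagEvents" >> beam.Map(lambda row: ("event", row))
    )
    # Branch 2: user attributes arrive as NDJSON, parsed line by line.
    _ = (
        p
        | "ReadAttributes" >> beam.io.ReadFromText("gs://customer-bucket/attributes/*.ndjson")
        | "ParseAttributes" >> beam.Map(json.loads)
    )
    # The runner executes both branches concurrently within the same job.
```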
Benefits:
Reduced total processing time
Better resource utilization
Independent scaling per data type
Scaling Linearly
Our architecture scales linearly with data volume:
10x more events → ~10x processing time (not 100x)
10x more customers → ~10x total cost (not exponential)
Parallel processing ensures we can handle spikes without over-provisioning
Conclusion
By combining strategic BigQuery optimizations (partitioning, clustering, expiration), efficient batch processing with Apache Beam, and a scalable multi-customer architecture, Aampe’s agents can handle massive scale while keeping costs predictable and manageable, both for us and for our customers.