When I joined the Analytics team at Shopify, our merchants were facing a common challenge: they needed real-time insights into their store's performance during high-traffic events like Black Friday. Here's how we built a system that now processes millions of events daily while maintaining sub-second latency.
The Challenge
Our merchants needed to:
- Monitor sales and conversion rates in real time
- Detect inventory issues immediately
- Track promotional campaign performance
- Identify traffic patterns as they happen
System Architecture
We implemented a three-tier architecture:
- Event Collection Layer
  - WebSocket connections for real-time updates
  - Redis pub/sub for event distribution
  - Event batching for efficiency
- Processing Layer
  - Stream processing with automatic scaling
  - Pre-aggregation for common metrics
  - Cache warming strategies
- Serving Layer
  - GraphQL subscriptions for live updates
  - Multi-level caching
  - Fallback mechanisms
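
To make the batching and pre-aggregation ideas from the list above more concrete, here is a minimal sketch rather than our production code. The `StoreEvent` shape, the `EventBatcher` class, and the `preAggregate` and `conversionRate` helpers are all hypothetical names chosen for illustration, and the sketch assumes only two event types.

```typescript
// Hypothetical event shape; the field names are illustrative only.
interface StoreEvent {
  storeId: string;
  type: "order" | "page_view";
  amountCents?: number; // only set for orders
}

// Running totals kept per store so dashboards never scan raw events.
interface StoreCounters {
  orders: number;
  pageViews: number;
  revenueCents: number;
}

// Buffers events and hands them off in batches once maxSize is reached.
// Call flush() from a timer as well, so a quiet store is not left waiting.
class EventBatcher<T> {
  private buffer: T[] = [];
  private maxSize: number;
  private onFlush: (batch: T[]) => void;

  constructor(maxSize: number, onFlush: (batch: T[]) => void) {
    this.maxSize = maxSize;
    this.onFlush = onFlush;
  }

  add(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.onFlush(batch);
  }
}

// Folds one batch into the per-store counters (the pre-aggregation step).
function preAggregate(
  totals: Map<string, StoreCounters>,
  batch: StoreEvent[],
): void {
  for (const event of batch) {
    let counters = totals.get(event.storeId);
    if (!counters) {
      counters = { orders: 0, pageViews: 0, revenueCents: 0 };
      totals.set(event.storeId, counters);
    }
    if (event.type === "order") {
      counters.orders += 1;
      counters.revenueCents += event.amountCents ?? 0;
    } else {
      counters.pageViews += 1;
    }
  }
}

// Conversion rate as orders per page view; 0 when there is no traffic yet.
function conversionRate(c: StoreCounters): number {
  return c.pageViews === 0 ? 0 : c.orders / c.pageViews;
}
```

Wiring the two together is a matter of passing `(batch) => preAggregate(totals, batch)` as the flush callback. Because the serving layer reads the small per-store counters instead of raw events, a dashboard query stays cheap no matter how many events arrive.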
Performance Optimizations
The biggest challenge was maintaining performance during peak traffic. One key optimization was a cache-aside lookup in front of our pre-aggregated data:
// Example of our caching strategy
// `redis` and `db` are assumed to be already-initialized clients
const getStoreSummary = async (storeId: string) => {
  const cacheKey = `store:${storeId}:summary`;
  // Try Redis cache first; a hit comes back as a JSON string
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  // Cache miss: fall back to pre-aggregated data
  const summary = await db.getPreAggregatedSummary(storeId);
  // Warm cache with short TTL (300 seconds) during high traffic
  await redis.setex(cacheKey, 300, JSON.stringify(summary));
  return summary;
};
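
The lookup above has a single cache level. The multi-level caching and fallback behavior mentioned for the serving layer could look roughly like the sketch below, which puts a short-lived in-process cache in front of the remote one and treats a remote cache failure as a miss. The `RemoteCache` interface, the `LocalTtlCache` class, the `getSummaryMultiLevel` function, and the 5-second local TTL are assumptions made for illustration, not our actual implementation.

```typescript
// Minimal async key-value interface; a thin wrapper around a Redis client
// could satisfy it. This interface is an assumption made for the sketch.
interface RemoteCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Level 1: in-process cache with per-entry expiry. The clock is injectable
// so expiry can be exercised without waiting.
class LocalTtlCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

const LOCAL_TTL_MS = 5000; // illustrative value
const REMOTE_TTL_SECONDS = 300; // same TTL as the single-level example

// Looks up a summary in three steps: local memory, remote cache, loader.
async function getSummaryMultiLevel<T>(
  storeId: string,
  local: LocalTtlCache,
  remote: RemoteCache,
  load: (storeId: string) => Promise<T>,
): Promise<T> {
  const key = `store:${storeId}:summary`;

  const localHit = local.get(key);
  if (localHit !== null) return JSON.parse(localHit) as T;

  let remoteHit: string | null = null;
  try {
    remoteHit = await remote.get(key);
  } catch {
    // Fallback: a failing remote cache is treated as a miss, not an error.
  }
  if (remoteHit !== null) {
    local.set(key, remoteHit, LOCAL_TTL_MS);
    return JSON.parse(remoteHit) as T;
  }

  const fresh = await load(storeId);
  const serialized = JSON.stringify(fresh);
  local.set(key, serialized, LOCAL_TTL_MS);
  try {
    await remote.set(key, serialized, REMOTE_TTL_SECONDS);
  } catch {
    // Best effort: the caller still gets fresh data if the write fails.
  }
  return fresh;
}
```

The trade-off is staleness: each process may serve a value up to `LOCAL_TTL_MS` older than what the remote cache holds, so the local TTL should stay well below the remote one.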