Scaling Guide
This guide covers strategies for scaling Milvaion horizontally and vertically to handle increasing workloads.
Scaling Overview
Milvaion is designed for horizontal scaling:
| Component | Scaling Type | Strategy |
|---|---|---|
| API Server | Horizontal | Add more instances behind load balancer |
| Workers | Horizontal | Add more instances per job type |
| PostgreSQL | Vertical / Read replicas | Larger instance or read replicas |
| Redis | Vertical / Cluster | Larger instance or Redis Cluster |
| RabbitMQ | Horizontal | Clustering with replicated queues |
Scaling Workers
Workers are stateless unless you deliberately make them otherwise, so scaling them is straightforward: run more instances.
Basic Scaling
```bash
# Docker Compose - scale to 5 instances
docker compose up -d --scale email-worker=5

# Kubernetes - scale deployment
kubectl scale deployment email-worker --replicas=5

# Kubernetes HPA (automatic)
kubectl autoscale deployment email-worker --min=2 --max=10 --cpu-percent=70
```
Capacity Planning
Jobs per second = Workers × MaxParallelJobs × (1 / AvgJobDuration)
Example:
- 3 workers
- MaxParallelJobs: 10
- Average job duration: 2 seconds
Throughput = 3 × 10 × (1 / 2) = 15 jobs/second = 900 jobs/minute
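To sanity-check the numbers for your own workload, a quick back-of-the-envelope calculation helps; the values in this shell sketch simply mirror the example above:

```bash
# Rough throughput estimate: workers * MaxParallelJobs / average job duration (seconds)
awk 'BEGIN {
  workers = 3; max_parallel = 10; avg_duration = 2
  jps = workers * max_parallel / avg_duration
  printf "%.1f jobs/second (%.0f jobs/minute)\n", jps, jps * 60
}'
```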
Specialized Workers
Create job-specific workers for resource optimization:
Email Worker (I/O-bound, high concurrency)
```json
{
  "Worker": {
    "WorkerId": "email-worker",
    "MaxParallelJobs": 100
  }
}
```
Report Worker (CPU-bound, low concurrency)
```json
{
  "Worker": {
    "WorkerId": "report-worker",
    "MaxParallelJobs": 4
  }
}
```
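If both worker types are built from the same image, the per-type settings can also be supplied at deploy time through environment variables instead of separate appsettings files. A minimal sketch, assuming a hypothetical worker image name and the standard .NET `Section__Key` override convention:

```bash
# Image name is hypothetical; Worker__* variables use .NET's "__" section separator
docker run -d --name email-worker \
  -e Worker__WorkerId=email-worker \
  -e Worker__MaxParallelJobs=100 \
  milvasoft/milvaion-worker:latest

docker run -d --name report-worker \
  -e Worker__WorkerId=report-worker \
  -e Worker__MaxParallelJobs=4 \
  milvasoft/milvaion-worker:latest
```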
Worker Affinity
Route specific jobs to specific worker pools:
```
                +---------------------+
                |      RabbitMQ       |
                |   Topic Exchange    |
                +---------------------+
                           |
         +-----------------+-----------------+
         |                 |                 |
    sendemail.*        report.*        migration.*
         |                 |                 |
   +-----------+     +-----------+     +------------+
   |   Email   |     |  Report   |     | Migration  |
   |  Workers  |     |  Workers  |     |   Worker   |
   |   (x10)   |     |   (x2)    |     |   (x1)     |
   +-----------+     +-----------+     +------------+
```
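The routing itself is plain AMQP topic routing: each worker pool consumes from its own queue, and that queue is bound to the topic exchange with the matching routing-key pattern. A sketch with `rabbitmqadmin`, where the exchange and queue names are illustrative rather than Milvaion's actual defaults:

```bash
# Bind one queue per worker pool to the topic exchange (names are illustrative)
rabbitmqadmin declare queue name=jobs.email durable=true
rabbitmqadmin declare binding source=milvaion.jobs destination=jobs.email routing_key="sendemail.*"

rabbitmqadmin declare queue name=jobs.report durable=true
rabbitmqadmin declare binding source=milvaion.jobs destination=jobs.report routing_key="report.*"
```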
Scaling API Server
Horizontal Scaling
The API is stateless – scale by adding instances:
```yaml
# docker-compose.yml
services:
  milvaion-api:
    image: milvasoft/milvaion-api:latest
    deploy:
      replicas: 3
```
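As with workers, an already-running Compose stack can also be scaled explicitly; the service name below matches the snippet above:

```bash
# Scale the API service to three instances behind the load balancer
docker compose up -d --scale milvaion-api=3
```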
Load Balancer Configuration
NGINX example:
```nginx
upstream milvaion_api {
    least_conn;
    server api-1:5000;
    server api-2:5000;
    server api-3:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://milvaion_api;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```
Note: SignalR requires WebSocket support, so forward the `Upgrade` and `Connection` headers as shown above.
Dispatcher Leader Election
Only one dispatcher should be active. Milvaion uses Redis distributed locking:
1. Each API instance attempts to acquire dispatcher lock
2. Lock winner runs dispatcher service
3. Other instances skip dispatch (passive standby)
4. If leader fails, lock expires, another instance takes over
Configure lock TTL:
```json
{
  "MilvaionConfig": {
    "JobDispatcher": {
      "LockTtlSeconds": 600
    }
  }
}
```
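The underlying pattern is an atomic `SET` with `NX` (only set if the key is absent) and an expiry matching the configured TTL. A minimal redis-cli sketch; the key name here is hypothetical, not Milvaion's actual lock key:

```bash
# Try to become the dispatcher leader; succeeds only if no one holds the lock
redis-cli SET milvaion:dispatcher:lock "$(hostname)" NX EX 600
# Replies OK for the winner and (nil) for standby instances;
# the leader must refresh the key before the 600 s TTL expires.
```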
Scaling Infrastructure
PostgreSQL
Vertical Scaling (Primary):
| Workload | vCPU | RAM | Storage |
|---|---|---|---|
| Small | 2 | 4 GB | 50 GB SSD |
| Medium | 4 | 8 GB | 100 GB SSD |
| Large | 8 | 16 GB | 500 GB SSD |
Read Replicas:
For heavy read workloads (dashboard, reports), add read replicas.
Redis
Vertical Scaling:
| Workload | RAM | Notes |
|---|---|---|
| Small | 1 GB | Single instance |
| Medium | 4 GB | Single instance with persistence |
| Large | 8+ GB | Redis Cluster or Redis Sentinel |
Key Capacity Planning:
- ~1 KB per scheduled job
- ~100 bytes per worker heartbeat
- ~500 bytes per distributed lock
Example: 10,000 active jobs ≈ 10 MB
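You can verify the estimate against a live instance with standard redis-cli commands:

```bash
# Overall memory usage and the largest keys currently held by Redis
redis-cli INFO memory | grep used_memory_human
redis-cli --bigkeys
```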
RabbitMQ
Clustering:
For high availability, use RabbitMQ clustering with quorum queues.
Queue Capacity:
- ~500 bytes per queued job message
- 100,000 pending jobs ≈ 50 MB
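To check actual queue depth and memory use on a running broker:

```bash
# Current depth (messages) and memory footprint of each queue
rabbitmqctl list_queues name messages memory
```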
Throughput Benchmarks
Reference Numbers
Tested on: 4 vCPU, 8 GB RAM, PostgreSQL / Redis / RabbitMQ on same machine
| Scenario | Jobs/sec | Workers | Concurrency |
|---|---|---|---|
| Simple logging job | 500 | 1 | 50 |
| API call (100 ms) | 100 | 1 | 10 |
| Database insert | 200 | 1 | 20 |
| Email send (500 ms) | 20 | 1 | 10 |
Throughput scales roughly linearly with the number of workers:
- 10 workers × 20 jobs/sec = 200 jobs/sec
Bottleneck Identification
| Symptom | Likely Bottleneck | Solution |
|---|---|---|
| High API CPU | Too many dashboard polls | Add API replicas |
| Redis high latency | Too many ZSET operations | Redis Cluster |
| RabbitMQ queue depth growing | Workers too slow | Add workers |
| PostgreSQL high CPU | Too many occurrence inserts | Read replicas, partitioning |
| Worker high memory | Job data too large | Optimize job payloads |
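A few standard commands cover most of the table above when you need to confirm where the pressure actually is; the examples assume Kubernetes and direct access to the backing services:

```bash
# CPU / memory pressure per pod
kubectl top pods

# Growing queue depth -> workers are too slow
rabbitmqctl list_queues name messages

# Busiest PostgreSQL statements (requires the pg_stat_statements extension)
psql -c "SELECT query, calls, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;"

# Redis latency spikes
redis-cli --latency
```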
Concurrency Policies
Per-Job Concurrency
```json
{
  "concurrentExecutionPolicy": 0
}
```
| Value | Policy | Behavior |
|---|---|---|
| 0 | Skip | Do not create occurrence if already running |
| 1 | Queue | Create occurrence, wait for previous to complete |
Worker-Level Concurrency
```json
{
  "Worker": {
    "MaxParallelJobs": 10
  },
  "JobConsumers": {
    "SendEmailJob": { "MaxParallelJobs": 20 },
    "GenerateReportJob": { "MaxParallelJobs": 2 }
  }
}
```
Auto-Scaling Patterns
Kubernetes HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: email-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: email-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
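Apply and verify with standard kubectl usage:

```bash
kubectl apply -f email-worker-hpa.yaml
# TARGETS shows current vs. target CPU; REPLICAS should move between 2 and 20
kubectl get hpa email-worker-hpa --watch
```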
Queue-Based Scaling (KEDA)
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: email-worker-scaler
spec:
  scaleTargetRef:
    name: email-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs.sendemail.wildcard
        host: amqp://guest:guest@rabbitmq:5672/
        mode: QueueLength
        value: "100"
```
Scaling Checklist
Before Scaling
- Identify the bottleneck (CPU, memory, I/O, network)
- Measure current throughput baseline
- Check infrastructure capacity (connections, disk)
After Scaling
- Verify even load distribution
- Monitor for new bottlenecks
- Update connection pool sizes if needed
- Adjust timeout values for increased load
Connection Limits
| Component | Default Limit | Recommendation |
|---|---|---|
| PostgreSQL | 100 connections | Increase to workers × 2 |
| Redis | 10,000 connections | Usually sufficient |
| RabbitMQ | 65,535 connections | Usually sufficient |
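For PostgreSQL specifically, you can check and raise the limit directly with psql; a server restart is required for `max_connections` to take effect:

```bash
psql -c "SHOW max_connections;"
psql -c "ALTER SYSTEM SET max_connections = 200;"   # requires a server restart
```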
What's Next?
- Monitoring – Metrics and alerting
- Database Maintenance – Cleanup and retention