Version: 1.0.1

Core Concepts

This page explains the fundamental concepts you need to understand before building with Milvaion.

Architecture Overview

Milvaion consists of four main components:

Milvaion Dashboard

Component Responsibilities

Milvaion API (Scheduler)

The API is responsible for:

Responsibility	How It Works
Job Management	REST endpoints for CRUD operations
Scheduling	Stores jobs in Redis ZSET, sorted by next execution time
Dispatching	Polls Redis, publishes due jobs to RabbitMQ
Status Tracking	Consumes status updates from workers via RabbitMQ
Log Collection	Receives and persists execution logs
Failed Execution Handling	Consumes failed jobs from the DLQ and records them in the database
Zombie Execution Detection	Monitors occurrences that have stopped sending heartbeats (Unknown status) and marks them as zombies
Auto Disabling	Automatically disables jobs that keep failing, based on a configurable threshold
Dashboard	Serves React UI + SignalR for real-time updates

Workers

Workers are separate .NET processes that:

Responsibility	How It Works
Job Execution	Consume messages from RabbitMQ, run `IJob` code
Status Reporting	Publish Running -> Completed/Failed transitions
Log Streaming	Send execution logs via RabbitMQ
Heartbeating	Periodic Redis updates to prove liveness
Retry Handling	Automatic retry with exponential backoff

Infrastructure

Component	Purpose
PostgreSQL	Persistent storage for jobs, occurrences, logs, users, etc.
Redis	Fast scheduling (ZSET), distributed locks, caching, heartbeats
RabbitMQ	Reliable job distribution, status/log queues, DLQ

Key Terms

Job

A job is a recurring or one-time task definition stored in PostgreSQL.

{
  "displayName": "My First Job",
  "description": "This is a test job!",
  "tags": "test,first-job",
  "workerId": "sample-worker",
  "jobType": "SampleSendEmailJob",
  "jobData": "{\"to\": \"[email protected]\", \"body\": \"Test email body.\", \"subject\": \"Test email subject\"}",
  "executeAt": "2026-02-01T22:26:00Z",
  "cronExpression": "0 * * * * *",
  "isActive": true,
  "concurrentExecutionPolicy": 0,
  "auditInfo": {
      "creationDate": "2026-02-01T22:20:54.474819Z",
      "creatorUserName": "rootuser",
      "lastModificationDate": null,
      "lastModifierUserName": null
  },
  "avarageDuration": 30013.666666666668,
  "successRate": 100,
  "totalExecutions": 6,
  "zombieTimeoutMinutes": null,
  "executionTimeoutSeconds": null,
  "version": 1,
  "jobVersions": [],
  "autoDisableSettings": {
      "consecutiveFailureCount": 0,
      "lastFailureTime": null,
      "disabledAt": null,
      "disableReason": null,
      "enabled": true,
      "threshold": null
  },
  "id": "019c1b4b-6f4a-75fb-b094-0dec83f168f5"
}
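Note that `jobData` is a JSON document stored as a string, so reading the payload takes two parsing passes. The following is a minimal sketch of that, assuming C# 11 or later (for the raw string literal) and a trimmed-down job definition; the recipient address is a placeholder chosen for illustration.

```csharp
using System;
using System.Text.Json;

// Trimmed job definition; the address is a placeholder, not a real recipient.
string jobJson = """
{
  "jobType": "SampleSendEmailJob",
  "jobData": "{\"to\": \"user@example.com\", \"subject\": \"Test email subject\"}"
}
""";

// First pass: parse the job definition itself.
using JsonDocument job = JsonDocument.Parse(jobJson);
string jobType = job.RootElement.GetProperty("jobType").GetString() ?? string.Empty;
string rawJobData = job.RootElement.GetProperty("jobData").GetString() ?? "{}";

// Second pass: jobData is a JSON document stored as a string value.
using JsonDocument payload = JsonDocument.Parse(rawJobData);
string recipient = payload.RootElement.GetProperty("to").GetString() ?? string.Empty;
string subject = payload.RootElement.GetProperty("subject").GetString() ?? string.Empty;

Console.WriteLine($"{jobType}: \"{subject}\" -> {recipient}");
```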

Occurrence

An occurrence is a single execution of a job.

{
  "jobName": "SampleSendEmailJob",
  "jobId": "019c1b4b-6f4a-75fb-b094-0dec83f168f5",
  "correlationId": "019c1b50-1a91-7612-bf91-7467378ac682",
  "workerId": "sample-worker",
  "status": 2,
  "startTime": "2026-02-01T22:26:00.47244Z",
  "endTime": "2026-02-01T22:26:30.481741Z",
  "durationMs": 30010,
  "result": "Job SampleSendEmailJob completed successfully",
  "exception": null,
  "logs": [
      {
         "occurrenceId": "019c1b50-1a91-7612-bf91-7467378ac682",
         "timestamp": "2026-02-01T22:26:00.465748Z",
         "level": "Information",
         "message": "Job dispatched to RabbitMQ queue and will start closely...",
         "data": {
             "WorkerId": "sample-worker",
             "ExecuteAt": "2026-02-01T22:26:00.0000000Z",
             "JobVersion": 1
         },
         "category": "Dispatcher",
         "exceptionType": null,
         "occurrence": null,
         "creationDate": "2026-02-01T22:26:00.471996Z",
         "creatorUserName": "Anonymous",
         "id": "019c1b50-1a91-7d7b-8c3d-6c7b45ac8d3e"
      }...
   ],
  "statusChangeLogs": [
      {
         "timestamp": "2026-02-01T22:26:00.5428676Z",
         "from": 0,
         "to": 1
      },
      {
         "timestamp": "2026-02-01T22:26:30.5487366Z",
         "from": 1,
         "to": 2
      }
  ],
  "createdAt": "2026-02-01T22:26:00.465747Z",
  "lastHeartbeat": "2026-02-01T22:26:30.549094Z",
  "jobVersion": 1,
  "id": "019c1b50-1a91-7612-bf91-7467378ac682"
}  

Occurrence statuses:

Status	Code	Meaning
Queued	0	Dispatched to RabbitMQ, waiting for worker
Running	1	Worker is executing the job
Completed	2	Job finished successfully
Failed	3	Job threw an exception (after retries)
Cancelled	4	Job was cancelled by a user
TimedOut	5	Execution exceeded the timeout
Unknown	6	Heartbeat from the worker was lost. Possible causes: worker crash, lost RabbitMQ connection, or network failure. The health monitor marks running jobs as Unknown when they send no heartbeat for the threshold period.
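The numeric codes in the table are what appear in the `status` field and in `statusChangeLogs`. As a small illustration, the sketch below decodes the two transitions from the sample occurrence; the lookup array and the `Describe` helper are names made up for this example, not part of the SDK.

```csharp
using System;

// Index = status code from the table above.
string[] statusNames = { "Queued", "Running", "Completed", "Failed", "Cancelled", "TimedOut", "Unknown" };

// Transitions copied from the statusChangeLogs of the sample occurrence.
(int From, int To)[] transitions = { (0, 1), (1, 2) };

// Turns a numeric transition into readable text.
string Describe((int From, int To) t) => $"{statusNames[t.From]} -> {statusNames[t.To]}";

foreach (var t in transitions)
{
    Console.WriteLine(Describe(t));
}
// prints:
// Queued -> Running
// Running -> Completed
```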

Worker

A worker is a logical group of application instances that execute the same job types. Think of it as a deployment unit or service definition.

A worker instance is a single running process within that worker group. Multiple instances of the same worker can run simultaneously for horizontal scaling.

┌─────────────────────────────────────────────────────┐
│                  Worker: "email-worker"             │
│  (Logical group that handles email-related jobs)    │
├─────────────────────────────────────────────────────┤
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │  Instance 1 │ │  Instance 2 │ │  Instance 3 │    │
│  │  (Pod/Host) │ │  (Pod/Host) │ │  (Pod/Host) │    │
│  └─────────────┘ └─────────────┘ └─────────────┘    │
│                                                     │
│  1 Worker = N Worker Instances                      │
└─────────────────────────────────────────────────────┘

Worker (as a group):

- Can be written in any programming language
- Defines which job types it can execute
- Shares the same RabbitMQ queue bindings
- Is identified by a common WorkerId

Worker Instance (as a single process):

- Connects to RabbitMQ
- Subscribes to job queues based on routing patterns
- Executes IJob implementations
- Reports status back to the API
- Sends heartbeats to prove liveness

Workers are identified by:

- WorkerId: Logical group name (e.g., email-worker), shared by all instances
- InstanceId: Unique per process (e.g., email-worker-6e183cdc), auto-generated for each instance

Worker vs Worker Instance

When the scaling documentation says "3 workers with MaxParallelJobs: 20", it means 3 worker instances of the same worker group, each capable of running 20 jobs in parallel.

IJob Interface

Jobs are implemented as classes:

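using System.Text.Json;
using System.Threading.Tasks;

public class SendEmailJob : IAsyncJob
{
    // IEmailService is a hypothetical application service, resolved through DI.
    private readonly IEmailService _emailService;

    public SendEmailJob(IEmailService emailService) => _emailService = emailService;

    public async Task ExecuteAsync(IJobContext context)
    {
        // JobData carries the JSON payload configured on the job definition.
        var data = JsonSerializer.Deserialize<EmailData>(context.Job.JobData);

        context.LogInformation($"Sending email to {data.To}");

        // Your logic here
        await _emailService.SendAsync(data.To, data.Subject, data.Body);

        context.LogInformation("Email sent successfully");
    }
}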

The SDK provides four interfaces:

Interface	Async	Returns Result
`IJob`	No	No
`IJobWithResult`	No	Yes
`IAsyncJob`	Yes	No
`IAsyncJobWithResult`	Yes	Yes

Always prefer IAsyncJob for async I/O operations.

Message Flow

Job Dispatch Flow

1. Cron trigger or manual request
   → API creates a JobOccurrence with status = Queued

2. Dispatcher (inside API)
   → Publishes message to RabbitMQ
   → Routing key: {workerId}.{jobType}.{occurrenceId}

3. Worker
   → Consumes message from RabbitMQ
   → Updates occurrence status to Running

4. Worker
   → Executes IJob implementation
   → Streams execution logs to RabbitMQ

5. Worker
   → Job completes or fails
   → Publishes final status (Completed / Failed)

6. API
   → Consumes job status events
   → Persists final state in PostgreSQL

7. API
   → Broadcasts updates via SignalR
   → Dashboard reflects status in real time

Routing Keys

Jobs are routed to workers using RabbitMQ topic exchange:

Exchange: milvaion.jobs (type: topic)

Routing Key Format: {workerId}.{jobType}.{occurrenceId}

Examples:
  - email-worker.sendemailasync.abc-123 → Consumed by email workers
  - report-worker.generatereport.def-456 → Consumed by report workers
  - sample-worker.samplejob.ghi-789 → Consumed by test workers

Workers subscribe to patterns:

// This worker handles all email-related jobs
options.RoutingPatterns = new[] { "sendemail.*", "emailcampaign.*" };
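To make the key format and topic matching concrete, here is a minimal sketch. `BuildRoutingKey` and `TopicMatches` are illustrative helpers, not SDK functions, and lower-casing the job type is only inferred from the examples above. The matcher follows RabbitMQ topic semantics, where `*` matches exactly one dot-separated word and `#` matches zero or more; the patterns used are hypothetical.

```csharp
using System;

// Builds a key in the documented {workerId}.{jobType}.{occurrenceId} shape.
// Lower-casing the job type is inferred from the examples, not confirmed.
string BuildRoutingKey(string workerId, string jobType, string occurrenceId) =>
    $"{workerId}.{jobType.ToLowerInvariant()}.{occurrenceId}";

// RabbitMQ topic semantics: '*' matches exactly one word, '#' matches zero or more.
bool TopicMatches(string pattern, string routingKey)
{
    string[] p = pattern.Split('.');
    string[] k = routingKey.Split('.');
    return Match(0, 0);

    bool Match(int pi, int ki)
    {
        if (pi == p.Length) return ki == k.Length;
        if (p[pi] == "#")
        {
            // Try letting '#' absorb 0, 1, 2, ... of the remaining words.
            for (int next = ki; next <= k.Length; next++)
            {
                if (Match(pi + 1, next)) return true;
            }
            return false;
        }
        if (ki == k.Length) return false;
        return (p[pi] == "*" || p[pi] == k[ki]) && Match(pi + 1, ki + 1);
    }
}

string key = BuildRoutingKey("email-worker", "SendEmailAsync", "abc-123");
Console.WriteLine(key);                                     // email-worker.sendemailasync.abc-123
Console.WriteLine(TopicMatches("email-worker.*.*", key));   // True
Console.WriteLine(TopicMatches("email-worker.#", key));     // True
Console.WriteLine(TopicMatches("report-worker.#", key));    // False
```

Because `*` consumes exactly one word, a pattern without `#` only matches keys that have the same number of dot-separated words.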

Routing Patterns

Setting routing patterns manually is not recommended. The scheduler and worker determine them automatically at runtime.

Scheduling Mechanics

Redis ZSET

Jobs are scheduled using a Redis Sorted Set:

Key: Milvaion:JobScheduler:scheduled_jobs
Score: Unix timestamp (seconds) of next execution
Member: Job ID

Example:
| Score (Unix) | Job ID        | Notes             |
|--------------|---------------|-------------------|
| 1705320000   | job-abc-123   | Due now           |
| 1705320060   | job-def-456   | Due in 1 minute   |
| 1705320120   | job-ghi-789   | Due in 2 minutes  |
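The scores in the table are plain Unix timestamps in seconds. A short sketch of how such a score relates to a UTC time, using only standard .NET APIs (the `IsDue` helper is a name chosen for this example):

```csharp
using System;

// Score for a job due at 2024-01-15 12:00:00 UTC.
var dueAt = new DateTimeOffset(2024, 1, 15, 12, 0, 0, TimeSpan.Zero);
long score = dueAt.ToUnixTimeSeconds();
Console.WriteLine(score); // 1705320000

// A job is due once its score is less than or equal to the current Unix time.
bool IsDue(long jobScore, long nowUnixSeconds) => jobScore <= nowUnixSeconds;

long now = 1705320000;
Console.WriteLine(IsDue(1705320000, now)); // True  (job-abc-123)
Console.WriteLine(IsDue(1705320060, now)); // False (job-def-456, due in 1 minute)

// Scores convert back to timestamps for display.
Console.WriteLine(DateTimeOffset.FromUnixTimeSeconds(1705320060).ToString("u")); // 2024-01-15 12:01:00Z
```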

Dispatcher Loop

Queries Redis: ZRANGEBYSCORE scheduled_jobs 0 {now}
For each due job:
- Acquires distributed lock
- Creates Occurrence in PostgreSQL
- Publishes to RabbitMQ
- Calculates next cron time
- Updates Redis ZSET score
- Releases lock
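The loop above can be sketched with in-memory stand-ins. This is an illustration of the control flow only, not Milvaion's implementation: the dictionary plays the role of the sorted set, the `HashSet` plays the role of the distributed lock, the list stands in for RabbitMQ, and a fixed interval replaces the real cron calculation. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins (all hypothetical): sorted set, lock table, and broker.
var scheduledJobs = new Dictionary<string, long>
{
    ["job-abc-123"] = 1705320000,
    ["job-def-456"] = 1705320060,
};
var heldLocks = new HashSet<string>();
var published = new List<string>();

void DispatchDueJobs(long now, long intervalSeconds)
{
    // ZRANGEBYSCORE scheduled_jobs 0 {now}: members whose score is <= now.
    List<string> due = scheduledJobs
        .Where(e => e.Value <= now)
        .OrderBy(e => e.Value)
        .Select(e => e.Key)
        .ToList();

    foreach (string jobId in due)
    {
        // Acquire the lock; skip the job if another dispatcher already holds it.
        if (!heldLocks.Add(jobId)) continue;
        try
        {
            // Creating the occurrence and publishing are collapsed into one step here.
            published.Add(jobId);
            // A fixed interval stands in for the real "next cron time" calculation.
            scheduledJobs[jobId] = now + intervalSeconds;
        }
        finally
        {
            heldLocks.Remove(jobId); // release the lock even if publishing throws
        }
    }
}

DispatchDueJobs(now: 1705320000, intervalSeconds: 60);
Console.WriteLine(string.Join(", ", published));   // job-abc-123
Console.WriteLine(scheduledJobs["job-abc-123"]);   // 1705320060
```

Updating the score in place is what keeps a recurring job in the set: the member stays the same and only its next execution time moves forward.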

Cron Expressions

There are two cron expression formats:

Standard 5-field cron format:

* * * * *
| | | | |
| | | | +-- Day of Week (0–6, Sunday = 0)
| | | +---- Month (1–12)
| | +------ Day of Month (1–31)
| +-------- Hour (0–23)
+---------- Minute (0–59)

Common 5-field examples:

Expression	Schedule
`0 * * * *`	Every hour at :00
`0 9 * * *`	Daily at 9:00 AM
`*/15 * * * *`	Every 15 minutes

6-field cron format (includes seconds):

* * * * * *
| | | | | |
| | | | | +-- Day of Week (0–6, Sunday = 0)
| | | | +---- Month (1–12)
| | | +------ Day of Month (1–31)
| | +-------- Hour (0–23)
| +---------- Minute (0–59)
+------------ Second (0–59)

Common examples:

Expression	Schedule
`0 * * * * *`	Every minute (at second 0)
`0 0 * * * *`	Every hour at :00
`0 0 9 * * *`	Daily at 9:00 AM
`0 0 9 * * MON`	Every Monday at 9:00 AM
`0 */15 * * * *`	Every 15 minutes
`0 0 0 1 * *`	First day of month at midnight

Milvaion uses the 6-field cron format.
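If you are used to 5-field expressions, the two tables suggest a simple conversion: prepend a seconds field of `0`. The helper below is a hypothetical convenience for doing that before you submit an expression; whether Milvaion accepts 5-field input directly is not stated here, so treat this as a sketch.

```csharp
using System;

// Prepends a seconds field of 0 to a 5-field expression; 6-field input is returned as-is.
string ToSixFieldCron(string expression)
{
    string[] fields = expression.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return fields.Length switch
    {
        6 => string.Join(' ', fields),
        5 => "0 " + string.Join(' ', fields),
        _ => throw new FormatException($"Expected 5 or 6 cron fields, got {fields.Length}."),
    };
}

Console.WriteLine(ToSixFieldCron("*/15 * * * *"));  // 0 */15 * * * *
Console.WriteLine(ToSixFieldCron("0 9 * * *"));     // 0 0 9 * * *
Console.WriteLine(ToSixFieldCron("0 0 9 * * MON")); // 0 0 9 * * MON
```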

Reliability Patterns

Retry with Exponential Backoff

When a job fails, Milvaion automatically retries:

Attempt 1: Immediate
Attempt 2: Wait 5 seconds
Attempt 3: Wait 10 seconds
Attempt 4: Wait 20 seconds
Attempt 5: Wait 40 seconds
Max retries exceeded → Move to DLQ
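The schedule above doubles the wait on each retry, starting from 5 seconds. A minimal sketch of that arithmetic, where `RetryDelay` and its `baseSeconds` parameter are names chosen for this example rather than SDK settings:

```csharp
using System;

// Attempt 1 runs immediately; each later attempt waits twice as long as the previous one.
TimeSpan RetryDelay(int attempt, double baseSeconds = 5)
{
    if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
    return attempt == 1
        ? TimeSpan.Zero
        : TimeSpan.FromSeconds(baseSeconds * Math.Pow(2, attempt - 2));
}

for (int attempt = 1; attempt <= 5; attempt++)
{
    Console.WriteLine($"Attempt {attempt}: wait {RetryDelay(attempt).TotalSeconds} s");
}
// prints:
// Attempt 1: wait 0 s
// Attempt 2: wait 5 s
// Attempt 3: wait 10 s
// Attempt 4: wait 20 s
// Attempt 5: wait 40 s
```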

Dead Letter Queue (DLQ)

Jobs that fail after all retries are moved to a Dead Letter Queue:

1. RabbitMQ routes the failed message to the DLQ exchange
2. Failed Occurrence Handler consumes from the DLQ
3. Creates a FailedOccurrence record for manual review
4. Dashboard shows failed jobs with exception details

Zombie Detection

If a worker crashes while processing a job:

1. Job stays in "Running" status forever (zombie)
2. Zombie Occurrence Detector runs every 5 minutes
3. Detects occurrences stuck in Running/Queued beyond threshold
4. Marks them as Failed and requeues if configured
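The detection step boils down to a filter over non-terminal occurrences whose last heartbeat is older than a threshold. The sketch below shows that filter on hand-written sample data; the 10-minute timeout and the occurrence IDs are assumptions made for illustration, and the status codes are those from the occurrence status table.

```csharp
using System;
using System.Linq;

var now = new DateTimeOffset(2026, 2, 1, 22, 40, 0, TimeSpan.Zero);
var timeout = TimeSpan.FromMinutes(10); // hypothetical threshold

// (Id, Status, LastHeartbeat); 0 = Queued, 1 = Running, 2 = Completed.
var occurrences = new (string Id, int Status, DateTimeOffset LastHeartbeat)[]
{
    ("occ-1", 1, now.AddMinutes(-1)),   // Running, healthy
    ("occ-2", 1, now.AddMinutes(-25)),  // Running, silent for 25 minutes
    ("occ-3", 2, now.AddMinutes(-30)),  // Completed, never a zombie
};

// Only Queued/Running occurrences that exceeded the timeout are candidates.
var zombies = occurrences
    .Where(o => (o.Status == 0 || o.Status == 1) && now - o.LastHeartbeat > timeout)
    .Select(o => o.Id)
    .ToList();

Console.WriteLine(string.Join(", ", zombies)); // occ-2
```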

Auto Disabling (Failure Threshold Protection)

To prevent continuously failing jobs from being dispatched indefinitely, Milvaion supports automatic job disabling.

If a job exceeds a configured failure threshold within a defined time window:

- The job is automatically marked as Disabled
- No new occurrences are dispatched for the job
- Manual intervention is required to re-enable the job

Typical use cases:

- Misconfigured jobs
- External dependency outages
- Deterministic failures caused by code bugs

Example behavior:

Failure threshold: 5 consecutive failures
Time window: 10 minutes

→ Job fails 5 times within 10 minutes
→ Job status is set to Disabled
→ Dispatcher stops creating new occurrences
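One way to picture the example behavior is a counter of consecutive failures restricted to a sliding time window. This is a sketch of the idea under the threshold and window from the example above, not Milvaion's actual bookkeeping; `RecordResult` and the variable names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

const int threshold = 5;                       // from the example above
TimeSpan window = TimeSpan.FromMinutes(10);    // from the example above

var recentFailures = new Queue<DateTimeOffset>();
bool isActive = true;

void RecordResult(bool succeeded, DateTimeOffset at)
{
    if (!isActive) return;       // disabled jobs are no longer dispatched
    if (succeeded)
    {
        recentFailures.Clear();  // a success breaks the consecutive run
        return;
    }
    recentFailures.Enqueue(at);
    // Drop failures that fell out of the time window.
    while (recentFailures.Count > 0 && at - recentFailures.Peek() > window)
    {
        recentFailures.Dequeue();
    }
    if (recentFailures.Count >= threshold) isActive = false;
}

var start = new DateTimeOffset(2026, 2, 1, 22, 0, 0, TimeSpan.Zero);
for (int i = 0; i < 5; i++)
{
    RecordResult(succeeded: false, at: start.AddMinutes(i));
}
Console.WriteLine(isActive); // False
```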

Auto-disabling

Auto-disabling is applied at the job level, not per occurrence. Once disabled, the job must be explicitly re-enabled by an operator.

Idempotency

Each occurrence has a unique CorrelationId. Workers track completed jobs to avoid duplicate execution if a message is redelivered.
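A minimal sketch of this kind of deduplication, assuming an in-memory set of completed correlation IDs. How Milvaion workers actually store this state is not described here, and `HandleMessage` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

var completed = new HashSet<string>();
int executions = 0;

// Returns true if the job ran, false if the message was a redelivery.
bool HandleMessage(string correlationId)
{
    if (completed.Contains(correlationId)) return false;
    executions++;                  // stand-in for running the job
    completed.Add(correlationId);  // remember it only after it finished
    return true;
}

string id = "019c1b50-1a91-7612-bf91-7467378ac682";
Console.WriteLine(HandleMessage(id)); // True  (first delivery executes)
Console.WriteLine(HandleMessage(id)); // False (redelivery is skipped)
Console.WriteLine(executions);        // 1
```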

What's Next?

Now that you understand the concepts:

Your First Worker - Build a custom worker
Implementing Jobs - Write job logic with DI
Configuration - All available settings
