Version: 1.0.1

Reporter Worker

The Reporter Worker is a built-in analytics worker that automatically generates metric reports about your Milvaion infrastructure. It queries JobOccurrences, ScheduledJobs, WorkflowRuns, and Workflows tables and writes aggregated JSON reports to the MetricReports table for consumption by the API and Dashboard UI.

Features

  • 10 built-in metric report types (job, worker, and workflow analytics)
  • Configurable lookback window and data retention
  • Time-series, ranking, and health score reports

Use Cases

| Scenario | Example |
| --- | --- |
| Failure Monitoring | Track error rate trends across all jobs over time |
| Performance Analysis | Identify slowest jobs with P50/P95/P99 duration metrics |
| Worker Capacity Planning | Monitor throughput and utilization per worker instance |
| SLA Compliance | Measure schedule deviation between cron and actual execution |
| Job Health Tracking | Score each job by success rate for reliability dashboards |
| Workflow Analytics | Analyze workflow success rates and step-level bottlenecks |

Report Types

The Reporter Worker includes 10 report jobs organized into three categories:

Job Metrics

| Job Class | Metric Type | Description |
| --- | --- | --- |
| FailureRateTrendReportJob | FailureRateTrend | Hourly failure rate percentage over the lookback period |
| PercentileDurationsReportJob | PercentileDurations | P50/P95/P99 execution duration distribution per job |
| TopSlowJobsReportJob | TopSlowJobs | Jobs ranked by highest average execution duration |
| JobHealthScoreReportJob | JobHealthScore | Success rate and occurrence counts per job |
| CronScheduleVsActualReportJob | CronScheduleVsActual | Deviation between scheduled and actual execution times |

Worker Metrics

| Job Class | Metric Type | Description |
| --- | --- | --- |
| WorkerThroughputReportJob | WorkerThroughput | Job count, success/failure breakdown per worker |
| WorkerUtilizationTrendReportJob | WorkerUtilizationTrend | Hourly capacity utilization percentage per worker |

Workflow Metrics

| Job Class | Metric Type | Description |
| --- | --- | --- |
| WorkflowSuccessRateReportJob | WorkflowSuccessRate | Success, failure, partial, and cancelled rates per workflow |
| WorkflowStepBottleneckReportJob | WorkflowStepBottleneck | Step-level avg/max duration, failure count, and retry count |
| WorkflowDurationTrendReportJob | WorkflowDurationTrend | Average workflow execution duration over time |

Worker Configuration

Configure the Reporter Worker in appsettings.json:

```json
{
  "Reporter": {
    "DatabaseConnectionString": "Host=localhost;Port=5432;Database=MilvaionDb;Username=postgres;Password=secret;",
    "ReportGeneration": {
      "DataRetentionDays": 30,
      "LookbackHours": 24,
      "TopNLimit": 10,
      "MaxScheduleDeviations": 500
    }
  }
}
```

Configuration Properties

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| DatabaseConnectionString | string | Yes | – | PostgreSQL connection string for reading occurrence/workflow data and writing reports |
| DataRetentionDays | int | No | 30 | How many days of historical data to consider |
| LookbackHours | int | No | 24 | Time window (in hours) for time-series reports |
| TopNLimit | int | No | 10 | Maximum items in ranking reports (TopSlowJobs) |
| MaxScheduleDeviations | int | No | 500 | Maximum deviation records for CronScheduleVsActual |

Scheduling Report Jobs

Each report type is an independent job. Schedule them through the Milvaion API just like any other worker job:

```json
{
  "displayName": "Failure Rate Trend Report",
  "selectedJobName": "FailureRateTrendReportJob",
  "cronExpression": "0 */6 * * *",
  "isActive": true
}
```

Recommended schedules for each report job:

| Job | Cron Expression | Frequency | Rationale |
| --- | --- | --- | --- |
| FailureRateTrendReportJob | `0 */6 * * *` | Every 6 hours | Track error trends throughout the day |
| PercentileDurationsReportJob | `0 */6 * * *` | Every 6 hours | Monitor latency distribution changes |
| TopSlowJobsReportJob | `0 0 * * *` | Daily at midnight | Daily ranking is sufficient |
| WorkerThroughputReportJob | `0 */6 * * *` | Every 6 hours | Track worker load throughout the day |
| WorkerUtilizationTrendReportJob | `0 */6 * * *` | Every 6 hours | Capacity monitoring |
| CronScheduleVsActualReportJob | `0 0 * * *` | Daily at midnight | Accumulated daily deviations |
| JobHealthScoreReportJob | `0 0 * * *` | Daily at midnight | Daily health overview |
| WorkflowSuccessRateReportJob | `0 0 * * *` | Daily at midnight | Daily workflow health |
| WorkflowStepBottleneckReportJob | `0 0 * * *` | Daily at midnight | Daily step analysis |
| WorkflowDurationTrendReportJob | `0 */6 * * *` | Every 6 hours | Track workflow duration trends |
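
The recommended schedules above could be bulk-registered with a short script. The payload shape comes from the example request body; the exact API endpoint and host are not documented here, so sending the requests is left out of this sketch.

```python
# Sketch: build one schedule-request payload per report job.
# The payload fields mirror the documented example; how the payloads
# are POSTed to the Milvaion API (endpoint, auth) is an assumption.
RECOMMENDED_SCHEDULES = {
    "FailureRateTrendReportJob": "0 */6 * * *",
    "TopSlowJobsReportJob": "0 0 * * *",
    "JobHealthScoreReportJob": "0 0 * * *",
}

def build_schedule_payload(job_name: str, cron: str) -> dict:
    """Build the documented request body for one report job."""
    return {
        "displayName": job_name.replace("ReportJob", " Report"),
        "selectedJobName": job_name,
        "cronExpression": cron,
        "isActive": True,
    }

payloads = [build_schedule_payload(j, c) for j, c in RECOMMENDED_SCHEDULES.items()]
```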

Report Data Schemas

Each metric type stores its result as a JSON payload in the Data column (PostgreSQL jsonb).

FailureRateTrend

Hourly error rate as a time series with a configurable threshold.

```json
{
  "thresholdPercentage": 5.0,
  "dataPoints": [
    { "timestamp": "2026-06-01T10:00:00Z", "value": 2.5 },
    { "timestamp": "2026-06-01T11:00:00Z", "value": 3.1 }
  ]
}
```
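
A consumer of this payload might flag the hours whose error rate breaches `thresholdPercentage` — a minimal sketch over the schema above (the sample values here are illustrative):

```python
# A FailureRateTrend payload with one hour above the 5% threshold.
report = {
    "thresholdPercentage": 5.0,
    "dataPoints": [
        {"timestamp": "2026-06-01T10:00:00Z", "value": 2.5},
        {"timestamp": "2026-06-01T11:00:00Z", "value": 6.1},
    ],
}

# Collect the timestamps of hours whose failure rate exceeds the threshold.
breaches = [
    p["timestamp"]
    for p in report["dataPoints"]
    if p["value"] > report["thresholdPercentage"]
]
```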

PercentileDurations

P50/P95/P99 duration distribution per job (requires ≥10 occurrences).

```json
{
  "jobs": {
    "EmailSenderJob": { "p50": 120.5, "p95": 450.2, "p99": 890.7 },
    "DataSyncJob": { "p50": 80.3, "p95": 310.1, "p99": 620.4 }
  }
}
```
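
As an illustration of what such percentiles mean, here is a nearest-rank percentile sketch over a job's durations. The Reporter Worker's exact percentile method is not documented here; this is only one common interpolation-free approach:

```python
import math

def percentile(durations: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of durations (ms)."""
    ranked = sorted(durations)
    rank = math.ceil(p / 100 * len(ranked))  # 1-based nearest rank
    return ranked[max(rank - 1, 0)]

# Ten occurrences — the documented minimum for this report.
durations = [100.0, 110.0, 120.0, 130.0, 150.0, 200.0, 250.0, 300.0, 450.0, 900.0]
summary = {p: percentile(durations, p) for p in (50, 95, 99)}
```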

TopSlowJobs

Jobs ranked by average duration, limited by TopNLimit.

```json
{
  "jobs": [
    { "jobName": "HeavyReportJob", "averageDurationMs": 45200.5, "occurrenceCount": 12 },
    { "jobName": "DataMigrationJob", "averageDurationMs": 32100.3, "occurrenceCount": 8 }
  ]
}
```
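
The ranking itself can be sketched as "average duration, descending, truncated to `TopNLimit`" — the input shape here (durations grouped per job) is an assumption for illustration:

```python
def top_slow_jobs(stats: dict[str, list[float]], top_n: int = 10) -> list[dict]:
    """Rank jobs by average duration, keeping only the top N (mirrors TopNLimit)."""
    ranked = [
        {
            "jobName": name,
            "averageDurationMs": sum(ds) / len(ds),
            "occurrenceCount": len(ds),
        }
        for name, ds in stats.items()
    ]
    ranked.sort(key=lambda j: j["averageDurationMs"], reverse=True)
    return ranked[:top_n]

result = top_slow_jobs(
    {"HeavyReportJob": [45000.0, 45401.0], "QuickJob": [120.0, 140.0]},
    top_n=1,
)
```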

WorkerThroughput

Per-worker job count and success/failure breakdown.

```json
{
  "workers": [
    {
      "workerId": "worker-1",
      "jobCount": 150,
      "successCount": 145,
      "failureCount": 5,
      "averageDurationMs": 1200.5
    }
  ]
}
```

WorkerUtilizationTrend

Hourly utilization percentage per worker (capped at 100%).

```json
{
  "dataPoints": [
    {
      "timestamp": "2026-06-01T10:00:00Z",
      "workerUtilization": { "worker-1": 75.5, "worker-2": 42.3 }
    }
  ]
}
```
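
Conceptually, an hourly utilization figure is busy time divided by available capacity for that hour, capped at 100%. The worker's actual formula is not documented here; this sketch only illustrates the capping behaviour:

```python
def utilization_pct(busy_ms: float, capacity_ms: float) -> float:
    """Busy time as a percentage of hourly capacity, capped at 100%."""
    if capacity_ms <= 0:
        return 0.0
    return min(busy_ms / capacity_ms * 100.0, 100.0)
```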

CronScheduleVsActual

Deviation between cron-scheduled and actual execution times, sorted by largest deviation.

```json
{
  "jobs": [
    {
      "occurrenceId": "01968a3b-...",
      "jobId": "01968a2a-...",
      "jobName": "HourlySync",
      "scheduledTime": "2026-06-01T10:00:00Z",
      "actualTime": "2026-06-01T10:00:12Z",
      "deviationSeconds": 12.0
    }
  ]
}
```
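
`deviationSeconds` is simply the gap between the two timestamps above — a minimal sketch, assuming the UTC `...Z` format shown in the payload:

```python
from datetime import datetime

def deviation_seconds(scheduled: str, actual: str) -> float:
    """Seconds between the cron-scheduled and actual execution times."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return delta.total_seconds()

dev = deviation_seconds("2026-06-01T10:00:00Z", "2026-06-01T10:00:12Z")
```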

JobHealthScore

Success rate per job (requires ≥5 occurrences), ordered by lowest success rate.

```json
{
  "jobs": [
    {
      "jobName": "EmailSenderJob",
      "successRate": 98.5,
      "totalOccurrences": 200,
      "successCount": 197,
      "failureCount": 3
    }
  ]
}
```
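
The scoring logic can be sketched as: compute each job's success percentage, skip jobs under the 5-occurrence floor, and sort ascending so the least reliable jobs come first. The input shape (success/total counts per job) is an assumption for illustration:

```python
def health_scores(counts: dict[str, tuple[int, int]], min_occurrences: int = 5) -> list[dict]:
    """Success rate per job, ordered by lowest success rate first.
    Jobs below the occurrence floor are skipped entirely."""
    scores = [
        {
            "jobName": name,
            "successRate": round(success / total * 100, 1),
            "totalOccurrences": total,
            "successCount": success,
            "failureCount": total - success,
        }
        for name, (success, total) in counts.items()
        if total >= min_occurrences
    ]
    scores.sort(key=lambda j: j["successRate"])
    return scores

# "RareJob" has only 2 occurrences and is excluded from the report.
scores = health_scores({"EmailSenderJob": (197, 200), "RareJob": (1, 2), "FlakyJob": (40, 50)})
```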

WorkflowSuccessRate

Per-workflow success/failure/partial/cancelled breakdown.

```json
{
  "workflows": [
    {
      "workflowId": "01968a3b-...",
      "workflowName": "OrderProcessing",
      "successRate": 95.0,
      "totalRuns": 100,
      "completedCount": 95,
      "failedCount": 3,
      "partialCount": 1,
      "cancelledCount": 1,
      "avgDurationMs": 5400.0
    }
  ]
}
```

WorkflowStepBottleneck

Step-level performance analysis per workflow.

```json
{
  "workflows": [
    {
      "workflowId": "01968a3b-...",
      "workflowName": "OrderProcessing",
      "steps": [
        {
          "stepName": "ValidateOrder",
          "avgDurationMs": 200.5,
          "maxDurationMs": 1500.0,
          "executionCount": 100,
          "failureCount": 2,
          "skippedCount": 0,
          "retryCount": 1
        }
      ]
    }
  ]
}
```

WorkflowDurationTrend

Average workflow duration over time.

```json
{
  "dataPoints": [
    {
      "timestamp": "2026-06-01T10:00:00Z",
      "workflowAvgDurationMs": {
        "OrderProcessing": 5200.0,
        "DataPipeline": 12400.0
      }
    }
  ]
}
```

Deployment

The Reporter Worker can be deployed as a Docker container:

```yaml
# docker-compose.yml
services:
  reporter-worker:
    image: milvasoft/milvaion-reporter-worker:latest
    environment:
      - Worker__WorkerId=reporter-worker-01
      - Worker__RabbitMQ__Host=rabbitmq
      - Worker__RabbitMQ__Port=5672
      - Worker__RabbitMQ__Username=guest
      - Worker__RabbitMQ__Password=guest
      - Worker__MaxParallelJobs=4
      - Reporter__DatabaseConnectionString=Host=postgres;Port=5432;Database=MilvaionDb;Username=postgres;Password=secret
      - Reporter__ReportGeneration__LookbackHours=24
      - Reporter__ReportGeneration__TopNLimit=10
    depends_on:
      - postgres
      - rabbitmq
    restart: unless-stopped
```

Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reporter-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reporter-worker
  template:
    metadata:
      labels:
        app: reporter-worker
    spec:
      containers:
        - name: reporter-worker
          image: milvasoft/milvaion-reporter-worker:latest
          env:
            - name: Worker__WorkerId
              value: "reporter-worker-01"
            - name: Worker__RabbitMQ__Host
              value: "rabbitmq"
            - name: Reporter__DatabaseConnectionString
              valueFrom:
                secretKeyRef:
                  name: milvaion-secrets
                  key: database-connection-string
            - name: Reporter__ReportGeneration__LookbackHours
              value: "24"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```

Note: A single replica is typically sufficient since report generation jobs run periodically (every 6 hours or daily), not continuously. Scale only if you have very high report frequency requirements.

Best Practices

  1. Schedule Reports During Low Traffic

    • Run daily reports (TopSlowJobs, JobHealthScore, CronScheduleVsActual) at off-peak hours
    • Time-series reports (FailureRateTrend, WorkerUtilizationTrend) can safely run every 6 hours
  2. Tune the Lookback Window

    • The default LookbackHours of 24 covers a full day
    • For high-volume environments, consider shorter windows (6–12 hours) to reduce query load
    • For low-volume environments, extend to 48–72 hours for more meaningful data
  3. Set Appropriate TopN Limits

    • The default TopNLimit of 10 works well for most deployments
    • Increase it for environments with many different job types
  4. Implement Data Retention

    • Reports accumulate over time; use the cleanup API (DELETE /metricreports/cleanup?OlderThanDays=30)
    • Schedule a periodic cleanup job via the Maintenance Worker or a cron-based scheduled job
  5. Monitor Report Generation

    • Check Serilog output for generation success/failure messages
    • Each job logs the number of data points or items generated
    • Failed report generation does not affect other reports (jobs are independent)
  6. Use Read Replicas for Heavy Workloads

    • Point DatabaseConnectionString at a PostgreSQL read replica to avoid impacting the primary database
    • This is especially important for PercentileDurations and WorkerUtilizationTrend, which run aggregation-heavy queries
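
The periodic cleanup from practice 4 can be sketched as a small script that calls the documented cleanup endpoint. The path and query string come from the docs above; the base URL is a hypothetical placeholder, and the actual HTTP call is left out:

```python
from urllib.parse import urlencode

def cleanup_url(base_url: str, older_than_days: int = 30) -> str:
    """Build the metric-report cleanup URL.
    base_url is a hypothetical host; the path/query are from the docs."""
    query = urlencode({"OlderThanDays": older_than_days})
    return f"{base_url}/metricreports/cleanup?{query}"

# This URL would then be issued as a DELETE request on a schedule.
url = cleanup_url("https://milvaion.example.com")
```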

Troubleshooting

Reports Not Being Generated

  1. Check worker connectivity: Ensure the worker can reach RabbitMQ and PostgreSQL
  2. Check job schedules: Verify report jobs are scheduled and active in the Milvaion API
  3. Check logs: Look for `Starting ... Report generation` log entries

Reports Show Empty Data

  1. Check lookback window: If LookbackHours is 24 but no jobs ran in the last 24 hours, data will be empty
  2. Check minimum thresholds: PercentileDurations requires ≥10 occurrences, JobHealthScore requires ≥5 occurrences per job
  3. Check workflow data: Workflow reports require WorkflowRuns records

Database Connection Errors

```text
Npgsql.NpgsqlException: Failed to connect to ...
```

  • Verify DatabaseConnectionString is correct
  • Check network connectivity between the worker and PostgreSQL
  • Ensure the database user has SELECT permission on JobOccurrences, ScheduledJobs, WorkflowRuns, and Workflows, and INSERT permission on MetricReports

High Query Load

  • Reduce LookbackHours to narrow the query window
  • Schedule reports less frequently (e.g., daily instead of every 6 hours)
  • Point DatabaseConnectionString to a read replica
  • Add appropriate indexes on JobOccurrences.StartTime and WorkflowRuns.StartTime

For viewing and managing generated reports via the API and Dashboard, see Metric Reports. For custom workers, see Your First Worker and Implementing Jobs.