
Health & Monitoring

Check API health, service status, and diagnostics.

Health Endpoint

GET /api/health

Returns the overall health status of the API and its dependencies.

Response

{
  "status": "healthy",
  "timestamp": "2026-02-15T12:00:00Z",
  "version": "0.2.0",
  "checks": {
    "database": { "status": "pass", "latencyMs": 12 },
    "solana": { "status": "pass", "latencyMs": 89, "message": "Slot: 245789012" },
    "redis": { "status": "pass", "latencyMs": 5 }
  },
  "uptime": 86400,
  "circuitBreakers": {}
}

Status Values

| Status | HTTP Code | Description |
| --- | --- | --- |
| healthy | 200 | All systems operational |
| degraded | 200 | Optional services unavailable |
| unhealthy | 503 | Critical services down |

Check Results

Each service check returns:

| Field | Type | Description |
| --- | --- | --- |
| status | "pass" \| "fail" | Health status |
| latencyMs | number | Response time in ms |
| message | string? | Additional info |
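
As a rough illustration of consuming this response, the sketch below parses a health body and lists the checks that did not pass. The function name `summarize_health` and the sample payload are hypothetical and exist only for this example; the field names are the ones documented above.

```python
import json


def summarize_health(body: str) -> dict:
    """Parse a /api/health JSON body and report failing and slowest checks."""
    payload = json.loads(body)
    checks = payload.get("checks", {})
    # A check counts as failing whenever its status is anything other than "pass".
    failing = sorted(
        name for name, check in checks.items() if check.get("status") != "pass"
    )
    # Slowest check by latencyMs, or None when the body carries no checks.
    slowest = max(
        checks, key=lambda name: checks[name].get("latencyMs", 0), default=None
    )
    return {"status": payload.get("status"), "failing": failing, "slowest": slowest}


# Hypothetical sample body, made up for this example (not real API output).
sample = json.dumps({
    "status": "degraded",
    "checks": {
        "database": {"status": "pass", "latencyMs": 12},
        "redis": {"status": "fail", "latencyMs": 5, "message": "timeout"},
    },
})

print(summarize_health(sample))
```

Note that a degraded response still returns HTTP 200, so a monitor that only looks at the status code would miss it; reading the `status` field is what distinguishes the two.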

Simple Uptime Check

For load balancers and uptime monitors:

HEAD /api/health

Returns 200 OK if the API is running (no body).
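
A minimal sketch of such a probe using only the Python standard library. The helper name `is_up` and the 10 second default timeout are assumptions made for this example; pass whatever base URL your deployment uses.

```python
import urllib.request


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Send HEAD to the given URL and report whether it answered 200 OK."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses such as
        # 503 (urllib raises HTTPError, a subclass of OSError, for those).
        return False
```

Because the response has no body, this probe cannot tell healthy from degraded; it only answers whether the API is reachable and returning 200.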

Circuit Breakers

When webhook endpoints fail repeatedly, circuit breakers prevent cascade failures.

The health response includes circuit breaker status:

{
  "circuitBreakers": {
    "webhook.example.com": { "state": "open", "failures": 3 },
    "api.partner.io": { "state": "closed", "failures": 0 }
  }
}

Circuit Breaker States

| State | Description |
| --- | --- |
| closed | Normal operation |
| open | Requests blocked (3+ failures) |
| half-open | Testing if service recovered |

Circuits reset after 5 minutes of being open.
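
To make the state transitions concrete, here is an illustrative state machine. It is not the service's implementation: it assumes that "reset" means an open circuit moves to half-open after 5 minutes, that a success closes the circuit, and that a failed probe in half-open re-opens it. The class and constant names are invented for this sketch.

```python
import time

FAILURE_THRESHOLD = 3      # circuit opens at 3+ failures (from the table above)
RESET_AFTER_SECONDS = 300  # 5 minutes of being open


class CircuitBreaker:
    """Illustrative closed / open / half-open breaker for one endpoint."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock      # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= RESET_AFTER_SECONDS:
            return "half-open"
        return "open"

    def allow_request(self) -> bool:
        # Only the open state blocks requests; half-open lets a probe through.
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        was_half_open = self.state == "half-open"
        self.failures += 1
        reached_threshold = (
            self.opened_at is None and self.failures >= FAILURE_THRESHOLD
        )
        if was_half_open or reached_threshold:
            # Open (or re-open) the circuit and restart the 5 minute timer.
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before each webhook delivery and report the outcome with `record_success()` or `record_failure()`.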

Request Tracing

Every API request is assigned a correlation ID for tracing. Clients may supply their own ID in the request; the response always carries one:

Request Header (optional)

X-Request-ID: your-custom-id

Response Header (always present)

X-Request-ID: 550e8400-e29b-41d4-a716-446655440000

Use this ID when reporting issues or debugging webhook deliveries.
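
One way a client might attach its own ID is sketched below. The helper `with_request_id` is hypothetical, and using a random UUID as the fallback is an assumption based on the format of the sample response header above.

```python
import uuid


def with_request_id(headers=None, request_id=None) -> dict:
    """Return a copy of the headers with X-Request-ID set."""
    merged = dict(headers or {})
    # Use the caller's ID when given, otherwise generate a random UUID.
    merged["X-Request-ID"] = request_id or str(uuid.uuid4())
    return merged


headers = with_request_id({"Accept": "application/json"}, "order-42")
print(headers["X-Request-ID"])
```

Logging the ID you sent alongside the `X-Request-ID` value from the response keeps both available when reporting an issue.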

Rate Limit Headers

All responses include rate limit information:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000060

When rate limited (HTTP 429):

Retry-After: 60
X-RateLimit-Remaining: 0
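
A client can use these headers to decide how long to pause before the next request. The sketch below assumes that `Retry-After` is a number of seconds and that `X-RateLimit-Reset` is a Unix timestamp in seconds, which is how the sample values read but is not stated explicitly. The function name is hypothetical, and `headers` is treated as a plain dict with the exact header casing shown above.

```python
def seconds_until_retry(status: int, headers: dict, now: float) -> float:
    """Return how many seconds to wait before sending the next request."""
    if status == 429 and "Retry-After" in headers:
        # Rate limited: follow the server's explicit hint.
        return float(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        # Budget left in the current window, no need to wait.
        return 0.0
    # Budget exhausted: wait until the window resets (assumed epoch seconds).
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)


limited = {"Retry-After": "60", "X-RateLimit-Remaining": "0"}
print(seconds_until_retry(429, limited, now=1700000000))
```

Passing `now` explicitly (for example `time.time()`) keeps the function deterministic and easy to test.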

Monitoring Best Practices

Health Check Intervals

| Environment | Interval | Timeout |
| --- | --- | --- |
| Production | 30s | 10s |
| Staging | 60s | 15s |
| Development | 5m | 30s |

Alerting Thresholds

| Metric | Warning | Critical |
| --- | --- | --- |
| Health status | degraded | unhealthy |
| Response time | > 2s | > 5s |
| Error rate | > 1% | > 5% |
| Circuit breakers open | 1+ | 3+ |
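
The thresholds above could be encoded as a single classification function, as in the sketch below. The function name, parameter names, and the "ok" level are assumptions for this example; the numeric limits come from the table.

```python
def alert_level(
    health_status: str,
    response_time_s: float,
    error_rate_pct: float,
    open_breakers: int,
) -> str:
    """Map current metrics to "ok", "warning", or "critical"."""
    critical = (
        health_status == "unhealthy"
        or response_time_s > 5
        or error_rate_pct > 5
        or open_breakers >= 3
    )
    warning = (
        health_status == "degraded"
        or response_time_s > 2
        or error_rate_pct > 1
        or open_breakers >= 1
    )
    # Critical wins whenever any single metric crosses its critical limit.
    return "critical" if critical else "warning" if warning else "ok"


print(alert_level("healthy", 2.5, 0.2, 0))
```

The most severe level across all four metrics decides the result, so one critical metric is enough to raise a critical alert even when the others look normal.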

Example: Prometheus Scrape Config

scrape_configs:
  - job_name: 'etch-api'
    metrics_path: /api/health
    static_configs:
      - targets: ['etch.film.fun']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

Dead Letter Queue

Failed launch jobs are saved to a dead letter queue for manual review.

Jobs end up in the DLQ when:

  • All retry attempts exhausted
  • Non-recoverable errors (e.g., invalid config)
  • Solana transaction failures after retries

Contact support to review and retry DLQ items.
