Health & Monitoring
Check API health, service status, and diagnostics.
Health Endpoint
GET /api/health
Returns the overall health status of the API and its dependencies.
Response
{
  "status": "healthy",
  "timestamp": "2026-02-15T12:00:00Z",
  "version": "0.2.0",
  "checks": {
    "database": {
      "status": "pass",
      "latencyMs": 12
    },
    "solana": {
      "status": "pass",
      "latencyMs": 89,
      "message": "Slot: 245789012"
    },
    "redis": {
      "status": "pass",
      "latencyMs": 5
    }
  },
  "uptime": 86400,
  "circuitBreakers": {}
}
Status Values
| Status | HTTP Code | Description |
|---|---|---|
| healthy | 200 | All systems operational |
| degraded | 200 | Optional services unavailable |
| unhealthy | 503 | Critical services down |
Check Results
Each service check returns:
| Field | Type | Description |
|---|---|---|
| status | "pass" \| "fail" | Health status |
| latencyMs | number | Response time in ms |
| message | string? | Additional info |
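A monitoring script might reduce the response to the few fields it cares about. The following is a minimal sketch, not part of the API: the function name `summarize_health` and the sample payload are hypothetical, and the page does not say which services count as optional, so the failing `redis` check in the sample is invented purely for illustration.

```python
import json

# Expected HTTP code for each overall status, taken from the Status Values table.
EXPECTED_HTTP_CODE = {"healthy": 200, "degraded": 200, "unhealthy": 503}


def summarize_health(body: str) -> dict:
    """Parse a health response body and list the checks that did not pass."""
    payload = json.loads(body)
    checks = payload.get("checks", {})
    failing = sorted(
        name for name, check in checks.items() if check.get("status") != "pass"
    )
    # Name of the check with the highest latencyMs, or None if there are no checks.
    slowest = max(
        checks, key=lambda name: checks[name].get("latencyMs", 0), default=None
    )
    return {
        "status": payload["status"],
        "expected_http_code": EXPECTED_HTTP_CODE.get(payload["status"]),
        "failing_checks": failing,
        "slowest_check": slowest,
    }


# Hypothetical payload: same shape as the documented response, invented values.
sample = """
{
  "status": "degraded",
  "checks": {
    "database": {"status": "pass", "latencyMs": 12},
    "solana": {"status": "pass", "latencyMs": 89, "message": "Slot: 245789012"},
    "redis": {"status": "fail", "latencyMs": 5}
  }
}
"""
summary = summarize_health(sample)
print(summary)
```

Keeping the status-to-code mapping in one dictionary makes it easy to flag a response whose HTTP code disagrees with its body.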
Simple Uptime Check
For load balancers and uptime monitors:
HEAD /api/health
Returns 200 OK if the API is running (no body).
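For a custom uptime probe, the HEAD check can be sketched with Python's standard library. The base URL below is a placeholder and the helper names are hypothetical; the 10-second default timeout simply mirrors the production row of the interval table further down.

```python
import urllib.error
import urllib.request


def build_uptime_request(base_url: str) -> urllib.request.Request:
    """Build a HEAD request for the health endpoint without sending it."""
    return urllib.request.Request(base_url.rstrip("/") + "/api/health", method="HEAD")


def is_up(base_url: str, timeout_s: float = 10.0) -> bool:
    """Return True if HEAD /api/health answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(
            build_uptime_request(base_url), timeout=timeout_s
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError, OSError):
        # Connection errors, timeouts, and error responses all count as "down".
        return False


# Placeholder host; nothing is sent until is_up() is called.
request = build_uptime_request("https://api.example.test/")
print(request.get_method(), request.full_url)
```

Calling `is_up("https://<your-host>")` performs the actual probe; building the request separately keeps the URL logic testable without network access.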
Circuit Breakers
When a webhook endpoint fails repeatedly, its circuit breaker blocks further requests to it and prevents cascading failures.
The health response includes circuit breaker status:
{
  "circuitBreakers": {
    "webhook.example.com": {
      "state": "open",
      "failures": 3
    },
    "api.partner.io": {
      "state": "closed",
      "failures": 0
    }
  }
}
Circuit Breaker States
| State | Description |
|---|---|
| closed | Normal operation |
| open | Requests blocked (3+ failures) |
| half-open | Testing if service recovered |
Circuits reset after 5 minutes of being open.
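The state transitions described above can be sketched as a small state machine. This is an illustrative model only, not the service's implementation: the class and method names are hypothetical, and it assumes that an open circuit becomes half-open once the 5-minute window has elapsed, that a success in half-open closes it, and that a failure in half-open reopens it.

```python
import time
from typing import Callable, Optional

FAILURE_THRESHOLD = 3    # circuit opens at 3+ failures, per the table above
RESET_AFTER_S = 5 * 60   # circuits reset after 5 minutes of being open


class CircuitBreaker:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self.failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= RESET_AFTER_S:
            return "half-open"  # assumption: a trial request is allowed after the window
        return "open"

    def allow_request(self) -> bool:
        """Closed and half-open circuits let a request through; open ones block it."""
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= FAILURE_THRESHOLD:
            self._opened_at = self._clock()  # open (or reopen) and restart the timer


# Drive the breaker with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
states = [breaker.state]
now[0] += RESET_AFTER_S
states.append(breaker.state)
breaker.record_success()
states.append(breaker.state)
print(states)  # ['open', 'half-open', 'closed']
```

Injecting the clock keeps the timing logic deterministic, which is useful when reasoning about what a given `circuitBreakers` entry in the health response implies.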
Request Tracing
All API requests include a correlation ID for tracing:
Request Header (optional)
X-Request-ID: your-custom-id
Response Header (always present)
X-Request-ID: 550e8400-e29b-41d4-a716-446655440000
Use this ID when reporting issues or debugging webhook deliveries.
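On the client side, handling the correlation ID comes down to setting one header and reading one back. The helpers below are a hypothetical sketch; they do not assume anything about whether the API echoes a caller-supplied ID, only that the response header is present as stated above.

```python
import uuid
from typing import Mapping, Optional


def tracing_headers(request_id: Optional[str] = None) -> dict:
    """Return request headers carrying a correlation ID (generated if not given)."""
    return {"X-Request-ID": request_id or str(uuid.uuid4())}


def response_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Read X-Request-ID from response headers, ignoring header-name case."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


sent = tracing_headers("your-custom-id")
received = response_request_id(
    {"x-request-id": "550e8400-e29b-41d4-a716-446655440000"}
)
print(sent, received)
```

Logging the value returned by `response_request_id` next to each call gives support a concrete ID to search for.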
Rate Limit Headers
All responses include rate limit information:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000060
When rate limited (HTTP 429):
Retry-After: 60
X-RateLimit-Remaining: 0
Monitoring Best Practices
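A monitoring client that polls the API should itself honor the rate limit headers described earlier. The sketch below shows one way to turn those headers into a pause duration. It assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds and that `Retry-After` is a number of seconds, as the sample values suggest; the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    status_code: int, headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Return how long a client should pause before its next request."""
    now = time.time() if now is None else now
    if status_code == 429 and "Retry-After" in headers:
        # Assumption: Retry-After is given in seconds, as in the example above.
        return float(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", "1")) == 0:
        # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
        reset_at = float(headers.get("X-RateLimit-Reset", now))
        return max(0.0, reset_at - now)
    return 0.0


ok = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "42",
    "X-RateLimit-Reset": "1700000060",
}
limited = {"Retry-After": "60", "X-RateLimit-Remaining": "0"}

wait_ok = seconds_to_wait(200, ok, now=1700000000.0)
wait_limited = seconds_to_wait(429, limited, now=1700000000.0)
print(wait_ok, wait_limited)  # 0.0 60.0
```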
Health Check Intervals
| Environment | Interval | Timeout |
|---|---|---|
| Production | 30s | 10s |
| Staging | 60s | 15s |
| Development | 5m | 30s |
Alerting Thresholds
| Metric | Warning | Critical |
|---|---|---|
| Health status | degraded | unhealthy |
| Response time | > 2s | > 5s |
| Error rate | > 1% | > 5% |
| Circuit breakers open | 1+ | 3+ |
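The thresholds in this table can be expressed as a single classification function. This is a sketch under stated assumptions: the function names are hypothetical, response time is measured in seconds, error rate in percent, and the open-breaker count is derived from the `circuitBreakers` object of the health response.

```python
from typing import Optional


def count_open_breakers(circuit_breakers: dict) -> int:
    """Count entries of the circuitBreakers object whose state is "open"."""
    return sum(1 for b in circuit_breakers.values() if b.get("state") == "open")


def alert_level(
    status: str, response_time_s: float, error_rate_pct: float, open_breakers: int
) -> Optional[str]:
    """Map the four metrics to "critical", "warning", or None (no alert)."""
    if (
        status == "unhealthy"
        or response_time_s > 5
        or error_rate_pct > 5
        or open_breakers >= 3
    ):
        return "critical"
    if (
        status == "degraded"
        or response_time_s > 2
        or error_rate_pct > 1
        or open_breakers >= 1
    ):
        return "warning"
    return None


# circuitBreakers sample from the Circuit Breakers section: one circuit is open.
breakers = {
    "webhook.example.com": {"state": "open", "failures": 3},
    "api.partner.io": {"state": "closed", "failures": 0},
}
level = alert_level("healthy", 0.3, 0.2, count_open_breakers(breakers))
print(level)  # warning
```

Checking the critical conditions first ensures that a metric past both thresholds is reported at the higher severity.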
Example: Prometheus Scrape Config
scrape_configs:
  - job_name: 'etch-api'
    metrics_path: /api/health
    static_configs:
      - targets: ['etch.film.fun']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
Dead Letter Queue
Failed launch jobs are saved to a dead letter queue for manual review.
Jobs end up in the DLQ when:
- All retry attempts are exhausted
- A non-recoverable error occurs (e.g., invalid config)
- A Solana transaction still fails after retries
Contact support to review and retry DLQ items.
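To make the routing rules concrete, here is a generic retry-then-dead-letter sketch. It is not the service's implementation: `NonRecoverableError`, `run_with_dlq`, the in-memory list used as the queue, and the limit of 3 attempts are all hypothetical, and a real worker would normally add backoff between attempts.

```python
class NonRecoverableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix (e.g., invalid config)."""


def run_with_dlq(job, handler, dead_letters: list, max_attempts: int = 3) -> bool:
    """Try a job up to max_attempts times; park it in dead_letters if it never succeeds."""
    last_error = "unknown"
    for attempt in range(1, max_attempts + 1):
        try:
            handler(job)
            return True
        except NonRecoverableError as exc:
            # No point retrying: send the job straight to the dead letter queue.
            dead_letters.append(
                {"job": job, "reason": f"non-recoverable: {exc}", "attempts": attempt}
            )
            return False
        except Exception as exc:
            # Treated as transient (e.g., a failed transaction): try again.
            last_error = str(exc)
    dead_letters.append(
        {"job": job, "reason": f"retries exhausted: {last_error}", "attempts": max_attempts}
    )
    return False


def bad_config(job):
    raise NonRecoverableError("invalid config")


def flaky(job):
    raise RuntimeError("transaction failed")


dlq = []
run_with_dlq({"id": 1}, bad_config, dlq)
run_with_dlq({"id": 2}, flaky, dlq)
print([(entry["job"]["id"], entry["attempts"]) for entry in dlq])  # [(1, 1), (2, 3)]
```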