Overview
Canary Checker can collect health information about systems in a few different ways:
- Active Application health checks send periodic requests to the service or application and check the response to ensure that it is working correctly. Active health checks are proactive and can detect issues quickly, but they also add some load to the system being monitored.
- Active Infrastructure health checks are similar to application health checks, but instead of sending a request to the application, they send a request to the infrastructure to deploy a new application or infrastructure component, e.g. a new Kubernetes pod or EC2 instance.
- Passive health checks rely on monitoring the activity in the system, analysing it, and detecting anomalies or errors. Passive health checks are less intrusive than active health checks, but they may not detect issues as quickly.
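The difference between the active and passive approaches described above can be sketched in a few lines of Python. This is an illustrative sketch only, not how Canary Checker implements its checks: the names `CheckResult`, `active_check` and `passive_check`, the `probe` callable and the default thresholds are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CheckResult:
    passed: bool
    duration_s: float
    message: str = ""


def active_check(probe: Callable[[], bool], max_duration_s: float = 1.0) -> CheckResult:
    """Actively exercise the target by calling `probe` and timing it.

    `probe` is a hypothetical callable that sends one request and returns
    True when the response looks healthy.
    """
    start = time.monotonic()
    try:
        ok = probe()
        message = "" if ok else "probe reported an unhealthy response"
    except Exception as exc:  # a failed request counts as a failed check
        ok, message = False, f"probe raised: {exc}"
    duration = time.monotonic() - start
    if ok and duration > max_duration_s:
        ok, message = False, f"probe took {duration:.3f}s, limit is {max_duration_s}s"
    return CheckResult(ok, duration, message)


def passive_check(status_codes: Iterable[int], max_error_rate: float = 0.05) -> CheckResult:
    """Passively inspect already-observed traffic without sending anything.

    `status_codes` stands in for HTTP status codes taken from existing logs.
    """
    start = time.monotonic()
    codes = list(status_codes)
    errors = sum(1 for code in codes if code >= 500)
    rate = errors / len(codes) if codes else 0.0
    passed = rate <= max_error_rate
    message = "" if passed else f"error rate {rate:.1%} exceeds {max_error_rate:.1%}"
    return CheckResult(passed, time.monotonic() - start, message)
```

The active check adds one request of load per run but notices a failure on the next run. The passive check adds no load but can only report on traffic that has already happened.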
Health checks can be defined in 3 different ways:
- UI: Navigate to Settings → Health → Click on the button
- GitOps: canary-checker is fully GitOps-enabled using Kubernetes Custom Resource Definitions (CRDs).
- CLI: For rapid development and feedback, canary-checker can be run as a normal CLI application by specifying the health check definition in a config file.
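For the GitOps and CLI routes, a health check definition is a Kubernetes-style manifest. The Python sketch below builds one as a plain dictionary and prints it as JSON. The `build_canary` helper is hypothetical, and the `apiVersion`, `kind` and field names (`schedule`, `http`, `url`, `responseCodes`) are assumptions that should be verified against the CRD reference for the installed version before use.

```python
import json


def build_canary(name: str, url: str, schedule: str = "*/5 * * * *") -> dict:
    """Build a Canary manifest as a plain dict.

    The apiVersion, kind and field names below are assumptions; check them
    against the installed CRD before applying the result to a cluster.
    """
    return {
        "apiVersion": "canaries.flanksource.com/v1",  # assumed API group/version
        "kind": "Canary",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # cron expression (assumed field name)
            "http": [
                {
                    "name": f"{name}-http",
                    "url": url,
                    "responseCodes": [200],  # assumed field name
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be saved as a config file.
    print(json.dumps(build_canary("example", "https://example.com/health"), indent=2))
```

Generating the manifest from code is only one option; the same structure can be written by hand in YAML and committed to Git.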
Canary Checker runs health checks on a predefined cron schedule and provides a fully customizable platform that:
- Securely references authentication credentials from Kubernetes secrets and configmaps.
- Parses and transforms the response using JSONPath, Go templates or JavaScript to validate, extrapolate or aggregate results.
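The idea of parsing a response and validating a value extracted from it can be sketched as follows. This is a minimal illustration in Python, not Canary Checker's transformation engine: the `extract` helper is a deliberately tiny dotted-path lookup standing in for JSONPath, and both function names are hypothetical.

```python
import json
from typing import Any, Tuple


def extract(document: Any, path: str) -> Any:
    """Follow a dotted path such as 'status.ready' or 'items.0.count'.

    This is a simplified stand-in for JSONPath, not a JSONPath implementation.
    """
    current = document
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]  # list segments are numeric indexes
        else:
            current = current[key]
    return current


def evaluate(body: str, path: str, minimum: float) -> Tuple[bool, str]:
    """Parse a JSON response body, pull out one number, and validate it."""
    try:
        value = extract(json.loads(body), path)
    except (KeyError, IndexError, ValueError, TypeError) as exc:
        return False, f"could not read {path!r}: {exc}"
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False, f"{path!r} is not a number: {value!r}"
    if value < minimum:
        return False, f"{path!r} is {value}, expected at least {minimum}"
    return True, f"{path!r} is {value}"
```

For example, `evaluate('{"status": {"ready": 1}}', "status.ready", 2)` fails the check because the extracted value is below the minimum, even though the request itself succeeded.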
Check Types
Protocol | Status | Checks |
---|---|---|
HTTP(s) | GA | Response body, headers and duration |
DNS | GA | Response and duration |
Ping/ICMP | GA | Duration and packet loss |
TCP | GA | Port is open and connectable |
Data Sources | ||
SQL (MySQL, Postgres, SQL Server) | GA | Ability to log in, results, duration, health exposed via stored procedures |
LDAP | GA | Ability to log in, response time |
Elasticsearch / OpenSearch | GA | Ability to log in, response time, size of search results |
Mongo | Beta | Ability to log in, results, duration |
Redis | GA | Ability to log in, results, duration |
Prometheus | GA | Ability to log in, results, duration |
Alerts | ||
Prometheus Alert Manager | GA | Pending and firing alerts |
AWS Cloudwatch Alarms | GA | Pending and firing alerts |
DevOps | ||
Git | GA | Query Git and GitHub repositories via SQL |
Azure DevOps | ||
Integration Testing | ||
JMeter | Beta | Runs and checks the result of a JMeter test |
JUnit | Beta | Runs a pod that saves JUnit test results |
File Systems / Batch | ||
Local Disk / NFS | GA | Check folders for files that are: too few/many, too old/new, too small/large |
S3 | GA | Check contents of AWS S3 Buckets |
GCS | GA | Check contents of Google Cloud Storage Buckets |
SFTP | GA | Check contents of folders over SFTP |
SMB / CIFS | GA | Check contents of folders over SMB/CIFS |
Config | ||
AWS Config | GA | Query AWS config using SQL |
AWS Config Rule | GA | AWS Config Rules that are firing, Custom AWS Config queries |
Config DB | GA | Custom config queries for Mission Control Config DB |
Kubernetes Resources | GA | Kubernetes resources that are missing or are in a non-ready state |
Backups | ||
GCP Databases | GA | Backup freshness |
Restic | Beta | Backup freshness and integrity |
Infrastructure | ||
EC2 | GA | Ability to launch new EC2 instances |
Kubernetes Ingress | GA | Ability to schedule and then route traffic via an ingress to a pod |
Docker/Containerd | Deprecated | Ability to push and pull containers via docker/containerd |
Helm | Deprecated | Ability to push and pull helm charts |
S3 Protocol | GA | Ability to read/write/list objects on an S3 compatible object store |
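As one illustration of the rows above, the "Local Disk / NFS" criteria (too few/many, too old/new, too small/large) can be sketched in Python. This is a simplified sketch under stated assumptions, not the project's implementation: `FolderLimits`, `check_folder` and their parameters are hypothetical names, and only a subset of the criteria is covered.

```python
import os
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FolderLimits:
    min_count: int = 0                  # fewer files than this is a problem
    max_count: Optional[int] = None     # more files than this is a problem
    max_age_s: Optional[float] = None   # newest file must be younger than this
    min_size: int = 0                   # each file must be at least this many bytes


def check_folder(path: str, limits: FolderLimits, now: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the folder passed."""
    now = time.time() if now is None else now
    with os.scandir(path) as it:
        # Keep (name, stat) pairs for regular files only.
        files = [(entry.name, entry.stat()) for entry in it if entry.is_file()]

    problems = []
    if len(files) < limits.min_count:
        problems.append(f"too few files: {len(files)} < {limits.min_count}")
    if limits.max_count is not None and len(files) > limits.max_count:
        problems.append(f"too many files: {len(files)} > {limits.max_count}")
    if limits.max_age_s is not None and files:
        newest = max(stat.st_mtime for _, stat in files)
        if now - newest > limits.max_age_s:
            problems.append(f"newest file is {now - newest:.0f}s old")
    for name, stat in files:
        if stat.st_size < limits.min_size:
            problems.append(f"{name} is too small: {stat.st_size} bytes")
    return problems
```

A check like this suits batch outputs such as nightly backups, where a missing, stale or truncated file is the symptom to look for.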