What is Health Monitoring?
Health Monitoring is a tool that shows whether your System is in good shape. It displays information such as System performance and any errors that have occurred. It makes managing Systems of any size much easier, and it lets our support engineers quickly see what is going on when you open a request.
Each Nx Witness Server independently collects its own Health Monitoring metrics, and the System aggregates them into a single view. This view is accessible through the Server API.
Health Monitoring gives you a quick snapshot of your System, which is particularly useful in large Systems with many Servers and devices.
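To make the aggregation idea concrete, here is a minimal Python sketch of merging independently collected per-Server metrics into one view. The data shapes, the SystemView class, the aggregate function, and the "cameras" metric name are all hypothetical; this is not how Nx Witness implements aggregation internally.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SystemView:
    # Metrics reported by each Server, keyed by a (hypothetical) Server id
    servers: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # System-level figures derived from all Servers together
    server_count: int = 0
    camera_channel_count: int = 0


def aggregate(per_server: Dict[str, Dict[str, float]]) -> SystemView:
    """Merge independently collected Server metrics into one System view."""
    view = SystemView()
    for server_id, metrics in per_server.items():
        # Copy so later changes to the input do not alter the view
        view.servers[server_id] = dict(metrics)
        # "cameras" is a hypothetical per-Server metric name
        view.camera_channel_count += int(metrics.get("cameras", 0))
    view.server_count = len(per_server)
    return view


view = aggregate({
    "server-a": {"cpu_percent": 35.0, "cameras": 12},
    "server-b": {"cpu_percent": 80.5, "cameras": 20},
})
print(view.server_count, view.camera_channel_count)  # 2 32
```

Each Server's data stays intact under its own key, while System-level figures are computed from the combined set.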
What goes into Health Monitoring?
A manifest is a list of the available parameters and their attributes. It specifies which fields you can view in Health Monitoring and determines what information is collected and shown about the Servers. For example, it can include System information such as the number of Servers and the number of users.
The manifest may change between versions in response to user demand or feedback, and future versions may add parameters for the System to collect.
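The sketch below illustrates how a manifest can decide which collected values are shown. It assumes a hypothetical in-memory manifest (a list of parameters with id, name, and unit attributes) and a hypothetical render function; the actual manifest returned by the Server API has its own format.

```python
from typing import Dict, List, TypedDict


class Parameter(TypedDict):
    id: str    # hypothetical parameter identifier
    name: str  # human-readable label
    unit: str  # display unit ("" if none)


def render(manifest: List[Parameter], values: Dict[str, float]) -> List[str]:
    """Return display lines only for parameters that the manifest lists."""
    lines = []
    for param in manifest:
        if param["id"] in values:  # skip listed parameters with no collected value
            unit = f" {param['unit']}" if param["unit"] else ""
            lines.append(f"{param['name']}: {values[param['id']]}{unit}")
    return lines


manifest: List[Parameter] = [
    {"id": "servers", "name": "Servers", "unit": ""},
    {"id": "users", "name": "Users", "unit": ""},
]
values = {"servers": 3, "users": 12, "unlisted": 99}  # "unlisted" is not shown
for line in render(manifest, values):
    print(line)
```

Values that the manifest does not list are ignored, which mirrors how the manifest dictates what is displayed.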
Metrics are parameters of the entities in the System (Servers, cameras, storage locations) that provide important information about each entity's state. All metrics are intended to help investigate problems that can occur while the System is in operation.
There are five types of metrics (see the Nx Witness user manual for details):
- System-level
  - Servers: number of Servers in the System.
  - Camera channels: number of camera channels in the System.
  - Storage locations: number of storage locations in the System.
  - Users: number of users in the System.
  - System Version: Nx Witness Server version.
- Server-level
  - Server Availability: status of the Server, event count, uptime, etc.
  - Server Load: CPU usage, RAM usage, number of threads and devices, etc.
  - Server Info: public IP, OS, OS/VMS time, number of cores, amount of RAM, etc.
  - Server Activity: transactions per second, Event Rule activations per second, list of active plugins, etc.
- Camera-level
  - Camera Info: camera name, name of its Server, device type, etc.
  - Camera Availability: device status, number of times the device went offline, number of times it had issues, etc.
  - Primary Stream: resolution of the primary stream, actual FPS, and average FPS drop.
  - Secondary Stream: resolution of the secondary stream, actual FPS, and average FPS drop.
  - Storage Analytics: total length of the camera's archived footage and the archive bitrate.
- Storage-level
  - Storage Info: storage location path, name of the Server the storage is installed on, and the type of storage used.
  - Storage State: current status of the storage device and the number of storage issue events in the past 24 hours.
  - Storage Activity: read and write rates of the storage device, per second.
  - Storage Space: total storage size in GB and the amount of space occupied by data.
- Network-level
  - Name of the network interface.
  - Network Info: name of the Server the network interface is installed on, status of the network interface, and its IP address.
  - I/O Rates: amount of data received (in) and sent (out), in KB per second.
The state of the System changes when a Warning-level or Danger-level threshold is reached, and an alert is then shown.
An alert indicates that a metric is in an abnormal state; it is shown whenever a metric threshold (rule) is reached. There are two types of alerts:
- Error: shown when a metric's error-level threshold is reached.
- Warning: shown when a metric's warning-level threshold is reached but its error-level threshold is not.
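The Error/Warning rule above can be written as a small classification function. This sketch assumes that a threshold counts as "reached" when the value is greater than or equal to it and that higher values are worse (as with CPU usage); the Alert enum, the classify function, and the threshold values are hypothetical.

```python
from enum import Enum
from typing import Optional


class Alert(Enum):
    WARNING = "warning"
    ERROR = "error"


def classify(value: float, warning_at: float, error_at: float) -> Optional[Alert]:
    """Return the alert for a metric value, or None if no threshold is reached.

    Assumption: "reached" means value >= threshold, and higher values are worse.
    """
    if value >= error_at:
        return Alert.ERROR    # error-level threshold reached
    if value >= warning_at:
        return Alert.WARNING  # warning reached AND error not reached
    return None


# Hypothetical CPU-usage thresholds, in percent
print(classify(50, warning_at=70, error_at=90))  # None
print(classify(75, warning_at=70, error_at=90))  # Alert.WARNING
print(classify(95, warning_at=70, error_at=90))  # Alert.ERROR
```

Checking the error level first guarantees that a value past both thresholds produces only an Error, matching the rule that a Warning requires the error level not to be reached.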
Each alert belongs to the category of the component it concerns. For example:
- System alerts: the maximum number of Servers or channels per System has been reached.
- Server alerts: Server offline events, high CPU or RAM usage, logging level status, more than two encoding threads, etc.
- Camera alerts: camera offline events, IP conflicts, frame drops, etc.
- Storage alerts: storage inaccessible or offline, storage issues in the last 24 hours, etc.
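If you process alerts in your own tooling, grouping them by component mirrors these categories. The AlertRecord type and the group_by_component function below are hypothetical and do not reflect the actual format returned by the Server API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class AlertRecord(NamedTuple):
    # Hypothetical alert shape, for illustration only
    component: str  # "system", "server", "camera" or "storage"
    level: str      # "warning" or "error"
    text: str


def group_by_component(alerts: List[AlertRecord]) -> Dict[str, List[AlertRecord]]:
    """Bucket alerts by the component category they belong to."""
    groups: Dict[str, List[AlertRecord]] = defaultdict(list)
    for alert in alerts:
        groups[alert.component].append(alert)
    return dict(groups)  # plain dict, in first-seen category order


alerts = [
    AlertRecord("server", "error", "Server offline"),
    AlertRecord("camera", "warning", "Frame drop"),
    AlertRecord("camera", "error", "IP conflict"),
]
for component, items in group_by_component(alerts).items():
    print(component, [a.text for a in items])
```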
API Calls for Health Monitoring
To make Health Monitoring API calls, you must belong to one of the following user groups: Cloud admins, Administrators, or Owners.
See our API documentation for details on the Health Monitoring API calls: https://localhost:7011/static/api.xml
- GET /ec2/metrics/alarms: returns the currently active alarms.
- GET /ec2/metrics/manifest: returns the manifest used to visualize the results of GET /ec2/metrics/alarms and GET /ec2/metrics/values.
- GET /ec2/metrics/values: returns the current metric values.
- GET /api/aggregator: executes several requests with the JSON content type and returns their results as a single JSON object.
For example, looking up GET /ec2/metrics/manifest in the API documentation shows the description of that call.
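A minimal Python sketch of calling the three metrics endpoints with the standard library follows. The base URL reuses the host and port from the documentation link above; the credentials, the helper names (make_opener, get_json, main), and the use of HTTP Digest authentication are assumptions, so check the API documentation for the authentication your Server version supports. Certificate verification is disabled only because Servers often use self-signed certificates; do not do this outside a local test.

```python
import json
import ssl
import urllib.request
from typing import Any

# Hypothetical connection details: adjust to your Server and use an account
# in the Cloud admins, Administrators, or Owners group.
BASE_URL = "https://localhost:7011"
USER, PASSWORD = "admin", "password"

METRICS_PATHS = ("/ec2/metrics/manifest", "/ec2/metrics/values", "/ec2/metrics/alarms")


def make_opener(base_url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """Build an opener that authenticates (assumed: HTTP Digest) to the Server."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    # Local test only: accept a self-signed certificate
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPDigestAuthHandler(passwords),
    )


def get_json(opener: urllib.request.OpenerDirector, base_url: str, path: str) -> Any:
    """GET one endpoint and decode its JSON body."""
    with opener.open(base_url + path, timeout=10) as response:
        return json.load(response)


def main() -> None:
    opener = make_opener(BASE_URL, USER, PASSWORD)
    for path in METRICS_PATHS:
        data = get_json(opener, BASE_URL, path)
        # Print only the top-level type; see the API docs for the payload format
        print(path, "->", type(data).__name__)
```

Call main() against a reachable Server to fetch the manifest, the current values, and the active alarms in turn.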
An aggregated API call combines multiple HTTP requests into a single request.
For example, the aggregated API call below combines the following API calls, each passed as a URL-encoded exec_cmd parameter:
- /ec2/metrics/manifest
- /ec2/metrics/values
- /ec2/metrics/alarms
Aggregated API Call:
/api/aggregator?exec_cmd=ec2%2Fmetrics%2Fmanifest&exec_cmd=ec2%2Fmetrics%2Fvalues&exec_cmd=ec2%2Fmetrics%2Falarms
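The aggregated URL can also be built programmatically. This sketch uses Python's urllib.parse.urlencode, which percent-encodes each "/" as %2F; the aggregator_url helper name is hypothetical.

```python
from typing import Iterable
from urllib.parse import urlencode


def aggregator_url(commands: Iterable[str]) -> str:
    """Build an /api/aggregator URL that runs each command in one request.

    Every command becomes its own exec_cmd query parameter.
    """
    query = urlencode([("exec_cmd", cmd) for cmd in commands])
    return "/api/aggregator?" + query


url = aggregator_url(["ec2/metrics/manifest", "ec2/metrics/values", "ec2/metrics/alarms"])
print(url)
# /api/aggregator?exec_cmd=ec2%2Fmetrics%2Fmanifest&exec_cmd=ec2%2Fmetrics%2Fvalues&exec_cmd=ec2%2Fmetrics%2Falarms
```

Passing a list of pairs rather than a dict lets the same exec_cmd key repeat once per command.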