How does the scalability of 3.2 compare to 4.0+?
One of the biggest features of Nx Witness 4.0 is a significant improvement in maximum System scalability. The synchronization module was re-engineered, and a set of automated scalability tests was developed that runs regularly on a dedicated scalability testing infrastructure.
These updates to Nx Witness scalability will prevent situations similar to the following:
- 20+ individual servers in the System with a 20% CPU load.
- Merging all servers into a single system, leading to a significant increase in CPU load.
- Adding more servers and causing CPU utilization to reach 100% -- resulting in a denial of service.
The chart below compares the maximum recommended numbers for Nx Witness 3.2 and Nx Witness 4.0. Please continue reading for more specific points of comparison and details of the testing methodology used.
| Maximum Recommended Numbers | Nx Witness 3.2 | Nx Witness 4.0 |
|---|---|---|
| Servers per system | 20 | 100 |
| Cameras per server | 128 | 128 |
| Cameras per system | 1,000 | 10,000 |
Tests and measurements
Whenever a user changes the System configuration, a new camera is discovered by a server, or a camera changes its status, this information is added to the main database and the change is distributed across all servers in the System. This process is called Hive synchronization.
Synchronization changes are delivered as transactions. One transaction normally represents one change in the System.
Let's consider a System consisting of N servers. The challenge is that, in a worst-case scenario, there are N*(N-1)/2 connections between servers in the System. If N=100, that means 4,950 connections in total.
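The worst-case connection count above is just the number of pairs in a full mesh; a quick sketch:

```python
def mesh_connections(n: int) -> int:
    """Number of point-to-point connections in a full mesh of n servers."""
    return (n - 1) * n // 2

# Worst case for a 100-server System:
print(mesh_connections(100))  # 4950
```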
In order to guarantee that information is delivered to all the servers, a synchronization algorithm must be very efficient and must not rely on any specific connection between two server instances.
Typically, most transactions are generated by camera status changes (offline/online), user-made configuration changes and time synchronization.
The following criteria were formulated to serve as a reference during System design and testing. A System is considered functional and scalable if the following five conditions are met:
- Servers started simultaneously can merge into a single System in less than 30 minutes. This time is called synchronization time.
- The unsynchronized transaction queue does not grow, and the sync time of a single transaction is less than 2 minutes. This time is called transaction propagation time.
- The client can connect to the System in less than 1 min. This time is called client connection time.
- Servers do not produce storage issues.
- There are no gaps in the recorded video.
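As an illustrative sketch (parameter names are my own, not from the test suite), the five acceptance criteria above can be expressed as a single predicate, with thresholds in seconds:

```python
def system_is_scalable(sync_time_s: float,
                       transaction_propagation_s: float,
                       client_connection_s: float,
                       storage_errors: int,
                       video_gaps: int) -> bool:
    """Check the five acceptance criteria described above."""
    return (sync_time_s < 30 * 60                   # merge in under 30 minutes
            and transaction_propagation_s < 2 * 60  # transaction syncs in under 2 minutes
            and client_connection_s < 60            # client connects in under 1 minute
            and storage_errors == 0                 # no storage-related errors
            and video_gaps == 0)                    # no gaps in recorded video
```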
Testing environment and process
Testing environment and architecture
In general, the video network and the management network have to be separated to prevent video traffic from affecting the manageability of the System. In the scalability tests, a dedicated management network is implemented.
For scalability tests, AWS is used. Nx Witness servers are deployed on multiple AWS EC2 t3.medium virtual machines (VMs), one server per VM. All the VMs are interconnected in one flat switched network: no routers, no proxies.
The Nx Desktop Client runs on a separate AWS EC2 t3.medium VM. All actions on the servers are performed via the REST API: merge requests, resource creation and modification, and resource status requests. All collected metrics are stored in Elasticsearch and are available as graphs and tables for online monitoring.
Testing process stages
The test performs a number of activities divided into several stages.
Creation of virtual infrastructure in AWS.
At this stage, the VMs for the servers, the virtual switched network, the client's VM, and the Test Agent VM are created. The Test Agent VM executes and controls the scripts that perform the server manipulations and parameter measurements.
Creation of resources on servers.
Servers are populated with several test resources. The number and size of the resources reflect those typically found in real installations.
| Resource | REST API call | Number |
|---|---|---|
| User | ec2/saveUser | ~1,000 per server |
| Storage | ec2/saveStorage | 1 |
| Camera parameters | ec2/setResourceParams | 5 |
| Camera | ec2/saveCamera, ec2/saveCameraUserAttributes | 20 or 100 |
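A minimal sketch of the resource-creation stage, assuming one REST call per resource using the endpoints from the table above. The payload fields are illustrative, and treating the five camera parameters as per-camera is an assumption:

```python
def resource_calls(camera_count: int = 20, user_count: int = 1000):
    """Build the list of (endpoint, payload) REST calls used to populate
    one server with test resources. Payload fields are illustrative."""
    calls = [("ec2/saveStorage", {"name": "test-storage"})]   # 1 storage per server
    calls += [("ec2/saveUser", {"name": f"user-{i}"})         # ~1,000 users per server
              for i in range(user_count)]
    for i in range(camera_count):                             # 20 or 100 cameras
        calls.append(("ec2/saveCamera", {"name": f"camera-{i}"}))
        calls.append(("ec2/saveCameraUserAttributes", {"camera": f"camera-{i}"}))
        calls += [("ec2/setResourceParams",                   # 5 camera parameters
                   {"camera": f"camera-{i}", "paramSet": p}) for p in range(5)]
    return calls
```

In the real test these calls would be posted to each server's REST endpoint; here the list itself is the output, which keeps the sketch self-contained.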
Four special servers are set up and configured. Each records the video streams of 100 test cameras, and together they feed a total of 500 video streams into client-emulating RTSP connections.
Merge of the servers
The hive of all connected servers, cameras, and other devices is called the System. A client can connect to any server and work with the entire System through it. There is no "master" server and, as a result, no single point of failure.
At this stage, the merge of servers into a single System is performed: merge initiation, synchronization of resources between all servers, and synchronization time measurement.
Synchronization time is the time it takes for the configuration of each server, once added to the System, to be copied to all servers in the System; in other words, the time it takes the slowest server (in our test, the most loaded one) to get synchronized.
Transaction stage
After synchronization is complete, changes are made to the test resources at a specific rate over a certain period of time, thus generating transactions. Usually, the rate is one change per resource every 100 or 220 seconds. The test measures the transaction propagation time, namely how long it takes the transaction data to reach every server in the System.
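One round of the transaction stage can be sketched as follows, assuming each camera status flip counts as one transaction (in the real test the flip is posted via the REST API):

```python
def toggle_round(statuses: dict) -> dict:
    """One transaction round: every camera's status is flipped between
    'Online' and 'Offline', generating one transaction per camera."""
    return {cam: ("Offline" if s == "Online" else "Online")
            for cam, s in statuses.items()}

statuses = {"camera-1": "Online", "camera-2": "Offline"}
print(toggle_round(statuses))  # {'camera-1': 'Offline', 'camera-2': 'Online'}
```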
What is synchronized:
Here is the easiest way to think about what is synchronized and what is not: everything a user can configure and adjust is synchronized; everything a server generates itself is not (with a few exceptions).
If one server goes offline for some reason, the rest of the servers in the System remain fully functional. This is because each server holds a consistent, identical copy of the configuration data on:
- Cameras (all camera settings: recording schedule, codecs, and resolutions) and camera statuses (offline, online, and recording). This information is frequently used by a server so that it can handle failover scenarios, perform autodiscovery (without discovering the same camera multiple times), and quickly navigate to a specific camera.
- Users (each server has to keep all users and permissions information to authenticate users.)
- Layouts
- Event rules
- System time
- Video walls, web pages, storage settings.
What is not synchronized:
- Video/audio data (no need for the server to know what video is recorded on other servers until the user requests it.)
- Bookmarks (bookmark log is assembled on the fly while the user views the timeline.)
- Events (events log is merged on the fly from many servers.)
- Audit log
Client connection
During the transaction stage, the Nx Desktop Client, running in a separate Linux VM, starts connecting to one of the servers. The Client connects with different user accounts, since each account determines the number of checks to be performed due to its different permissions on a diverse set of resources.
Measurements
Synchronization time
The synchronization time is calculated as the difference between the moment the merge of servers is initiated and the moment the following conditions are met, in sequence:
- There are no more messages in the message bus on several servers;
- The configuration of all servers is identical.
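A sketch of the two stop conditions, under the assumption that each server's configuration can be sampled as a flat dictionary and its pending message-bus count as an integer:

```python
def synchronization_complete(server_configs, bus_messages_pending) -> bool:
    """True when the message bus is idle on the sampled servers and all
    server configurations are identical."""
    bus_idle = all(count == 0 for count in bus_messages_pending)
    configs_identical = len({frozenset(cfg.items()) for cfg in server_configs}) <= 1
    return bus_idle and configs_identical
```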
Transaction propagation time
In the transaction stage two activities are performed simultaneously:
- Resource state change is posted using REST API:
- Every camera’s state is switched between 'Online' and 'Offline'.
- The 'scalability-stamp' property is changed for one of the cameras by setting the value to the current date/time.
- Events from several servers are monitored by connection to the message bus.
Once the 'scalability-stamp' property of a camera arrives, its value is checked against the current date and time.
The transaction propagation time is calculated as the difference between the current date/time and the date/time value received in the property.
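Assuming the stamp is carried as an ISO-8601 timestamp (the exact wire format is not stated in the article), the calculation is simply:

```python
from datetime import datetime, timezone

def propagation_time_s(stamp_value: str, now: datetime) -> float:
    """Transaction propagation time: difference between the current time
    and the date/time carried in the 'scalability-stamp' property."""
    sent = datetime.fromisoformat(stamp_value)
    return (now - sent).total_seconds()

now = datetime(2020, 1, 1, 12, 1, 30, tzinfo=timezone.utc)
print(propagation_time_s("2020-01-01T12:00:00+00:00", now))  # 90.0
```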
Client connection
During the transaction stage, the test measures the client’s connection time repeatedly.
The connection time is calculated as the difference in time between the moment the client logs into a server and the moment the client receives the first video frame.
The following notation is used:
- S - the number of servers in the System.
- C - the number of cameras per server.
- US - the number of users per server (~1,000 per server).
- R - the total number of resources in the System, R = S * (C + US).
- TR (transaction rate) - the period at which a transaction is generated for each resource (once per 100 or 220 seconds).
- Rate/Server - the number of generated transactions per second per server, R / S / TR.
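Plugging in the larger test configuration as an illustration (100 servers, 100 cameras per server, ~1,000 users per server, one transaction per resource every 100 seconds):

```python
S = 100           # servers in the System
C = 100           # cameras per server
US = 1000         # users per server
R = S * (C + US)  # total resources in the System
TR = 100          # seconds between transactions for each resource

rate_per_server = R / S / TR  # generated transactions per second per server
print(R, rate_per_server)  # 110000 11.0
```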
Servers do not produce storage issues
The Test Agent listens to the message bus for error messages. There should be no errors related to Storage.
There are no gaps in the recorded video
For each camera, the API method ec2/recordedTimePeriods must return a single chunk whose start position is at the recording start and whose end position is at the timestamp of the test stop.
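The check can be sketched as follows, assuming the returned periods are (start, end) pairs in milliseconds (the actual response format of ec2/recordedTimePeriods may differ):

```python
def has_no_gaps(periods, recording_start_ms: int, test_stop_ms: int) -> bool:
    """True if recording is a single chunk spanning from the recording
    start to the test-stop timestamp, i.e. no gaps in the video."""
    if len(periods) != 1:
        return False
    start, end = periods[0]
    return start == recording_start_ms and end == test_stop_ms

print(has_no_gaps([(0, 3_600_000)], 0, 3_600_000))  # True
```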
Limitations
The final numbers depend heavily on how the System is used. The following aspects were not taken into account in these tests:
- The number of bookmarks that are created per second;
- The number of rules in the System;
- How many client connections are actually planned;
- Scenarios of client connections;
- Specific network topology;
- Custom set of rules in the “Routing Management” table;
- Extra CPU load that might affect a significant part of the servers due to third-party plugins.
- The content of this article only applies if the recommended server specifications are followed. The specifications can be found in the articles Nx Witness Server Hardware Specs and Nx System Calculator.
- For ARM servers, other parameters apply. Please follow the information in the ARM Support Policy.
Questions
If you have any questions related to this topic or you want to share your experience with other community members or our team, please visit and engage in our support community or reach out to your local reseller.