How do I design and implement a large system?
A system is considered large if it has
- >= 20 servers and/or
- >= 50 TB footage storage per server and/or
- >= 500 Mbit/s Rx+Tx network interface streaming load.
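As an illustration, the thresholds above can be written as a simple check. This is only a sketch: the `SystemProfile` type and its field names are hypothetical and are not part of any VMS API.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical summary of a deployment; field names are assumptions."""
    server_count: int
    max_storage_tb_per_server: float   # footage storage on the largest server
    max_nic_load_mbps: float           # Rx+Tx streaming load on the busiest interface


def is_large_system(profile: SystemProfile) -> bool:
    """Return True if any one of the three thresholds from the list is met."""
    return (
        profile.server_count >= 20
        or profile.max_storage_tb_per_server >= 50
        or profile.max_nic_load_mbps >= 500
    )
```

Because the criteria are joined by "and/or", meeting a single threshold is enough for the system to count as large.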
The important aspects to consider are the performance, reliability, and configuration optimization of these VMS-based systems.
The picture above represents a logical diagram of the System. The surveillance system contains cameras, servers, operator terminals (using VMS clients), and a network that mediates the interaction between all of them.
The most important function of the System is to acquire video streams, store them, and provide access to the recorded footage on request.
Discovery is the process of establishing communication with cameras and other devices that are new to the system. It reduces the need to add each new device manually and take it through the configuration process. Discovery runs continuously: devices that the Server has already discovered have their online status refreshed and are then filtered out of the result.
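A minimal sketch of one discovery pass, following the description above. The function name and the data structures are assumptions made for illustration; the actual discovery protocol and internal state of the Server are not described in this article.

```python
def discovery_pass(found: set[str], known: dict[str, bool]) -> set[str]:
    """Process one discovery pass.

    found: IDs of devices that responded during this pass (hypothetical input).
    known: device ID -> online flag for devices already in the system;
           updated in place.
    Returns only the devices that are new to the system.
    """
    new_devices = set()
    for device_id in found:
        if device_id in known:
            # Already discovered: refresh the online status, then filter out.
            known[device_id] = True
        else:
            new_devices.add(device_id)
    return new_devices
```

For example, if `known` is `{"cam-a": False}` and the pass finds `cam-a` and `cam-b`, the function marks `cam-a` as online and returns only `cam-b`.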
Camera drivers are developed to make as many features as possible usable across a wide range of camera models. Each driver sets the camera up, exposes as many of its features as it can, enables PTZ control, and so on.
The HTTP/RTSP front end allows the VMS server to interact with clients and other VMS servers, and to broadcast video (either live or recorded footage).
The Server application is cross-platform and runs within the operating system's environment, so this article must also consider the impact of OS components on performance and reliability.
What are some potential performance bottlenecks?
Server-server network connection
Each server in the system can be used to re-configure the entire system (such as adding users, changing recording schedules, and so on), so servers must communicate within the system.
Servers also communicate in order to monitor each other. If one server fails or is shut down intentionally, the other servers must take over its cameras and continue recording.
Having a connection between servers is necessary to provide availability in complex network environments. If one server cannot establish a direct peer-to-peer connection, it may use other servers as a proxy.
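To illustrate the proxy idea, the sketch below finds the shortest chain of servers between two peers using a breadth-first search. How the VMS actually selects proxy servers is not stated here, so the algorithm, the `find_route` name, and the `links` structure are all assumptions.

```python
from collections import deque


def find_route(links: dict[str, set[str]], src: str, dst: str):
    """Return the shortest list of servers from src to dst, or None.

    links: server -> set of servers it can reach directly (hypothetical input).
    A route longer than two entries means intermediate servers act as proxies.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbor in links.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of servers connects src and dst
```

With `links = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}`, server A has no direct link to C, and `find_route(links, "A", "C")` returns `["A", "B", "C"]`, meaning B would serve as the proxy.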
Client-server network connection
The client connects to the Server application in order to manage the System and to watch recorded footage or live video. The Desktop client receives the stream exactly as it is stored on the server's drive, whereas the Web client and the Mobile client both require transcoding on the Server side.
Server-cameras network connection
Generally speaking, there are two data streams between the server and the camera: the management stream and the video stream. The management stream is used for configuring the stream and the camera, while the video stream carries the picture itself. Both data streams are used to determine the availability of the camera. The video stream is very susceptible to packet loss and jitter. A subpar network leads to lower footage quality, gaps in the recorded footage, and excessive error messages.
Block devices are used to store the footage and footage index along with persistent data storage (internal database). The load on the storage increases when
- a user requests to re-index the archive;
- the daily index rebuild happens; or
- the VMS client requests high-resolution recorded footage for a layout with many cameras.
It is unusual for the VMS to require extensive CPU usage, but some clients require the VMS server to prepare a stream for them. Transcoding is enabled for any stream requested by the VMS Web client, the Mobile client, or the VMS server API. A good rule of thumb is that two 1080p streams at 30 fps will load one CPU core.
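The rule of thumb above translates into a quick estimate of the CPU cores needed for transcoding. This is a rough sketch: the function name is hypothetical, and the ratio applies only to 1080p streams at 30 fps as stated in the text.

```python
import math

# Rule of thumb from the text: two 1080p streams at 30 fps load one CPU core.
STREAMS_PER_CORE = 2


def transcoding_cores(streams_1080p30: int) -> int:
    """Estimate the CPU cores needed to transcode the given number of streams."""
    return math.ceil(streams_1080p30 / STREAMS_PER_CORE)
```

For example, five concurrent Web or Mobile client streams would need an estimated three cores for transcoding, in addition to whatever the server uses for its other work.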
Failover is a feature that allows healthy servers to take over cameras that used to belong to a server that failed.
There must be direct IP network connectivity between the cameras of a failed server and at least one healthy server; otherwise failover is not viable. In complex network environments it is necessary to set up a process for verifying network reliability, one that proves that if a server fails, the network links to the other servers will withstand the additional streaming load. This additional load is created by redistributing the streams that were previously transmitted to the failed server.
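The connectivity requirement can be illustrated with a simplified reassignment routine. The least-loaded selection policy, the function name, and the input structures are assumptions for this sketch; the VMS's real failover logic is not described in this article.

```python
def reassign_cameras(orphans, reachable, load):
    """Assign cameras of a failed server to healthy servers.

    orphans:   camera IDs that belonged to the failed server.
    reachable: camera ID -> set of healthy servers with direct IP
               connectivity to that camera (hypothetical input).
    load:      healthy server -> current camera count; updated in place.
    Returns (assignment, unrecoverable).
    """
    assignment = {}
    unrecoverable = []
    for camera in orphans:
        candidates = reachable.get(camera, set())
        if not candidates:
            # No healthy server can reach this camera: failover is not viable.
            unrecoverable.append(camera)
            continue
        # Assumed policy: pick the least-loaded reachable server
        # (sorted first so that ties are broken deterministically).
        target = min(sorted(candidates), key=lambda server: load[server])
        assignment[camera] = target
        load[target] += 1
    return assignment, unrecoverable
```

Any camera that ends up in `unrecoverable` stops recording until its original server returns, which is why connectivity should be verified before a failure occurs.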
Server capacity considerations
When one of the servers in a System fails, the healthy servers are stressed as well as the network. The redistribution of cameras and client connections puts a higher load on their storage and CPU. Depending on the planned level of fault tolerance, it is recommended to leave a capacity reserve in the hardware resources of every server in the system.
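One way to size the capacity reserve is sketched below. It assumes identical servers and that the load of failed servers is spread evenly across the survivors; neither assumption comes from this article, and real deployments may distribute load unevenly.

```python
def max_safe_utilization(servers: int, tolerated_failures: int) -> float:
    """Highest steady-state utilization that still survives the given failures.

    With N identical servers each running at utilization u, the total load is
    N * u. After k failures the remaining capacity is N - k, so the load fits
    only if u <= (N - k) / N.
    """
    if not 0 <= tolerated_failures < servers:
        raise ValueError("tolerated_failures must be in the range [0, servers)")
    return (servers - tolerated_failures) / servers
```

Under these assumptions, a five-server system that must tolerate one failure should keep each server at or below 80% of its capacity in normal operation.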
Important things to remember
Large systems usually require more thorough capacity planning and verification of system robustness. To that end, the following points must be considered.
- Use robust network connections, especially between servers. The VMS can still work during a connectivity failure, but such a state should never be treated as a normally functioning environment.
- The throughput of the connections between all servers must slightly exceed the total bitrate of all the cameras on the most loaded server. This ensures that the system can withstand the failure of any single server.
- Pay special attention to eliminating packet loss and jitter in the network between cameras and servers. Such losses degrade footage quality drastically, because in most cases lost streaming data cannot be recovered.
- If the customer uses the Mobile client or Video API calls, consider using higher-performance CPUs.
- Use monitoring for continuous evaluation of the network environment, hardware status, current load (CPU, storage and RAM) and application status.
- Have a failure recovery plan that contains a threat model and the appropriate actions to mitigate each threat.
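The throughput rule from the list above can be checked with a short calculation. This is a sketch only: the function name and input structure are hypothetical, and the 10% margin is an assumed interpretation of "slightly exceed", not a documented value.

```python
def required_interserver_mbps(camera_bitrates_by_server, margin=1.1):
    """Estimate the throughput needed on inter-server links, in Mbit/s.

    camera_bitrates_by_server: server -> list of camera bitrates in Mbit/s.
    margin: assumed headroom factor (1.1 = 10% above the worst case).
    """
    # Total camera bitrate of the most loaded server.
    worst_case = max(sum(rates) for rates in camera_bitrates_by_server.values())
    return worst_case * margin
```

For instance, if the most loaded server records four cameras at 6 Mbit/s each, the links between servers should carry somewhat more than 24 Mbit/s of additional traffic.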