How do I design and implement a large System?
A System is considered large when it includes any of the following:
- >= 10 servers and/or
- >= 50 TB footage storage per server and/or
- >= 500 Mbit/s Rx+Tx network interface streaming load.
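The three thresholds above can be expressed as a simple check. This is only an illustrative sketch: the function name `is_large_system` and its parameter names are hypothetical and are not part of any VMS API.

```python
def is_large_system(server_count, storage_tb_per_server, nic_load_mbps):
    """Return True if at least one of the "large System" thresholds is met.

    server_count          -- number of servers in the System
    storage_tb_per_server -- footage storage per server, in TB
    nic_load_mbps         -- Rx+Tx streaming load on the network interface, in Mbit/s
    """
    return (
        server_count >= 10
        or storage_tb_per_server >= 50
        or nic_load_mbps >= 500
    )


print(is_large_system(4, 60, 200))  # True: the storage threshold is reached
print(is_large_system(4, 20, 200))  # False: no threshold is reached
```

Because the criteria are joined by "and/or", a single threshold is enough to treat the System as large.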
The important aspects to consider are the performance, reliability, and configuration optimization of these VMS-based systems.
The picture above represents a logical diagram of the Site. The surveillance system contains cameras, servers, operator terminals (using VMS clients), and a network that mediates the interaction between all of them.
The most important function of the Servers is to acquire video streams, store them, and provide access to the recorded footage on request.
Discovery is the process of establishing communication with cameras and other devices that are new to the System. It reduces the need to add each new device to the existing System manually and go through the configuration process. Discovery is a continuous process: devices already discovered by a Server have their online status refreshed and are then filtered out of the result.
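A single discovery pass, as described above, might look roughly like the sketch below. It is a simplified model and not the Mediaserver's actual implementation: the function `discovery_pass`, the address strings, and the timestamp dictionary are all assumptions made for illustration.

```python
import time


def discovery_pass(found_addresses, known_devices, now=None):
    """Run one discovery pass and return only the devices new to the System.

    found_addresses -- addresses of devices that responded during this pass
    known_devices   -- dict mapping address -> last-seen timestamp (updated in place)
    now             -- timestamp to record; defaults to the current time
    """
    now = time.time() if now is None else now
    new_devices = []
    for address in found_addresses:
        if address in known_devices:
            # Already discovered: refresh its online status, keep it out of the result.
            known_devices[address] = now
        else:
            new_devices.append(address)
    return new_devices


known = {"10.0.0.5": 1.0}
print(discovery_pass(["10.0.0.5", "10.0.0.6"], known, now=2.0))  # ['10.0.0.6']
```

In this sketch the caller decides what to do with the returned new devices, for example registering and configuring them.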
Camera drivers are developed to make as many features as possible usable across a wide range of camera models. Each driver's purpose is to use as many of the camera's features as possible, set the camera up, allow PTZ usage, and so on.
The HTTP/RTSP front end allows the VMS Mediaservers to interact with the Clients and with other VMS Mediaservers, and also to broadcast video (either live or recorded footage).
The Mediaserver application is a cross-platform application that works inside the operating system’s environment. In this article, we must also consider the impact of OS components on performance and reliability.
What are some potential performance bottlenecks?
Server-server network connection
Each Server in the Site can be used to re-configure the entire Site (such as adding users, changing recording schedules, and so on), so Servers must communicate within the Site.
Servers also communicate in order to monitor each other. If one Server fails or is shut down intentionally, the other Servers must take over its cameras and continue recording.
Having a connection between the Servers is necessary to provide availability in complex network environments. If one Server cannot establish a direct peer-to-peer connection, it may use other Servers as a proxy.
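To illustrate the proxy idea, the sketch below finds a chain of Servers through which one Server can reach another when no direct connection exists. It uses a plain breadth-first search over a hypothetical connectivity map; the names `links` and `find_route` are assumptions, and the real product may choose routes differently.

```python
from collections import deque


def find_route(links, source, target):
    """Return the shortest chain of Servers from source to target, or None.

    links -- dict mapping a Server name to the Servers it can reach directly
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # target is unreachable, even through proxies


# Server A cannot reach C directly, but B can act as a proxy.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(find_route(links, "A", "C"))  # ['A', 'B', 'C']
```

A result of `None` means the Server is isolated from its peer, which is the situation that robust server-to-server links are meant to prevent.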
Client-server network connection
The Client connects to the Server application in order to manage the Site and watch footage or live video. The Desktop client subscribes to the stream as it is stored on the Server’s storage drives, whereas both the Web client and the Mobile client require transcoding on the Server’s side.
Server-cameras network connection
Generally speaking, there are two data streams between the Server and the camera: management and video stream. The management stream is used for configuring the stream and camera, while the video stream contains the picture itself. Both data streams are used to determine the availability of the camera. The video stream is very susceptible to network losses and jitter. A subpar network leads to worse footage quality and usefulness, and excessive error messages.
Storage throughput
Block devices are used to store the footage and footage index, along with persistent data storage (internal database). The load on the storage increases when
- a user requests to re-index the archive;
- the daily index rebuild happens; or
- the VMS client requests high-resolution recorded footage for a layout with many cameras.
CPU/RAM capacity
The VMS rarely requires extensive CPU or RAM usage, but some clients require the VMS server to prepare a stream for them. Transcoding is enabled for any stream requested by the VMS Web client, by the Mobile client, or via the VMS server API. A good rule of thumb is that two 1080p streams at 30 fps will load one CPU core.
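The rule of thumb above translates into a quick estimate of the CPU capacity to reserve for transcoding. The sketch assumes every transcoded stream is 1080p at 30 fps; the function name `transcoding_cores` is hypothetical, and real load depends on codec, bitrate, and hardware.

```python
import math

# Rule of thumb from the text: two 1080p streams at 30 fps load one CPU core.
STREAMS_PER_CORE = 2


def transcoding_cores(stream_count):
    """Estimate how many CPU cores the given number of transcoded streams will load."""
    return math.ceil(stream_count / STREAMS_PER_CORE)


print(transcoding_cores(5))   # 3
print(transcoding_cores(16))  # 8
```

For example, a Server expected to serve 16 simultaneous Web or Mobile client streams would need roughly 8 cores for transcoding alone, on top of its other work.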
Failover
Failover is a feature that allows healthy servers to take over cameras that used to belong to a Server that failed.
Network considerations
Failover is only viable if there is direct IP network connectivity between the cameras of a failed Server and at least one healthy Server. In complex network environments, it is necessary to set up a process for verifying network reliability, one that proves the network links towards the other Servers will withstand the additional streaming load if a Server fails. This additional load is created by the redistribution of streams that had previously been transmitted to the failed Server.
Server capacity considerations
The network is not the only thing under stress: the healthy Servers are also stressed when one of the Servers in a System fails. This involves a higher load on storage and CPU due to the redistribution of client connections. Depending on the planned fault tolerance level, it is recommended to leave a capacity reserve in the hardware resources of every Server in the System.
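One way to size that reserve is sketched below. It assumes identical Servers and a load that is redistributed evenly among the surviving Servers; both are simplifications, and the function name `max_normal_utilization` is hypothetical.

```python
def max_normal_utilization(server_count, tolerated_failures):
    """Return the highest normal-operation utilization (0..1) per Server that
    still leaves room to absorb the load of the tolerated number of failed Servers.

    Assumes identical Servers and an even redistribution of load.
    """
    if not 0 <= tolerated_failures < server_count:
        raise ValueError("tolerated_failures must be >= 0 and < server_count")
    return (server_count - tolerated_failures) / server_count


print(max_normal_utilization(10, 1))  # 0.9
print(max_normal_utilization(4, 1))   # 0.75
```

Under these assumptions, a 4-Server System that must survive one failure should keep each Server at or below 75% of its capacity in normal operation, while a 10-Server System can run at up to 90%.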
Important things to remember
Large Systems usually require more scrupulous capacity planning and system robustness verification. To that end, the following points must be considered.
- Use robust network connections, especially between servers. The VMS can still work during a connectivity failure, but such a failure can never be considered a normally functioning environment.
- The throughput of the connections between all servers must slightly exceed the total bitrate of all the cameras of the most loaded server. This measure guarantees that the system can successfully bear the failure of any single server.
- Pay special attention to completely avoiding losses and jitter in the network between cameras and servers. Such losses affect the quality of footage drastically, as in most cases lost streaming data cannot be recovered.
- If the customer uses the mobile client or Video API calls, consider using better-performing CPUs.
- Use monitoring for continuous evaluation of the network environment, hardware status, current load (CPU, storage and RAM) and application status.
- Have failure recovery plans that contain a model of threats and appropriate actions to mitigate each.
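The throughput rule from the list above can be checked with a few lines of code. This is a minimal sketch: the function name `failover_link_ok`, the per-server bitrate lists, and the 10% margin used to model "slightly exceed" are assumptions, not values taken from the VMS documentation.

```python
def failover_link_ok(link_mbps, camera_bitrates_by_server, margin=1.1):
    """Check whether inter-server links can carry the cameras of the most loaded server.

    link_mbps                 -- throughput of the connection between servers, in Mbit/s
    camera_bitrates_by_server -- dict mapping a server name to its camera bitrates, in Mbit/s
    margin                    -- assumed safety factor standing in for "slightly exceed"
    """
    most_loaded = max(sum(bitrates) for bitrates in camera_bitrates_by_server.values())
    return link_mbps >= most_loaded * margin


servers = {
    "server-1": [4, 4, 8],     # 16 Mbit/s in total
    "server-2": [8, 8, 8, 8],  # 32 Mbit/s in total, the most loaded server
}
print(failover_link_ok(40, servers))  # True
print(failover_link_ok(33, servers))  # False: below 32 Mbit/s plus the margin
```

Running a check like this whenever cameras are added or moved between servers helps keep the failover assumption valid over time.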