suspected bug: timeline shows 100% recording
Hi all,
Not 100% sure if this is a bug in NX Witness or in my code, or something else.
I am developing a storage plugin for backups.
Sometimes (but not always) when a backup runs, the timeline shows solid green for all the time that is stored on local disk. I can fix this by running "reindex backup", which works for a while, but the problem comes back after some later backup.
Any data that is ONLY on backup shows correctly.
It has happened on 2 different NX Witness servers.
Any ideas what I might look for to work out what is going on?
-
Hi Jort,
Could you please give more details about how your plugin works? How does it back up data, and how often? Does it also delete footage from disk?
Could you please set the server logLevel to DEBUG, wait for the issue to reproduce, and attach the collected logs here? Here is the article explaining how to increase the logLevel and where to find the server log files.
-
Hi Anton, thanks so much for your help.
The plugin implements the storage SDK (similar to the sample ftp storage). Backups are triggered by NX Witness itself (under server settings -> storage management -> backup settings). Deletions happen the same way (controlled by NX Witness).
I am switching on debug logging now, on all three servers.
-
If the log level was changed in the mediaserver's config, it won't get rolled back by a server restart or system upgrade. You can follow this article explaining how to modify the logLevel via the server's configuration.
-
Hi Anton,
It's happened again, and this time it was in debug mode.
It happened after May 25 11:57, and before May 27 16:08, localtime.
It happened to the camera "Front Lawn West", 4C-11-BF-9D-E8-D5
As per previous events, the oldest data (in this case, from January to 2 April 2021) shows normally on the timeline (recorded parts are light, not-recorded parts are dark); then everything from approx 10:21 AM on the 2nd of April to right now shows solid light green, implying continuous recording over that time. The thumbnails show gaps.
I assume the next step is to get the logs to you?
-
You can download it from:
https://mergesync.btg.co.nz/index.php/s/onxH98qo4LJ97Ga
Please let me know when you have it, so I can delete it.
Thanks in advance.
-
Try this link: http://www.btg.co.nz/debug-logs.tar.gz
-
Hi Jort,
Please call the recordedTimePeriods API. Most probably, there will be a chunk with duration == -1. The Server treats such a chunk as not closed and currently being written. In your case, such a chunk would correspond to the green bar in the timeline, i.e. to a chunk that is permanently "being written".
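As an illustration of that check, a minimal sketch of scanning the returned periods for open chunks. The struct below only mirrors the startTimeMs/durationMs fields of the JSON response; it is an assumption for illustration, not a type from the SDK.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical representation of one entry returned by the
// recordedTimePeriods API; field names mirror the JSON response
// (startTimeMs / durationMs) but are an assumption, not SDK types.
struct TimePeriod
{
    int64_t startTimeMs;
    int64_t durationMs; // -1 means "still being written" (an open chunk)
};

// Return the start times of all chunks the Server considers open.
// A healthy archive should have at most one such chunk per camera:
// the one currently being recorded.
std::vector<int64_t> findOpenChunks(const std::vector<TimePeriod>& periods)
{
    std::vector<int64_t> open;
    for (const auto& p: periods)
    {
        if (p.durationMs == -1)
            open.push_back(p.startTimeMs);
    }
    return open;
}
```

A stale open chunk well in the past (like the one at the start of the solid-green bar) would show up here alongside the legitimately open current chunk.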
Any ideas what I might look for to work out what is going on?
I'd suggest profiling the plugin: measure and trace the execution times of the plugin methods.
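One lightweight way to do that measurement is an RAII timer placed at the top of each plugin method. A minimal sketch; the logging call in the destructor is a placeholder to be wired to whatever logger the plugin already uses.

```cpp
#include <chrono>
#include <string>

// Minimal RAII timer for measuring how long a plugin method takes.
// Construct it at the top of the method; it measures until scope exit.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::string name):
        m_name(std::move(name)),
        m_start(std::chrono::steady_clock::now())
    {
    }

    // Milliseconds elapsed since construction.
    int64_t elapsedMs() const
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now() - m_start).count();
    }

    ~ScopedTimer()
    {
        // Placeholder: log m_name and elapsedMs() here, e.g.
        // log("%s took %lld ms", m_name.c_str(), (long long) elapsedMs());
    }

private:
    std::string m_name;
    std::chrono::steady_clock::time_point m_start;
};
```

Dropping `ScopedTimer timer("IODevice::close");` into each suspect method would reveal which calls are slow or hanging.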
-
Hi Andrey,
Thanks so much for your help. I have confirmed that there is indeed a record with a durationMs of -1 and a startTimeMs of 1617312026946, which is the exact time the bar became solid.
I found this file both on local storage and in the backup; both copies exist and have the correct filename. They are both timestamped in April, and the server was rebooted on May 25. I'm 90% sure the bar was correct after the reboot. I can see the directory was rescanned shortly after the reboot; the scan took 0.008 seconds. The rescan is the only thing in the logs that refers to that file.
Is there any way this data would persist after a restart & rescan? What/where?
The plugin is a network-based storage (locally cached), so slow opens/closes/directory reads are entirely plausible. What is the maximum acceptable duration for the various calls?
-
Hello Jort,
Is there any way this data would persist after a restart & rescan? What/where?
What data do you mean?
Could you please refactor your plugin methods to log the parameters passed and the result returned?
Similar to this:
bool Plugin::close()
{
    DEBUG("close called for {}", m_fileName); // 2021-06-03 01:04:36.889 DEBUG close called for xxx.mkv
    ...
    DEBUG("close finished for {}. Result is {}", m_fileName, code); // 2021-06-03 01:04:36.948 DEBUG close finished for xxx.mkv. Result is true
}
When the issue reoccurs, please share the log files with us again.
-
Hi Andrey,
I'm very sorry, you'll have to help me a little further.
I am working from nx_storage_sdk 1.0, based on the samples/ftp_storage_plugin example.
The nxpl::Plugin class reference makes no mention, that I can see, of a close() member function - only name() and setSettings(). Like the ftp_storage_plugin, my plugin does not currently define close() on the plugin class - closing the file is done in ~IODevice.
Is there a more recent SDK? The SDK I have appears to be dated Jan 15 this year.
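For reference, the suggested entry/exit logging can live in the destructor instead of a close() method. A minimal stand-in sketch, assuming the ftp_storage_plugin structure; all names here (m_fileName, closeFile, the closedFlag test hook) are illustrative placeholders, not SDK API:

```cpp
#include <cstdio>
#include <string>

// Stand-in for the plugin's IODevice. Since the file is closed in the
// destructor (as in the ftp_storage_plugin sample), the suggested
// "close called" / "close finished" logging goes there instead.
class IODevice
{
public:
    explicit IODevice(std::string fileName, bool* closedFlag = nullptr):
        m_fileName(std::move(fileName)),
        m_closedFlag(closedFlag)
    {
    }

    ~IODevice()
    {
        std::fprintf(stderr, "close called for %s\n", m_fileName.c_str());
        const bool ok = closeFile();
        std::fprintf(stderr, "close finished for %s. Result is %s\n",
            m_fileName.c_str(), ok ? "true" : "false");
        if (m_closedFlag)
            *m_closedFlag = ok; // test hook, not part of any real plugin
    }

private:
    bool closeFile()
    {
        // A real implementation would flush and close the network-backed
        // file here - exactly the call whose duration is worth measuring.
        return true;
    }

    std::string m_fileName;
    bool* m_closedFlag;
};
```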
-
Hello Jort,
I'm sorry for not responding.
Can you please confirm what the timeout in NX Witness is for the various functions?
I'm facing some hurdles finding this out from our devs; that's why I need your help.
Could you please share your log files with the debug messages advised, covering the period when the issue is taking place? That would help me greatly in presenting the issue and in asking the most specific questions.
-
Hi Andrey,
After much fixing of bugs, and more testing, I have come to the following conclusion:
The issue does not relate to the storage plugin at all. It appears to be something specific to a particular recording.
Removing this specific file from local storage makes the problem go away, and putting the file back on local storage makes the problem come up again (both after a restart of the service).
One of the two recordings is available for your troubleshooting from: http://www.btg.co.nz/1626495494006_32856.mkv
Just waiting for permission on the other one.
-
Another piece of information:
when the service is restarted, the NX Witness GUI says it's doing a "Fast Backup Scan". Under some circumstances this takes more than a minute and overlaps with the initial backup; when that happens, the problem is very likely to occur. When it takes less than a second and doesn't overlap with the backup, the problem doesn't appear. It always takes either > 1 minute or < 1 second, never in between.
-
Hi Jort,
I did some tests locally, but couldn't reproduce the issue with .mkv files. Would it be possible to arrange remote access to your test system? I can convert this thread to support ticket for you, so you can share access details securely.
We need to review configuration of your system and collect DEBUG logs. This should help us either to find the cause of the issue or to replicate it in our test lab.
-
Hi Anton,
Yes, please convert this thread to a support ticket.
Also, could you send me your SSH public key, and the ip address you will be connecting from, so we can give you access. If you do not know how to generate an SSH public key, please let me know what platform you are using (Windows/Mac/Linux/Other), I will send you step-by-step instructions.
Thanks in advance,
Jort