suspected bug: timeline shows 100% recording

Completed

Comments

28 comments

  • Avatar
    Jort Bloem

    Here is a screenshot.

    the area circled in red is data that has (since) expired from local disk.

    0
    Comment actions Permalink
  • Avatar
    Anton Babinov

    Hi Jort,

    could you please give more details about how your plugin works? How does it backup data and how often? Does it also delete footage from disk?

    Could you please set server logLevel to DEBUG, wait for the issue to reproduce and attach collected logs here? Here is the article explaining how to increase logLevel and where to find server logs files.

     

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Anton, thanks so much for your help.

    The plugin implements the storage SDK (similar to the sample ftp storage). Backups are triggered by NX Witness itself (under server settings -> storage management -> backup settings). Deletions happen the same way (controlled by NX Witness).

    I am switching on debug logging now, on all three servers.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    FYI:

    It happened again,  but the logfile debug config file change seems to have rolled back, due to an upgrade taking place :(

    0
    Comment actions Permalink
  • Avatar
    Anton Babinov

    If log level was changed in mediaserver's config, it won't get rolled back due to server restart or system upgrade. You can follow this article explaining how to modify logLevel via server's configuration.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Anton,

    It's happened again, and this time it was in debug mode.

    It happened after May 25 11:57, and before May 27 16:08, localtime.

    It happened to the camera "Front Lawn West", 4C-11-BF-9D-E8-D5

    As per previous events, the oldest data (in this case, from January to 2 April 2021 shows normally on the timeline (recorded parts are light, not-recorded parts being dark), then from approx 10:21AM on the 2nd of April to right now shows solid light green, implying continuous recording over that time.  The thumbnails show gaps.

    I assume the next step is to get the logs to you?

    0
    Comment actions Permalink
  • Avatar
    Anton Babinov

    Hi Jort, yes please send me the DEBUG logs. I'm going to to analyze them and propose next steps.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    What's the easiest way to get these to you? They won't fit in a post here, and I'm hesitant to put them on a public forum.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    we have our own online storage, so I would just need a way to get the URL to you.

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Hi Jort,

    Just place the files on any publicly accessible file service (Google Disk etc.) and send a link here in the post.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    You can download it from:

    https://mergesync.btg.co.nz/index.php/s/onxH98qo4LJ97Ga

    Please let me know when you have it, so I can delete it.

    Thanks in advance.

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Hello Jort,

    Something went wrong. No archive on the link.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Try this link: http://www.btg.co.nz/debug-logs.tar.gz

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Jort,

    I've got those.

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Hi Jort,

    Please, call the recordedTimePeriods API. Most probably, there will be the chunk with the duration == -1. The Server treats such chunk as not closed and currently being written. In your case, such a chunk would correspond to the green bar in the timeline, i.e. to a chunk being permanently written.

    The possible reason for the chunk not being closed is a timeout. I suppose, one or several plugin methods (the one closing a file or the one writing to a file) is responding too slowly or just hangs.

    Any ideas what I might look for to work out what is going on?

    I'd suggest profiling the plugin and measuring and tracing plugin methods execution timeouts.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Andrey,

    Thanks so much for your help. I have confirmed that there is indeed a record with a durationMs of -1, and a startTimeMs of 1617312026946, which is the exact time the bar became solid.

    I found this file both in the backup and in the backup, both exist and have the correct filename. They are both timestamped in April, and the server was rebooted on May 25. I'm 90% sure the bar was correct after the reboot. I can see the directory got rescanned shortly after the reboot, it took .008 seconds to scan. The rescan is the only thing I can see in the logs which refers to that file.

    Is there any way this data would persist after a restart & rescan? What/where?

    The plugin is a network-based storage (locally cached), so slow opens/closes/directory reads are entirely plausible. What is the maximum acceptable duration for the various calls?

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Hello Jort,

    Is there any way this data would persist after a restart & rescan? What/where

    What data do you mean?

    Could you please refactor your plugin methods code for logging the parameters passed and the result returned?

    Similar to this

    bool Plugin::close()
    {
    DEBUG("close called for {}", m_fileName); // 2021-06-03 01:04:36.889 DEBUG close called for xxx.mkv
    ...
    DEBUG("close finished for {}. Result is {}", m_fileName, code); // 2021-06-03 01:04:36.948 DEBUG close finished for xxx.mkv. Result is true
    }

    When the issue reoccurs again, please share log files with us again.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Andrey,

    I'm very sorry, you'll have to help me a little further.

    I am working from nx_storage_sdk 1.0, based on the samples/ftp_storage_plugin example.

    The nxpl::Plugin class reference does not make any mention that I can see to a close() member function - only name() and setSettings(). Like the ftp_storage_plugin, it does not currently define the plugin class - closing the file is done in ~IODevice

    Is there a more recent SDK? The SDK I have appears to be dated Jan 15 this year.

     

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Jort,

    Sorry, my bad.

    I meant the Storage class and the IODevice class methods.Regarding file closing. Files should be closed in the IODevice::releaseRef method along with other finalizing activities. Place the debugging code here, too.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Anton,

    Can you please confirm what is the timeout in NXWitness for the various functions? That way I can align my plugin's timeout accordingly.

    Jort

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Bump? @Andrey Terentyev

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Andrey Terentyev

    0
    Comment actions Permalink
  • Avatar
    Andrey Terentyev

    Hello Jort,

    I'm sorry for not responding.

    Can you please confirm what is the timeout in NXWitness for the various functions?

    I'm facing some hurdles in finding out this from our devs. That's why I need your help.

    Could you please share your log files with the debug messages advised for the period when the issui is taking place? It would help me much in presenting the issue and in making questions to be the most specific.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Andrey,

    I'm recording the timing now, and a first look at the data shows things that are taking a long time,but shouldn't be.

    I'm going to fix these changes first.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Andrey,

    After much fixing of bugs, and more testing, I have come to the following conclusion:

    The issue does not relate to the storage plugin at all. It appears to be something specific to a particular recording.

    Removing this specific file from local storage makes the problem go away, and putting the file back on local storage makes the problem come up again (both after a restart of the service).

    One of the two recordings is available for your troubleshooting from: http://www.btg.co.nz/1626495494006_32856.mkv

    Just waiting for permission on the other one.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Another piece of information:

    when the service is restarted, in the NXWitness gui, it says it's doing a "Fast Backup Scan". Under some circumstances this takes approx > 1 minute, and overlaps with the initial backup. If this is the case, the problem is very likely to occur. Once this takes < 1 second, and doesn't overlap with the backup, the problem appears to be fixed. It always takes > 1 minute, or < 1 second, never between.

    0
    Comment actions Permalink
  • Avatar
    Anton Babinov

    Hi Jort,

    I did some tests locally, but couldn't reproduce the issue with .mkv files. Would it be possible to arrange remote access to your test system? I can convert this thread to support ticket for you, so you can share access details securely.

    We need to review configuration of your system and collect DEBUG logs. This should help us either to find the cause of the issue or to replicate it in our test lab.

    0
    Comment actions Permalink
  • Avatar
    Jort Bloem

    Hi Anton,

    Yes, please convert this thread to a support ticket.

    Also, could you send me your SSH public key, and the ip address you will be connecting from, so we can give you access. If you do not know how to generate an SSH public key, please let me know what platform you are using (Windows/Mac/Linux/Other), I will send you step-by-step instructions.

    Thanks in advance,

     

    Jort

    0
    Comment actions Permalink

Please sign in to leave a comment.