NX Server keeps crashing

In Progress

Comments

18 comments

  • Norman - Nx Support

    Hi Osku Mattinen,

    Can you tell a bit more about the issue and the environment as described in THIS support article?
    Once I have the complete picture, we'll investigate the issue.

  • Osku Mattinen

    The environment consists of 6 servers in a VLAN: five of them running Ubuntu 20.04 LTS, plus one temporary Windows server that will be removed later on. All servers are running Nx server version 5.0.0.35745, recording about 100 cameras overall.

    Two of the servers are on a remote site. We don't have administrative access to the client's network, so we cannot say for sure how they are connected, but they are in the same VLAN as the others.

    Server 1, the one that keeps crashing, is our main server.

    Server 2, the one running Windows, serves as a temporary backup for the main server.

    Servers 3 & 4 run as a pair on a different site, although that network is physically connected to the main server via fiber optic.

    Servers 5 & 6 run as a pair on the remote site mentioned above.

    The problem is that the main server crashes often, say 2-3 times a week, and comes back online after a minute or two.

    We had a similar problem before, but that turned out to be caused by duplicate IPs. In this case, that doesn't seem to be the problem.

    Below is a quick schematic displaying the rough setup of the system, and a picture of the event log.

  • Norman - Nx Support

    Hi Osku Mattinen,

    I shared the dump files with our developers for further investigation.
    I'll provide you with an update as soon as I know more.

    JIRA-VMS-37629

  • Norman - Nx Support

    Hi Osku Mattinen,

    We found what caused the crashes, but not yet the whole story.

    Can you tell me which analytics plugin and analytics rules you are using?

    Also, can you share the ecs.sqlite database?
    This can be found here:

    /opt/networkoptix/mediaserver/var/ecs.sqlite
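
A minimal sketch of one way to take a verified copy of the database before uploading it. The function name `copy_db` and the destination directory `/tmp/nx-support` are hypothetical, and stopping the service first is an assumption made here so the file is not written to mid-copy; the thread does not say whether that is required.

```shell
#!/bin/sh
# copy_db SRC DEST_DIR
# Copies SRC into DEST_DIR under a timestamped name, verifies that the copy
# is byte-identical to the source, and prints the path of the copy.
copy_db() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" || return 1
    # cmp -s is silent and returns non-zero if the files differ.
    cmp -s "$src" "$dest" || { echo "copy differs from source" >&2; return 1; }
    echo "$dest"
}

# Example, run as root on the affected server (assumption: a short
# recording gap while the service is stopped is acceptable):
#   systemctl stop networkoptix-mediaserver.service
#   copy_db /opt/networkoptix/mediaserver/var/ecs.sqlite /tmp/nx-support
#   systemctl start networkoptix-mediaserver.service
```

The function only copies and compares; it never modifies the original file.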

     

  • Osku Mattinen

    Hi,

    ecs.sqlite is in the link below:

    <LINK_REMOVED>

    By analytics plugin, do you mean 3rd party server analytics or camera analytics?

    We are running Zabbix on servers 3-6, but the server in question doesn't have Zabbix installed.

    Also, all new Dahua outdoor cameras have object detection on, but nothing too fancy in terms of rules. Pictures below.

  • Norman - Nx Support

    Hi Osku Mattinen,

    Thanks. I'll share the database with our developers for further investigation.

    I meant the in-camera analytics plugin indeed, so I'll let them know you're using the Dahua one.

  • Norman - Nx Support

    Hi Osku Mattinen,

    Our developers checked the dump files, but unfortunately couldn't find the cause of the crash.
    So they added some additional logging functions to our source code and I have created a private patch with these changes included.

    Could you update your system with the following in-client update credentials:

    Build Number: 36275
    Password: hty51o

    Once updated, and once the issue has occurred again, can you please provide:

    Thank you.

  • Osku Mattinen

    Hi.

    I tried to apply the update. Servers 1 & 2 are OK, but the rest of them failed, and it seems I'm unable to revert them to the previous build. I can install the previous build, but after about 2 minutes the service stops and I'm unable to restart it.

    Should I revert the main server first and then the others?

    Edit:

    I think I found the problem that made the 4 remaining servers crash with the custom patch: they are running Ubuntu 22.04, while the main server is on 20.04.

    Edit 2:

    Managed to revert the system, crisis averted. Server 2 is now detached, since it is a Windows machine and I don't have the means to revert it remotely.

     

  • Norman - Nx Support

    Hi Osku Mattinen,

    That makes sense. Ubuntu 22.04 LTS isn't supported yet, and I have noticed crashing applications with 22.04 as well, hence the reason I'm still using Ubuntu 20.04 LTS on my laptop and servers.

    Starting from v5.1, Ubuntu 22.04 LTS will be supported.
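
A minimal sketch for checking a server's Ubuntu release before applying a build, based on the `VERSION_ID` field of `/etc/os-release`. The helper names `release_version` and `is_supported_release` are hypothetical, and treating 20.04 as the release to compare against is an assumption drawn from this thread (the 20.04 server took the patch build, the 22.04 servers did not), not from an official compatibility list.

```shell
#!/bin/sh
# release_version OS_RELEASE_FILE
# Prints the VERSION_ID value (for example 20.04) from an os-release file.
release_version() {
    grep '^VERSION_ID=' "$1" | cut -d= -f2 | tr -d '"'
}

# is_supported_release OS_RELEASE_FILE EXPECTED_VERSION
# Returns 0 if the file's VERSION_ID equals EXPECTED_VERSION, 1 otherwise.
is_supported_release() {
    [ "$(release_version "$1")" = "$2" ]
}

# Example, run on each server in the system:
#   if is_supported_release /etc/os-release 20.04; then
#       echo "release matches"
#   else
#       echo "unexpected release: $(release_version /etc/os-release)"
#   fi
```

Running the check on every server before a system-wide update would flag a mixed 20.04/22.04 setup like the one described here.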

  • Osku Mattinen

    Hi,

    Can the Ubuntu version on servers 3 to 6 be the reason for server 1 dropping connections?

  • Norman - Nx Support

    Hi Osku Mattinen,

    Yes. Communication might stall or never arrive due to 22.04 LTS.

  • Osku Mattinen

    Hello,

    This explains it. The system works locally, so we'll just wait for the release and see if that fixes it.

    I had to do a second reinstall of Nx on the servers, and had to remove the whole /opt/networkoptix path.

    The system stays online now, but this resulted in a server certificate error, due to licenses being tied to hardware IDs.

    Is there a way for me to activate the licenses again myself?

     

  • Norman - Nx Support

    Hi Osku Mattinen,

    A certificate error is easy to resolve: delete the current certificate and restart the mediaserver daemon with:

    $ sudo systemctl restart networkoptix-mediaserver.service
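
A minimal sketch of the step above that renames the certificate instead of deleting it, so it can be restored if needed. The function name `move_cert_aside` is hypothetical, and the certificate path is deliberately left as a placeholder because this reply does not state where the file lives. It is also an assumption that the renamed backup, left in the same directory, is ignored by the server.

```shell
#!/bin/sh
# move_cert_aside CERT_FILE
# Renames CERT_FILE to CERT_FILE.bak-<timestamp> and prints the backup path.
# Fails without touching anything if CERT_FILE does not exist.
move_cert_aside() {
    cert=$1
    [ -f "$cert" ] || { echo "no such file: $cert" >&2; return 1; }
    backup="$cert.bak-$(date +%Y%m%d-%H%M%S)"
    mv "$cert" "$backup" || return 1
    echo "$backup"
}

# Example, run as root (placeholder path; substitute the certificate
# file used on your server):
#   move_cert_aside /path/to/certificate.pem
#   systemctl restart networkoptix-mediaserver.service
```

If the error persists after the restart, the backup can be moved back to its original name.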

    Licences are tied to hardware IDs (hwid), which are independent of the certificates.
    Normally, hwid should not change. If, for some reason, the hwid did change, please contact your reseller to assist you with deactivating and activating the licences once again.

    If you need temporary licence keys to continue recording, please let me know the number of channels, and I'll provide you with the proper temporary licence keys.

     

  • Osku Mattinen

    It seems I'm unable to get rid of the certificate error. Licenses work locally on the devices, so recording resumes. I'll get in touch with the customer's IT next week to see whether they are the ones assigning certificates to the domain.

  • Norman - Nx Support

    Hi Osku Mattinen,

    That's unfortunate.

    Please let me know if we can be of any assistance if the IT dept. can't help you. 
    We can always check through TeamViewer if we can do something to resolve the issue.

  • Osku Mattinen

    Hi.

    Got an answer: the customer's IT is not assigning the certificates, so I'm back here.

    I was able to revert the Windows server back to build 35745, so now all of the servers are running the same version again. I went ahead and deleted the default.pem license files and restarted all of the servers, but I still get the certificate error.

    All of the systems behave the same way, but when connected to the Nx server running on the Windows machine, the client shows a red lock icon on server 4, and the web admin reports an "unauthorized" status for server 4. I'll post a picture below to show the difference.

    Edit: We don't have TeamViewer access directly to the customer, but you can connect to my computer and use it as a proxy.

  • Osku Mattinen

    Hello,

    Is there any progress on this issue?

    This is a big customer and their users are starting to get frustrated with this particular problem. 

  • Norman - Nx Support

    Hi Osku Mattinen,

    Please note that the response times in the support community might vary, see also this topic.

    Can you share a screenshot of the certificate issue?
