Future (or current?) replacement for Nagios?

Date April 15, 2009

I was unaware of this project, but thanks to stephenpc on Twitter, I read an excellent (if a bit dated) article on RootDev which brought OpenNMS to my attention as a possible replacement for Nagios.

I was mildly surprised, mostly because I haven't been shopping for a Nagios replacement since I installed and configured Nagios 3. I had looked at Zenoss as a possibility, but decided to stick with Nagios since I was already familiar with the configuration routine (and 3.x had some great improvements in that regard).

Judging by Craig's comments in that article, OpenNMS solves problems that I don't have, like having so many hosts or services that you can't get to the bottom of the summary page before it refreshes. That's a problem I'm glad I don't have, but I know that some of you run very large networks. So my question would be, what do you large-network guys use for monitoring? And if you have a small network, do you use monitoring? I remember back to the time before I knew about monitoring solutions like Nagios, and it scares me to death. I would actually manually check on important services to make sure they were up, and that was my only way of doing it.

Of course, in my defense I was young and naive then, and didn't even back things up to tape. Ah, the folly of youth.

  • Marc Cote

    I'm a smaller-sized network/systems guy; we use Nagios and SolarWinds Orion NPM.

  • Matt

    @Marc

    Cool, thanks for commenting. I've looked at some demos of SolarWinds stuff, and they look really nice. I just wish they ran on something other than a Windows platform. Nice software, though.

  • Dan C

    But, it's Java..

    Cacti ;)

  • Jon

    Over here we use Nagios, which has its configuration generated from our asset tracking database.

    I think we're probably getting close to the point where the host list view won't finish loading before it refreshes, but I don't think anyone uses that view anyway. Our NOC has a page from the asset database up on a big screen which will display any alerts that come in, and allows them to hide anything that's being dealt with.

    Our internal systems are monitored using ZenOSS, which allows us to track much more, and is also what the ops guys know how to use.
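Generating Nagios configuration from an asset database, as Jon describes, might look roughly like the sketch below. This is only an illustration and not Jon's actual tooling: the `Asset` record, the `render_host` and `render_config` helpers, the sample hosts, and the `generic-host` template name are all assumptions made for the example. Only the `define host` object syntax comes from Nagios itself.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical row from an asset tracking database."""
    name: str
    address: str
    description: str


def render_host(asset: Asset, template: str = "generic-host") -> str:
    """Render one Nagios host object definition for a single asset.

    The template name is an assumption; it must match a host template
    that already exists in the local Nagios configuration.
    """
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {asset.name}\n"
        f"    alias      {asset.description}\n"
        f"    address    {asset.address}\n"
        "}\n"
    )


def render_config(assets: list[Asset]) -> str:
    """Render all assets, sorted by name so regenerated files diff cleanly."""
    ordered = sorted(assets, key=lambda a: a.name)
    return "\n".join(render_host(a) for a in ordered)


if __name__ == "__main__":
    # Sample data standing in for a real database query.
    sample = [
        Asset("web01", "192.0.2.10", "Web server"),
        Asset("db01", "192.0.2.20", "Database server"),
    ]
    print(render_config(sample))
```

Writing the output to a file that Nagios includes, and validating it before a reload, is left out here because those steps depend on the local setup.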

  • DigitalBricklayer

    Hyperic would be worth a look... it is Java-based like OpenNMS, and runs on Windows and pretty much everything else. In a relatively recent comparison by Jane Curry, Hyperic came out on top.

  • Bill

    About 15 servers and 30 users here. I started using the free version of Zenoss in January and I love it. Took some learning to get everything configured, but once I "got it", I've been off to the races ever since.

  • David

    As a contractor, I have a bunch of client sites I am responsible for. Each one gets Nagios for availability monitoring, and Cacti for drawing the pretty graphs that the suits like to see when we ask for more hardware :)

    At head office, we have a Nagios instance monitoring (today) 112 hosts and 183 services, and yes, the update-before-you've-finished-looking-at-a-scrolled-page problem is annoying -- but not annoying enough (yet) to do anything about.

  • Laura Kedziora

    What if you could use Nagios from your wireless device? Would you use it then?

  • Scott

    You guys know you can set the refresh rate for that page, right? By default it's set to 90 seconds, but you can change that in the cgi.cfg file:

    refresh_rate=90

    We're currently running 183 hosts with 839 services. Originally we were running this as a virtual machine, but we started running into problems keeping up with the 5-minute schedule most of the services are on (especially since we've started using Munin for the pretty graphs on the same box), so we bought a cheapo Dell 1950 server. There's plenty of spare power now.
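For anyone who manages `cgi.cfg` with a script rather than by hand, the change Scott describes could be automated along these lines. This is a minimal sketch: the `set_refresh_rate` helper is hypothetical, and it works on the file's text only, so reading and writing the actual file (whose location varies by installation) is left to the caller.

```python
import re


def set_refresh_rate(cfg_text: str, seconds: int) -> str:
    """Return cgi.cfg text with refresh_rate set to the given value.

    An existing, uncommented refresh_rate line is replaced in place;
    if there is none, the directive is appended at the end.
    """
    if seconds <= 0:
        raise ValueError("refresh rate must be a positive number of seconds")

    line = f"refresh_rate={seconds}"
    # Match only active directives, not lines commented out with '#'.
    pattern = re.compile(r"^[ \t]*refresh_rate[ \t]*=.*$", re.MULTILINE)

    if pattern.search(cfg_text):
        return pattern.sub(line, cfg_text)

    # No existing directive: append one, keeping a trailing newline.
    separator = "" if (not cfg_text or cfg_text.endswith("\n")) else "\n"
    return f"{cfg_text}{separator}{line}\n"


if __name__ == "__main__":
    original = "use_authentication=1\nrefresh_rate=90\n"
    print(set_refresh_rate(original, 300), end="")
```

Keeping the function free of file access makes it easy to check against a copy of the configuration before touching the live one.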

  • -dsr-

    We use mon, with a config generated from individual text-file stanzas. This is pretty good until you have about a hundred services to check on.

  • mray
  • Vide

    I can't understand why Zabbix is so underrated... it simply kicks Nagios' ass 'round the clock. Almost all the configuration is DB-based, and with a couple of tricks you can reach 99% (so, no touching files on the clients at all after the first setup). It gives you integrated charts and statistics, it's 100% open source and free software, it has clients for lots of OSes... really, I love Zabbix :)

  • Jon

    How does Zabbix handle really big systems? Our Nagios setup is currently doing 1992 hosts, with 2048 services - to get checks every 5 minutes we have 4 monitoring slaves which do nothing but report back to the masters.

  • Anonymous

    We use OpenNMS for a truly FOSS enterprise monitoring and data collection system. 2400 nodes, 100,000 interfaces, 14,000 services. About 150,000 rrd files. 1 server (app, webapp, postgres db), 4G RAM, speedy disk. Sorry for the anonymity.

  • Robert Sander

    We are using Argus which works pretty well for us. And (still) mrtg for trend analysis.

  • Dave Atkins

    Like you, I've been doing sysadmin work for over a decade, but I never had time to really figure out Nagios. We used SiteScope when it was inexpensive (1996), again when it was pricey (2002) but couldn't afford to put it on all systems, then replaced an early, buggy version of WhatsUp with KSHostmonitor (2003). In 2006, I saw WhatsUpGold had become impressive, but its default configurations were unsophisticated and, being little more than ping monitors, gave a false sense of accomplishment and security. Nagios gives ultimate power to configure... but what if your job is also Engineering Manager, QA, Windows Admin, and blogger? Nagios requires a person who enjoys setting up those scripts and really understands them.
    I have been impressed by my current company's product because it is agentless and has pre-configured role profiles so, for example, you can just select the Microsoft SQL Server configuration and you get a solution that takes much more work in an open source environment. But yes, the product costs money, so you have to be able to justify both the value of the software and the claim that your time is better spent on things other than writing monitoring scripts.