Monitoring Entropy? But…but…but…

While going through some of the current Big Brother / Hobbit / Xymon checks that we have throughout the infrastructure, I found something interesting. There is a /var/lib/hobbit/client/ext/entropy check, which has this at its core:

     my $loop = 10;
     my $delay = 2;
     for (1 .. $loop) {
        sleep $delay;
        open F, '/proc/sys/kernel/random/entropy_avail' or exit;
        $entropy += <F>;    # add the current estimate to the running total
        chomp $entropy;
        close F;
     }

So essentially, it pulls entropy out of the pool, waits 2 seconds, then does it again, and performs the loop 10 times.

I had my suspicions, so I ran a simple “cat /proc/sys/kernel/random/entropy_avail” a couple of times to see what happened:

  ~/code/hobbit $ cat /proc/sys/kernel/random/entropy_avail
  ~/code/hobbit $ cat /proc/sys/kernel/random/entropy_avail
  ~/code/hobbit $ cat /proc/sys/kernel/random/entropy_avail

This seems counterproductive to me. Just saying. If you’re running this check, maybe you want to edit it to be less aggressive, if you *really* need to alert on entropy.
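
For illustration, a gentler version of the check could look something like the sketch below. It is written in Python rather than the Perl of the original, and the name `sample_entropy` and its default values are assumptions made up for this example, not part of Hobbit or Xymon. It reads the estimate a configurable number of times and returns the average.

```python
import time

# Kernel file that exposes the current entropy estimate on Linux.
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def sample_entropy(path=ENTROPY_PATH, samples=3, delay=1.0):
    """Return the mean of `samples` readings of the entropy estimate.

    `samples` and `delay` are hypothetical defaults; tune them to taste.
    """
    readings = []
    for i in range(samples):
        if i:  # no need to sleep before the first reading
            time.sleep(delay)
        with open(path) as f:
            readings.append(int(f.read().strip()))
    return sum(readings) / len(readings)
```

Taking the path as a parameter keeps the function easy to try against an ordinary file on a machine without /proc.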

Related: do you alert on entropy? Why?

  • It rather depends on the server and its purpose.

    A number of services / processes running on a server consume entropy through /dev/random, so if the entropy runs out on a server then the processes will often pause until the pool has begun to fill up again. Ideally processes should use /dev/urandom rather than /dev/random so that they’ll use entropy if available but at least have a graceful fallback.
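
As a small illustration of the /dev/urandom advice above: in Python, `os.urandom` and the `secrets` module draw from the operating system's non-blocking random source, so on Linux they should not stall once the kernel's generator has been initialized at boot. This is only a sketch, and the helper name `make_token` is hypothetical.

```python
import os
import secrets


def make_token(nbytes=16):
    """Return a random hex token from the OS generator, not /dev/random."""
    return secrets.token_hex(nbytes)


raw = os.urandom(16)   # 16 random bytes from the OS
token = make_token()   # 32 hex characters (two per byte)
```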

  • D.F.

    We did this for a group of servers that were running some type of large Oracle transaction programs.

    A vendor that supplied the program suspected that some problems were related to a low entropy pool. It turns out it wasn’t, so it never progressed to the point where we alerted on it. I ended up writing a quick script that polled it periodically and measured the result with MRTG.

    As to the aggressiveness of the check here, again it depends on what’s going on. I’ve seen the entropy pool shrink by several thousand, grow by a couple hundred and bounce all over the place in the midst of heavy utilization. Typically that isn’t a problem, but you’d need to verify that with what your particular server is doing.
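
Since the pool can bounce around this much, a single reading says little. One way to look at a series of polled readings is sketched below; the function name `summarize` and the sample numbers in the usage note are invented for illustration and are not measurements from any real server.

```python
def summarize(readings):
    """Return the min, max and largest single drop between consecutive readings."""
    # Positive values are drops, negative values are increases.
    drops = [a - b for a, b in zip(readings, readings[1:])]
    return {
        "min": min(readings),
        "max": max(readings),
        "largest_drop": max(drops + [0]),  # 0 if the series never decreased
    }
```

For example, `summarize([3000, 800, 1000, 950])` reports a minimum of 800, a maximum of 3000 and a largest drop of 2200.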

  • Kyle

    We monitor entropy on machines that terminate SSL. We discovered the pool was exhausted fairly frequently, so we added some USB entropy keys.

  • Steve VanDevender

    /proc/sys/kernel/random/entropy_avail just tells you the value of the /dev/random entropy estimator; reading it does not consume entropy (but reading /dev/random would). You just happened to do your tests while something else was consuming entropy, and if you had kept looking eventually you should have seen the number increase as well as decrease.

    ~# cat /proc/sys/kernel/random/entropy_avail
    ~# cat /proc/sys/kernel/random/entropy_avail
    ~# cat /proc/sys/kernel/random/entropy_avail
    ~# cat /proc/sys/kernel/random/entropy_avail

  • Great post. Between the post and the comments I learned some new stuff. Just reiterates the truth of the saying “The only stupid question is the unasked question.”

  • Great post. We do monitor available entropy on our Java cluster. At first it was only monitored with Munin, but we found out that wasn’t enough: we needed an alarm if we started to run out of entropy. We would need to make changes to the system if applications were hanging because no entropy was available. I cannot remember the alarm ever being triggered, but it’s good to have it in case we run into a problem.
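
An alarm like the one described here usually boils down to comparing the estimate against thresholds. A minimal sketch, assuming Xymon-style colours and two threshold values (`warn=200`, `crit=100`) that are purely hypothetical and would need tuning per system:

```python
def entropy_status(avail, warn=200, crit=100):
    """Map an entropy estimate to a Xymon-style colour.

    The warn and crit thresholds are illustrative assumptions, not
    recommended values.
    """
    if avail < crit:
        return "red"
    if avail < warn:
        return "yellow"
    return "green"
```

Keeping the thresholds as parameters makes it easy to exercise the alarm path by passing in a deliberately low value.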

  • Everyone,

    Thanks for the comments setting me straight. I really appreciate it!

    Rolf: Have you ever thought about intentionally depleting the pool to see if the alarm works?

  • Ernie

    I can think of a few applications that require entropy. Gaming servers that require randomness, for example. Even games like Call of Duty will do a coin toss on occasion when, for example, data comes in from two different clients at exactly the same time, and one player has to lose the draw.

    Gambling is another example, but one you’re not likely to find a server for in the US. However, they tend to rely on outside sources of randomness, rather than using pseudo-random number generators that computers create.

    Encryption is another example, but we don’t usually generate a lot of encryption keys on a frequent basis, so it doesn’t get used *often*. But if you work at Verisign, you might want to be paged if your source of entropy was running low.