My trouble with bonded interfaces

Date: April 21, 2009

In an effort to improve the redundancy of our network, I have all of our blade servers configured with bonded network interfaces. Bonding the interfaces in Linux means that eth0 and eth1 form together like Voltron into bond0, an interface that can be "high availability", meaning that if one physical port (or the device it is plugged into) dies, the other can take over.

Because I wanted to eliminate a single point of failure, I used two switches.

The switches are tied together to make sure traffic on one switch hits the other if necessary.

Here is my problem, though: I have had an array of interesting traffic patterns from my hosts. Sometimes they'll have occasional intermittent loss of connectivity, and sometimes they'll have regular periods of no connectivity (both of which I've solved by changing the bonding method). Most recently, I've had the very irritating problem of a host connecting perfectly fine to anything on the local subnet while remote traffic suffers heavy packet loss. To fix the problem, all I have to do is unplug one of the network cables.

I've got the machine set up in bonding mode 0 (balance-rr). According to the documentation, mode 0 is:

Round-robin policy: Transmit packets in sequential
order from the first available slave through the
last. This mode provides load balancing and fault
tolerance.

It would be at least logical if I lost 50% of the packets. Two interfaces, one malfunctioning, half the packets. But no, it's more like 70% of the packets getting lost, and I haven't managed to figure it out yet.
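To make the 50% intuition concrete, here is a toy sketch of strict round-robin transmission. It assumes every frame handed to a bad slave is silently lost and that the failure is never detected, which is a simplification and not how the kernel bonding driver is implemented. The function name `round_robin_loss` and its parameters are invented for illustration.

```python
from itertools import cycle


def round_robin_loss(slaves, dead, packets):
    """Return the fraction of packets lost when frames are handed to the
    slaves in strict rotation and every frame sent on a dead slave is lost."""
    rotation = cycle(slaves)
    lost = sum(1 for _ in range(packets) if next(rotation) in dead)
    return lost / packets


if __name__ == "__main__":
    # Two slaves, one of them black-holing traffic: half the frames vanish.
    print(round_robin_loss(["eth0", "eth1"], {"eth1"}, 1000))  # 0.5
```

Under this model a single bad transmit path cannot account for much more than half the traffic, so a loss rate near 70% suggests that something else, possibly the return path, is also involved.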

If you check my Twitter feed for yesterday, I was whining about forgetting a jacket. This is because I was hanging out in the colocation running tests. tcpdump shows that the packets are actually being sent. Only occasional responses are received, though, unless the other host is local, in which case everything is fine.

There are several hosts configured identically to this one; however, this is the only one displaying this issue. Normally I'd suspect the firewall, but there isn't anything in the configuration that would single out this machine, and the ARP tables check out everywhere. I'm confused, but I haven't given up yet. I'll let you know if I figure it out, and in the meantime, if you've got suggestions, I'm open to them.

18 Responses to “My trouble with bonded interfaces”

  1. Jay said:

    Do the bonded interfaces share a MAC address, or does each interface use its own MAC? Either way there would be some flapping in either the switches' mac-address table (same MAC) or the firewall's ARP cache (different MACs).

    If possible, I would start with a sniffer capturing all VLAN traffic at the switch to see what's going on at layer 2 between the host and firewall.

    And, don't forget about netstat -i for interface errors.

  2. JeffHengesbach said:

    Love the Voltron reference! I use mode=1 (active-standby) with great success with separate switches. It truly is beautiful to power off an entire switch without a hitch in service.

    My growing pains were as follows:
    -Sometimes I needed to reboot the host when fiddling with different bonding modes. I could rmmod all day, but a reboot seemed best.
    -/proc/net/bonding/bondX has good info
    -Triple checking switchport settings: portfast, vlan, tagging and trunk/isl settings.
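The per-bond status file mentioned in the list above is plain "Key: value" text, so it is easy to compare across hosts. Below is a minimal sketch of a parser, assuming the usual layout of bond-wide fields followed by one "Slave Interface:" section per slave. The helper name `parse_bonding_status` and the sample text are made up for illustration, and the exact fields vary by kernel version.

```python
def parse_bonding_status(text):
    """Split /proc/net/bonding/bondX-style text into bond-wide fields and
    a per-slave dict keyed by interface name."""
    bond, slaves = {}, {}
    current = bond
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Every following field belongs to this slave until the next one.
            current = slaves.setdefault(value, {})
        else:
            current[key] = value
    return bond, slaves


# Illustrative sample only, modelled on the general shape of the file.
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
"""

if __name__ == "__main__":
    bond, slaves = parse_bonding_status(SAMPLE)
    print(bond["Bonding Mode"])
    for name, fields in slaves.items():
        print(name, fields["MII Status"], fields["Link Failure Count"])
```

Running the same parser against a working host and the misbehaving one would make any difference in mode or link-failure counts stand out.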

    In the end my hangup turned out to be getting the trunks and VLAN tagging configured properly and ensuring the ports were in the correct VLANs. If you've got properly working hosts, hopefully these points will help guide your comparisons.

    Jeff

  3. JeffHengesbach said:

    Great resource for linux bonding.

  4. Matt said:

    @Jay

    The bond has a shared MAC; I believe it is that of the first interface (though ifconfig shows the same MAC across the board at the moment).

    Excellent call on the netstat, but (un?)fortunately it isn't showing any errors on the physical or bonded interfaces. Good suggestion, though. I need to get my NIDS box up at some point, so I'll have that in place eventually, but getting the sniffer up on the network might be a later resort. I'm going to try to monitor the firewall traffic more finely first, and see what I can get from that. Thanks a lot for the suggestions! :-)

    @Jeff

    The more I've been reading, the more I've been thinking about mode 1. I really like the idea of increasing bandwidth, but HA is more important (and frankly, basic functionality would be nice at this point).

    I've noticed the same thing when it comes to removing the module. I haven't spent enough time tracking down the common denominators, but it seems more or less random as to when it will let me rmmod, so I just reboot to be sure.

    I've got to go back into the switch documentation to make sure that my ISL is set up correctly. I've actually got two VLANs, since each switch has an internal and a DMZ component, and the ISL is trunking both of them. I think that's functioning normally, but I just want to check that what they think is normal jibes with what I think is normal.

    Thanks again for the comments. I'll make sure to post when I get it figured out.

  5. Dan C said:

    If you're using balance-rr mode then the member ports should all be within a trunk/etherchannel group within the switch(es). Of course, if you're using multiple switches this may be harder to realise. Without this, the switch will be dumbfounded as to why the same MAC is appearing across multiple ports. There are also some caveats about out-of-order packet delivery, which make the mode unpopular despite its being the only one to provide boosts to single streams.
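The "same MAC on multiple ports" problem can be illustrated with a toy model of a learning switch. This is only a sketch under simple assumptions: one source MAC, a switch that re-learns the port on every frame, and invented port names. `count_mac_moves` is a hypothetical helper, not part of any switch or kernel API.

```python
def count_mac_moves(arrival_ports):
    """Count how often a learning switch has to re-point one source MAC
    to a different port, given the port each frame arrived on."""
    learned_port = None
    moves = 0
    for port in arrival_ports:
        if learned_port is not None and port != learned_port:
            moves += 1  # the MAC "flapped" to another port
        learned_port = port
    return moves


if __name__ == "__main__":
    # Hypothetical arrival patterns seen by a switch that is not
    # treating the two ports as one trunk group.
    balance_rr = ["port1", "port2"] * 4   # frames alternate between ports
    active_backup = ["port1"] * 8         # frames always use one port
    print(count_mac_moves(balance_rr))     # 7
    print(count_mac_moves(active_backup))  # 0
```

In this model the alternating pattern forces a table update on nearly every frame, while the single-active-port pattern never does, which is one way to picture why the switch needs to treat the member ports as a group.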

    Although not as fancy, active-backup is far safer to use if you are only after HA. Alternatively, if your switches support it, 802.3ad support is good nowadays.

  6. Dan C said:

    Oh, also..

    sysfs support is now good too. So if you're testing and don't mind bypassing your OS's RC support (which may or may not support the new sysfs interface) then just interface with /sys/class/net/bonding_masters and /sys/class/net/bondX/bonding/*. You can even build it static to your kernel.

    There's some more detailed information about how to use it in Documentation/networking/bonding.txt
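For read-only inspection, the sysfs files named above can be read like ordinary text files. The sketch below assumes the `mode` file holds the mode name followed by its number (for example "balance-rr 0") and that `slaves` and `bonding_masters` are whitespace-separated lists. The function names and the `sys_net` parameter are invented for illustration; the parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def list_bonds(sys_net="/sys/class/net"):
    """Return the bond names listed in bonding_masters, or [] if the
    file is absent (for example when the bonding module is not loaded)."""
    masters = Path(sys_net) / "bonding_masters"
    if not masters.exists():
        return []
    return masters.read_text().split()


def read_bond_sysfs(bond, sys_net="/sys/class/net"):
    """Return the mode name and slave list that sysfs reports for one bond."""
    bonding_dir = Path(sys_net) / bond / "bonding"
    mode = (bonding_dir / "mode").read_text().split()
    slaves = (bonding_dir / "slaves").read_text().split()
    return {"mode": mode[0] if mode else None, "slaves": slaves}


if __name__ == "__main__":
    for bond in list_bonds():
        print(bond, read_bond_sysfs(bond))
```

Changing settings means writing to the same files, which needs root and has ordering rules described in bonding.txt, so this sketch deliberately stays read-only.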

  7. Matt said:

    @Dan

    You may have hit the nail on the head, but what confuses me is why only one host is seeing it, when multiple hosts are in balance-rr and not having that problem. It doesn't make a lot of sense to me.

    Also, I'm going to have to learn more about sysfs before I play with it, but thanks for the heads up, and for all of the other suggestions. I really do appreciate it!

  8. Justin said:

    I'm sure I'm stating the obvious here, but you do have some form of Spanning Tree enabled on your switches? Sometimes the simplest answer is best ;-)

  9. Matt said:

    @Justin

    When dealing with multiple switches, spanning tree always needs to be mentioned :-) Sorry I left that out.

    STP is enabled on these switches, but I'm not worried about routing loops.

    I do wonder what the effect of STP would be if it saw the same MAC on the local port and the aggregated port, though.

  10. Matt said:

    And by routing loops, I mean switching loops...

    and by STP is enabled, I mean "I'm an idiot, of course that's going to cause a problem"

    Right.

    Though I still don't get why just the one server at the moment...

  11. jawrat said:

    maybe this is too simple, but you might have a bad cable in there....stranger things have happened, and in a rack full of cables....one never knows.

  12. David Magda said:

    This won't work: both NICs need to be connected to the same switch. If you have a larger switch with multiple blades, then each NIC can go to a different blade.

    Linux's bonding (aka link aggregation) is based on IEEE 802.3ad (as are most other OSes that do this):

    http://en.wikipedia.org/wiki/Link_aggregation

    If you want to go to different switches, you have to do things at Level 3 (IP) and not 2 (Ethernet):

    http://lartc.org/howto/

    Under Solaris this is called IP multipathing. Not sure what the nomenclature on Linux is.

  13. David Magda said:

    s/Level 2/Layer 2/

  14. David Magda said:

    Disregard my comment. It seems that Linux is using one term ("bonding") to describe multiple completely different things. Depending on which "mode" you define, it does different things on Layer 2 and/or Layer 3:

    http://www.linuxhorizon.ro/bonding.html

    Talk about causing unnecessary confusion; really dumb.

  15. Matt said:

    @David

    Don't worry about it at all. It seems like there have been so many different implementations of people trying to accomplish the same (or similar (or drastically different) ) things that there's no easy way to keep them straight.

    Everyone uses the same generic terms ("clustering", "high availability", "bonding", etc.), so don't feel dumb for mixing them up. Lord knows everyone else has, too.

  16. Stick said:

    I'm currently using mode 1 everywhere and I see similar issues. My network topology is very similar as well. What we found is that the switch two layers upstream (our BCs have an embedded switch) ends up caching the MAC during a failover, and until that entry times out, things disappear for a while. Failover itself works fine, however. I'm moving everything over to mode 6 soon because, in our testing, its use of ARP negotiation plays much better with all the switches in the equation.

  17. Henry said:

    So has your bonding issue been resolved yet? We are doing a similar setup and we are also seeing weird network hiccups. Is it caused by the shared MAC?

  18. jaseywang said:

    We had the same issue as you. Using mode 0 with two NICs connected to two separate switches, we saw around 50% packet loss. After some googling and testing, we found that mode 0 is not suitable for connecting to separate switches; instead, the server's two NICs should be connected to the same peer (a single switch).
    So, if you want to connect to two separate switches, mode 6 is a much better choice.

    ref:
    https://www.kernel.org/doc/Documentation/networking/bonding.txt
