Slow speeds transferring data between my machines

Date: September 24, 2008

Alright, I'm not sure what I'm missing, or where I'm not testing, so I'm appealing to the people who know more than me.

I've got a couple of machines. We'll call them DB1 and DB2.

If I test the network connection between the two of them, it looks fine:

DB1 -> DB2
1024.75 Mbit/sec

DB2 -> DB1
895.13 Mbit/sec

Converted to Gb/s, those numbers are right around the theoretical max for the network
(this was tested using ttcp, by the way). So at least my cables aren't broken.
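The conversion is just arithmetic (assuming ttcp reports binary megabits, which is my reading of its output):

```shell
# 1024.75 Mbit/s expressed as a fraction of gigabit line rate
awk 'BEGIN { printf "%.3f Gbit/s\n", 1024.75 / 1024 }'
```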

Now, the next thing I thought of was that my disks were slow.

DB1 has an internal array. It's fast enough:


[root@DB1 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 58.2554 seconds, 176 MB/s

DB2 is connected to the SAN, and is no slouch either:

[root@DB2 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 76.7791 seconds, 133 MB/s
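One caveat worth noting: bs=1024 means ten million 1KB write() calls, so some of what dd measures is syscall overhead rather than the disk. A larger block size gives the arrays a fairer shake (the path below is a stand-in, and the size is shrunk for illustration):

```shell
# Same test with 1MB blocks: far fewer system calls, so the number
# better reflects raw disk throughput. /tmp/ddtest is a placeholder
# path; count=256 keeps this to a 256MB file.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=256
rm -f /tmp/ddtest
```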

When I read from the big array on DB1 to the mirrored disks, I get very fast speeds. With less than 4GB free on the mirrored disks, though, I can't write a file big enough to make the measurement meaningful. It reports 1,489.63Mb/s, which is baloney, but it does tell me that path is fast.

Reading from the SAN to DB2's local disks is, if not fast, at least passable:
10240000000 bytes (10 GB) copied, 169.405 seconds, 60.4 MB/s
That works out to 483.2Mb/s.

Now, when I try to rsync from DB2 to DB1, I have issues. Big issues.

I tried to rsync a 10GB file across. Here were the results:

sent 10241250100 bytes received 42 bytes 10935664.86 bytes/sec
(10.93 MB/s or 87.49Mb/s)

Less than 100Mb/s.

I was alerted to this problem earlier, when it took all damned day to transfer my 330GB database image. Here's the output from that ordeal:

sent 68605073418 bytes received 3998 bytes 2616367.39 bytes/sec
(2.62 MB/s or 20.93Mb/s)

It only says 68GB because I used the -z flag on the rsync.
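The arithmetic backs up "all damned day": even counting only the 68GB that actually crossed the wire, the job works out to about seven hours at that rate:

```shell
# 68605073418 bytes at 2616367.39 bytes/sec, converted to hours
awk 'BEGIN { printf "%.1f hours\n", 68605073418 / 2616367.39 / 3600 }'
```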

To prove that it isn't some sort of bizarre combination of the SAN causing some problem when being read from rsync, here's a mirrored-disk transfer from DB2 to the root partition on DB1:

sent 1024125084 bytes received 42 bytes 11192624.33 bytes/sec
(11.19MB/s or 89.54Mb/s)

I'm willing to say that maybe the network was congested earlier, or maybe the SAN was under stress, but on an otherwise unused network, I should be getting a lot damned more than 89Mb/s between two servers on the same Gb LAN.

Any ideas?

[UPDATE]
Figured it out: the stupid compression flag on rsync.
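For anyone landing here with the same symptom, here's a sketch of the kind of invocation that behaves on a gigabit LAN (the hostname and paths are placeholders):

```shell
# No -z: compressing the stream costs more CPU time than it saves on a
# fast wire. -W (--whole-file) also skips rsync's delta algorithm,
# which buys nothing for a first-time copy.
rsync -avW /db/db-image DB1:/db/
```

The -z flag earns its keep over slow WAN links, not between two servers on the same switch.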

13 Responses to “Slow speeds transferring data between my machines”

  1. Ian said:

    What type of network gear are these servers on? Same physical switch or different ones?

  2. Matt said:

    @Ian

    Each machine has two interfaces, bonded into one virtual interface. The bonding mode is 5 (see this)

    There are two switches, called red and blue, named for the color of the cables. One interface from each machine goes to each switch. The switches are standard Netgear 24-port Gb. There are two VLANs, but they won't enter into this, since they're port-based and all four connections are in the same untagged VLAN.

  3. Jack said:

    Could it be the compression? I've heard that compression can work against you on a fast network. Did you check top during the transfer?

  4. Michael Janke said:

    TTcp checks out the network, but it doesn't write to disk.

    Native disk writes are OK, but not with rsync.

    It sounds like rsync is using a small block size to transfer data, or perhaps a small receive window. You'll only get near a Gig if you are streaming TCP. If you are ack'ing too often, I don't think you can fill a gig.

    Netstat might show the receive window.

    A dump of the packet size counters might be interesting.
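(If the boxes are Linux, which the dd output suggests, the receive-window bounds Michael mentions are visible in /proc:)

```shell
# min, default, and max TCP receive buffer sizes in bytes; the max
# bounds how far the receive window can grow on a connection
cat /proc/sys/net/ipv4/tcp_rmem
```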

  5. Kenny said:

    Shouldn't the speed of DB1 -> DB2 and DB2 -> DB1 be the same, or am I missing something?

  6. Frank said:

    I've had similar situations on my network when the switch port is set to 1Gb full duplex and the server NIC is set to auto-negotiate. Both have to have the same speed and duplex setting for the connection to work properly.

  7. Ian said:

    Depending on the gear, as Frank mentioned, you can have duplex problems. I'm not familiar with Netgear, but Cisco equipment can get stupid with auto-negotiation, especially when you're linking up to non-Cisco equipment.

    What's the subnet mask of the servers? How many devices on that vlan/network segment? I'm wondering if you're having broadcast traffic issues.

    If it's not that, what if you break the port channel groups and go with single links? I wonder if you'd actually see performance gains.

  8. Matt said:

    @Jack

    I don't think it's compression, since the 2nd transfer I did came across at about 11MB/s and it wasn't compressed at all. Just zeros.

    @Michael
    You may be on to something. I'll do more investigating in that direction, thanks!

    @Kenny
    Ideally yes, but it's possible that the cable isn't crimped exactly right, or something similar. They're both fast "enough" for the moment, and I know the bottleneck isn't the cables.

    @Frank
    I'll verify the duplex on the links, but I'm fairly sure it's coming across right.

    @Ian
    The subnet mask is /24. If it weren't for the bonding, it would be a simple network arrangement. I'm wondering whether there's an issue with the bonding mechanism, and maybe the effect is only apparent on longer-lasting streams. That would explain why my 16MB ttcp test (which lasted less than a second) showed high, my mid-range test (10GB) showed slow, and my long-range test (68GB) showed ultra slow.

    I'll be doing some investigations. I'm also in the middle of shipping new switches up there. Instead of 24-port Netgear switches, I'm going to be using 3Com Baseline 2948+'s.

    Since my new switches have a lot more capabilities, I'm hoping that I can configure the aggregate ports between them to perform better and not have this issue. I may have to change the bonding mode too.

    I'm going to be on site next week, which will make things much easier to debug!

    Thanks everyone for your input! I really do appreciate the suggestions, and if you think of anything else, please let me know.

  9. M said:

    Your best bet is to enable jumbo frames. It seems like every time I have slow file transfers over a 1Gig or higher network, the problem disappears when I enable them. The best way to tell for sure is to look at a graph of network traffic; if it's spiky, that's a sure sign that the NIC is sending out all it can and then waiting for acks back. Jumbo frames will fix that.
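(For reference, "jumbo frames" means raising the MTU, typically from 1500 to 9000, on every NIC and switch in the path; on Linux the current values are visible in sysfs:)

```shell
# Show the current MTU of every interface. 1500 is standard Ethernet;
# every hop in the path must agree before you raise it to 9000.
for m in /sys/class/net/*/mtu; do
    printf '%s: %s\n' "$m" "$(cat "$m")"
done
```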

  10. steve said:

    Is rsync using ssh? If so, which cipher? You might try -c blowfish which is MUCH faster than the (normally default) 3des.

    Maybe don't use rsync -- how about netcat? That avoids the encryption altogether. If nothing else you could dd {data} | nc and measure the transfer rates to see if it's rsync or the network.

    Or maybe mount the volumes with NFS or SMB then rsync the local mounts rather than the SSH transport.

    Is CPU maxing out during the slow transfer?

  11. Bob said:

    Matt, are you using ssh/scp? Try looking at:

    http://www.psc.edu/networking/projects/hpn-ssh/

  12. Matt said:

    @Bob

    Yes, I am, and I didn't know about that. Thanks a bunch. That looks really interesting!

  13. StephanJade said:

    Nice article you got here. It would be great to read something more about this theme.
