Introduction to LVM in Linux

I’m going to go over the Logical Volume Manager (LVM) today, because it plays an important part in next Friday’s howto. I’m not going to let the cat out of the bag, but just know that it’s been a long time coming. There are lots of resources for the technical management of LVM, and they cover the many uses and flags of the commands far better than I could here, so some resources will be listed after the entry. I’m just going to concentrate on the “why” of LVM. I’ve often found that learning why you want to do something is more difficult than the technical process by which you accomplish it.

First off, let’s discuss life without LVM. Back in the bad old days, you had a hard drive. This hard drive could have partitions. You could install filesystems on these partitions, and then use those filesystems. Uphill both ways. It looked a lot like this:

[diagram: a single disk (sda) with two partitions and some unused free space]

You’ve got the actual drive, in this case sda. On that drive are two partitions, sda1 and sda2. There is also some unused free space. Each of the partitions has a filesystem on it, which is mounted. The actual filesystem type is arbitrary: it could be ext3, reiserfs, or what have you. The important thing to note is that there is a direct one-to-one correlation between disk partitions and possible filesystems.

Let’s add some logical volume management that recreates the exact same structure:

[diagram: the same disk, now with a volume group and logical volumes layered above the partitions]

Now, you see the same partitions, but there is a layer above the partitions called a “Volume Group” — literally a group of volumes, in this case disk partitions. It might be acceptable to think of this as a sort of virtual disk that you can partition up. Since we’re matching our previous configuration exactly, you don’t get to see the strengths of the system yet. Notice that above the volume group we have created logical volumes, which might be thought of as virtual partitions, and it is upon these that we build our filesystems.
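Since I promised to focus on the “why”, the “how” turns out to be only a handful of commands. Here’s a minimal sketch of recreating that layout with LVM — the device names, sizes, and mount point are hypothetical, and these commands destroy whatever is on the partitions they touch:

```shell
pvcreate /dev/sda1 /dev/sda2                     # label the partitions as LVM physical volumes
vgcreate myVolumeGroup01 /dev/sda1 /dev/sda2     # pool them into one volume group
lvcreate -n myLogicalVolume1 -L 20G myVolumeGroup01   # carve out a "virtual partition"
mkfs.ext3 /dev/myVolumeGroup01/myLogicalVolume1       # the filesystem goes on the logical volume
mount /dev/myVolumeGroup01/myLogicalVolume1 /mnt/data # and mounts like any other device
```

Note that the filesystem never sees the real partitions at all — it only ever sees the logical volume.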

Let’s see what happens when we add more than one physical volume:

[diagram: three disks (sda, sdb, sdc) pooled into a single volume group]

Here we have three physical disks: sda, sdb, and sdc. Each of the first two disks has one partition taking up the entire space. The last, sdc, has one partition taking up half of the disk, with the other half remaining unpartitioned free space.

Above that, we can see the volume group, which includes all of the currently available volumes. Here lies one of the biggest selling points: you can build a logical volume as big as the sum of your disks. In many ways this is similar to how RAID level 0 works, except there’s no striping at all; data is written, for the most part, linearly. If you need redundancy or the performance increases that RAID provides, make sure to put your logical volumes on top of RAID arrays. RAID slices work exactly like physical disks here.

Now we have a volume group which takes up two and a half disks. It has been carved into two logical volumes, the first of which is larger than any one of the disks. The logical volumes don’t care how big the actual physical disks are, since all they see is that they’re carved out of myVolumeGroup01. This layer of abstraction is important, as we shall see.
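Sketched as commands, with hypothetical device names and sizes (say each disk is 80GB, so the first logical volume is bigger than any single disk):

```shell
pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1                    # three disks' worth of physical volumes
vgcreate myVolumeGroup01 /dev/sda1 /dev/sdb1 /dev/sdc1    # one big pool of space
lvcreate -n myLogicalVolume1 -L 120G myVolumeGroup01      # larger than any single 80G disk
lvcreate -n myLogicalVolume2 -l 100%FREE myVolumeGroup01  # the rest of the pool
```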

What happens if we decide that we need the unused space, because we’ve added more users?

Normally we’d be in for some grief with the one-to-one mapping, but with logical volumes, here’s what we can do:

[diagram: /dev/sdc2 created from the free space and added to the volume group; myLogicalVolume2 grown]

Here we’ve taken the previously free space on /dev/sdc and created /dev/sdc2. Then we added that to the list of volumes that comprise myVolumeGroup01. Once that was done, we were free to expand either of the logical volumes as necessary. Since we added users, we grew myLogicalVolume2. At that point, as long as the filesystem mounted on /home supported it, we were free to grow it to fill the extra space. All because we abstracted our storage from the physical disks it lives on.
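That whole growth procedure is a short sequence of commands. A sketch, with a hypothetical size, and assuming ext3 on /home (resize2fs is the ext2/ext3 grow tool; other filesystems have their own):

```shell
pvcreate /dev/sdc2                                       # turn the new partition into a physical volume
vgextend myVolumeGroup01 /dev/sdc2                       # add it to the pool
lvextend -L +40G /dev/myVolumeGroup01/myLogicalVolume2   # grow the logical volume backing /home
resize2fs /dev/myVolumeGroup01/myLogicalVolume2          # grow the filesystem to fill it
```

No repartitioning, no copying data to a bigger disk — the abstraction layer does the work.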

Alright, that covers the basic why of Logical Volume Management. Since I’m sure you’re itching to learn more about how to prepare and build your own systems, here are some excellent resources to get you started:

As always, if you have any questions about the topic, please comment on the story, and I’ll try to help you out or point you in the right direction.

Temporary Ignorance

It’s an interesting perspective, looking back and realizing how long you’ve spent to obtain the knowledge that you have. It’s hard for me, because I think about how much time I spent chasing the wrong angles, or looking in the wrong places.

I spent at least two months trying to get RedHat Cluster Suite running on Fedora Core 6. Heck, I spent months setting up Fedora Core 6, because “it’s what RedHat Enterprise Linux 5 is based on”. Never mind that I didn’t understand that CentOS was a 1:1 mapping of RHEL, and that FC6 is no longer supported. If only I’d had that small, tiny bit of knowledge, I wouldn’t have all but wasted months of my time.

Here’s another example. I’ve been working on trying to get my Dell blades to their maximum availability, going so far as to learn how to bond the devices, running them to individual switches, learning as I went, because there was no guide for what I was trying to do.

Except, of course, there is. I just didn’t find it until today:
Link Aggregation on the Dell PowerEdge 1855 Server Ethernet Switch[pdf]

Know how simple it would have been to do it right the first time if I had seen that? It’s so frustrating, but the one thing that keeps me from utter despair is that it seems to be universal. I might be wrong in this, but it seems that knowledge is expensive to get but cheap to have, if you know what I mean.

Sort of reminds me of the time my boss asked me what was left to complete a project. I said that almost everything was done, but I still had one task I hadn’t figured out (I don’t remember now what it was, but it wasn’t anything amazingly complex), and he said, in an offhanded way, that it was trivial. I just looked at him and said, “Yeah, it’s trivial after I figure it out. Until then, I’d say it’s pretty important.” It seems like a lot of things are that way.

Slow speeds transferring data between my machines

Alright, I’m not sure what I’m missing, or where I’m not testing, so I’m appealing to the people who know more than me.

I’ve got a couple of machines. We’ll call them DB1 and DB2.

If I test the network connection between the two of them, it looks fine:

DB1 -> DB2
1024.75 Mbit/sec

DB2 -> DB1
895.13 Mbit/sec

When you convert those to Gb/s, I’m getting right around the theoretical max for the network (this was tested using ttcp, by the way). So at least my cables aren’t broken.

Now, the next thing I thought of was that my disks were slow.

DB1 has an internal array. It’s fast enough:

[root@db1 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 58.2554 seconds, 176 MB/s

DB2 is connected to the SAN, and is no slouch either:

[root@db2 ~]# dd if=/dev/zero of=/db/testout bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 76.7791 seconds, 133 MB/s

When I read from the big array on DB1 to the mirrored disks, I get very fast speeds. Because the free space on my mirrored disks is so small (< 4GB free), I can’t create a file big enough to make the transfer count. It reports 1,489.63Mb/s, which is baloney, but it lets me know that it’s fast. Reading from the SAN to DB2’s local disks is, if not fast, passable:

10240000000 bytes (10 GB) copied, 169.405 seconds, 60.4 MB/s

That works out to 483.2Mb/s.

Now, when I try to rsync from DB2 to DB1, I have issues. Big issues.

I tried to rsync a 10GB file across. Here were the results:

sent 10241250100 bytes received 42 bytes 10935664.86 bytes/sec
(10.93 MB/s or 87.49Mb/s)

Less than 100Mb/s.
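For reference, the conversion I keep doing in those parentheticals is just bytes-to-bits: rsync reports bytes per second, so multiplying by 8 and dividing by a million gives megabits per second. A quick sanity check on that last number:

```shell
# rsync's reported rate in bytes/sec, times 8 bits/byte, over 10^6 = Mb/s
awk 'BEGIN { printf "%.2f Mb/s\n", 10935664.86 * 8 / 1000000 }'
# prints 87.49 Mb/s
```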

I was alerted to this problem earlier, when it took all damned day to transfer my 330GB database image. Here’s the output from that ordeal:

sent 68605073418 bytes received 3998 bytes 2616367.39 bytes/sec
(2.62 MB/s or 20.93Mb/s)

It only says 68GB because I used the -z flag on the rsync.

To prove that it isn’t some bizarre interaction with the SAN causing a problem when rsync reads from it, here’s a mirrored-disk transfer from DB2 to the root partition on DB1:

sent 1024125084 bytes received 42 bytes 11192624.33 bytes/sec
(11.12MB/s or 89.54Mb/s)

I’m willing to say that maybe the network was congested earlier, or maybe the SAN was under stress, but on an otherwise unused network, I should be getting a damned lot more than 89Mb/s between two servers on the same Gb LAN.

Any ideas?

Figured it out. Stupid compression flag on rsync.
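For posterity: -z compresses the stream in flight, which is a win on a slow WAN link, but on a Gb LAN it just makes rsync CPU-bound on compression instead of network-bound. The fix is simply to drop the flag (paths here are hypothetical):

```shell
# was: rsync -avz /db/image db1:/db/   -- compression throttled the transfer
rsync -av /db/image db1:/db/           # on a fast LAN, let the network do the work
```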