The future of storage technology

Date: April 26, 2010

I went to the Tech Field Day in Boston a few weeks ago. I had a great time, hung out with some really smart people, and got to tour Fenway Park. The whole experience was incredible.

It was not a shopping trip, though. The equipment and technologies that I saw are bleeding edge. They're not what we're using this year, or next year, and for a lot of us, probably not within the next five years. That being said, the overarching view of how storage and enterprise networks will operate for the foreseeable future was right there in front of me.

If you want to become familiar with the next 5-10 years of IT, get used to the term "unified" and the newer buzzword, "federated". Everything, from a connectivity standpoint, is going to be unified. From a management standpoint, it's going to be federated.

Unification means that all of the connections to our machines will happen over a single fabric. In other words, your networking (currently Cat5(e)/6) and your storage connectivity (maybe fiber, maybe Cat5, maybe coax/SFP) will all run over the same (probably 10Gb) cables. And if Cisco's UCS (Unified Computing System; blog entry forthcoming) is any indicator, we won't have one cable per physical computer. We'll have several, but they'll run to the enclosure, and the enclosure itself will deal with presenting things to the machines.

To say that devices will be federated is to say that devices which are physically distinct will be unified through procedural or administrative functions. For instance, if you've got an Active Directory domain, you could conceivably say that your member machines have been federated, since you can essentially administer them through the same panel, and they can be subject to rules, groupings, and policies. It's not a new concept, but it is a new term for it. Expect to hear it a lot. Lots of people think it's going to be the next big buzzword, the way "cloud" was (and still is).

To really see the kind of change that all of this entails in the storage world, we've got to examine the way things are right now. I apologize if this is remedial for anyone, but it's important to establish the current technologies, so we can fully appreciate what's going to happen. Your patience will be rewarded.

The type of storage that we are all most familiar with is probably Direct Attached Storage (DAS). This is the storage that is connected to your servers directly via one of a number of buses. It might be SATA cables to internal hard drives, USB cables to external drives, or maybe even several SCSI Ultra320 cables attached to an external array. The main consideration for this storage configuration is that there is no network fabric between your servers and your storage. This storage is always (to my knowledge, anyway) block-level. In other words, your host sees the storage as a block device and can use fdisk and fsck on it (or fdisk and format, for the Windows users out there).
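
To make "block-level" concrete, here's a minimal sketch (the device path is made up, and you'd need root): the host can read raw bytes straight off a directly attached disk, with no network fabric and no filesystem in the way.

```python
# A minimal illustration of block-level access: the host sees the DAS disk as
# a raw block device and can read bytes from it directly. The device path is
# hypothetical; run as root.
DEVICE = "/dev/sdb"  # assumption: a directly attached disk

with open(DEVICE, "rb") as disk:
    mbr = disk.read(512)  # the first sector, where an MBR partition table lives

# A classic MBR ends with the 0x55AA boot signature.
print("MBR signature found:", mbr[510:512] == b"\x55\xaa")
```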

Next up is Network Attached (or sometimes Addressable or Accessible) Storage (NAS). The defining factor of this storage type is that it uses the pre-existing network (typically TCP/IP based) to present storage to the host. The access is nearly always file-level; that is to say, the machine addressing the storage is unaware of the actual filesystem that the data resides on. Only the files and their metadata are presented. CIFS (the protocol behind Samba and Windows file sharing) and NFS are NAS technologies. That means your Samba server technically acts as a NAS server.
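
By contrast, here's roughly what file-level access looks like from the host's side. The mount point below is hypothetical; assume an NFS or CIFS share is mounted there. All the host ever sees are files and their metadata, never the blocks underneath.

```python
# File-level access through a NAS mount: the host only sees files and their
# metadata, not the blocks or on-disk layout of the NAS itself.
import os

SHARE = "/mnt/nas_share"  # assumption: an NFS or CIFS export mounted here

for name in os.listdir(SHARE):
    info = os.stat(os.path.join(SHARE, name))
    print(f"{name}: {info.st_size} bytes, modified {info.st_mtime}")
```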

The next level of technical sophistication is the Storage Area Network (SAN). This technology utilizes a network to present block-level devices to the target hosts. If your host is connected to a SAN, the parts that it can see can be utilized with fdisk and fsck (or, again, format). Typically, specialized hardware known as a host bus adapter (HBA) is used to present the remote storage as a device, but many modern operating systems can emulate an HBA in software if they have an appropriate pre-existing network fabric, such as an Ethernet card in the case of iSCSI.
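
As a rough sketch of what a software initiator does, here's the open-iscsi flow driven from Python. The portal address and target IQN are placeholders, not anything real; once the login succeeds, the remote LUN shows up to the host as an ordinary local block device.

```python
# A sketch of a software iSCSI initiator at work, using the open-iscsi
# command-line tool (iscsiadm). Requires root and the open-iscsi package;
# the portal address and target IQN below are placeholders.
import subprocess

PORTAL = "192.168.0.10"                          # assumption: the SAN's iSCSI portal
TARGET = "iqn.2010-04.com.example:storage.lun0"  # assumption: the target's IQN

# Ask the portal which targets it offers.
subprocess.run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
               check=True)

# Log in to the target; the LUN then appears as a new block device
# (e.g. /dev/sdX), ready for fdisk and a filesystem, just like DAS.
subprocess.run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"],
               check=True)
```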

Above and beyond a SAN, you can have multiple SANs, either in close proximity or separated by some distance, with varying levels of replication between them. Many SAN storage arrays include the ability to replicate block-level information between themselves and another storage array. Without this technology, the hosts themselves would be responsible for transmitting the data between storage environments. An inter-SAN relationship such as this improves the reliability of the overall storage network by reducing the margin for error in configuring hosts to replicate the data, and it can sometimes take advantage of in-array technologies, like data deduplication, which are transparent to the hosts.
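
Here's a toy sketch of the idea (not any vendor's actual implementation): the source array ships changed blocks to its peer and skips blocks the peer already holds, a crude stand-in for deduplication, and the hosts never get involved.

```python
# A toy sketch of array-to-array replication with naive deduplication.
# Purely illustrative; real arrays do this in firmware, transparently to hosts.
import hashlib

def replicate(source_blocks, peer_store):
    """source_blocks: dict of block_number -> bytes.
    peer_store: dict of content hash -> bytes on the remote array."""
    sent = skipped = 0
    for block_no, data in source_blocks.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in peer_store:
            skipped += 1              # the peer already holds an identical block
        else:
            peer_store[digest] = data # ship the block across the link
            sent += 1
    return sent, skipped

source = {0: b"log data", 1: b"log data", 2: b"database page"}
peer = {}
print(replicate(source, peer))        # identical blocks are only shipped once
```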

Once the size of a SAN grows beyond a small number of arrays, it becomes unwieldy to administer the storage. Keeping track of what data exists where is troublesome and wastes time. A technology known as storage virtualization has been developed which constructs a layer of abstraction above the storage arrays. When configuring storage for a particular server, the administrator interfaces with this virtualization solution, and the product itself manages the underlying storage arrays. This frees the administrator from a lot of decision making; they no longer worry (or care) where the data actually resides. Because of performance requirements, storage virtualization is typically limited to a close geographical area.
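
A toy sketch of the concept, with made-up names: the administrator asks the virtualization layer for a volume, and the layer quietly decides which backend array actually holds it.

```python
# A toy storage virtualization layer (all names are invented): the admin
# provisions volumes against the abstraction, never against a specific array.
class VirtualizationLayer:
    def __init__(self, arrays):
        self.arrays = arrays      # dict of array name -> free capacity in GB
        self.volumes = {}         # volume name -> (backing array, size)

    def provision(self, volume_name, size_gb):
        # Pick the array with the most free space; the admin never chooses.
        array = max(self.arrays, key=self.arrays.get)
        if self.arrays[array] < size_gb:
            raise RuntimeError("no array has enough free capacity")
        self.arrays[array] -= size_gb
        self.volumes[volume_name] = (array, size_gb)
        return volume_name        # the admin only ever sees the volume name

pool = VirtualizationLayer({"array-a": 500, "array-b": 2000})
pool.provision("mail-server-data", 250)
print(pool.volumes)  # which physical array backs the volume is an internal detail
```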

The most recent, and most advanced, technology is federated storage. Rather than virtualizing storage behind a layer of abstraction, federated storage arrays are all administered through the same interface. Individual storage arrays can be considered nodes within the federation, and an arbitrarily large number of nodes can be added. Through this method, storage networks can reach unprecedented sizes across an astonishingly large geographic distance.
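
Again, purely as an illustrative sketch: each array stays a physically distinct node, but you can join an arbitrary number of them and administer the whole lot through one interface.

```python
# A toy model of federation (illustrative only): distinct arrays join as nodes
# and are then managed through a single interface, regardless of where they sit.
class StorageNode:
    def __init__(self, name, site, capacity_gb):
        self.name, self.site, self.capacity_gb = name, site, capacity_gb

class Federation:
    def __init__(self):
        self.nodes = []

    def join(self, node):
        self.nodes.append(node)   # adding a node simply grows the federation

    def total_capacity(self):
        return sum(n.capacity_gb for n in self.nodes)

fed = Federation()
fed.join(StorageNode("array-bos", "Boston", 10_000))
fed.join(StorageNode("array-lon", "London", 8_000))
print(f"{len(fed.nodes)} nodes, {fed.total_capacity()} GB under one interface")
```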


As you can see, enterprise storage has been growing increasingly complex, even as it becomes simpler to manage. An additional trend has been to "grow out" storage, rather than growing it up: it is now simpler to add more storage in the form of additional arrays than it is to add disks to existing arrays. Previously, this would have increased the administrative workload linearly as the number of units increased. Federated storage removes that growth in administrative effort, and it allows administrators to spend their time deciding how to divvy up storage, rather than deciding how to manage it.

In the same vein, major efforts have been made by enterprise storage vendors to streamline all aspects of storage and to increase efficiency in a number of ways.

Chad Sakac, VP of the VMware Technology Alliance at EMC, presented a diagram very similar to the following when we visited their global headquarters:

This diagram gives a rough idea of the efficiency of various technologies along three axes. Starting in the center, as you progress along one of the axes, the degree of efficiency improves (at least, according to EMC; I suppose they would know). Most of the terms on the diagram should be familiar by now, but there are a couple that are important.

The first, "mega cache", is an ultra-fast storage device or segment that is used as a cache for the array. Some are flash-based, others are purely RAM-based. The faster and larger, the better.
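
Conceptually, it's just a read-through cache sitting in front of the slower disks, something like this toy sketch (not how any particular array implements it):

```python
# A tiny read-through cache sketch: serve reads from a fast (flash/RAM) tier
# when possible, fall back to the slower array on a miss, and keep the result
# hot for next time. Eviction is omitted for brevity.
class MegaCache:
    def __init__(self, backend):
        self.backend = backend    # dict of block -> data (the slow array)
        self.cache = {}           # the fast tier

    def read(self, block):
        if block in self.cache:
            return self.cache[block]     # cache hit: fast path
        data = self.backend[block]       # cache miss: go to the array
        self.cache[block] = data
        return data
```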

The other is very cool. "Auto tier" is a technology wherein the importance, or performance needs, of the data are tracked by the federated storage system, and the data itself is moved to an appropriate tier of storage. For instance, if you had old logs that didn't need the fastest available flash drives, they could instead be stored on an old SATA array with 7,200 RPM drives. This all happens automatically, behind the scenes.
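
A toy tiering pass might look something like this (illustrative only; the 30-day threshold and tier names are arbitrary assumptions):

```python
# A toy auto-tiering pass: data that hasn't been touched recently is demoted
# from fast flash to a slower SATA tier, with no admin or host involvement.
import time

FAST_TIER, SLOW_TIER = {}, {}            # stand-ins for flash and 7,200 RPM SATA
COLD_AFTER_SECONDS = 30 * 24 * 3600      # assumption: "cold" means untouched for 30 days

def auto_tier(last_access):
    """last_access: dict of object name -> unix timestamp of last read/write."""
    now = time.time()
    for name, accessed in last_access.items():
        if now - accessed > COLD_AFTER_SECONDS and name in FAST_TIER:
            SLOW_TIER[name] = FAST_TIER.pop(name)   # demote cold data

FAST_TIER["old-logs"] = b"..."
auto_tier({"old-logs": time.time() - 90 * 24 * 3600})   # untouched for ~90 days
print("demoted to SATA tier:", list(SLOW_TIER))
```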

While auto tier is currently the most efficient charted technique on the performance axis, it's in second place on the capacity axis, behind data reduction. Overall, I suspect we can all agree that reducing our data footprint is the best possible solution for lowering storage requirements.


If things end up going this route, then the future of storage technology is an interesting place. The techniques that are going to be in wide use soon differ in many ways from what has been attempted before. Although I won't be buying storage arrays with all (or probably any) of these features in the near future, it is inevitable that this technology will trickle down to the arrays and servers that we buy in far less than a decade. By familiarizing yourself with the technology now, you'll be much further up the learning curve when the time comes for implementation.

For some other resources, here is Simon Lowe's Live Blog of the talk, and Devang Panchigar at StorageNerve has coverage with a great video of Chad giving his talk.

Thanks for your time.

Please comment below with any thoughts or questions. If I don't know the answer, I'm certain that another reader will.