"We're not talking about truth, we're talking about something that seems like truth – the truth we want to exist."
– Stephen Colbert
One of the hardest parts of being in the technology business is overcoming preconceived notions that “feel” too good not to be true. In a marketing-driven world, it is easy to lose track of reality, from the limitations of technology to the laws of physics. Sometimes fact-checking ourselves seems like too much work, so we go with our gut when making buying decisions – only to find our feelings were just plain wrong.
I keep coming back to this sense of truthiness whenever I explain the distinction between caching and tiering, something I have had to do in many technical conversations over the past few months. Auto-tiering technology has been in the marketplace for a long time, and it has become an accepted concept. So when someone says that they “tier to the cloud” or “tier to archive,” I feel I have to stop and explain what that really means, and why a lot of people have been disappointed by tiering technology.
In data storage, a tier is a permanent, protected home for a piece of data. Keeping multiple copies of the same data on different tiers is expensive and complex, so when the system decides that certain data belongs in a faster tier for performance reasons, it has to move that data there. To make room, it also has to move some other data down to a slower tier. The system does all of this while servicing data requests from users and their servers. It is a lot of work, and it consumes a lot of bandwidth in the system. In fact, this churn can double or triple the workload that needs to be handled. To keep this from becoming a problem, tiering systems move data across tiers very deliberately, or sometimes not at all.
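To make the cost of that churn concrete, here is a minimal sketch of a two-tier store in Python. It is a toy model, not how any particular product works: the class name `TwoTierStore`, the coldest-first eviction order, and the accounting of one read plus one write per block move are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierStore:
    """Toy model: every block lives in exactly one tier (no duplicate copies)."""
    fast_capacity: int
    fast: list = field(default_factory=list)  # block ids, coldest first
    slow: set = field(default_factory=set)
    user_ios: int = 0       # I/Os that directly serve a request
    migration_ios: int = 0  # extra I/Os spent moving blocks between tiers

    def read(self, block, promote=False):
        """Serve a read and report which tier answered it."""
        self.user_ios += 1
        if block in self.fast:
            # Mark the block as most recently used.
            self.fast.remove(block)
            self.fast.append(block)
            return "fast"
        if promote:
            self._promote(block)
        return "slow"

    def _promote(self, block):
        """Move a block up; if the fast tier is full, move another block down."""
        if len(self.fast) >= self.fast_capacity:
            victim = self.fast.pop(0)
            self.slow.add(victim)
            self.migration_ios += 2  # assumed: read victim from fast, write to slow
        self.slow.discard(block)
        self.fast.append(block)
        self.migration_ios += 2      # assumed: read block from slow, write to fast


store = TwoTierStore(fast_capacity=2, fast=[1, 2], slow={3, 4, 5})
for b in [3, 4, 5]:
    store.read(b, promote=True)
print(store.user_ios, store.migration_ios)  # prints "3 12"
```

In this toy run, three user reads trigger twelve background I/Os, because every promotion into a full fast tier forces a demotion as well. Real systems batch and throttle these moves, which is exactly why data tends to stay where it is.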
The end result is that data gets served out of low-performance tiers for a long time, offering no benefit to the user. Worse, cold data can end up sitting in expensive, high-performance storage because it is too hard to move it away. Many storage administrators have used tiering systems and become frustrated by the physics of moving data. I have talked to many of them, and I keep explaining why something that sounds so good on paper can work so poorly in practice: even in an integrated system, the bandwidth of buses and disk drives is finite and easy to overrun, never mind your internet connection to the cloud. One must never assume that resources are free.
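A quick back-of-the-envelope calculation shows how fast a finite pipe gets overrun. The sketch below is only arithmetic; the function name `transfer_hours`, the 10 TB data set, and the 1 Gbps link are hypothetical figures chosen for illustration, not measurements from any real system.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Hours needed to move a data set over a link (decimal TB, decimal Gbps)."""
    bits = data_terabytes * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * utilization
    return bits / usable_bits_per_second / 3600


# Hypothetical example: 10 TB over a 1 Gbps connection.
print(round(transfer_hours(10, 1), 1))       # 22.2 hours with the link to itself
print(round(transfer_hours(10, 1, 0.5), 1))  # 44.4 hours if half the link serves users
```

Even with the whole link dedicated to the move, shifting that much data takes the better part of a day, and any bandwidth spent on migration is bandwidth not spent serving requests.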
At ClearSky Data, our global storage network does not tier data. The core of our technology is based on the idea that even the fastest pipes can be oversubscribed. That is why we have built our service on advanced caching technology and optimization that minimize churn as data usage evolves over its lifecycle. To our customers, this means better performance and a happier overall experience. More importantly, it results in more efficient utilization of expensive resources, which lowers the cost of operating data storage with our service.
Learn more about making the most of your storage resources.