When we started ClearSky a couple of years ago, the early team was focused on a particular set of challenges created by the growing adoption of cloud and hybrid cloud architectures: what to do about primary storage? We built a storage network that combines high performance and low latency with the ease of use, agility and scale of the cloud. In the early beta days, we focused on workloads that customers would trust us with: VMware, SQL and other back-office apps that were appropriately low-risk for a pre-GA service.
Over the past year, we have learned a great deal from customers and adapted our thinking to one focused question: WHY do customers really need the distributed caching, metro-based architecture that ClearSky delivers as a fully managed service? The answer is now much clearer: enterprise customers need the ClearSky storage network to solve a growing problem in their IT infrastructure – the out-of-control growth of machine data applications.
By 2020, 42 percent of all data will be machine generated, according to IDC. This data includes log data spilling out 24/7 from source systems across a business: applications, sensors, security systems, business processes, servers, storage, networks. It's almost overwhelming to contemplate, and IT is struggling to keep up.
The obvious solution for such a huge scaling challenge is to put the data in storage clouds like AWS, Azure or Google. This is the best way to move data off enterprise infrastructure, avoid constant overprovisioning and limit the need to build out more data center footprints at a time when IT is determined to shrink them. However, the reality of many of these machine data applications is that they are closely tied to the source systems they monitor and analyze, and they require pretty hefty processing capabilities and low latency for rapid analytics.
Take Splunk, for example. As a powerful tool for monitoring IT operations and analyzing security data, Splunk has gained a substantial customer base among medium and large enterprises, which rely on it to run their businesses and protect themselves from security threats. The machine-generated data that Splunk analyzes grows rapidly, often at a rate of terabytes per day. IT needs to move that data between hot, warm and cold tiers to keep Splunk running efficiently, and for every terabyte of daily raw machine data that it analyzes, Splunk requires up to 23 times that amount in tiered storage (if a CIO wants to store the data for longer than a week). Splunk indexing requires the high performance of flash or near-flash storage, and relies on low latency access to the source systems being analyzed.
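To make that multiplier concrete, here is a minimal back-of-the-envelope sketch of the arithmetic described above. It assumes only the 23x figure quoted in this post; it is an illustration, not official Splunk capacity-planning guidance, and the `raw_tb_per_day` input is a hypothetical example value.

```python
def tiered_storage_tb(raw_tb_per_day: float, multiplier: float = 23.0) -> float:
    """Rough upper bound on total hot/warm/cold tiered storage needed,
    using the up-to-23x multiplier quoted above for retention beyond a week."""
    return raw_tb_per_day * multiplier

# A mid-size deployment ingesting 2 TB of raw machine data per day:
print(tiered_storage_tb(2.0))  # 46.0 TB of tiered storage
```

Even a modest daily ingest rate quickly turns into tens of terabytes of provisioned capacity, which is exactly the overprovisioning pressure described above.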
Elasticsearch and the ELK stack (Elasticsearch + Logstash + Kibana) are open source alternatives for log analytics and search capabilities across a huge amount of machine-generated data. While the open source approach provides many benefits in terms of flexibility and reduced TCO, it doesn’t protect enterprise customers from investing heavily in deploying and scaling the underlying infrastructure. The ELK node structure can require complex and expensive storage build-outs to support deployment, and as data volumes grow, so must the number of nodes in an ELK cluster. For ELK users, an elastic, highly scalable storage solution is critical.
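The node-growth dynamic described above can be sketched with similarly rough arithmetic. The per-node usable capacity and replica count below are illustrative assumptions, not Elastic sizing guidance; the point is only that node count scales linearly with data volume.

```python
import math

def elk_data_nodes(total_index_tb: float,
                   replicas: int = 1,
                   usable_tb_per_node: float = 4.0) -> int:
    """Estimate the data nodes needed to hold primary shards plus replicas,
    given an assumed usable storage capacity per node."""
    total_with_replicas = total_index_tb * (1 + replicas)
    return math.ceil(total_with_replicas / usable_tb_per_node)

# 40 TB of indexed data with one replica, at an assumed 4 TB usable per node:
print(elk_data_nodes(40.0))  # 20 nodes
```

Double the retained data and the cluster doubles too, which is why an elastic storage layer underneath the cluster matters so much.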
The bottom line is that most storage systems in use today weren’t designed to handle the volume and growth that stems from machine-generated data. A new approach is needed, and it must be designed specifically to address this huge and growing data challenge. ClearSky’s global storage network is built for machine data analytics. Our fully managed service extends the cloud into the data center and edge data hubs and combines on-prem performance, resiliency and security with cloud agility and economics. With the ClearSky service, enterprise IT can stop managing storage and focus entirely on getting the most value from machine data.