
What is a distributed storage system and why is it important?

A new era started at the beginning of the 21st century: the Digital Era. Most things are now digital or heavily dependent on technology, from radio and TV to healthcare and even most of our memories. Between 1986 and 2007 the amount of data per person grew by 23% per year, as Computer World reports. As a result, a huge amount of digital data is created daily and accumulates to unprecedented volumes.

Data storage has evolved over the years to accommodate the rising needs of companies and individuals. We are now reaching a tipping point at which the traditional approach to storage, the use of a stand-alone, specialized storage box, no longer works, for both technical and economic reasons. We need not just faster drives and networks; we need a new approach to data storage. At present, the best approach to satisfying current demands for storing data seems to be distributed storage.

This concept has appeared in different shapes and forms through the years. While there is no commonly accepted definition of what a distributed storage system is, we can summarize it as:

“Storing data on a multitude of standard servers, which behave as one storage system although data is distributed between these servers.”
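
As a rough illustration of this definition, the sketch below spreads chunks of data over a group of servers so that clients see one logical store. It uses rendezvous hashing, a generic placement technique chosen here for brevity and not the method of any particular product. The node names, the `place_chunk` function and the replica count are all hypothetical.

```python
import hashlib


def place_chunk(chunk_id: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the nodes that should hold copies of a chunk.

    Uses rendezvous (highest-random-weight) hashing: every node gets a
    pseudo-random score per chunk, and the top `replicas` scores win.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical cluster of three standard servers.
nodes = ["server-a", "server-b", "server-c"]
placement = {f"chunk-{i}": place_chunk(f"chunk-{i}", nodes) for i in range(6)}
for chunk, owners in placement.items():
    print(chunk, "->", owners)
```

Because each node's score is independent of the others, adding a server only moves the chunks that the new server wins. This is one way a scale-out system can grow without reshuffling all of its data.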

A Distributed Storage System (DSS) is an advanced form of the “Software-Defined Storage” (SDS) concept. It is like SDS 2.0 (excuse the buzzword). Unlike old-fashioned SDS solutions:
– A DSS can run compute workloads on the same physical servers, i.e. it can be used to build an efficient Hyper-Converged Infrastructure (HCI);
– A DSS can scale out, i.e. it makes one shared storage system out of many nodes. Old-fashioned SDS solutions were scale-up systems, which formed 2-node clusters in an active-passive or mirrored configuration;
– A DSS can achieve performance which is impossible for SDS 1.0 solutions, and it does so with extremely low usage of compute resources (CPU and RAM). This is one of the reasons why a DSS can run in a hyper-converged manner, unlike old-fashioned SDS solutions;
– Finally, the usability and functionality of a good distributed storage system are qualitatively different from those of generation 1 SDS. To give an analogy: SDS 1.0 has the usability of a push-button mobile phone, while a DSS has the usability of a modern touch-screen smartphone.

A distributed storage system can provide any of the 3 types of storage: block, file, and object. In the case of block-level storage, “distributed data storage” typically refers to one storage system in a tight geographical area, usually located in one data center, since performance demands are very high. It is impossible to build a distributed storage system that delivers high performance over long distances, simply because the laws of physics do not allow it: it takes too much time to synchronize a system that is spread over 3 continents.
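
The physical limit can be put in numbers. The sketch below assumes that light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and ignores switching and processing delays, so real latencies are higher. The distances are illustrative, not measurements.

```python
FIBER_KM_PER_MS = 200.0  # assumed signal speed in fiber: ~200,000 km/s


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the round-trip time over a fiber link of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_sync_writes_per_s(distance_km: float) -> float:
    """Upper bound on writes per second if each one waits for a remote ack."""
    return 1000.0 / min_round_trip_ms(distance_km)


# Illustrative distances: inside one data center vs. between continents.
for label, km in [("same data center", 0.1), ("intercontinental", 6000.0)]:
    print(f"{label}: >= {min_round_trip_ms(km):.3f} ms per round trip, "
          f"<= {max_serial_sync_writes_per_s(km):,.0f} serial synchronous writes/s")
```

At 6,000 km the round trip alone takes at least 60 ms, which caps a single stream of synchronously replicated writes at under 17 per second, far below what block storage workloads expect.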

Object storage systems can be in one location or spread over several, and here a geographically distributed storage system can work, as the performance requirements are not as high as for block-level storage. File storage falls in between, depending on the workload the user of the system is running.

Why is the distributed storage system becoming so important?

The main reason is that the current approach to storage does not work anymore: it is not flexible enough, not fast enough, or prohibitively expensive, and in many cases all three at the same time. By design, a distributed storage system addresses all of these issues at once.

Flexibility

Distributed storage systems use standard servers, which are now powerful enough (in CPU, RAM and network connectivity) to let storage become a software application just like databases, operating systems, virtualization, and all other applications. Storage no longer requires a specialized box that handles only the storage function. Allowing a standard server to run storage alongside other applications is a major breakthrough: it simplifies the IT stack and creates a single building block for the datacenter, just servers connected to a “flat” network. No more separate storage boxes. This allows scaling by adding more servers, thus increasing capacity and performance linearly. It also means servers can double as storage and compute nodes (converged/hyper-converged infrastructure), while still leaving the option to keep compute and storage on separate nodes.

Speed

If you look into a specialized storage array, you’ll find it is essentially a server: it has CPU, RAM, network interfaces and drives. However, this is a “locked” server which can only be used for storage. To have a fast storage system, you need a high-end storage box, which comes at a very high cost. Also, even today, adding more storage boxes to most storage systems does not increase the performance of the entire system, because all the traffic goes through the “head node” or master server, which acts as a management node. It becomes a bottleneck.

In a distributed storage system every server has CPU, RAM, drives and network interfaces, and they all behave as one group. So any time you add a server you increase the total pool of resources and thus the speed of the entire system.
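
The difference between the two designs can be shown with a toy throughput model. It assumes every server contributes the same throughput and that scaling is perfectly linear, which real systems only approximate. The per-server and head-node figures are made up for illustration.

```python
def head_node_throughput(servers: int, per_server: float, head_limit: float) -> float:
    """All traffic passes through one head node, so it caps the total."""
    return min(servers * per_server, head_limit)


def scale_out_throughput(servers: int, per_server: float) -> float:
    """Clients talk to every server directly, so throughput adds up."""
    return servers * per_server


# Hypothetical figures in GB/s.
PER_SERVER = 2.0
HEAD_LIMIT = 5.0

for n in (1, 2, 4, 8):
    head = head_node_throughput(n, PER_SERVER, HEAD_LIMIT)
    flat = scale_out_throughput(n, PER_SERVER)
    print(f"{n} servers: head-node design {head:.1f} GB/s, scale-out design {flat:.1f} GB/s")
```

In this model the head-node design stops improving once the head node is saturated, while the scale-out design keeps growing with every server added.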

Cost

Let’s get to the bottom line: with distributed storage, organizations can cut the cost of their infrastructure by up to 90%. This is because distributed storage is no longer about storage only; it has a positive impact throughout the IT stack. It uses standard servers, drives, and networks, which are less expensive. It converges storage and compute, thus increasing the utilization of these standard servers. Consequently, less power, cooling and space are required in the data center. A distributed storage system is also simpler to manage, which means fewer staff are required to run the IT infrastructure.

For example, Matt Ayres, CEO of service provider ToggleBox, explains that his company reached higher performance and decreased the total cost of ownership (TCO) after they turned to a distributed storage system.

Most companies that manage their own infrastructure are expected to be running their businesses on a distributed storage system in less than 3 years in order to stay competitive.

Distributed storage has already proven its value, yet there are companies that are hesitant even to evaluate it. This comes as a surprise, as the rule of thumb is that for every $1 spent on servers, companies spend $5 on storage, which makes storage the single most expensive piece in the datacenter. Slashing the cost of storage by up to 90% has a game-changing effect on the Total Cost of Infrastructure.
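
Taking the figures above at face value, the effect on the total can be worked out directly. The sketch assumes server spending stays unchanged and that the full 90% storage saving is achieved, which is the best case quoted here. The function name is illustrative.

```python
def total_cost_reduction(server_cost: float, storage_cost: float,
                         storage_cut_percent: float) -> float:
    """Fraction by which total spend drops when only storage gets cheaper."""
    before = server_cost + storage_cost
    after = server_cost + storage_cost * (100 - storage_cut_percent) / 100
    return (before - after) / before


# Rule of thumb from the text: $1 on servers for every $5 on storage.
reduction = total_cost_reduction(server_cost=1.0, storage_cost=5.0,
                                 storage_cut_percent=90)
print(f"Total infrastructure cost falls by {reduction:.0%}")
```

With these inputs, spending drops from $6.00 to $1.50 for every dollar spent on servers, a 75% reduction in the combined server and storage bill.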

But what are late adopters going to do in a couple of years, when their competitors have already streamlined their IT infrastructure? Will they be able to catch up, or will they go out of business? Time will tell, but in technology as in life, the ones who embrace change and adapt are usually the ones who progress the fastest and survive.

Over the last decades, storage has advanced steadily thanks to visionaries who have come up with ideas such as the distributed storage system. We should keep an eye on what is going on in the industry today in order to be prepared for what comes tomorrow. Because, as Robin Harris from StorageMojo puts it, storage is the “fundamental enabler of civilization”. “Writing (the first form of storage) enabled civilization. Digital storage enables digital civilization. Storage is worth doing well,” Harris concludes.

If you have any questions feel free to contact us at info@storpool.slm.dev
