How much storage do we have and who's using it?
Friday, January 6, 2017
This simple question is often asked within a typical enterprise, and the answer is more elusive than you may think. Why not just ask your storage administrators how much storage you have and who’s using it? For starters, they don’t know, and it’s not their job to know. A good SA knows the language of storage (LUNs, QTrees, FA ports, SAN, NAS, RAID, HBA), not how the business is structured or the nuances of cost accounting. They know how to configure, allocate, replicate and migrate storage. They sometimes know what type of storage a particular group of servers may need: database servers and web servers may require different tiers of storage. Beyond that, there is very little awareness of how storage correlates to the business.
Let’s start by understanding the genesis of this 2-part question: “How much storage do we have?” and, “Who’s using it?”
Who asks this question?
Typically, folks within the company that are in charge of managing the budget and spending the money. This question usually comes up when a request comes in for yet another purchase of yet another really expensive storage array from a favorite storage vendor.
Why do they ask this?
Because storage is expensive. But wait, I thought storage was around 3 cents per GB? That is the cost of the raw capacity of the individual hard disk drive, not the cost of usable capacity.
How expensive is expensive?
According to Forrester Research, $3 to $5 per usable GB. However, when you store 100 TB of data internally, you have to buy much more than that amount. For protection, firms typically hold three copies of their data: primary, local replica, and remote or backup copy. This number can be higher or lower based on availability, regulatory and other requirements, but three is a reasonably conservative copy count. There is also the human cost of managing and protecting the storage; a good SA is not cheap, and it usually takes a team of them to manage a typical enterprise. The annual fully loaded cost of 1 PB of enterprise-class storage is somewhere around $8 million.
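To make the arithmetic concrete, here is a rough sketch of the hardware cost alone for the 100 TB example. It assumes the $3 to $5 per usable GB figure applies to every copy and that 1 TB is 1,000 GB; the function name and defaults are illustrative, and staffing cost is deliberately left out.

```python
GB_PER_TB = 1000  # assumption: decimal units, as capacity is usually quoted


def purchased_cost(primary_tb, copies=3, cost_per_gb=3.0):
    """Hardware cost of holding `copies` copies of `primary_tb` of data."""
    total_gb = primary_tb * copies * GB_PER_TB
    return total_gb * cost_per_gb


# 100 TB of data, three copies, at the low and high ends of the quoted range
low = purchased_cost(100, copies=3, cost_per_gb=3.0)
high = purchased_cost(100, copies=3, cost_per_gb=5.0)
print(f"100 TB of data, 3 copies: ${low:,.0f} to ${high:,.0f}")
```

Under those assumptions, 100 TB of data means buying 300 TB of usable capacity before a single administrator is paid.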
What is gained by knowing the answer?
Saving lots of money. Visibility and awareness are the first steps toward affecting behavior. Data is constantly growing and nobody wants to delete anything. Understanding the true cost of storage, how much the company has invested and who’s using it may change how it’s used.
Let’s dissect the question:
To answer this question, here’s what needs to go on behind the scenes. At a high level you have to figure out what you have for storage, what hosts are attached to that storage and which line of business owns those hosts.
Any storage management system needs three primary functions to fully understand storage usage: (1) data gathering, (2) persistence and (3) correlation. All three are difficult, but correlation is where most home-grown solutions fail.
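The correlation step can be sketched in a few lines once the data has been gathered. Everything below is hypothetical: the array, LUN and host names, the ownership table (imagine it coming from a CMDB), and the function name. The point is only to show the join from storage to host to line of business, and what happens when a host has no known owner.

```python
from collections import defaultdict

# Hypothetical gathered data: (array, lun, host, allocated_gb)
allocations = [
    ("array-01", "lun-001", "db-prod-1", 2000),
    ("array-01", "lun-002", "web-prod-1", 500),
    ("array-02", "lun-101", "db-prod-1", 1000),
    ("array-02", "lun-102", "build-7", 750),
]

# Hypothetical ownership records: host -> line of business
host_owner = {
    "db-prod-1": "Finance",
    "web-prod-1": "Marketing",
}


def allocated_by_business(allocations, host_owner):
    """Sum allocated GB per line of business; unowned hosts go to UNKNOWN."""
    totals = defaultdict(int)
    for _array, _lun, host, gb in allocations:
        totals[host_owner.get(host, "UNKNOWN")] += gb
    return dict(totals)


report = allocated_by_business(allocations, host_owner)
for business, gb in sorted(report.items()):
    print(f"{business}: {gb} GB")
```

In practice the UNKNOWN bucket is where the effort goes: every host that cannot be mapped to an owner is storage nobody is accountable for.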
In addition, detailed knowledge of a complex environment is critical to understanding usage and availability. What kind of physical storage do you have: flash, spinning disk, tape? What kind of usable storage do you have: block, file, object? What is creating the usable storage: traditional disk arrays from EMC, HDS, IBM, HP and NetApp, or JBOD with OpenStack or ATMOS, or is your environment using hyperconverged appliances from Nutanix, Tintri and Nimble?
Caveat: Storage pools and Thin Provisioning can make it seem like you have more storage than you actually do! So do you account for what you really have or what you think you have?
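A small sketch shows how far apart the two answers can be. The pool figures are made up, and the field names are illustrative rather than taken from any vendor's tooling.

```python
def pool_summary(physical_gb, provisioned_gb, written_gb):
    """Compare what a thin pool has promised with what it physically holds."""
    return {
        # How many GB have been promised per GB that physically exists
        "oversubscription": provisioned_gb / physical_gb,
        # Physical space still unwritten
        "physical_free_gb": physical_gb - written_gb,
        # Promised capacity with no physical disk behind it
        "unbacked_gb": max(0, provisioned_gb - physical_gb),
    }


# Hypothetical pool: 10 TB physical, 25 TB provisioned, 7 TB actually written
summary = pool_summary(10_000, 25_000, 7_000)
print(summary)
```

Here the hosts believe they have 25 TB, the pool really has 10 TB, and only 3 TB of that is still free.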
Enter: “What is used capacity”.
Used capacity. Used capacity can only be seen from the perspective of the host. The array knows what it has allocated to the host, but only the host knows what it has really used. Caveat: De-duplication. What a host thinks it is using may not be what it is really using, because the data may be de-duplicated downstream. So, if you save your favorite presentation and email it to a colleague, and they save the same presentation, and the copies are de-duplicated so that only one of them is actually stored, how do you know whether it is your copy or your colleague’s copy that is saved? Moreover, who should be charged for storing the presentation? The answer is: you don’t know, and it’s impossible to tell.
Caveat: Clustered Servers double count.
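One way this double counting can arise, assuming the cluster nodes share a LUN and each node reports it as its own: summing host-side reports counts the same storage once per node. The sketch below uses invented host and LUN names and counts each LUN once instead.

```python
# Hypothetical host-side reports: (host, lun_id, used_gb)
host_reports = [
    ("cluster-a-node1", "lun-500", 800),
    ("cluster-a-node2", "lun-500", 800),  # same shared LUN, reported again
    ("standalone-1", "lun-600", 300),
]


def naive_total(reports):
    """Add up every host's report, counting shared LUNs once per node."""
    return sum(gb for _host, _lun, gb in reports)


def deduplicated_total(reports):
    """Count each LUN once, keeping the largest usage reported for it."""
    used_by_lun = {}
    for _host, lun, gb in reports:
        used_by_lun[lun] = max(gb, used_by_lun.get(lun, 0))
    return sum(used_by_lun.values())


print(naive_total(host_reports), deduplicated_total(host_reports))
```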
Caveat: What is used by the database is not really what is used.
Enter: “Databases are full of air”
Auto-tiering. Many arrays have more than one disk type within their frame. Typically there are 3 types of storage: SSD (fast and expensive), FC (fast but less expensive), and SATA (not as fast but cheap). With auto-tiering, the array controller detects the read/write patterns and automatically moves the data between the expensive Tier 1 SSDs and the cheaper Tier 3 SATA drives. How do you know if your presentation is on cheap or expensive disk? The answer is: you don’t, and it’s impossible to tell.
The Two-headed Capacity Monster. As the density of disks increases, performance comes into play. Imagine there are 4 servers and each one needs 200GB. There was a time when you would need 4 physical disk drives to do this, with each one capable of providing the IO required by its server. Now, one physical drive has enough “capacity” to give each server 200GB, with enough left over to give many other servers 200GB. The problem is that a single spindle can’t handle the IO of multiple servers all trying to access it at the same time, so performance tanks. And now you have plenty of storage capacity but not enough IO capacity.
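The two heads can be put side by side with a quick calculation. The drive size and IOPS figures below are illustrative assumptions, not measurements of any particular disk.

```python
def servers_per_drive(drive_gb, drive_iops, server_gb, server_iops):
    """Return how many servers one drive supports by capacity, by IO, and overall."""
    by_capacity = drive_gb // server_gb
    by_iops = drive_iops // server_iops
    # The drive is full when either limit is reached, whichever comes first
    return by_capacity, by_iops, min(by_capacity, by_iops)


# Hypothetical 4 TB spindle delivering about 150 IOPS; each server needs
# 200 GB and about 100 IOPS
by_capacity, by_iops, usable = servers_per_drive(4000, 150, 200, 100)
print(f"by capacity: {by_capacity}, by IO: {by_iops}, usable: {usable}")
```

With these numbers the drive has room for 20 servers but IO for only one, which is why capacity reports alone can be misleading.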
Here are additional questions that we’ll address in future posts… stay tuned:
So, who uses the company’s storage and why does it matter?
What happens when you suddenly run out of capacity without any warning? Surprise! Another storage-related outage.
- What impact would a storage outage have on your business?
- Is there value in knowing what applications would be affected if a particular frame or storage pool suddenly fills up?
- Is there value in knowing who is consuming the most storage?
- Is there value in knowing what your usage rate is?
- Is there value in knowing exactly when you will run out?
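As a taste of those last two questions: once usage history has been gathered, a first estimate of the run-out date is simple arithmetic. The sketch below assumes linear growth between the first and last sample, which real environments rarely follow exactly, and the sample values are invented.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until a pool fills, from (day, used_gb) samples.

    Uses the average growth rate between the first and last sample.
    Returns None when usage is flat or shrinking.
    """
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (d1 - d0)  # GB per day
    if rate <= 0:
        return None
    return (capacity_gb - u1) / rate


# Hypothetical monthly samples for a 10 TB pool
samples = [(0, 6000), (30, 6600), (60, 7200)]
print(days_until_full(samples, capacity_gb=10_000))
```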
Are you now wondering if you have backups of your data?
Do you have backup software? Is it integrated with your storage reporting software?