The capacity figure at the top of a storage quote is one of the most quietly misleading numbers in enterprise IT. It is almost never the amount of data you will actually be able to store. It is a marketing figure, built on an assumption about your data that the vendor has every reason to pitch high. Once you know how that number is assembled, you can read any storage quote honestly, and you can compare two of them on the same terms rather than on the vendor’s.
Three capacities, not one
Every array has three different capacity numbers, and quotes slide between them without telling you which one they are quoting.
Raw
Raw is the physical total. Add up every drive at its rated size and that is your raw capacity. It is the number you almost never actually get to use, because protection and overhead come off the top before a single byte of your data lands.
Usable
Usable is what is left after data protection, spare capacity and system overhead. The protection scheme matters a great deal here. A double parity or erasure coded layout, hot spares and formatting overhead can take a meaningful slice off the raw figure before you store anything. Usable is the honest floor, the capacity you are guaranteed regardless of what your data looks like.
Effective
Effective is usable multiplied by an assumed data reduction ratio, the combined effect of deduplication and compression. This is the number the quote leads with, because it is the biggest. The vendor takes your usable capacity, applies an assumption such as four to one, and prints the result in large type at the top of the page.
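The chain from raw to usable to effective is simple arithmetic, and it helps to see it written out. The sketch below is illustrative only: the function names, the 25 percent overhead and the 100 TB raw figure are assumptions made for the example, not numbers from any vendor. The real overhead depends on the protection scheme, the spares and the formatting of the array in question.

```python
def usable_tb(raw_tb: float, overhead_fraction: float) -> float:
    """Capacity left once protection, spares and system overhead come off raw."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return raw_tb * (1.0 - overhead_fraction)


def effective_tb(usable: float, reduction_ratio: float) -> float:
    """Headline capacity: usable multiplied by the assumed reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return usable * reduction_ratio


# Hypothetical figures, for illustration only.
raw = 100.0                                            # TB of physical drives
usable = usable_tb(raw, overhead_fraction=0.25)        # assumed 25% overhead
effective = effective_tb(usable, reduction_ratio=4.0)  # the quote's assumed 4:1

print(f"raw {raw:.0f} TB, usable {usable:.0f} TB, effective {effective:.0f} TB")
```

On these assumed inputs, 100 TB of raw drives becomes 75 TB usable and is quoted as 300 TB effective. Only the middle figure is independent of what your data looks like.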
The assumption is where the risk hides
The problem is not that data reduction is fake. On the right data it is very real. The problem is that your reduction ratio depends entirely on your data, and the assumed ratio on the quote is an average drawn from the workloads that reduce best.
Some data barely reduces at all. Anything already compressed or encrypted, such as images, video and databases using transparent encryption, will yield close to one to one. Other workloads reduce beautifully. Virtual desktops, where thousands of near identical images sit side by side, can comfortably exceed the assumed ratio, and so can verbose logs.
So a quote built on a four to one assumption, applied to a workload that genuinely reduces at two to one, delivers half the capacity you were shown. You did not buy what you thought you bought, and you find out months later when the array fills early and the top up quote arrives.
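Most estates hold a mix of data, so the ratio that matters is the blended one. A minimal sketch of how you might estimate it follows, assuming you can put a rough logical size and an expected ratio against each workload. The workload names, sizes and ratios here are hypothetical, not measurements, and the helper functions are invented for the example.

```python
def blended_ratio(mix: dict[str, tuple[float, float]]) -> float:
    """Overall reduction ratio for a mix of workloads.

    mix maps a workload name to (logical_tb, expected_ratio). The blend is
    total logical data divided by the physical space it occupies after
    reduction, so data that barely reduces drags the result down more
    than a simple average of the ratios would suggest.
    """
    logical = sum(tb for tb, _ in mix.values())
    physical = sum(tb / ratio for tb, ratio in mix.values())
    return logical / physical


def delivered_fraction(real_ratio: float, assumed_ratio: float) -> float:
    """Share of the quoted effective capacity you actually receive."""
    return real_ratio / assumed_ratio


# Hypothetical workload mix; the ratios are illustrative, not measurements.
mix = {
    "virtual desktops": (40.0, 6.0),
    "encrypted databases": (40.0, 1.0),
    "verbose logs": (20.0, 4.0),
}

real = blended_ratio(mix)
print(f"blended ratio {real:.2f} to one")
print(f"share of a four to one quote delivered: {delivered_fraction(real, 4.0):.0%}")
```

With this made-up mix the blend comes out a little under two to one, even though two of the three workloads reduce at four to one or better. The encrypted databases take up most of the physical space, so they dominate the result.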
The guarantee programmes, and their fine print
Most vendors offer a data reduction guarantee, and on the surface it sounds like it removes the risk. Read the fine print before you lean on it.
The exclusions are the point. Pre-compressed and encrypted data is almost always excluded from the guaranteed ratio, which carves out exactly the data most likely to underperform. You typically have to enable every reduction feature, even ones you might not want for performance reasons. The remedy, when the array misses the ratio, is usually additional drives shipped to you rather than money returned, so the vendor protects its story while you absorb the rack space and power. And claiming is rarely automatic: it needs a support case and a validation exercise. The guarantee is built to defend the headline number, not your budget.
The questions that pin it down
You do not need to be an engineer to read a storage quote honestly. You need five questions, asked plainly, with the answers in writing.
- What is the raw capacity, and what is the usable capacity after data protection, spares and overhead?
- What data reduction ratio is assumed in the effective figure on this quote?
- Is that ratio guaranteed, and precisely what data is excluded from the guarantee?
- What reduction have you actually measured on workloads like mine?
- If my real ratio comes in lower, what does it cost to add capacity later, and at what unit price?
The last question is the one that matters most commercially, because it exposes whether the cheap headline number is propped up by expensive top ups later. A low entry price with premium priced capacity on demand behind it is not a cheap deal. It is an expensive one with the bill deferred.
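To put a figure on that last question, you can estimate what it costs to reach the capacity the quote promised once the real ratio is known. The sketch below assumes top up capacity is priced per usable terabyte and reduces at the same real ratio as the rest of the array. The function name and every number in it are hypothetical.

```python
def cost_to_reach_target(
    entry_price: float,
    usable_tb: float,
    real_ratio: float,
    target_effective_tb: float,
    topup_price_per_usable_tb: float,
) -> float:
    """Entry price plus the top up needed to store target_effective_tb."""
    delivered = usable_tb * real_ratio
    shortfall = max(0.0, target_effective_tb - delivered)
    extra_usable = shortfall / real_ratio  # usable TB still to be bought
    return entry_price + extra_usable * topup_price_per_usable_tb


# Hypothetical quote: 75 TB usable, sold as 300 TB effective at four to one.
as_quoted = cost_to_reach_target(200_000, 75.0, 4.0, 300.0, 4_000)
in_practice = cost_to_reach_target(200_000, 75.0, 2.0, 300.0, 4_000)

print(f"cost if the assumed ratio holds: {as_quoted:,.0f}")
print(f"cost if the real ratio is two to one: {in_practice:,.0f}")
```

In this example the entry price of 200,000 becomes 500,000 once the top up is included, because the second tranche of usable capacity is priced higher than the first.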
Comparing two quotes fairly
This is where it pays off. Two arrays quoted at the same effective capacity can be wildly different deals. Normalise both back to usable capacity, apply a conservative reduction ratio based on your real data rather than the vendor’s assumption, and only then compare the price per genuinely usable terabyte. Treat the effective figure as the marketing line it is, and do your sums on the floor, not the ceiling.
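The normalisation can be sketched in a few lines. Everything here is hypothetical: the `Quote` class, the two quotes and the conservative ratio of two to one are invented to show the method, and you would substitute the figures from your own quotes and your own measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    name: str
    price: float
    usable_tb: float
    assumed_ratio: float  # the vendor's assumption behind the headline

    @property
    def quoted_effective_tb(self) -> float:
        return self.usable_tb * self.assumed_ratio

    def price_per_quoted_tb(self) -> float:
        """Price per terabyte of the headline (effective) figure."""
        return self.price / self.quoted_effective_tb

    def price_per_realistic_tb(self, conservative_ratio: float) -> float:
        """Price per terabyte using your own ratio instead of the vendor's."""
        return self.price / (self.usable_tb * conservative_ratio)


# Two hypothetical quotes, both headlined at 400 TB effective.
quotes = [
    Quote("A", price=300_000, usable_tb=100.0, assumed_ratio=4.0),
    Quote("B", price=280_000, usable_tb=80.0, assumed_ratio=5.0),
]

for q in quotes:
    print(
        f"{q.name}: headline {q.quoted_effective_tb:.0f} TB, "
        f"{q.price_per_quoted_tb():,.0f} per quoted TB, "
        f"{q.price_per_realistic_tb(2.0):,.0f} per TB at two to one"
    )
```

In this example quote B is cheaper per quoted terabyte (700 against 750) but dearer once both are held to the same two to one ratio (1,750 against 1,500). The headline comparison and the floor comparison point in opposite directions.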
None of this means data reduction is not worth having, or that the vendors are acting in bad faith. It means the quote is written to flatter the product, and the work of translating it back into capacity you can actually rely on falls to you. Do that translation before you sign, not after the array fills up.
If you want the wider commercial picture, how storage deals are constructed and where else the margin sits, our guide on buying enterprise storage covers it. And if you are weighing the underlying technology and what data reduction realistically delivers on modern flash, the all flash and NVMe economics guide goes deeper on the numbers.
Send us a storage quote and we will read it back to you honestly: raw, usable and effective, with the questions we would ask the vendor before you commit.