Posts by Drew Thaler

Let's talk about one of the fundamental things that you need to know when talking about data access.

Always, always, always know your units.

Binary vs Decimal

Computers have a natural affinity for binary units, or multiples of 1024.

Historically, people (like us!) who work with computers have borrowed the SI prefixes for powers of 1000 (k-, M-, G-, etc.) and abused them to mean the nearest power of 1024. This tradition continues right up to the present day: "64 KB" in a computing context is quite naturally interpreted as 64 * 1024 = 65536 bytes.

However, in many contexts -- including data storage and bandwidth -- decimal units may be used. This is not the manufacturers trying to "trick" you, but rather it's the normal and unavoidable friction that results from the proper use of SI units as decimal values (in physics, etc) running up against the computer industry's abuse of them as binary values.

You need to be particularly careful when reading. If you accidentally read a decimal unit as a binary unit, you'll be in for a surprise when you've got less than you expected. This only gets worse as the sizes increase.

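SI prefix | Pronounced | Meaning               | ... But Sometimes     | Difference
--------- | ---------- | --------------------- | --------------------- | ----------
k (K)     | kilo       | 1,000                 | 1,024                 | 2.4%
M         | mega       | 1,000,000             | 1,048,576             | 4.86%
G         | giga       | 1,000,000,000         | 1,073,741,824         | 7.37%
T         | tera       | 1,000,000,000,000     | 1,099,511,627,776     | 9.95%
P         | peta       | 1,000,000,000,000,000 | 1,125,899,906,842,624 | 12.6%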
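
If you'd rather check the "Difference" column than take my word for it, here is a minimal sketch in Python. The function name `binary_excess_percent` is made up for this example; the arithmetic is just 1024^n divided by 1000^n.

```python
# Compare each SI prefix (a power of 1000) with its binary look-alike (a power of 1024).
PREFIXES = ["k", "M", "G", "T", "P"]

def binary_excess_percent(power: int) -> float:
    """Percent by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100

for power, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix}: {binary_excess_percent(power):.2f}%")
```

The gap compounds with every step up the prefix ladder, which is why a misread terabyte hurts a lot more than a misread kilobyte.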

Since 1999, the proper, IEC-approved way to write multiples of 1024 has been with the Ki- prefix and its siblings Mi-, Gi-, and so on. These are deliberately unambiguous and clearly denote that you want a multiple of 1024 rather than a multiple of 1000.

Binary prefix | Unambiguous Meaning
------------- | -------------------
Ki            | 1024^1
Mi            | 1024^2
Gi            | 1024^3
Ti            | 1024^4
Pi            | 1024^5
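
To show how the two families of prefixes can live side by side in code, here is a rough sketch of a size parser. Everything in it (`parse_bytes`, the two lookup tables, the regular expression) is hypothetical and written for this post. It also makes one loud assumption: a plain SI prefix is always read as decimal, which is the strict reading and not necessarily what the person who wrote the string meant.

```python
import re

# ASSUMPTION: plain SI prefixes are read strictly as powers of 1000.
DECIMAL = {"k": 1000**1, "K": 1000**1, "M": 1000**2,
           "G": 1000**3, "T": 1000**4, "P": 1000**5}
# Binary prefixes are unambiguous powers of 1024.
BINARY = {"Ki": 1024**1, "Mi": 1024**2, "Gi": 1024**3,
          "Ti": 1024**4, "Pi": 1024**5}
MULTIPLIERS = {**DECIMAL, **BINARY}

# A number, optional whitespace, an optional prefix, then "B" for bytes.
_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(Ki|Mi|Gi|Ti|Pi|k|K|M|G|T|P)?B\s*$")

def parse_bytes(text: str) -> int:
    """Convert a string such as '64 KiB' or '64 KB' to a byte count."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, prefix = match.groups()
    multiplier = MULTIPLIERS[prefix] if prefix else 1
    return round(float(number) * multiplier)

print(parse_bytes("64 KiB"))  # 65536
print(parse_bytes("64 KB"))   # 64000
```

The 1,536-byte gap between those two results is exactly the ambiguity the Ki- prefix removes.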

The NIST has an excellent page with more details. Go, read it! It's quick and to the point.

Unfortunately, the original SI prefixes will remain ambiguous until everybody stops abusing them to mean different things. It'll be a tough habit for us to break, though, especially now that it's so culturally ingrained. Still, you and I can do our part.

Resolving ambiguity when writing

I can't emphasize this enough: If you're writing and you need to use a value that uses a binary multiple, YOU SHOULD USE THE BINARY PREFIXES. It's not that hard: write "64 KiB" instead of "64 KB", and boom, you're done. All you need to do is practice it a bit and it'll become second nature.

Ah, but what if you're writing and you need to use decimal units? Unfortunately, if I really want to communicate that something can transfer 9 million bytes per second, that's a bit harder to write. I can't just say "9 MB/s" because you may assume I meant binary units but was just too dumb to write "9 MiB/s".

In those cases, I've found that the simplest way to disambiguate is to write both the decimal and binary values: "The speed is 9 MB/s = 8.58 MiB/s".
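
If you do this often, it's easy to automate. Below is a small sketch of a helper that takes a decimal quantity and prints it in both forms. The name `dual` and the `POWERS` table are invented for this example, and the two-decimal rounding is just my preference.

```python
# Exponent shared by each SI prefix and its binary sibling.
POWERS = {"k": 1, "M": 2, "G": 3, "T": 4, "P": 5}

def dual(value: float, prefix: str, unit: str) -> str:
    """Render a decimal quantity alongside its binary equivalent."""
    power = POWERS[prefix]
    # Rescale from powers of 1000 to powers of 1024.
    binary_value = value * 1000**power / 1024**power
    binary_prefix = prefix.upper() + "i"  # k -> Ki, M -> Mi, ...
    return f"{value:g} {prefix}{unit} = {binary_value:.2f} {binary_prefix}{unit}"

print(dual(9, "M", "B/s"))  # 9 MB/s = 8.58 MiB/s
print(dual(9.4, "G", "B"))  # 9.4 GB = 8.75 GiB
```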

Resolving ambiguity when speaking

The IEC recommendation is to use special pronunciations with "-bi-" in the middle, as in: "kibibyte", "gibibyte", "mebibyte", etc.

Frankly, this suggestion is crap.

Nobody I know actually does this, and you're likely to confuse people and sound like a dork if you try. Go ahead, try it — say it out loud. My favorite ludicrous example is "gibibyte". Tongue twister, isn't it? Say it three times fast and you're well on your way to rubber baby buggy bumpers.

Instead, I prefer to add an explicit qualifier and say "binary megabyte" or "decimal megabyte". This is perfectly clear to every computer programmer I've ever spoken to, even those who aren't aware of the binary/decimal confusion problem. It works beautifully.

In spoken contexts (and only then) I'll happily shorten this to "megabyte" if I feel the context is clear — just as you might just say "John" if there's only one in the room, but "John Smith" or "John Thompson" if there is more than one John nearby.

Resolving ambiguity when reading or listening

This one ultimately depends on context.

Any discussion of computer memory or RAM will typically use binary units. If someone says "allocate 64K", it's probably safe to assume they mean 65536 bytes.

Discussions of disk capacities or data bandwidths are normally written with decimal units. I like to imagine that this is because cramming bits onto a disk or through a wire is a task laden with physics, so the units are naturally the standard SI units.

In the middle, you'll find a lot of confusion when you talk about transferring from disk (or network) to RAM, or vice versa. If a computer programmer is talking about how fast they can fill memory, they are probably thinking in binary units. If a filesystem or network person is talking about how fast an interface or device performs, they are probably thinking in decimal units. Be careful to use the right one!
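
Here is a quick sketch of what happens when the two worlds meet: filling 8 GiB of RAM over a 1 Gb/s link. It assumes an ideal link with no protocol overhead, and `fill_seconds` is a name I made up for the example.

```python
GIB = 1024**3   # binary gigabyte, in bytes (how RAM is sized)
GBIT = 1000**3  # decimal gigabit, in bits (how links are rated)

def fill_seconds(ram_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to move ram_bytes over the link, ignoring all overhead."""
    return ram_bytes * 8 / link_bits_per_second

# Correct: binary RAM size against a decimal link speed.
correct = fill_seconds(8 * GIB, 1 * GBIT)
# Naive: pretending the RAM is 8 decimal gigabytes.
naive = fill_seconds(8 * 1000**3, 1 * GBIT)

print(f"correct: {correct:.2f} s")  # correct: 68.72 s
print(f"naive:   {naive:.2f} s")    # naive:   64.00 s
```

The naive estimate comes in about 7% short, which is the giga-level difference from the prefix table showing up in a real calculation.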

What                    | Interpretation | Rewritten Without Ambiguity
----------------------- | -------------- | ---------------------------------------
1 Gb/s Ethernet         | decimal        | 1 Gb/s (953 Mib/s) Ethernet
8 GB RAM                | binary         | 8 GiB RAM
2 TB hard drive         | decimal        | 2 TB (1.81 TiB) hard drive
200 MB file size        | binary         | 200 MiB file size
1.5 Mb/s cable modem    | decimal        | 1.5 Mb/s (1.43 Mib/s) cable modem
6.9 MB/s 5x DVD speed   | decimal        | 6.9 MB/s (6.6 MiB/s) 5x DVD speed
16.6 MB/s 12x DVD speed | decimal        | 16.6 MB/s (15.85 MiB/s) 12x DVD speed
9 MB/s 2x Blu-ray speed | decimal        | 9 MB/s (8.58 MiB/s) 2x Blu-ray speed
9.4 GB DVD capacity     | decimal        | 9.4 GB (8.75 GiB) DVD capacity
50 GB Blu-ray capacity  | decimal        | 50 GB (46.56 GiB) Blu-ray capacity

What you can do

Start using binary units when writing, and, if necessary, qualify with "binary" or "decimal" when speaking. It's easy, it's safe, and it puts you into the cool club of kids who speak without ambiguity.

If you're reading or talking to someone who doesn't use the binary units, make sure you know the context.

Watch out for translation errors.

And, of course, suggest that the other person pick up the habit of using the binary units too. :-)