Compression, entropy, and information theory have a lot in common with economics: nobody turns their head unless you stop talking maths. I understand these topics can seem boring and complex to your grandmother, but surprisingly, most technical people don't bother to understand the underlying concepts either. In this post, I'm going to walk through the following topics in everyday language to sketch the overall picture in your mind:
- What is image compression and why do we use it?
- A brief overview of JPEG compression.
- A review of Google Maps and Live Search Maps, which serve images as their primary content.
Introduction to Image Compression
Even if you think you know what "compression" means, we are not done with the definition yet. Hold on.
Image compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. (Digital Image Processing 3rd Ed., Gonzalez & Woods, page 525)
Let's first try to understand how compression became one of the most commercially successful fields in image processing. With the irrepressible popularity of television and the Internet (after the mid-90s), images and videos became significant carriers of information. Without compression, a standard colour TV broadcast of 640x480 pixels at a refresh rate of 30 frames per second requires 27,648,000 bytes to be transmitted per second. Even with tomorrow's technology, supplying a connection of almost 30 Mbytes/second just for a TV broadcast doesn't seem feasible. It's no surprise that many now-infamous quotes from the early 1900s predicted television would never find a place on the market.
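The arithmetic behind that figure is easy to check. Here is a quick sketch, using the frame size, 24-bit colour depth, and refresh rate quoted above:

```python
# Uncompressed bandwidth of a 640x480, 24-bit colour, 30 fps broadcast.
width, height = 640, 480
bytes_per_pixel = 3          # 24-bit colour: one byte each for R, G, B
frames_per_second = 30

bytes_per_second = width * height * bytes_per_pixel * frames_per_second
print(bytes_per_second)                      # 27648000 bytes/s
print(round(bytes_per_second / 2**20, 1))    # ~26.4 MB/s
```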
Data versus Information
When we talk about compression, we are talking about the compression of data. Data is transferred to carry information; therefore, we may be able to reduce the amount of data used to represent a given quantity of information. Imagine a parrot in a crowded downtown barber shop that loves to say "Hello" to every new customer who comes in. How would you transmit the words it speaks, as text, most efficiently?
Statistically, "hello" is the most common word, so representing it with a single bit is perfectly acceptable, instead of transferring 5 characters (5 * 8 = 40 bits). In the example above, statistics tell us that "stranger" is the second word we are most likely to hear from the parrot, and so on. Mapping the string array onto a bit stream saves 94% of the bandwidth in this case. Huffman coding, which the method above should remind you of, guarantees the minimum possible number of bits if you have statistical information about the data.
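To make this concrete, here is a minimal Huffman coder in Python. The word frequencies are made up for illustration (the example only tells us "hello" dominates), so the exact saving will differ from the figure above:

```python
# Minimal Huffman coder for the parrot example.
# The frequencies below are invented for illustration.
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a prefix code from symbol frequencies."""
    # Each heap entry: [weight, unique tiebreaker, {symbol: code-so-far}]
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # lightest subtree gets prefix "0"
        hi = heapq.heappop(heap)   # next lightest gets prefix "1"
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

words = Counter({"hello": 90, "stranger": 6, "welcome": 3, "bye": 1})
codes = huffman_codes(words)
print(codes["hello"])  # "1" -- the most frequent word gets a 1-bit code

# Compare against plain 8-bits-per-character text for these 100 words:
plain_bits = sum(len(w) * 8 * n for w, n in words.items())
huff_bits = sum(len(codes[w]) * n for w, n in words.items())
print(round(100 * (1 - huff_bits / plain_bits)))  # percent of bandwidth saved
```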
Image Compression Techniques
Generally, image compression techniques fall into two categories: lossy and lossless. Lossy methods take advantage of the limits of human vision, eliminating detail and losing information to reduce the amount of data; they are mostly used for natural images. In lossless compression, the encoding process finds a smart way to represent the same amount of information with less data, just like in the example above with Mr. Parrot. Some formats use hybrid models to combine the advantages of both.
JPEG is a hybrid compression method that can optionally operate in a totally lossless mode. It mainly relies on the fact that the correlation between nearby pixels doesn't change much in natural photos: if we know the pixel in the middle, we can make a good guess about its neighbours. JPEG's encoding algorithm is separated into four steps:
- Blocking: the image is split into little pieces, usually 8x8 pixel blocks (some variants use 16x16). Separating the image into little pieces helps us guess the nearby pixels.
- Discrete Cosine Transform (DCT): In "simple" terms, this process generates, for each block, a stream that starts with the average value of the block and continues with the details of how the pixels change horizontally and vertically. So, in addition to the guess, we have extra information to correct it. The DCT itself is totally lossless; it only represents the same information in other words. It outputs a stream of 64 numerical elements like: 125 34 -32 43 2 21 -43 1 -3 4... (a smart way to represent information)
- Quantization: Adjusts precision. This is where we lose information. The output stream may turn into: 125 35 -30 45 0 20 -45 -5 5... (human vision is not that sensitive)
- Entropy encoding: The stream coming from quantization can be encoded with Huffman coding to save bandwidth, just like in the parrot example. (a smart way to represent information)
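Steps two and three can be sketched in a few lines of Python. This is a naive, unoptimized 8x8 DCT-II with a single uniform quantizer standing in for JPEG's full 8x8 quantization table; real encoders also zig-zag scan and run-length encode the result before entropy coding:

```python
# Sketch of JPEG's per-block pipeline (DCT -> quantization) for one
# 8x8 grayscale block. The quantization step size is illustrative.
import math

N = 8

def dct_2d(block):
    """Forward 8x8 DCT-II as used in JPEG (pixels are level-shifted by -128)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y] - 128) * \
                         math.cos((2 * x + 1) * u * math.pi / 16) * \
                         math.cos((2 * y + 1) * v * math.pi / 16)
            out[u][v] = 0.25 * cu * cv * s
    return out

def quantize(coeffs, q=16):
    """Uniform quantization; real JPEG uses a full 8x8 table of step sizes."""
    return [[round(c / q) for c in row] for row in coeffs]

# A flat block (every pixel 130): all the energy lands in the DC term.
flat = [[130] * N for _ in range(N)]
coeffs = quantize(dct_2d(flat))
ac = sum(abs(coeffs[u][v]) for u in range(N) for v in range(N) if (u, v) != (0, 0))
print(coeffs[0][0])  # 1: the quantized DC term carries the block average
print(ac)            # 0: no detail, so every AC coefficient vanishes
```

A block with no detail compresses down to a single nonzero number; that is exactly why natural images, with their smooth regions, shrink so well.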
Maps and Compression
Maps serve images as their primary content. Most providers offer road, aerial, labelled road, labelled aerial, terrain (Google does), bird's eye (Live Search Maps does) and so on. In this section, I'm going to focus mostly on aerial imagery and how many bytes each provider spends to share the same amount of information. Both providers offer road, aerial and labelled aerial image sources.
Google Maps chooses to separate vector and natural images: it encodes vectors as transparent PNGs and aerial images as JPEGs, and adds the vector information as an overlay if the user prefers to view the area with labels.
Image sizes, in the corresponding order, are: 27.6KB (JPEG), 13.7KB (PNG), 0KB (the overlaid combination of the 1st and 2nd images), 25.6KB (PNG). In this case, let's count the bytes Google consumes for the following actions:
- Road imagery views: 25.6KB
- A switch to labelled aerial: 27.6KB + 13.7KB = 41.3KB
- A switch to aerial without labels: cost free
- Total cost = 25.6KB + 41.3KB = 66.9KB
Google seems to base its encoding decision only on the image's characteristics: JPEG for aerial, PNG for the rest. Unfortunately, this leads to the road information being transmitted twice when switching to labelled aerial. Take a look at images 2 and 4 above.
I'm guessing most people don't switch to aerial view until they zoom in to street level. So, instead of pushing them to download two separate images in road mode, Google Maps may have chosen this method for that reason. HTTP requests are costly, and a dozen of them are more costly still. A map control usually has to download 8-12 tiles depending on your screen resolution; just think of the overhead those 12 tiles generate.
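To put a rough number on that overhead, assume each HTTP request carries about half a kilobyte of header traffic; both that figure and the 12-tile viewport are illustrative, not measurements:

```python
# Rough per-viewport request overhead of overlay tiles vs. baked-in labels.
# HEADER_KB is an assumed average cost per HTTP request, not a measurement.
TILES = 12
HEADER_KB = 0.5

overlay_requests = TILES * 2   # aerial JPEG + label PNG per tile (Google's approach)
baked_in_requests = TILES      # one labelled JPEG per tile (LSM's approach)

print(overlay_requests * HEADER_KB)   # 12.0 KB of header overhead per viewport
print(baked_in_requests * HEADER_KB)  # 6.0 KB
```

The overlay approach pays twice the request overhead whenever labels are shown, which eats into its byte savings on the tiles themselves.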
We are in the same area, somewhere in downtown London again, this time with Live Search Maps.
Image sizes, in the corresponding order, are: 28.52KB (JPEG), 29.82KB (JPEG), 25.6KB (PNG). Repeating the same user actions above for a tile:
- Road imagery views: 25.6KB
- A switch to labelled aerial: 29.82KB
- A switch to aerial without labels: 28.52KB
- Total cost = 25.6KB + 29.82KB + 28.52KB = 83.94KB
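Putting the two strategies side by side with the tile sizes measured above, for the same user flow (road view, then labelled aerial, then plain aerial):

```python
# Per-tile cost comparison using the measured sizes (KB) from above.
google = {"road": 25.6, "aerial_jpeg": 27.6, "label_overlay_png": 13.7}
lsm = {"road": 25.6, "labelled_aerial": 29.82, "plain_aerial": 28.52}

# Google: the switch back to plain aerial is free (the JPEG is already cached,
# only the PNG overlay is dropped), so only three downloads happen.
google_total = google["road"] + google["aerial_jpeg"] + google["label_overlay_png"]
# LSM: every mode is a distinct image, so all three are downloaded.
lsm_total = sum(lsm.values())

print(round(google_total, 2))  # 66.9
print(round(lsm_total, 2))     # 83.94
```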
If every user switched between these map modes back and forth at every zoom level, Google Maps would beat LSM by a small margin*. LSM's labels on aerial images don't look as sharp as Google's, but they keep the size at a reasonable level even though the tile is encoded as a JPEG. If users switch to non-labelled aerial only rarely, for example after pinpointing the area they are searching for, this method might be practical and more efficient than Google's overlay concept.
Above are 200% zoomed versions of labelled aerial tiles from San Francisco. The left image comes from Virtual Earth and the other is Google's. Personally, despite being 22 years old and at the top of the heap when it comes to visual acuity, I find Google's labels more readable than Virtual Earth's.
(*) Note: I repeated the process for 100+ random tiles; the ratios don't change much.
What about custom tile integration with the Virtual Earth and Google Maps APIs? Google's method allows you to overlay roads on your custom maps (I didn't check it, but it shouldn't be a big deal -- they already have the labels isolated).