Alright, so you’ve got your rasterized image, and now you want to save it to a file. You want a single file to contain all the data necessary for viewing this image, so that all a person needs is the single image file and an image viewer to look at your image. How do you save the data?

Grayscale

Grayscale images are a bit simpler than color images so we’ll start there.

Number of bytes to store the image

For the moment, let’s consider only the number of bytes that would be required for the pixels in a grayscale image, without thinking about the storage layout or the metadata. Recall that typically for a grayscale image, there’s one value for each pixel. Commonly, an 8 bit value (= 1 byte) is used for a value, allowing the pixel intensity value to range between 0 and 255. A rectangular image that’s W pixels wide and H pixels in height would have WxH total pixels. For example, if your image is 100 px wide & 200 px tall, there would be 100 x 200 = 20,000 pixels total. If each pixel can be represented by 1 byte, storing this image uncompressed would require 20,000 bytes.

Metadata

So you have your 20,000 bytes representing an image that’s 100px wide x 200px high image. Or was that 200px wide x 100px tall? Or 50px wide and 400px tall? How would an image viewer know how to display the 20,000 image bytes using only the image file?

The image file must contain data about the image, which is called metadata. The metadata typically contains information such as width, height, the component colors (aka channels or separations), and much more. This metadata is stored in the image file in a section that’s separate from the image data. Different containers such as JPG, GIF, TIFF, or PNG each have different ways of storing the metadata, and each container can store different metadata. When an image viewer detects a container, it must know where and how the metadata is stored in that container, and how to read the required metadata.

Image metadata is topic worthy of a series of articles. For an idea of the scope of the topic, the TIFF tags reference (metadata in TIFF can be stored as tags) describes hundreds of well-defined tags which describe everything from the basics like width and height all the way to niche information like the host computer used during image creation. And that is only for a single image container, TIFF!

For this article, it’s enough to know that there’s metadata in the image file that describes the image data.

Which byte represents a particular pixel?

So we’ve got our 20,000 bytes of data, and we know how wide and tall the image is. But which of the 20,000 bytes represents the top left pixel? In general, which byte represents the pixel at position (x, y)?

The answer to that question lies in the storage order of the pixels. Let’s assume that the pixels are stored in row-major order – in other words, all the pixels of a row N are stored before all the elements of row N + 1. Below is an example 2×2 image containing 4 pixels. They are labeled pixel(0,0) through pixel(1,1).

pixel(0,0)pixel(0,1)
pixel(1,0)pixel(1,1)

A file saved in row-major storage would contain the pixels in the order below. Note that all the elements of the 1st row appear before all the elements of the 2nd row:

pixel(0,0)pixel(0,1)pixel(1,0)pixel(1,1)

Pixels may also be stored in column-major order, but this is very unusual in the commonly used image layouts and I won’t describe that in this article.

So if we get the image’s height and width from metadata, and we assume that it’s in row-major order with the first data value holding the top left pixel, we can re-assemble this simple image using only the image file.

Color RGB images

Color images differ from grayscale images by containing effectively one image per color. Common color schemes are RGB (ideal for screens) or CMYK (ideal for print). We’ll now consider how to store RGB images.

Number of bytes to store an RGB image

Let’s consider only the number of bytes that would be required for the pixels in an RGB image, without thinking about the storage layout or the metadata.

A color pixel can be represented using a value for each of the component colors. In an RGB pixel, there is one value for each of the red, green, and blue colors. Commonly, 8 bit (=1 byte) values are used for each of the 3 colors. Thus, 3 bytes are used for each RGB color pixel.

A rectangular image that’s W pixels wide and H pixels in height would have WxH total pixels. For example, if your image is 100 px wide & 200 px tall, there would be 100 x 200 = 20,000 pixels total. Thus, for any given image dimensions, the number of pixels is the same regardless of whether the color scheme is grayscale, RGB, or some other color scheme like CMYK. However, while each pixel in a grayscale image commonly requires 1 byte, each pixel in an RGB image commonly requires 3 bytes. Therefore, storing this image uncompressed would require 3 * 100 * 200 = 60,000 bytes.

Which bytes represent a particular pixel?

In contrast to grayscale images where 1 byte represents each pixel value, an RGB pixel commonly requires 3 bytes. Given this requirements, which bytes in an image file’s image data represent a given pixel?

Chances are that you’ve already thought of at least one of the two common methods:

  1. Interleaved: Store all the bytes for pixel N before storing the bytes for pixel N + 1
  2. Planar: Store all the bytes for color C before storing the bytes for color C + 1

Interleaved

In the interleaved storage layout, the bytes for a pixel are contiguous. In the 2×2 image below, each pixel has RGB components.

pixel(0,0)pixel(0,1)
pixel(1,0)pixel(1,1)

This data would be stored in interleaved, row-major layout like in the diagram below. The top row indicates the 0-based index of each byte. For compactness, the RGB components of pixel(x, y) are shown as xy.R, xy.G, or xy.B, respectively.

01234567891011
00.R00.G00.B01.R01.G01.B10.R10.G10.B11.R11.G11.B

As you can see, the RGB components for pixel(0, 0) are contiguous. Furthermore, each pixel is still stored in row-major order. This is called interleaved because the data for the colors are mixed in a repeating pattern.

A notable property for this layout is that the bytes for the colors in the pixel at (x, y) can be found using simple formulas. Given that

* y = 0-based row index
* x = 0-based column index
* w = image width

We can find the red, green, and blue byte positions using the formulas below.

red   byte position for pixel(x, y) = y * w * 3 + x * 3 + 0
green byte position for pixel(x, y) = y * w * 3 + x * 3 + 1
blue byte position for pixel(x, y) = y * w * 3 + x * 3 + 2

Planar

In the planar storage layout, the bytes for a color, rather than a pixel, are contiguous. For an example we can look at our usual 2×2 image, repeated below.

pixel(0,0)pixel(0,1)
pixel(1,0)pixel(1,1)

This data would be stored in planar, row-major layout like in the diagram below. The top row indicates the 0-based index of each byte. For compactness, the RGB components of pixel(x, y) are shown as xy.R, xy.G, or xy.B, respectively.

01234567891011
00.R01.R10.R11.R00.G01.G10.G11.G00.B01.B10.B11.B

As you can see, all the red bytes are stored contiguously in the first 4 elements of the array, all the green bytes are stored contiguously in the next 4 elements, and all the blue bytes are stored contiguously in the last 4 elements. Finally, within each color, the bytes are stored in row-major order.

This layout is called planar because each color’s data is stored contiguously, like a planes of color data that can stacked to create the final image.

The image below shows the same data as above, with the addition of the top row, which indicates the 0-based index of each byte.

Like the interleaved layout, a notable property for this layout is that the bytes for the colors in the pixel at (x, y) can be found using simple formulas. Given that

* y = 0-based row index
* x = 0-based column index
* w = image width

We can find the red, green, and blue byte positions using the formulas below.

red   byte position for pixel(x, y) = 0 * w * h + y * w + x
green byte position for pixel(x, y) = 1 * w * h + y * w + x 
blue  byte position for pixel(x, y) = 2 * w * h + y * w + x 

When to use it: interleaved vs planar

Modern computers are generally designed to optimize for the common behavior that programs tend to access memory with addresses that are close together. This is known as the principle of locality.

This is relevant when deciding what memory layout to use for images. The different layouts store the same data. In interleaved, the bytes for a pixel are located close to each other in memory, while in planar, the bytes for a color are located close to each other. Therefore, it’s easy to conclude that if the program needs to do a lot of processing for each pixel, interleaved is probably going to be faster. If a program needs to do a lot of processing per color channel, planar is probably going to be faster.

Performance in an application’s important algorithms is typically the main reason to choose planar vs. interleaved layout.

Some examples of pixel-level processing tasks include on-screen display or color conversions.

Some examples of color channel-level processing tasks include blurring, sharpening, and simple edge detection.

There are also operations that are more or less layout-independent. These types of operations process per sample (an element of a pixel) rather than per-pixel or per-channel region. Some examples of this kind of processing include noise addition, quantization, and clamping (clipping).

Sometimes dramatic runtime improvements can be obtained by performing an up-front conversion from one layout to the other. Stay tuned for concrete examples of this!

Summary

Interleaved and planar layouts are two different ways of organizing the same data. When the data is packed with no padding, they consume the same amount of memory. Regardless of the layout, there are simple formulas that can be used to obtain the samples associated with a pixel.

There may be significant performance advantages to using interleaved or planar layout, due to the principle of locality. When performing pixel-level operations such as color conversion, interleaved layout may produce a lower runtime. When performing channel-level operations such as blurring, planar layout may produce a lower runtime.

So with my new job at Digimarc, you have surely noticed that I haven’t been writing new posts!  However, I’m learning a lot and decided that it’s time to give something back.  I’ll start with the image processing basics that I’ve acquired.  This post is about a common way that  still images are represented by computers: rasterized.

Vector vs. Rasterized images

There are two basic types of still images commonly used today: vector and rasterized images.  This post is about raster or rasterized images.

Raster images are basically made of contiguous blocks of colors. These blocks are called picture elements, or “pixels” for short. Pixels in an image are usually considered to be square-shaped. They are usually arranged in uniform columns and rows. Thus, a picture is a bit like a wall of square legos.

Is that an eye?

When the pixels are sufficiently small, or when they are viewed from sufficiently far away, our human visual system cannot distinguish individual pixels and they blend together into what appears to be a smooth, continuous image.

It’s Lena’s left eye! Lena is a commonly-used example of a grayscale raster image

By using colored blocks and a sufficient number of pixels, raster images can represent complex imagery such as people, nature, drawn art, and more. In fact, you are probably already familiar with the versatility of raster images – all modern digital photography produces raster images!

We can better understand the details of raster graphics by ignoring color for the moment and considering black & white, or grayscale, images.

Grayscale

In a grayscale image, each pixel is composed of a single number that represents the amount of black (or white) that appears in that pixel.  The number is sometimes referred to as the intensity, and it can be helpful to think of the intensity as a percentage between 0% and 100%.

Zoomed in view of 3×3 image

The image above is a zoomed-in view of a image that’s 3 pixels high and 3 pixels wide. This is typically referred to as a 3×3 image (width x height). In this image, all the pixels are either black or white – none are gray.

One way to think of this image is white paper in a dark room. We can illuminate each pixel with white light. Dim white illumination on a pixel would turn it gray. Bright white illumination on the pixel would turn it white. We can quantify the brightness of the illumination as the intensity, where 0% = no light and 100% = bright white light.

This scheme, where increasing intensity produces lighter colors, is known as additive. The image below has been annotated with the additive intensities for each pixel.

3×3 image, with borders between pixels, and annotated with additive intensities for each pixel

Though many interesting images can be created using only black and white pixels that are uniformly square and located on a uniform grid, more realistic detail can be represented when shades of gray are allowed as well.

In the image of Lena’s eye above, 82% points to a near-white pixel, 14% points to a near-black pixel, and 40% points to a mid-tone pixel. So, using varying intensities of black and white between 0% and 100% allows us to clearly represent this eye, which can be discerned even when highly zoomed in.

Color

In a grayscale image, the image is made of pixels that vary in lightness between white and black. Each pixel has a single intensity: the amount of white illumination on a black pixel.

Rather than varying intensity between white and black, which produces shades of gray, we could instead vary between white and a different color. For example, varying intensity between white and red produces shades of pink.

Red-Green-Blue (RGB) images

Full color images are created by combining multiple single color images. When multiple single color images are combined into a full color image, each color is called a separation or channel. In a full color image, each pixel has multiple intensities – one for each channel. The pixel’s color is the combination of each channel’s color at that pixel.

Full color Lena eye

The full color eye above is composed of a red channel, green channel, and blue channel, which are shown below.

Red channel from the full color Lena eye. Note that this is different from a red colorized version of the grayscale image.
Green channel from the full color Lena eye
Blue channel from the full color Lena eye

One aspect of this image that’s immediately noticeable is that each channel is relatively dark on average, yet the full color image is relatively light. That occurs because in the Red-Green-Blue (RGB) scheme, colors are additive – as intensity increases, the amount of light increases, making the image brighter.

The RGB scheme can be interpreted as the red, green, and blue components of a white light that illuminate a white paper in a dark room. For example, if the blue channel’s intensity is 0% for a pixel, it means none of the blue component of white light is illuminating that pixel. If the blue channel’s intensity is 10% for a pixel, it means a little bit of the blue component of white light is illuminating that pixel. If the blue channel’s intensity is 100% for a pixel, it means all of the blue component of white light is illuminating that pixel.

A pixel with intensities 50% red, 50% blue, and 0% green would have a purplish color. Can you think of how to create a white, gray, or yellow pixel?

This website shows how different RGB intensity percentages can create different colors. The table also includes another component, alpha, which is the transparency of the color. 0% is completely opaque, while 100% is completely transparent. Transparency allows background color to mix into the foreground color.

RGB is a common scheme to use because computer screens very commonly are manufactured to emit red, green, and blue light, which combine to create millions of colors. By using RGB data to store image data, the image data translates directly to the display: each pixel on the display just emits red, green, and blue light at the intensity stored in the image’s pixel.

Conclusion

That’s the absolute basics on raster still images! Raster images represent images by describing a uniform grid made of tiny squares called pixels. Each pixel has a color, which is described by one or more intensities.

Raster images are optimal for complex static imagery, such as photography. The pixels allow for arbitrary variation and irregular shapes, which appear throughout our natural world.

Questions? Leave a comment in the comment section below.

I’m researching image processing techniques for my new job.  I’m finding lots of things that I never took the time to understand, even if I had encountered them.  One of them is color maps.  Color maps are ways to convert a set of scalar values into colors.  They can be used to visualize non-visual data, or enhance visual data.

Continue reading