CS 180: Computer Vision and Computational Photography, Fall 2024

Project 1: Images of the Russian Empire in RGB

Zackary Oon



Overview

In this project, we are given a set of images of the Russian Empire. Each scene consists of 3 grayscale images, taken through a red, a green, and a blue filter respectively. The task is to create a fully colorized (RGB) image from these inputs.


Approach

General Steps

To construct an RGB image from the 3 grayscale images (taken via the R, G, B color filters), I took the following steps:

  1. Break up the concatenated images (stored vertically in the order of B, G, R, from top to bottom).
  2. Crop the images in all directions (I did so by 10% on each side) to remove any padding that may interfere with the similarity metrics.
  3. Using the blue image as the baseline, align the red and green images to it. Discussion on alignment will be in the section below.
  4. Stack the aligned red, green, and blue images, in that channel order, to form the RGB image.
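The steps above can be sketched in numpy. This is a minimal sketch: file loading and the alignment step are elided, and the synthetic gradient below merely stands in for a scanned glass plate.

```python
import numpy as np

def split_channels(plate):
    """Split the vertically concatenated plate into its B, G, R thirds."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_border(im, frac=0.10):
    """Crop `frac` of the image off every side to discard noisy borders."""
    dh, dw = int(im.shape[0] * frac), int(im.shape[1] * frac)
    return im[dh:-dh, dw:-dw]

# A synthetic 300x100 "plate" standing in for a scanned glass plate.
plate = np.linspace(0.0, 1.0, 300 * 100).reshape(300, 100)
b, g, r = (crop_border(c) for c in split_channels(plate))
# After aligning g and r to b (not shown), stack channels in R, G, B order.
rgb = np.dstack([r, g, b])
```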

Alignment

When talking about alignment, imagine one image staying still while the other slides on top of it in search of the best fit. There were two ways I aligned the images:

  1. A brute force alignment method that checks every displacement from -15 to +15 pixels in both the x and y directions, and keeps the displacement that aligns the images best.
  2. An image pyramid, which downsamples the image by a factor of 2 at each level. Images are recursively downsampled until both dimensions are at most 500 pixels.

For the image pyramid, at every level we use the brute force alignment subroutine from the first bullet point; however, we only check displacements of -15 to +15 pixels at the smallest image level. For the remaining levels (popping back up the recursion stack), we check displacements of -3 to +3 pixels in the x and y directions, centered around twice the displacement returned by the previous recursive call (i.e. the coarser estimate obtained from the downsampled image, scaled up to the current level's resolution).
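A sketch of both methods follows. NCC is used as the similarity metric (see Values and Metrics below); `np.roll` stands in for the sliding, and the strided downsampling without a pre-blur is an illustrative simplification, not necessarily the exact implementation.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation; higher means a better match."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return (a * b).mean()

def search(moving, fixed, center, window):
    """Exhaustively score every (dx, dy) in a square window around `center`."""
    best_score, best = -np.inf, center
    for dy in range(center[1] - window, center[1] + window + 1):
        for dx in range(center[0] - window, center[0] + window + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), fixed)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def pyramid_align(moving, fixed):
    """Coarse-to-fine alignment: recurse on half-size images, then refine."""
    if max(moving.shape) <= 500:                  # base case: small image
        return search(moving, fixed, (0, 0), 15)  # full -15..+15 search
    # Downsample by 2 via slicing (blurring first would reduce aliasing).
    dx, dy = pyramid_align(moving[::2, ::2], fixed[::2, ::2])
    # Double the coarse estimate for this resolution, then refine by +/-3.
    return search(moving, fixed, (2 * dx, 2 * dy), 3)
```

A call to `search(moving, fixed, (0, 0), 15)` by itself reproduces the brute force method used on the smaller .jpg images.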

Values and Metrics

Note that when checking alignments, rather than comparing raw pixel intensities, I used the edges in the images as the values to compare. This is because corresponding pixels in the three channels (R, G, B) can have very different intensities: a red object, for example, appears bright in the red-filtered exposure but dark in the blue-filtered one, even though both show the same scene. Matching on edges ensures that we align by the forms in the image rather than by brightness.

I tried both Sobel edge detection and Canny edge detection — Canny edge detection performed better. Note that when performing edge detection, it is important to denoise the image by doing a bit of blurring first (I used Gaussian blur).
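As a minimal sketch of the blur-then-Canny preprocessing: skimage's `canny` folds the Gaussian smoothing into its `sigma` parameter, and `sigma=2.0` here is just an illustrative value, not a tuned one.

```python
import numpy as np
from skimage.feature import canny

def edge_map(im, sigma=2.0):
    """Gaussian-blur-then-Canny; `sigma` controls the internal blur."""
    return canny(im, sigma=sigma).astype(np.float64)

# A white square on black: edges fire along the square's boundary.
im = np.zeros((64, 64))
im[16:48, 16:48] = 1.0
edges = edge_map(im)
```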

For the similarity metric, I tried L2 distance, Normalized Cross Correlation (NCC), and the Structural Similarity Index Measure (SSIM). L2 did the worst, while NCC and SSIM both did pretty well (and performed similarly), though SSIM took longer to run than NCC.
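One plausible reason L2 lags: it is sensitive to the per-channel brightness and contrast differences that NCC normalizes away. A minimal sketch of the two (SSIM omitted; skimage provides it as skimage.metrics.structural_similarity):

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance between images; lower is better."""
    return np.sqrt(((a - b) ** 2).sum())

def ncc(a, b):
    """Normalized cross-correlation; higher is better (max 1.0)."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return (a * b).mean()

rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = 0.5 * a + 0.2   # the same scene under a brightness/contrast change
# NCC still scores this as a near-perfect match; L2 does not.
```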

Problems

The pyramid was necessary to align the larger .tif images efficiently, and I ran into a couple of issues during implementation:
  1. First, cv.Canny() requires uint8 input. Casting with np.uint8 didn't work, while skimage.util.img_as_ubyte did. Using the former would produce a blank image.
  2. Most egregiously, during the implementation of my pyramid, I had initially forgotten to include the very last layer of the alignment (i.e. the original image didn't get an alignment step -- I only aligned up to the half-size image before it). Including the last, largest layer made the images align much better.
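The uint8 issue is easy to demonstrate: skimage float images live in [0, 1], so astype(np.uint8) truncates almost every pixel to 0 (hence the blank image), while img_as_ubyte rescales to the full [0, 255] range.

```python
import numpy as np
from skimage.util import img_as_ubyte

im = np.array([[0.0, 0.5, 1.0]])   # a float image in skimage's [0, 1] range
bad = im.astype(np.uint8)          # truncates toward zero: nearly all black
good = img_as_ubyte(im)            # rescales to the full uint8 range
```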

Results

Note that the displacements are given as "Channel: [x, y]", where "Channel" is either "Red" or "Green", x is the horizontal displacement, and y is the vertical displacement.
Side Trees, Green: [12, 37], Red: [37, 86] (My pick)
Emir, Green: [24, 49], Red: [40, 107]
Monastery, Green: [2, -3], Red: [9, 97]
Grass Water, Green: [-2, 43], Red: [9, 97] (My pick)
Church, Green: [4, 24], Red: [-4, 58]
Three Generations, Green: [19, 55], Red: [9, 111]
Melons, Green: [10, 80], Red: [15, 122] (Misaligned, see below)
Onion Church, Green: [24, 52], Red: [35, 107]
Tree Passage, Green: [-17, 28], Red: [-34, 65] (My pick)
Train, Green: [9, 43], Red: [29, 86]
Tobolsk, Green: [3, 3], Red: [3, 6]
Icon, Green: [16, 39], Red: [23, 90]
Cathedral, Green: [2, 5], Red: [3, 12]
Self Portrait, Green: [29, 77], Red: [71, 133] (Misaligned, see below)
Harvesters, Green: [18, 60], Red: [14, 124]
Sculpture, Green: [-10, 33], Red: [-27, 140]
Lady, Green: [10, 56], Red: [13, 120]

Explanations on Misalignment

There were two images that were noticeably misaligned: the one captioned "melons" and the one captioned "self portrait".

Upon further inspection, the two images likely failed for similar reasons. Looking at the edge detection results, both contain highly repetitive edge patterns (many melons, many stones). Additionally, because it is dark inside the vendor's stall, not all of the melons' edges are picked up. The low contrast of the self portrait likewise makes it challenging to align.