Color-Based Image Retrieval - Query by Example
Theodoros Giannakopoulos
E-mail: tyiannak@di.uoa.gr
Website: www.di.uoa.gr/~tyiannak
Introduction
Content-based image retrieval is the task of searching for images in a database by analyzing their content. This demo presents a simple image retrieval method based on the color distribution of the images. The user provides an "example" image, and the search is based on that example (query by image example). This first version of the demo does not use relevance feedback.
Method Description
(A) Training
Almost 1000 images were used to populate the database. For each image, a 3-D histogram of its HSV values is computed. At the end of the training stage, all 3-D HSV histograms are stored in a single .mat file.
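The training step can be sketched in Python with NumPy as follows. This is not the code of getImageHists.m. It assumes 8 bins per HSV dimension (the demo's bin count is not stated here), an RGB input given as a uint8 array of shape (height, width, 3), and normalization of each histogram so that it sums to 1; these are all assumptions. The RGB-to-HSV conversion uses the standard-library colorsys module, and the function names hsv_histogram and build_database are hypothetical.

```python
import colorsys

import numpy as np


def hsv_histogram(rgb, bins=(8, 8, 8)):
    """Normalized 3-D HSV histogram of an RGB image.

    rgb:  uint8 array of shape (height, width, 3).
    bins: number of bins per H, S, V dimension (8 is an assumed value).
    """
    pixels = rgb.reshape(-1, 3).astype(float) / 255.0
    # colorsys converts one pixel at a time and returns H, S, V in [0, 1].
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    hist, _ = np.histogramdd(hsv, bins=bins, range=((0, 1), (0, 1), (0, 1)))
    # Assumed normalization: makes images of different sizes comparable.
    return hist / len(hsv)


def build_database(images, bins=(8, 8, 8)):
    """Stack the histograms of all database images into one array.

    The demo keeps the equivalent data in a single .mat file; here the
    stacked array could be saved with np.save (an assumption, not the
    demo's file format).
    """
    return np.stack([hsv_histogram(img, bins) for img in images])
```

A pure-red image, for example, puts all of its mass in the bin with hue 0 and full saturation and value, i.e. `hist[0, 7, 7]` with 8 bins per dimension.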
(B) Query
To retrieve M query results (M is user-defined), the following steps are executed:
- The 3-D (HSV) histogram of the query image is computed. Then the number of bins along each dimension of the HSV space is doubled by interpolation.
- For each image i in the database:
  - Load its histogram Hist(i).
  - Double the number of bins along each dimension by interpolation.
  - For each 3-D histogram bin, compute the distance (D) between the histogram of the query image and the histogram of the i-th database image.
  - Keep only the distances (D2) for which the corresponding bins of the query histogram are larger than a predefined threshold T, and let L2 be the number of these distances.
  - Apply a second threshold: find the distance values (D3) that are smaller than T2, and let L3 be the number of such values.
  - The similarity measure is defined as S(i) = L2 * average(D3) / (L3^2).
- Sort the similarity vector and present the user with the M images that have the smallest S values (a smaller S means a closer match).
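The query steps above can be sketched in Python with NumPy as follows. Several details are assumptions because the text does not pin them down: the per-bin distance D is taken as the absolute difference of the two histograms, the second threshold T2 is applied to the distances kept in D2, the bin doubling uses linear interpolation along each axis, and an image with no distance below T2 (L3 = 0) gets S = infinity. The values of T and T2 are not given in the text, and the function names double_bins, similarity and search are hypothetical.

```python
import numpy as np


def double_bins(hist):
    """Double the number of bins along each axis by linear interpolation."""
    out = hist
    for axis in range(hist.ndim):
        n = out.shape[axis]
        old = np.arange(n)
        new = np.linspace(0, n - 1, 2 * n)  # 2n sample points over the same range
        out = np.apply_along_axis(lambda v: np.interp(new, old, v), axis, out)
    return out


def similarity(q2, h2, T, T2):
    """S(i) = L2 * average(D3) / L3^2 for two already-doubled histograms."""
    D = np.abs(q2 - h2)   # assumed per-bin distance
    D2 = D[q2 > T]        # keep bins where the query histogram exceeds T
    L2 = D2.size
    D3 = D2[D2 < T2]      # assumed: second threshold applied to D2
    L3 = D3.size
    if L3 == 0:
        return np.inf     # assumed: no close bins means least similar
    return L2 * D3.mean() / L3 ** 2


def search(query_hist, db_hists, M, T, T2):
    """Return indices and scores of the M database images with the smallest S."""
    q2 = double_bins(query_hist)
    S = np.array([similarity(q2, double_bins(h), T, T2) for h in db_hists])
    order = np.argsort(S)[:M]
    return order, S[order]
```

Note that L2 depends only on the query histogram, so it scales every finite score by the same factor and does not change the ranking; the ordering is driven by average(D3) / L3^2, which rewards many small distances.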
Provided MATLAB files
getImageHists.m: Computes the 3-D HSV histogram of an image.
searchImageHist.m: This is the main m-file. It computes the histogram of the given image and then returns the most similar images based on the training data.
model1Hist.mat: This is the .mat file that contains the training data, i.e., the histograms of the almost 1000 database images.
The thumbnails of the training images are stored in the folder \images2, and 8 test query images are provided in the root folder.
Execution Example
Suppose that we want to run a query based on the image 'redflower.jpg' and retrieve 11 images:
>> searchImageHist('redflower.jpg', 'model1Hist', 11);
The execution consists of two basic steps (as described above):
(a) First, the 3-D histogram of the query image is computed. This can take about 0.5 seconds for an 800x600 color image.
(b) Once the histogram is computed, the search algorithm described above is executed. During the search, for user-interface reasons, only some of the database images (not all of them) are plotted, selected by a simple thresholding criterion: only images whose similarity measure is smaller than a pre-defined threshold are shown (see Figure 1). When the search is complete, the 11 closest images are presented (Figure 2).
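The two-phase display described in (b), with previews shown while scanning followed by the final top-M list, can be sketched in plain Python. This is an illustration, not the GUI code of searchImageHist.m: the function name search_with_preview and the parameter preview_threshold are hypothetical, and the scores are assumed to be the S(i) values from the query step, where smaller means more similar.

```python
import heapq


def search_with_preview(scores, M, preview_threshold):
    """Scan the similarity scores S(i) in database order.

    Returns (previews, top):
      previews: indices whose score is below preview_threshold, in scan order
                (the images a GUI would plot while the search is running);
      top:      the M indices with the smallest scores, best match first.
    """
    previews = []
    seen = []
    for i, s in enumerate(scores):
        seen.append((s, i))
        if s < preview_threshold:
            previews.append(i)  # a GUI would draw this thumbnail immediately
    top = [i for _, i in heapq.nsmallest(M, seen)]
    return previews, top
```

For example, `search_with_preview([0.5, 0.1, 0.9, 0.05, 0.3], M=2, preview_threshold=0.2)` returns `([1, 3], [3, 1])`: images 1 and 3 are previewed in scan order, and the final list is ordered by score.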

Figure 1: While the search is running, some similar images (selected using a pre-defined threshold) are presented.

Figure 2: When the process is completed, the query image is presented along with the (here, 11) closest images.