
We are evaluating image classification ideas, starting with ART, Fuzzy ART, and SMART. The code and ideas are led by Nikhil Gupta.

We worked on evaluating the image classification ideas presented in the paper "SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal" by Grossberg et al. (2008). The paper describes an unsupervised shape classification system: basic computer-vision image preprocessing produces an input vector, which is fed into a Fuzzy Adaptive Resonance Theory (ART) classifier for learning shape categories.


Introduction

We describe the system as we understand it from the SOVEREIGN paper. We first describe the type of input presented to the system and the image preprocessing performed before the input is passed to the classifier.

The high-level flow of the system is as follows:

  Inputs:
    1. 2D images.
  Outputs:
    1. Learned weight vectors for a Fuzzy ART classifier.
    2. Image classification results.
  Steps:
    1. For each image, preprocess it to get an input vector.
    2. Feed the input vector into a Fuzzy ART classifier to get a categorization result and learn the category weights.

Types of images presented

The images presented to the system are 2D views from a virtual environment. Each 2D view is a 384 x 288 pixel RGB image. The images have no noise, the background is uniform blue, and the objects are uniformly yellow (no shading). There can be orientation, rotation, and scale variations when views are taken from different positions and orientations. Four shapes are presented to the system, and only one occurrence of one shape can be present in a view.

Image Preprocessing steps for Fuzzy ART classification

These steps are mostly from the SOVEREIGN paper -

  1. A 2D view (RGB color format) from the 3D visual front-end is input into the system for classification. Output of the system is a category number.
  2. Convert the 2D image to grayscale (YUV format) via a linear combination of the RGB values. Y stands for the luminance component (the brightness): the weighted values of R, G, and B are added together to produce a single Y signal representing the overall brightness, or luminance, of that pixel -
    1. Y = R/2 + G/3 + B/6
    2. A unique grayscale value is guaranteed under these experimental conditions, since uniform lighting ensures that each colour always has a unique luminance. For our implementation this value was 216.
  3. Figure-ground separation - keep only areas having the unique grayscale value corresponding to the target objects (in our case, 216 for the target's yellow colour). Continue only if the area of the visual target object in the resulting image map exceeds a minimum threshold (25 pixels).
  4. Edge detection - perform edge detection via a 2D Laplacian-of-Gaussian filter. This step is implemented as a 2D convolution of the grayscale image with a 3 x 3 kernel. The variance parameter σ = 0.1 is a constant that defines the sharpness of the filter.
  5. Find the centroid of the target object and center it by shifting horizontally and vertically.
  6. Apply a log-polar transformation.
  7. Find the centroid of the transformed image and center it by scaling and rotating. The resulting representation of the visual target object should be invariant to 2D variations in position, size, and rotation. (The rotation invariance is never checked in the original system; we check for this invariance and discuss it later.)
  8. Update the Invariant Visual Target Map by coarse-coding the centered log-polar image. A coarse-coded image requires less memory, can help to correct 3D foreshortening effects, and can improve generalization when categorized.
    1. Coarse-coding is accomplished first by computing a 2D convolution of the image with a Gaussian kernel having standard deviation, σ = 12.
    2. The result is sampled on an evenly spaced grid, with a sample period of 24 pixels, yielding an Invariant Visual Target Map with 12 x 16 pixels (height x width) of resolution.
  9. Learn to categorize by unsupervised Fuzzy ART learning. The input to the Fuzzy ART module is a 192-element vector (12 x 16), which is then complement-coded. The resulting 384-element vector is categorized using the Fuzzy ART classifier.
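The grayscale, figure-ground, and coarse-coding steps above can be sketched in pure Python. This is a minimal stand-in for our MATLAB implementation, not the actual code; the function names and toy inputs are ours:

```python
# Minimal pure-Python sketch of steps 2, 3, and 8.2 above; the real
# system is implemented in MATLAB, and these names are ours.

def to_gray(rgb_image):
    """Step 2: Y = R/2 + G/3 + B/6, per pixel."""
    return [[r / 2 + g / 3 + b / 6 for (r, g, b) in row] for row in rgb_image]

def figure_ground(gray, target, min_area=25):
    """Step 3: keep only pixels matching the target luminance; give up
    if the object's area is below the 25-pixel minimum."""
    mask = [[1 if abs(v - target) < 0.5 else 0 for v in row] for row in gray]
    area = sum(sum(row) for row in mask)
    return mask if area >= min_area else None

def coarse_sample(image, period=24):
    """Step 8.2: sample on an evenly spaced grid with a 24-pixel period,
    so a 288 x 384 (height x width) image yields a 12 x 16 map."""
    return [row[::period] for row in image[::period]]

# Note: for pure RGB yellow (255, 255, 0) the formula gives
# 255/2 + 255/3 = 212.5, so the value 216 quoted above presumably
# reflects the particular yellow used in the virtual environment.
```

With this sampling, a 288 x 384 input reduces to the 12 x 16 Invariant Visual Target Map, i.e. the 192-element vector fed to Fuzzy ART (384 elements after complement coding).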

Fuzzy ART - 101

The Fuzzy Adaptive Resonance Theory classifier is the "fuzzy logic" version of the ART classifier. It was first introduced in a paper by Carpenter, Grossberg, and Rosen.

The classifier complement-codes every input presented to it. Input values always lie in the range 0 to 1. Since it is based on fuzzy logic, a quick refresher on fuzzy logic might be helpful.

It is a two-layer network, with the nodes in the second layer representing different learned categories. The algorithm ensures that only one node in the second layer is ON for any given input. If the match between this node's weights and the input is below a user-defined threshold (the vigilance parameter), a new second-layer node (and weight vector) is added to represent the current category of inputs.
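Putting the pieces together, here is a compact sketch of the Fuzzy ART loop (choice function, vigilance test, fast learning). It follows the standard formulation rather than our MATLAB code; the parameter names rho, alpha, and beta are the conventional ones:

```python
# Illustrative Fuzzy ART sketch (standard formulation, not our MATLAB code).
def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """inputs: complement-coded vectors with values in [0, 1].
    Returns (category labels, committed weight vectors)."""
    weights = []                      # one weight vector per F2 node
    labels = []
    for I in inputs:
        def choice(w):                # T_j = |I ^ w_j| / (alpha + |w_j|)
            m = sum(min(a, b) for a, b in zip(I, w))
            return m / (alpha + sum(w))
        winner = None
        for j in sorted(range(len(weights)),
                        key=lambda j: -choice(weights[j])):
            match = sum(min(a, b) for a, b in zip(I, weights[j]))
            if match / sum(I) >= rho:          # vigilance test
                winner = j
                break
        if winner is None:                     # no resonance: commit new node
            weights.append(list(I))
            winner = len(weights) - 1
        else:                                  # fast learning (beta = 1):
            w = weights[winner]                # w <- beta*(I ^ w) + (1-beta)*w
            weights[winner] = [beta * min(a, b) + (1 - beta) * b
                               for a, b in zip(I, w)]
        labels.append(winner)
    return labels, weights
```

Two identical inputs land in the same category; a sufficiently different input fails the vigilance test against every committed node and gets a new one.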

Flow

A flowchart describing the Fuzzy ART algorithm (from this link) -

Fuzzy art flowchart.jpg

Algorithm

Fuzzy ART Algorithm (again from this link) -

Fuzzy art pseudocode.jpg

Implementation details

The system has been implemented in MATLAB R2007b.

The system can be run by executing the script "s_art_fuzzy.m"; it classifies the images in the 'images' directory. The output includes the categories assigned to the input images by the Fuzzy ART classifier. The steps performed by the script are:

  1. Preprocess the images (finally yielding a blurred, log-polar-transformed image).
  2. Coarse code these images.
  3. Run Fuzzy_ART classifier to assign categories.

We used the Fuzzy ART implementation available at MATLAB Central - here. Also, as log-polar transformation functionality is not built into MATLAB, we used a function for it available on Peter Kovesi's site.

The image preprocessing part of the system works as shown below -

Art image preprocess flow.png

Results and Analysis

We tested the capabilities of the system using a few basic cases beyond the plain yellow-blue images, as listed in the example images subsection below.

Examples of types of evaluation images used

In addition to 20 plain yellow-blue images of the four objects, the following types of test images were used to test the Fuzzy ART classifier:

Other test cases:

  1. Gradually changing shapes (e.g., a square deforming into a rectangle).
  2. Images with more than one object together: non-overlapping and overlapping.

Works well for

Plain images - Works okay with the simple yellow-blue images, classifying around 16 of the 20 objects correctly.

Position - As the image is centered after edge detection, the effect of variations in position of object within the scene is nullified.

Scaled objects - After the log-polar transformation, variations in the size of the object become displacement along the "radius" axis. Since the images are log-polar transformed and centered, the effect of scaling is nullified.
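The scale-invariance claim is easy to check numerically: under a log-polar mapping, scaling a point by s shifts its radial coordinate by exactly log(s) and leaves the angle unchanged, which is why recentering after the transform cancels scale. A small illustrative check (not part of the original code):

```python
import math

# log-polar coordinates of a Cartesian point
def log_polar(x, y):
    return math.log(math.hypot(x, y)), math.atan2(y, x)

r1, t1 = log_polar(3.0, 4.0)
r2, t2 = log_polar(6.0, 8.0)   # same point scaled by s = 2
# r2 - r1 == log(2), and t2 == t1
```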

Scale 1.png

Scale 2.png

Certain variations in the background - During figure-ground separation, the system just picks the colours corresponding to yellow in the luminance channel of YUV (216 grayscale in our case). Therefore, any variations in the background that do not share that value in the Y channel are discarded, and do not affect the results.

Performs badly for

Shaded images - During the figure-ground separation step of image preprocessing, we pick only the gray value corresponding to the original yellow. This leads to most of the shaded area being discarded.


Images with noise - Noise added to the initial image persists as empty dots within the object after figure-ground separation. Edge detection marks these dots as small regions with edges, which leads to noisy log-polar images.


Multiple objects - The system is not designed to handle such cases, so when there is more than one object in the image, they are detected as a completely new pattern.

Rotation - The polar transformation converts the rotation of an object in Cartesian space into a translation along the "angle" axis. The current system has no way of knowing what the actual orientation of an object should be, or how much it has been translated in polar coordinates.
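The rotation problem can likewise be seen numerically: rotating a point by an angle phi simply adds phi to its polar angle, so a rotated object looks like a translated pattern along the angle axis. An illustrative check (not from the original code):

```python
import math

def polar_angle(x, y):
    return math.atan2(y, x)

# standard 2D rotation by phi radians
def rotate(x, y, phi):
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

phi = 0.3
a0 = polar_angle(1.0, 0.0)
a1 = polar_angle(*rotate(1.0, 0.0, phi))
# a1 - a0 == phi (modulo 2*pi)
```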


SMART - Synchronous Matching Adaptive Resonance Theory

We wanted to compare the results of the RGC spiking simulator to the SMART simulator. The Synchronous Matching Adaptive Resonance Theory (SMART) network is based on a technical paper available here.

The relevant code for KInNeSS (KDE Integrated NeuroSimulation Software) has been provided by the authors here. We had some difficulties installing this simulator, so we are providing installation instructions so that others do not face similar problems.

  1. Instructions for installing KInNeSS on a Kubuntu machine -
    1. Install Kubuntu 8.04
    2. Download KInNeSS and SANNDRA from -
    3. Install the following packages -
      1. automake (gets rid of "WARNING: `aclocal-1.10' is missing on your system.")
      2. perl
      3. build-essential (gets rid of "configure: error: C compiler cannot create executables")
      4. kdebase-dev (for fixing "checking for X... configure: error: Can't find X libraries. Please check your installation and add the correct paths!")
    4. SANNDRA needs "unsermake", an outdated automated make system; get the .deb package here -
      1. unsermake needs Python 2.4, so get it using apt-get
      2. make Python 2.4 default version of Python on your system by issuing this command - "update-alternatives --install /usr/bin/python python /usr/bin/python2.4 900"
      3. Install unsermake using - "dpkg -i <unsermake-debian-package-name>.deb"
    5. Finally, install SANNDRA before KInNeSS; both can be installed as usual using -
    6. ./configure
    7. make
    8. sudo make install
  2. Congratulations, KInNeSS simulator is installed!
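The steps above can be collected into one (untested) shell script; the directory and package-file names (sanndra/, kinness/, the unsermake .deb) are placeholders that depend on where you unpacked the downloads:

```shell
# Build prerequisites (fixes the aclocal, compiler, and X errors noted above)
sudo apt-get install automake perl build-essential kdebase-dev

# unsermake needs Python 2.4; make it the default interpreter
sudo apt-get install python2.4
sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.4 900

# Install the unsermake .deb downloaded earlier (file name is a placeholder)
sudo dpkg -i unsermake-*.deb

# Build SANNDRA first, then KInNeSS (directory names are placeholders)
for pkg in sanndra kinness; do
    (cd "$pkg" && ./configure && make && sudo make install)
done
```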