
Stanford University

Mobile Visual Search







Memory-Efficient Image Databases for Mobile Visual Search



Many mobile visual search (MVS) systems compare query images captured by the mobile device's camera against a database of labeled images to recognize objects seen in the device's viewfinder. Practical MVS systems require a fast response to provide an interactive and compelling user experience. Thus, the recognition pipeline must be extremely efficient and reliable. Congestion on a server or slow transmissions of the query data over a wireless network could severely degrade the user experience.

We show how a memory-efficient database stored entirely on a mobile device can enable on-device queries with fast response times. The image signatures stored in the database must be compact enough to fit within the device's limited memory, support fast comparisons across a large database, and remain robust against large geometric and photometric distortions. We first develop two methods for efficiently compressing a database constructed from feature histograms; the popular vocabulary tree fits within this framework. These methods reduce database memory usage by 4-5x without any loss in matching accuracy and support fast decoding. We then develop a third database representation, based on feature residuals, that is even more compact. The residual-based database reduces memory usage by 12-14x, requires only a small codebook, and performs image matching directly in the compressed domain.
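To make the residual idea concrete, here is a minimal sketch of a generic VLAD-style residual aggregation: each local feature is assigned to its nearest codeword in a small codebook, the summed residuals form one normalized global vector per image, and two images are compared with a single dot product. The codebook size, feature dimension, and data below are all illustrative assumptions, not the actual construction of the residual enhanced visual vector in [2, 3].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 16-word, 8-D codebook; a real system would train it with
# (hierarchical) k-means on millions of local feature descriptors.
codebook = rng.standard_normal((16, 8)).astype(np.float32)

def residual_signature(features, codebook):
    """Aggregate per-word residuals (feature minus nearest codeword) into
    one L2-normalized global vector, so a whole image is represented by a
    single compact signature comparable via a dot product."""
    k, d = codebook.shape
    sig = np.zeros((k, d), dtype=np.float32)
    # Assign each feature to its nearest codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    for i, a in enumerate(assign):
        sig[a] += features[i] - codebook[a]    # accumulate the residual
    sig = sig.ravel()
    norm = np.linalg.norm(sig)
    return sig / norm if norm > 0 else sig

# Images sharing features should score higher than unrelated images.
img_a = rng.standard_normal((100, 8)).astype(np.float32)
img_b = np.vstack([img_a[:60], rng.standard_normal((40, 8)).astype(np.float32)])
img_c = rng.standard_normal((100, 8)).astype(np.float32)

sim_ab = float(residual_signature(img_a, codebook) @ residual_signature(img_b, codebook))
sim_ac = float(residual_signature(img_a, codebook) @ residual_signature(img_c, codebook))
print(sim_ab, sim_ac)  # overlapping images score higher
```

Because the signature is a single fixed-length vector, matching stays in the compressed domain: no per-feature decoding is needed at query time.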

With our compact database stored on a mobile device, we have implemented a practical MVS system that can recognize media covers, book spines, outdoor landmarks, artwork, and video frames out of a large database in less than 1 second per query. Our system uses motion analysis on the device to automatically infer user interest, select high-quality query frames, and update the pose of recognized objects for accurate augmentation. We also demonstrate how a continuous stream of compact residual-based signatures enables a low bitrate query expansion onto a remote server when network conditions are favorable. The query expansion improves image matching during the current query and updates the local on-device database to benefit future queries.


Defense Presentation


Android Demo


References:
  1. D. Chen and B. Girod, "Memory-efficient image databases for mobile visual search", IEEE MultiMedia Magazine, Vol. 21, No. 1, January 2014. [Paper]
  2. D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, "Residual enhanced visual vector as a compact signature for mobile visual search", Signal Processing, Vol. 93, No. 8, August 2013. [Paper]
  3. D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, H. Chen, R. Vedantham, R. Grzeszczuk, and B. Girod, "Residual enhanced visual vectors for on-device image matching", IEEE Asilomar Conference on Signals, Systems, and Computers, November 2011. [Paper]




Compressed Histogram of Gradients: A Low Bitrate Descriptor



For many mobile visual search applications, a query photo is taken by a mobile device and compared against a database on a remote server. The size of the data sent over the network needs to be as small as possible to reduce latency and improve user experience. We have studied descriptor compression techniques and shown that compressed descriptors can reduce query latency significantly in mobile image retrieval systems. Here, we make publicly available a Linux binary of a new low bitrate descriptor called Compressed Histogram of Gradients (CHoG) [1-3] with location coding [4].
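As a rough illustration of the descriptor's two stages, the sketch below builds per-cell gradient-orientation histograms for an image patch and then quantizes each histogram against a tiny codebook of distributions, so each cell is sent as a small index rather than raw values. The spatial grid, bin counts, and codebook here are simplified stand-ins, not CHoG's actual gradient binning or quantization schemes [1-3].

```python
import numpy as np

rng = np.random.default_rng(1)

def gradient_histograms(patch, n_cells=3, n_bins=9):
    """Divide a patch into a spatial grid of cells and build a normalized
    gradient-orientation histogram per cell (simplified sketch; CHoG's
    actual spatial and gradient binning differ)."""
    gy, gx = np.gradient(patch.astype(np.float32))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                     # orientation in (-pi, pi]
    h, w = patch.shape
    hists = []
    for i in range(n_cells):
        for j in range(n_cells):
            cell = (slice(i * h // n_cells, (i + 1) * h // n_cells),
                    slice(j * w // n_cells, (j + 1) * w // n_cells))
            hist, _ = np.histogram(ang[cell], bins=n_bins,
                                   range=(-np.pi, np.pi), weights=mag[cell])
            s = hist.sum()
            hists.append(hist / s if s > 0 else hist)
    return np.array(hists)   # (n_cells**2, n_bins), each row a distribution

def quantize_histogram(hist, centers):
    """Low-bitrate step: replace a per-cell histogram by the index of the
    nearest codebook distribution, so a cell costs only log2(len(centers))
    bits before entropy coding."""
    return int(((centers - hist) ** 2).sum(axis=1).argmin())

patch = rng.random((24, 24))
hists = gradient_histograms(patch)
# Hypothetical 8-entry codebook of probability distributions.
centers = rng.random((8, 9))
centers /= centers.sum(axis=1, keepdims=True)
indices = [quantize_histogram(h, centers) for h in hists]
print(len(indices), "cells, 3 bits per cell before entropy coding")
```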



Download Linux Binary


References:
  1. V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, Y. Reznik, R. Grzeszczuk, and B. Girod, "Compressed histogram of gradients: a low bitrate descriptor", International Journal on Computer Vision, Vol. 94, No. 5, May 2011. [Paper]
  2. V. Chandrasekhar, Y. Reznik, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, "Quantization schemes for low bitrate compressed histogram of gradients descriptors", IEEE International Workshop on Mobile Vision, June 2010. [Paper]
  3. V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, "CHoG: compressed histogram of gradients", IEEE International Conference on Computer Vision and Pattern Recognition, June 2009. [Paper]
  4. S. Tsai, D. Chen, G. Takacs, V. Chandrasekhar, J. P. Singh, and B. Girod, "Location coding for mobile image retrieval", International Mobile Multimedia Communications Conference, September 2009. [Paper]
  5. B. Girod, V. Chandrasekhar, R. Grzeszczuk, and Y. A. Reznik, "Mobile visual search: architectures, technologies, and the emerging MPEG standard", IEEE MultiMedia, Vol. 18, No. 3, July-September 2011. [Paper]




Query-by-Image Video Search



We demonstrate a novel multimedia system that continuously indexes videos and enables real-time search using images, with a broad range of potential applications. Television shows are recorded and indexed continuously, and iconic images from recent events are discovered automatically. Users can query with an uploaded image or with an image found on the web. When a result is served, the user can play the video clip from the beginning or from the point in time where the retrieved image was found.
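The serving step can be sketched as a lookup from a retrieved frame to its video and timestamp, with the user choosing between the clip start and the matched moment. The index structure, identifiers, and signature stand-ins below are illustrative, not the system's actual data model.

```python
# Frame index maps an image signature (a hash stand-in here) to
# (video_id, timestamp_seconds); names are hypothetical.
frame_index = {
    "sig_a": ("news_2014_07_01", 0.0),
    "sig_b": ("news_2014_07_01", 93.5),   # frame sampled mid-broadcast
    "sig_c": ("show_ep12", 421.0),
}

def serve_result(query_sig, from_start=False):
    """Return the clip id and a playback offset: either the clip start or
    the moment where the retrieved frame was found."""
    if query_sig not in frame_index:
        return None
    video_id, t = frame_index[query_sig]
    return video_id, (0.0 if from_start else t)

print(serve_result("sig_b"))                   # play from the matched frame
print(serve_result("sig_b", from_start=True))  # play from the beginning
```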




References:
  1. A. Araujo, D. Chen, P. Vajda, and B. Girod, "Real-time query-by-image video search system", ACM Multimedia (MM), November 2014. Accepted.
  2. A. Araujo, M. Makar, V. Chandrasekhar, D. Chen, S. Tsai, H. Chen, R. Angst, and B. Girod, "Efficient video search using image queries", IEEE International Conference on Image Processing (ICIP), October 2014. [Paper]



Analysis of Visual Similarity in News Videos



The memory-efficient image retrieval techniques we develop for Mobile Visual Search problems can also be applied to great effect in a different domain: news video analysis. We demonstrate that effectively exploiting the visual similarities pervasive in news videos enables the dynamic retrieval and mixing of small news video fragments. Two new algorithms are developed to accurately detect two important sources of visual similarity: (1) similar preview and story frames, and (2) repeated appearances of a news anchor. Identifying these visual similarities yields valuable preview clips and informative clues about story boundaries. The retrieval engine implemented in both algorithms employs compact global image signatures and requires a small memory footprint, so that many instances of the detection algorithms can run concurrently on the same server for fast processing of a large collection of news videos. At the same time, the retrieval engine is robust to the large appearance variations encountered in the preview matching and anchor detection problems.






References:
  1. D. Chen, P. Vajda, S. Tsai, M. Daneshi, M. Yu, H. Chen, A. Araujo, and B. Girod, "Analysis of visual similarity in news videos with robust and memory-efficient image retrieval", IEEE Workshop on Media Fragment Creation and Remixing (MMIX), July 2013. [Paper]




Book Spine Recognition for Asset Tracking



We have implemented a mobile book spine recognition system. Each time the user snaps a photo of part of a bookshelf, our system automatically recognizes and localizes each spine in the photo, and the recognized books are added to the inventory database. When the recognition result is sent to the phone, the user can also select each spine that appears in the query photo. A selected spine has its boundary highlighted, its title displayed, and the corresponding front cover shown in the phone's viewfinder, enabling easy visualization of the new books added to the inventory. Additionally, we have prototyped the location tracker on a smartphone. Our system uses sensor readings from the WiFi sensor, accelerometer, and digital compass to determine the location where each photo is taken, and the location is appended to the spine identification in the inventory.


Photo Snapshot Mode


Augmented Reality Mode




References:
  1. S. Tsai, D. Chen, H. Chen, C.-H. Hsu, K.-H. Kim, J. P. Singh, and B. Girod, "Combining image and text features: a hybrid approach to mobile book spine recognition", ACM Multimedia (MM), November 2011. [Paper]
  2. D. Chen, S. Tsai, C.-H. Hsu, J. P. Singh, and B. Girod, "Mobile augmented reality for books on a shelf", IEEE Workshop on Visual Content Identification and Search (VCIDS), July 2011. [Paper]
  3. D. Chen, S. Tsai, C.-H. Hsu, K.-Y. Kim, J. P. Singh, and B. Girod, "Building book inventories using smartphones", ACM Multimedia (MM), October 2010. [Paper]
  4. D. Chen, S. Tsai, K.-Y. Kim, C.-H. Hsu, J. P. Singh, and B. Girod, "Low-cost asset tracking using location-aware camera phones", SPIE Workshop on Applications of Digital Image Processing (ADIP), August 2010. [Paper]




Recognizing Video at a Glance



In our mobile video retrieval system, (1) a user takes a picture with the mobile phone of a video playing on the TV screen at home, (2) our recognition system automatically performs a visual search based on the picture taken and identifies the video and the temporal position within the video, and (3) the user can resume watching the video on the mobile device while traveling, from the same temporal position at which the picture was taken. No additional set-top box for the TV is needed; all the user does is download a software application on the mobile phone.




References:
  1. D. Chen, N.-M. Cheung, S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, "Dynamic selection of a feature-rich query frame for mobile video retrieval", IEEE International Conference on Image Processing (ICIP), September 2010. [Paper]




Streaming Mobile Augmented Reality for Media Cover Recognition



We have developed a real-time CD/DVD/book cover recognition system for mobile phones. The user can point the phone's camera at a media cover and see the object's identity in the viewfinder in about 1 second. The boundary of the object is also displayed for easy visibility against a cluttered background. Both the object's identity and the geometry are obtained from a server hosting a large database of media covers. As the user pans across a set of media covers, the system automatically recognizes new objects that come into view, without the user ever having to press a button. Since we employ state-of-the-art recognition algorithms, our system accurately recognizes objects in challenging environments and is robust against occlusions, clutter, geometric deformations, and photometric distortions.


Android


Symbian



References:
  1. D. Chen, S. Tsai, R. Vedantham, R. Grzeszczuk, and B. Girod, "Streaming mobile augmented reality on mobile phones", International Symposium on Mixed and Augmented Reality (ISMAR), October 2009. [Paper]




Photo Search for CD, DVD, Book Cover Recognition



Photo Search automatically recognizes CD/DVD/book covers with a mobile phone. A user simply snaps a picture of a CD/DVD/book cover on the mobile phone, and our recognition system quickly and reliably identifies the product. Useful information about that product--such as prices from vendors, a summary of the contents, and music/video clips--is then retrieved and shown on the phone. Our image database contains 1 million products. Photo Search uses state-of-the-art feature compression algorithms on the phone to reduce transmission latency and costs, and state-of-the-art visual search techniques on the server to return a fast, accurate identification.





References:
  1. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N.-M. Cheung, R. Vedantham, R. Grzeszczuk, and B. Girod, "Mobile product recognition", ACM Multimedia (MM), October 2010. [Paper]
  2. S. Tsai, D. Chen, J. P. Singh, and B. Girod, "Image-based retrieval with a camera-phone", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2009. [Paper]
  3. S. Tsai, D. Chen, J. P. Singh, and B. Girod, "Rate-efficient, real-time CD cover recognition on a camera-phone", ACM Multimedia (MM), October 2008. [Paper]




Tree Histogram Coding for Low-Rate Mobile Image Search



We have developed a coding scheme for large-scale image search that significantly reduces the bit rate compared to other state-of-the-art feature coding techniques. Previous image retrieval systems transmit compressed feature descriptors, which is well suited for pairwise image matching. For fast retrieval from large databases, however, scalable vocabulary trees are commonly employed. In our work, we demonstrate a rate-efficient codec designed for tree-based retrieval. By encoding a tree histogram, our codec achieves a more than 5x rate reduction compared to sending compressed feature descriptors. By discarding the order among the features, histogram coding requires a 1.5x lower rate than sending a tree node index for every feature. Probability models are developed for the tree histogram symbols to enable arithmetic coding. Our codec has been integrated into a real-time system for CD/DVD cover recognition with a camera-phone.
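The gain from discarding feature order can be seen with a back-of-envelope count: tree-based matching needs only the per-leaf counts, so the ordering of the feature list carries no information, and for distinct leaves dropping it saves exactly log2(n!) bits. The tree size and feature count below are assumed, illustrative numbers, not the paper's exact configuration.

```python
import math

# Assumed (illustrative) numbers: a vocabulary tree with 1M leaves
# and 500 local features extracted from the query image.
n_leaves = 10**6
n_features = 500

# Naive scheme: transmit one leaf index per feature, order preserved.
bits_per_index = math.ceil(math.log2(n_leaves))      # 20 bits per feature
naive_bits = n_features * bits_per_index

# Histogram scheme: matching uses only per-leaf counts, so discarding
# the order of the feature list saves log2(n_features!) bits when the
# visited leaves are distinct.
order_bits_saved = sum(math.log2(k) for k in range(2, n_features + 1))

histogram_bits = naive_bits - order_bits_saved
print(f"per-feature indices: {naive_bits} bits")
print(f"order-free histogram: ~{histogram_bits:.0f} bits "
      f"({naive_bits / histogram_bits:.2f}x reduction)")
```

With these assumptions the reduction works out to roughly 1.6x, consistent in magnitude with the 1.5x figure above; the remaining gains in the codec come from entropy coding the histogram symbols with the learned probability models.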





References:
  1. D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, J. P. Singh, and B. Girod, "Tree histogram coding for mobile image matching", IEEE Data Compression Conference (DCC), March 2009. [Paper]
  2. V. Chandrasekhar, D. Chen, Z. Li, G. Takacs, S. Tsai, R. Grzeszczuk, and B. Girod, "Low-rate image retrieval with tree histogram coding", International Mobile Multimedia Communications Conference (MobiMedia), September 2009. [Paper]




Multiview Vocabulary Trees for Severe Perspective Queries



A vocabulary tree (VT) built from fronto-parallel database images is ineffective at classifying query images that suffer from perspective distortion. In this work, we propose an efficient server-side extension of the single-view VT to a set of multiview VTs that may be simultaneously employed for image classification. Our solution results in significantly better retrieval performance when perspective distortion is present. Multiview VTs are used in our real-time CD/DVD cover recognition demo.
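A minimal sketch of the server-side idea: the same query is scored against several indexes, each built from database images synthetically warped to a different viewpoint, and each database image keeps its best score over all views. The toy sparse-histogram signatures and the overlap score below are illustrative stand-ins for the actual VT scoring.

```python
def score_multiview(query_sig, multiview_indexes, score_fn):
    """multiview_indexes: one {image_id: signature} dict per view.
    Returns image_id -> best similarity over all views, so a heavily
    slanted query can still match via the matching-view index."""
    best = {}
    for index in multiview_indexes:
        for image_id, sig in index.items():
            s = score_fn(query_sig, sig)
            if s > best.get(image_id, float("-inf")):
                best[image_id] = s
    return best

# Toy signatures as sparse visual-word counts; histogram-overlap score.
def overlap(a, b):
    return sum(min(a.get(w, 0), b.get(w, 0)) for w in a)

frontal = {1: {"w1": 3, "w2": 1}, 2: {"w5": 4}}
slanted = {1: {"w1": 1, "w3": 3}, 2: {"w5": 1, "w3": 2}}  # warped view
query = {"w1": 1, "w3": 2}  # query with strong perspective distortion

ranking = score_multiview(query, [frontal, slanted], overlap)
print(ranking)  # image 1 wins via the slanted-view index
```

Since the extra indexes live only on the server, the mobile client is unchanged; the cost is server-side memory and a constant-factor increase in scoring work per query.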





References:
  1. D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, J. P. Singh, and B. Girod, "Robust image retrieval using multiview scalable vocabulary trees", SPIE Visual Communications and Image Processing (VCIP), January 2009. [Paper]
