
CHI 2006: Games

I apologize for the delay in getting these accounts out and onto the blog; the wireless connection at the conference has been less than ideal. With any luck, I will have my notes refined and out by the middle of next week.

The first session was entitled "Games," though it could be more accurately called "How Carnegie Mellon University is Exploiting Procrastination to Their Benefit." Essentially, CMU is using games to collect data that would otherwise be very expensive to generate.

While computers are hailed as being much better than humans in terms of processing power and prowess, there are certain things that humans can do that computers cannot. For example, computer vision algorithms, though becoming more refined and powerful, still do a poor job of distinguishing objects in images, especially in comparison to humans. In order to produce better computer vision algorithms, they need to be trained on test data to refine the procedure. However, in order to get this data, you need labels for the pictures.

One possible way is to scour the web, much like Google Images, and associate keywords with images. However, this approach has severe limitations. The results are usually based on the image's filename and the HTML text around the image. If you search for a common term, such as "dog", you'll probably get a page or two of good results, but after that the pictures no longer really look like dogs. In addition, there are many pictures of actual dogs that Google will never discover, since neither the filename nor the surrounding context contains any reference to dogs. Alas, the best way to generate this data requires humans to take the time to look at images and dissect them into their components. To generate enough data, you need a lot of people to spend a great deal of time labeling these images. This takes a lot of man-hours and can become prohibitively expensive quite quickly. How do you get a mass of people to work on this project for as little as possible (a.k.a. free)?

The fine folk at CMU realized something: in 2003, people spent 9 billion hours playing solitaire. Compared to the 7 million human-hours needed to construct the Empire State Building, we spend a lot of time on the computer playing games: at work, at school, and at home. What if, just like Mary Poppins, we turn this task into a game and invite people to come "play"? What you get is the ESP game.
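To put those two figures side by side, here is a quick back-of-the-envelope calculation. It uses only the two numbers quoted above; the variable names are mine.

```python
# Figures quoted in the talk, as reported above.
solitaire_hours = 9_000_000_000    # hours of solitaire played in 2003
empire_state_hours = 7_000_000     # human-hours to build the Empire State Building

# How many Empire State Buildings' worth of labor is one year of solitaire?
buildings = solitaire_hours / empire_state_hours
print(round(buildings))  # 1286
```

In other words, one year of solitaire is roughly 1,300 Empire State Buildings' worth of human effort.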

The ESP game, an earlier product of CMU, was the first game used to collect this data to train computer vision algorithms. In the game, you are paired with another random player on the Internet; the objective is for you and your partner to come up with the same word. You receive points for every word that you match, and lose points for every round you pass. Seems impossible, right? Well, to help the players out, the game gives them an image on which to base their words. From this, CMU can build, for any given image, a database of words that describe it. How effective is it? CMU has been able to generate 15 million agreements from 75,000 players. Some players play in excess of 20 hours per week (not up there with WoW, but close). Since players are paired randomly and anonymously, it's hard for people to collaborate and circumvent the game, forcing them to play it as CMU intended. In addition, since the game is points-based with a leaderboard, people are motivated to be accurate with their descriptions of pictures.
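For the curious, the matching idea can be sketched in a few lines of Python. This is only my own illustration of the agreement step as described in the talk, not CMU's implementation: the function names, the image id, and the case-insensitive comparison are all assumptions on my part.

```python
from collections import Counter, defaultdict

def normalize(word):
    # Assumption: guesses are compared case-insensitively, ignoring stray spaces.
    return word.strip().lower()

def first_agreement(guesses_a, guesses_b):
    """Return the first of player A's guesses that player B also typed, or None."""
    seen_b = {normalize(w) for w in guesses_b}
    for w in guesses_a:
        if normalize(w) in seen_b:
            return normalize(w)
    return None

# image id -> how many pairs of players agreed on each word (hypothetical store)
labels = defaultdict(Counter)

def record_round(image_id, guesses_a, guesses_b):
    """Store the agreed word, if any, as a label for the image."""
    word = first_agreement(guesses_a, guesses_b)
    if word is not None:
        labels[image_id][word] += 1
    return word

record_round("img_001", ["puppy", "Dog", "grass"], ["animal", "dog"])
record_round("img_001", ["dog", "ball"], ["pet", "ball", "dog"])
print(labels["img_001"].most_common())  # [('dog', 2)]
```

The point of requiring agreement is that a word only becomes a label when two strangers independently type it, which is what makes the resulting labels trustworthy.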


While this method is effective in determining what is in the picture, it's not as good at determining where or in what context these labels appear in the image, which limits the value of the data. For that reason, the authors of the paper used the success and results of the ESP game to create Peekaboom.

Like ESP, this game involves two randomly assigned players who work together to get through as many images as possible. In this game, however, only one of the players (the "boom") gets to see the image; the other player (the "peek") starts out with a black screen. The boom is given a word (which was collected from the ESP game) and must use the image to convey that word to the peek. When the boom left-clicks on the image, a small area around that click is uncovered, revealing part of the picture to the peek. In addition, the boom can "ping" the image to point out a certain point or area to the peek (indicated on the peek's screen by a pink pulsating dot). Finally, the boom can give one of four clues: "noun," "related noun," "verb," and "text."

In the game, you are given points not only for correct guesses, but also for giving hints. While this seems counterintuitive, the researchers want to know the context of the label in relation to the image, so by offering more points for providing this information, they encourage more players to do so.

Over time, the researchers collect enough information to create a bounding box around the part of the image that the label refers to. In addition, they can collect enough bounding boxes to identify many of the relevant features of the image (see the paper for examples).
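As a rough illustration of how reveal clicks could turn into a bounding box, here is a minimal sketch. It is not the method from the paper: I am assuming that each click uncovers a circle of a fixed radius (the 20-pixel value and the image size below are made up) and that the box is simply the smallest rectangle covering all of the uncovered circles.

```python
def bounding_box(clicks, radius, width, height):
    """Smallest rectangle covering every revealed circle, clamped to the image.

    clicks: list of (x, y) pixel coordinates of reveal clicks.
    Returns (left, top, right, bottom), or None if there were no clicks.
    """
    if not clicks:
        return None
    xs = [x for x, _ in clicks]
    ys = [y for _, y in clicks]
    left = max(0, min(xs) - radius)
    top = max(0, min(ys) - radius)
    right = min(width - 1, max(xs) + radius)
    bottom = min(height - 1, max(ys) + radius)
    return (left, top, right, bottom)

# Hypothetical clicks collected for one (image, word) pair.
clicks = [(120, 80), (135, 95), (110, 100)]
box = bounding_box(clicks, radius=20, width=320, height=240)
print(box)  # (90, 60, 155, 120)
```

With clicks pooled from many pairs of players, you would presumably want to discard outliers before drawing the box, but that is beyond this sketch.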


The authors of the next paper argue that, while computers do very well at dealing with facts, they lack common-sense knowledge about the world (though I feel that a significant portion of the population also lacks this knowledge). For example, if I tell Google that "My cat is sick," I should get something about vets; however, I will probably get something else. The solution, it seems, is to use humans to generate this information, and, once again, CMU has turned to games to do it.

The inspiration for this game came from the game Taboo. In Taboo, a narrator must convey a word or phrase to a team without using any of five "taboo" words given to the narrator. Usually, the narrator must rely on common-sense knowledge to get the team to guess the word (though there are definitely other ways of doing this).

Again, two players are randomly assigned to one another. The players take turns being the "narrator." The narrator is given a word and must choose from a selection of sentence templates, each with a blank where he/she must provide the missing keywords. For example, if the word was "turkey," the narrator might use the template "It's a type of ______" and fill in the blank with "bird." Again, points are awarded for each correct match.
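The nice thing about fill-in-the-blank sentences is that every clue doubles as a structured fact. Here is a small sketch of that idea, under my own assumptions: only the "It's a type of" sentence comes from the talk, while the second sentence, the relation names, and the function names are hypothetical.

```python
BLANK = "______"

# Sentence frames the narrator can pick from.
TEMPLATES = {
    "type_of": "It's a type of " + BLANK,   # the example given in the talk
    "used_for": "It's used for " + BLANK,   # hypothetical second frame
}

def make_clue(relation, filler):
    """Fill the blank of the chosen frame with the narrator's keywords."""
    return TEMPLATES[relation].replace(BLANK, filler)

def record_fact(facts, secret_word, relation, filler):
    """Store the clue as a (word, relation, filler) triple and return the clue text."""
    facts.append((secret_word, relation, filler))
    return make_clue(relation, filler)

facts = []
clue = record_fact(facts, "turkey", "type_of", "bird")
print(clue)   # It's a type of bird
print(facts)  # [('turkey', 'type_of', 'bird')]
```

The guesser only ever sees the clue text; the triple is what ends up in the common-sense database.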

Unlike the earlier games, this open model invites disagreement over whether or not a user-provided phrase accurately describes a word. For example, it might be up for debate whether or not the phrase "This is a type of god" applies to the word "Buddha," but the researchers stated that 85% of the phrases are rated good. Phrases are validated by pairing a single player with a bot; the bot replays user-submitted phrases to see how effective they are (as determined by the human player's success in guessing the word).
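The bot-based validation could be scored along these lines. This is purely a sketch under my own assumptions: the talk did not say how replay results are turned into a verdict, so the success threshold and the minimum number of replays below are invented for illustration.

```python
def success_rate(outcomes):
    """Fraction of bot replays in which the human guessed the word."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def is_good(outcomes, threshold=0.5, min_plays=3):
    """Accept a phrase once it has been replayed enough and guessed often enough.

    threshold and min_plays are hypothetical values, not figures from the talk.
    """
    return len(outcomes) >= min_plays and success_rate(outcomes) >= threshold

# One True/False per replay: did the human player guess the word?
print(is_good([True, True, False]))          # True  (2 of 3 replays succeeded)
print(is_good([True]))                       # False (too few replays so far)
print(is_good([False, False, True, False]))  # False (only 1 of 4 succeeded)
```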


The last game aimed to provide a detailed descriptive paragraph for each image. The motivation behind this project was to make the web more accessible to visually impaired people by providing more than just an "alt" attribute to replace each image. To be brief (since it follows quite closely from the other games...heck, all three presentations had the exact same first author, though he couldn't come because he forgot to get a visa to travel to Canada), one player is given an image and must write a paragraph describing it so that the other player can select that image from a collection of similar images. I had issues with how descriptive they could get with these images (since you are time-pressured and discouraged from doing more than what is necessary to distinguish the image), but I applaud the effort.

I have to mention that there was another presentation that didn't quite fit in this category: DRUID is a 2.5D drawing program in which illustrators can easily draw figures that cannot easily be defined by layers (e.g. figures that intersect and are both in front of and behind one another...think of the interlocking Olympic rings). Though it had nothing to do with games, it was pretty cool how quickly it was able to determine all of the loops and generate the resulting image.

Next time: a "critique" of the design of the XBOX 360...let the ranting begin!

