80 Million Tiny Images: A Visual Dictionary

Each tile in the mosaic is the arithmetic average of images relating to one of 53,464 nouns. The images for each word were obtained using Google's Image Search and other search engines. A total of 7,527,697 images were used, so each tile is the average of about 140 images. The average reveals the dominant visual characteristics of each word: for some words it is a recognizable image, while for others it is a colored blob.

The list of nouns was obtained from WordNet, a database compiled by lexicographers that records the semantic relationships between words. From this database we extract a tree-structured semantic hierarchy, which we use to arrange the tiles within the poster. We tessellate the poster using the hierarchy so that the proximity of two tiles reflects their semantic distance.

The poster thus explores the relationship between visual and semantic similarity. For a large part of our language the two are closely correlated, as shown by the extent of visual clustering within the poster. The large-scale groupings correspond to broad categories such as plants or people. Within the plant cluster, for example, tighter semantic groupings such as flowers or trees are visible. Each of these clusters in turn contains further groupings, all the way down to individual, highly specific nouns.
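
The per-tile averaging described above can be illustrated with a minimal sketch. It assumes the images for one noun have already been decoded to RGB and resized to a common size; the function name `average_images` and the nested-list image representation are hypothetical choices for illustration, not the pipeline actually used to build the poster.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # (r, g, b) channel values
Image = List[List[Pixel]]            # rows of pixels (assumed layout)


def average_images(images: Sequence[Image]) -> Image:
    """Return the per-pixel, per-channel arithmetic mean of equally sized images."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")
    n = len(images)
    # Sum each channel of each pixel across all images, then divide by the count.
    return [
        [
            tuple(sum(img[y][x][c] for img in images) / n for c in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two toy 1x2 "images" standing in for the roughly 140 images behind one tile.
image_a = [[(0, 0, 0), (255, 0, 0)]]
image_b = [[(100, 200, 50), (255, 100, 0)]]
tile = average_images([image_a, image_b])
print(tile)
```

Pixels on which the inputs agree keep their value in the average, while pixels on which they disagree drift toward an intermediate color. This is consistent with the observation that some tiles remain recognizable and others become colored blobs.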



