Test Collection for Image Search Result Diversity in Flickr

The test collection was created as follows. First, a set of 20 ambiguous keyword queries was identified. Every query was manually annotated with its set of possible interpretations (categories). The keyword queries and their possible interpretations are shown in Table 1. Next, the Flickr APIs were used to fetch the images and the related metadata for each query. The search API takes a query string as input and performs a free-text search on Flickr images; images whose title, description, or tags match the query terms are returned. Other Flickr APIs were then used to retrieve the relevant metadata for each image in the result set. Note that, while crawling, images without tag or Flickr group/photoset information were discarded. The number of images crawled for each query is shown in the column titled "# of images" in Table 1. For each query, a JSON file containing the crawled metadata was created. The schema of this JSON file is shown in Figure 1.
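The discard step described above can be sketched in a few lines. This is only an illustration, not the crawler that was actually used: the field names come from Figure 1, while the function names and the exact rule (keep a photo only if it has at least one tag and belongs to at least one photoset or photogroup) are assumptions, since the text does not spell out the precise condition.

```python
from __future__ import annotations


def has_required_metadata(photo: dict) -> bool:
    """Assumed rule: at least one tag AND at least one photoset or photogroup."""
    has_tags = bool(photo.get("photo_tags"))
    has_collection = bool(photo.get("photoset")) or bool(photo.get("photogroup"))
    return has_tags and has_collection


def filter_crawled(photos: list[dict]) -> list[dict]:
    """Drop photos that lack the metadata needed later on."""
    return [p for p in photos if has_required_metadata(p)]


# Hypothetical sample records, shaped like entries of the "data" array.
sample = [
    {"id": "1", "photo_tags": ["bird"], "photoset": [{"id": "s1"}], "photogroup": []},
    {"id": "2", "photo_tags": [], "photoset": [{"id": "s2"}], "photogroup": []},
    {"id": "3", "photo_tags": ["car"], "photoset": [], "photogroup": []},
]
kept_ids = [p["id"] for p in filter_crawled(sample)]
print(kept_ids)
```

If the intended rule was looser (for example, tags alone being sufficient), only `has_required_metadata` would need to change.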

Three human evaluators were then asked to label the resulting data set. Each labeler was shown the images from a query's result set and asked to label each image with one of the categories associated with that query. Labelers were allowed to add categories where needed. Any image that was judged irrelevant, or whose category was not evident, was assigned to the "others" category. Finally, the ground-truth category of each image was determined by majority vote. This labeling task was performed for all 20 queries. The result of the categorization is stored in a separate JSON file, whose schema is shown in Figure 2.
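As a rough sketch of the majority voting step, the function below picks the label chosen by more than half of the labelers. The function name is hypothetical, and the tie-breaking behaviour (falling back to "others" when all three labelers disagree) is an assumption; the text above does not say how such cases were resolved.

```python
from collections import Counter


def majority_label(labels, fallback="others"):
    """Return the label chosen by a strict majority, else the fallback.

    The fallback to "others" on disagreement is an assumption, not a
    documented rule of the collection.
    """
    if not labels:
        raise ValueError("at least one label is required")
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count > len(labels) / 2:
        return top_label
    return fallback


# Two of three labelers agree, so their label wins.
agreed = majority_label(["bird", "bird", "others"])
# All three disagree, so the assumed fallback is used.
split = majority_label(["bird", "car", "animal"])
print(agreed, split)
```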

Note: For every query, two JSON files are created:
- `query_data.json`: contains all the images related to the keyword query and all the relevant social tags associated with the images in the result set
- `query_result_categorization.json`: contains the result of the categorization/labeling

Both files are stored in a folder named after the query number (first column of `Table 1`).

Project Structure

queries
├── ...
├── <query_name>
│   ├── query_data.json
│   └── query_result_categorization.json
└── ...
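Given this layout, both files for one query can be loaded with the standard library. The sketch below assumes the repository root contains the `queries` folder; `load_query` and its parameters are hypothetical names, and the per-query folder name is passed in as-is rather than guessed.

```python
import json
from pathlib import Path


def load_query(root, query_folder):
    """Load both JSON files for one query folder under <root>/queries/."""
    folder = Path(root) / "queries" / query_folder
    with open(folder / "query_data.json", encoding="utf-8") as f:
        query_data = json.load(f)
    with open(folder / "query_result_categorization.json", encoding="utf-8") as f:
        categorization = json.load(f)
    return query_data, categorization
```

A call would look like `query_data, categorization = load_query(".", "<query_name>")`, with the placeholder replaced by an actual folder name from the repository.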

JSON Schema for query_data.json

Figure 1

@object(2) {											\\ top-level object
	"about": @object(2) {								\\ about object
		"query": "string",								\\ query name
		"date": "string",								\\ date when the query was crawled
		"photos_license_creativecommons": "string"		\\ number of photos with some creative commons license
	},
	"data": @array [									\\ data object, list of objects for each photo
		@object(11) {									\\ object for a photo
			"id": "string",								\\ photo ID in flickr
			"photo_physical": "string",					\\ url of photo in jpg format of size medium 640
			"photo_web_url": "string",					\\ url of photo page in flickr
			"photo_tags": @array [						\\ list of tags of the photo in flickr
			],
			"photo_owner": @object(3) {					\\ details of the owner of the photo
				"owner": "string",
				"username": "string",
				"realname": "string"
			},
			"photo_metadata": @object(2) {				\\ metadata of photo in flickr
				"title": "string",
				"description": "string"
			},
			"owner_groups": @array [					\\ list of groups the owner is a member of
				@object(2) {
					"nsid": "string",
					"name": "string"
				}
			],
			"photoset": @array [						\\ list of visible photosets to which the photo belongs
				@object(4) {
					"id": "string",						\\ ID of photoset
					"title": "string",					\\ title of photoset
					"photoset_web_url": "string",		\\ url of photoset in flickr
					"top_tags": @array [				\\ most frequent tags from first 50 photos in photoset (at most 20)
					]
				}
			],
			"photogroup": @array [						\\ list of visible flickr photogroups to which the photo belongs
				@object(3) {
					"id": "string",						\\ ID of photogroup
					"title": "string",					\\ title of photogroup
					"photogroup_web_url": "string"		\\ url of photogroup in flickr
				}
			],
			"photo_comment": @array [					\\ list of comments for the photo
				@object(3) {
					"id": "string",						\\ ID of comment
					"author_id": "string",				\\ ID of author of the comment
					"date": "string"					\\ timestamp when comment was posted
				}
			],
			"photo_favourite": @array [					\\ list of people who have favorited the photo
				@object(3) {
					"nsid": "string",					\\ ID of user
					"username": "string",				\\ username of user
					"date": "string"					\\ timestamp of the action
				}
			],
			"license": @object(2) {						\\ license associated with the photo
				"name": "string",						\\ type of license
				"url": "string"							\\ details of the license
			}
		}
	]
}

Note: The user IDs, usernames, and real names have been anonymized to USER_<number>. A given user is referred to by the same anonymized name across the whole dataset.
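As an illustration of how the `query_data.json` structure in Figure 1 might be consumed, the sketch below counts how often each tag occurs across the photos of one query. It assumes that every entry of `photo_tags` is a plain string, which Figure 1 does not state explicitly; the function name and the sample data are hypothetical.

```python
from collections import Counter


def tag_frequencies(query_data):
    """Count lower-cased tags over all photos in a query_data-style dict.

    Assumes each element of "photo_tags" is a plain string.
    """
    counts = Counter()
    for photo in query_data.get("data", []):
        counts.update(tag.lower() for tag in photo.get("photo_tags", []))
    return counts


# Hypothetical, heavily trimmed sample in the shape of Figure 1.
sample = {
    "about": {"query": "jaguar"},
    "data": [
        {"id": "1", "photo_tags": ["Jaguar", "car"]},
        {"id": "2", "photo_tags": ["jaguar", "zoo"]},
    ],
}
freqs = tag_frequencies(sample)
print(freqs.most_common(1))
```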

JSON Schema for query_result_categorization.json

Figure 2

@object(2) {											\\ top-level object
	"about": @object(1) {								\\ about object
		"query": "string"								\\ query name
	},
	"categorization": @array [							\\ categorization object
		@object(2) {									\\ object for a category
			"name": "string",							\\ category name
			"images": @array [							\\ list of photo IDs in the category
			]
		}
	]
}
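A common first step with the categorization file is to invert it into a photo-ID-to-category lookup and to count images per category. The sketch below assumes each photo ID appears in exactly one category (consistent with a single ground-truth label per image); the function names and sample data are hypothetical.

```python
def photo_to_category(categorization):
    """Map each photo ID to its category name.

    Assumes every photo ID appears in exactly one category; if an ID were
    repeated, the last category listed would win.
    """
    mapping = {}
    for category in categorization.get("categorization", []):
        for photo_id in category.get("images", []):
            mapping[photo_id] = category["name"]
    return mapping


def category_sizes(categorization):
    """Return the number of images listed under each category name."""
    return {
        category["name"]: len(category.get("images", []))
        for category in categorization.get("categorization", [])
    }


# Hypothetical sample in the shape of Figure 2.
sample = {
    "about": {"query": "jaguar"},
    "categorization": [
        {"name": "animal", "images": ["101", "102"]},
        {"name": "car", "images": ["201"]},
    ],
}
lookup = photo_to_category(sample)
sizes = category_sizes(sample)
print(lookup["201"], sizes)
```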

Query Statistics

Table 1

| # | Query | # of images | Category name | # of images in category |
|---|-------|-------------|---------------|-------------------------|
| 1 | argos | 434 | Argos ancient city in Greece | 32 |
| | | | Argo Racing Car | 23 |
| | | | Argo train network | 31 |
| | | | Argo Gold Mine and Mill | 34 |
| | | | Argo (oceanography) | 14 |
| | | | UK retailer | 126 |
| | | | ben affleck movie | 17 |
| | | | football team | 19 |
| | | | oil company | 9 |
| | | | others | 129 |
| 2 | cardinal | 425 | cardinal flower | 5 |
| | | | beetle | 7 |
| | | | cardinal football team (NFL) - american football | 14 |
| | | | official of the catholic church or church related | 15 |
| | | | bird | 341 |
| | | | professional baseball team | 1 |
| | | | others | 42 |
| 3 | eagle | 469 | eagle scout | 2 |
| | | | traffic signal system | 6 |
| | | | military aircraft | 11 |
| | | | galaxy/star system | 11 |
| | | | Sports team | 3 |
| | | | Rock band | 2 |
| | | | Automobile | 4 |
| | | | bird | 289 |
| | | | others | 141 |
| 4 | greyhound | 415 | Greyhound Tank LEGO | 5 |
| | | | Greyhound train | 5 |
| | | | The Grumman Greyhound (cargo aircraft) | 3 |
| | | | Greyhound long distance bus service and terminals | 279 |
| | | | dog breed | 79 |
| | | | others | 44 |
| 5 | indian | 375 | Indian Gap School Texas | 13 |
| | | | Indian Paintbrush Plant | 4 |
| | | | Indian Summer Festival | 12 |
| | | | bird specie | 34 |
| | | | native american indian (red indian) | 37 |
| | | | motorcyle brand | 62 |
| | | | india related dance, food etc | 130 |
| | | | others | 83 |
| 6 | jaguar | 429 | animal | 136 |
| | | | car | 271 |
| | | | military aircraft | 3 |
| | | | others | 19 |
| 7 | java | 438 | Insects at Java indonesia island | 28 |
| | | | Java plum | 11 |
| | | | Java Sparrow | 42 |
| | | | java Indonesia island | 160 |
| | | | coffee | 29 |
| | | | Java (including javascript) programming language related | 8 |
| | | | others | 160 |
| 8 | kings | 414 | Places having name - kings (general) | 39 |
| | | | King-Vulture | 7 |
| | | | King penguins | 10 |
| | | | Tank / Lego Tank | 7 |
| | | | King Abdulaziz Endowment | 6 |
| | | | King American Ambulance company | 9 |
| | | | King County, Washington | 6 |
| | | | Kings of Leon Rock band | 19 |
| | | | The Red Devils - blues band | 27 |
| | | | London Kings Cross railway station | 22 |
| | | | King Cobra | 12 |
| | | | royal family (general) | 51 |
| | | | kings college | 32 |
| | | | others | 167 |
| 9 | oasis | 426 | Oasis Monorail | 6 |
| | | | Oasis of the Seas (Cruise Ship) | 155 |
| | | | airlines | 6 |
| | | | desert oasis (vegetation in desert) | 91 |
| | | | rock band | 95 |
| | | | others | 73 |
| 10 | queen | 442 | Flowers | 11 |
| | | | Places having name - queen (general) | 71 |
| | | | Queen Anne plant | 12 |
| | | | Queen Butterfly | 21 |
| | | | Queens of Noise - Musical Album - American rock band The Runaways | 20 |
| | | | Queens New York City | 22 |
| | | | Queens of the Stone Age - an American rock band | 10 |
| | | | Queen Farida | 8 |
| | | | Queenadreena - an English alternative rock | 4 |
| | | | queen monarch (UK) | 57 |
| | | | queen bee | 9 |
| | | | ships | 88 |
| | | | rock band | 6 |
| | | | others | 103 |
| 11 | saturn | 422 | Saturn V Spacecraft | 16 |
| | | | ship | 7 |
| | | | Sega Saturn Video Game Console | 2 |
| | | | god : roman mythology | 4 |
| | | | car | 105 |
| | | | planet Saturn | 244 |
| | | | others | 44 |
| 12 | scorpion | 462 | scorpion fly | 39 |
| | | | Vehicle or Vessel | 24 |
| | | | Heavy metal band | 131 |
| | | | Fish | 11 |
| | | | Animal | 146 |
| | | | others | 111 |
| 13 | seal | 410 | Seal (musician) | 8 |
| | | | Seal Rocks | 30 |
| | | | Seal Beach Pier | 11 |
| | | | seal (emblem) | 12 |
| | | | navy seal (special ops unit) | 15 |
| | | | animal | 296 |
| | | | others | 38 |
| 14 | spider | 412 | Spider Lily and Plants | 8 |
| | | | spider's web | 35 |
| | | | movie character | 4 |
| | | | automobile | 4 |
| | | | insect | 325 |
| | | | others | 36 |
| 15 | triumph | 392 | Triumph Fashion Show | 6 |
| | | | Cars | 182 |
| | | | Motorcycles | 189 |
| | | | others | 15 |
| 16 | giant | 408 | Giant Redwood | 6 |
| | | | Giant Statues/Sculptures/Structures (general) | 24 |
| | | | Giant Food (supermarket chain) | 10 |
| | | | Giant Pacific octopus | 5 |
| | | | Giant Cuttlefish | 5 |
| | | | Giant anteater | 9 |
| | | | Camera Obscura (San Francisco, California) | 3 |
| | | | Giant Squid | 3 |
| | | | The Giant’s Causeway (Ireland) | 39 |
| | | | giant swallowtail - butterfly specie | 28 |
| | | | giant bike company | 21 |
| | | | giant panda | 16 |
| | | | giant schnauzer (dog breed) | 42 |
| | | | baseball team | 7 |
| | | | others | 190 |
| 17 | tesla | 387 | Nikola Tesla (Scientist) | 8 |
| | | | Tesla Coil | 32 |
| | | | Tesla Motors | 266 |
| | | | american hard rock band | 26 |
| | | | others | 55 |
| 18 | tiger | 417 | Tiger Stadium | 4 |
| | | | Common Tiger (butterfly) | 11 |
| | | | Oncilla (tiger cat) | 14 |
| | | | tiger bettle (insect) | 5 |
| | | | tiger lilly (flower) | 7 |
| | | | buses | 6 |
| | | | tiger shark | 17 |
| | | | tiger airlines | 1 |
| | | | military tank | 8 |
| | | | golfer tiger-woods | 10 |
| | | | animal | 289 |
| | | | others | 45 |
| 19 | prince | 424 | Prince Poppycock (singer) | 31 |
| | | | The Prince of Wales | 11 |
| | | | automobile | 5 |
| | | | fictional/movie/animation character | 17 |
| | | | place/city/street (location) e.g. prince ontario | 101 |
| | | | prince (musician) | 48 |
| | | | UK Royal Family | 108 |
| | | | others | 103 |
| 20 | wilson | 430 | Mount Wilson | 15 |
| | | | Wilsons Promontory National Park | 33 |
| | | | Wilson's Storm | 8 |
| | | | Wilson's Snipe | 47 |
| | | | Wilson's Phalarope | 27 |
| | | | Wilson's Plover | 18 |
| | | | Robert J. Wilson (Theater Director) | 3 |
| | | | geographical location (mountain, city view etc) | 11 |
| | | | president of USA | 49 |
| | | | bird | 47 |
| | | | others | 172 |
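Because each image receives a single ground-truth category, the per-category counts of a query would be expected to add up to that query's total. The sketch below checks this for two rows copied from Table 1 (jaguar and triumph); the helper name and the dictionary layout are hypothetical, and the same check could be applied to the other queries.

```python
# Totals and per-category counts copied from Table 1 for two queries.
table_rows = {
    "jaguar": (429, {"animal": 136, "car": 271, "military aircraft": 3, "others": 19}),
    "triumph": (392, {"Triumph Fashion Show": 6, "Cars": 182, "Motorcycles": 189, "others": 15}),
}


def check_totals(rows):
    """Return, per query, whether category counts sum to the query total."""
    return {
        query: sum(categories.values()) == total
        for query, (total, categories) in rows.items()
    }


totals_ok = check_totals(table_rows)
print(totals_ok)  # {'jaguar': True, 'triumph': True}
```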

