To create this test collection, the following process was followed. First, a set of 20 ambiguous keyword queries was
identified. Every query was manually annotated with its set of possible interpretations (categories). The keyword queries
and their possible interpretations are shown in Table 1. Next, the Flickr APIs were used to fetch the images and the
related metadata for each query. This was done using the `flickr.photos.search` API, which takes a query string as input
and performs a free-text search on Flickr images. Images whose title, description or tags match the query terms are
returned. Other Flickr APIs were then used to retrieve the relevant metadata for each image in the result set. Note
that, while crawling, images without tag or Flickr group/photoset information were discarded. The number of images crawled
for each query is shown in the "# of images" column of Table 1. For each query, a JSON file containing the crawled
metadata was created. The schema of this JSON file is shown in Figure 1.
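The crawling step can be sketched as follows. This is a minimal illustration, not the crawler that was actually used: the helper names `build_search_url` and `has_required_metadata`, the chosen `extras` fields and the page size are assumptions. The filter encodes one possible reading of the discard rule (a photo is kept only if it has tags and belongs to at least one photoset or group), expressed on the field names of the data schema.

```python
from urllib.parse import urlencode

# Public Flickr REST endpoint.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(query, api_key, page=1, per_page=100):
    """Build a flickr.photos.search request URL for a free-text query.

    Hypothetical helper: the parameter choices below are assumptions,
    not the settings used to build this collection.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": query,  # free-text search over title, description and tags
        "extras": "tags,description,license,owner_name,url_z",
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,  # ask for plain JSON instead of JSONP
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def has_required_metadata(photo):
    """Return True if a crawled photo record should be kept.

    Assumed reading of the discard rule: the photo needs at least one tag
    and at least one photoset or group.
    """
    has_tags = bool(photo.get("photo_tags"))
    has_context = bool(photo.get("photoset")) or bool(photo.get("photogroup"))
    return has_tags and has_context
```

The URL can be fetched with any HTTP client. Building it separately from the request keeps the sketch testable without network access or a real API key.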
Three human evaluators were then asked to label the resulting data set. Each labeler was shown the images from a query's
result set and asked to label each image with one of the categories associated with that query. Labelers were allowed to
add categories if the situation required. Any image that was judged irrelevant, or whose category was not evident, was
assigned to the `other` category. Finally, the ground-truth category of each image was determined by majority voting.
This labeling task was performed for all 20 queries. The result of this categorization is stored in a separate JSON file,
the schema for which is shown in Figure 2.
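The majority-voting step can be illustrated with a short sketch. The function name `majority_label` is hypothetical, and the tie-breaking rule is an assumption: the text does not say how a three-way disagreement was resolved, so this sketch falls back to the `other` category in that case.

```python
from collections import Counter


def majority_label(labels, fallback="other"):
    """Return the label chosen by a strict majority of labelers.

    Assumption: when no label has a strict majority (for example, three
    labelers pick three different categories), the fallback is returned.
    """
    if not labels:
        return fallback
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count > len(labels) / 2:
        return top_label
    return fallback
```

With three labelers, a label wins as soon as two of them agree on it.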
- query_data.json
- This file contains all the images related to the keyword query and all the relevant social tags associated with the images in the result set
- query_result_categorization.json
- This file contains the result of the categorization/labeling
.
├── queries
| |
| |──
| |
| ├──
| |
| ├── <query_name>
| | ├── query_data.json
| | └── query_result_categorization.json
| |
| |──
| |
|
├── README.md
└── LICENSE
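Given the layout above, one way to walk the collection is sketched below. The function name `iter_queries` is hypothetical, and the sketch assumes every query folder sits directly under `queries/` and holds both JSON files. Folders missing either file are skipped.

```python
import json
from pathlib import Path


def iter_queries(root):
    """Yield (query_name, query_data, categorization) for each query folder."""
    for query_dir in sorted(Path(root, "queries").iterdir()):
        data_path = query_dir / "query_data.json"
        cat_path = query_dir / "query_result_categorization.json"
        # Skip anything that is not a complete query folder.
        if not (data_path.is_file() and cat_path.is_file()):
            continue
        with data_path.open(encoding="utf-8") as f:
            query_data = json.load(f)
        with cat_path.open(encoding="utf-8") as f:
            categorization = json.load(f)
        yield query_dir.name, query_data, categorization
```

A typical call would be `for name, data, cats in iter_queries("."):` from the repository root.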
@object(2) { \\ top-level object
"about": @object(2) { \\ about object
"query": "string", \\ query name
"date": "string", \\ date when the query was crawled
"photos_license_creativecommons": "string" \\ number of photos with some creative commons license
},
"data": @array [ \\ data object, list of objects for each photo
@object(11) { \\ object for a photo
"id": "string", \\ photo ID in flickr
"photo_physical": "string", \\ url of photo in jpg format of size medium 640
"photo_web_url": "string", \\ url of photo page in flickr
"photo_tags": @array [ \\ list of tags of the photo in flickr
"string"
],
"photo_owner": @object(3) { \\ details of the owner of the photo
"owner": "string",
"username": "string",
"realname": "string"
},
"photo_metadata": @object(2) { \\ metadata of photo in flickr
"title": "string",
"description": "string"
},
"owner_groups": @array [ \\ list of groups the owner is a member of
@object(2) {
"nsid": "string",
"name": "string"
}
],
"photoset": @array [ \\ list of visible photosets to which the photo belongs
@object(4) {
"id": "string", \\ ID of photoset
"title": "string", \\ title of photoset
"photoset_web_url": "string", \\ url of photoset in flickr
"top_tags": @array [ \\ most frequent tags from first 50 photos in photoset (at most 20)
"string"
]
}
],
"photogroup": @array [ \\ list of visible flickr photogroups to which the photo belongs
@object(3) {
"id": "string", \\ ID of photogroup
"title": "string", \\ title of photogroup
"photogroup_web_url": "string" \\ url of photogroup in flickr
}
],
"photo_comment": @array [ \\ list of comments for the photo
@object(3) {
"id": "string", \\ ID of comment
"author_id": "string", \\ ID of author of the comment
"date": "string" \\ timestamp when comment was posted
}
],
"photo_favourite": @array [ \\ list of people who have favorited the photo
@object(3) {
"nsid": "string", \\ ID of user
"username": "string", \\ username of user
"date": "string" \\ timestamp of the action
}
],
"license": @object(2) { \\ license associated with the photo
"name": "string", \\ type of license
"url": "string" \\ details of the license
}
}
]
}
Note: The user IDs, usernames and real names have been anonymized to `USER_<number>`. A given user is referred to by the same anonymized name across the whole dataset.
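As an illustration of how the data schema can be consumed, the sketch below counts how many photos in a query's result set carry each tag. It operates on an already-parsed `query_data.json` object. The function name `tag_counts` is hypothetical, and only the `data` and `photo_tags` fields from the schema above are used.

```python
from collections import Counter


def tag_counts(query_data):
    """Count, for each tag, the number of photos in the result set that use it."""
    counts = Counter()
    for photo in query_data["data"]:
        # Use a set so that a tag repeated on one photo is counted once.
        counts.update(set(photo.get("photo_tags", [])))
    return counts
```

Calling `tag_counts(query_data).most_common(10)` gives the ten most widely used tags for a query.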
@object(2) { \\ top-level object
"about": @object(1) { \\ about object
"query": "string" \\ query name
},
"categorization": @array [ \\ categorization object
@object(2) { \\ object for a category
"name": "string", \\ category name
"images": @array [ \\ list of photo IDs in the category
"string"
]
}
]
}
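The categorization schema lists photo IDs per category. For evaluation it is often more convenient to look up the ground-truth category of a given photo, which the sketch below does by inverting the structure. The function name `photo_to_category` is hypothetical, and the sketch assumes each photo ID appears in at most one category (if it appears in several, the last one wins).

```python
def photo_to_category(categorization):
    """Map each photo ID to the name of its ground-truth category."""
    mapping = {}
    for category in categorization["categorization"]:
        for photo_id in category["images"]:
            mapping[photo_id] = category["name"]
    return mapping
```

The resulting dictionary can be joined with the `id` field of the photo records in `query_data.json`.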
# | Query | # of images | Categories
---|---|---|---
1 | argos | 434 |
2 | cardinal | 425 |
3 | eagle | 469 |
4 | greyhound | 415 |
5 | indian | 375 |
6 | jaguar | 429 |
7 | java | 438 |
8 | kings | 414 |
9 | oasis | 426 |
10 | queen | 442 |
11 | saturn | 422 |
12 | scorpion | 462 |
13 | seal | 410 |
14 | spider | 412 |
15 | triumph | 392 |
16 | giant | 408 |
17 | tesla | 387 |
18 | tiger | 417 |
19 | prince | 424 |
20 | wilson | 430 |