Community Wishlist Survey 2017/Archive/Image recognition and tagging

Image recognition and tagging

  • Problem: Describing and categorising images takes so long that many users don't do it.
  • Who would benefit: Wikimedia Commons, Wikipedia, contributors
  • Proposed solution: There are two solutions, which may be used alone or combined. One is to retrieve information from GPS coordinates. If the coordinates are recorded by the camera, they are often inaccurate, but we can still estimate where the picture was taken (in which settlement: village or municipality). This estimate can be written to the file description and used to create a category. At the same time, we can tag those files with a template stating that this technology was used and that the information should be checked by a human. If the coordinates are set manually, the guess is more precise and more likely to be correct.

The other technology is image recognition, provided e.g. by Google, which recognises basic objects (church).

Combining these technologies may provide better recognition (Church of St. Peter).

  • More comments:
  • Phabricator tickets:
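
The GPS half of the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: the three-entry gazetteer, the function names and the returned field names are all hypothetical, and a real tool would query a geocoding service or Wikidata rather than a hard-coded list. It picks the nearest known settlement for a pair of coordinates, suggests a category, and always flags the result for human review, as the proposal asks.

```python
import math

# Hypothetical mini-gazetteer (name, latitude, longitude), for illustration only.
SETTLEMENTS = [
    ("Klatovy", 49.3955, 13.2951),
    ("Plzeň", 49.7384, 13.3736),
    ("Prague", 50.0755, 14.4378),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def suggest_location(lat, lon, source="camera"):
    """Return a category suggestion for the nearest known settlement.

    `source` records whether the coordinates came from the camera or were
    set manually; either way the suggestion is marked for human review.
    """
    name, dist = min(
        ((n, haversine_km(lat, lon, s_lat, s_lon))
         for n, s_lat, s_lon in SETTLEMENTS),
        key=lambda pair: pair[1],
    )
    return {
        "settlement": name,
        "distance_km": round(dist, 1),
        "category": f"Category:{name}",
        "coordinate_source": source,
        "needs_human_review": True,
    }


if __name__ == "__main__":
    print(suggest_location(49.40, 13.30))
```

A label from an image recognition service (for example "church") could be stored alongside this record, but turning it into a specific name such as "Church of St. Peter" would need a point-of-interest lookup that this sketch does not attempt.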

Discussion

As someone who categorizes images, I welcome any information that can be generated automatically from the GPS coordinates. Often the building or village name is not enough. Sometimes the city name is not enough. The Panoramio and Flickr tags are often not enough. I have to know the province or district or country name, and the only way to find that is to load a Google map. I agree that the automatically generated information should be tagged as temporary within the file, or alternatively the image should be placed in a category requiring human attention. Downtowngal (talk) 03:16, 9 November 2017 (UTC)

Yup, that's well said: a category requiring human attention.
Anyway, if you have a village name, the categories of the higher administrative units can be added automatically, since we are able to write code that does it.--Juandev (talk) 09:37, 9 November 2017 (UTC)
Even having a country or region name would be useful in order to narrow it down. Commons has a huge backlog of uncategorized images and media, some of which could be auto-tagged based on GPS info, and then placed in commons:Category:Media_needing_category_review (or perhaps even location-based subcategories, for instance, "Media needing category review geotagged in India") for human review. Dragfyre (talk) 18:26, 9 November 2017 (UTC)
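
The location-based review subcategories suggested above could be named with something like the following minimal sketch. It assumes a hypothetical place record with optional "region" and "country" keys (for example, the output of a reverse-geocoding step); the function name is made up for illustration, and only the category wording comes from the comment above.

```python
BASE_CATEGORY = "Media needing category review"


def review_category(place):
    """Pick the most specific review category available for a place record.

    `place` is a hypothetical dict with optional "region" and "country" keys.
    Falls back to the general review category when neither is known.
    """
    for level in ("region", "country"):
        name = place.get(level)
        if name:
            return f"{BASE_CATEGORY} geotagged in {name}"
    return BASE_CATEGORY


if __name__ == "__main__":
    print(review_category({"country": "India", "region": "Maharashtra"}))
    print(review_category({"country": "India"}))
    print(review_category({}))
```
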
I have been digging in this area a little. Coders know two ways of doing recognition and tagging. One is provided by Google. They offer 1000 images/month/user in Google Cloud for free, and then charge some US cents per image. But I have seen other companies around the internet that work on recognition. As far as I know, they work for businesses and develop recognition for them. So image recognition technology is probably widely known. I don't think the WMF devs would work on their own recognition bot, but we could test the technology on those 1000 free images recognised by Google, and then, if the success rate is high, the WMF could negotiate a partnership covering mass usage.
The other way of recognition is done somehow via search, and I don't know much about how it works; I have seen just one example. I guess that it harvests all available data and metadata about our image (the image on Wikimedia Commons) and then uses an algorithm to try to find the most probable depiction. But this is just guessing. Maybe they do it via coordinates, because there is a free tool called Jeffrey's Image Metadata Viewer, which does this using the coordinates of the image and different map applications to name the photographer's position. In this case you get the house number. So the example for file.jpg might be: Czech Republic, Klatovy Region, Klatovy District, Train Street no. 44, i.e. Country, NUTS x, NUTS x, Street, House number.
But the question is how precise automatically harvested coordinates are. For those who add coordinates themselves, taking them from satellite imagery and also adding the direction in which the photograph was taken, this may be very useful and precise. I wonder whether, if we are able to write the address of the photographer's position in words, it might also be possible to name other objects in words using POIs (like Church of St. Peter). There is also a technology that can show and list all objects in the direction in which the photograph was taken. It can also remove objects that are not visible because of elevation, but it probably is not able to hide objects that are not visible because they are behind human constructions. So, if there is a picture of a house and there is a Church of St. Peter behind this house, but it is not visible in the image because the house hides it, the code is able to say the church is in range, but I am not sure whether the code can also say that in this case it is not visible because there is a house in front of it. This tool is called Hey Whats That!--Juandev (talk) 20:29, 9 November 2017 (UTC)
In the case of location tagging, categorization needn't be too precise. Geotagging on some phones isn't always so accurate in the first place; if you're in an area with poor cell coverage, for instance, a photo you take might be tagged with a location that's several kilometres away from your actual position. In most cases, though, it'll at least be in the same country or region: The coordinates may not tell me the correct street or town it was taken in, but I may know it's in Maharashtra, which is in India. Even drilling down this far will help a great deal with categorization. Dragfyre (talk) 18:34, 10 November 2017 (UTC)
I support Dragfyre's suggestion of location-based subcategories "media...geotagged in India." For popular names like "San Fernando," that tells us which country it is in. It directs people with expertise in that country to work on that subcategory. However, for countries with many English-speaking Wikipedians who participate as uploaders and categorizers, a lower level subcategory (state, province, region) would speed up categorization. If I know that all the images are in, for example, Florida, my work will be faster than if I only know they are in the United States. I have a question about this proposal. Often people upload several images from one location. Will there be a way for the categorizer to know that images 4, 19 and 25 tagged "Florida" are from the same upload, without looking at each individually? Can we at the same time be shown a map with the geotagged but unclassified images pinpointed, like OpenStreetMap? Downtowngal (talk) 01:50, 10 November 2017 (UTC)
A map would be a huge plus, but at the same time, I think I'd be happy with a tool that does the categorization. I seem to remember that there are some geotagging tools that will show you all the images in a certain category on a map. I'll have to look it up. Dragfyre (talk) 18:40, 10 November 2017 (UTC)
I want to add something to my comment. The tool used should harvest information from Google or another map that provides all the information (village name, region name, etc.) in English. Why? I want the English spelling of place names, because most of the time Commons categories use English spelling of place names. I don't want a map that uses only the local language, like OpenStreetMap, and generates names with their local language spelling. That is not much help. Downtowngal (talk) 01:37, 10 November 2017 (UTC)
As long as Google's API is usable for these purposes, I suppose. There might be a possibility of incorporating data from Wikidata, too; locations there are generally tagged with their coordinates. Dragfyre (talk) 18:40, 10 November 2017 (UTC)
I just had an idea. Is it possible to sort the images in "media needing category review geotagged in India" by coordinate? I think that ability would be useful to a person who categorizes images. That way, all the images from a single village will be close together. Downtowngal (talk) 01:58, 10 November 2017 (UTC)
Good idea. Dragfyre (talk) 18:40, 10 November 2017 (UTC)
Sorting images from Florida by uploader should be possible. "Perform Batch Task" already has such options.--Juandev (talk) 22:09, 10 November 2017 (UTC)
But I want to sort only images from Florida that have not already been categorized. If the uploader has been active earlier, some of their uploads may already be categorized. Can I restrict my sorting to images in "media needing category review geotagged in Florida"? Also, please remember that the tools should be easy to use for people who are not IT specialists and who are not native speakers of English. I am a layperson and I am afraid to "perform batch task" because if I make a mistake I don't know how to fix it. Downtowngal (talk) 22:40, 10 November 2017 (UTC)
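
The sorting idea from this thread (only files still in a review category, ordered so that nearby images end up next to each other) could look roughly like the sketch below. Everything here is an assumption made for illustration: the file records, their field names, the function name and the 0.01-degree grid size are hypothetical and do not describe an existing Commons tool.

```python
import math


def sort_review_queue(files, review_category, cell_deg=0.01):
    """Return files in `review_category`, ordered by coordinate grid cell.

    Each file is a hypothetical dict with "title", "lat", "lon", "uploader"
    and "categories". Files whose coordinates fall in the same grid cell
    (roughly 1 km at cell_deg=0.01) are kept together, then ordered by
    uploader and title.
    """
    def key(f):
        return (
            math.floor(f["lat"] / cell_deg),
            math.floor(f["lon"] / cell_deg),
            f["uploader"],
            f["title"],
        )

    pending = [f for f in files if review_category in f["categories"]]
    return sorted(pending, key=key)


if __name__ == "__main__":
    cat = "Media needing category review geotagged in Florida"
    files = [
        {"title": "Beach 1.jpg", "lat": 27.955, "lon": -82.455,
         "uploader": "A", "categories": [cat]},
        {"title": "Tower.jpg", "lat": 25.765, "lon": -80.195,
         "uploader": "B", "categories": [cat]},
        {"title": "Beach 2.jpg", "lat": 27.957, "lon": -82.453,
         "uploader": "A", "categories": [cat]},
        {"title": "Done.jpg", "lat": 27.956, "lon": -82.454,
         "uploader": "A", "categories": ["Tampa"]},
    ]
    for f in sort_review_queue(files, cat):
        print(f["title"], f["lat"], f["lon"])
```

Because the function only returns a new ordered list and changes nothing on the files themselves, a mistake in using it would have nothing to undo.
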
You could make a decent CAPTCHA out of this. MER-C (talk) 07:04, 10 November 2017 (UTC)

Hey all, the Structured Data on Commons team is planning to implement image recognition in 2019 as part of the Structured Data on Commons program. Abittaker (WMF) (talk) 17:01, 13 November 2017 (UTC)

Archived

Per Abittaker above. Since this work is already scheduled, there is no point in voting. Thanks for participating in our survey. Max Semenik (talk) 23:26, 20 November 2017 (UTC)

I see that image recognition is scheduled, but does that also include the idea of auto-adding images to categories based on the coordinates included in EXIF data? Dragfyre (talk) 14:49, 29 November 2017 (UTC)