User talk:Duesentrieb/UntaggedImages


This page is for discussion about UntaggedImages [1], a tool written by User:Duesentrieb. If I do not respond, please leave a quick note on my talk page at the German Wikipedia.


Untagged images

moved here from commons:User talk:Duesentrieb -- Duesentrieb 16:50, 9 January 2006 (UTC)


I encountered a small problem with your untagged images script: it looks like it counts images with language templates as tagged, see commons::Image:Lahore.jpg and commons::Image:Lahore 1.jpg. Both are tagged with the {{de}} and {{en}} language templates and did not show up as untagged. Maybe something for the blacklist? (Language templates and all redirects to them)--Denniss 00:49, 28 December 2005 (UTC)

Currently, all templates are recognized as tags (there's a blacklist for categories, but not yet for templates). I'll look into it, thanks for the suggestion. -- Duesentrieb(?!) 12:26, 29 December 2005 (UTC)
All acceptable license templates should be in the blacklist, but some additional ones, like incomplete license or insignia, should not. I hope this won't inflate the list too much; there's enough work left with the current version. P.S.: What do you think about the private tags some users have? Shouldn't they be moved to a template so that the correct namespace is used?--Denniss 19:15, 29 December 2005 (UTC)
  • Bug or server/database trouble? Untagged now shows thousands of images that may require a null-edit. Two days ago I only had about ten in this category. --Denniss 14:02, 3 January 2006 (UTC)
Since yesterday, template inclusions are stored in the database separately from "normal" links. I'll have to update my tools to use the new table. I hope I can fix this today or tomorrow. -- Duesentrieb(?!) 14:11, 3 January 2006 (UTC)
  • Another suggestion to improve things: we need an option to show (or hide) images by tagging status. As long as so many are flagged as needing a null-edit, it may be good to be able to hide these images via a checkbox or similar. The checkboxes should be: show untagged images, show untagged images with an indicated license, show images with a private or manual tag, show images requiring a null-edit. --Denniss 15:58, 5 January 2006 (UTC)
As for tagged files showing as untagged, I guess we simply have to wait until the link tables are rebuilt for all projects. I was told that will take about a week or so.
Filtering by the indicated status is problematic: the initial filtering (and offset and limit) takes place in the database, and at that stage the detailed "tagging status" is not known. I could of course just hide entries according to a filter after I get them from the database, but that would mean you may see only a few (or even no) images on one page, while clicking "next 20" would eventually get you more pictures. That's confusing...
This means I can really only filter efficiently by things I can access directly in the database. -- Duesentrieb(?!) 16:39, 5 January 2006 (UTC)
Currently images are sorted by date; why not sort by tag status? Just give the option of which tag status comes first. --Denniss 01:50, 6 January 2006 (UTC)
For the same reason that filtering by tag status is not possible, sorting isn't either (it would involve loading entries for all images into memory, which would not work). The tagging status is not in the database; it's calculated on the fly for display. -- Duesentrieb(?!) 02:41, 6 January 2006 (UTC)
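To illustrate the pagination problem described above, here is a minimal Python sketch (not the tool's code). The database hands out fixed-size pages, and a status that is only computed afterwards is then used to hide rows. The function names, the 60 sample images and the three "untagged" ones are all hypothetical.

```python
def fetch_page(rows, offset, limit):
    """Stand-in for the database step: ORDER BY ... LIMIT offset, limit."""
    return rows[offset:offset + limit]


def filtered_pages(rows, limit, keep):
    """Fetch fixed-size pages, then drop rows whose on-the-fly status fails `keep`."""
    pages = []
    for offset in range(0, len(rows), limit):
        page = fetch_page(rows, offset, limit)
        pages.append([name for name in page if keep(name)])
    return pages


# Hypothetical data: 60 images, of which only three are truly untagged.
untagged = {3, 5, 47}
statuses = {
    f"Image{i:02d}.jpg": "untagged" if i in untagged else "try a null-edit"
    for i in range(60)
}
rows = list(statuses)  # already in "upload date" order for this example

pages = filtered_pages(rows, 20, lambda name: statuses[name] == "untagged")
print([len(p) for p in pages])  # [2, 0, 1]
```

The second page comes back empty even though a later page still has a match, which is exactly the confusing "next 20" behaviour. Sorting by status has the same root cause: every row's status would have to be computed before the first page could be produced.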
I love it! Using it on the English Wikipedia, I find a lot of image redirect pages (which are correctly marked as untagged). It would save some time if the "redirect=no" parameter were already there. Other than that, it works great! commons:User:dbenbenn 23:31, 5 January 2006 (UTC)
It will be in the next version, thanks for the suggestion. -- Duesentrieb(?!) 02:43, 6 January 2006 (UTC)

Some initial comments

moved here from commons:Commons talk:Tools/UntaggedImages -- Duesentrieb 17:06, 9 January 2006 (UTC)

Wow! That's great. OK, a few questions/comments:

  • It said this commons::Image:Verbreitungsgebiet des heutigen Mitteldeutschen.PNG "seems to use a manual tag" - does that mean the user subst:ed the tag? And by "tag" we are just referring to copyright/licencing information, correct?
    • Yes, that's what it means.
  • For several images like commons::Image:Woodduck95.jpg, it said "not tagged, but indicates public domain". Actually it is tagged. I think the uploader subst:ed the tag in this case too. Maybe we should ask people not to subst: copyright info tags.
    • Yes, same as above.
  • Is using private tags discouraged? commons:User:Halibutt has been doing that a fair bit, also commons:User:Blaite.
    • The only reason to discourage it would be my tool, I guess: it cannot recognize private tags in the initial filtering step; it can only take a guess later (i.e. say "private tag").
  • "seems to be tagged - try a null-edit" - why? These are all tagged (well, I'd bet).
    • This is a database problem. Sometimes the fact that a template is used does not get written to the links table. Until some months ago, for instance, this always happened if you supplied the tag directly in the upload comment. Usually, just re-saving the page without changing anything resolves the problem, but apparently not always.
  • The date should probably include HH:MM time, which will be especially useful as commons gets busier.
    • I can add that, but I'm not sure how useful it would be.
  • Could be handy to include a link to the user's talk page, in case they need reminder notices.
    • Yes, I have thought that myself... not sure how to do it layout-wise, without making it look crowded.

How far back does the history go? I noticed that when I tried to filter by my name, it didn't pick anything up. :) But if you only put it up today, that's why. Will it show all the results "in the last X days" or only since you put it up (i.e. since now)?

    • The history goes back all the way: all images ever uploaded are looked at. If it didn't show anything for your name, that should mean you have tagged all your images.

pfctdayelise 01:09, 22 December 2005 (UTC)

thanks for your comments! -- Duesentrieb(?!) 10:14, 22 December 2005 (UTC)
Numbers one and two did not use a license template; they copied the source text of the English license templates instead of using the templates available here. I changed this. --Denniss 02:24, 22 December 2005 (UTC)
thanks! -- Duesentrieb(?!) 10:14, 22 December 2005 (UTC)

Great tool! Much better than commons:Special:Unusedimages :-) I would like to add some comments too:

  1. Add a filter by problem severity, like "untagged images", "null-edit", etc., so that untagged images can be processed first.
  2. Put untagged images with only one history entry in a separate group. They are candidates for bot-assisted addition of {{unknown}}.
  3. Detect images with plain GFDL/PD tags and categories only. They are candidates for {{no source}}.

EugeneZelenko 16:09, 22 December 2005 (UTC)

Thanks for your suggestions. I'll see if I can implement some of that soon, though not until next year; I'm quite busy right now.
About your first request: sorting or filtering by the status column would be hard to do because of how this tool works. First I look at the pagelinks table to find description pages that do not link to a page in the template namespace ("inclusion" is treated like a normal link in the database). For the result, I then load the page text and guess the status based on that. Sorting or filtering by status would require first loading the description text of all images I get from the links table (currently about 4800). That would be pretty slow. -- Duesentrieb(?!)
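The first step described above is essentially an anti-join: keep image description pages for which no inclusion of a non-ignored template exists. Below is a minimal sketch using Python's built-in sqlite3 and a toy schema loosely modelled on the page, image and templatelinks tables in the query quoted in the error reports further down (template inclusions moved to their own table in January 2006, as mentioned in the first thread). The column subset, the sample rows and the shortened ignore list are assumptions for illustration.

```python
import sqlite3

# Hypothetical subset of the ignore list seen in the queries on this page.
IGNORED = ("en", "de", "Information", "Redundant")


def untagged_images(conn):
    """Anti-join: image pages with no inclusion of a non-ignored template."""
    placeholders = ",".join("?" for _ in IGNORED)
    sql = f"""
        SELECT img_name FROM image
        JOIN page ON page_title = img_name
        LEFT JOIN templatelinks
               ON tl_from = page_id AND tl_title NOT IN ({placeholders})
        WHERE page_namespace = 6 AND page_is_redirect = 0 AND tl_from IS NULL
        ORDER BY img_timestamp ASC
    """
    return [name for (name,) in conn.execute(sql, IGNORED)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                       page_title TEXT, page_is_redirect INTEGER);
    CREATE TABLE templatelinks (tl_from INTEGER, tl_title TEXT);
    CREATE TABLE image (img_name TEXT, img_timestamp TEXT);
""")
conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
    (1, 6, "A.jpg", 0), (2, 6, "B.jpg", 0),
    (3, 6, "C.jpg", 0), (4, 6, "D.jpg", 1),  # D.jpg is a redirect
])
conn.executemany("INSERT INTO image VALUES (?, ?)", [
    ("A.jpg", "20051201000000"), ("B.jpg", "20051202000000"),
    ("C.jpg", "20051203000000"), ("D.jpg", "20051204000000"),
])
conn.executemany("INSERT INTO templatelinks VALUES (?, ?)", [
    (1, "GFDL"),           # A.jpg carries a license template
    (2, "en"), (2, "de"),  # B.jpg has only ignored language templates
])                         # C.jpg includes no templates at all

print(untagged_images(conn))  # ['B.jpg', 'C.jpg']
```

Because the ignore list sits in the ON clause rather than in WHERE, a page whose only templates are ignored ones still produces a NULL-extended row and is reported, which is what the language-template report at the top of this page asked for.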

Filter redundant images

There should be a filter to exclude images tagged as "redundant". They are listed as "no license tag!" although they are scheduled for deletion (which seems to be a very slow process, though). --Baikonur 14:06, 6 February 2006 (UTC)

Well, technically it's true: "redundant" is not a license tag (and license tags should not be removed if an image is tagged as redundant). In fact, to hide these images I would have to remove it from the filter, so that it is treated as a license tag... which might or might not make sense, depending on what you are trying to do.
Currently, all templates that are not on a special blacklist are considered to be license tags. That list is very incomplete, so many images that are untagged are not listed. Maybe it would be better to explicitly maintain a list of license tags... I'll have to think about it. -- Duesentrieb 21:09, 6 February 2006 (UTC)
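A toy comparison of the two approaches discussed here: counting every template that is not on a blacklist as a license tag, versus checking against an explicit list of license templates. Both template sets and function names are hypothetical examples, not the tool's actual lists; {{Insignia}} is used because it came up earlier on this page as a template that is not a license.

```python
# Hypothetical template lists, for illustration only.
NON_LICENSE = {"Information", "en", "de", "Redundant"}  # blacklist
LICENSES = {"GFDL", "PD-self", "Cc-by-sa-2.5"}          # explicit allowlist


def tagged_by_blacklist(templates):
    """Any template not known to be a non-license template counts as a tag."""
    return any(t not in NON_LICENSE for t in templates)


def tagged_by_allowlist(templates):
    """Only templates explicitly listed as licenses count as a tag."""
    return any(t in LICENSES for t in templates)


page = ["Information", "Insignia"]  # no actual license template
print(tagged_by_blacklist(page), tagged_by_allowlist(page))  # True False
```

With an incomplete blacklist, an unlisted non-license template makes the page look tagged, so it silently drops out of the report (a false negative). With an allowlist the risk flips: a license template missing from the list would make a properly licensed image show up.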

A suggestion and a question

I think that it would be helpful to have the page display a direct link to editing the image description page.

How does the script determine when to issue the "no license tag!" message? I've seen it flag a few that seem to have a valid image copyright tag.

Thanks, JYolkowski 01:20, 13 February 2006 (UTC) (en:User:JYolkowski)

A direct link may be useful; maybe I'll add it.
To your other question: basically, it looks for images that don't have any tag on them, ignoring a few (like {{information}}, etc.). Then it looks at the page text and tries to find out more. "no license tag!" means there is some template, but it is on the list of non-license templates. If there's no tag at all and no indication of the intended license, then it says "untagged".
The way it's coded now, there are probably lots of false negatives (images that are not properly tagged but will not show up in this list), but there shouldn't be false positives (i.e. tagged images showing up). If they are listed anyway (due to a database inconsistency), they should have a purple "appears to be tagged" message. If you find any correctly tagged image that is labeled "untagged" or "no license tag", please give me a link. -- Duesentrieb 00:27, 14 February 2006 (UTC)
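The second pass described above (guessing a status from the page text of pages the database step returned) might look roughly like the sketch below. This is a hypothetical reconstruction, not Duesentrieb's code: the regular expression, the ignore list, the "User:" check for private tags and the order of the checks are all assumptions. Only the status labels are taken from this page, and the "manual tag" case is left out.

```python
import re

# Hypothetical, lower-cased list of templates that are not license tags.
NON_LICENSE = {"information", "en", "de", "redundant"}

# Captures the name of each {{template}} or {{template|...}}; does not handle
# templates nested inside parameters.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}")


def guess_status(text):
    """Guess a status label for a page the database step reported as untagged."""
    names = [name.lower() for name in TEMPLATE_RE.findall(text)]
    if any(name.startswith("user:") for name in names):
        return "private tag"
    if any(name not in NON_LICENSE for name in names):
        # The text uses a template the links table did not record.
        return "seems to be tagged - try a null-edit"
    if names:
        return "no license tag!"
    if re.search(r"public domain", text, re.IGNORECASE):
        return "not tagged, but indicates public domain"
    return "untagged"


for text in ["{{Information|Description=A duck}}", "{{GFDL}}",
             "I release this into the public domain.", "A photo of Lahore"]:
    print(guess_status(text))
```

Checking the page text only for rows the database already selected keeps the expensive part small, which matches the explanation above of why the status cannot be used for filtering or sorting.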

Database Lag

I was just wondering what is causing the current database lag; it hasn't been updated for over a day now. Is this a permanent problem? Martyman 22:57, 16 February 2006 (UTC)

I should point out that I am referring to the use of this tool with regard to en.wikipedia.org. Martyman 23:00, 16 February 2006 (UTC)
Database lag is the same for all wikis. I don't know what the cause is; I'm not an admin on the toolserver, and I don't know anything about how the replication works. I'll try to ask Kate about it. I'd like it to work, too ;) -- Duesentrieb 12:27, 18 February 2006 (UTC)
Ah, so as I understand it, the toolserver uses a local copy of the database, which is replicated through some process hidden from users of the toolserver. Oh well, it seems to be updating again anyway. Thanks for the useful tool. Martyman 02:19, 19 February 2006 (UTC)

Wonderful tool!

This is very useful. I've suggested that the en.wikipedia Untagged Images project switch from manual lists to this tool; it's more versatile and generally better. Great work! 134.10.45.108 11:58, 8 March 2006 (UTC) (actually en:User:JesseW)

Error report:

I got the following error just now; copying it here just so you are aware of it. 66.81.17.7 18:12, 10 April 2006 (UTC) (actually w:en:User:JesseW)

Warning: mysql_query(): Unable to save result set in /home/daniel/MediaWiki/phase3/includes/Database.php on line 440

A database error has occurred
Query: SELECT img_name, img_user, img_user_text, img_timestamp, img_description FROM image JOIN page ON page_title = img_name LEFT JOIN templatelinks ON tl_from = page_id AND tl_title NOT IN ("en", "EN", "de", "DE", "fr", "FR", "jp", "JP", "es", "ES", "ru", "RU", "German", "Deutsch", "English", "Thumbnail", "Redundant", "Information", "PAGENAME", "NAMESPACE") WHERE page_namespace = 6 AND page_is_redirect = 0 AND tl_from IS NULL ORDER BY 'img_timestamp' ASC LIMIT 0, 100
Function:
Error: 1053 Server shutdown in progress (sql)

Backtrace:

   * GlobalFunctions.php line 581 calls wfbacktrace()
   * Database.php line 474 calls wfdebugdiebacktrace()
   * Database.php line 424 calls databasemysql::reportqueryerror()
   * WikiQuery.php line 259 calls databasemysql::query()
   * WikiQuery.php line 679 calls wikiquery::printdata()
   * UntaggedImages.php line 206 calls wikiquery::printpage()


Either the query took more than five minutes and was killed, or the server went down for maintenance. -- Duesentrieb 13:13, 11 April 2006 (UTC)

I've got another error, now:

Database Error: Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (zedler.ts-local) - failed to connect to log database - failed to log script start!

Database Error: Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (zedler.ts-local) - failed to connect to WikiList database

Fatal error: Uncaught exception 'MWException' with message 'failed to connect to WikiList database ' in /home/daniel/MediaWiki-live/phase3/includes/GlobalFunctions.php:668
Stack trace:
#0 /home/daniel/public_html/WikiSense-live/common/WikiSense.php(158): wfDebugDieBacktrace('failed to conne...')
#1 /home/daniel/public_html/WikiSense-live/common/WikiSense.php(168): getWikiListDB()
#2 /home/daniel/public_html/WikiSense-live/common/WikiSense.php(290): getWikiInfo(Array, 1)
#3 /home/daniel/public_html/WikiSense-live/web/common/WikiSelector.php(107): getWikiInfoFromDomain('commons.wikimed...')
#4 /home/daniel/public_html/WikiSense-live/web/common/WikiQuery.php(103): WikiSelector->WikiSelector()
#5 /home/daniel/public_html/WikiSense-live/web/UntaggedImages.php(150): WikiQuery->WikiQuery('Untagged Images', 'SELECT img_name...')
#6 {main}
thrown in /home/daniel/MediaWiki-live/phase3/includes/GlobalFunctions.php on line 668 - script start not logged! logging end anyway.

217.232.55.191 14:23, 26 February 2007 (UTC) commons:User:Koernerbroetchen

En seems to be more broken than I expected

The results for the (7+ days lagged) English Wikipedia seem to be more broken than I would have expected. Not only is the data 7 days old, but a vast number of the entries say "Page not found" in the status column and turn out to be tagged when I review them manually. Any ideas why this might be happening, or what we could do about it? 66.81.17.34 03:55, 27 April 2006 (UTC) (actually en:User:JesseW)

Yes, the toolserver database for the English Wikipedia is corrupt. It's a result of the effort to get data from the new master DB. This is being worked on; there's nothing I can do but wait. -- Duesentrieb 17:36, 28 April 2006 (UTC)

UntaggedImages [2] is a tool written by Duesentrieb for finding images missing a license tag. See User:Duesentrieb/Tools for other tools I wrote.

Please use the talk page for questions and comments. If I do not respond, please leave a quick note on my talk page at the German Wikipedia.

dt: What criteria does it work by?

Hello Daniel Düsentrieb, what criteria does your tool work by? Are images with incomplete tags flagged manually, by a bot, or by something else? According to which criteria? Why does the tool react to changes with a 24-hour delay? Hoping for a quick answer -- Jlorenz1 15:57, 21 January 2007 (UTC) on the German Wikipedia jlorenz1@web.de

The delay depends on when the toolserver receives the data from the German Wikipedia. Ideally that is only a matter of seconds, but at the moment it's about 11 hours.
The criterion is simple: it lists all image pages that contain no template, ignoring a few templates ({{information}}, for example). -- Duesentrieb 16:52, 21 January 2007 (UTC)
Hello Daniel Duesentrieb, do you have any idea why two tools, this one and this one, come to different results? And what could be done better in the images? Despite my efforts I'm running out of ideas... even though I use the text template. I'm also working on an upload tool for images myself. For that reason it would be all the more interesting to know how to avoid unnecessary trouble, or perhaps to adjust your tool accordingly (no offense ;-) In particular: does it depend on third parties marking images with BLU (Bildlizenz unbekannt, "image license unknown")??? -- Jlorenz1 20:20, 22 January 2007 (UTC) on the German Wikipedia jlorenz1@web.de
No, my tool doesn't care about BLU tags; or rather, it only shows images that have neither a license template nor a BLU template. The problem here is that the toolserver is currently about 13 hours behind, so, for example, it hasn't picked up this edit yet. {{information}} is not a license template; you have to use one in addition. -- Duesentrieb 21:41, 22 January 2007 (UTC)

Timeout

The since:ever choice doesn't seem to work; am I just hammering the toolserver by trying? It would be useful, even if it were only cached. - Cohesion 21:00, 4 August 2007 (UTC)

I have always had trouble getting untagged Commons images older than 7 days. This is the latest message:

commons.wikimedia.org, by Date, until 2007-08-13 14:45:42

Warning: mysql_query() [function.mysql-query]: Unable to save result set in /home/daniel/MediaWiki-live/phase3/includes/Database.php on line 789

A database error has occurred
Query: SELECT img_name, img_user, img_user_text, img_timestamp, img_description, (select count(*) from imagelinks where il_to = img_name) as usage_all, (select count(*) from imagelinks join page on page_id = il_from where il_to = img_name and page_namespace = 0) as usage_main FROM image JOIN page ON page_title = img_name LEFT JOIN templatelinks ON tl_from = page_id AND tl_title NOT IN ("en", "EN", "de", "DE", "fr", "FR", "jp", "JP", "es", "ES", "ru", "RU", "German", "Deutsch", "English", "Thumbnail", "Redundant", "Information", "PAGENAME", "NAMESPACE") WHERE page_namespace = 6 AND page_is_redirect = 0 AND tl_from IS NULL AND img_timestamp < '20070813144542' ORDER BY img_timestamp ASC LIMIT 0, 100
Function:
Error: 1053 Server shutdown in progress (sql-s2)

OOPS! exception in wsfExceptionHandler: A database error has occurred
Query: UPDATE log SET `status` = 'error', `time` = 320, `comment` = 'A database error has occurred\nQuery: SELECT img_name, img_user, img_user_text, img_timestamp, img_description,\n (select count(*) from imagelinks where il_to = img_name) as usage_all, \n (select count(*) from imagelinks join page \n on page_id = il_from where il_to = img_name and page_namespace = 0) as usage_main\n FROM image \n JOIN page ON page_title = img_name\n LEFT JOIN templatelinks ON tl_from = page_id AND tl_title NOT IN (\"en\", \"EN\", \"de\", \"DE\", \"fr\", \"FR\", \"jp\", \"JP\", \"es\", \"ES\", \"ru\", \"RU\", \"German\", \"Deutsch\", \"English\", \"Thumbnail\", \"Redundant\", \"Information\", \"PAGENAME\", \"NAMESPACE\")\n WHERE page_namespace = 6 AND page_is_redirect = 0 AND tl_from IS NULL AND img_timestamp < \'20070813144542\' ORDER BY img_timestamp ASC LIMIT 0, 100\nFunction: \nError: 1053 Server shutdown in progress (sql-s2)\n' WHERE id = 1805887
Function: wsfScriptLogUpdate
Error: 2006 MySQL server has gone away (sql)

original error message: A database error has occurred
Query: SELECT img_name, img_user, img_user_text, img_timestamp, img_description, (select count(*) from imagelinks where il_to = img_name) as usage_all, (select count(*) from imagelinks join page on page_id = il_from where il_to = img_name and page_namespace = 0) as usage_main FROM image JOIN page ON page_title = img_name LEFT JOIN templatelinks ON tl_from = page_id AND tl_title NOT IN ("en", "EN", "de", "DE", "fr", "FR", "jp", "JP", "es", "ES", "ru", "RU", "German", "Deutsch", "English", "Thumbnail", "Redundant", "Information", "PAGENAME", "NAMESPACE") WHERE page_namespace = 6 AND page_is_redirect = 0 AND tl_from IS NULL AND img_timestamp < '20070813144542' ORDER BY img_timestamp ASC LIMIT 0, 100
Function:
Error: 1053 Server shutdown in progress (sql-s2)

--Jusjih 14:54, 20 August 2007 (UTC)

Feature request

Is it possible to have this tool optionally produce two outputs: one wikilinked list of the files and one wikilinked list of the corresponding users? That would make it very easy to tag untagged images and notify their uploaders using AWB. Mike.lifeguard | talk 22:50, 28 September 2007 (UTC)
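A minimal sketch of the requested output, assuming the tool already has a list of (file name, uploader) pairs for its result rows. The function name `awb_lists` and the sample uploaders are hypothetical. Files are linked with a leading colon so the list shows links rather than embedded images, and each uploader's talk page is listed only once. How the lists are then fed to AWB (for example by saving them to a wiki page and building a page list from its links) is left open here.

```python
def awb_lists(rows):
    """rows: list of (file_name, uploader) pairs, in result order.

    Returns two newline-separated wikitext lists: file links and
    de-duplicated uploader talk-page links.
    """
    files = [f"* [[:Image:{name}]]" for name, _ in rows]
    users, seen = [], set()
    for _, user in rows:
        if user not in seen:  # notify each uploader only once
            seen.add(user)
            users.append(f"* [[User talk:{user}]]")
    return "\n".join(files), "\n".join(users)


files, users = awb_lists([("Lahore.jpg", "Alice"),
                          ("Lahore 1.jpg", "Alice"),
                          ("Woodduck95.jpg", "Bob")])
print(files)
print(users)
```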

bug?

Got this looking for untagged images on en.wikiversity (all users) between ever and now.

Warning: mysql_select_db(): supplied argument is not a valid MySQL-Link resource in /home/daniel/MediaWiki-live/phase3/includes/Database.php on line 1566
Fatal error: Call to a member function selectField() on a non-object in /home/daniel/MediaWiki-live/phase3/includes/ExternalStoreDB.php on line 107

Mike.lifeguard | @en.wb 00:57, 10 October 2007 (UTC)

de: A short explanation would be good.

It would be good if there were a short explanation. When exactly does your tool say "untagged" rather than "no licence tag"? What do the two numbers under Usage mean?

This should be placed somewhere easy to find. 84.150.196.210 09:49, 14 October 2007 (UTC)
