Tech/Archives/2011

Looking for a tool that checks YES or NO for transclusions

Background

At Wikisource, we (well, most of the language subdomains) have Page: and Index: namespaces as our general work area, in which we correct text before transcluding it into the main namespace for wrapping and public display. Our general process is to upload an image file (a DjVu with a text layer) to Commons, then create a corresponding index page at Wikisource which basically oversees all the corresponding pages in the Page: ns (one Page: ns page per scanned page). From there we use Extension:ProofreadPage to extract text (also see oldwikisource:Wikisource:ProofreadPage).
So, by way of example:

Issue

Our issue is that for various reasons we sometimes need to move or remove large numbers of Page: namespace pages (the reasons are many, but include a better image file becoming available, a reloaded file being shortened or lengthened, etc.). What we need is a way to check an Index: page to see which pages in its <pagelist> are transcluded (a simple YES this page is or NO this page is not) so we can do a bulk tidy-up (be it a move or a delete). Another use for such a tool: pages may have been proofread but, through a slip-up, never transcluded, so we want to identify those pages for the displayed work.

Using the above Index: page as an example, the text on the saved Page: pages is out of alignment with the images. We have purged the page and now have text<->scan alignment for the non-created pages; however, I want to know which Page: pages have been transcluded, and so should be kept, and which have not been transcluded, so they can be deleted by a bot or a separate process. While this is not a frequent task, it happens often enough that the ability to dynamically generate such a list would be most useful. Ideally, I would like to be able to build into the Index: page template a link that could be clicked to dynamically generate the list for that page. I am told that there is sufficient data in the API to iterate over the pages and generate the list with jQuery. That is beyond my simple skill set, so here I am with the request. billinghurst sDrewth 03:38, 22 April 2011 (UTC)

Supplementary: after being quizzed in IRC, some clarifying points, ranging from bare functionality to pie-in-the-sky.
  • look at all of the links on the Index page and check whether their corresponding pages have been transcluded;
  • output two lists based on the results (a YES list and a NO list; the more choice between YES, NO, or YES & NO the better). Sorted output would be great, wiki or HTML format are okay, and something that can be imported into w:WP:AWB would be generous and very useful;
  • (pie-in-the-sky) generate output listing all the pages sorted by the main-ns work (and subpages) to which they are transcluded, e.g. chapter one has pp/7,8,9,...15, chapter 2 has pp/15-24, ...; not transcluded: p/25, p/347.
Live example
I am envisaging this all on a per-Index: page basis, as that is usually the level of the query, and generating the list for the Index page gives the most useful result. If it is easier to take/build a holistic (grandiose) approach, e.g. http://toolserver.org/~magnus/catscan_rewrite.php?project=wikisource, and then filter per Index page in each query, I am really happy for more bang for the buck. :-) Looking for Index files in which to play? Try s:Category:Index Proofread. billinghurst sDrewth 05:14, 22 April 2011 (UTC)
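(For reference, the check being asked for boils down to something like the query below on the database side — a rough sketch only, assuming the 2011 toolserver schema, assuming Page: is namespace 104 on enwikisource, and using the Index: file mentioned below purely as an illustrative title pattern.)

-- For each Page: ns page of the work, report YES if any main-namespace page transcludes it, NO otherwise
SELECT
  p.page_title,
  IF(COUNT(transcluding.page_id) > 0, 'YES', 'NO') AS transcluded
FROM page AS p
LEFT JOIN templatelinks AS tl
ON tl.tl_namespace = p.page_namespace
AND tl.tl_title = p.page_title
LEFT JOIN page AS transcluding
ON transcluding.page_id = tl.tl_from
AND transcluding.page_namespace = 0
WHERE p.page_namespace = 104
AND p.page_title LIKE 'The_Dictionary_of_Australasian_Biography.djvu/%'
GROUP BY p.page_id;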
First pass: <http://toolserver.org/~mzmcbride/checker/?db=enwikisource_p&title=Index%3AThe+Dictionary+of+Australasian+Biography.djvu>. Let me know what you think.
I guess status isn't stored in the database (or isn't easily accessible). That needs to be poked at. --MZMcBride 21:47, 22 April 2011 (UTC)
Sweet. Can we tell it to ignore pages that don't exist (red links), or alternatively just sort them into a different list? I will ask about the page status. billinghurst sDrewth 02:05, 23 April 2011 (UTC)
Easy enough to exclude them.
I'm still poking around at the Proofread Page extension. It's almost so bad that I'm having difficulty believing it. --MZMcBride 02:20, 23 April 2011 (UTC)
As it turns out, the extension does store this information in the database, it's just hidden away in the categorylinks table. I've filed two bugs about implementing an API to allow users to retrieve, set, and/or change the page status of a page (bugs 28893 and 28894). --MZMcBride 15:23, 9 May 2011 (UTC)
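(For the record, pulling the status out of categorylinks looks roughly like the following — a sketch against the 2011 enwikisource schema, assuming Page: is namespace 104 and the default ProofreadPage quality category names; the title pattern is just an example.)

-- proofreading status of each Page: ns page of a work, as recorded in the hidden quality categories
SELECT
  page_title,
  cl_to AS proofreading_status
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 104
AND cl_to IN ('Validated', 'Proofread', 'Not_proofread', 'Problematic', 'Without_text')
AND page_title LIKE 'The_Dictionary_of_Australasian_Biography.djvu/%';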

Sidebar not working correctly here at Meta-Wiki

The sidebar is currently broken at Meta-Wiki. I've filed bug 28892 to monitor this issue. If anyone has an idea of how to debug or fix this issue, that would be appreciated. It might be a matter of creating /en subpages or something, but that would still mean that the fallback behavior is broken. --MZMcBride 15:24, 9 May 2011 (UTC)

Per the reply to the bugzilla bug request, I asked a couple of admins to make null edits on MediaWiki:Babylon-text etc., and it seemed to fix the problem. Obviously this does not fix the root of the problem, so it might be best to leave that bugzilla request open. Hope this helps, The Helpful One 17:48, 9 May 2011 (UTC)
Right. All you've managed to do was destroy every test case of the bug, making fixing the problem substantially more difficult. --MZMcBride 19:24, 17 May 2011 (UTC)

Requesting trialling extensions - is there a how to or where to?

At English Wikisource, there was a discussion from a new user about the ability to download our completed works in the EPUB format and others. After a couple of leads and some reading, I see mw:Extension:EPubExport is listed as possibly having the capability. Is there a means to test the extension for capability and suitability? Where would such a request be listed? Is this something for which the WMF has a sandbox to test the implementation? billinghurst sDrewth 10:32, 17 May 2011 (UTC)

I have been told to at least start with a Bugzilla. billinghurst sDrewth 11:48, 17 May 2011 (UTC)
You might be able to test that out at Testwiki. I'm not sure if you need to file a bugzilla to enable it there or how you'd go about doing that. Killiondude 17:35, 17 May 2011 (UTC)
You need to file a bug about getting the extension reviewed and then deployed. Even to be enabled at test.wikipedia.org (or one of the prototype sites), the extension must be reviewed first. You should also add the bug to the mw:Review queue. If/when you file a bug, please note it here (on this page) for cross-referencing purposes. --MZMcBride 19:27, 17 May 2011 (UTC)
Cross-reference: bugzilla:29023. --MZMcBride 03:19, 18 May 2011 (UTC)

Commons' Files Report

Hi. It would be nice if Commons had a report (maybe @ commons:COM:DBR) that listed "Most used files globally" or something. Then those files could have upload protection enabled to minimize potential vandalism targets. This made me think of it, though I doubt that is one of the top 500 most used files. Killiondude 21:15, 17 May 2011 (UTC)

Assuming the query completes successfully and there are no silly typos in my code, the report will appear at commons:Commons:Database reports/Files with the most uses globally in the next day or so. --MZMcBride 04:02, 18 May 2011 (UTC)
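(For anyone curious how such a report can be generated: the heart of it is presumably something like the query below — a sketch that assumes the GlobalUsage extension's globalimagelinks table on the Commons database; the actual report may be built differently.)

-- file names and the number of pages using them across all wikis, most-used first
SELECT
  gil_to,
  COUNT(*) AS global_uses
FROM globalimagelinks
GROUP BY gil_to
ORDER BY global_uses DESC
LIMIT 500;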
Thank you. Killiondude 05:12, 18 May 2011 (UTC)

List of articles covered under Pending Changes (Enwiki)

Could we please have a report in list format that meets the following criteria:

  • English Wikipedia articles that have or have had the pending-changes level of protection applied for any duration between June 1, 2010 and the present
  • Limited to articles that are categorized within the "biography of living persons" group
  • If possible, the information for each article should include:
    • Date pending changes started
    • Date pending changes removed
    • Current level of protection

This is to assist the project in its devolution of the now-completed trial of pending changes. Thanks! Risker 02:24, 22 May 2011 (UTC)

Done here. Please note that the toolserver is a little out of date so very recent changes may not be reflected in the report. When the replag has gotten better, the query could always be re-run. --Shirik 21:27, 22 May 2011 (UTC)
Oh, to explain it to non-technical users: a "NULL" in pr_type/pr_level indicates the page is not currently protected at all. A "NULL" in lastRemoved means PC was never manually removed. isConfiguredForPC is 0 if the page is not currently under PC, or 1 if it currently is (as of however up to date the toolserver was when the query was run — it was lagging by about 24 hours). Also, every so often the toolserver says my account is expired, so you might have trouble seeing the report. Just refresh the page a few times and it seems to fix itself. --Shirik 21:37, 22 May 2011 (UTC)
Thanks for handling this; it looks pretty good. A few comments:
  • more line breaks in SQL queries are always nice;
  • by default, ZWS doesn't specify a charset in the Content-Type header, so text files viewed in a browser usually fall back to the default (ISO-8859-1); you can fix this by looking at tswiki:ZWS#Content-Type;
  • in the query, you didn't specify page_namespace = 0 or page_is_redirect = 0; because you're using the join on categorylinks with cl_to = 'Living_people', this isn't a huge deal, but it's generally better to be explicit (see the sketch below).
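To illustrate that last point, the explicit form of that part of the query would be roughly the following (a sketch only; the rest of the report's query is omitted here):

-- BLP articles only: restrict to the article namespace and exclude redirects explicitly
SELECT
  page_id,
  page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE cl_to = 'Living_people'
AND page_namespace = 0
AND page_is_redirect = 0;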
Risker: Let me know if this is what you're after; if not, I can take another look at this. --MZMcBride 23:04, 22 May 2011 (UTC)

Tool to seek out specified pages which have redlinks?

Over at English Wikisource, I am working on the project to transcribe the Dictionary of National Biography. We are adding many internal links to pair with the internal q.v. references. What I would like to do is identify works that contain redlinks, starting from either a compiled list of works or a category listing. From there I want to go and audit, correct, update, etc. Does such a tool exist? billinghurst sDrewth 10:25, 22 May 2011 (UTC)

I think modifying yanker would be your best bet here. What I need from you is more detail. An example input page and some example results would be very helpful. That way I can write a module for yanker (select --> list --> red links from a page) and then have it output those links for you. You don't need to list every red link, but just a few so I know I'm headed in the right direction. --MZMcBride 23:12, 22 May 2011 (UTC)
Page s:Dictionary of National Biography, 1885-1900/Vol 53 Smith - Stanger and s:Category:DNB biographies are two living examples of lists, and a demonstration page would be s:Smith, Robert Percy (DNB00). So the bare bones would be to give a list of pages that have redlinks within them. I can think of a range of improvements upon that, depending on ease: making the list sortable based on the redlink, so that finished volumes can have inbound links checked and corrected. For the specific project, the links of particular interest are those that end with (DNB00). Extending the idea, I could see that for other works (most have subpages) we may be interested to get a list of redlinks for a work; e.g. for s:The Life of Captain Matthew Flinders, R.N. and its subpages we would be interested in what other works it points to. billinghurst sDrewth 13:53, 23 May 2011 (UTC)
So I can see two modules here:
  • Category members containing red links
  • Subpages containing red links
In the first example, you'd input "DNB biographies" and then yanker would return s:Abbadie, Jacques (DNB00) (along with other category members containing red links).
In the second example, you'd input "The Life of Captain Matthew Flinders, R.N." and then yanker would return s:The Life of Captain Matthew Flinders, R.N./Chapter 1 (along with other subpages containing red links).
Does that sound about right? It helps to not overcomplicate matters. You eat an elephant one bite at a time. --MZMcBride 19:10, 23 May 2011 (UTC)
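(For the curious, on the database side the first module amounts to something like the query below — a sketch against the 2011 enwikisource schema; yanker may well do it differently.)

-- category members that contain at least one link to a page that does not exist (a red link)
SELECT DISTINCT
  p.page_title
FROM page AS p
JOIN categorylinks
ON cl_from = p.page_id
JOIN pagelinks
ON pl_from = p.page_id
LEFT JOIN page AS target
ON target.page_namespace = pl_namespace
AND target.page_title = pl_title
WHERE cl_to = 'DNB_biographies'
AND p.page_namespace = 0
AND target.page_id IS NULL;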

Comparing two users in a single SQL query

Hi

I am trying to create a tool similar to http://toolserver.org/~mzmcbride/cgi-bin/wikistalk.py tailored for Persian Wikipedia. I'd like to know if there is a way to fetch the list of pages edited by both of two user accounts using MySQL only (that is, not by fetching the pages of one user and then looping over them, running a separate query against the other user's contributions each time).

Please advise,

Huji

So, let's say you have two users, "MZMcBride" and "Brandon". For each user individually, you'd do queries like these:
SELECT DISTINCT
  page_id,
  page_namespace,
  page_title
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'MZMcBride';
SELECT DISTINCT
  page_id,
  page_namespace,
  page_title
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'Brandon';
If you want to do a JOIN of these two queries in a single query, you'd do something like this:
SELECT
  tbl1.page_namespace,
  tbl1.page_title
FROM (
SELECT DISTINCT
  page_id,
  page_namespace,
  page_title
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'MZMcBride'
) AS tbl1
JOIN (
SELECT DISTINCT
  page_id,
  page_namespace,
  page_title
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'Brandon'
) AS tbl2
ON tbl1.page_id = tbl2.page_id;
Simple enough for two users. If you want to do more than two users, it gets a bit more complicated, but not much so.... --MZMcBride 01:04, 23 May 2011 (UTC)
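(To make that concrete: a third user is just one more derived table joined on the same page_id. A sketch, using 'Huji' purely as an illustrative third username:)

-- pages edited by all three users
SELECT
  tbl1.page_namespace,
  tbl1.page_title
FROM (
SELECT DISTINCT
  page_id,
  page_namespace,
  page_title
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'MZMcBride'
) AS tbl1
JOIN (
SELECT DISTINCT
  page_id
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'Brandon'
) AS tbl2
ON tbl1.page_id = tbl2.page_id
JOIN (
SELECT DISTINCT
  page_id
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_user_text = 'Huji'
) AS tbl3
ON tbl1.page_id = tbl3.page_id;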
Thank you. It appears the query works efficiently only if you set an alias for the subqueries (tbl1 and tbl2 in your example). Otherwise, MySQL 5 will run the subqueries on each iteration of the outer query, which would make it totally inefficient. Huji 12:58, 23 May 2011 (UTC)
If I understand what you're asking, you may be interested in tools:~pietrodn/intersectContribs.php. Nemo 08:33, 29 May 2011 (UTC)

List of articles

Hi - following my query on Wikipedia Review and being directed here, I wanted to see if it was possible to have a list of all articles existing on a certain date, by name, together with size in words, and in bytes/kbytes, and (nice to have) some indication of what the article is about (e.g. main category). Let me know, and thanks for the kind offer of help. Peter Damian 10:11, 27 May 2011 (UTC)

I'll need a few more specifics, namely:
  • for which wiki are you trying to list articles?
  • by "articles" do you mean pages in a particular namespace?
  • by "articles" do you include redirects?
  • what exactly do you mean by "word count" (or rather, do you have a programmable way of determining what constitutes a word and what does not)?
  • do you have any programmable way of determining what the "main category" is?
  • what's your preferred output format?
Once you can provide a bit more specificity, it should be fairly easy to scan a database dump to match your criteria. --MZMcBride 14:51, 27 May 2011 (UTC)
OK if you query this you are getting back what I had in mind. Specifically, English Wikipedia only. Article space only. Include redirects. I don't know how the word count is obtained, but that query obtains it. I don't know a programmable way of getting the main category, but it is 'nice to have' only. Preferred output is csv file, or pipe separated or something like that. Thanks again Peter Damian 17:02, 27 May 2011 (UTC)

I think something like this will be fine

Select
p.page_id as "Page ID", 
p.page_title as "Page Title", 
p.page_is_redirect as "Redirect?",
p.page_len as "Length",
r.rev_id as "Revision ID"
from

wiki_page p 
inner join wiki_revision r 
on p.page_latest = r.rev_id 

WHERE page_namespace = 0
That looks about right. The reasons I was suggesting database dumps instead of a direct query (even though a direct query is much faster) are that it would be reproducible (dumps are static, the replicated databases are not) and it includes page text (which could be used for word count). Your query looks pretty much perfect, though, so I can just run that now. I'm not quite sure why you want revision ID, but that's fine. --MZMcBride 13:11, 28 May 2011 (UTC)
Done (8,586,925 rows): tools:~mzmcbride/damian-all-articles-2011-05-28.txt.gz. Even compressed it's about 150 MB. I compressed it to avoid your Web browser trying to download and load an enormous text file. --MZMcBride 17:48, 28 May 2011 (UTC)

For reference, the query I ran:

SELECT
  p.page_id AS "Page ID",
  p.page_title AS "Page Title",
  p.page_is_redirect AS "Redirect?",
  p.page_len AS "Length",
  r.rev_id AS "Revision ID"
FROM page AS p
JOIN revision AS r
ON p.page_latest = r.rev_id
WHERE page_namespace = 0;

--MZMcBride 17:50, 28 May 2011 (UTC)

Many thanks. I won't have time to download it today - also I need to think where to put it. That is a pretty enormous file :~ I wanted revision ID as a proxy for how old the page is, but only guessing. Peter Damian 06:17, 29 May 2011 (UTC)

Actually I did just download it (wasn't too long) and it turns out the Revision id is crucial. The non-roman letters did not translate too well into the text file. E.g. Červinjan turns into ÄŒervinjan, so the revision id is the only route to the file. Peter Damian 06:54, 29 May 2011 (UTC)
Uh, not to be rude, but it's you, not me. :-)
mzmcbride@gonzo:Desktop$ zgrep ervinjan damian-all-articles-2011-05-28.txt.gz 
24418905	Červinjan	1	35	315253977
The file is fine. It's encoded as UTF-8. I imagine the issue is that you're either using a shitty text editor that lacks UTF-8 support or the editor is set to the wrong encoding (looks like ISO-8859-1 from what you pasted). Either way, the file is fine, you just need to use a tool with proper UTF-8 support to read it. --MZMcBride 16:26, 29 May 2011 (UTC)
Very probably. I was using a Microsoft editor. I am working on a query to include categories and page views - may need your help again, but thanks for everything you have done so far :) Peter Damian 13:25, 1 June 2011 (UTC)
PS I changed the import setting to UTF-8 and IT WORKED FINE! Thanks Peter Damian 13:45, 1 June 2011 (UTC)

Views and categories

I wonder if I could trouble you to run another query - this is to find out the categories that each page belongs to, and to find the total page views of that page. I think the query below should work. Many thanks yet again. Peter Damian 14:14, 1 June 2011 (UTC)

SELECT
  p.page_id AS "Page ID",
  p.page_counter as "Total views",
  c.cl_to as "Category"

FROM page AS p
JOIN categorylinks AS c
on p.page_id = c.cl_from

WHERE page_namespace = 0;
page.page_counter is not incremented/updated on Wikimedia wikis (and hasn't been for pretty much forever). Page view stats are pushed out in bulk to <http://dammit.lt/wikistats/> and then people have (or one person in particular has) aggregated this data into a more usable form: <http://stats.grok.se>. If you want page view stats, you generally have to run individual queries for each page title and get the number of views on a per-month basis. (Per-year or "total" may be available as well, though the data set is far from complete.) More information about Henrik's stats tool is available at w:en:User:Killiondude/stats.
As for categories, your query looks about right. I think it's important to note that page ID is not always as reliable as some people assume it is. If a page is deleted and then restored (or re-created), it will have a different page ID. If a page is moved, it can affect page ID. Et cetera. Your query in particular will exclude all pages that have 0 categories (which is fairly rare on a large site, I suppose). It will also output multiple rows per page. You can account for 0 categories by doing a LEFT JOIN instead of a regular JOIN (this will print "NULL" where there were no categories, but will still print a row for every page). You can alter the multiple rows per page output by doing a GROUP BY with a GROUP_CONCAT if that's going to be an issue for you. With or without the GROUP BY, it will be an enormous output file.
You're slowly discovering the joys of database querying. :-) Think some of this over and then let me know your thoughts. --MZMcBride 16:40, 1 June 2011 (UTC)
OK, let's omit the 'page views'. And yes, it will be a large file, I appreciate that. If you recommend the LEFT JOIN, I will go with that, but I can't imagine there being any uncategorised pages, and would they be very interesting anyway? Peter Damian 17:15, 1 June 2011 (UTC)
Because you're not specifying "page_is_redirect = 0", I imagine most of the entries with no categories will be redirects. Redirects and brand new articles. Do you want multiple rows per page or one row per page? Right now there is database normalization between the page and categorylinks tables. These can be "un-normalized" using GROUP BY. It just depends how you want the data. --MZMcBride 20:12, 1 June 2011 (UTC)
I'm a bit confused by the GROUP BY part - I thought this had to be used with an aggregate function, and the whole point of this query is not to aggregate. A page will have many categories and so of course I want all of those. The point is to understand how much of the encyclopedia is taken up by a given category. Peter Damian 07:35, 2 June 2011 (UTC)

A fair curiosity, I suppose. I'm saying it's a matter of how you want the data and what data you want. I think examples might help.

Query 1 — A regular JOIN
SELECT
  page_id,
  page_title,
  cl_to
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 0
AND page_title LIKE 'Michele_Bachman%';
+---------+------------------+----------------------------------------------------------------------+
| page_id | page_title       | cl_to                                                                |
+---------+------------------+----------------------------------------------------------------------+
| 2064489 | Michele_Bachmann | 1956_births                                                          |
| 2064489 | Michele_Bachmann | All_articles_with_dead_external_links                                |
| 2064489 | Michele_Bachmann | American_Lutherans                                                   |
| 2064489 | Michele_Bachmann | American_anti-communists                                             |
| 2064489 | Michele_Bachmann | American_evangelicals                                                |
| 2064489 | Michele_Bachmann | American_people_of_Norwegian_descent                                 |
| 2064489 | Michele_Bachmann | American_pro-life_activists                                          |
| 2064489 | Michele_Bachmann | Articles_with_dead_external_links_from_April_2011                    |
| 2064489 | Michele_Bachmann | Female_members_of_the_United_States_House_of_Representatives         |
| 2064489 | Michele_Bachmann | Foster_parents                                                       |
| 2064489 | Michele_Bachmann | Intelligent_design_advocates                                         |
| 2064489 | Michele_Bachmann | Living_people                                                        |
| 2064489 | Michele_Bachmann | Members_of_the_United_States_House_of_Representatives_from_Minnesota |
| 2064489 | Michele_Bachmann | Minnesota_Republicans                                                |
| 2064489 | Michele_Bachmann | Minnesota_State_Senators                                             |
| 2064489 | Michele_Bachmann | Minnesota_lawyers                                                    |
| 2064489 | Michele_Bachmann | Oral_Roberts_University_alumni                                       |
| 2064489 | Michele_Bachmann | People_from_Stillwater,_Minnesota                                    |
| 2064489 | Michele_Bachmann | People_from_Waterloo,_Iowa                                           |
| 2064489 | Michele_Bachmann | Tea_Party_movement                                                   |
| 2064489 | Michele_Bachmann | William_&_Mary_Law_School_alumni                                     |
| 2064489 | Michele_Bachmann | Winona_State_University_alumni                                       |
| 2064489 | Michele_Bachmann | Women_state_legislators_in_Minnesota                                 |
+---------+------------------+----------------------------------------------------------------------+
23 rows in set (0.07 sec)
Query 2 — A LEFT JOIN
SELECT
  page_id,
  page_title,
  cl_to
FROM page
LEFT JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 0
AND page_title LIKE 'Michele_Bachman%';
+---------+------------------+----------------------------------------------------------------------+
| page_id | page_title       | cl_to                                                                |
+---------+------------------+----------------------------------------------------------------------+
| 7830969 | Michele_Bachman  | NULL                                                                 |
| 2064489 | Michele_Bachmann | 1956_births                                                          |
| 2064489 | Michele_Bachmann | All_articles_with_dead_external_links                                |
| 2064489 | Michele_Bachmann | American_Lutherans                                                   |
| 2064489 | Michele_Bachmann | American_anti-communists                                             |
| 2064489 | Michele_Bachmann | American_evangelicals                                                |
| 2064489 | Michele_Bachmann | American_people_of_Norwegian_descent                                 |
| 2064489 | Michele_Bachmann | American_pro-life_activists                                          |
| 2064489 | Michele_Bachmann | Articles_with_dead_external_links_from_April_2011                    |
| 2064489 | Michele_Bachmann | Female_members_of_the_United_States_House_of_Representatives         |
| 2064489 | Michele_Bachmann | Foster_parents                                                       |
| 2064489 | Michele_Bachmann | Intelligent_design_advocates                                         |
| 2064489 | Michele_Bachmann | Living_people                                                        |
| 2064489 | Michele_Bachmann | Members_of_the_United_States_House_of_Representatives_from_Minnesota |
| 2064489 | Michele_Bachmann | Minnesota_Republicans                                                |
| 2064489 | Michele_Bachmann | Minnesota_State_Senators                                             |
| 2064489 | Michele_Bachmann | Minnesota_lawyers                                                    |
| 2064489 | Michele_Bachmann | Oral_Roberts_University_alumni                                       |
| 2064489 | Michele_Bachmann | People_from_Stillwater,_Minnesota                                    |
| 2064489 | Michele_Bachmann | People_from_Waterloo,_Iowa                                           |
| 2064489 | Michele_Bachmann | Tea_Party_movement                                                   |
| 2064489 | Michele_Bachmann | William_&_Mary_Law_School_alumni                                     |
| 2064489 | Michele_Bachmann | Winona_State_University_alumni                                       |
| 2064489 | Michele_Bachmann | Women_state_legislators_in_Minnesota                                 |
+---------+------------------+----------------------------------------------------------------------+
24 rows in set (0.00 sec)
Query 3 — A regular JOIN with GROUP BY and COUNT(*)
SELECT
  page_id,
  page_title,
  COUNT(*)
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 0
AND page_title LIKE 'Michele_Bachman%'
GROUP BY page_id;
+---------+------------------+----------+
| page_id | page_title       | COUNT(*) |
+---------+------------------+----------+
| 2064489 | Michele_Bachmann |       23 |
+---------+------------------+----------+
1 row in set (0.00 sec)
Query 4 — A LEFT JOIN with GROUP BY and GROUP_CONCAT
SELECT
  page_id,
  page_title,
  GROUP_CONCAT(cl_to SEPARATOR '|')
FROM page
LEFT JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 0
AND page_title LIKE 'Michele_Bachman%'
GROUP BY page_id;
+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| page_id | page_title       | GROUP_CONCAT(cl_to SEPARATOR '|')                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2064489 | Michele_Bachmann | 1956_births|All_articles_with_dead_external_links|American_Lutherans|American_anti-communists|American_evangelicals|American_people_of_Norwegian_descent|American_pro-life_activists|Articles_with_dead_external_links_from_April_2011|Female_members_of_the_United_States_House_of_Representatives|Foster_parents|Intelligent_design_advocates|Living_people|Members_of_the_United_States_House_of_Representatives_from_Minnesota|Minnesota_Republicans|Minnesota_State_Senators|Minnesota_lawyers|Oral_Roberts_University_alumni|People_from_Stillwater,_Minnesota|People_from_Waterloo,_Iowa|Tea_Party_movement|William_&_Mary_Law_School_alumni|Winona_State_University_alumni|Women_state_legislators_in_Minnesota |
| 7830969 | Michele_Bachman  | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

As you can see, the LEFT JOIN in query 2 adds a row for w:en:Michele Bachman, as it's an uncategorized redirect. In queries 3 and 4, a GROUP BY function is used to list either category counts or page categorization. Depending on your needs, a LEFT JOIN might make more sense than a regular JOIN or GROUP_CONCAT might make more sense than COUNT(*). --MZMcBride 13:14, 2 June 2011 (UTC)


Thanks – I would prefer to go with a modification of the original one (below). Note this does not give page title, as not needed (I already have page titles mapped to page id’s per the original query). I am not sure of the value of the LEFT join. Thanks again!

SELECT
  p.page_id AS "Page ID",
  c.cl_to as "Category"

FROM page AS p
JOIN categorylinks AS c
on p.page_id = c.cl_from

WHERE page_namespace = 0;

Peter Damian 09:50, 3 June 2011 (UTC)

Sorry, forgot to give a status update here. This query is too big for the Toolserver; the process got killed by the memory management daemon. Since you essentially just want the whole categorylinks table, I'd recommend using a dump of the SQL table, for example: <http://dumps.wikimedia.org/enwiki/20110526/>. --MZMcBride 21:14, 5 June 2011 (UTC)
thanks, will give it a try. Peter Damian 14:41, 6 June 2011 (UTC)

One more try

That file is too large for my computer. One last try - you will see where I am coming from

SELECT
  cl_to,
  COUNT(cl_from)
FROM categorylinks
GROUP BY cl_to;

This should be much smaller and at least tells us which categories are heavily used and which are not. Peter Damian 18:54, 7 June 2011 (UTC)
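(A variant that counts only article-space members and lists the heaviest categories first might look like the following — a sketch only; on a wiki the size of enwiki even this may take a while.)

-- number of article-namespace members per category, largest categories first
SELECT
  cl_to,
  COUNT(*) AS members
FROM categorylinks
JOIN page
ON page_id = cl_from
WHERE page_namespace = 0
GROUP BY cl_to
ORDER BY members DESC;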

A further question

Actually, a further question now I am here. I recently installed a wiki [1] on my website. It was slightly fiddly to install, given that many features (such as citations using ref) are add-ons, but there were no major difficulties. I never realised it would be so easy for a non-technical person such as myself to install (so well done, guys). But a question: the wiki I have moved from (MyWikiBiz) has a very useful 'semantic' feature which means you can search on a combination of simple categories. I.e. instead of creating categories like 'Philosopher', 'Scholastic', and then 'Scholastic Philosopher', the semantic feature allows you to use just the first two, then create a query for Philosopher+Scholastic. This obviously makes a great deal of sense, particularly when there are more basic categories (what if you want to search on scholastic philosophers who were also theologians, or restrict that group to just the thirteenth century?). Just asking. Peter Damian 10:18, 27 May 2011 (UTC)

Actually I think I have answered my own question. I go to the Semantic MediaWiki site, and install it as an extension. Yes? Peter Damian 10:32, 27 May 2011 (UTC)
Yeah, it looks like you can download the extension from mw:Extension:Semantic MediaWiki or from semantic-mw:Help:Download. I don't know too much about the extension as it isn't used on Wikimedia wikis, but it has a fairly large userbase, so it shouldn't be very difficult to install and play around with. --MZMcBride 14:46, 27 May 2011 (UTC)
Indeed it's not that difficult, and here is a page to prove it. The beauty is that though the page looks like a long list, it's only a few lines of code. Instead of compiling lists, you just tag the articles you want and then query like an ordinary database. The downside is that it is fairly impenetrable to someone without a basic knowledge of how databases work, which is 90% of the population I would say. Maybe not. Peter Damian 16:58, 27 May 2011 (UTC)
Huh, yeah, that's pretty neat. I know that Wikimedia has been in some discussions about implementing something similar, but it's a huge project, particularly given that infoboxes/templates/etc. are already so widespread. Plus the extension has to be massively scalable to ensure that the servers don't melt serving sites such as the English Wikipedia. :-) --MZMcBride 13:15, 28 May 2011 (UTC)
It's ideal for a (relatively) small project such as mine. The problem, as you suggest, is that every time someone draws on that list, they are running a query on the database behind it. I think, but am not sure, that the list is stored statically so you have to force a refresh, so it's not quite as bad as that. But I can't see it working on Wikipedia, at least not yet. Thanks, once again, for the database query. Peter Damian 06:25, 29 May 2011 (UTC)

Discussion on account security

There is currently a discussion on account security here. Any developers or interested users should participate. fetchcomms 17:13, 3 June 2011 (UTC)

Help with checker tool

Does anybody know how I can use the checker tool on Portuguese Wikisource? I've tried to use it with the page s:pt:Galeria:Elementos de Arithmetica.djvu but it didn't work. Compare with the result corresponding to s:Index:A Book of Dartmoor.djvu. Helder 14:49, 8 June 2011 (UTC)

That seems like an MZM type question; I couldn't get it to shake anything loose. There are a number of differences between how each of the wikis implemented the proofread system, so it is either a name or namespace issue, or one of a host of other quirks of difference. billinghurst sDrewth 12:15, 13 June 2011 (UTC)
I sort of fixed this for ptwikisource, but the whole system is pretty much broken (links are still broken in checker). I think some MediaWiki changes are needed to make this sane. The Wikisources are completely neglected.... --MZMcBride 06:31, 14 June 2011 (UTC)
Okay, I've now rewritten the tool to work around the horribleness of the Proofread Page extension. I filed bugzilla:29396 to track the issues I encountered. sDrewth and Helder: Let me know how this works now. --MZMcBride 17:21, 14 June 2011 (UTC)
Thank you very much! It seems to be working now. =D Helder 19:43, 14 June 2011 (UTC)

Needing help to get button functionality for the new Extension:WikiEditor toolbar

WMF has set the default toolbar for enWS (and presumably the other subdomains) to be the Extension:WikiEditor one, and it is problematic, as the functionality of certain buttons is encoded in proofread.js. From my uneducated reading of the code, the functions behind those buttons cannot be called from outside the script.

Aside from that, the documentation for the editor does not show how to run an application from a button, just how to encapsulate text (m:Extension:WikiEditor/Toolbar_customization). Am I missing something? billinghurst sDrewth 14:29, 19 June 2011 (UTC)

Try something like this:
$( function() {
	$( '#wpTextbox1' ).wikiEditor( 'addToToolbar', {
		'section': 'main',
		'group': 'insert',
		'tools': {
			'my-test': {
				label: 'Just a test',
				type: 'button',
				icon: 'http://upload.wikimedia.org/wikipedia/commons/2/27/Vector_toolbar_signature_button.png',
				action: {
					type: 'callback',
					execute: function() {
						//Do something
						alert('Done!');
					}
				}
			}
		}
	} );
} );
Helder 15:38, 19 June 2011 (UTC)
The link you mentioned (mw:Extension:WikiEditor/Toolbar_customization) does include a brief example of how to use the API; however, it's more advanced than the old toolbar. It has many new features that people have asked for over the years and have partially worked around with local scripts and hacks. As a result the API may have become more complicated for 'simple buttons'. However, this can be solved by using a shortcut function (a utility function, if you will) that allows simple syntax. I have made an attempt at that with InsertWikiEditorButton (maybe it will be integrated into the extension one day); example:
krInsertWikiEditorButton({
	"id": "mw-toolbar-editbutton",
	"icon": mw.config.get('wgExtensionAssetsPath') + '/ProofreadPage/button_category_plus.png',
	"label": mw.msg( 'proofreadpage_toggleheaders' ),
	"callback": pr_toggle_visibility
});
For more info on the available options you can pass, see the script documentation here or check the source code! –Krinkletalk 06:58, 20 June 2011 (UTC)
I am not doubting that the information on the page is exact and covers the necessary technical detail; it is just not helpful for an ignorant savage like myself. I tried the new callback coding without luck; for me it breaks the script and defaults to the original toolbar look. To note, I wasn't looking to complain, just frustrated that I was unable to do it. I feel like Sylvester to Tweety Bird. ;-) billinghurst sDrewth 12:13, 20 June 2011 (UTC)
Managed to get Helder's code to display the icon, now to see if I can get it to function. billinghurst sDrewth 12:18, 20 June 2011 (UTC)
Great! Just remember to add "()" after the name of the function you want to be executed, i.e. use "add_ocr();". Helder 13:04, 20 June 2011 (UTC)

Commons' DBR

Hello. I was wondering if commons:COM:DBR could include a report similar to w:Wikipedia:Database reports/User preferences. Given that Commons is multilingual and a project with users from many different home projects, I think the stats would be interesting to see. tyvm. Killiondude 02:29, 25 June 2011 (UTC)

Hi. commons:Commons:Database reports/User preferences. --MZMcBride 06:26, 25 June 2011 (UTC)
Thanks!!! As an aside, I wonder if ownwork is supposed to be (real) English... Killiondude 07:10, 25 June 2011 (UTC)
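(For anyone wondering how such a report is put together: it boils down to something like the query below — a sketch that assumes the user_properties table, which only stores non-default settings; the real report may apply additional filtering.)

-- how many users have each preference set to a non-default value
SELECT
  up_property,
  COUNT(*) AS users
FROM user_properties
GROUP BY up_property
ORDER BY users DESC;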

LilyPond extension for Wikisource

I am not sure whether there is any interest in addressing the outstanding components in the Bugzilla to get the LilyPond extension functional for the Wikisource subdomains. billinghurst sDrewth 14:18, 25 June 2011 (UTC)

I think it would be really nice if someone could improve the current implementation of the extension so that it could be usable on Wikimedia projects.
One of the issues with the current implementation of the extension is summarized in bug 29630 (Make mw:Extension:LilyPond safe against DoS attacks). Helder 18:10, 28 June 2011 (UTC)
Ting just posted on wikitech-l about it (no reply yet). Nemo 15:25, 18 October 2011 (UTC)
The new "Score" extension is under active development to fulfill this need. Sumanah 02:57, 31 January 2012 (UTC)

Help with needed display option

Is there any way to add the functionality produced by

self.proofreadpage_numbers_inline = true;

... so that it can be toggled on and off along with/in addition to 'hide page links' in the display option menu found in the sidebar of en.wikisource under dynamic layouts? TIA. -- George Orwell III 15:18, 28 June 2011 (UTC)

Link from Special:ListFiles to Special:Nuke

Hi. I would like a link from Special:ListFiles to Special:Nuke that passes the person's username from the former to the latter. This is for use on Commons, where we get a lot of contributors who only upload crap. If someone could help, that'd be great!

Example: commons:Special:ListFiles/Killiondude would have a link that leads me directly to commons:Special:Nuke/Killiondude. Killiondude 04:37, 17 August 2011 (UTC)

$ yes poke. Killiondude 23:54, 23 August 2011 (UTC)
Where do you want the link? I don't see a great spot for it. I guess it could go near the top of the page... or it could be a tab... or it could go in the sidebar. What's your preference?
I didn't realize that the "Nuke" extension already supports "Special:Nuke/<username>" syntax, so this should be fairly trivial to resolve, as it can be done purely and simply with jQuery. --MZMcBride 03:53, 24 August 2011 (UTC)
Ooh. A tab at the top would probably be nice. Because it's a special page there are no other tabs, so it would work nicely. And would whatever magic you employ be an opt-in js thing? Killiondude 05:20, 24 August 2011 (UTC)
I worked on this at testwiki:User:MZMcBride/killionboob.js. It kind of works currently, but only with Monobook. Something strange about Vector... I asked Krinkle about it. --MZMcBride 03:20, 25 August 2011 (UTC)
I only use Monobook. I tried it out on a case where I would have needed it and it worked indeed. Thanks!!!!! Killiondude 06:24, 25 August 2011 (UTC)
Works fine in Vector for me on testwiki. See here. What browser / os are you using and what is the url to the page where it's not showing up ? –Krinkletalk 11:49, 25 August 2011 (UTC)
I put the code at testwiki:User:MZMcBride/killionboob.js (imported into /monobook.js). I have a "*" tab in Monobook, but in Vector I only see protect/delete/move in the drop-down menu (using a URL like http://test.wikipedia.org/wiki/Easytimelinetest?useskin=vector). Chrome/OS X. --MZMcBride 19:34, 25 August 2011 (UTC)
Again, working fine for me. But ahm, guess what, the issue isn't in the killionboob.js script. monobook.js is only loaded in Monobook. Do the same in vector.js (or use common.js instead). –Krinkletalk 21:14, 25 August 2011 (UTC)
Yeah, I'm a moron. Thanks! :-) --MZMcBride 23:38, 25 August 2011 (UTC)

The mwbot log search is broken

The mwbot log search is broken. All searches return "You searched for an invalid regex, try again!" Emufarmers 15:29, 24 August 2011 (UTC)

Fixed. --MZMcBride 04:02, 25 August 2011 (UTC)
Thanks! Emufarmers 04:16, 25 August 2011 (UTC)

How can I make list numbering continue after inserting a table?

I am writing a test procedure so the steps are numbered. On several of the steps, I have inserted a table of several similar tests to be performed and a blank cell in each row for a check when it is done. When I try to continue my numbered list, there seems to be no way I can make the numbering automatically continue:

  1. Perform continuity checks on the following items to verify correct polarity:
    1. 24VDC supply
DC Supply | Location | Test Point | Actuate   | Check | Location | Test Point | Actuate   | Check | Notes
24VDC     | Left SP  | K91:Vin+   | Close F52 |       | Right SP | K91:Vin+   | Close F52 |       |
24VDC     | Left SP  | A91:VIN    | Close F52 |       | Right SP | A91:VIN    | Close F52 |       |
24VDC     | Open all breakers, F52, F53, F54, F55


    1. 12VDC supply:

Has anyone found a trick to make this work? The preceding unsigned comment was added by 169.231.253.50 (talk • contribs) .

Use HTML. Wikimarkup is limited in this respect.

  1. Perform continuity checks on the following items to verify correct polarity:
    1. 24VDC supply
      DC Supply | Location | Test Point | Actuate   | Check | Location | Test Point | Actuate   | Check | Notes
      24VDC     | Left SP  | K91:Vin+   | Close F52 |       | Right SP | K91:Vin+   | Close F52 |       |
      24VDC     | Left SP  | A91:VIN    | Close F52 |       | Right SP | A91:VIN    | Close F52 |       |
      24VDC     | Open all breakers, F52, F53, F54, F55
    2. 12VDC supply

Hope that helps. --MZMcBride 22:48, 9 September 2011 (UTC)

Wikipedia.org (all languages) Full-Text Search

Does anybody know what specific search engine(s) the wikipedia.org sites use to search the full text of article titles and content? There is a list of many possible search engines that can be used with the wikimedia platform on this MediaWiki page, but I don't see any discussion of what search tools are actually used on wikipedia.org (for various languages). It seems like they might be using MySQL fulltext indexes, but I don't believe those work for some foreign languages. I am working on a linguistic project that involves querying dumps of wikipedia databases from multiple languages, and I'm curious what kinds of search indexes are in use on the live sites.

Well, wikitech:Search is probably the best page that explains Wikimedia's search backend from a technical perspective currently. At one point I know that Wikimedia was using MyISAM instead of InnoDB for search servers, specifically because InnoDB doesn't support full-text searching, but I don't think Wikimedia uses MySQL searching at all at this point. There might be more useful information in the archives of wikitech-l. (Or you could post there with more specific questions.) Hope that helps. --MZMcBride 12:20, 10 September 2011 (UTC)

Commons gadget spawning error

Recorded at Commons:MediaWiki talk:Gadget-Cat-a-lot.js#Error message billinghurst sDrewth 11:56, 10 October 2011 (UTC)

Fixed by Krinkle. --MZMcBride 19:15, 16 October 2011 (UTC)

Feedback on toolserver move idea

Please see http://meta.wikimedia.org/wiki/Wikimedia_Forum#Move_of_toolserver.org_to_toolserver.wikimedia.org Petrb 15:13, 21 October 2011 (UTC)