Wikisource Wishlists

2015

Allow Copy of Pages

As a Wikibooks user, I would like to see a "copy this page" feature (source, target) next to the move feature. This would be great for a normal user who wants to extend a book without touching the original one.

Example: Let's say you would like to improve a book about the programming language "Java". The book describes version 6 and you would like to start a new book on version 8. Even if it is worth having both books, I currently see no way to preserve the original book while editing the text in a way that respects the original editors' work.

Example: Let's say two contributors have different but strong opinions about the future of a book. This typically leads to conflicts in which some users leave the community. A copy-the-page feature would be a good technical way to settle such conflicts. -- Qwertz84 (talk) 23:17, 9 November 2015 (UTC)

Earlier discussion and endorsements
@Qwertz84 and Ernest-Mtl: There is an extension to duplicate pages (with their edit history): mw:Extension:Duplicator. It works fine. -- Reise Reise (talk) 17:21, 14 November 2015 (UTC)

Votes

  1. Oppose per Wikipedia:Content forking. Also looks like a duplicate of 2015 Community Wishlist Survey/Miscellaneous#Support for version branching for pages. MER-C (talk) 10:46, 1 December 2015 (UTC)
  2. Support Would ease the work to be done when we have to work on different editions of the same book. --Ernest-Mtl (talk) 14:53, 1 December 2015 (UTC)
  3. Support, this sounds like a good way to deal with s:Annotations#Clean texts Beleg Tâl (talk) 15:16, 1 December 2015 (UTC)
  4. Support. We sometimes have different versions of the same text on he.wikisource (I assume similar usefulness in other languages as well), and a copy function will definitely be useful for duplicating existing text so that each page can be edited according to a specific edition. --Nahum (talk) 19:27, 1 December 2015 (UTC)
    Nahum, could you explain an example on he.wikisource? John Vandenberg (talk) 08:34, 2 December 2015 (UTC)
    Nahum may have a different example in mind, but an obvious one to me is the text of the Hebrew Bible ("Old Testament"), which is available on HEWS with and without diacritics, and with and without cantillation marks. Ijon (talk) 09:54, 14 December 2015 (UTC)
  5. Support --Usien6 (talk) 21:12, 1 December 2015 (UTC) // For Wikibooks only!!
  6. Support --Manlleus (talk) 15:57, 2 December 2015 (UTC)
  7. Support --Le ciel est par dessus le toit (talk) 10:38, 3 December 2015 (UTC)
  8. Support --Kasyap (talk) 15:40, 7 December 2015 (UTC)
  9. Support --Nrgullapalli (talk) 09:51, 8 December 2015 (UTC)

Better support for DjVu files

DjVu files are a very interesting open format for full-book digitization, but MediaWiki uses them only as "proofreading tools". They could instead be an interesting output of Wikisource work, by thoroughly editing their text layer and fully using their metadata. Even when they are used simply as proofreading tools, much could be done with the details of the text-layer mapping, since it contains useful hints about formatting (text alignment and indentation, text size, paragraphs, blocks...) that are presently not used at all.

Here is a list of ideas:

  1. shift to the indirect mode of the DjVu structure (allowing faster access to individual pages with a browser DjVu reader extension);
  2. add a set of API requests as an interface to all read-only DjVuLibre routines;
  3. add some APIs for editing the text layer via djvuxmlparser;
  4. allow minor changes to DjVu files (e.g. editing some words in the text layer) without the need to re-upload the whole DjVu file (the history of text edits could be saved with something like a revision history).
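Idea 4 in particular could build on the existing DjVuLibre command-line tools rather than new server code. As a rough sketch (assuming `djvused` from DjVuLibre is available server-side; the helper below only constructs the argument vector, it does not execute anything):

```python
def build_settxt_command(djvu_path: str, page: int, text_path: str) -> list[str]:
    # djvused script: select one page, replace its hidden text layer with the
    # contents of text_path (in djvused's s-expression format), then save.
    script = f"select {page}; set-txt {text_path}; save"
    return ["djvused", "-e", script, djvu_path]

# Example: the command a text-layer-edit API call might run for page 12.
cmd = build_settxt_command("book.djvu", 12, "page12.txt")
```

A wrapper API could run this via `subprocess.run(cmd)` and record the old and new text as a revision, giving the edit history mentioned in idea 4.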

--Alex brollo (talk) 14:07, 10 November 2015 (UTC)

Earlier discussion and endorsements
Endorsed Really, really useful in the Wikisource context. --AlessioMela (talk) 14:12, 10 November 2015 (UTC)

Votes

  1. Support Goldzahn (talk) 12:53, 30 November 2015 (UTC)
  2. Support --Alexmar983 (talk) 16:42, 30 November 2015 (UTC)
  3. Support --YodinT 17:34, 30 November 2015 (UTC)
  4. Support John Vandenberg (talk) 01:29, 1 December 2015 (UTC)
  5. Support Risker (talk) 04:15, 1 December 2015 (UTC)
  6. Support --Kippelboy (talk) 05:41, 1 December 2015 (UTC)
  7. Support --Candalua (talk) 08:57, 1 December 2015 (UTC)
  8. Support --Accurimbono (talk) 09:05, 1 December 2015 (UTC)
  9. Support --Shizhao (talk) 09:27, 1 December 2015 (UTC)
  10. Support --Anika (talk) 09:35, 1 December 2015 (UTC)
  11. Support --Xavier121 (talk) 09:42, 1 December 2015 (UTC)
  12. Support --C.R. (talk) 12:02, 1 December 2015 (UTC)
  13. Support --Jayantanth (talk) 14:33, 1 December 2015 (UTC)
  14. Support --David Saroyan (talk) 14:46, 1 December 2015 (UTC)
  15. Support --Arnd (talk) 15:00, 1 December 2015 (UTC)
  16. Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
  17. Support Anubhab91 (talk) 15:54, 1 December 2015 (UTC)
  18. Support --Arxivist (talk) 20:08, 1 December 2015 (UTC)
  19. Support George Orwell III (talk) 23:27, 1 December 2015 (UTC)
  20. Support --Yodaspirine (talk) 12:59, 2 December 2015 (UTC)
  21. Support NickK (talk) 16:04, 2 December 2015 (UTC)
  22. Support --AlessioMela (talk) 20:06, 2 December 2015 (UTC)
  23. Support --Pymouss Tchatcher - 20:13, 4 December 2015 (UTC)
  24. Support - Bcharles (talk) 22:16, 8 December 2015 (UTC)
  25. Support --Davidpar (talk) 14:20, 14 December 2015 (UTC)

Implement an Internet Archive-like digitization service

Like many other Wikisource users, I greatly appreciate the Internet Archive's digitization service, and I use it as much as I can (DjVu files are only one of the many uses of the rich file set that can be downloaded; the collections of high-resolution JP2 images and the ABBYY XML are extremely interesting).

I'd like MediaWiki to implement a similar digitizing environment, but with a wiki approach and a Wikisource-oriented philosophy: share the best possible applications for the pre-OCR processing of book page images (splitting, rotating, cropping, dewarping... in brief, "scantailoring" images), saving excellent lossless images from the pre-OCR work; then run the best possible OCR, with the ABBYY OCR engine or similar software if any, saving both the text and the full-detail OCR XML; then use the excellent images and the best possible OCR text to produce excellent searchable PDF and DjVu files; finally - and this step would be really "wiki" - the embedded text would be fixed by the usual user proofreading work done on Wikisource.
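The staged workflow above is essentially a pipeline of page-image transformations followed by OCR. A minimal sketch of the plumbing (the stage names are hypothetical placeholders for real tools such as ScanTailor or an OCR engine; pages are represented as strings so the sketch stays self-contained):

```python
def split(pages):
    # A double-page scan becomes two single pages.
    return [half for scan in pages for half in (scan + "-L", scan + "-R")]

def deskew(pages):
    return [p + "+deskewed" for p in pages]

def crop(pages):
    return [p + "+cropped" for p in pages]

def run_pipeline(scans, stages):
    # Apply each pre-OCR stage in order; real stages would read and write
    # lossless image files instead of tagging strings.
    for stage in stages:
        scans = stage(scans)
    return scans

pages = run_pipeline(["scan1"], [split, deskew, crop])
# pages: ["scan1-L+deskewed+cropped", "scan1-R+deskewed+cropped"]
```

The point of the pluggable stage list is that communities could swap in better tools per step without changing the surrounding workflow.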

This is a bold dream; a less bold idea is to get full access to the best, heavy IA files (jp2.zip and ABBYY XML) and to build tools to use them as thoroughly as possible. --Alex brollo (talk) 07:08, 11 November 2015 (UTC)

Earlier discussion and endorsements
Endorsed. Jayantanth (talk) 17:42, 11 November 2015 (UTC)
Endorsed --Yann (talk) 14:06, 17 November 2015 (UTC)

Votes

  1. Support --YodinT 17:35, 30 November 2015 (UTC)
  2. Support --Accurimbono (talk) 09:07, 1 December 2015 (UTC)
  3. Oppose since I believe it is out of the scope of the current process; there are solutions for this outside of Wikisource, although I acknowledge this is something useful. Alleycat80 (talk) 09:08, 1 December 2015 (UTC)
    Comment A surprising statement. Are you an active Wikisource proofreader? --Alex brollo (talk) 17:30, 1 December 2015 (UTC)
  4. Support --Shizhao (talk) 09:28, 1 December 2015 (UTC)
  5. Support --Xavier121 (talk) 09:42, 1 December 2015 (UTC)
  6. Support --Jayantanth (talk) 14:52, 1 December 2015 (UTC)
  7. Support --Natkeeran (talk) 14:54, 1 December 2015 (UTC)
  8. Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
  9. Support --Artem.komisarenko (talk) 19:28, 1 December 2015 (UTC)
  10. Support --Barcelona (talk) 12:08, 2 December 2015 (UTC)
  11. Support --Manlleus (talk) 15:57, 2 December 2015 (UTC)
  12. Support NickK (talk) 16:03, 2 December 2015 (UTC)
  13. Support --AlessioMela (talk) 20:08, 2 December 2015 (UTC)
  14. Support --Alexmar983 (talk) 23:23, 2 December 2015 (UTC)
  15. Support - Wieralee (talk) 17:13, 4 December 2015 (UTC)
  16. Support Lionel Scheepmans Contact French native speaker, désolé pour ma dysorthographie 23:09, 4 December 2015 (UTC)
  17. Support --Yeza (talk) 10:45, 7 December 2015 (UTC)
  18. Support --Kasyap (talk) 15:40, 7 December 2015 (UTC)
  19. Support --Davidpar (talk) 14:20, 14 December 2015 (UTC)
  20. Support --Rahmanuddin (talk) 15:11, 14 December 2015 (UTC)

Tool to upload from Panjab Digital Library

The Panjab Digital Library has 1,791 manuscripts and 8,996 books on its website. All the manuscripts are in the public domain, and many of the books are as well. Most of the manuscripts and books are in the Punjabi language, but some of them are in English, Hindi and Persian. They have digitized everything in the form of images, which are not searchable, and the images are published in such a way that it is quite difficult to download them. I think a tool should be created to download all the manuscripts and books which are in the public domain. This would help develop the Punjabi Wikisource as well as Punjab-related content on other Wikisources, and in turn help improve other projects as well. --Satdeep Gill (talk) 07:31, 13 November 2015 (UTC)
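The core of such a downloader is just enumerating the page images of a work and fetching them. A sketch of that enumeration (the URL path pattern below is entirely hypothetical - the library's real page-image scheme would have to be worked out first):

```python
BASE_URL = "http://www.panjabdigilib.org"  # the path pattern below is invented

def page_image_urls(work_id: str, page_count: int):
    # Yield (page number, guessed image URL) pairs for one manuscript or book.
    for page in range(1, page_count + 1):
        yield page, f"{BASE_URL}/images/{work_id}/{page:04d}.jpg"

urls = list(page_image_urls("MS-0042", 2))
```

A real tool would then download each URL (politely rate-limited), assemble the pages into a DjVu or PDF, and upload the result to Commons with the public-domain licensing information.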

Earlier discussion and endorsements
Endorsed. We definitely need these sources to be made available in multiple formats and from multiple sources, as government sites always go missing all of a sudden. Another reason to upload these sources to a Wikimedia project is to get away from the technical difficulties in finding and further researching the material through collaboration. Omshivaprakash (talk) 10:34, 14 November 2015 (UTC)
Endorsed the idea of fixing the BUB or DLI downloader. --Subhashish Panigrahi (talk) 12:31, 14 November 2015 (UTC)
Endorsed --Charan Gill (talk) 15:14, 14 November 2015 (UTC)
Endorsed --Hundalsu (talk)
Endorsed --Dineshkumar Ponnusamy (talk) 08:43, 17 November 2015 (UTC)

Votes

  1. Oppose A one-off task that does not result in a long-lasting improvement to editor productivity; the impact is limited to a small number of wikis. MER-C (talk) 10:03, 30 November 2015 (UTC)
  2. Oppose per MER-C. We have bot tools to do one-off tasks like this. It may not be easy to find someone able and willing to do it. Sounds like a good hackathon project. John Vandenberg (talk) 01:33, 1 December 2015 (UTC)
  3. Neutral, could be implemented at https://tools.wmflabs.org/bub/index . Jayantanth (talk) 14:45, 1 December 2015 (UTC)
    BUB has been down since August. --KRLS (talk) 15:07, 1 December 2015 (UTC)
    Comment: By the way, BUB SHALL be fixed ASAP. It is a very useful tool for all Wikisources. --Accurimbono (talk) 08:17, 2 December 2015 (UTC)
  4. Comment: this is a perennial proposal at grants; we need to get a Swartz interested enough to migrate texts to the Internet Archive. A good global south project. Slowking4 (talk) 02:41, 3 December 2015 (UTC)


Tool to use Google OCR on Indic language Wikisources

Tracked in Phabricator:
Task T120788

For a long time, Indic language Wikisource projects have depended totally on manual proofreading, which wastes a lot of time and energy. Recently Google released OCR software for more than 20 Indic languages. This software is far more accurate than the previous OCRs, but it has many limitations. Uploading the same large file two times (once to Google OCR and another time to Commons) is not an easy solution for most contributors, as Internet connections are very slow in India. What I suggest is to develop a tool which can feed the PDF or DjVu files already uploaded to Commons directly to Google OCR, so that uploading them twice can be avoided. -- Bodhisattwa (talk) 13:50, 10 November 2015 (UTC)
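Two pieces of bookkeeping such a bridge tool would need are sketched below: batching pages (contributors report a roughly 10-page-at-a-time limit when using Google OCR by hand), and the upload metadata that asks Google Drive to convert an image to a Google Doc, which is what triggers its OCR. The Drive conversion detail is my understanding of the Drive API and should be verified against Google's current documentation and terms of service:

```python
def batch_pages(page_files, batch_size=10):
    # Split the page images of a Commons PDF/DjVu into uploadable batches.
    return [page_files[i:i + batch_size]
            for i in range(0, len(page_files), batch_size)]

def drive_ocr_metadata(page_file):
    # Requesting conversion to the Google Docs format is what makes Drive
    # run OCR on an uploaded image (assumption; verify against Drive docs).
    return {
        "name": page_file,
        "mimeType": "application/vnd.google-apps.document",
    }

batches = batch_pages([f"page{n:03d}.jpg" for n in range(1, 25)])
# 24 pages -> 3 batches: 10 + 10 + 4
```

The batching helper is the part that removes the manual drudgery the proposal describes; the legal questions raised in the votes below would still have to be settled before any automated upload.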

Earlier discussion and endorsements

Votes

  1. Support --Tobias1984 (talk) 11:35, 30 November 2015 (UTC)
  2. Comment This sounds like it relies on an OCR service hosted by Google, similar to Yandex (see SaaSS). Which Google service is this? Has the legality of using the service been checked? John Vandenberg (talk) 01:37, 1 December 2015 (UTC)
  3. Oppose. Yeah, this is SaaSS. Oppose per https://www.gnu.org/philosophy/who-does-that-server-really-serve.html. MER-C (talk) 08:43, 1 December 2015 (UTC)
    Comment - There is no free and open-source OCR software for Indic languages that is as accurate as Google OCR. People have tried to build such free OCRs, but with no real luck so far, and we fear none in the near future. Even the WMF is not ready to develop a free OCR, due to lack of expertise and infrastructure, as stated recently at the Wikisource Conference 2015 in Vienna, even though it was acknowledged at the conference that this is one of the highest-priority needs of the Wikisource community. Google OCR is the only successful OCR available to us, so we just cannot ignore it because it is SaaSS. Ravi has explained in detail below. -- Bodhisattwa (talk) 19:54, 1 December 2015 (UTC)
  4. Support --Satdeep Gill (talk) 14:25, 1 December 2015 (UTC)
  5. Support Most needed as of now for the Indic Wikisources. We are suffering. Jayantanth (talk) 14:30, 1 December 2015 (UTC)
  6. Support This would come in handy for many Wikisourcers. --Subhashish Panigrahi (talk) 14:43, 1 December 2015 (UTC)
  7. Conditional support: We need to check with legal first, as the OCR service is hosted by a for-profit organisation; if the legal team gives the green signal then support, otherwise oppose. ~ Nahid Talk 14:44, 1 December 2015 (UTC)
    Comment On Commons, there is a JavaScript gadget which uses Google Images to check whether an uploaded image is present on another website. The gadget is listed in the preferences section of Commons, and I don't think the legal team has any problem with it, even though it uses the same for-profit organization. Besides, we are talking about semi-automation here, just like the said gadget, nothing more - just trying to make our lives easier. -- Bodhisattwa (talk) 19:25, 1 December 2015 (UTC)
  8. Support, very needed, and Comment for many years the Wikisources have been using Tesseract, a free (Apache-licensed) OCR now sponsored by Google; are we talking about Tesseract or similar software? Cdlt, VIGNERON * discut. 14:47, 1 December 2015 (UTC)
    Comment This is the Google OCR in Google Drive. It is not FOSS software and can only be used via Google Drive or its services. Not the best situation, but the best practical solution for now. It should be viewed as a transitional solution until FOSS OCR solutions become effective. --Natkeeran (talk) 15:00, 1 December 2015 (UTC)
    Thank you for this already, but could you give more info? For instance: does this software have a name? A license? What are the "more than 20 Indic languages"? And do they all have a Wikisource? (no need to answer all my questions, but I'm curious). Cdlt, VIGNERON * discut. 15:30, 1 December 2015 (UTC)
    This is the Google OCR we are talking about. And these are the Wikimedia projects running in 22 Indian languages. As you can see, not all of them have Wikisource projects, but almost every major old language is running one, and a few are also in the multilingual Wikisource. This proposal will help not only Indic language contributors but also others who face the same problem as us. At the bottom of this link there is a list of Google OCR supported languages, and it is plenty. -- Bodhisattwa (talk) 19:31, 1 December 2015 (UTC)
    Thank you for explaining that this service is part of w:Google Drive. Their terms of service do not allow accessing it "using a method other than the interface and the instructions that we provide." There is an official API; however, it does not allow upload by URL (only direct POST/PUT from the client), so I expect it is against their TOS to integrate Google Drive into any process which automatically transmits a document from Commons to Google Drive, as it would be the Wikimedia Foundation doing the upload instead of the end user. It may be more legal to upload to Google Drive first, mark the file as public, and then have Wikimedia Commons import that document with OCR from Google Drive into Commons. John Vandenberg (talk) 07:47, 2 December 2015 (UTC)
  9. Support - This is a common task in many Indian languages, including Tamil. We are looking for a similar tool to upload scanned images via Google OCR into Wikisource. --Natkeeran (talk) 14:55, 1 December 2015 (UTC)
  10. Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
  11. Strong support If we can do it, it would be excellent. -- TitoDutta 15:12, 1 December 2015 (UTC)
  12. Neutral Humbly acceptable, but only as a brief temporary solution, while waiting for - and actively working on - turning free open-source software into an excellent "Wikisource OCR service". --Alex brollo (talk) 17:37, 1 December 2015 (UTC)
    Comment I agree that building an accurate free open-source OCR is the only permanent solution. As discussed recently at the Wikisource Conference 2015 in Vienna with WMF staff (Community Tech + Language Engineering), it became clear that the WMF is not interested in developing OCR software, due to lack of infrastructure and expertise in this field. Besides, the other available FOSS-based OCRs are far from accurate and, practically speaking, useless. So integrating Google OCR is the only practical alternative available to us; it is not only accurate but saves a lot of time and effort. -- Bodhisattwa (talk) 19:25, 1 December 2015 (UTC)
    Wikisource Community User Group/Wikisource Conference 2015/Participants doesn't list anyone from mw:Wikimedia Language engineering. Who did you talk with? Nemo 14:28, 3 December 2015 (UTC)
    We had a long discussion with Frances Hocutt, Software Engineer, WMF, about this matter. We also had a Skype session with Amir Aharoni, Software Engineer, Language Engineering team, WMF. Thanks - Bodhisattwa (talk) 14:43, 5 December 2015 (UTC)
  13. Support Anubhab91 (talk) 15:53, 1 December 2015 (UTC)
  14. Support There are Wikimedia projects running in 22 Indian languages. While many of the Wikipedias in these languages have slow growth owing to our socio-economic and political conditions, many Indian languages have rich sources of books and classics available in the public domain, dating back thousands of years. Unlike the Western world or global north, we do not have Gutenberg-like projects with an army of volunteers to proofread and transcribe. Hardly 1 in 20 Wikipedians contributes to Wikisource projects, and only 1 in 10 million people becomes a Wikipedian. Stats for the leading Indian-language Wikisource projects - Malayalam, Tamil, Telugu and Bengali - can be checked. And for the sake of informing the global community better, this is what we mean by Google OCR, and here is what we need: check this example page on Tamil Wikisource, where we have the Proofread extension installed. Google OCR will help us transcribe such pages and make our proofreading job easy. But we need to upload images page by page to Google OCR; we can't upload more than 10 pages at a time, and then we are again limited by the storage capacity of one's Google Drive. Our Wikipedian T. Shrinivasan came up with this Python script to automate this process, but not everyone is tech-savvy enough to run a script. What we need is an OCR solution that is as easy as the Proofread extension itself, or one that integrates with it. Even a third-party bookmarklet that interacts with Google would be enough. We have seen enough FOSS-based and other industry-grade OCR solutions that won't even come near Google OCR's output for the next decade, simply because they cannot match Google's resources or approach to this issue. It is not a question of whether the WMF should do this or whether it is within its free-software operating principles. In the past, the WMF has redefined global norms when it believed it was in the best interest of our mission.
This is a matter of immense impact, and the question should be how the community can be helped. After all, the output will be available as free content in Wikimedia projects and will also be of great use for adding references to Wikipedia. If needed, the WMF should talk to Google to get an API or a special agreement that supports Wikisource, as even Google only stands to benefit from more content being added to the web. --Ravi (talk) 16:01, 1 December 2015 (UTC)
  15. Oppose - No Google into Wikipedia, please. Singhalawap (talk) 17:01, 1 December 2015 (UTC)
    Comment Can you please elaborate on your reason for opposing? As explained above, we are not incorporating Google into Wikipedia!!! For your information, we use Google OCR practically every day on all Indic language Wikisource projects. We just want to make the task semi-automated, that's all. -- Bodhisattwa (talk) 18:35, 1 December 2015 (UTC)
  16. Support This will be an investment for the future, especially if it is a free and open-source OCR. I see that Tesseract (FOSS OCR software developed and sponsored by Google) supports some Indian languages; maybe extend it to Wikisource and improve it? Kenrick95 (talk) 01:48, 2 December 2015 (UTC)
  17. Support --Sayant Mahato (talk) 04:20, 2 December 2015 (UTC)
  18. Support This is very much needed. - Shubha (talk) 04:26, 2 December 2015 (UTC)
  19. Support It's a shame there's no FOSS option, but this sounds like a pretty good way to go for the time being. — Sam Wilson ( TalkContribs ) … 08:53, 2 December 2015 (UTC)
  20. Support But I also think the WMF should try to negotiate a freer fair-use agreement with Google (if there are any restrictions). We should also invest in developing open OCR software for Indian languages once sufficient training data is available in Wiki projects. -- Sundar (talk) 08:58, 2 December 2015 (UTC)
  21. Support This would not affect non-Indic Wikisources, but it would have a huge impact on Indic ones: I think "language equity" is an important goal for Wikimedia projects, so this is definitely something to do. Aubrey (talk) 09:03, 2 December 2015 (UTC)
  22. Support - I would suggest that, if any auto spell-check is available, it be included, with an option to replace words with the suggested ones. This would save the time spent typing the correct word and speed up proofreading. --Sushant savla (talk) 09:22, 2 December 2015 (UTC)
  23. Support - Reasons are well explained by Ravi (see above). -Nan (talk) 10:38, 2 December 2015 (UTC)
  24. Strong support --Mathanaharan (talk) 10:41, 2 December 2015 (UTC)
  25. Support - In order to protect the anonymity of contributors, a solution through an API agreement between the WMF and Google would be better. --Arjunaraoc (talk) 10:55, 2 December 2015 (UTC)
  26. Support --Balurbala (talk) 12:19, 2 December 2015 (UTC)
  27. Support This would come in handy for many Wikisourcers. --Kurumban (talk) 13:07, 2 December 2015 (UTC)
  28. Support I am of the opinion that this will help Tamil (my mother tongue) and other Indic languages. --உமாபதி (talk) 13:20, 2 December 2015 (UTC)
  29. Support --Sivakosaran (talk) 15:14, 2 December 2015 (UTC)
  30. Support --Parvathisri (talk) 17:45, 2 December 2015 (UTC)
  31. Oppose. I do understand the frustration of the Indian community that the only good OCR tool is a non-free tool. I am not sure, however, that we can do anything here unless Google is generous enough to release the source code of their OCR under a free license. In the end this request depends on Google and not on the Wikimedia Foundation; as the WMF can do nothing without a number of actions by Google, it is not a task for Community Tech — NickK (talk) 16:09, 2 December 2015 (UTC)
  32. Support - this is a game changer with which Wikisource can go where Gutenberg does not: language support in the global south. Slowking4 (talk) 02:38, 3 December 2015 (UTC)
  33. Support --Vikassy (talk) 16:51, 3 December 2015 (UTC)
  34. Support -- it will be very useful for Indic languages. --சஞ்சீவி சிவகுமார் (talk) 08:07, 4 December 2015 (UTC)
  35. Support --Pymouss Tchatcher - 20:15, 4 December 2015 (UTC)
  36. Strong support --ViswaPrabhaവിശ്വപ്രഭtalk 23:51, 4 December 2015 (UTC)
  37. Oppose The Google Drive OCR feature is not even a product with API availability. It is a proprietary product feature within Google Drive; I don't know what is meant by integrating a non-existent product into Wikisource. The OCRs Google has open-sourced are Tesseract and OCRopus, and Tesseract is already integrated with Wikisource. I believe the proprietary feature in Google Drive is mostly an optimization of these engines. Why doesn't Wikimedia invest, via Community Tech, in optimizations and improvements to Tesseract and OCRopus? The issue with Indian languages is the absence of financial support for people working in these domains; if Wikimedia can address that, we can easily beat it. Integrating a non-existent proprietary service is always a burden and does not help solve the OCR problem in the long run. -- AniVar (talk) 08:48, 5 December 2015 (UTC)
  38. Support Very much useful. -தமிழ்க்குரிசில் (talk) 13:06, 5 December 2015 (UTC)
  39. Support -- நி.மாதவன் ( பேச்சு )
  40. Support --Kasyap (talk) 15:39, 7 December 2015 (UTC)
  41. Support This will help save many man-hours. Yohannvt (talk) 07:23, 8 December 2015 (UTC)
  42. Strong support This is an obvious idea that any wiki librarian working on Indic language Wikisources would have. It will be handy, helpful and a boost to Indic Wikimedians. So, I support this. --Pavan santhosh.s (talk) 07:59, 8 December 2015 (UTC)
  43. Support -- Mayooranathan (talk) 18:39, 8 December 2015 (UTC)
  44. Support As the Indian language communities have no other good option, it should be implemented. --Jnanaranjan sahu (talk) 18:54, 8 December 2015 (UTC)
  45. Support - If an API solution cannot be negotiated with Google, then something along the lines of user:John_Vandenberg's suggestion under point 8 above. Bcharles (talk) 23:03, 8 December 2015 (UTC)


Visual Editor adapted for Wikisource

Currently, Wikisource uses the old but reliable text editor. This requires all Wikisource contributors to know lots of templates, which differ from one Wikisource language to another. A special version of the Visual Editor, adapted to Wikisource's needs, would facilitate inter-language help on Wikisource and make things easier for new contributors to the Wikisource projects. Selected buttons on this adapted Visual Editor for titles, centering, left or right text margins, drawing lines, etc. would be easy to learn, especially if they were derived from the general look of a word processor, and would help bring people to the different language Wikisources...

Placing a title in French, English, Spanish or Croatian would then be the same thing: selecting the text and pressing a button, rather than using a differently named template depending on which Wikisource you are on. Many people could help proofread pages in different languages, for example with a global project of the week... Myself, being a French-speaking Canadian, yes, I could proofread in French, English, Russian, Ukrainian and Spanish, but I'd need to know all the different templates in all these languages, and as I am not as fluent in them as in my native language, it is sometimes difficult to find and search things on the other Wikisource projects... But nothing would prevent me from helping on any of those, or even on Italian, Bulgarian or Portuguese special projects... These are the same fonts... Proofreading only requires us to compare the original text of the book with the transcribed text... But not knowing all the templates on the other Wikisources prevents me from helping other communities...

The magic in all this: we don't have to re-invent the wheel! I figure it would be easy to apply some modifications to the actual Visual Editor used on the other projects, concentrating the needs of Wikisource editing into a concise list of buttons for the most basic needs, which would allow proofreading 95% or more of actual book pages... In the worst case, the remaining 5% would be done the old-fashioned way... --Ernest-Mtl (talk) 03:31, 10 November 2015 (UTC) — WMCA
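The per-language template problem described above is, at bottom, a mapping problem: one button, many template names. A toy sketch of the idea (the template names are illustrative examples; each Wikisource's real names would have to be collected):

```python
# One VE "center" button backed by a per-wiki template table (example values).
CENTER_TEMPLATE = {"en": "center", "fr": "centré", "es": "centrado"}

def wrap_centered(lang: str, text: str) -> str:
    # The wikitext the shared button would insert on the given Wikisource.
    return "{{" + CENTER_TEMPLATE[lang] + "|" + text + "}}"

print(wrap_centered("fr", "Chapitre I"))  # {{centré|Chapitre I}}
```

With such a table maintained per wiki, the editing gesture stays identical across languages while the stored wikitext still uses each community's own templates.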

Earlier discussion and endorsements
Endorsed Slowking4 (talk) 04:36, 11 November 2015 (UTC) the user interface is a major impediment to new users - make the WSUG happy
I believe the linked phab task (phab:T48580) is at least one of the prerequisites for this task. Generally speaking, Wikisource uses several extensions to MediaWiki core, and both Visual Editor and Parsoid need to have code added to support those extensions. cscott (talk) 18:57, 11 November 2015 (UTC)
Endorsed - VE for Wikisource would greatly lower the barriers to entry on what (IMHO) is the sister project with the greatest potential for rapid growth if a little bit of resources were allocated to it. Wittylama (talk) 12:21, 23 November 2015 (UTC)

Votes

  1. Support --Tobias1984 (talk) 11:36, 30 November 2015 (UTC)
  2. Support VE is likely to see a much higher adoption rate on Wikisource than on Wikipedia, and the ability to extract high-fidelity Wikisource content as a DOM using Parsoid will be a very important step forwards. John Vandenberg (talk) 01:42, 1 December 2015 (UTC)
  3. Support Risker (talk) 04:17, 1 December 2015 (UTC)
  4. Support --Kippelboy (talk) 05:41, 1 December 2015 (UTC)
  5. Support --Candalua (talk) 08:57, 1 December 2015 (UTC)
  6. Support --Accurimbono (talk) 09:08, 1 December 2015 (UTC)
  7. Support --Anika (talk) 09:28, 1 December 2015 (UTC)
  8. Support --Rahmanuddin (talk) 15:10, 14 December 2015 (UTC)
  9. Support --Xavier121 (talk) 09:40, 1 December 2015 (UTC)
  10. Support -- Bodhisattwa (talk) 13:27, 1 December 2015 (UTC)
  11. Support --Satdeep Gill (talk) 14:26, 1 December 2015 (UTC)
  12. Support obviously. Cdlt, VIGNERON * discut. 14:35, 1 December 2015 (UTC)
  13. Support --TitoDutta 14:43, 1 December 2015 (UTC)
  14. Support --David Saroyan (talk) 14:46, 1 December 2015 (UTC)
  15. Support --KRLS (talk) 15:07, 1 December 2015 (UTC)
  16. Support -- Wittylama (talk) 15:07, 1 December 2015 (UTC)
  17. Support In order to start to make the Wikisource proofreading workflow what it should be: an easy task doable by everyone. Tpt (talk) 15:15, 1 December 2015 (UTC)
  18. Support Beleg Tâl (talk) 15:18, 1 December 2015 (UTC)
  19. Support Sadads (talk) 16:18, 1 December 2015 (UTC)
  20. Support --Wesalius (talk) 19:17, 1 December 2015 (UTC)
  21. Support --Arxivist (talk) 20:10, 1 December 2015 (UTC)
  22. Support -- Daniel Mietchen (talk) 20:46, 1 December 2015 (UTC)
  23. Support Trizek from FR 22:17, 1 December 2015 (UTC)
  24. Support --Aristoi (talk) 22:55, 1 December 2015 (UTC)
  25. Support George Orwell III (talk) 23:28, 1 December 2015 (UTC)
  26. Support Support but please ensure an immediate switch between VE editing and traditional editing while editing, without the need to save and re-edit the page. If this feature already exists, I apologize for this inappropriate comment. If this feature is impossible to get, please allow simple, fast and persistent disabling of VE as a user preference. --Alex brollo (talk) 00:00, 2 December 2015 (UTC)[reply]
  27. Support Support Of course, we are talking about VE in the Proofread page, right? Aubrey (talk) 09:06, 2 December 2015 (UTC)[reply]
  28. Support Support --Barcelona (talk) 12:09, 2 December 2015 (UTC)[reply]
  29. Support Support--Yodaspirine (talk) 13:02, 2 December 2015 (UTC)[reply]
  30. Support Support --Le ciel est par dessus le toit (talk) 13:56, 2 December 2015 (UTC)[reply]
  31. Support Support, one of the few cases where VisualEditor will really make life simpler for experienced users — NickK (talk) 16:05, 2 December 2015 (UTC)[reply]
  32. Support Support Pyb (talk) 01:10, 3 December 2015 (UTC)[reply]
  33. Support Support this is a hard ask, but it is a game changer. even incremental progress would be appreciated. Slowking4 (talk) 02:42, 3 December 2015 (UTC)[reply]
  34. Support Support - Wieralee (talk) 17:14, 4 December 2015 (UTC)[reply]
  35. Support Support --Pymouss Tchatcher - 20:12, 4 December 2015 (UTC)[reply]
  36. Support Support Halibutt (talk) 00:24, 5 December 2015 (UTC)[reply]
  37. Support Support --Yeza (talk) 10:47, 7 December 2015 (UTC)[reply]
  38. Support Support --Kasyap (talk) 15:41, 7 December 2015 (UTC)[reply]
  39. Support Support Abyssal (talk) 16:50, 10 December 2015 (UTC)[reply]
  40. Support Support --ESM (talk) 16:41, 13 December 2015 (UTC)[reply]
  41. Support Support --Davidpar (talk) 14:21, 14 December 2015 (UTC)[reply]

2016

Wikisource Wishlists



Improvement of Phe's Statistics and Tools for Wikisource: vector graphs and a sortable table

  • Problem: Phe's Statistics and Tools for Wikisource are already a very valuable resource, but two features would make them even more useful for insight into the activities of many small projects.
  1. Currently pages like this or this are easy to read as far as en.source or fr.source are concerned, but the lines showing progress and growth in smaller projects are all flattened at the bottom of the graphs.
  2. The ProofreadPage statistics are ranked by the number of page verifications, but many users would like to read the same tables ranked by other criteria, which currently cannot be done.
  • Who would benefit: any users interested in comparing their growth with their past achievement and with other projects'; projects willing to motivate the effort of their small but working communities.
  • Proposed solution:
  1. Either the graphs could be
    • rendered in svg instead of png to let users magnify them freely and explore their bottom right corner, or
    • split into two graphs for big and small projects with the appropriate proportions in the axis scale (see for example these two graphs).
  2. The ProofreadPage Wikitables could be sortable to let users explore other rankings.
  • More comments: There's no point in comparison between projects per se, but a fair amount of ambition among small projects has always been a motivating force helping them grow in quality, encouraging proofreading, and so on.
  • Phabricator tickets:
  • Proposer: εΔω 07:51, 15 November 2016 (UTC)
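For point 2 of the proposal, MediaWiki already ships the needed behaviour: adding the built-in `sortable` class to a wikitable lets readers re-rank the statistics by any column. A minimal wikitext sketch, with purely illustrative project names and numbers:

```
{| class="wikitable sortable"
! Project !! Proofread pages !! Validated pages
|-
| fr.wikisource || 12345 || 6789
|-
| vec.wikisource || 234 || 178
|}
```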

Community discussion

Good ideas but I don't feel it's for the Community Tech team, except if @Phe: need help (do you?). Cdlt, VIGNERON * discut. 09:37, 15 November 2016 (UTC)[reply]
I met Tpt at Wikimania 2016: I explained these issues to him and he wrote an email to Phe, but since then we have had no feedback about it. - εΔω 23:43, 15 November 2016 (UTC)

Well I'm not sure who should do that, but I give my small support to this small improvement.--Alexmar983 (talk) 08:47, 17 November 2016 (UTC)[reply]

@OrbiliusMagister: Could you please summarize the actual "small change for a big improvement" by editing your proposal, so others can immediately spot what is being proposed here? Thanks! :)
Done - εΔω 22:21, 19 November 2016 (UTC)

Voting – Improvement of Phe's Tools

  1. Support Support--Micru (talk) 13:33, 28 November 2016 (UTC)[reply]
  2. Support Support plus the tool should show statistics of more wikisource language projects.--Snaevar (talk) 23:16, 28 November 2016 (UTC)[reply]
  3. Support Support (for Snaevar: good idea, what language is missing?). Cdlt, VIGNERON * discut. 09:09, 29 November 2016 (UTC)[reply]
    @Phe: ang, bs, cy, fo, ht, is, li, mk, sah, sk, and yi are missing (some, like ang or ht, are closed :( and most are inactive, but some seem a little bit active). Cdlt, VIGNERON * discut. 14:28, 29 November 2016 (UTC)[reply]
  4. Support Support Obviously... - εΔω 20:45, 1 December 2016 (UTC)
  5. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  6. Support Support --Accurimbono (talk) 05:03, 6 December 2016 (UTC)[reply]
  7. Support Support --Yann (talk) 22:49, 12 December 2016 (UTC)[reply]

Integrate the CIS-LMU Post Correction Tool

  • Problem: As far as I know (based on my work in deWS), all texts on the Wikisources are created either by typing, by copying and pasting from other sources, or from OCR. Mistakes are identified only by proofreading, so the quality of texts is strongly dependent on the individual proofreaders.
  • Who would benefit: All Wikisources that work with OCR.
  • More comments: PoCoTo is an open-source tool, available on GitHub. It makes it possible to identify common OCR mistakes and handle them easily. It would probably be a big improvement to the quality of texts with one or no proofreaders. Texts that have been validated twice could also be checked systematically for remaining mistakes.
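PoCoTo's own interface aside, the core idea (surfacing recurring out-of-dictionary tokens as likely systematic OCR errors rather than one-off typos) can be sketched in a few lines; the function name, the sample pages, and the word lists here are all illustrative:

```python
import re
from collections import Counter

def frequent_ocr_suspects(pages, dictionary, min_count=3):
    """Count words that recur across pages but are not in the dictionary.

    A token that is out-of-dictionary once is probably a typo; one that
    recurs many times is probably a systematic OCR error worth fixing
    in bulk.
    """
    counts = Counter()
    for text in pages:
        for word in re.findall(r"[A-Za-z]+", text):
            if word.lower() not in dictionary:
                counts[word] += 1
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

# Illustrative input: the same scanno recurs on several pages.
pages = ["Avas it so", "it Avas true", "he Avas here", "one tyop"]
dictionary = {"it", "so", "true", "he", "here", "one", "was"}
print(frequent_ocr_suspects(pages, dictionary))  # [('Avas', 3)]
```

A real integration would feed the whole work's Page: namespace text through this and present the suspects for one-click correction.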

Community discussion

none

Voting – Integrate the CIS-LMU Post Correction Tool

  1. Support Support Ninovolador (talk) 11:40, 28 November 2016 (UTC)[reply]
  2. Support Support Tannertsf (talk) 13:38, 28 November 2016 (UTC)[reply]
  3. Support Support --Micru (talk) 16:13, 28 November 2016 (UTC)[reply]
  4. Support Support ShakespeareFan00 (talk) 20:56, 29 November 2016 (UTC)[reply]
  5. Support Support --Alex brollo (talk) 07:53, 30 November 2016 (UTC)[reply]
  6. Support Support - εΔω 20:57, 1 December 2016 (UTC) Whoever has proofread ancient texts understands the importance of this tool.
  7. Support Support Libcub (talk) 03:47, 2 December 2016 (UTC)[reply]
  8. Support Support Shubha (talk) 10:32, 2 December 2016 (UTC)[reply]
  9. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  10. Support Support --Continua Evoluzione (talk) 10:29, 5 December 2016 (UTC)[reply]
  11. Support Support Would this circumvent the current roundabout practice of uploading PDFs to Archive.org to get a DJVU which has gone through their OCR? Blue Rasberry (talk) 18:34, 6 December 2016 (UTC)[reply]
  12. Support Support --Yann (talk) 22:50, 12 December 2016 (UTC)[reply]

  • Problem: Right now, editions are represented in Wikidata as separate items. See for instance The Raven: multiple editions are linked with the property "edition". However, this is not represented on Wikisource as inter-language links.
  • Who would benefit: All Wikisource editors.
  • Proposed solution: Create a plugin that will aggregate all the edition interwiki links and will display them for each item on Wikisource.
  • Phabricator tickets: T128173
  • Proposer: Micru (talk) 14:06, 8 November 2016 (UTC)[reply]
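The aggregation step the proposal asks for can be sketched over Wikidata-style entity JSON; the code below assumes the "has edition or translation" property (P747) links the work to its editions, and the entity dicts are simplified stand-ins for what the real wbgetentities API would return:

```python
def edition_interwiki_links(work, editions):
    """Aggregate Wikisource sitelinks from all edition items of a work.

    `work` and the values of `editions` are simplified, wbgetentities-style
    entity dicts; P747 ("has edition or translation") claims on the work
    point at the edition items.
    """
    links = {}
    for claim in work.get("claims", {}).get("P747", []):
        qid = claim["mainsnak"]["datavalue"]["value"]["id"]
        sitelinks = editions.get(qid, {}).get("sitelinks", {})
        for site, link in sitelinks.items():
            if site.endswith("wikisource"):  # keep only Wikisource links
                links.setdefault(site, []).append(link["title"])
    return links

# Illustrative data: one work with one edition item holding two sitelinks.
work = {"claims": {"P747": [
    {"mainsnak": {"datavalue": {"value": {"id": "Q101"}}}}]}}
editions = {"Q101": {"sitelinks": {
    "enwikisource": {"title": "The Raven (1845)"},
    "enwiki": {"title": "The Raven"}}}}
print(edition_interwiki_links(work, editions))
```

A gadget doing this for real would fetch the entities over the API and render the merged list in the interlanguage-links sidebar.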

Community discussion

Comment Comment the interplay of book <-> edition also needs to be considered, as it is a similar problem, so that wikisource <-> wikipedia can also be resolved.  — billinghurst sDrewth 16:25, 8 November 2016 (UTC)[reply]

Comment Comment Is this really a technical problem? Do we really want all links? (And just editions, or works/exemplars too? For some works, the links can be longer than the text itself, e.g. Tales or Bible books like Book of Genesis...) For Wikisource <-> Wikipedia links, shouldn't the Wikipedia article be linked to the work page on Wikisource (which is actually done, I think; it.ws has an Opera namespace dedicated to that)? I wonder whether we don't need a whole new system for navigation on Wikisource (similarly, the Author: pages done by hand seem a nonsense to me too). Cdlt, VIGNERON * discut. 16:53, 8 November 2016 (UTC)[reply]

  1. Support Support JAn Dudík (talk) 22:23, 28 November 2016 (UTC)[reply]
  2. Support Support Peter Alberti (talk) 11:23, 3 December 2016 (UTC)[reply]
  3. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  4. Support Support -- Sergey kudryavtsev (talk) 07:12, 7 December 2016 (UTC)[reply]
  5. Support Support --Yann (talk) 22:51, 12 December 2016 (UTC)[reply]
  6. Support Support, I would really like to have this feature so that I do not have to copy interwikis manually to each and every edition — NickK (talk) 23:35, 12 December 2016 (UTC)[reply]

Make Wikisource "book-based"

  • Problem:

Mediawiki on Wikisource is "page-based", but Wikisource is "book-based". In WS, we use:

  • in the main namespace: 1 page for the index, N subpages for the single chapters
  • in the Index namespace: 1 page for the index
  • in the Page namespace: N pages for the pages

Examples:

This division has multiple effects, which have been discussed a lot in the past: see Book_management. In short, this affects:

  • metadata storage
  • import and export of books
  • navigation bars and automation of tasks --> thus, a steep learning curve for new users
  • statistics, which are page-based and not book-based: you never know which books are most read, you just know which single pages or chapters are.

At the same time, we don't know which books people look for on Wikisource, and what they don't find.

On small projects like Wikisource, it would be possible to experiment and have better analytics regarding books and also users, because the numbers are small enough to be manageable. If we knew what

  • Who would benefit:

The community of editors first, and the community of readers later. The idea is to have a better and more logical framework for Wikisource, which in time will allow better tools, an easier workflow, and even better analytics for the community, so that they understand what is liked and read on WS.
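To make concrete what "structural metadata" buys: once you know the scan page each chapter starts on, the ns0 transclusion tags can be generated mechanically. A minimal sketch (the helper is hypothetical, not Alex brollo's actual tool; the tag syntax follows the `<pages index=... from=... to=... />` example from the discussion below):

```python
def pages_tags(index_name, chapters, last_page):
    """Derive per-chapter <pages> transclusion tags from structural metadata.

    `chapters` is a list of (title, first_scan_page); each chapter runs up
    to the page before the next chapter starts, and the last one runs to
    `last_page`.
    """
    tags = {}
    for i, (title, start) in enumerate(chapters):
        end = chapters[i + 1][1] - 1 if i + 1 < len(chapters) else last_page
        tags[title] = f'<pages index="{index_name}" from={start} to={end} />'
    return tags

print(pages_tags("BOOK.djvu", [("Chapter 1", 9), ("Chapter 2", 25)], 40))
```

The same metadata could also drive subpage creation and navigation bars, which is the point of making the project book-based.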

Community discussion

  • @Aubrey: Sounds interesting. I'm still reading through the documentation linked above, so apologies if my questions are answered there — but: how much of proofreadpage would need to change? And with the long-term plan of moving lots of metadata to Wikidata, I'm imagining that this would be mostly about structuring books within MediaWiki (and not so much about describing them); is that right? Sam Wilson 04:59, 17 November 2016 (UTC)[reply]
    • I think that Molly's extension was directed more at ns0 than at the ProofreadPage extension. That is because PE is its own thing, and because it's in ns0 that we need a "logical structure" of a book (meaning: index, chapters). Of course, and there are hacks from @Alex brollo: in this direction, everything is correlated, and when you create the index on the Index page you can pre-fill the book subpages with the right data... When you have all the "structural metadata" (meaning: number of pages, which page a chapter starts on, which page the next chapter starts on, which page is the cover, etc.) you should be able to create automatically everything that uses that piece of data. That means the index on the Index page, but also subpages in ns0 that represent chapters, and also pages index tags (like <pages index="BOOK.djvu" from=544 to=548 />)...
    • These structural metadata will never go to Wikidata, but should be stored in a place that is the same for every Wikisource: at the moment, Italian Wikisource uses Alex's tools for this, which is fine for us but not for everybody else. When you have all the data, you can use it how you want. Descriptive metadata (author, title, publisher) will go to Wikidata. Aubrey (talk) 10:50, 17 November 2016 (UTC)[reply]

Voting – Make Wikisource "book-based"

  1. Support Support--Wesalius (talk) 08:20, 28 November 2016 (UTC)[reply]
  2. Support Support --Micru (talk) 16:12, 28 November 2016 (UTC)[reply]
  3. Support Support Aubrey (talk) 08:49, 30 November 2016 (UTC)[reply]
  4. Support Support--Satdeep Gill (talk) 17:57, 1 December 2016 (UTC)[reply]
  5. Support Support NMaia (talk) 00:25, 2 December 2016 (UTC)[reply]
  6. Support Support Libcub (talk) 03:48, 2 December 2016 (UTC)[reply]
  7. Support Support Shubha (talk) 10:36, 2 December 2016 (UTC)[reply]
  8. Support Support Csisc (talk) 10:56, 2 December 2016 (UTC).[reply]
  9. Support Support Pamputt (talk) 10:48, 4 December 2016 (UTC)[reply]
  10. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  11. Support Support--Alexmar983 (talk) 08:51, 8 December 2016 (UTC)[reply]
  12. Support Support - DPdH (talk) 12:14, 12 December 2016 (UTC)[reply]

Semi-automated tool for importing Wikisource data from standard header template into Wikidata items

  • Problem: Importing data from Wikisource pages (especially transcluded pages with a complete header) into Wikidata is very painful, while all the information is present in the header: title, author, date, publisher, pages, and for articles the periodical data and pages, plus the scanned file info. A tool is needed to automatically import those data into empty Wikidata items (thousands have been imported by bots, without bothering to complete them).
  • Who would benefit: both the Wikidata and Wikisource communities, because Wikidata info for Wikisource texts would be much more complete. A semi-automated tool, launchable individually on each item, would allow completing the items that need it and controlling the data, which a bot would not allow.
  • Proposed solution: a tool/script/gadget that would import available header data from the standard header template. @Tpt:
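The scraping half of the proposal can be sketched in a few lines, assuming a simple {{header}} call; a naive split on "|" breaks on piped links and nested templates, so a real gadget would use a proper parser (e.g. mwparserfromhell), as flagged in the docstring:

```python
import re

def parse_header_fields(wikitext):
    """Extract key=value parameters from a {{header}} template call.

    A rough sketch: real headers can contain nested templates and piped
    links, which this naive "|" split would mangle; production code should
    use a real wikitext parser such as mwparserfromhell.
    """
    m = re.search(r"\{\{header(.*?)\}\}", wikitext, re.S | re.I)
    if not m:
        return {}
    fields = {}
    for part in m.group(1).split("|")[1:]:
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields
```

The resulting dict (title, author, year, ...) is what the tool would map onto Wikidata statements for the item.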

Community discussion

  • @Hsarrazin: I think the crucial part of this is not so much importing the data to Wikidata (which I agree is a great idea) but rather using Wikidata data back in Wikisource. If the header template were to use Wikidata more, then Wikisourcerors would keep it up to date and correct. (Really, we shouldn't have to enter the same metadata in the header template, the Index page, and the file description on Commons!) Sam Wilson 00:59, 15 November 2016 (UTC)[reply]
    For the header template vs Index page duplication there is the header=1 feature of the <pages> tags (we use it a lot on French Wikisource). For duplication between Wikidata and Wikisource, it's just a matter of using mw.wikibase API in Lua gadget (and hopefully the same process will be usable when Commons file will use Wikibase-based metadata storage). Tpt (talk) 09:44, 18 November 2016 (UTC)[reply]
    I would hope that we would be able to pull the information from the Commons {{book}} template, as Commons is generally the first upload place, and from the header template where the work is not scan-supported.  — billinghurst sDrewth 12:55, 3 December 2016 (UTC)[reply]

Voting – Semi-automated tool for importing Wikisource data

  1. Support Support--Wesalius (talk) 08:20, 28 November 2016 (UTC)[reply]
  2. Support Support --Micru (talk) 16:14, 28 November 2016 (UTC)[reply]
  3. Support Support Aubrey (talk) 08:49, 30 November 2016 (UTC)[reply]
  4. Support Support Ankry (talk) 23:20, 1 December 2016 (UTC)[reply]
  5. Support Support, as this will surely make implementing the Open Citations corpus in Wikidata easier Csisc (talk) 10:57, 2 December 2016 (UTC).[reply]
  6. Support Support Peter Alberti (talk) 11:26, 3 December 2016 (UTC)[reply]
  7. Support Support no brainer for value for WSes and WD  — billinghurst sDrewth 12:55, 3 December 2016 (UTC)[reply]
  8. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  9. Support Support Risker (talk) 03:12, 9 December 2016 (UTC)[reply]
  10. Support Support --Edhral 08:40, 10 December 2016 (UTC)[reply]

Spelling- and typo-checking system for proofreading

  • Problem:
    When proofreading using ProofreadPage, there is no syntax highlighting or spell-checking (other than what may be native to the web browser). This makes it hard to see typos, scannos, mis-spellings, and other problems. A font that makes greater distinctions between characters, such as WikisourceMono, can help; but it's still hard to proofread things like punctuation (e.g. new line characters where they occur within a paragraph) or archaic spellings of words (e.g. "to-day" spelt with a hyphen).
  • Who would benefit:
    1. Wikisource readers, because the texts they view will be of higher quality; and
    2. Proofreaders, because they will catch errors earlier in the proofreading cycle, and have to revisit fewer pages.
  • Proposed solution:
    A tool akin to Distributed Proofreaders' WordCheck tool, which would implement site-wide and per-work word lists against which pages could be checked. In PGDP, this checking happens as what we call the 'preview' stage (i.e. it's optional, but its use is encouraged), when the user is presented with a colour-coded display of the text.
    All punctuation characters are highlighted (just for easier viewing), and every word that doesn't exist in a 'Good word list' (or does exist in a 'Bad word list') is shown in an editable text field. The user can then correct the word, or elect to add it to one of the work's word lists. The punctuation highlighting would include paragraph marks (currently some people use the pilcrowMarkers gadget) and whitespace.
  • More comments:
    • The punctuation-highlighting aspects of this wish could be satisfied by a syntax-highlighting editor (but such a thing would have to be customisable, for example some works enforce spaces around dashes, and some prohibit them).
    • The word-correcting UI could appear in a page's preview (although it may also be good to be able to view it applied to the actual wikitext as well), and it might be useful to be able to enable it when just reading a Page namespace page.
    • The per-work word lists could be able to be copied from other works (or some sort of word list library). Although, copy and paste works for this too.
    • It should be easy to get a report of a whole work's wordcheck status (although, another approach to this could be to have the word-fix UI able to be turned on in Main namespace, or anywhere Page pages are transcluded).
    • If a word is in a Bad word list, it should be easy to replace all occurrences throughout a work (although, there's value in going the traditional per-page route, because it could be that not all occurrences are actually the same misspelling).
    • The PGDP source code is PHP/MySQL, GPL-2.0, and hosted on Sourceforge.
  • Phabricator tickets:
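A minimal sketch of the WordCheck idea described above (a hypothetical function, not PGDP's actual PHP code): every word on a page is checked against the per-work lists, and anything blacklisted or unknown is flagged for the proofreader to confirm or fix:

```python
import re

def wordcheck(text, good_words, bad_words):
    """Classify each word of a page against per-work word lists.

    Returns (flagged, clean): words in `bad_words`, or absent from
    `good_words`, are flagged; the rest pass. Matching is case-insensitive
    so "To-day" passes if "to-day" is listed.
    """
    flagged, clean = [], []
    for word in re.findall(r"[\w'-]+", text):
        if word.lower() in bad_words or word.lower() not in good_words:
            flagged.append(word)
        else:
            clean.append(word)
    return flagged, clean

# Illustrative lists for a 19th-century work: "to-day" good, "today" bad.
good = {"to-day", "we", "went", "home"}
bad = {"today"}
print(wordcheck("To-day we wcnt home", good, bad))
```

The UI layer would then render the flagged words as editable fields with colour coding, as in the PGDP preview stage.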

Community discussion

Comment Comment
How do they manage British English vs. American English words?
There was an earlier discussion about having an Index:-page-level setting for typing-error correction, as the OCR for a work can have reproducible errors to which fixes could be applied. Having something like Index:workname.djvu/Badwords  — billinghurst sDrewth 15:58, 8 November 2016 (UTC)[reply]

Yes, PGDP has per-work wordlists (which, as you say, would probably be best stored as /goodwords and /badwords or something under each Index page). This means that UK-spelling works would have US spellings in their badwords, and vice versa for US works (or 19th-century vs modern, e.g. "to-day" is correct but "today" is not). I'm not sure about the idea of mass-applying fixes over a whole work—but certainly mass-reporting on misspellings would be brilliant. I'm also not sure how to manage punctuation fixes (e.g. highlighting when someone's done a spaced em dash, or has a mid-paragraph line break); but maybe that's beyond this wordlist idea? Sam Wilson 22:05, 8 November 2016 (UTC)[reply]
Start with the framework, IMNSHO; it can expand after that. Maybe phase 1 is to get the "words" pages framework set up and do the matches, then phase 2 is to build the regex replacer (I note that something like w:Wikipedia:AutoWikiBrowser/Typos, which Reedy (talk · contribs) built, may be a useful model, and our required complexity is way less than that. User:Pathoschild has done some work in the area with m:TemplateScript; maybe he can contribute his knowledge to this idea.)

We all know a good series of bad OCR readings that regularly come through that I would happily have corrected as the page is loaded for the first time. After 20pp of a work (and s:Index:A biographical dictionary of eminent Scotsmen, vol 1.djvu is a typical example) there are 20-30 reproducible bad OCRs; \bAvas\b -> was is so familiar to me from many works.  — billinghurst sDrewth 10:07, 10 November 2016 (UTC)[reply]
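The reproducible fixes described above amount to a per-work map from whole-word regexes to replacements; a minimal sketch, with the Index:workname.djvu/Badwords storage location purely hypothetical:

```python
import re

def apply_scannos(text, scannos):
    """Apply a per-work list of reproducible OCR fixes to page text.

    `scannos` maps whole-word regex patterns to replacements, as could be
    stored on a hypothetical Index:workname.djvu/Badwords subpage. The \\b
    anchors keep e.g. "Avast" from being touched by the "Avas" rule.
    """
    for pattern, replacement in scannos.items():
        text = re.sub(pattern, replacement, text)
    return text

scannos = {r"\bAvas\b": "was"}
print(apply_scannos("It Avas here. Avast!", scannos))  # It was here. Avast!
```

Running this once when a page is first opened for proofreading would implement billinghurst's "corrected as the page is loaded" wish while leaving the proofreader in control of the result.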

Comment Comment At WS, we "faithfully transcribe" original source texts. Therefore, addressing OCR errors is one thing, but misspellings and archaic spellings that appear in the original text should usually remain after transcription. Such a tool may confuse a new user who is prompted to make a change that shouldn't be made. Also, a spelling and typo checker may create a dependency on the tool, with users assuming that a "clean" page means all is well when other errors may be lurking; just as many errors may be found in the end. Nothing beats old-fashioned proofreading letter-by-letter, word-by-word, line-by-line, in my opinion. Londonjackbooks (talk) 11:05, 7 December 2016 (UTC)[reply]

Voting – Spelling and typo checking

  1. Support Support--Wesalius (talk) 08:21, 28 November 2016 (UTC)[reply]
  2. Support Support Tannertsf (talk) 13:39, 28 November 2016 (UTC)[reply]
  3. Support Support --Micru (talk) 16:14, 28 November 2016 (UTC)[reply]
  4. Support Support --Alex brollo (talk) 07:54, 30 November 2016 (UTC)[reply]
  5. Support Support --ShakespeareFan00 (talk) 10:28, 1 December 2016 (UTC)[reply]
  6. Support Support — The preceding unsigned comment was added by Satdeep Gill (talk)
  7. Support Support - εΔω 20:46, 1 December 2016 (UTC)
  8. Support Support NMaia (talk) 00:15, 2 December 2016 (UTC)[reply]
  9. Support Support Libcub (talk) 03:42, 2 December 2016 (UTC)[reply]
  10. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  11. Support Support --Continua Evoluzione (talk) 10:32, 5 December 2016 (UTC)[reply]
  12. Support Support - DPdH (talk) 12:18, 12 December 2016 (UTC)[reply]

Support Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

  • Problem: GLAM partners are reluctant to add material to Wikisource because (among other reasons) it's hard to incorporate back into their own catalogues.
  • Who would benefit: GLAM partners and their users (who would also thus be exposed to Wikisource, and might think it's great)
  • Proposed solution: Support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
  • More comments: This idea has been around for a long time (the ticket below dates from 2004) so I'm sure there's lots of back-story that I don't know! :-)

    One crucial part of it, as it relates to Wikisource, is that OAI-PMH consumers can specify certain search criteria (such as a particular collection, category, or author) when they want to harvest only those works.

    For example, the National Library of Australia's Trove system could harvest all Wikisource material relating to or originating in Australia, and then any library user would see Wikisource items in their search results.
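Selective harvesting is built into the protocol: a consumer adds a `set` argument (which Wikisource could map to categories or collections) and optionally a `from` date for incremental updates. A sketch of the request construction, with the parameter names taken from the OAI-PMH 2.0 protocol; the base URL and the "australia" set are invented for illustration:

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting.

    `set_spec` selects an OAI set (here imagined as a Wikisource category);
    `from_date` (YYYY-MM-DD) enables incremental harvests.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

url = list_records_url("https://ws.example/oai", set_spec="australia")
print(url)
```

A harvester like Trove would poll such URLs periodically and merge the returned Dublin Core records into its own catalogue.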

Community discussion

  • Great idea Samwilson (I have an example with my local library tablettes-rennaises.fr/ with OAI-PMH, books in the public domain and metadata in CC0; everything is perfect except that I don't have the tools and I don't really know OAI-PMH). Shouldn't the support be with/for Wikidata too? Cdlt, VIGNERON * discut. 16:55, 14 November 2016 (UTC)[reply]
    • @VIGNERON: Absolutely! Good point... really, we should aim at not having metadata in Wikisource at all (one day). So do you think this proposal should just target Wikidata? If it did, it would give Wikisource another reason to shift in that direction—but it could delay making it a reality for Wikisource works too. Sam Wilson 00:41, 15 November 2016 (UTC)[reply]
      • @Samwilson: not sure if we should target Wikidata only... I too am aiming for having all (meta)data stored in Wikidata, but I know it will take quite a long time (even a simple thing like converting the Author template to Wikidata on frws took almost a year, and people are still not used to it). So, targeting only Wikidata is still a good idea, but it's probably too early, as it seems too pushy right now; tl;dr: still unsure. Cdlt, VIGNERON * discut. 09:51, 15 November 2016 (UTC)[reply]
        • Hm, yes good point @VIGNERON. I've been writing a scraper for WS that could be a useful interface, and that could be updated as data is moved to WD without the core PMH system having to change along with it. So yeah, let's say this is just for going direct from Wikisource for now. Sam Wilson 00:02, 16 November 2016 (UTC)[reply]
          • From what I understand about this problem, we should completely refactor the whole metadata system in Wikisource. It's a daunting task, but a very important one. In fact, we already have an OAI-PMH tool for Index pages; it was made (of course!) by @Tpt: https://it.wikisource.org/wiki/Speciale:ProofreadIndexOai
The point is, as you two wrote above, that we need our data in a place with a good metadata system and good APIs. I think Wikidata is definitely good enough, so the goal, IMHO, should be "importing all the WS metadata into Wikidata". As you know, this means several things:
  • agreeing on a good bibliographic model on Wikidata
  • making it compatible with the bibliographic model on Commons (because they will have their own thing)
  • creating a very good scraper for different WS
  • importing data on WD
  • creating good documentations for GLAMs and Wikidata, for importing and exporting
Mind you: this could also mean rethinking and refactoring the Wikisource "data model", the Proofread extension, and other things. We are always patching up and building on existing patches and hacks on Wikisource, but the whole project could benefit from some good professionals looking at the thing from scratch and deciding whether things are good enough or whether some code needs to be written or changed. I'm saying all this because my fear is that we always try to build small new tools on top of each other, when what we need is a good, old-fashioned but serious code/system review. Aubrey (talk) 08:22, 16 November 2016 (UTC)[reply]
I strongly support Aubrey's comment. We definitely need to sit down and find a good metadata management workflow before creating more tools. It will allow us to have a clear vision of what is needed and which tools to create. I tried a few years ago to start such work; see mw:User:Tpt/RFC (most of it was written in 2013). Tpt (talk) 09:58, 18 November 2016 (UTC)[reply]

Voting – Support Open Archives Initiative Protocol

  1. Support Support--Shizhao (talk) 03:20, 28 November 2016 (UTC)[reply]
  2. Support Support--Wesalius (talk) 08:21, 28 November 2016 (UTC)[reply]
  3. Support Support--Micru (talk) 13:32, 28 November 2016 (UTC)[reply]
  4. Support Support Sadads (talk) 15:01, 28 November 2016 (UTC)[reply]
  5. Support Support --Izno (talk) 01:33, 29 November 2016 (UTC)[reply]
  6. Support Support VIGNERON * discut. 09:14, 29 November 2016 (UTC)[reply]
  7. Support Support --Ernest-Mtl (talk) 17:56, 29 November 2016 (UTC) (BAnQ's national collection archivists and librarians have been dreaming of this for over 2 years! lol)[reply]
  8. My Support Support is not actually related to OAI-PMH (there is ***already*** such a tool; see my comment above) but to a rethought and refactored metadata workflow. A simple tool won't solve the real issue (the moment you work a bit on this you will understand what I mean). We discussed this a lot; we just need to focus on doing it. Aubrey (talk) 08:56, 30 November 2016 (UTC)[reply]
  9. Support Support - εΔω 20:47, 1 December 2016 (UTC)
  10. Support Support NMaia (talk) 00:15, 2 December 2016 (UTC)[reply]
  11. Support Support Libcub (talk) 03:43, 2 December 2016 (UTC)[reply]
  12. Support Support Pamputt (talk) 10:49, 4 December 2016 (UTC)[reply]
  13. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  14. Support Support --HHill (talk) 11:04, 5 December 2016 (UTC)[reply]
  15. Support Support Risker (talk) 03:13, 9 December 2016 (UTC)[reply]
  16. Support Support  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:24, 11 December 2016 (UTC)[reply]
  17. Support Support --Yann (talk) 22:57, 12 December 2016 (UTC)[reply]

Upload Wikisource text wizard

  • Problem: The text upload process is complex across many projects.
  • Who would benefit: Uploaders
  • Proposed solution: Create a wizard covering the whole text-upload process: search the Internet Archive, use the IA uploader to Commons, set up the Index at Wikisource to match Commons, and adjust the 'page offset' on the Index page.

Community discussion

  • By "adjust page offset" I assume you mean setting up the pagelist on the Index page? If so, I reckon that in itself could be a great project: a wizard for easier matching up of scan pages to work page numbers. Could be a great thing to gamify with a simple mobile interface for asking users to identify what page number a given page is; it could figure out the progression from the answers. (Maybe.)

    Also, this is about the workflow of uploading works' Djvu & PDF files isn't it? (For the benefit of non-wikisourcerers who are reading: the 'text' in this case is a book or other work containing text, not just "uploading wikitext".) —Sam Wilson 03:37, 8 November 2016 (UTC)[reply]

    @Samwilson: using the existing Book2Scroll tool on enWS index pages allows all users to see page numbering and gaps, images etc. Maybe we can work off that tool to mark pages and have it feed the data back to the <pagelist>  — billinghurst sDrewth 15:58, 8 November 2016 (UTC)[reply]
    Yes, I do index pages the hard way, but an easier way would be appreciated. This is about the total process redesign of scanned text layers, including DjVu conversion; there are a lot of steps with a lot of custom tweaking required. Slowking4 (talk) 12:18, 9 November 2016 (UTC)[reply]
    @Billinghurst: Good point about using Book2Scroll; it's certainly the sort of interface we'd want for page-number correlation. And @Slowking4: Every time I recommend to someone that they add something to Wikisource, I'm reminded of just how many steps there are! It'd be terrific to have a most-common-workflow wizard: from a pile of scan files (or an IA identifier) through to a correctly-constructed Index page. Is this what you're envisaging? Sam Wilson 04:42, 10 November 2016 (UTC)[reply]
  • Good idea --Wesalius (talk) 07:28, 8 November 2016 (UTC)[reply]
  • Good idea --Jayantanth (talk) 11:14, 8 November 2016 (UTC)[reply]
  • Good idea VIGNERON * discut. 14:25, 8 November 2016 (UTC)[reply]
  • We need a tool which can create DjVu files and OCRs, since IA doesn't do DjVu anymore. In short, to replace or improve BuB. Yann (talk) 15:55, 8 November 2016 (UTC)[reply]
  • As an occasional contributor to Italian Wikisource I agree that we need some kind of wizard to make it easier. --Jaqen (talk) 15:45, 16 November 2016 (UTC)[reply]

Voting – Upload Wikisource text wizard

  1. Support Support--Shizhao (talk) 03:21, 28 November 2016 (UTC)[reply]
  2. Support Support--Wesalius (talk) 08:21, 28 November 2016 (UTC)[reply]
  3. Support Support--Micru (talk) 13:34, 28 November 2016 (UTC)[reply]
  4. Support Support Tannertsf (talk) 13:39, 28 November 2016 (UTC)[reply]
  5. Support Support very important for new contributor support, Sadads (talk) 15:02, 28 November 2016 (UTC)[reply]
  6. Support Support --Consulnico (talk) 17:50, 28 November 2016 (UTC)[reply]
  7. Support Support --Snaevar (talk) 23:18, 28 November 2016 (UTC)[reply]
  8. Support Support John Carter (talk) 23:41, 28 November 2016 (UTC)[reply]
  9. Support Support VIGNERON * discut. 09:14, 29 November 2016 (UTC)[reply]
  10. Support Support a real wizard, with good quality djvu encoding, could merge existing tools and provide a real improvement to the existing workflow. I think we have all the pieces: we need to join them and make a powerful and simple tool for everyone. Aubrey (talk) 08:59, 30 November 2016 (UTC)[reply]
  11. Support Support Trizek from FR 20:02, 30 November 2016 (UTC)[reply]
  12. Support Support NMaia (talk) 00:16, 2 December 2016 (UTC)[reply]
  13. Support Support Shubha (talk) 10:16, 2 December 2016 (UTC)[reply]
  14. Support Support Pamputt (talk) 10:49, 4 December 2016 (UTC)[reply]
  15. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  16. Support Support Sadads (talk) 18:45, 6 December 2016 (UTC)[reply]
  17. Support SupportNala Discuter 10:30, 7 December 2016 (UTC)
  18. Support Support --Edhral 07:26, 8 December 2016 (UTC)[reply]
  19. Support Support Weft (talk) 07:55, 8 December 2016 (UTC)[reply]
  20. Support Support Risker (talk) 03:07, 9 December 2016 (UTC)[reply]
  21. Support Support - DPdH (talk) 12:22, 12 December 2016 (UTC)[reply]
  22. Support Support --Yann (talk) 22:54, 12 December 2016 (UTC)[reply]
  23. Support SupportNickK (talk) 23:37, 12 December 2016 (UTC)[reply]

Visual Editor menu refresh

  • Problem: the Visual Editor menus could be refreshed and improved
  • Who would benefit: new editors on Wikisource
  • Proposed solution: redesign the menus based on a survey of the most-used functions in editors' workflows, plus editor feedback; do proper UX design.
    • for example, in Page view, under the text style menu, small caps and larger/smaller text should sit higher than strikethrough or subscript; there should be an easy way to insert common templates such as the running header
    • and in article view, 'insert pages' should move up and 'insert media' down; there should be an easy way to insert common header templates such as EB1911 or NIE.
  • Phabricator tickets:
  • Proposer: Slowking4 (talk) 02:16, 8 November 2016 (UTC)[reply]
  • Translations: none yet

Community discussion

@Slowking4: This proposal welcomes a better problem description: What exactly you would like to achieve, and why it is currently hard or impossible to perform your work, given the current visual editor menu. "Could be more helpful" is not a problem but a solution. :) --AKlapper (WMF) (talk) 10:26, 8 November 2016 (UTC)[reply]

  • I would like to see a menu redesign based on Wikisource workflows and editor feedback, not on a Wikipedia default. To the extent that the menus are not useful, editors switch to wikicode to get work done. Have you actually used VE to insert or edit a page header? Because I cannot - it looks impossible to me. I also gave you four examples of changes to the existing menus. Or you could make the drop-down menus customizable by the editor. Slowking4 (talk) 19:46, 8 November 2016 (UTC)[reply]
  • It's already possible to add custom/local buttons; see mw:VisualEditor/Gadgets. I don't believe that it's currently possible to re-arrange the order of items in the character formatting menu.
    Inserting a running header is possible now. You just need to go to Insert > Template and put in {{RunningHeader|left=LEFT|center=CENTRE|right=RIGHT}}. You should use the parameter names. It would be much quicker to do if the template had mw:TemplateData documentation, but it's possible now. Whatamidoing (WMF) (talk) 17:04, 2 December 2016 (UTC)[reply]

Voting – Visual Editor menu refresh

Add a 'clean' method for side-titles, and side notes to parser

  • Problem: The current approaches to this are not ideal, and do not render consistently across differing "media" formats (such as mobile), or even across namespace displays at Wikisource.
  • Who would benefit: Transcribers and proofreaders at Wikisource.
  • Proposed solution:
    1. Implement an inline <sidenote></sidenote> tag pair that can be used to mark a sidenote directly to the parser.
    2. Implement a back end to process the "sidenotes", such that they are rendered using a manner consistent with the media format concerned.
    (a) For desktop and 'print', the sidenotes should be rendered as a classed layout block to the appropriate side of the main content, taking into account that side 'swapping' may be required when transcluding to the Main/Article namespace. The block is classed so that it can be styled per work, or even per page, as required (i.e. left-aligned and right-aligned accordingly in the Page: namespace, and either left- or right-aligned when transcluded to the main namespace).
    (b) For 'mobile', the sidenotes could ideally be converted automatically into <ref></ref> pairs (in an appropriate group) and rendered at the end of the document in a suitable formattable block. For side-titles, which typically appear at the start of a paragraph, converting them into paragraph leaders (as suggested in the comments) is another possible solution.
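To make the media-dependent rendering concrete, here is a rough sketch (Python; the tag name comes from the proposal, but the CSS class names and the endnote marker format are invented for illustration) of the two back-end behaviours: a classed block for desktop, endnote extraction for mobile.

```python
import re

SIDENOTE = re.compile(r"<sidenote>(.*?)</sidenote>", re.DOTALL)

def render_sidenotes(text, medium="desktop", side="left"):
    """Return (rendered_text, endnotes).

    Desktop: wrap each sidenote in a classed block so per-work or per-page
    CSS can place it left or right (and swap sides on transclusion).
    Mobile: pull sidenotes out as numbered endnotes, leaving a marker.
    """
    if medium == "desktop":
        repl = rf'<span class="sidenote sidenote-{side}">\1</span>'
        return SIDENOTE.sub(repl, text), []
    notes = []
    def extract(match):
        notes.append(match.group(1))
        return f"[note {len(notes)}]"
    return SIDENOTE.sub(extract, text), notes
```

The design point is that the wikitext stays identical; only the renderer decides between a side block and an endnote list.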

Community discussion

See Bible (King James Version, 1611)/Genesis for a current example. Kaldari (talk) 18:41, 7 November 2016 (UTC)[reply]

Yes, sidenotes are a major headache that stops work on documents containing them. Current practice uses template:sidenote. Slowking4 (talk) 02:20, 8 November 2016 (UTC)[reply]

Comment Comment
Part of the issue is to also get the sidenotes to work well with the different page layouts.
For mobile version it may be more appropriate to place sidenotes as paragraph leaders rather than treat as references. Putting them as end notes does not seem to be the best solution.
Another issue is avoiding the overlapping display of sidenotes.  — billinghurst sDrewth 15:46, 8 November 2016 (UTC)[reply]

Voting – Add a 'clean' method for side-titles

  1. Support Support--Wesalius (talk) 08:21, 28 November 2016 (UTC)[reply]
  2. Support Support Tannertsf (talk) 13:40, 28 November 2016 (UTC)[reply]
  3. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]

Add simple filters to Danmichaelo's CropTool

  • Problem: CropTool by Danmichaelo (c:Commons:CropTool, and at Github) already does an excellent job for Wikisource users, since it can extract nsPage images from multipage books (djvu and pdf), greatly simplifying a difficult step of nsPage formatting. It would be great to add some filters and tools to it (rotation, greyscale and B&W conversion, background removal) to enhance image quality.
  • Who would benefit: all contributors
  • Proposed solution: implement an optional canvas-based editing environment with simple tools
  • Phabricator tickets:

Community discussion

Comment Comment giving CropTool the ability to remove and replace the first page of a djvu/pdf would have value for all the Commonists who complain about scans opening with a Google text page (and whine about copyright).  — billinghurst sDrewth 22:32, 23 November 2016 (UTC)[reply]

A good standard replacement for the Google front image would be a copy of the book's title page, taken from the djvu/pdf itself. The only parameter to give a bot would be the djvu/pdf page number of the title page. --Alex brollo (talk) 19:40, 24 November 2016 (UTC)[reply]

Voting – Add simple filters

  1. Support Support Ninovolador (talk) 11:40, 28 November 2016 (UTC)[reply]
  2. Support Support--Alexmar983 (talk) 17:32, 28 November 2016 (UTC)[reply]
  3. Support Support --Alex brollo (talk) 09:56, 30 November 2016 (UTC)[reply]
  4. Support Support - εΔω 20:49, 1 December 2016 (UTC)
  5. Support Support NMaia (talk) 00:17, 2 December 2016 (UTC)[reply]
  6. Support Support Shubha (talk) 10:23, 2 December 2016 (UTC)[reply]
  7. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  8. Support Support --Edhral 07:31, 8 December 2016 (UTC)[reply]
  9. Support Support Risker (talk) 03:08, 9 December 2016 (UTC)[reply]
  10. Support Support--Nizil Shah (talk) 06:29, 10 December 2016 (UTC)[reply]

AJAX editing of nsPage content

  • Problem: Opening and saving nsPage pages is slow
  • Who would benefit: Experienced contributors
  • Proposed solution: Experienced contributors usually edit a series of pages in their natural order, often with minimal changes and within the same browser environment (same tools, same nsIndex...). This job can be done via AJAX API calls, with no need to re-download and re-build the complex browser environment: it only needs to fetch the text, image and some data of the next page, save the edits, and load the following page immediately without leaving edit mode. Tests are running on it.source with the Edit in Sequence tool.
  • Phabricator tickets:
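The round trip the proposal describes maps onto the standard MediaWiki action API. A sketch (Python; only the request parameters are shown, with no HTTP layer, and the `next_page` helper assumes the usual `Page:Work.djvu/<n>` naming scheme):

```python
def fetch_params(title):
    """Parameters to fetch the current wikitext of one Page: page."""
    return {"action": "query", "titles": title, "prop": "revisions",
            "rvprop": "content", "rvslots": "main", "format": "json"}

def save_params(title, text, summary, csrf_token):
    """Parameters to save the proofread text back."""
    return {"action": "edit", "title": title, "text": text,
            "summary": summary, "token": csrf_token, "format": "json"}

def next_page(title):
    """Page:Foo.djvu/12 -> Page:Foo.djvu/13, for editing in sequence."""
    base, _, num = title.rpartition("/")
    return f"{base}/{int(num) + 1}"
```

The editor stays in edit mode and issues only these two small requests per page, instead of a full page load each time.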

Community discussion

THIS is actually a super great idea!! Especially with slow internet connections, which have to reload the same UI elements over and over again; and, in my case at least, not all the JavaScripts always load, so I need to reload the page a few times to start working. --Ninovolador (talk) 13:09, 19 November 2016 (UTC)[reply]

Voting – AJAX editing of nsPage content

  1. Support Support Ninovolador (talk) 11:39, 28 November 2016 (UTC)[reply]
  2. Support Support Reptilien.19831209BE1 (talk) 14:40, 28 November 2016 (UTC)[reply]
  3. Support Support --Alex brollo (talk) 09:57, 30 November 2016 (UTC)[reply]
  4. Support Support Shubha (talk) 10:24, 2 December 2016 (UTC)[reply]
  5. Support Support Omino di carta (talk) 12:25, 3 December 2016 (UTC)[reply]
  6. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]

Allow Wikisource pages to be cited correctly

  • Problem: Cite this page produces incorrect citations.

Example: see this page - https://en.wikisource.org/wiki/Malthus,_Thomas_Robert_(DNB00). Cite-this-page gives (APA and MLA styles):

What it should actually cite:

It would be good if the page had embedded COinS metadata that can be set by editors, so that anyone can cite it correctly using the visual editor.

  • Who would benefit: Editors who understand referencing
  • Proposed solution:

Perhaps having page variables would work so that the Cite-This-Page extracts data from manually set values.
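As a sketch of the COinS idea: the metadata lives in an empty span whose title attribute carries OpenURL key-value pairs, which citation tools then pick up. A minimal generator (Python; the field selection is illustrative only, and a real implementation would map more of the header template's parameters):

```python
import html
from urllib.parse import urlencode

def coins_span(title, author, date, url):
    """Empty span carrying OpenURL book metadata, as citation tools expect."""
    fields = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.date": date,
        "rft_id": url,
    }
    return '<span class="Z3988" title="{}"></span>'.format(
        html.escape(urlencode(fields)))
```

The span renders as nothing, so it could be emitted by the header template without changing the page's appearance.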

Community discussion

We have a gadget like that on French Wikisource. See the citer le texte button. Tpt (talk) 09:46, 18 November 2016 (UTC)[reply]

Awesome, exactly, I just checked and it's something other language Wikisources ought to have. Shyamal (talk) 06:47, 23 November 2016 (UTC)[reply]
The js code has been installed in Bengali Wikisource. -- Bodhisattwa (talk) 06:58, 7 December 2016 (UTC)[reply]

Voting – Allow Wikisource pages to be cited correctly

  1. Support Support making the citer le texte button available to all other wikisource projects. --Wesalius (talk) 08:03, 28 November 2016 (UTC)[reply]
  2. Support Support — The preceding unsigned comment was added by FocalPoint (talk)
  3. Support Support --R. S. Shaw (talk) 17:20, 9 December 2016 (UTC)[reply]
  4. Oppose Oppose As noted above, this already exists; all wikis have to do is install and localize it. It's not a WMF dev job.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:26, 11 December 2016 (UTC)[reply]

Automated reader's portal

  • Problem: For readers it is hard to get an easy overview of the works that are available for reading from Wikisource.
  • Who would benefit: Visitors mostly.
  • Phabricator tickets:

Community discussion

none

Voting – Automated reader's portal

  1. Support Support--Wesalius (talk) 08:20, 28 November 2016 (UTC)[reply]
  2. Support Support (ping Ernest-Mtl ;) ) VIGNERON * discut. 09:16, 29 November 2016 (UTC)[reply]
  3. Support Support --Ernest-Mtl (talk) 17:52, 29 November 2016 (UTC)[reply]
  4. Support Support I wish we had something as good for commons media... but wikisource is a good start in the direction.--Alexmar983 (talk) 06:16, 30 November 2016 (UTC)[reply]
  5. Support Support --Alex brollo (talk) 07:50, 30 November 2016 (UTC)[reply]
  6. Support Support Aubrey (talk) 09:00, 30 November 2016 (UTC)[reply]
  7. Support Support - εΔω 20:51, 1 December 2016 (UTC)
  8. Support Support Jberkel (talk) 22:23, 1 December 2016 (UTC)[reply]
  9. Support Support Libcub (talk) 03:45, 2 December 2016 (UTC)[reply]
  10. Support Support Shubha (talk) 10:26, 2 December 2016 (UTC)[reply]
  11. Support Support Peter Alberti (talk) 11:21, 3 December 2016 (UTC)[reply]
  12. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  13. Support Support --Continua Evoluzione (talk) 10:34, 5 December 2016 (UTC)[reply]
  14. Support Support --Edhral 06:23, 9 December 2016 (UTC)[reply]
  15. Support Support --Francois C (talk) 15:11, 10 December 2016 (UTC)[reply]
  16. Support Support -- Sometimes I've got a feeling that Wikisource is even less friendly for readers than for editors Plogeo (talk) 19:29, 10 December 2016 (UTC)[reply]
  17. Support Support --NaBUru38 (talk)
  18. Support Support - DPdH (talk) 12:23, 12 December 2016 (UTC)[reply]
  19. Support Support --Yann (talk) 22:57, 12 December 2016 (UTC)[reply]
  20. Support SupportNickK (talk) 23:38, 12 December 2016 (UTC)[reply]

Create new Han Characters with IDS extension for WikiSource

Using IDS to describe newly created Han characters on Wikisource

  • Problem:
    • en- Han characters (en:logogram, including en:Chinese Characters, en:Hanja, and en:Kanji) are widely used across East Asia (China, Taiwan, Singapore, the Mandarin-speaking areas of Malaysia, Hong Kong, Japan, Korea, and Vietnam). An enduring unsolved problem for digital archiving is missing characters: not only are characters from ancient books missing, but even modern publications lack characters (some authors created 300-400 unique new characters in certain books). This makes such works difficult to deal with when we archive them into Wikisource. Unicode gradually adds new characters to the charts, but new Unihan extensions always take time to go live. In the past Wikisource, and even Wikipedia, dealt with this problem by using image files to present those characters, but images cannot be indexed or searched, and are not exchangeable between computer systems.
    • zh- In the CJKV cultural sphere of East Asia (China, Taiwan, Singapore, the Mandarin-speaking areas of Malaysia, Hong Kong, Japan, Korea, Vietnam), many places use Han characters (more formally, zh:語素文字, logograms). Computer processing of Han-character documents has always had one big problem: missing characters. Not only do the ancient documents of each country lack large numbers of characters; books from the era of traditional movable-type printing also contain characters their authors invented (sometimes a single book alone introduces 300-400 new characters unique to it), which makes them very difficult to handle when they are to be placed on Wikisource. The essence of the problem is that computer handling of Han characters has never accounted for the fact that the Han script is an open character set. Today, if a Wikimedia project needs a character not yet supported by Unicode, there is only one solution: an image file, which must be drawn by hand and whose textual content cannot be indexed, searched, or exchanged (copy-paste the text to another site and the image disappears, leaving a blank where the character was).
  • Who would benefit:
      1. en- Mostly the contributors and readers of Chinese Wikisource. However, once this is available, all Wikimedia projects in languages that use Han characters will benefit (such as Japanese, Vietnamese, Korean, and Chinese-variant projects like Classical Chinese, Hakka, Wu, or Gan).
      2. Furthermore, even Wikipedia (zh.wikipedia already uses a lot of missing characters) and Wiktionary would benefit.
      3. Other writing systems with two-dimensional composite characters would benefit too: for instance, Egyptian hieroglyphs and Maya script.
    • zh- Those who benefit most will be the editors and readers of Chinese Wikisource, but all Wikimedia projects that use Han characters can benefit in the future (Japanese, Vietnamese, Korean, Chinese, Classical Chinese, Hakka, Wu, Gan, etc.), and eventually the Wiktionaries and Wikipedias of other languages as well.
  • Proposed solution:
    • en- Unicode's Ideographic Description Sequences (IDS) define how to compose a Han character from components. We would implement a function to dynamically render a Han character from its IDS, via an extension in Wikisource, like: <ids>⿺辶⿴宀⿱珤⿰隹⿰貝招</ids> This generates a Han character image file (currently rendered on a temporary server on wmflabs) with the IDS in its metadata. This would solve the missing-character problem for all C/J/K/V books. The basis is that Han characters are not at the same level as European alphabet letters, but at the level of words, and they form an open set. They are composed in two dimensions from more basic phonetic or semantic components, a little like affixes in English (whereas English words are composed in one dimension). In academia, component-based Han character composition technology has been developed and adapted to handle ancient Han books; the most famous examples are Academia Sinica's work and the CBETA sutras project. In recent years open-source IDS renderers have become stable, so we can use the same technology to let Wikisource handle ancient Han books just as those academies do.
    • The special property of Han characters is that, unlike alphabetic scripts built one-dimensionally from a small set of letters, they are composed two-dimensionally, within a square space, from a larger set of basic phonetic or semantic components; the main composition modes are horizontal, vertical, and enclosing. Basic research began in the 1970s at Academia Sinica in Taiwan, with very rich results, later applied to projects with enormous numbers of missing characters such as the CBETA Buddhist canon project (the digitisation of the Taishō Tripiṭaka).
      Later, the Unicode standard introduced the Ideographic Description Sequence (IDS) specification, defining Ideographic Description Characters (IDCs) in a prefix structure well suited to computer text processing. Since then, Chinese-studies projects at academic institutions (for example on the Siku Quanshu, a collection of books that filled four storehouses) and Buddhist-studies projects have made active use of IDS, driving dynamic character composition technology to solve their missing-character problems.
      In the past this kind of technology was used only inside academia; only in the last ten years have general-purpose open-source dynamic composition engines been developed. The Wikimedia chapter in Taiwan has found a well-advanced engine, 漢字組建 (han3_ji7_tsoo1_kian3), and therefore proposes this solution: a rendering server plus an IDS extension, so that missing Han characters can be displayed, sorted and searched, and exchanged.
  • More comments:
    • en- There are a couple of tests in the test wiki.
    • zh- There are already quite a few tests; see the test wiki page.
  • Phabricator tickets:
  • Proposer: Liang(WMTW) (talk) 16:07, 10 November 2016 (UTC)[reply]
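The prefix structure mentioned above is what makes IDS mechanical to process: each Ideographic Description Character announces how many component operands follow it. A small parser sketch (Python; illustrative only, and a real renderer would go on to lay the parsed components out in their boxes):

```python
# Unicode Ideographic Description Characters; U+2FF2/U+2FF3 take three
# operands (left-middle-right / top-middle-bottom), the rest take two.
IDC_ARITY = {c: (3 if c in "⿲⿳" else 2) for c in "⿰⿱⿲⿳⿴⿵⿶⿷⿸⿹⿺⿻"}

def parse_ids(sequence):
    """Parse a prefix IDS string into a nested tuple tree."""
    chars = iter(sequence)
    def node():
        ch = next(chars)
        if ch in IDC_ARITY:
            return (ch,) + tuple(node() for _ in range(IDC_ARITY[ch]))
        return ch  # a component character: a leaf
    tree = node()
    if next(chars, None) is not None:
        raise ValueError("trailing characters after a complete IDS")
    return tree
```

For example, `parse_ids("⿰亻木")` gives `("⿰", "亻", "木")`, a left-right composition of two components.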

Community discussion

@Shangkuanlc: I'm trying to understand this proposal named "Create new Han Characters with IDS extension for WikiSource" and the related Phabricator task which is about deploying (making available) the IDS extension on Wikimedia sites like WikiSource, so I have to ask for clarification when it comes to the proposed solution: Do you ask WMF's Community Tech team to create new Han characters with the IDS extension? Or the WMF's Community Tech team to extend the IDS extension to allow authors to create new Han characters? Or to deploy the IDS extension on WikiSource (which would be the same request as in the Phabricator task)? Thanks for clarifying! --AKlapper (WMF) (talk) 20:00, 14 November 2016 (UTC)[reply]

@AKlapper (WMF): I have asked the IDS extension programmer; he says we need the latter two of what you mentioned, and basically those two action items mean the same thing -- editors can create new or ancient characters through the IDS service, so deploying the software would be the easiest solution. --Liang(WMTW) (talk) 17:04, 15 November 2016 (UTC)[reply]

Some characters in s:zh:template:SKchar and s:zh:template:SKchar2 may be displayed using it.--維基小霸王 (talk) 07:31, 19 November 2016 (UTC)[reply]

Voting – New Han characters

  1. Support Support--Shizhao (talk) 03:19, 28 November 2016 (UTC)[reply]
  2. Strong Support Support--Liang(WMTW) (talk) 07:32, 1 December 2016 (UTC)[reply]
  3. Support Support--魔法設計師(Shoichi) (talk) 08:39, 1 December 2016 (UTC)[reply]
  4. Support Support--Billxu0521 (talk) 08:52, 1 December 2016 (UTC)[reply]
  5. Support Support--Yannmaco (talk) 09:52, 1 December 2016 (UTC)[reply]
  6. Support Support--S099001 (talk) 12:08, 1 December 2016 (UTC)[reply]
  7. Support Support--Wing (talk) 12:32, 1 December 2016 (UTC)[reply]
  8. Support Support--Honmingjun (talk) 12:42, 1 December 2016 (UTC)[reply]
  9. Support Support--JM99 (talk) 13:40, 1 December 2016 (UTC)[reply]
  10. Support Support--Csjh21010 (talk) 14:06, 1 December 2016 (UTC)[reply]
  11. Support Support--Fweng322 (talk) 14:13, 1 December 2016 (UTC)[reply]
  12. Support Support- Earth Saver (talk) at 14:32, 1 December 2016 (UTC)[reply]
  13. Support Support--Hwayang (talk) 15:51, 1 December 2016 (UTC)[reply]
  14. Support Support--Tsuna Lu (talk) 16:05, 1 December 2016 (UTC)[reply]
  15. Support Support--Wolfch (talk) 17:13, 1 December 2016 (UTC)[reply]
  16. Support Support--維基小霸王 (talk) 00:01, 2 December 2016 (UTC)[reply]
  17. Support Support--CYLu (talk) 03:28, 2 December 2016 (UTC)[reply]
  18. Support Support--John123521 (talk) 06:21, 2 December 2016 (UTC)[reply]
  19. Support Support--Micru (talk) 08:18, 2 December 2016 (UTC)[reply]
  20. Support SupportJc86035 (talk) 11:23, 2 December 2016 (UTC)[reply]
  21. Support Support--RJ-king (talk) 12:21, 2 December 2016 (UTC)[reply]
  22. Support Support--Alex S.H. Lin 12:23, 2 December 2016 (UTC)[reply]
  23. Support Support--David675566 (talk) 12:25, 2 December 2016 (UTC)[reply]
  24. Support Support--BobChao (talk) 12:27, 2 December 2016 (UTC)[reply]
  25. Support Support--Vel c (talk) 12:30, 2 December 2016 (UTC)[reply]
  26. Support Support--Medicalwei (talk) 12:33, 2 December 2016 (UTC)[reply]
  27. Support Support--Freedman.tw (talk) 16:14, 2 December 2016 (UTC)[reply]
  28. Support Support--AddisWang (talk) 02:34, 3 December 2016 (UTC)[reply]
  29. Support Support--Zerng07 (talk) 04:51, 5 December 2016 (UTC)[reply]
  30. Support Support--Goldie_lin (talk) 05:06, 5 December 2016 (UTC)[reply]
  31. Support Support--Subscriptshoe9 (talk) 13:01, 3 December 2016 (UTC)[reply]
  32. Support Support--Supaplex (talk) 15:37, 3 December 2016 (UTC)[reply]
  33. Support Support Pamputt (talk) 10:46, 4 December 2016 (UTC)[reply]
  34. Support Support- I am Davidzdh. 12:44, 4 December 2016 (UTC)[reply]
  35. Support Support--Jasonzhuocn (talk) 05:10, 5 December 2016 (UTC)[reply]
  36. Support Support--Reke (talk) 06:45, 5 December 2016 (UTC)[reply]
  37. Support Support--KOKUYO (talk) 08:06, 5 December 2016 (UTC)[reply]
  38. Support Support--Sean9064 (talk) 10:08, 5 December 2016 (UTC)[reply]
  39. Support Support ... --Liuxinyu970226 (talk) 10:21, 5 December 2016 (UTC)[reply]
  40. Support Support--S8321414 (talk) 10:34, 5 December 2016 (UTC)[reply]
  41. Support Support--Seadog007 (talk) 14:02, 5 December 2016 (UTC)[reply]
  42. Support Support--Toppy368 (talk) 15:44, 5 December 2016 (UTC)[reply]
  43. Support Support--林博仁 (talk) 17:48, 5 December 2016 (UTC)[reply]
  44. Question Question: Are there so many people interested in these characters? What works are we hosting in this language? I suppose it seems like a good idea if we actually have a body of these works. Blue Rasberry (talk) 18:36, 6 December 2016 (UTC)[reply]
    @Bluerasberry: The idea of this proposal is to let the Wikimedia projects use the <ids> extension to make newly created or ancient Han characters not yet included in Unicode more accessible (easier to search, index, etc.). The idea is not only to create some characters, but to make the system of character creation available in Wikimedia projects. As for the "body of these works" you mention: yes, there is a Taiwanese-Mandarin dictionary, donated by a late professor in Taiwan, which would benefit most from this <ids> technique being available. You may see further discussion on google groups and meta. I hope I have answered your question; let me know if it's not clear. --Liang(WMTW) (talk) 04:22, 7 December 2016 (UTC)[reply]
    @Bluerasberry: There are also many books, such as en:Siku_Quanshu (36,381 books, very important and classic in East Asia), whose missing Han characters number not in the tens or hundreds but in the tens of thousands. Until now its digitisation has depended on special software with IDS support inside academia; not everyone could access and read these works on computers (even via the internet). With IDS rendering technology, books with large numbers of missing Han characters can also be put on Wikisource, and the missing characters become indexed, exchangeable, and searchable. I think this is also very meaningful for the WMF movement in Asia. --魔法設計師(Shoichi) (talk) 14:34, 7 December 2016 (UTC)[reply]
    Support Support It seems like there are people who have identified books which cannot be converted to digital form without support for these characters. These books are famous and of cultural significance, so this is not just a matter of archiving works which would not be popular but actually a chance to make classical works more available to more people. Since there are already books anticipated for this, and since there is already a community organized to engage with this character support, then this seems like a good idea. Blue Rasberry (talk) 14:47, 7 December 2016 (UTC)[reply]
  45. Support Support--Jesus estw (talk) 19:18, 6 December 2016 (UTC)[reply]
  46. Support Support--A2093064 (talk) 10:09, 8 December 2016 (UTC)[reply]
  47. Support Support given how important the Chinese language is on a global scale. This, that and the other (talk) 13:59, 8 December 2016 (UTC)[reply]
  48. Support Support --MoonYaksha月夜叉 01:21, 9 December 2016 (UTC)[reply]
  49. Support Support Risker (talk) 03:10, 9 December 2016 (UTC)[reply]
  50. Support Support --Edhral 06:29, 9 December 2016 (UTC)[reply]
  51. Support Support --10:21, 9 December 2016 (UTC)
  52. Support Support--Lt2818 (talk) 10:37, 9 December 2016 (UTC)[reply]
  53. Support Support--Bowleerin (talk) 13:18, 10 December 2016 (UTC)[reply]

Make the Page proofreading interface easier to use

  • Problem: The Page interface wastes screen space on tools that aren't used for proofreading, and splits the useful tools between the top and the bottom of the proofread page. You have to scroll up and down to access the useful tools. The option for setting the page status is under the window, so you have to scroll down to use it; when you're done, you can't just hit tab to get into the edit summary, and quickly move on.
  • Who would benefit: all proofreading contributors
  • Proposed solution: To implement three principles:
  1. maximize the screen area devoted to the edit textarea and the facing page image, hiding anything not useful for proofreading;
  2. move the usual tools into fixed, compact areas (top/bottom of the screen); tool areas should not scroll;
  3. wrap unusual tools into draggable boxes that can be toggled in and out of visibility.
  • More comments: an excellent first step towards point 1 is the FullScreenEditing script by Samwilson. A running example of draggable tools is the gadget "diacritici", recently implemented on la.source.
  • Phabricator tickets:

Community discussion

  • @Alex brollo: these sound like good ideas. Do you think the crux of this problem is something like "the proofreading interface does not make optimal use of the browser window" or similar? That it's too cluttered with UI elements that don't pertain to proofreading? The tools-shouldn't-scroll rule is great, I reckon. :-) But yeah, I think we need to clarify the title and problem statement here, so people know what they're voting for. Sam Wilson 08:14, 18 November 2016 (UTC)[reply]
@Samwilson: My English is rather poor, please feel free to change anything. An inspiring example for a good edit interface, focused on proofreading, is the Distributed Proofreaders one; nsPage edit page, on the contrary, is the same used for all wikimedia projects, with minor changes. --Alex brollo (talk) 08:40, 18 November 2016 (UTC)[reply]
Hello, I can confirm that the Wikimedia Foundation has a functional prototype of this feature. --NaBUru38 (talk) 21:49, 10 December 2016 (UTC)[reply]

Voting – Deeply review nsPage edit interface

  1. Support Support--Wesalius (talk) 08:20, 28 November 2016 (UTC)[reply]
  2. Strong support Strong support --Ninovolador (talk) 22:34, 29 November 2016 (UTC)[reply]
  3. Support Support --Alex brollo (talk) 09:57, 30 November 2016 (UTC)[reply]
  4. Support Support --Shubha (talk) 10:28, 2 December 2016 (UTC)[reply]
  5. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  6. Support Support --Continua Evoluzione (talk) 10:28, 5 December 2016 (UTC)[reply]
  7. Support Support --NaBUru38 (talk)
  8. Support Support --Yann (talk) 22:56, 12 December 2016 (UTC)[reply]

Delete all NS:Page while deleting an index file

  • Problem: When an Index page is deleted, its Page: namespace pages are not deleted with it
  • Who would benefit: Wikisource Admins
  • Proposed solution: An option to delete all the associated Page: pages while deleting an Index page.
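A sketch of what the option amounts to, assuming the conventional naming scheme (Index:Work.djvu carries Page:Work.djvu/1 … /n); in Python, with only the MediaWiki API request parameters shown, no HTTP layer:

```python
def page_titles_for_index(index_title, page_count):
    """Page: pages belonging to an Index: page, by the usual naming scheme."""
    base = index_title.split(":", 1)[1]   # "Index:Foo.djvu" -> "Foo.djvu"
    return [f"Page:{base}/{n}" for n in range(1, page_count + 1)]

def delete_params(title, reason, csrf_token):
    """MediaWiki action API parameters for one deletion."""
    return {"action": "delete", "title": title, "reason": reason,
            "token": csrf_token, "format": "json"}
```

An admin-facing checkbox would simply enumerate these titles and issue one delete request per existing page.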

Community discussion

Good idea. oh yes... --Hsarrazin (talk) 13:35, 10 November 2016 (UTC)[reply]
+1, I already commented on the task but isn't this just an option to re-activate ? (@Quiddity (WMF): any thought ?) Cdlt, VIGNERON * discut. 16:59, 14 November 2016 (UTC)[reply]
  • @VIGNERON: Do you have any more info about the 'delete all subpages' feature? I can't find it anywhere. Anyway, this proposal can still stand, because there's also the added convenience of not having to create the dummy top-level page. Sam Wilson 00:40, 29 November 2016 (UTC)[reply]
  • @Samwilson: sorry, can't remember... I think it disappeared more than (around) 3 years ago.

Voting – Delete all NS:Page

  1. Support Support. --Consulnico (talk) 17:45, 28 November 2016 (UTC)[reply]
  2. Support Support VIGNERON * discut. 09:00, 29 November 2016 (UTC)[reply]
  3. Support Support ShakespeareFan00 (talk) 20:55, 29 November 2016 (UTC)[reply]
  4. Support Support --Ninovolador (talk) 22:35, 29 November 2016 (UTC) I don't think that is such a BIIIIG hack, but it would help the WS admins A LOT[reply]
  5. Support Support --Alexmar983 (talk) 06:14, 30 November 2016 (UTC)[reply]
  6. Strong support Strong support! - εΔω 20:53, 1 December 2016 (UTC)
  7. Support Support --Shubha (talk) 10:29, 2 December 2016 (UTC)[reply]
  8. Support Support --Framawiki (talk) 20:55, 2 December 2016 (UTC)[reply]
  9. Support Support Pamputt (talk) 10:47, 4 December 2016 (UTC)[reply]
  10. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]

Fix Extension:Cite to allow tags and other functionality to work within ref tags

  • Problem: Extension:Cite has major issues when tags are used within ref tags, and the pipe trick doesn't work inside them
  • Who would benefit: all wikis and MediaWikis
  • Proposed solution: fix Extension:Cite !!!
  • More comments: there are long-standing comments, and it is time that there was a plan to update and fix the extension, one of the most widely utilised extensions across Wikimedia sites

Community discussion

@Billinghurst: Could you please try to name the "foibles" in the summary of this proposal (otherwise we would end up with indistinguishable generic "Fix $something to get rid of foibles" summaries for many proposals), describe who (groups/categories of users) would benefit, and describe an actual potential solution? I'm asking as proposals should be as specific as possible and explain what the problem is and who is affected by it. Thanks a lot in advance! --AKlapper (WMF) (talk) 13:20, 9 November 2016 (UTC)[reply]

The foibles are detailed in the phab bug tickets. I tried to change the summary, but I may not have characterized it correctly. - Jonesey95 (talk) 06:29, 10 November 2016 (UTC)[reply]

@AKlapper (WMF): As Jonesey95 says! I think that I could entitle this request phabricator:4700, though that ticket itself is quite imposing. As the bugmeister, perhaps you can lead us on how we can migrate ye olde bug 4700 to a series of concrete components; however, my gut feeling is that the words "complete rebuild" and "anachronistic mess" and "ugh!" all swim around this matter. It needs a path to improvement. All that said, I can list the items that I face.

Typical examples

  • template substitution fails inside extensions' custom tags like <ref> (and <poem>). I have no idea whether that is an issue with substitution, with the tags, or with the extensions.
  • the pipe trick fails within ref tags, so a common action like an author link of the form [[Author:John Doe|]] does not work and becomes an inoperative non-link

 — billinghurst sDrewth 10:21, 10 November 2016 (UTC)
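For reference, the label the pipe trick is expected to derive can be sketched as follows (a simplified illustration of the rule, not MediaWiki's actual parser code):

```python
import re

def pipe_trick_label(target: str) -> str:
    """Derive the display label the pipe trick produces for a link
    written as [[target|]]: drop the namespace prefix, if any, and a
    trailing parenthetical disambiguator (simplified sketch)."""
    label = target.split(":", 1)[-1]             # "Author:John Doe" -> "John Doe"
    label = re.sub(r"\s*\([^)]*\)$", "", label)  # "Foo (novel)" -> "Foo"
    return label
```

It is exactly this expansion that fails when the link sits inside a `<ref>` tag.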

That phab task has a patch, which could be code reviewed and merged. -- DannyH (WMF) (talk) 01:05, 23 November 2016 (UTC)[reply]

Voting – Fix Extension:Cite

  1. Support Support --Alex brollo (talk) 07:52, 30 November 2016 (UTC)[reply]
  2. Support Support--Jayantanth (talk) 20:57, 4 December 2016 (UTC)[reply]
  3. Support Support, but this should not be buried in the WikiSource section; this affects all wikis. E.g., if you put a {{rs?|{{subst:DATE}}}} inside a <ref>...</ref>, the DATE template will not subst.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:23, 11 December 2016 (UTC)[reply]

2017




XTools Edit Counter for Wikisource

  • Problem: There are no Wikisource-specific statistics about per-user proofreading and validation
  • Who would benefit: Wikisource Community
  • Proposed solution: A tool is needed that counts proofread and validated pages per user
  • More comments:

Discussion

Voting

ProofreadPage extension in alternate namespaces

  • Problem: ProofreadPage elements, such as page numbers, "Source" link in navigation, etc. do not display in namespaces other than mainspace
  • Who would benefit: Wikisources with works in non-mainspace, such as user translations on English Wikisource
  • Proposed solution: Modify the ProofreadPage extension to allow its use in namespaces other than mainspace
  • More comments:

Discussion

Voting

Extend pag and num accessibility

  • Problem: {{{pag}}} and {{{num}}} are reserved parameters of the ProofreadPage extension, logically linked to the pagelist tag. It would be useful to extend their use so that they work anywhere.
  • Who would benefit: wikicode contributors
  • Proposed solution: allow {{{pag}}} and {{{num}}} to accept two additional optional pieces of data (index name, book page/file page), to get the book page from the file page and the file page from the book page dynamically, using pagelist data, in any context.
  • More comments:
  • Phabricator tickets:

Discussion

Thank you for the proposal. I am not sure I understand what you want to have. Maybe an API (maybe in Lua) that provides functions like getPageTitleForFile(fileName, filePageNumber), getPageTitleForIndexAndPage(indexName, logicalPageNumber), getIndexTitleForPage(pageName), and getFilePageNumberForPage(pageName)? Tpt (talk) 10:59, 8 November 2017 (UTC)[reply]
Lua access to all the data coming from the Index page (all fields, the pagelist-related table too) would be great. It.source uses a special Modulo:Dati/[baseIndexName] to save and use these data; see it:Template:Pg, which uses the data, but it's a local, do-it-yourself solution. --Alex brollo (talk) 14:49, 8 November 2017 (UTC)[reply]
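To illustrate the two lookups the proposal asks for, here is a toy sketch over pagelist-style data (the function names and the pagelist itself are invented for illustration, not a proposed API):

```python
# Toy pagelist for a hypothetical index: file (scan) page -> displayed
# book-page label, mirroring the data a <pagelist> tag encodes.
PAGELIST = {1: "Cover", 2: "i", 3: "ii", 4: "1", 5: "2", 6: "3"}

def book_page_for_file_page(file_page: int) -> str:
    """File-page number -> book-page label (a {{{num}}}-style lookup)."""
    return PAGELIST[file_page]

def file_page_for_book_page(book_page: str) -> int:
    """Book-page label -> file-page number (the reverse lookup)."""
    return next(f for f, b in PAGELIST.items() if b == book_page)
```

The proposal amounts to making both directions of this mapping available from any page, not just inside the pagelist tag.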

Voting

Improve workflow for uploading books to Wikisource

  • Problem:
Uploading books to Wikisource is difficult.
In the current workflow you need to upload the file to Commons, then go to Wikisource and create the Index page (and you need to know the exact URL). The files need to be DJVU, which has separate layers for the scan and the text. This is important for tools like Match & Split (if the file is a PDF, this tool doesn't work).
More importantly, the current workflow (especially for library uploads) involves Internet Archive and the famous IA-Upload tool. This tool is now fundamental for many libraries and uploaders, but it has several issues.
As Internet Archive has stopped creating DJVU files from its scans, the international community has struggled to solve the problem of automatically creating a DJVU for upload to Commons and then Wikisource.
This has created a situation where libraries love Internet Archive and want to use it, but then get stuck because they don't know how to create a DJVU for Wikisource, and IA-Upload is buggy and fails often.
Summary
    • The IA-Upload tool is buggy and fails often when creating DJVU files.
    • Match & Split doesn't work with PDF files.
    • Users do not expect to upload to Commons when transferring files from Internet Archive to Wikisource.
    • Upload to Internet Archive is an important feature, especially for GLAMs (i.e. libraries).
  • Who would benefit:
    • all Wikisource communities, especially new users
    • new GLAMs (libraries and archives), who at the moment have a hard time coping with the Wiki ecosystem.
  • Proposed solution:
Improve the IA-Upload tool: https://tools.wmflabs.org/ia-upload/commons/init
The tool should be able to create good-quality DJVU files from Archive files, and not fail as often as it does now.
It should also hide the upload-to-Commons phase from the end user. The user should be able to upload a file to Internet Archive, and then use the ID of the file to directly create the Index page on Wikisource. We could have an "Advanced mode" that shows all the steps for experienced users, and a "Standard" mode that makes things simpler.
  • More comments:
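The "hidden Commons phase" largely boils down to deriving the right page titles from the Internet Archive identifier. A minimal sketch, with an invented naming convention (the real IA-Upload tool's naming may differ):

```python
def derive_titles(ia_identifier: str, extension: str = "djvu") -> dict:
    """Sketch of the naming step in the proposed simplified flow: map an
    Internet Archive identifier to the Commons file and the Wikisource
    Index page the user would otherwise have to type by hand.
    The convention used here is illustrative, not the tool's actual one."""
    base = ia_identifier.replace("_", " ").strip()
    file_name = f"{base}.{extension}"
    return {
        "commons_file": f"File:{file_name}",
        "index_page": f"Index:{file_name}",  # Index pages mirror the file name
    }
```

With this derivation automated, a "Standard" mode only needs to ask for the IA identifier and can create the Index page directly.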

Discussion

Voting

Page status color code not always showing

  • Problem: The color codes indicating page status on the index page do not always show on French Wikisource. We have to purge the book page many times.
The problem started in mid-to-late 2016; before that it was very rare that we had to purge to see the colors.
  • Who would benefit: This is counter-intuitive for beginners: the documentation mentions the page color codes, but they do not show, which is very confusing to new contributors. It would also reduce time lost when editing a book, especially for advanced contributors.
  • Proposed solution: We should not have to purge the index page each time we display a book.
  • More comments:

Discussion

  • I strongly endorse this. That's an annoying bug that I thought was unique to it.wikisource: if such behaviour is common to more projects, it deserves an appropriate solution, and it needs it quickly: an index page is meant to show the state of all its pages, to let users decide to transcribe, proofread or validate them. - εΔω 16:51, 23 November 2017 (UTC)

Voting

Improve export of electronic books

  • Problem: Imagine if Wikipedia pages could not display for many days, or were only available once in a while for many weeks. Imagine if Wikipedia displayed pages with missing or scrambled information.
This is what visitors get when they download books from the French Wikisource. Visitors do not read books online in a browser; they want to download them to their reader in epub, mobi or pdf.
The current tool to export books in these formats has all those problems: in spring 2017 it was on and off for over a month; since October 2017 the mobi format does not work, and then pdf stopped working. I did not publish a book because the electronic formats had various problems. (I have made a list of these problems if required.)
  • Who would benefit: The end users, the visitors to Wikisource, by having access to high quality books. This would improve the credibility of Wikisource.
This export tool is the showcase of Wikisource. Contributors can be patient with system bugs, but visitors won’t be, and won’t come back.
The export tool is as important as the web site is.
  • Proposed solution: We need a professional tool that runs and is supported 24/7 by Wikimedia Foundation professional developers, as the various Wikimedia web sites are.
The tool should support the different kinds of electronic book, and follow the evolution of ebook technology.
The different bugs should be corrected.
  • More comments: There are not enough people on a small wiki to support and maintain such a tool.
Wikisource should not be considered only a web-based platform: the ebooks are just as important, and even more important for visitors.

Discussion

For some information the current problem (only the last episode of a long and sad road of problem) is phabricator:T178803. The problem is ongoing for almost a month now and we have a lot of complaints from readers. Cdlt, VIGNERON * discut. 13:01, 21 November 2017 (UTC)[reply]

Voting

Specify transcription completion with more granularity

  • Problem: Currently the Wikisource revision system only allows a single global completion status for a transcription, where a more flexible solution allowing multiple, extensible sets of criteria would be welcome.
  • Who would benefit: Anybody interested in fine-grained information about transcription completion status.
    • For a very concrete example, one might want to study the evolution of hyphenation across a subset of the Wikisource corpus. But hyphenation is often dropped in the transcription process, and even when it is taken into account, there is no obvious way to query which transcriptions preserve it, nor to get an overview of the completion status for this criterion in the work's completion overview.
      In this precise case, part of the problem might be solved through categories. For example, on the French Wikisource there is the template Césure, which allows one to transcribe the text with its hyphenation. It then renders the text hyphenated when consulted in the Page namespace, and unhyphenated otherwise, such as when it is transcluded in the main namespace. This template could add a category stating that the page uses it. However, also recording the level to which the page is complete with regard to the hyphenation criterion would be cumbersome, and it wouldn't allow a quick overview of progress on this topic in the Livre (Work) namespace.
    • Additionally, this would avoid pages staying in an "uncompleted" status when the transcription has been done and reviewed but only the layout does not yet match the original page as closely as possible. That is useful information: the transcription is indeed not globally complete, but for mere reading through the transclusion in the main namespace, it is wrong to state that the work is not complete.
  • Proposed solution:
    • Allow users to record the status of a transcription along an extensible set of criteria, such as rates of sign matching, layout matching, and so on, for things like tables and trees which might have a proper rendering but an improper HTML structure, or the opposite.
    • Allow users to switch criteria in the transcription completion overview of the work.
    • Possibly, a "global completion" criterion could provide a weighted mix of all existing criteria.
  • More comments: This also pertains to the remark of @Alex brollo: above about the true digitization of an edition.
  • Phabricator tickets:
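The Césure-style behaviour described above can be sketched as a tiny namespace-dependent rendering function (a Python illustration of the logic only, not the template's actual wikitext):

```python
def cesure(first_half: str, second_half: str, namespace: str) -> str:
    """Sketch of what a Césure-style template does: keep the line-end
    hyphenation visible while proofreading in the Page namespace, and
    emit the joined word when the text is transcluded elsewhere."""
    if namespace == "Page":
        return f"{first_half}-\n{second_half}"   # faithful to the scan
    return first_half + second_half              # readable transclusion
```

Recording, per page, whether such criteria (hyphenation, layout, etc.) are satisfied is what the proposal asks the revision system to support natively.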

Discussion

Voting

Create new Han Characters with IDS extension for WikiSource

  • Problem: Han characters (en:logograms, including en:Chinese Characters, en:Hanja, and en:Kanji) are widely used in East Asia (China, Taiwan, Singapore, Mandarin-speaking areas of Malaysia, Hong Kong, Japan, Korea, and Vietnam). An enduring, unsolved problem for digital archiving is the lack of characters. This concerns not only ancient books: even modern publications lack characters (some authors may have created 300-400 unique new characters in certain books). These are difficult to deal with when we archive them in Wikisource. Unicode gradually adds new characters to the chart, but a new Uni-Han extension always takes time to go live. In the past, Wikisource, and even Wikipedia, used to deal with this problem by presenting those characters as image files. But images cannot be indexed, are unsearchable, and are not even exchangeable between computer systems.
  • Who would benefit: Mostly the contributors and readers of Chinese Wikisource. However, if this were available, all Wikimedia projects in languages that use Han characters would benefit (such as Japanese, Vietnamese, Korean, and Chinese-dialect versions like Classical Chinese, Hakka, Wu, or Gan).
    1. Furthermore, Wikipedia (zh.wikipedia already uses a lot of such missing characters) and Wiktionary would also benefit.
    2. Other two-dimensional composite writing systems, for instance Ancient Egyptian and Maya.
  • Proposed solution: Unicode IDS (Ideographic Description Sequence) defines how to compose a Han character from components. We would implement a function to dynamically render a Han character from an Ideographic Description Sequence with an extension in Wikisource, like: <ids>⿺辶⿴宀⿱珤⿰隹⿰貝招</ids>. It would generate a Han character image file (now rendered on a temporary server on wmflabs) with the IDS in its metadata. This is a solution to the lack-of-Han-characters problem in all C/J/K/V books. The premise is that Han characters are not at the same level as European alphabets, but words: Han characters are an open set. They are composed in two dimensions from more basic components, somewhat like affixes in English (English words are composed in one dimension). In academia, component-based Han character composition technology has been developed and adapted to handle ancient Han books; the most famous are Academia Sinica's work and the CBETA sutras project. In recent years, open-source IDS renderers have become stable, so we can use the same technology to benefit Wikisource in handling ancient Han books, just as those academies do.
  • More comments:
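A minimal sketch of how an IDS string can be parsed into a component tree (the actual extension renders images; this shows only the parsing step, which any renderer needs first):

```python
# Ideographic Description Characters and their arities
# (all binary except the ternary ⿲ and ⿳).
IDC_ARITY = {"⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3, "⿴": 2, "⿵": 2,
             "⿶": 2, "⿷": 2, "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2}

def parse_ids(sequence: str):
    """Parse an Ideographic Description Sequence (prefix notation)
    into a nested (operator, children) tree; a renderer would walk
    this tree to place each component in its region of the square."""
    chars = iter(sequence)

    def node():
        c = next(chars)
        if c in IDC_ARITY:
            return (c, [node() for _ in range(IDC_ARITY[c])])
        return c  # a leaf component

    return node()
```

For example, `parse_ids("⿱宀⿰木木")` nests a left-right pair of 木 under a 宀 roof, exactly the structure a composition engine needs.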

Discussion

  • IMO there's no reason to limit this to Wikisource, as Wiktionary could also benefit a lot from this. NMaia (talk) 00:35, 28 November 2017 (UTC)[reply]
  • Question Question: I support the general need to display unencoded characters. However, personally I think the quality of the generated characters is regretfully a bit substandard. Simply compressing each component together into a block is not aesthetic. Using images instead of web-fonts in this day and age is also suboptimal (even if it is SVG).
    The creator of this extension has probably poured their heart and soul into creating it, but may I suggest some sort of partnership with GlyphWiki instead? It is a website designed for hosting hanzi. Glyphs can be manually created and stored under IDS names, and the glyphs can be used in fonts. GlyphWiki supports generation of webfonts. Suzukaze-c (talk) 03:01, 3 December 2017 (UTC)[reply]

Voting

Offer PDF export of original pagination of entire books

  • Problem: Presently, PDF conversion of proofread Wikisource books doesn't mirror the original pagination and page design of the original edition, since it comes from ns0 transclusion.
  • Who would benefit: Offline readers.
  • Proposed solution: Build an alternative PDF from a page-for-page conversion of the nsPage namespace.
  • More comments: Some Wikisource contributors think that nsIndex and nsPage are simply "transcription tools"; I think that they are much more: they are the true digitization of an edition, while the ns0 transclusion is something like a new edition.

Discussion

@Samwilson: Yes, perfect, thank you! --Alex brollo (talk) 07:49, 22 November 2017 (UTC)[reply]

Voting

2019




Ajax editing for nsPage

Français: Des outils Ajax pour l'espace Page.
  • Problem: Editing in nsPage is much slower than it needs to be: a lot of valuable user time is wasted dealing with "easy" edits.
    Français: Les modifications apportées à nsPage sont beaucoup plus lentes que nécessaire: un temps précieux est perdu à gérer des modifications "faciles".
  • Who would benefit: Many Wikisource contributors, except beginners.
    Français: Beaucoup de contributeurs à wikisource sauf les débutants.
  • Proposed solution: An AJAX environment for both edit & view could speed up editing a lot; heavy tools and settings would be loaded once for a long sequence of edits (just as Wikidata does). Consider that edit conflicts are very infrequent in nsPage. A successful gadget based on AJAX edit/preview is running on itwikisource, but it's a "do-it-yourself" tool.
    Français: Un environnement AJAX pour l'édition et l'affichage peut améliorer grandement la vitesse d’édition; les outils lourds et les paramètres sont chargés une seule fois pour une grande séquence de modifications (comme le fait wikidata). Considérez que les conflits d’édition sont très rares dans nsPage. Un gadget basé sur l'édition / prévisualisation AJAX fonctionne bien avec itwikisource, mais il s'agit d'un outil "à faire soi-même".
  • More comments:
  • Phabricator tickets:
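To sketch why such an environment is faster: once the CSRF token and the heavy editing tools are loaded, each subsequent save is a single small API request. The parameters below are the standard MediaWiki action=edit ones; the helper function itself is hypothetical:

```python
def build_edit_request(title: str, text: str, summary: str, csrf_token: str) -> dict:
    """POST parameters for one MediaWiki api.php action=edit call.
    An AJAX edit-in-sequence gadget would fetch the CSRF token and the
    editing environment once, then loop over pages issuing only this
    small request per page (sketch; error handling omitted)."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "token": csrf_token,  # obtained once via action=query&meta=tokens
    }
```

The full page reload of the normal edit form, by contrast, re-fetches tools and settings on every single page.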

Discussion

Could you provide a link to the itwikisource tool? MaxSem (WMF) (talk) 22:11, 30 October 2018 (UTC)[reply]

Sure: https://it.wikisource.org/wiki/MediaWiki:Gadget-eis.js . But please take from it just the rough idea: the code runs and is used by many users, but it is DIY (do-it-yourself) code. Its name, eis, comes from "edit in sequence". --Alex brollo (talk) 19:34, 4 November 2018 (UTC)[reply]

Voting

Improve export of electronic books

Original title (Français): Améliorer l'exportation des versions électroniques des livres
  • Problem: Imagine if Wikipedia pages could not display for many days, or were only available once in a while for many weeks. Imagine if Wikipedia displayed pages with missing or scrambled information. This is what visitors get when they download books from the French Wikisource. Visitors do not read books online in a browser; they want to download them to their reader in epub, mobi or pdf. The current tool (Wsexport) to export books in these formats has all those problems: in spring 2017 it was on and off for over a month; after October 2017 the mobi format did not work, then pdf stopped working. These problems still continue on and off.
    Français: Imaginez si les pages Wikipédia ne s’affichaient pas pour plusieurs jours, ou n’étaient disponibles que de façon aléatoire durant plusieurs jours. Imaginez si sur les pages Wikipédia certaines informations ne s’affichaient pas ou était affichées tout croche. C’est la situation qui se produit pour les visiteurs qui désirent télécharger nos livres. Les visiteurs ne lisent pas les livres en ligne dans un navigateur, ils désirent les télécharger sur leurs lecteurs en epub, mobi ou pdf. L’outil actuel (Wsexport) permettant l’export dans ces formats possède tous ces problèmes: au printemps 2017, il fonctionnait de façon aléatoire durant un mois; depuis octobre 2017, le format mobi puis pdf ont cessé de fonctionner. Ces problèmes continuent de façon aléatoire.
  • Who would benefit: The end users, the visitors to Wikisource, by having access to high quality books. This would improve the credibility of Wikisource.

    This export tool is the showcase of Wikisource. Contributors can be patient with system bugs, but visitors won’t be, and won’t come back.

    The export tool is as important as the web site is.

    Français: L’utilisateur final, le visiteur de Wikisource, en ayant accès à des livres de haute qualité. Ceci contribuerait à améliorer la crédibilité de Wikisource. L’outil d´exportation est une vitrine pour Wikisource. Les contributeurs peuvent être patients avec les anomalies de système, mais les visiteurs ne le seront peut-être pas et ne reviendront pas. L’outil d’exportation est tout aussi important que le site web.
  • Proposed solution: We need a professional tool that runs and is supported 24/7 by Wikimedia Foundation professional developers, as the various Wikimedia web sites are.

    The tool should support the different kinds of electronic book, and follow the evolution of ebook technology.

    The different bugs should be corrected.

    Français: Nous avons besoin d’un outil professionnel, fonctionnant et étant supporté 24/7, comme tous les différents sites Wikimedia, par les développeurs professionnels de la fondation Wikimedia. Les différentes anomalies doivent être corrigées.
  • More comments: There are not enough people on a small wiki (even French, Polish or English Wikisource) to support and maintain such a tool.
    Français: Nous ne sommes pas assez nombreux dans les petits wiki (même Wikisource Français, Polonais ou Anglais) pour supporter une telle application.

Discussion

When it comes to PDF format, mw:Reading/Web/PDF Functionality is probably also related here? --AKlapper (WMF) (talk) 12:46, 7 November 2018 (UTC)[reply]

Yes, it is a "related tool", but mw:Reading/Web/PDF Functionality does not cover Wsexport's features: Wikisource books are split into multiple subpages, and Wsexport is able to work out automatically which pages should be included and in which order. It also properly attributes the proofreaders of the Page: pages and extracts the relevant metadata (e.g. the author is the original author of the work and not the Wikisource contributors).
On the technical side of Wsexport: its codebase is mostly derived from a quick PHP hack and is not able to scale with the current load on the restricted Tool Labs capacity. A full rewrite is, I believe, required in order to get this tool into a working state. Tpt (talk) 09:14, 10 November 2018 (UTC)[reply]
+1. We need to increase reading capability: downloading in e-reader formats would expand the off-line reader base, increasing usability. It will require some community management. Slowking4 (talk) 22:00, 15 November 2018 (UTC)[reply]

As a writer of science (mathematics and physics) books, it is important for me to be able to extract these books also in odt form. I have already written three books in both forms (wiki and odt), and the additional effort to do this was not exactly small. Of course I need the odt format in order to be able to change the contents according to the needs of my pupils at any time, without needing to change the wiki project. If extraction in odt format exists, then it is very easy to make a pdf out of it (the opposite is not exactly so easy). I have already tried to create a Star Basic macro that should do part of the job; the wiki programmers can maybe find ideas there for a wiki2odt converter. If this is not possible, a working pdf converter would also suffice... Yomomo (talk) 05:25, 20 November 2018 (UTC)[reply]

Voting

Create integrated interwiki mechanism for Wikisource

Français: Créer un mécanisme interwiki intégré spécifique pour Wikisource.
  • Problem: The interwiki mechanism based on a single Wikidata item, as used in Wikipedia, is not suitable for Wikisource. Wikisource can present multiple editions of the same text, as well as multiple translations into a single language, e.g. made by various translators. The Wikidata model used to store information from Wikisource is two-level, based on "work" and "edition" Wikidata items. The purpose of this proposal is to create an implementation of a link-based interwiki system that uses this model and is integrated with MediaWiki.
    Français: Le mécanisme interwiki basé sur un seul élément Wikidata utilisé dans Wikipedia ne convient pas pour Wikisource. Wikisource peut présenter plusieurs éditions du même texte ainsi que plusieurs traductions dans une seule langue, par exemple, fait par divers traducteurs. Le modèle Wikidata utilisé pour stocker les informations de Wikisource est à deux niveaux, basé sur les éléments "travail" et "édition" de Wikidata. Le but de cette proposition est de créer une implémentation du système interwiki basé sur les liens qui utilise ce modèle et qui est intégrée à MediaWiki.
  • Who would benefit: All Wikisources
    Français: Tous les Wikisources
  • Proposed solution: A JavaScript-based implementation is used on Swedish Wikisource. However, links created using JavaScript are visible only in browsers with JavaScript enabled. If the mechanism were integrated with MediaWiki, the links would be available to any HTML-parsing tool, e.g. indexing machines. Another disadvantage of the Swedish solution is that it needs to be maintained separately by each Wikisource; Wikisource communities are small and have no resources to do this. The suggested solution is to have MediaWiki integrate the links into the page's HTML code.
    Français: L'implémentation basée sur JavaScript est utilisée dans Wikisource suédois. Cependant, les liens créés en utilisant JavaScript sont visible uniquement par les navigateurs avec JavaScript activé. Si le mécanisme est intégré à MediaWiki, les liens sont disponibles pour n’importe quel fichier HTML, outil d'analyse, par exemple, machines d'indexation. Un autre inconvénient de la solution suédoise est qu’il doit être entretenu séparément par chaque Wikisource. Les communautés Wikisource sont petites et n'ont pas ressources pour le faire. La solution suggérée est de faire les liens intégré au code HTML de la page par MediaWiki.
  • More comments: In the pre-Wikidata interwiki implementation, multiple interwikis to a single wiki worked fine and were used in Wikisource. So the introduction of Wikidata actually degraded the interwiki system on the Wikisources.
    Français: Dans l'implémentation interwiki antérieure à Wikidata, plusieurs interwikis sur un seul wiki fonctionnait bien et était utilisé dans Wikisource. Ainsi, l'invention de Wikidata est devenue en réalité une dégradation du système d'interwiki dans Wikisources.
  • Phabricator tickets: phab:T128173, phab:T180304
  • Proposer: Ankry (talk) 16:04, 10 November 2018 (UTC)[reply]
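The two-level model can be illustrated with toy data (the Q-ids, titles, and data shape below are invented for illustration; on Wikidata the edition-to-work link is the "edition or translation of" property):

```python
# Toy Wikidata-style items: each edition records its parent work, and
# sitelinks map a wiki to a page title. All values are invented.
ITEMS = {
    "Q100": {"work": None,   "sitelinks": {}},                           # the work
    "Q101": {"work": "Q100", "sitelinks": {"enwikisource": "Hamlet"}},
    "Q102": {"work": "Q100", "sitelinks": {"frwikisource": "Hamlet (trad. A)"}},
    "Q103": {"work": "Q100", "sitelinks": {"frwikisource": "Hamlet (trad. B)"}},
}

def interwiki_links(edition_id: str) -> dict:
    """Collect the sitelinks of all sibling editions of the same work.
    Unlike the one-item-one-sitelink model, this naturally yields
    several links to the same wiki (here: two French translations)."""
    work = ITEMS[edition_id]["work"]
    links: dict = {}
    for qid, item in ITEMS.items():
        if qid != edition_id and item["work"] == work:
            for wiki, title in item["sitelinks"].items():
                links.setdefault(wiki, []).append(title)
    return links
```

This is the traversal an integrated MediaWiki mechanism would perform server-side, so the multiple links land in the page's HTML rather than being injected by JavaScript.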

Discussion

Voting

Diacritics editing tool

Français: Un outil pour plus de diacritiques.
  • Problem: It's difficult and time-consuming to find the Unicode code point for unusual characters, and to build characters that have no precomposed Unicode code point (e.g. q̃)
    Français: Il est difficile et fastidieux de rechercher Unicode pour des caractères inhabituels et de créer des caractères sans Unicode ((i. e. q̃)
  • Who would benefit: all wikisource contributors
    Français: Tous les contributeurs wikisource
  • Proposed solution: it's possible to build a gadget to manipulate diacritics only (using the standard string normalize function: decompose, then compose)
    Français: Il est possible de créer un gadget pour manipuler uniquement les signes diacritiques (à l'aide de la propriété de normalisation de chaîne standard, décomposer puis composer)
  • More comments: A draft tool for editing diacritics is already running on it.wikisource.
    Français: it.wikisource est en train d'élaborer un brouillon pour éditer les signes diacritiques.
  • Phabricator tickets:
  • Proposer: Alex brollo (talk) 09:16, 6 November 2018 (UTC)[reply]
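A minimal sketch of the decompose-then-compose manipulation the proposal mentions, using Python's unicodedata here as a stand-in for the JavaScript String.prototype.normalize() a gadget would actually use:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose (NFD), drop combining marks, recompose (NFC):
    the decompose-then-compose round trip the proposal describes."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)

def add_combining(base: str, mark: str) -> str:
    """Build a character that has no precomposed code point,
    e.g. q + combining tilde (U+0303) -> q̃; NFC leaves the pair as-is
    when no precomposed form exists."""
    return unicodedata.normalize("NFC", base + mark)
```

A gadget built on this can offer buttons to add or strip individual marks without the contributor hunting through code charts.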

Discussion

@DChan (WMF): Any thoughts on this idea? Kaldari (talk) 18:38, 8 November 2018 (UTC)[reply]

A tool for Unicode IVS input is probably also good-to-have? C933103 (talk) 08:09, 9 November 2018 (UTC)[reply]
@Kaldari: Hmm, there are many different sets of diacritics, across different scripts. Collecting and maintaining that data would be significant work. On the other hand, we don't want to end up with many versions of this tool with different hard-coded sets of diacritics.
To avoid this extra burden, it may be worth extending the jQuery.IME rules format so that a rule set can specify a list of labelled buttons, each of which performs one of the substitutions in the rule set. In particular, its IPA-SIL rule set already contains a definition of the diacritic substitutions used in the it.wikisource script. DChan (WMF) (talk) 21:18, 20 November 2018 (UTC)[reply]

Voting

ProofreadPage extension in alternate namespaces

Français: Utiliser les outils de l'espace page dans d'autres espaces
  • Problem: ProofreadPage elements, such as page numbers, "Source" link in navigation, etc. do not display in namespaces other than mainspace
    Français: Les éléments de l’espace page, tels que les numéros de page, le lien "Source" dans la navigation, etc. ne s'affichent pas dans les espaces de noms autres que l’espace principal.
  • Who would benefit: Wikisources with works in non-mainspace, such as user translations on English Wikisource
    Français: Utilisateurs Wikisource qui font des travaux qui ne sont pas en espace principal, tels que des traductions utilisateur sur Wikisource anglaise
  • Proposed solution: Modify the ProofreadPage extension to allow its use in namespaces other than mainspace
    Français: Modifier l'extension de l'espace page, ProofreadPage, pour permettre son utilisation dans des espaces de noms autres que l’espace principal.
  • More comments: I also proposed this in 2017

Discussion

Voting

Wikisource Contest Tools

Français: Des outils pour concours et championnats.
  • Problem: There are many tools for online contests on Wikipedia, but there is no tool for Wikisource with which we can judge proofreading. It would be very useful for WS:PotM and all other Wikisource contests.
    Français: Il existe plusieurs outils sur Wikipédia pour l’organisation de concours en ligne, mais aucun pour Wikisource avec lequel nous pourrions évaluer la correction. Il serait très utile d’en avoir pour les différents concours Wikisource.
  • Who would benefit: Wikisource Community
    Français: La communauté Wikisource
  • Proposed solution: https://tools.wmflabs.org/wscontest/
  • More comments:
  • Phabricator tickets: phab:T163060
  • Proposer: Jayantanth (talk) 09:18, 4 November 2018 (UTC)[reply]

Discussion

  • I started helping with this at the Hackathon in Barcelona this year, but rather dropped the ball after that (sorry!). It's still on my (volunteer-time) radar, and I'd love to do more with it soon. We also talked about adding these metrics to GrantMetrics, but it's a pretty different system: GM is about metrics per event, this is per-user-and-event. Not that it couldn't be incorporated, but it might not make sense. Certainly things like the "big red button" (T197772) for announcing winners wouldn't be in scope elsewhere I think. Sam Wilson 08:19, 5 November 2018 (UTC)[reply]

Voting

Interface to display two (or more) different texts from the same (or different) Wikisources side by side

Français:Interface permettant d'afficher deux (ou plus) textes différents provenant du même (ou différent) wikisource côte à côte
  • Problem:
    1. When a source document is in multiple languages, in an ancient form of a language, in an alternative script of a language, follows an alternative orthography, or is written in another language, it is useful to read the source and a translated/transliterated/modernized version of the text side by side, or sometimes even line by line.
    Français: Lorsqu'un document source est dans plusieurs langues / forme ancienne de la langue / version de script alternative d'une langue / suit une orthographe alternative / écrit dans une autre langue, il est utile de lire la source et une version traduite-translittérée-modernisée du texte côte à côte, et parfois même ligne par ligne.
    2. Some wikis try to solve the problem by hard-coding the source document into different alternative versions of a page, and/or by having multiple versions of the same text on the same page, which creates an additional problem of verifiability
    Français: Certains wiki tentent de résoudre le problème en codant en dur le document source dans une version alternative différente d'une page et / ou en utilisant plusieurs versions du même texte dans la même page, ce qui crée un problème supplémentaire de vérifiabilité.
  • Who would benefit: all wikisource users
    Français: Tous les utilisateurs de wikisource.
  • Proposed solution: Create an interface that can display two (or more) different texts from the same (or different) Wikisources side by side, arranged either by an editor or selected by users.
    Français: Créez une interface permettant d’afficher côte à côte deux (ou plus) textes différents provenant du même (ou différent) wikisource, soit par arrangement par éditeur, soit par sélection par les utilisateurs.
  • More comments:
  • Phabricator tickets:
  • Proposer: C933103 (talk) 01:22, 9 November 2018 (UTC)[reply]

Discussion

@C933103: Isn't this Community Wishlist Survey 2019/Wikisource/Two windows view for editors? Jc86035 (talk) 14:04, 4 November 2018 (UTC)[reply]

@Jc86035: But this is for readers. I mentioned editors because in some cases automatic matching might not work so well and need editor to match and align those documents. C933103 (talk) 20:55, 4 November 2018 (UTC)[reply]

@C933103: On mul.source there is a shared script that does this kind of thing, it's the one called Compare.js. --Candalua (talk) 15:45, 5 November 2018 (UTC)[reply]

@Candalua: I see, that is useful; however, there are some limitations to the script that keep it from achieving what I want to do:
  1. It requires the user to manually select which other-language version shows up, instead of showing a specific language version by default (for instance, for an English-French bilingual document, you would want it to default to showing English side by side with French).
  2. There seems to be an upper limit of two documents displayed at a time; it cannot display three or four of them together.
  3. It simply shows two different pages side by side and does not support line-by-line alignment of the content of the two documents.
  4. The UI seems to be limited to documents with interlanguage links and cannot be used to show, e.g., an alternative version of the document in the same wiki.
C933103 (talk) 17:57, 5 November 2018 (UTC)[reply]
Point 4 can be done with a template that emulates interwikis, such as this plus a local script to load the link. Point 3 is a difficult task, that can probably be achieved only by putting line or position markers into every text so that markers can be matched side-by-side. But points 1 & 2 look feasible. It would be great to have such a functionality available by default. --Candalua (talk) 08:47, 7 November 2018 (UTC)[reply]
For point 3, the markers would only need to be put into the texts where such an effect is desired, not into literally every text. The reason I am requesting this is that, from what I see, there are already texts on the Wikisources that have different versions of the same document aligned line by line on the same page, using various layouts. I think it would be easier, both for Wikisource editors and for people who want to copy content from Wikisource, if the versions were stored separately on different pages and then displayed together, rather than mixing all the different versions within the same page. C933103 (talk) 11:35, 7 November 2018 (UTC)[reply]
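The line-by-line alignment discussed for point 3 is mechanically simple once markers exist; here is a rough sketch, where the marker token and the function are purely illustrative and not an existing gadget:

```python
from itertools import zip_longest

# Hypothetical alignment marker that editors would insert into each version.
MARKER = "<!--align-->"

def align(*texts):
    """Split each text at the markers and pair the segments row by row,
    so parallel versions can be rendered side by side."""
    columns = [t.split(MARKER) for t in texts]
    return list(zip_longest(*columns, fillvalue=""))

rows = align(
    "To be, or not to be" + MARKER + "that is the question",
    "Être ou ne pas être" + MARKER + "telle est la question",
)
# each row holds the matching segment from every version
```

A viewer would then render each row as one line of a side-by-side (or multi-column) table, which also covers the more-than-two-documents case.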
Point 4 can be also done like this: https://bn.wikisource.org/s/ef54 -- Hrishikes (talk) 07:19, 18 November 2018 (UTC)[reply]

Voting

Offer PDF export of original pagination of entire books

Français: Pouvoir exporter en pdf en respectant la pagination de l'édition source.
  • Problem: Presently, PDF conversion of proofread Wikisource books doesn't mirror the pagination and page design of the original edition, since it comes from the ns0 transclusion.
    Français: La conversion en PDF des livres Wikisource ne reflète pas la pagination et le design original des pages de l’édition originale, car la conversion provient de la transclusion et non des pages.
  • Who would benefit: Offline readers.
    Français: Lecteurs hors ligne.
  • Proposed solution: Build an alternative PDF produced by converting the Page: namespace, page for page.
    Français: Élaborer un outil pour générer un PDF alternatif provenant d’une conversion page par page.
  • More comments: Some Wikisource contributors think that nsIndex and nsPage are simply "transcription tools"; I think they are much more - they are the true digitization of an edition, while the ns0 transclusion is something like a new edition.
    Français: Certains contributeurs de wikisource pense que nsIndex et nsPage sont simplement des « outils de transcription » ; je pense qu’ils sont beaucoup plus que cela – ce sont la vraie numérisation d’une édition, tandis que la transclusion ns0 constitue une nouvelle édition.
  • Phabricator tickets: T179790
  • Proposer: Alex brollo (talk)

Discussion

  • With the ws-export tool failing with error messages many a time, it would be better to have this functionality in MediaWiki's default PDF download. That would make it easy to download the book and its associated subpages in the correct order. -- Balajijagadesh (talk) 06:48, 11 November 2018 (UTC)[reply]
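A page-for-page export would iterate over the Page: namespace rather than the ns0 transclusion. As a minimal sketch of the first step (the title scheme follows ProofreadPage; the function name is illustrative):

```python
def page_titles(index_title, page_count):
    """Return the Page:-namespace titles behind an Index:, in scan order.
    ProofreadPage derives them from the file name plus a page number."""
    base = index_title.replace("Index:", "Page:", 1)
    return [f"{base}/{n}" for n in range(1, page_count + 1)]

titles = page_titles("Index:Example book.djvu", 3)
# -> ['Page:Example book.djvu/1', 'Page:Example book.djvu/2', 'Page:Example book.djvu/3']
# A converter would then fetch each page's rendered HTML (e.g. via the
# MediaWiki API's action=parse) and lay it out one PDF page per scan page.
```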

Voting

Better import of descriptions of images with croptool

Français: Mieux importer les descriptions d'images avec l'outil croptool.
  • Problem: Many images from Wikisource books are not correctly or fully categorized and described. I am a cross-wiki user, so when I crop an image with the croptool on Wikisource my efforts rarely stop there. I take care to improve the categorization and description, and I try to inform users about that. Many cropped images are probably underused because of this: the files on Commons lack any information regarding descriptions or categories.
    Français: De nombreuses images de livres wikisource ne sont pas correctement ou entièrement catégorisé et décrit. Je suis un utilisateur multi-wiki alors quand j’extrais et recadre une image avec l'outil « croptool » sur wikisource, mes efforts s'arrêtent rarement là.

    Je prends soin d’améliorer la catégorisation et la description et j’essaie d’informer les utilisateurs à ce sujet. De nombreuses images recadrées sont probablement sous-utilisé à cause de cet aspect, les dossiers sur commons ne possèdent aucune information concernant les descriptions ou les catégories.

  • Who would benefit: People who need to reuse cropped images in the future and should be able to find them.
    Français: Les utilisateurs ayant besoin de réutiliser les images extraites devrait pouvoir les retrouver.
  • Proposed solution: Have croptool import the content of a figure's caption into the Commons file description.

    Improving descriptions and categorization is a lot of manual work. I think that, at least for the first aspect, there is a shortcut. We use Template:FreedImg, which contains a "caption" entry. My idea is that when croptool is used and saves an image on Commons, it reads the language domain of the current Wikisource, places that ISO language code in the file description, and adds a description in that space based on the first "caption" string, taken from the "=" to the next vertical bar.

    The only thing I should do as a user is to save the code of the image before cropping, but this is something I already do and it's really not a problem, especially if it saves time later.

    For example, on this page I would first write the code for the figure (with just plain text in the caption), and then croptool would read "it.wikisource.org" and "caption=tempio di minerva" and place under int:filedesc {{Information |description={{it|tempio di minerva}} ...}}

    This will work at least for the first or single image on a page. I don't mind using it only for the pages with just one image (maybe the caption import can be an option to select), and if it is too complicated at least I hope we can start to think about this issue in the framework of c:commons:structured data in the future because we can't keep uploading thousands of files with such poor descriptions.

    In any case the description line with the iso language code should always be added by croptool even if it is left empty, IMHO.

    Preliminary discussion at itwikisource was supportive of the idea.

    Français: Importer avec croptool le contenu de la légende d'une figure dans la description du fichier des biens communs. Améliorer la description et la catégorisation nécessite beaucoup de travail manuel. Je pense qu'au moins pour le premier aspect il y a un raccourci. Nous utilisons Template: FreedImg qui contient une entrée « légende » ("caption"). Mon idée est que, lorsque croptool est utilisé et enregistre une image sur commons, il lit le domaine de langage du wikisource actuel, en plaçant ce code de langage ISO dans la description du fichier et en ajoutant une description dans cet espace en fonction du premier "libellé" chaîne allant du "=" à la barre verticale suivante.

    La seule chose que je devrais faire en tant qu’utilisateur est de sauvegarder le code de l’image avant de recadrer, mais c’est quelque chose que je fais déjà et c’est vraiment pas un problème, surtout si cela fait gagner du temps plus tard.

    Par exemple, dans cette page, je commencerais par écrire le code de la figure (avec juste du texte en clair dans la légende), et que croptool lira "it.wikisource.org" et "caption = tempio di minerva" et placer sous int:filedesc {{Information |description={{it|tempio di minerva}} ...}}

    Cela fonctionnera au moins pour la première ou une seule image sur une page. Cela ne me dérange pas de l'utilisez que pour les pages avec une seule image (peut-être que l’importation de sous-titres peut être une option à sélectionner), et si c’est trop compliqué au moins j'espère que nous pourrons commencer à réfléchir à cette question dans le cadre de c:commons:structured data à l'avenir car nous ne pouvons pas continuer à télécharger des milliers de fichiers avec de telles descriptions médiocres. Dans tous les cas, la ligne de description avec le code de langue ISO devrait toujours être ajoutée par croptool même s'elle est laissée vide, à mon humble avis.

    Une discussion préliminaire à itwikisource soutient cette idée.

  • More comments:
  • Phabricator tickets:
  • Proposer: Alexmar983 (talk) 18:04, 9 November 2018 (UTC)[reply]
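The caption-reading step described in the proposed solution (taking the first "caption" value from the "=" to the next vertical bar) could be sketched like this; Template:FreedImg and the example values come from the proposal, everything else is illustrative:

```python
import re

def extract_caption(wikitext):
    """Pull the first caption=... value out of a FreedImg template call,
    stopping at the next pipe or the end of the template."""
    m = re.search(r"caption\s*=\s*([^|}]*)", wikitext)
    return m.group(1).strip() if m else None

def commons_description(lang_code, wikitext):
    """Build the description line croptool could place under int:filedesc."""
    caption = extract_caption(wikitext) or ""
    return "{{Information |description={{%s|%s}} ...}}" % (lang_code, caption)

desc = commons_description(
    "it", "{{FreedImg|file=x.jpg|caption=tempio di minerva|width=300}}"
)
# -> '{{Information |description={{it|tempio di minerva}} ...}}'
```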

Discussion

  • CropTool really is an excellent tool for importing book illustrations; the problem is that a book illustration needs a set of metadata and categories very different from the book's. The main problem at present is that CropTool often uploads a Book template for illustrations, which is completely inappropriate (a drawing/picture is not a book!). I feel that the first step should be a "translation" of the Book template into a simplified and customized Info template. Next steps could be helping the user in the difficult task of good, and easy, descriptions and categorization. --Alex brollo (talk) 22:45, 9 November 2018 (UTC)[reply]
    Français:CropTool est un excellent outil pour importer des illustrations de livres. Le problème est qu'une illustration de livre nécessite un ensemble de métadonnées et de catégories très différentes de celles du livre. Actuellement, le principal problème est que CropTool télécharge souvent un modèle de livre pour les illustrations, ce qui est totalement inapproprié (un dessin / une image n'est pas un livre!). Je pense que la première étape devrait être une "traduction" du modèle de livre en un modèle d’informations simplifié et personnalisé. Les prochaines étapes pourraient consister à aider l’utilisateur dans la tâche difficile consistant à obtenir de bonnes descriptions et catégorisations.

Voting

XTools Edit Counter for Wikisource

Français: Compteur de modifications très amélioré.
  • Problem: There are no Wikisource-specific stats about per-user proofreading/validation, so it is impossible to get statistics about the proofreading task. The Wikisource workflow is different from Wikipedia's and cannot be covered by XTools, so we need dedicated stats tools for Wikisource.
    Français: Il n’existe pas de statistiques spécifiques sur les correction/Validations par utilisateurs. Les processus de travail (workflow) de Wikisource diffèrent de ceux de Wikipédia. Ce ne peux être fait via par xtool. Nous avons besoin d’un outil statistique spécifique pour Wikisource.
  • Who would benefit: Whole Wikisource Community.
    Français: Toute la communauté Wikisource.
  • Proposed solution: Build a stats tool specific to Wikisource.
    Français: Créer un outil de statistiques spécifiques à Wikisource.
  • More comments:
  • Phabricator tickets: phab:T173012
  • Proposer: Jayantanth (talk) 08:54, 4 November 2018 (UTC)[reply]

Discussion

  • Just a note: reliable analysis of page proofread status changes would require data from the page header, which is part of a specific revision's text and not available in the Labs database copy, AFAIK. This might be a good question: how to get access to this data efficiently. Ankry (talk) 02:58, 18 November 2018 (UTC)[reply]
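The page header in question is the ProofreadPage <pagequality> tag stored at the top of every Page: revision, so a tool with access to revision text could extract statuses roughly like this (a sketch assuming the standard header format):

```python
import re

PAGEQUALITY = re.compile(r'<pagequality level="(\d)" user="([^"]*)"')

def proofread_status(revision_text):
    """Return (level, user) from a Page: revision header, or None.
    ProofreadPage levels: 0 without text, 1 not proofread,
    2 problematic, 3 proofread, 4 validated."""
    m = PAGEQUALITY.search(revision_text)
    return (int(m.group(1)), m.group(2)) if m else None

status = proofread_status(
    '<noinclude><pagequality level="3" user="Ankry" /></noinclude>Page text...'
)
# -> (3, 'Ankry')
```

Comparing this value between consecutive revisions of the same page is what a per-user proofreading/validation counter would aggregate.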

Voting

Enable book2scroll that works for all Wikisources

Français: Rendre book2scroll accessible sur tous les Wikisource.
  • Problem: book2scroll is not enabled for all Wikisources and does not work for any non-Latin Wikisource. It is very useful for marking up page numbering in Index: pages.
    Français: book2scrool n’est pas activé pour tous les wikisources et ne fonctionne pas sur les wikisources non-latin. Cet outil est très utile pour la numérotation du marquage des Pages dans l’index:page.
  • Who would benefit: Whole Wikisource community.
    Français: Toute la communauté wikisource.
  • Proposed solution: The problem is that this code is very old (as in Toolserver-old), and only works with some site naming schemes; for many titles, other languages don't work either.
    Français: Le problème est que le code est très anciens (??? as in Toolserver-old), et ne fonctionne que pour la nomenclature de nommage de certains sites et ne fonctionne pas pour plusieurs titres.
  • More comments:
  • Phabricator tickets: phab:T205549
  • Proposer: Jayantanth (talk) 08:43, 4 November 2018 (UTC)[reply]

Discussion

I just verified it on en.* and it appears to work fine. Lostinlodos (talk) 01:12, 20 November 2018 (UTC)[reply]

Currently enabled for English, French, German, Italian, Latin, Portuguese, Belarusian, Bengali and Russian. But what about the other languages?
  • Français: Outil actif sur Wikisource Anglais, Français, Allemand, Italien, Latin, Bélarusse, Bengali et ??? Qu’en est-il des autres langues?

Jayantanth (talk) 19:45, 22 November 2018 (UTC)[reply]

Voting

2020



UI improvements on Wikisource

  • Problem: A big part of the work on Wikisource is proofreading OCR text. The 2010 wikitext editor has some useful functions, but they are divided across several tabs:
    • Advanced - there is a very useful search-and-replace button
    • Special characters - there are many characters which are not on the keyboard
    • Proofread tools (Page namespace only) - some more tools.
    When I am working on a longer OCR text, there are typical errors which can be fixed by search and replace (e.g. " -> “ or ii -> n), so I must use the first tab. Then a character from another language is missing, so I must switch to the second tab and find it. Then I find the next typical error, so I must switch back to the first...
  • Who would benefit: Wikisource editors, but useful for other projects too.
  • Proposed solution: Proofreading is probably done mainly on desktops (notebooks), whose monitors are wide enough to fit all these tools on one tab without the need to switch again and again.
  • More comments:
  • Phabricator tickets:
  • Proposer: JAn Dudík (talk) 20:59, 22 October 2019 (UTC)[reply]
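The repetitive fixes described in the problem statement (e.g. " -> “ or ii -> n) are exactly what a single consolidated toolbar button could apply in one click. A minimal sketch of such a replacement pass, with an illustrative rule list:

```python
# Typical OCR fixes of the kind mentioned in the proposal; the list is
# illustrative and would in practice be configurable per language/book.
RULES = [
    ('"', "“"),   # straight quote to curly quote (context-dependent in practice)
    ("ii", "n"),  # common OCR misread
]

def apply_rules(text, rules=RULES):
    """Apply each plain-text replacement rule in order."""
    for old, new in rules:
        text = text.replace(old, new)
    return text
```

In practice the quote rule needs context (opening vs. closing quotes), which is why an interactive search-and-replace button on the same tab remains useful.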

Discussion

Hi, did you know that you can customize the edittoolbar to your liking? See https://www.mediawiki.org/wiki/Manual:Custom_edit_buttons. Also I use a search-replace plugin directly in a browser as this works better for me. See e.g. https://chrome.google.com/webstore/detail/find-replace-for-text-edi/jajhdmnpiocpbpnlpejbgmpijgmoknnl https://addons.mozilla.org/en-US/firefox/addon/find-replace-for-text-editing/?src=search I use the chrome one and it works alright for simple stuff. For more advanced stuff I copy the text to notepad++/notepadqq/libreoffice writer and do the regex stuff there.--So9q (talk) 11:26, 25 October 2019 (UTC)[reply]

Very late to the party, but nothing seems to have changed. There is a need to customize the edit bar per book, not per user. Every book has special requirements for proofreading that are static. A book on Cicero's letters may need quick access to Greek polytonic letters, but a book on mediaeval poetry will have thorn/wynn/yogh and possibly long s. A book of poetry will need immediate access to the poem tag, but not a novel. Therefore, we need a scripting language to be able to set the edit bar according to the needs of the work. This is how typesetting used to work, and it did an outstanding job for centuries. Languageseeker (talk) 19:04, 10 March 2021 (UTC)[reply]

Voting

Repair Index finder

  • Problem: It's rather similar to the first proposal on this page; that is, for at least a month, the index finder has been broken; whatever title you put in it, it says something along the lines of "The index finder is broken. Sorry for the inconvenience." (This is just from memory!) It also gives a list of indexes, from the largest to the smallest. The workaround I am using in the meantime is the index finder built into the search engine.
  • Who would benefit: Everybody who wants to find an index.
  • Proposed solution: Somebody who has a good knowledge about bugs? I'm not good at wikicode!
  • More comments: Excuses for any vague terminology - I am writing via mobile.
  • Phabricator tickets: task T232710
  • Proposer: Orlando the Cat (talk) 07:00, 5 November 2019 (UTC)[reply]

Discussion

Voting

Enable book2scroll that works for all Wikisources

  • Problem: book2scroll is not enabled for all Wikisources and does not work for any non-Latin Wikisource. It is very useful for marking up page numbering in Index: pages.
    Français: book2scrool n’est pas activé pour tous les wikisources et ne fonctionne pas sur les wikisources non-latin. Cet outil est très utile pour la numérotation du marquage des Pages dans l’index:page.
  • Who would benefit: Whole Wikisource community.
    Français: Toute la communauté wikisource.
  • Proposed solution: The problem is that this code is very old (as in Toolserver-old), and only works with some site naming schemes; for many titles, other languages don't work either.
    Français: Le problème est que le code est très anciens (??? as in Toolserver-old), et ne fonctionne que pour la nomenclature de nommage de certains sites et ne fonctionne pas pour plusieurs titres.
  • More comments: same as previous year list
  • Phabricator tickets: phab:T205549
  • Proposer: Jayantanth (talk) 15:58, 26 October 2019 (UTC)[reply]

Discussion

Voting

Migrate Wikisource specific edit tools from gadgets to Wikisource extension

  • Problem: There are many useful editing gadgets on some Wikisources. Many of them should be used everywhere, but...
    • Not every user knows that they can import a script from another wiki.
    • Some of these scripts cannot simply be imported; they must be translated or localised.
    • The majority of users will look for these tools on en.wikisource, but there are many scripts on e.g. it.wikisource too.
  • Who would benefit: Editors on other Wikisources
  • Proposed solution: Select the best tools across wikisources and integrate them as new functions.
  • More comments:
  • Phabricator tickets:
  • Proposer: JAn Dudík (talk) 13:24, 5 November 2019 (UTC)[reply]

Discussion

It would be good to point to these gadgets or describe the proposed process to choose and approve propositions of gadgets to integrate. --Wargo (talk) 21:35, 17 November 2019 (UTC)[reply]
1) Ask communities for the best tools on their wikisource
2) Make list of them, with comments, merge potentially duplicates
3) Ask communities again which ones should be integrated.
4) Make global version and integrate it (eg as beta function)
There is one problem: single-wiki gadgets are often hidden from others due to the language barrier etc. JAn Dudík (talk) 21:31, 18 November 2019 (UTC)[reply]

Voting

Batch move API

  • Problem: On Wikisource, the "atomic unit" is a work, consisting of a scanned book in the File: namespace, a set of transcribed pages in the Page: namespace, an index in the Index: namespace, and hopefully also one or more pages in mainspace that transcludes the pages for presentation. This is unlike something like a Wikipedia, where the atomic unit is the (single) page in mainspace, period.
    ProofreadPage ties these together using the pagename: an Index: page looks for its own pagename (i.e. without namespace prefix) in the File: namespace, and creates virtual pages at Page:filenameoftheuploadedfile.PDF/1 (and …/2 etc.). If any one of these are renamed, the whole thing breaks down.
    A work can easily have 1000+ pages: if it needs to be renamed, all 1000 pages have to be renamed. This is obviously not something you would ever undertake manually. But API:Move just supports moving a single page, leading to the need for complicated hacks like w:User:Plastikspork/massmove.js.
    The net result is that nothing ever gets renamed on Wikisource, and when it's done it's only done by those running a personal admin-bot (so of the already very few admins available, only the subset that run their own admin-bots can do this, and that's before taking availability into account).
  • Who would benefit: All projects, but primarily the Wikisources; it would be used (via scripts) by +sysop, but it would benefit all users who can easily have consistent page names for, say, a multi-volume work or whatever else necessitates renaming.
  • Proposed solution: It would vastly simplify this if API:Move supported batch moves of related pages: at worst by an indexed list of from–to titles; better with the from–to pairs provided by a generator function; and ideally by intelligently moving according to some form of pattern. For example, Index:vitalrecordsofbr021916brid.djvu would probably move to Index:Vital records of Bridgewater, Massachusetts - Vol. 2.djvu, and Page:-namespace pages from Page:vitalrecordsofbr021916brid.djvu/1 would probably move to Page:Vital records of Bridgewater, Massachusetts - Vol. 2.djvu/1.
    It would also be of tremendous help if mw.api actually understood ProofreadPage and offered a convenience function that treated the whole work as a unit (Index:filename, Page:filename/pagenum, and, if local, File:filename) for purposes of renaming (moving) them.
  • More comments: For the purposes of this proposal, I consider cross-wiki moves out of scope, so, e.g., renaming a File: at Commons as part of the process of renaming the Index:/Page: pages on English Wikisource would be a separate problem (too complicated). Ditto fixing any local mainspace transclusions that refer to the old name (that's a manageable manual or semi-automated/user-tools job).
  • Phabricator tickets:
  • Proposer: Xover (talk) 12:41, 5 November 2019 (UTC)[reply]
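At minimum, such a batch API would consume a list of from–to pairs, and generating that list for a whole ProofreadPage work is pure title arithmetic. A sketch using the example names from the proposal (the actual renames would still go through API:Move, today one request per pair):

```python
def work_move_pairs(old_base, new_base, page_count):
    """Build (from, to) title pairs for renaming a whole ProofreadPage work:
    the Index: page plus every Page: subpage."""
    pairs = [(f"Index:{old_base}", f"Index:{new_base}")]
    pairs += [
        (f"Page:{old_base}/{n}", f"Page:{new_base}/{n}")
        for n in range(1, page_count + 1)
    ]
    return pairs

pairs = work_move_pairs(
    "vitalrecordsofbr021916brid.djvu",
    "Vital records of Bridgewater, Massachusetts - Vol. 2.djvu",
    2,
)
# first pair renames the Index:, the rest rename Page:.../1 and Page:.../2
```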

Discussion

@Xover: Why sysop bit is needed here? I think the bot flag is enough unless the pages are fully protected. Ankry (talk) 20:45, 9 November 2019 (UTC)[reply]
@Ankry: Because page-move vandalism rises to a whole `nother level when you can do it in batches of 1k pages at a time. And for the volume we're talking about, having to go through a request and waiting for an admin to handle it is not a big deal: single page moves happen all the time, but batch moves of entire works would typically top out at a couple per week tops (ignore a decade's worth of backlog for now). Given these factors, requiring +sysop (or, if you want to be fancy, some other bit that can be assigned to a given user group like "mass movers" or whatever) seems like a reasonable tradeoff. You really don't want inexperienced users doing this willy nilly!
But so long as I get an API that lets me do this in a sane way (and w:User:Plastikspork/massmove.js is pretty insane), I'd be perfectly happy imposing limitations like that in the user script or gadget implementing it (unless full "Move work" functionality is implemented directly in core, of course). Different projects will certainly have different views on that issue. --Xover (talk) 21:28, 9 November 2019 (UTC)[reply]

Voting

Activate templatestyles by Index page css field

  • Problem: The TemplateStyles extension is almost magic in the Wikisource environment, but there needs to be an easy way to activate it on all pages of an Index.
  • Who would benefit: all contributors
  • Proposed solution: Optionally allow the Index page's CSS field to be filled with a valid TemplateStyles page name. A simple regex could be used to see whether the CSS field contains valid CSS or a valid page name.
  • More comments: Presently it.wikisource and nap.wikisource are testing other tricks to load work-specific templatestyles into all pages of an Index, with very interesting results.
  • Phabricator tickets: phab:T226275, phab:T215165
  • Proposer: Alex brollo (talk) 07:24, 9 November 2019 (UTC)[reply]
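The "simple regex" from the proposed solution could look something like this; the heuristic and the names are illustrative, not an existing implementation:

```python
import re

def css_field_kind(value):
    """Heuristically decide whether an Index CSS field holds a page name
    (e.g. 'Template:MyWork/styles.css') or raw CSS rules."""
    value = value.strip()
    # A page name: no braces or pipes, ending in .css (TemplateStyles pages
    # must use the .css suffix).
    if re.fullmatch(r"[^{}|]+\.css", value):
        return "page"
    return "css"

kind = css_field_kind("Template:MyWork/styles.css")  # -> 'page'
```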

Discussion

  • Reproducing original books is inherently layout and formatting heavy, presenting books to readers is inherently layout and formatting heavy. Inline formatting templates are carrying a lot of weight right now, with somewhat severe limitations and very little semantics. Getting a good system for playing with the power of CSS would help a lot. --Xover (talk) 11:08, 9 November 2019 (UTC)[reply]

Voting

Make content of Special:IndexPages up-to-date and available to wikicode

  • Problem: 1. The content of Special:IndexPages (e.g. s:pl:Special:IndexPages) is not updated after the status of some pages in an index changes until the appropriate index page is purged. 2. The data from this page is not available to wikicode. Its availability would make possible the creation of various statistics, sortable lists or graphical tools showing users the status of index pages. On plwikisource, we make this data available to wikicode via a bot which regularly updates specific templates; these extra edits could be avoided.
  • Who would benefit: All wikisources, mainly those with large number of indexes
  • Proposed solution: Make the per-index numbers of pages with various statuses from Special:IndexPages available via a mechanism like a magic word, a Lua function or something similar.
  • More comments:
  • Phabricator tickets:
  • Proposer: Ankry (talk) 19:12, 9 November 2019 (UTC)[reply]
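What such a function would expose is essentially a per-status page count for the index. As a data sketch (Python standing in for the eventual magic word or Lua interface; the names are illustrative):

```python
from collections import Counter

def index_stats(page_levels):
    """Given the ProofreadPage quality level of every page in an index
    (0 without text, 1 not proofread, 2 problematic, 3 proofread,
    4 validated), return the counts a wikicode-facing function could expose."""
    counts = Counter(page_levels)
    return {level: counts[level] for level in range(5)}

stats = index_stats([3, 3, 4, 1, 3])
# -> {0: 0, 1: 1, 2: 0, 3: 3, 4: 1}
```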

Discussion

Voting

Transcluded book viewer with book pagination

Vis-itwikisource
  • Problem: When we view a transcluded (ns0) book, it is the normal view of wiki-like environments. Most book readers and lovers don't like this kind of view and navigation; they prefer a page-by-page or two-page view, like a physical book, instead of going to the next subpage every time. Italian Wikisource created a JS tool to view books like this: Vis, View In Sequence (a two-sided view of the page).
  • Who would benefit: Wikisource editors and readers
  • Proposed solution: Make a Vis-like default viewer: View In Sequence (a two-sided view of the page).
  • More comments:
  • Phabricator tickets:
  • Proposer: Jayantanth (talk) 15:43, 11 November 2019 (UTC)[reply]

Discussion

Voting

Repair Book Uploader Bot

  • Problem: Book Uploader Bot was a valuable tool for uploading books from Google Books to Commons for Wikisource. It has not been working for a long time, and uploading a book from Google Books takes a long time (you need to download the book as a PDF, run OCR, convert it into a DjVu, upload it to Commons and then fill in the information). For IA, we have IA Upload; it works, but also has some issues from time to time.
  • Who would benefit: Contributors of Wikisources
  • Proposed solution: Repair the tool or build a new one
  • More comments:
  • Phabricator tickets:
  • Proposer: Shev123 (talk) 14:58, 10 November 2019 (UTC)[reply]

Discussion

Voting

  • Problem: Wikidata's inter-language link system does not work well for Wikisource, because it assumes that pages are structured the same way as Wikipedia pages are structured, and this is not the case.
  • Who would benefit: Editors and readers of all Wikisources, and editors and readers of Wikidata
  • Proposed solution:
    1. Support linking from Wikidata to Multilingual Wikisource
    2. Support automatic interlanguage links between multiple editions that are linked to different items on Wikidata, where these items are linked by "has edition" and "edition or translation of"
  • More comments: This was also proposed last year
  • Phabricator tickets: phab:T138332, phab:T128173, phab:T180304, phab:T54971
  • Proposer: —Beleg Tâl (talk) 15:47, 23 October 2019 (UTC)[reply]

Discussion

This issue causes a lot of confusion for new editors on Wikisource and Wikidata, who frequently set up the interwiki links incorrectly in order to bypass this limitation. —Beleg Tâl (talk) 16:12, 23 October 2019 (UTC)[reply]

@Beleg Tâl: great proposal ! For information @Tpt: is working on something quite similar (Tpt: can you confirm?), we should keep this proposal as this is important and any help is welcome but still we should keep that in mind ;) Cdlt, VIGNERON * discut. 14:47, 27 October 2019 (UTC)[reply]
HI! Yes, indeed, I am working on it as part of mw:Extension:Wikisource. It's currently in the process of being deployed on the Wikimedia test cluster before a deployment on Wikisource. It should be done soon, so, hopefully no need from the Foundation on this (except helping the deployment). Tpt (talk) 13:59, 30 October 2019 (UTC)[reply]
@Tpt: Fantastic, thank you!! —Beleg Tâl (talk) 17:22, 2 November 2019 (UTC)[reply]

Voting

Index creation wizard

  • Problem: The process of turning a PDF or DjVu file into an index for transcription and proofreading is quite complicated and confusing. See Help:Index pages and Help:Page numbers for the basics.
  • Who would benefit: Anyone wanting to start a Wikisource transcription
  • Proposed solution: Create a wizard that walks an editor through the process of creating an index from a PDF or DjVu file (that has already been uploaded). Most importantly, it will facilitate creating the pagelist, by allowing the editor to go through the pages and identify the cover, title page, table of contents, etc., as well as where the page numbering begins.
  • More comments: This is similar to a proposal from the 2016 Wishlist, but more limited in scope, i.e. this proposal only deals with the index creation process, not uploading or importing files.
  • Phabricator tickets: task T154413 (related)
  • Proposer: Kaldari (talk) 15:32, 30 October 2019 (UTC)[reply]

Update June 2020: a project page has been set up for this at Wikisource Pagelist Widget.

Discussion

  • A wizard for initial setup is a good start, but an interactive visual editor for Index: pages, and especially for <pagelist … /> tags, would be even better. The pagelist is often edited multiple times and by multiple people, and currently requires a lot of jumping between the scan and the browser, mental arithmetic and mapping between physical and logical page numbers, multiple numbering schemes and ranges in a single work, etc. etc. A visual editor oriented around thumbnails of each page in the book and allowing you to tag pages: “This thumbnail, physically in position 7 in the file, is logically the ‘Title’ page”; “In these 24 pages (physical 13–37) the numbering scheme is roman numerals, and numbering starts on the first page I've selected”; “On this page (physical 38) the logical numbering resets to 1, and we're now back to default arabic numerals”; “This page (physical 324) is not included in the logical numbering sequence, so it should be skipped and logical numbering should resume on the subsequent page, and this page should get the label ‘Plate’”. All this stuff is much easier to do in a visual / direct-manipulation way than by writing rules describing it in a custom mini-syntax. --Xover (talk) 11:40, 9 November 2019 (UTC)[reply]
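For reference, the mini-syntax being discussed is the <pagelist> tag on the Index: page; a typical hand-written example (the page positions and labels here are illustrative) looks like:

```
<pagelist
    1=Cover
    2to6=-
    7to12=roman
    13=1
    324=Plate
/>
```

Each rule maps physical positions in the file to displayed labels or numbering styles, which is exactly the mapping a visual, thumbnail-based editor could let users express by direct manipulation instead of by writing the rules.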

Voting

Vertical display for classical Chinese content

  • Problem: Most content on Chinese Wikisource is classical Chinese, which has been printed or written vertically for thousands of years.
  • Who would benefit: Chinese and Japanese Wikisource. Other Wikimedia projects of languages in vertical display (like Manchu).
  • Proposed solution: Add vertical support to the Wikimedia software. To the proposer's knowledge, MediaWiki already supports right-to-left display of Arabic and Hebrew.

    A switch button on each page and a "force" setting in Special:Preferences should be added to allow readers to switch the display mode between traditional vertical text 傳統直寫 and modern horizontal text 新式橫寫. A magic word should be added that allows a page to set its own default display mode.

    A hypothetical vertical Chinese Wikisource would look as follows. (In this picture, some characters are rotated, but they should not be.)

  • More comments:
  • Phabricator tickets:
  • Proposer: 維基小霸王 (talk) 13:59, 1 November 2019 (UTC)[reply]
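On the browser side, the required layout already exists as the standard CSS writing-mode property; a sketch of how a display-mode switch could wrap rendered page content (the CSS values are standard, the wrapper function itself is illustrative):

```python
def wrap_vertical(html, vertical=True):
    """Wrap rendered page HTML so classical Chinese flows in vertical
    columns, top to bottom and right to left, as in traditional printing."""
    if not vertical:
        return html
    style = (
        "writing-mode: vertical-rl;"   # columns top-to-bottom, right-to-left
        "text-orientation: upright;"   # keep embedded Latin/digits upright, not rotated
        "height: 80vh; overflow-x: auto;"
    )
    return f'<div style="{style}">{html}</div>'

out = wrap_vertical("<p>道可道，非常道。</p>")
```

The text-orientation value also addresses the rotated-characters problem noted in the picture caption.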

Discussion

Voting

Improve workflow for uploading books to Wikisource

  • Problem:
Uploading books to Wikisource is difficult.
In the current workflow you need to upload the file to Commons, then go to Wikisource and create the Index page (and you need to know the exact URL). The files need to be DjVu, which has separate layers for the scan and the text. This is important for tools like Match & Split (if the file is a PDF, this tool doesn't work).
More importantly, the current workflow (especially for library uploads) involves Internet Archive and the well-known IA-Upload tool. This tool is now fundamental for many libraries and uploaders, but it has several issues.
Since Internet Archive stopped creating DjVu files from its scans, the international community has struggled to solve the problem of automatically creating a DjVu for upload to Commons and then Wikisource.
This has created a situation where libraries love Internet Archive and want to use it, but then get stuck because they don't know how to create a DjVu for Wikisource, and IA-Upload is buggy and fails often.
Summary
    • The IA-Upload tool is buggy and often fails when creating DjVu files.
    • Match & Split doesn't work with PDF files.
    • Users do not expect to upload to Commons when transferring files from Internet Archive to Wikisource.
    • Upload to Internet Archive is an important feature, especially for GLAMs (i.e. libraries).
  • Who would benefit:
    • all Wikisource communities, especially new users
    • new GLAMs (libraries and archives) who at the moment have a hard time coping with the Wiki ecosystem.
  • Proposed solution:
Improve the IA-Upload tool: https://tools.wmflabs.org/ia-upload/commons/init
The tool should be able to create good-quality DjVu files from Archive files, and not fail as often as it does now.
It should also hide the upload-to-Commons phase from the end user. The user should be able to upload a file to Internet Archive and then use the file's ID to directly create the Index page on Wikisource. We could have an "Advanced mode" that shows all the steps for experienced users, and a "Standard" one that keeps things simple.
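To illustrate the "hide Commons from the user" idea, here is a minimal sketch of how a tool could derive every page involved in a transfer from just the Internet Archive identifier. The naming convention is a hypothetical simplification; the real IA-Upload tool must also handle metadata, name collisions, and failures.

```python
def wikisource_targets(ia_identifier: str) -> dict:
    """Derive the pages involved in an IA -> Wikisource transfer
    from a single Internet Archive identifier (simplified sketch)."""
    filename = f"{ia_identifier}.djvu"
    return {
        "ia_url": f"https://archive.org/details/{ia_identifier}",
        # In "Standard" mode the user would never see this step:
        "commons_file": f"File:{filename}",
        # ...and would land directly on the Index page:
        "index_page": f"Index:{filename}",
    }
```

In "Standard" mode the user would enter only the identifier; "Advanced" mode could expose each intermediate page.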

Discussion

Voting

Ajax editing of nsPage text

  • Problem: When editing a sequence of simple pages, much user time is lost in the cycle: save – load the page in view mode – go to the next page, which opens in view mode – load it into edit mode.
  • Who would benefit: experienced users
  • Proposed solution: it.wikisource implemented an Ajax environment that allows saving the edited text and loading the next page in edit mode (and much more) very quickly via Ajax calls: it:s:MediaWiki:Gadget-eis.js (eis means Edit In Sequence). It's far from refined, but it works and has been tested on other Wikisource projects too. IMHO the idea should be refined and developed.
  • More comments:
  • Phabricator tickets:
  • Proposer: Alex brollo (talk) 07:16, 25 October 2019 (UTC)[reply]

Discussion

  • I enthusiastically support - I have often wished that I could move directly from page to page while staying in Edit mode - it would be particularly useful for error checking: making sure, for instance, that every page in a range which could have been proofread by different people over a number of months or even years all conform to the latest format/structure etc. CharlesSpencer (talk) 11:03, 25 October 2019 (UTC)[reply]
  • I think this is a very good project specific improvement that can be made within the remit of community wishlist. Seems feasible as well. —TheDJ (talkcontribs) 12:55, 4 November 2019 (UTC)[reply]
  • This would be a great first step towards something like a full-featured dedicated "transcription mode", that would likely involve popping into full screen (hiding page chrome, navbar, etc.; use all available space inside the browser window, but don't let the page scroll because it conflicts with the independently scrolling text field and scanned page display, in practice causing your whole editing UI to "jump around" unpredictably), some more flexibility and intelligence in coarse layout (i.e. when previewing a page, the text field and scanned page are side by side, but the rendered text you are trying to compare to the scanned page is about a screenworth of vertical scrolling away), prefetching of the next scanned page (cf. the gadget mentioned at the last Wikimania), and possibly other refinements (line by line highlighting on the scanned page? We often have pixel coordinates for that from the OCR process). Alex brollo's proposal is one great first change under a broader umbrella that is adapting the tools to the typical workflow on Wikisource, versus the typical workflow on Wikipedia-like projects: the difference makes tools that are perfectly adequate for Wikipedia-likes really clunky and awkward for the Wikisources. Usable, but with needlessly high impedance. --Xover (talk) 12:53, 5 November 2019 (UTC)[reply]
    @Samwilson: Could s:User:Samwilson/FullScreenEditing.js be a piece of this larger puzzle? I haven't played with it, but it looks like a good place to start. If this kind of thing (a separate focussed editing mode) were implemented somewhere core-adjacent, it might also provide an opportunity to clean up the markup used, à la that attempt last year(ish) that failed due to reasons (I'm too fuzzy on the details. Resize behaviour for the text fields got messed up, I think.). Could something like that also have hooks for user scripts? There's lots of little things that are suitable for user scripting to optimize the proofreading process. Memoized per-work snippets of text or regex substitutions; refilling header/footer from the values in the associated Index:; magic comment / variables (think Emacs variables or linter options) for stuff like curly/straight quote marks. In a dedicated editing mode, where the markup is clean (unlike the chaos of a full skin and multiple editors), both the page and the code could have API-like hooks that would make that kind of thing easier. --Xover (talk) 11:20, 9 November 2019 (UTC)[reply]
  • Thanks for the appreciation :-). Really, the it.wikisource eis tool – even if rough in code – is appreciated by many users. I'd also like to mention its "ajax-preview" option, which allows you to see very quickly (<1 sec) the result of the current editing/formatting, and also allows simple edits of brief chunks of text nodes (immediately editing the underlying textarea). Some text mistakes are much more evident in "view" mode than in "edit" mode, but presently Visual Editor is too slow to be used for typical fast editing on Wikisource. --Alex brollo (talk) 09:43, 7 November 2019 (UTC)[reply]

Voting

New OCR tool

  • Problem: 1) Wikisource has to rely on external OCR tools. The most widely used one has been out of service for many months, and all that time we have been waiting to see whether its creator will reappear and repair it. The other external OCR tools do not work well (they either respond extremely slowly or generate bad-quality text). None of these tools can handle text divided into columns on magazine pages, and they often have problems with non-English characters and diacritics; the OCR output needs to be improved.
    2) The hOCR tool does not work for Wikisources based on non-Latin scripts. PheTool hOCR creates a Tesseract OCR text layer for Wikisources based on Latin script. E.g. for Indic Wikisource, there is a temporary Google OCR to do this, but integrating non-Latin scripts into our tool would be more useful.
  • Who would benefit: Wikisource contributors handling scanned texts which do not have an original OCR layer or whose original OCR layer is poor, and contributors to wikisources based on non-Latin scripts.
  • Proposed solution: Create an integral OCR tool that the Wikimedia programmers would be able to maintain without relying on help of one specific person. The tool should:
    • be quick
    • generate good quality OCR text
    • be able to handle text written in columns
    • be able to handle non-English characters of Latin script including diacritics
    • be able to handle non-Latin languages

Tesseract, which is an open-source application, also has a specific procedure for training OCR, which requires the corrected text of a page and an image of the page itself. On the Wikisource side, pages that have been marked as proofread identify books that have been transcribed and reviewed fully. So what needs to be done is to strip formatting from the text of these finished transcriptions, expand template transclusions and move references to the bottom, then take the text along with an image of the page in question and run it through Tesseract's training procedure. The improved model would then be deployed on Tool Labs. The better the OCR, the easier the process becomes with each book, allowing Wikisource editors to become more productive, completing more pages than they could previously. This would also motivate users on Wikisource.
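The preparation step described above can be sketched as follows. This is a deliberately simplified, regex-based illustration: a real pipeline would expand templates through the MediaWiki API (action=expandtemplates) rather than drop them, and would handle nested markup.

```python
import re

def wikitext_to_ground_truth(wikitext: str) -> str:
    """Reduce proofread-page wikitext to plain text usable as Tesseract
    training ground truth: strip formatting, drop (rather than expand)
    template calls, and move <ref> footnotes to the bottom."""
    # Collect footnote bodies, then remove the <ref>...</ref> tags.
    refs = re.findall(r"<ref[^>/]*>(.*?)</ref>", wikitext, flags=re.S)
    text = re.sub(r"<ref[^>/]*>.*?</ref>", "", wikitext, flags=re.S)
    # Drop simple (non-nested) template calls.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Strip bold/italic apostrophe markup.
    text = re.sub(r"'{2,}", "", text)
    body = text.strip()
    return body + ("\n\n" + "\n".join(refs) if refs else "")
```

Pairing each such text with the page image yields the (image, ground truth) pairs that Tesseract's training procedure consumes.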

Some concerns have appeared that the WMF nearly always uses open-source software, which excludes e.g. ABBYY FineReader and Adobe, and that the problem with free OCR engines is their lack of language support, so they are never really going to replace Phe's tools fully. I do not know whether free OCR engines suffice for this task or not, but I hope the new tool will be as good as or even better than Phe's tools, and ideological reasons that would be an obstacle to quality should be put aside.

Discussion

I think this is the #1 biggest platform-related problem we are facing on English Wikisource at this time. —Beleg Tâl (talk) 15:09, 27 October 2019 (UTC)[reply]

Yeah. For some reason neither Google Cloud nor phetools supports all of the languages of Tesseract. Compared to the set of Wikisource languages, Tesseract is missing Anglo-Saxon, Faroese, Armenian, Limburgish, Neapolitan, Piedmontese, Sakha, Venetian and Min Nan.--Snaevar (talk) 15:12, 27 October 2019 (UTC)[reply]

Note that you really don't want a tool that scans all pages for all languages as that is so compute-intensive that you'd wait minutes for every page you tried to OCR. Tesseract supports a boatload of languages and scripts, and can be trained for more, but you still need a sensible way to pick which ones are relevant on any given page. --Xover (talk) 07:27, 31 October 2019 (UTC)[reply]
I know. Both the Google Cloud and phetools gadgets pull the language from the language code of the Wikisource where the button is pressed, and thus only use one language. The same thing applies here. These languages are mentioned, however, so it is clear which Wikisources this proposal could support, and which ones it would not. P.S. I am not American, so I will never try to word things to cover all bases.--Snaevar (talk) 23:01, 2 November 2019 (UTC)[reply]
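The per-wiki language selection described in this thread can be illustrated concretely. The mapping table below is a tiny hypothetical excerpt: Wikisource subdomains use ISO 639-1-style codes, while Tesseract traineddata files use ISO 639-2/T-style codes, so a lookup table is needed.

```python
# Tiny excerpt of a wiki-code -> Tesseract traineddata-code table.
WIKI_TO_TESSERACT = {
    "en": "eng",
    "cs": "ces",
    "de": "deu",
    "bn": "ben",
}

def tesseract_lang(wiki_domain: str, fallback: str = "eng") -> str:
    """Pick the Tesseract -l argument from a domain like
    'cs.wikisource.org', falling back to English if unmapped."""
    code = wiki_domain.split(".")[0]
    return WIKI_TO_TESSERACT.get(code, fallback)
```

This keeps the OCR request limited to one language model per page, avoiding the "scan for all languages" cost mentioned above.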

Even aside from the OCR aspect, being able to extract the formatting out of a PDF into wikitext would be highly valuable for converting PDFs (and other formats via PDF) into wiki markup. T.Shafee(Evo﹠Evo)talk 11:19, 29 October 2019 (UTC)[reply]

I am not sure about formatting. Some scans or even originals are quite poor and in such cases the result of trying to identify italics or bold letters may be much worse than if the tool extracted just pure text. I would support adding such feature only if it were possible to be turned on and off. --Jan.Kamenicek (talk) 22:05, 30 October 2019 (UTC)[reply]

Many pages require only simple automatic OCR. But there are pages with other fonts (italics, Fraktur) or pages with mixed languages (e.g. a missal in both the local language and Latin), where it would be useful to have some recognition options. This can be done more easily on a local PC, but not everybody has that option. JAn Dudík (talk) 11:21, 31 October 2019 (UTC)[reply]

It would also be great to have the OCR output default to formatting that matches the MOS, rather than having to change it all to conform to the MOS manually. --YodinT 14:19, 25 November 2019 (UTC)[reply]

Voting

Repair search and replace in Page editing

  • Problem: Currently, "Search and replace", as provided by the code editor (top-left option in the advanced editing tab), just doesn't work in the "Page" namespace.

This is the basic tool for searching and replacing text when editing, mass-correcting OCR mistakes, etc. It is simply not working.

Discussion

  • Extending the proposal: This would benefit all wiki projects.
    • I would suggest something more general: when I use search and replace, I cannot undo a step any more, in case my replacement (or, more importantly, something before it) was wrong. This is a general problem with the text editor: every time I use any of the existing buttons (like Bold, or math, or whatever), I cannot undo past that point. So if I have been editing for some time, then make a mistake, and then use one of these buttons (or search and replace), I must redo the whole work from the beginning, because I cannot go back to the mistake I made before using the button. This is not the case with the visual editor, so I think it would be possible to change this in the text editor rather easily.
    • There are only two options in search and replace: you can replace either one occurrence after the other, or in the whole text. I would be really grateful if I could use search and replace only within a selected portion of the text (and not the whole one). Yomomo (talk) 22:24, 8 November 2019 (UTC)[reply]
    • About search and replace: if I want to replace something spanning more than one line, the newline character is not matched. I don't know how difficult this is to change, but it would be a benefit to be able to replace text even when it (and the replacement) spans several lines. Yomomo (talk) 14:52, 1 November 2019 (UTC)[reply]
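The multi-line limitation in the last point corresponds to a well-known distinction in regex engines: by default "." does not match a newline, but "\n" can always be matched explicitly. A small generic illustration (not the editor's actual code):

```python
import re

text = "end of one li-\nne and more"

# Joining a hyphenated line break requires the pattern to match
# the newline character itself:
joined = re.sub(r"-\n", "", text)

# With re.DOTALL, "." matches newlines too, so one pattern can
# span several lines of the text:
spanning = re.sub(r"one.*more", "X", text, flags=re.DOTALL)
```

A fixed search-and-replace dialog would need to offer at least the explicit-newline behaviour, and ideally a "dot matches newline" option.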

Voting

Offer PDF export of original pagination of entire books

Français: Pouvoir exporter en pdf en respectant la pagination de l'édition source.
  • Problem: Presently, PDF conversion of proofread Wikisource books doesn't mirror the pagination and page design of the original edition, since it comes from ns0 transclusion.
    Français: La conversion en PDF des livres Wikisource ne reflète pas la pagination et le design original des pages de l’édition originale, car la conversion provient de la transclusion et non des pages.
  • Who would benefit: Offline readers.
    Français: Lecteurs hors ligne.
  • Proposed solution: Build an alternative PDF generated by converting, page by page, the nsPage namespace.
    Français: Élaborer un outil pour générer un PDF alternatif provenant d’une conversion page par page.
  • More comments: Some Wikisource contributors think that nsIndex and nsPage are simply "transcription tools"; I think that they are much more – they are the true digitization of an edition, while the ns0 transclusion is something like a new edition.
    Français: Certains contributeurs de wikisource pense que nsIndex et nsPage sont simplement des « outils de transcription » ; je pense qu’ils sont beaucoup plus que cela – ce sont la vraie numérisation d’une édition, tandis que la transclusion ns0 constitue une nouvelle édition.
  • Phabricator tickets: T179790
  • Proposer: last year's proposal by Alex brollo received 57 votes. Jayantanth (talk) 16:03, 26 October 2019 (UTC)[reply]

Discussion

  • I think I would have actually Opposed this: I don't want to reproduce original pagination, we have the original PDF for that. For this proposal to make sense, to me, it would need to be about having some way to control PDF generation in the same way transclusion to mainspace controls wikitext rendering. I wouldn't necessarily want to reproduce each original page in a PDF page there (often, yes, but not always), and I might want to tweak some formatting specifically for a paged medium (PDF) that doesn't apply in a web page, or vice versa. In other words, I'm going to abstain from voting on this proposal but I might support something like it in the future if it was better fleshed out. --Xover (talk) 06:03, 27 November 2019 (UTC)[reply]

Voting

Reorganize the Wikisource toolbar

français: reorganization wikisource toolbar
  • Problem: Some shortcuts are superfluous, others are missing
    français: Certains raccourcis sont superflus, d'autres absents
  • Who would benefit: facilitate editing for new writers
    français: faciliter l'édition pour les nouveaux rédacteurs
  • Proposed solution: Rearrange the toolbar a bit
    français: Réorganiser un peu la barre outils
  • More comments: In the toolbar, we have {{}}, {{|}}, {{|}}. I think we should keep {{}} and replace the other two, which to me are useless – it is just as fast to type the | on the keyboard at the desired place. Instead we could put {{paragraph|}}, {{space|}}, {{separation}}. <ref>txt</ref> duplicates the icon next to it (insert file) at the top left; it could be replaced by <ref follow=pxxx>. Next to <br/> we could add <brn|> and {{br0}}. The <search-replace> could appear next to the pencil dropdown at the top right.
    français: Dans la barre, nous avons {{}},{{|}}, {{|}}. Je pense garder {{}} et remplacer les 2 autres pour moi inutiles ça va aussi vite de taper au clavier le | à l'endroit voulu. A la place on pourrait y mettre {{alinéa|}}, {{espace|}}, {{separation}} <ref>txt</ref> fait double emploi avec l'icone à côté (inserer fichier) en haut à gauche. On pourrait le remplacer par <ref follow=pxxx>. A côté de <br/>, on pourrait rajouter <brn|> et {{br0}}. Le <rechercher-remplacer> pourrait figurer à côté du crayon changer d'éditeur.
  • Phabricator tickets:
  • Proposer: Carolo6040 (talk) 10:58, 25 October 2019 (UTC)[reply]

Discussion

I think you mean the character insertion bar below the editor? That can already be modified by the community itself; it does not require effort from the development team. —TheDJ (talkcontribs) 12:48, 4 November 2019 (UTC)[reply]
We need a redesign of the default menus for Wikisource. This is beyond the capabilities of the new editor. Visual editor will not be used until this is done, either by the community or by developers. Slowking4 (talk) 15:11, 4 November 2019 (UTC)[reply]
Still, these changes can be performed by local interface admins (such edits are not to be done by new editors). Check for example MediaWiki:Edittools. Ruthven (msg) 18:55, 4 November 2019 (UTC)[reply]
  • "Oppose" expending Wishlist resources on this. This can be fixed by the local project, and is something that should be handled by the local project. For example, the set of relevant templates and things like style guides for (curly) quote marks etc. are going to vary from project to project. Perhaps if there was something not possible with the current toolbar it might make sense to add that support, but then just to enable local customization. --Xover (talk) 12:16, 27 November 2019 (UTC)[reply]

Voting

Allow many links to Wikidata on a single page

  • Problem: On the Italian Wikisource it is a problem to create too many links to Wikidata on a single page. But such links are necessary to improve the use of Wikisource books outside their own platform (on tablets and PCs): the presence of links to Wikidata makes the books a much more useful hypertext.
  • Who would benefit: Every readers
  • Proposed solution: I am not technical, so I have only needs and not solutions ;-)
  • More comments:
  • Phabricator tickets:
  • Proposer: Susanna Giaccai (talk) 11:22, 4 November 2019 (UTC)[reply]

Discussion

@Giaccai: Can you give specific examples of pages where this is currently a problem ? —TheDJ (talkcontribs) 12:57, 4 November 2019 (UTC)[reply]
This is the only page with a Lua error: it:s:Ricordi di Parigi/Uno sguardo all’Esposizione. IMHO it's a Lua "not enough memory" issue, coming from exhausted Lua space: you can see "Lua memory usage: 50 MB/50 MB" in the NewPP limit report. --Alex brollo (talk) 14:47, 4 November 2019 (UTC)[reply]
Weren't the links to Wikidata to be used only in case of author's names? --Ruthven (msg) 18:45, 4 November 2019 (UTC)[reply]
No, presently there are tests to link other kinds of entities (i.e. locations) to Wikidata; Wikidata is used to find a link to a Wikisource page, or to a Wikipedia page, or to the Wikidata page when both are lacking (dealing with locations, usually the resulting link points to Wikipedia). --Alex brollo (talk) 07:17, 5 November 2019 (UTC)[reply]
I just investigated the error. The "not enough memory" issue is caused by s:it:Modulo:Wl and s:it:Module:Common. @Alex brollo: What is going on is that the full item serialization is loaded into Lua memory twice per link, once by local item = mw.wikibase.getEntity(qid) in s:it:Modulo:Wl and once by local item = mw.wikibase.getEntityObject() in s:it:Module:Common. You could probably avoid both of these calls by relying on the Wikibase Lua functions already used in s:it:Modulo:Wl and, so, greatly limit the memory consumption of the module. Tpt (talk) 16:08, 12 November 2019 (UTC)[reply]
@Tpt: Thanks! I'll review the code following your suggestions. --Alex brollo (talk) 14:26, 14 November 2019 (UTC)[reply]
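A hedged sketch of what Tpt's suggestion could look like, assuming the targeted Wikibase Scribunto accessors mw.wikibase.getLabel and mw.wikibase.getSitelink (the function name here is hypothetical, not the module's actual code):

```lua
-- Sketch: build a link without deserializing the whole entity.
-- mw.wikibase.getEntity(qid) loads the full item into Lua memory;
-- the targeted accessors below fetch only the needed fields.
local function wikidataLink(qid)
    local label = mw.wikibase.getLabel(qid)        -- just the label
    local sitelink = mw.wikibase.getSitelink(qid)  -- local sitelink, if any
    if sitelink then
        return string.format('[[%s|%s]]', sitelink, label or sitelink)
    end
    -- Fall back to an interwiki link to the Wikidata item.
    return string.format('[[d:%s|%s]]', qid, label or qid)
end
```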

Voting

XTools Edit Counter for Wikisource

Français: Compteur de modifications très amélioré.
  • Problem: There are no Wikisource-specific statistics about per-user proofreading/validation; it is impossible to get stats about the proofreading task. The Wikisource workflow is different from Wikipedia's, so this cannot be covered by XTools. We need specific stats tools for Wikisource.
    Français: Il n’existe pas de statistiques spécifiques sur les correction/Validations par utilisateurs. Les processus de travail (workflow) de Wikisource diffèrent de ceux de Wikipédia. Ce ne peux être fait via par xtool. Nous avons besoin d’un outil statistique spécifique pour Wikisource.
  • Who would benefit: Whole Wikisource Community.
    Français: Toute la communauté Wikisource.
  • Proposed solution: Make a Wikisource-specific stats tool.
    Français: Créer un outil de statistiques spécifiques à Wikisource.
  • More comments:
  • Phabricator tickets: phab:T173012
  • Proposer: Jayantanth (talk) 16:01, 26 October 2019 (UTC)[reply]

Discussion

  • I'm initially a “Support Sure, why not?” on this, but given there is a limited amount of resources available for the Wishlist tasks, and this is both not very important and potentially quite time-consuming to implement, I'm going to abstain from voting for this. It's a nice idea, but not worth the cost. --Xover (talk) 05:56, 27 November 2019 (UTC)[reply]
    • @Xover: I totally understand your point, as I felt a bit the same way, but after rethinking, I don't think it would cost that much considering the impact. Getting to know a community better can have a very strong effect in boosting that community. In particular, I'm thinking about participation contests, which all need stats and could benefit from this edit counter. Cheers, VIGNERON * discut. 17:33, 27 November 2019 (UTC)[reply]

Voting

Template limits

  • Who would benefit: Every text on every Wikisource is potentially concerned but obviously the target is the text with a lot of templates (either the long text, the text with heavy formatting or both).
  • Proposed solution: I'm not a dev, but I can imagine multiple solutions:
    • increase the limit (easy, but maybe not a good idea in the long run) — bad idea (cf. infra)
    • improve the expansion of templates (it's strange that "small" templates like the ones for formatting consume so much)
    • use something other than templates to format text
    • any other idea is welcome

Discussion

  • Would benefit all projects as pages that use a large number of templates, such as cite templates, often hit the limit and have to work round the problem. Keith D (talk) 23:44, 27 October 2019 (UTC)[reply]
  • For clarity, is this solely about the include size limit? (There are several other types of template limits.) Bawolff (talk) 23:14, 1 November 2019 (UTC)[reply]
    @Bawolff: What usually bites us is the post-expand include size limit. See e.g. s:Category:Pages where template include size is exceeded. Note that the problem is exacerbated by ugly templates that spit out oodles of data, but the underlying issue is that the Wikisourcen operate by transcluding together lots of smaller pages into one big page, so even well-designed templates and non-pathological cases will sometimes hit this limit. --Xover (talk) 12:02, 5 November 2019 (UTC)[reply]
  • @VIGNERON: unfortunately, various parser limits exist to protect our servers and users from pathologically slow pages. Relaxing them is not a good solution, so we can't accept this proposal as it is. However, if it were reformulated more generally like "do something about this problem", it might be acceptable. MaxSem (WMF) (talk) 19:32, 8 November 2019 (UTC)[reply]
    • @MaxSem (WMF): Thanks for this input. And absolutely! Raising the limit is just one of the ideas I suggested; "do something about this problem" is exactly what this proposition is about. I struck out the "increase the limit" suggestion, and I can change other wording if needed; my end goal is just to be able to format text on Wikisource. And if you have any other suggestions, you're welcome 😉. VIGNERON * discut. 19:54, 8 November 2019 (UTC)[reply]
  • The problem here is that almost all content on large Wikisources is transcluded using ProofreadPage. I noticed that, as a result, all the code of templates placed on pages in the Page namespace is transcluded (counted towards the post-expand include size limit) twice. If you also note that, besides the templates, Wikisource pages have a lot of non-template content, you will see that Wikisource templates must be tiny, effective, etc. Even a long CSS class name in an extensively used template might be a problem.
@Bawolff and MaxSem (WMF): So the question is whether this particular limit has to be the same for very large, high-traffic wikis like English Wikipedia as for medium/small low-traffic wikis like the Wikisources. I think the Wikisources would benefit much even from raising it by 25–50% (from 2 MB to 2.5–3 MB).
Another idea is based on the fact that the Wikisource page creation pattern is: create/verify/leave untouched for years. So if large transclusion pages hurt parser efficiency a lot, maybe the solution is to use less aggressive updates / more aggressive caching for them? I think delayed updates would not be a big problem for Wikisource pages.
Just another idea: on pl.wikisource we have no pages hitting this limit at the moment due to a workaround: for large pages we make userspace transclusions using the {{iwpages}} template, see here. Of course, very large pages may then kill users' browsers instead of killing servers. But I think this is acceptable if somebody really wants to see the whole Bible on a single page (we had such requests...). Unfortunately, this mechanism is incompatible with the Cite extension (transcluded parts contain references with colliding ids – but maybe this can be easily fixed?). Also, a disadvantage is that there are no dependencies on the userspace-transcluded parts of the page(s) (but maybe this is not a problem?). Ankry (talk) 20:04, 9 November 2019 (UTC)[reply]
Yeah, depending on just exactly what the performance issue that limit is trying to avoid is, it is very likely a good idea to investigate whether that problem is actually relevant on the Wikisources. Once a page on Wikisource is finished it is by definition an exception if it is ever edited again: after initial development the page is supposed to reflect the original printed book which, obviously, will not change. Even the big Wikisources are also tiny compared to enwp, so general resource consumption (RAM, CPU) during parsing has a vastly smaller multiplication factor. A single person could probably be reasonably expected to patrol all edits for a given 24-hour period on enWS without making it a full-time job (I do three days' worth of userExpLevel=unregistered;newcomer;learner recent changes on my lunch break). If we can run enwp with the current limit, it should be possible to handle all the Wikisourcen with even ten times that limit and barely be able to see it anywhere in Grafana.
Not that there can't be smarter solutions, of course. And I don't know enough about the MW architecture here to predict exactly what the limit is achieving, so it's entirely possible even a tiny change will melt the servers. But it's something that's worth investigating at least. --Xover (talk) 21:50, 9 November 2019 (UTC)[reply]
@Ankry and Xover: thanks a lot for these inputs. Raising the limit even a bit may be a good short-term solution, but I think we rather need a long-term one. The most urgent thing is to look more into all aspects of the problem to see what can be done and how. Cheers, VIGNERON * discut. 15:01, 12 November 2019 (UTC)[reply]
[The following is just my personal view and does not necessarily reflect anyone else's reasoning on this question]: One issue with just raising the limit on a small wiki is that first one wiki will want it, then another wiki, then a slightly bigger wiki wants it, and pretty soon English Wikipedia is complaining it's unfair that everyone else can have X when they can't, and things spiral. So it's a lot easier to have the same standard across all wikis. Bawolff (talk) 00:45, 22 November 2019 (UTC)[reply]
Although, one thing to note: this is primarily about the page tag, so I guess if we did mess with the limit, we could maybe change it just for content included with the page tag. But I'm not sure if people would go for that. Bawolff (talk) 00:54, 22 November 2019 (UTC)[reply]
@Bawolff: “Everyone will want it and it's hard to say no”. Certainly. But that's a social problem and not a technical one. As you note, the Wikisourcen will be served (at least mostly) if the raised limit only applies when transclusion is invoked through ProofreadPage. As a proxy for project size it should serve reasonably well: I don't see any ProofreadPage-using project growing to within orders of magnitude of enwp scope any time soon (sadly. that would be a nice problem to have). --Xover (talk) 12:40, 27 November 2019 (UTC)[reply]

Voting

Better editing of transcluded pages

  • Problem: When somebody wants to edit the text of a page with transcluded content, they find nothing relevant in the source, only links to one or many transcluded pages under the textarea. There are some tools (enabled by default on some Wikisources) which help to find the correct page. These tools display the page number in the middle of the text (on some wikis at the edge), and in the source HTML there are invisible parts, sometimes in the middle of a word/sentence/paragraph.
  • Who would benefit: Users who want to correct transcluded text
  • Proposed solution: 1) Make the invisible HTML marking visible, but without disturbing the text. Find a way to move it out of the middle of words.
    • en.wikisource example (link to page is on the left edge):
      • dense undergrowth of the sweet myrtle,&#32;<span><span class="pagenum ws-pagenum" id="2" data-page-number="2" title="Page:Tales,_Edgar_Allan_Poe,_1846.djvu/16">&#8203;</span></span>so much prized by the horticulturists of England.
        
    • cs.wikisource example (the link is not displayed by default; when made visible by CSS, it is in the middle of the text):
      • vedoucí od západního břehu řeky k východ<span><span class="pagenum" id="20" title="Stránka:E. T. Seton - Prerijní vlk, přítel malého Jima.pdf/23"></span></span>nímu.
        
  • Alternate solution: 2) after clicking [edit], display the pagination of the transcluded text; clicking on a page will open it for editing.
  • Alternate solution 2: 3) Make the transcluded page editable in VE.
  • More comments: Split from this proposal
  • Phabricator tickets:
  • Proposer: JAn Dudík (talk) 16:30, 11 November 2019 (UTC)[reply]
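As one illustration of solution 1, the marker could be shifted to the nearest word boundary before display, so it never breaks a word like in the cs.wikisource example above. A rough sketch (the marker pattern is modeled on that example and is not production code):

```python
import re

# Marker shape as in the cs.wikisource example above
# (attribute order and nesting simplified).
MARKER = r'<span><span class="pagenum"[^>]*></span></span>'

def move_marker_out_of_word(html: str) -> str:
    """If a page-number marker falls inside a word, move it to the
    end of that word so the word is rendered unbroken."""
    return re.sub(r"(\w+)(" + MARKER + r")(\w+)", r"\1\3\2", html)
```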

Discussion

  • I don't really see this as a worthwhile thing for Community Tech to spend their time on. The existing page separators from ProofreadPage can be customised locally by each project, and its display or not can be customised in project-local CSS. Editing of transcluded content is an inherent problem with a ProofreadPage-based project and is not really something that can be “fixed” in a push by Community Tech. --Xover (talk) 05:45, 27 November 2019 (UTC)[reply]

Voting

memoRegex

  • Problem: OCR editing needs many work-specific regex substitutions, and it would be great to save them and share them with any other user. Shared regex substitutions are also very useful to harmonize formatting across all pages of a work.
  • Who would benefit: all users (inexperienced ones could use complex regex substitutions tested by experienced ones)
  • Proposed solution: it.wikisource uses it:s:MediaWiki:Gadget-memoRegex.js, which does the job (it optionally saves regex substitutions tested with an it.source Find & Replace tool, so that they can be applied by any other user with a click while editing pages of the same Index). The idea should be tested, refined and applied to a deep revision of the central Find and Replace tool.
  • More comments: The tool has been tested on different projects.
  • Phabricator tickets:
  • Proposer: Alex brollo (talk) 07:33, 25 October 2019 (UTC)[reply]
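The core of the idea can be sketched in a few lines of JavaScript. This is an illustrative assumption of how saved substitutions might be stored and replayed; the actual it.wikisource gadget stores them on-wiki and differs in detail:

```javascript
// Illustrative sketch: per-Index lists of saved regex substitutions that any
// user can replay on a page's text with one click.
var savedSubstitutions = {
  // Keyed by Index page; each entry is [pattern, flags, replacement].
  'Index:Example.djvu': [
    ['\\bteh\\b', 'g', 'the'],      // a common OCR typo
    ['([,;:])(\\S)', 'g', '$1 $2']  // insert the missing space after punctuation
  ]
};

function applySubstitutions(indexName, text) {
  var subs = savedSubstitutions[indexName] || [];
  return subs.reduce(function (result, sub) {
    return result.replace(new RegExp(sub[0], sub[1]), sub[2]);
  }, text);
}
```

Because the substitutions are stored per Index, an inexperienced user proofreading any page of the same work gets the benefit of regexes an experienced user has already tested.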

Discussion

  • Actually this is very useful. It's an extension to a workaround to solve the search & replace bug that affects all Wikisource projects. If reimplementing the Search & Replace is retained as a solution, "memoRegex" should be considered as part of the implementation. --Ruthven (msg) 18:51, 4 November 2019 (UTC)[reply]

Voting

ProofreadPage extension in alternate namespaces

Français: Utiliser les outils de l'espace page dans d'autres espaces
  • Problem: ProofreadPage elements, such as "Source" link in navigation, do not display in namespaces other than mainspace
    Français: Les éléments de l’espace page, tels que le lien "Source" dans la navigation, ne s'affichent pas dans les espaces de noms autres que l’espace principal.
  • Who would benefit: Wikisources with works in non-mainspace, such as user translations on English Wikisource
    Français: Utilisateurs Wikisource qui font des travaux qui ne sont pas en espace principal, tels que des traductions utilisateur sur Wikisource anglaise
  • Proposed solution: Modify the ProofreadPage extension to allow its use in namespaces other than mainspace
    Français: Modifier l'extension de l'espace page, ProofreadPage, pour permettre son utilisation dans des espaces de noms autres que l’espace principal.
  • More comments: I also proposed this in the 2019 and 2017 wishlist surveys.
  • Phabricator tickets: phab:T53980
  • Proposer: —Beleg Tâl (talk) 16:23, 23 October 2019 (UTC)[reply]

Discussion

Voting

Improve extraction of a text layer from PDFs

  • Problem: If a PDF scan has an OCR layer (i.e. the original OCR layer, usually of high quality, which is part of many PDF scans provided by libraries, not the OCR text obtained by our OCR tools), the text is extracted from it very poorly in the Wikisource Page namespace. DjVu files do not suffer from this problem and their OCR layer is extracted well. If the PDF is converted into DjVu, the extraction of the text from its OCR layer improves too. (Example of OCR extraction from a PDF here: [1], example of the same from DjVu here: [2].) As most libraries, including Internet Archive or HathiTrust, offer downloading PDFs with OCR layers and not DjVus, we need to fix the text extraction from PDFs.
  • Who would benefit: All Wikisource contributors working with PDF scans downloaded from various major libraries (see above). Some contributors on Commons have expressed concern that the DjVu file format is dying and attempted to deprecate it in favour of PDF. Although the attempt has not succeeded (this time), many people still prefer working with PDFs (because the DjVu format is difficult for them to work with, they do not know how to convert PDF into DjVu or how to edit DjVu scans, and the DjVu format is not supported by Internet browsers...)
  • Proposed solution: Fix the extraction of text from existing OCR layers of scans in PDF.
  • More comments:
  • Phabricator tickets:
  • Proposer: Jan.Kamenicek (talk) 20:18, 24 October 2019 (UTC)[reply]

Discussion

There are also libraries where it is possible to download a batch of pages (20-100) in PDF, but none, or only single pages, in DjVu.

There is also the possibility of external Google OCR

mw.loader.load('//wikisource.org/w/index.php?title=MediaWiki:GoogleOCR.js&action=raw&ctype=text/javascript');

, but there are more OCR errors and sometimes lines get mixed up. JAn Dudík (talk) 12:13, 25 October 2019 (UTC)[reply]

Yes, exactly, the Google OCR is really poor (en.ws has it among their gadgets), but the original OCR layer which is part of most scans obtained from libraries is often really good; only MediaWiki fails to extract it correctly. If you download a PDF document e.g. from HathiTrust, it usually contains an OCR layer provided by the library (i.e. not obtained by one of our tools), and when you try to use this original OCR layer in the Wikisource Page namespace, you get very poor results. But if you take the same PDF document and convert it to DjVu prior to uploading it here, then you get amazingly better results when extracting the text from the original OCR layer in Wikisource, and you do not need any of our OCR tools. This means that the original OCR layer of the PDF is good; we are simply not able to extract it correctly from the PDF for some reason, although we are able to extract it from DjVu. --Jan.Kamenicek (talk) 17:10, 25 October 2019 (UTC)[reply]
Yeah, it is pretty bad when the text layer does not appear and the OCR buttons hang in gray, but I can cut and paste text from the IA txt file; clearly a failure to hand off a clean text layer. Slowking4 (talk) 02:34, 28 October 2019 (UTC)[reply]

Voting

Tools to easily localize content deleted at Commons

  • Problem: When a book scan is deleted on Commons, it completely breaks indexes on Wikisource. Commons does not notify Wikisource when they delete files, nor do they make any attempt to localize the file instead of deleting it. Wikisource has no way of tracking indexes that have been broken by the Commons deletion process.
  • Who would benefit: Wikisource editors
  • Proposed solution: 1) Make it really easy to localize files, for example by fixing phab:T8071, and 2) Fix or replace the bot(s) that used to notify Wikisources of pending deletions of book scans used by Wikisource
  • More comments: A similar approach may also be helpful for Wikiquote and Wiktionary items that depend on Wikisource, when Wikisource content is moved or deleted.
  • Phabricator tickets: phab:T8071
  • Proposer: —Beleg Tâl (talk) 14:45, 4 November 2019 (UTC)[reply]

Discussion

Voting

Generate thumbnails for large-format PDFs

  • Problem: For some PDFs, with very large images (typically scanned newspapers), no images (called "thumbnails") are shown.
  • Who would benefit: Wikisource when proofreading newspaper pages.
  • Proposed solution: Look at the PDF files described in phab:T25326, phab:T151202, commons:Category:Finlands Allmänna Tidning 1878, to find out why no thumbnails are generated.
  • More comments: When extracting the JPEG for an individual file, that JPEG can be uploaded. But when the JPEG is baked into a PDF, no thumbnail is generated. Is it because of its size? Small pages (books) work fine, but newspapers (large pages) fail.
  • Phabricator tickets: phab:T151202
  • Proposer: LA2 (talk) 21:04, 23 October 2019 (UTC)[reply]

Discussion

  • Hi LA2! Can you provide a description of the problem? This could help give us a deeper understanding of the wish. Thank you! --IFried (WMF) (talk) 18:52, 25 October 2019 (UTC)[reply]
    The problem is very easy to understand. I find a free, digitized PDF and upload it to Commons, then start to proofread in Wikisource. This always works fine for normal books, but when I try the same for newspapers, no image is generated. Apparently this is because the image has a larger number of pixels. I haven't tried to figure out what the limit is. --LA2 (talk) 21:36, 25 October 2019 (UTC)[reply]
  • What about providing a more compact design for proofreading altogether? Those seconds of scrolling count. If we had on one side a window with the extracted text, and on the other side a same-sized window with the scan, in which you can zoom and pan quickly, that would save time and be more attractive for newbies. The way it is now, it looks rather techy and in some cases difficult to handle. E.g. there should also be more context help, or a link to the discussion page, presented in a more attractive design. Juandev (talk) 09:22, 4 November 2019 (UTC)[reply]
  • I think that a tool that allows generating such thumbnails manually / on request / offline with much higher limits, available to a specific group of users (Commons admins? a dedicated group?), may be a workaround for this problem. Ankry (talk) 20:23, 9 November 2019 (UTC)[reply]
I'm not sure what you want me to check - the question at hand is why that particular version of the file failed to render. Bawolff (talk) 09:15, 22 November 2019 (UTC)[reply]
Wow, @Hrishikes and Bawolff:, there is a fix? How exactly does it work? Could it be integrated into the upload process? Could it be applied to all files in commons:Category:Finlands Allmänna Tidning 1878? --LA2 (talk) 19:13, 10 December 2019 (UTC)[reply]
@LA2: -- This problem is occurring in highly compressed files and linked to the ocr layer. The fix consists of decompressing the file (so that the size in mb increases) and either flattening or removal of the ocr layer. I first tried flattening; it usually works but did not in this case; so I removed the ocr. Now it works. And yes, it is potentially usable for other files in your category. Extract the pages as png/jpg and rebuild the pdf. Hrishikes (talk) 01:39, 11 December 2019 (UTC)[reply]

Voting

Improve export of electronic books

Original title (Français): Améliorer l'exportation des versions électroniques des livres
  • Problem: Imagine if Wikipedia pages could not be displayed for many days, or were only available once in a while for many weeks. Imagine if Wikipedia displayed pages with missing or scrambled information. This is what visitors get when they download books from the French Wikisource. Visitors do not read books online in a browser. They want to download them onto their reader in ePub, MOBI or PDF. The tool to export books, WsExport, has all those problems: in spring 2017, it was on and off for over a month; after October 2017, the MOBI format was not available, followed by PDF after a while. These problems still continue on and off.


  • Since the project was finished sometime in July or August 2019, the stability of the WsExport tool has improved. Unfortunately, there have been downtimes, some up to 12 hours. The fact that the tool does not get back online rapidly is a deterrent for our readers/visitors.
    1. September 30 : no download from 10:00 to 22:00 Montreal time
    2. October 30 : no download for around 30 minutes from 13:00 to 13:30
    3. October 31 : no answer or bad gateway at 22:10
    4. November 1st : no download from 17:15 to 22:30
    5. November 2 : no download from 10:30 to 11:40
    6. November 2 : no download or bad gateway from 19:25 to 22:46
  • I have tested books and found the same problems as before.
    1. Missing text at end of page or beginning of page (in plain text or in table)
    2. Duplication of text at end of page or beginning of page
    3. Table titles don't appear
    4. Table alignment in a page (centered) not respected
    5. Text alignment in table cell not respected
    6. Style in table not respected in MOBI format
    7. And others
  • More information can be found on my Wikisource page
  • For all these reasons, this project is resubmitted this year. It is an important aspect of the Wikisource project: an interface for contributors and an interface for everyone else from the public who wishes to read good e-books. --Viticulum (talk) 21:45, 7 November 2019 (UTC)[reply]


  • Français: Imaginez si les pages Wikipédia ne s’affichaient pas pendant plusieurs jours, ou n’étaient disponibles que de façon aléatoire durant plusieurs jours. Imaginez si, sur les pages Wikipédia, certaines informations ne s’affichaient pas ou étaient illisibles. C’est la situation qui se produit pour les visiteurs qui désirent télécharger les livres de la Wikisource en français. Les visiteurs ne lisent pas les livres en ligne dans un navigateur, ils désirent les télécharger sur leurs lecteurs en ePub, MOBI ou PDF. L’outil actuel (WsExport) permettant l’export dans ces formats possède tous ces problèmes: au printemps 2017, il fonctionnait de façon aléatoire durant un mois ; depuis octobre 2017, le format mobi puis pdf ont cessé de fonctionner. Ces problèmes continuent de façon aléatoire.


  • Depuis la fin du projet en juillet ou août 2019, la stabilité de l'outil WsExport s'est améliorée. Malheureusement, il y a eu des temps d'arrêt, certains jusqu'à 12 heures. Le fait que l'outil ne soit pas remis en ligne rapidement peut être dissuasif pour nos lecteurs / visiteurs.
    1. 30 septembre : aucun téléchargement de 10 h à 22 h heure de Montréal
    2. 30 octobre : pas de téléchargement pour environ 30 minutes de 13 h à 13 h 30
    3. 31 octobre : pas de réponse ou mauvaise passerelle 22 h 10
    4. 1er novembre : pas de téléchargement de 17 h 15 à 22 h 30
    5. 2 novembre : pas de téléchargement de 10 h 30 à 11 h 40
    6. 2 novembre : pas de téléchargement ou mauvaise passerelle de 19 h 25 à 22 h 46
  • J'ai testé des livres et trouve les mêmes problèmes qu'avant.
    1. Texte manquant à la fin ou au début de la page (dans le texte ou dans un tableau)
    2. Duplication de texte en fin ou en début de page
    3. Les titres de table n'apparaissent pas
    4. L'alignement de la table sur une page (centrée) n'est pas respecté
    5. L'alignement du texte dans la cellule du tableau n'est pas respecté
    6. Style dans la table non respecté en format MOBI
    7. Et d'autres
  • Plus d'informations peuvent être trouvées sur ma page wikisource
  • Pour toutes ces raisons, ce projet est soumis à nouveau cette année. La communauté Wikisource accorde une importance haute à cet aspect du projet : une interface pour les contributeurs et une interface pour tous les autres utilisateurs du public souhaitant lire de bons livres électroniques. --Viticulum (talk) 21:57, 7 November 2019 (UTC)[reply]


  • Who would benefit: The end users, the visitors to Wikisource, by having access to high quality books. This would improve the credibility of Wikisource.

    This export tool is the showcase of Wikisource. Contributors can be patient with system bugs, but visitors won’t be, and won’t come back.

    The export tool is as important as the web site is.

    Français: L’utilisateur final, le visiteur de Wikisource, en ayant accès à des livres de haute qualité. Ceci contribuerait à améliorer la crédibilité de Wikisource. L’outil d´exportation est une vitrine pour Wikisource. Les contributeurs peuvent être patients avec les anomalies de système, mais les visiteurs ne le seront peut-être pas et ne reviendront pas. L’outil d’exportation est tout aussi important que le site web.
  • Proposed solution: We need a professional tool that runs and is supported 24/7 by Wikimedia Foundation professional developers, as the different Wikimedia web sites are.

    The tool should support the different varieties of electronic books, and keep up with the evolution of ebook technology.

    The different bugs should be corrected.

    Français: Nous avons besoin d’un outil professionnel, fonctionnel et étant supporté 24/7, comme tous les différents sites Wikimedia, par les développeurs professionnels de la Fondation Wikimedia. Les différentes anomalies doivent être corrigées.
  • More comments: There are not enough people on a small wiki (even on the French, Polish and English Wikisources, the three most important by the size of their communities) to support and maintain such a tool.
    Français: Nous ne sommes pas assez nombreux dans les petits wikis (même Wikisource en français, polonais ou anglais, les trois plus importantes par le nombre de contributeurs) pour supporter une telle application.


Discussion

Voting

2021

The survey has closed. Thanks for your participation :)



Fix search and replace in the Page: namespace editor

Discussion

Voting

XTools Edit Counter for Wikisource

  • Problem: There are no Wikisource-specific statistics about per-user proofreading/validation, so it is impossible to get stats about proofreading tasks. The Wikisource workflow is different from Wikipedia's, and this cannot be done by XTools. We need statistics tools specific to Wikisource.
  • Who would benefit: Whole Wikisource Community.
  • Proposed solution: Build Wikisource-specific statistics tools.
  • More comments: Community Wishlist Survey 2020/Wikisource/XTools Edit Counter for Wikisource
  • Phabricator tickets: phab:T173012, phab:T172408
  • Proposer: Jayantanth (talk) 09:10, 18 November 2020 (UTC)[reply]

Discussion

Hello Jayantanth. Do you know the tool called Wikiscan? For the Bengali Wikisource, you can find a lot of statistics there (though not that much on proofreading). For the French Wikisource, it is quite useful. Maybe you can also find what you are looking for in the tool called ProofreadPage Statistics, with specific statistics about proofreading, but not user-centered. --Consulnico (talk) 15:41, 20 November 2020 (UTC)[reply]

On the French Wikisource, we also use a self-made robot, BookwormBot, that produces reports on all the books (not users though) in a specific category. You can configure as many categories as you need to be included in the reports. It's the tool we're using to compile the daily results of our monthly "mission". This robot runs every 6 hours on WSFR and takes about 15-20 minutes to run over our selected categories, so we have regularly updated tables... You may contact Coren for more information about this robot. --Ernest-Mtl (talk) 18:35, 20 November 2020 (UTC)[reply]

Voting

Automated move of items from Wikimedia Commons to Wikisource

  • Problem: Bulk upload of items to Wikimedia Commons through Pattypan works well, but it is then a manual and time-consuming job to move each item over to Wikisource and create the index page there
  • Who would benefit: anyone wanting to upload digitised content to Wikisource in bulk (e.g. national libraries)
  • Proposed solution: build this step into Pattypan?
  • More comments:
  • Phabricator tickets:
  • Proposer: Gweduni (talk) 12:05, 30 November 2020 (UTC)[reply]

Discussion

  • @Gweduni: what do you call "items" here? Do you have an example? I'm not sure I fully understand (information on index pages can already be automatically retrieved either from Commons or Wikidata, it shouldn't be time-consuming anymore...) Cheers, VIGNERON * discut. 10:41, 9 December 2020 (UTC)[reply]
@VIGNERON: So sorry - only reading this now. I mean, if we bulk upload books, it would be nice to autogenerate the index on Wikisource rather than having to copy the "File:..." from Commons and then manually change it to "Index:..." on Wikisource Gweduni (talk) 13:59, 6 July 2021 (UTC)[reply]

Voting

Structured Data on Wikisource

  • Problem: Cataloging Wikisource with Wikidata is possible, but an embedded solution like Structured Data on Commons would make it possible to attach more detailed metadata to the documents.
  • Who would benefit: visibility and SPARQL query usage for Wikisource, textbox quality, interoperability, and the communities of the participating Wikisource language versions
  • Proposed solution: a structured data part in Wikisource like Structured Data on Commons.
  • More comments:

There could be two ways in which Structured Data on Wikisource could be used:

1. Bibliographic metadata about documents on Wikisource is stored in Wikidata and linked via the sitelink (schema:) to Wikisource. The Wikisource community could use this data to automatically generate infoboxes (textboxes).
* Maybe there is a lack of discussion about the bibliographic metadata model on Wikisource. IMHO, Wikisource pages represent a new version/edition of the underlying work, so instead of linking the Wikisource pages directly as sitelinks to the original works or editions, something similar to the SDC "digital representation of" would be even more precise.
2. A great benefit of "Structured Data on Wikisource" could be the possibility to tag or (semi-)automatically recognize "named entities" within the given text corpora. Many items (e.g. persons, places) are mentioned in different ways in the text, in interesting and important ways, but not important enough to define them as "main subject" (P920) in the bibliographic (Wikidata) item. But if we could create a tagged map of named entities linked by Q-IDs, a really new way of searching or discovering Wikisource articles could be enabled.
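The named-entity idea in point 2 could work roughly like this. A sketch under assumed data structures: the gazetteer, the Q-IDs and the function name are illustrative, not an existing extension:

```javascript
// Illustrative sketch: match known labels from a Wikidata-backed gazetteer in
// a text and record each occurrence as an annotation linked to a Q-ID.
// (Labels are assumed to be regex-safe in this sketch.)
function tagEntities(text, gazetteer) {
  var annotations = [];
  Object.keys(gazetteer).forEach(function (label) {
    var re = new RegExp('\\b' + label + '\\b', 'g');
    var match;
    while ((match = re.exec(text)) !== null) {
      annotations.push({ label: label, qid: gazetteer[label], offset: match.index });
    }
  });
  return annotations;
}
```

Such an annotation map could then be indexed for search, letting readers discover every text that mentions an entity even when it is never the "main subject".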

Discussion

See as well, for Collaborative Semantic Annotations: neonion.org + YouTube demo via Tweet @clmbirn --Jeb (talk) 19:03, 8 December 2020 (UTC)[reply]

Voting

Improve workflow for uploading books to Wikisource

Discussion

I was going to publish a similar wish. We had Book Uploader Bot but it is not working anymore. A grant was funded but I don't know if the tool exists. IA Upload is great but we need something for other websites (Google Books, Gallica, HathiTrust...). Currently it takes a few hours to download a book, convert it into DjVu and upload it to Commons. --Shev123 (talk) 20:22, 25 November 2020 (UTC)[reply]

Not that it fixes the issue, but the grant resulted in the toolforge:bub2 tool. Both in BUB2 and InternetArchive the Index page in Wikisource is not filled.--Snaevar (talk) 22:40, 30 November 2020 (UTC)[reply]
Yeah, a cascade of bots / tools is nice, but the maintenance is problematic. I think a universal tool / UX redesign with some onboarding of maintenance, with a dashboard to track progress, is going to be required in the long run. Slowking4 (talk) 02:02, 4 December 2020 (UTC)[reply]

I think it's also difficult to create the cover for a book, once you uploaded it. Avecus (talk) 23:59, 8 December 2020 (UTC)[reply]

See also Community Wishlist Survey 2020/Archive/Improve workflow for uploading academic papers to Wikisource, and Commons:Commons talk:CropTool#PDF quasi-extract. HLHJ (talk) 23:24, 19 December 2020 (UTC)[reply]

Voting

2022



New search tool using information from Index pages

Discussion

Voting

Search in books

Discussion

But transcluded pages do not contain the text of the book. In Iliad/Canto I, there is no "The wrath of Peleus' son etc.", only <pages index="Homer - Iliad, translation Pope, 1909.djvu", etc. And even if there were, using prefixes, keywords, etc. is not easy for the average user. On every page of Gallica (the website of the French National Library) or Internet Archive, when you consult a book in their portal, there are two search bars: the first one for searching the whole website, the second one (commonly integrated in the book viewer) for searching within the book. — ElioPrrl (talk) 12:31, 17 January 2022 (UTC)[reply]
Your use case above does not adequately explain what you're trying to explain to me then, because I still don't get why searching in wikitext doesn't get you what you want. Izno (talk) 19:26, 17 January 2022 (UTC)[reply]
@Izno:: Are you used to Wikisource? In Wikisource, the text of a page is rarely contained in the wikitext, but is transcluded from another page. Example: this transclusion only contains <pages index="Homer - Iliad, translation Pope, 1909.djvu" from=35 to=51 />, a command that transcludes the text contained in this page and the following ones; and note that the reader normally never accesses the Page: namespace. Therefore, even when restricting results to subpages of a given page, using insource: to search transclusions of a book would not give any result: give it a try! In fact, the results are better without this parameter.
Anyway, this is a very complicated way, only known by contributors, not by the average user. In the same way, it is possible, on IA or the BnF, to use the advanced search to find words in a given book; but users of IA and the BnF do not search books this way, but by using the search bar integrated in the book viewer. I ask to emulate this behaviour in Wikisource: a search bar integrated into transclusions. I advise you to consult SWilson's first solution below to understand what I mean. — ElioPrrl (talk) 09:55, 18 January 2022 (UTC)[reply]
  • It's actually sort of already possible to put a search box in the 'indicators' area (where the export button is). Here's a hacky version: https://en.wikisource.beta.wmflabs.org/wiki/Test_Book — although, it's only possible to set a prefix to search, rather than following all the rules that WS Export uses (e.g. finding subpages referenced within ws-summary sections). I'm not sure of the best way around that, because it'd be really slow to determine what the extent of the work is (and even then it's still not foolproof), but we could think about adding a 'search' button next to export which would display a search dialog with some more info and the search form. —SWilson (WMF) (talk) 06:08, 17 January 2022 (UTC)[reply]
Thanks! You have completely understood my thought. No matter if the first solution is not totally reliable: it's easier to urge developers to improve an existing tool than to create it from scratch. I only thought that, in order to save developers' time, the lines of code used in WsExport to select the pages to be exported could be reused for this new purpose, but maybe it's not feasible. — ElioPrrl (talk) 12:31, 17 January 2022 (UTC)[reply]
The prefix parameter would be basepagename, not fullpagename. The intent is to search pages with the same base page. Other than that, excellent.--Snævar (talk) 07:10, 20 January 2022 (UTC)[reply]
@SWilson (WMF) and Snævar: I can also note that sometimes books have transclusions whose titles do not begin with the title of the base page. In particular, on the French Wikisource, sometimes the complete works of an author are transcluded under their individual titles; for example, in the book whose table of contents is found in Complete Works of Molière, the different plays are transcluded at the pages Tartufe, Don Juan, etc., not Complete Works of Molière/Tartufe, Complete Works of Molière/Don Juan (this is something I criticise on the French Wikisource, but the vast majority of contributors still create pages this way because of this issue: they are afraid that under the long title the readers could not find the page, and they don't want to waste their time creating redirects to prevent this). — ElioPrrl (talk) 12:02, 4 February 2022 (UTC)[reply]
  • @ElioPrrl, maybe I did not understand every aspect of your question, but look at a random ns0 page on it.wikisource, e.g. this Iliad: under the header you can see a search box, available also to the layman, for a full search of a word in the entire work (here is the result for Aiace). Is this what you asked for? - εΔω 08:18, 4 February 2022 (UTC)
@OrbiliusMagister: It is indeed what I am asking for. It seems that it is added automatically by your template Intestazione, isn't it? It would be nice if it could be added automatically on every ns0 page, even those without any template. — ElioPrrl (talk) 12:04, 4 February 2022 (UTC)[reply]
@ElioPrrl: helping other users is my pleasure. On it.source the search box is actually an embedded Template:Ricerca, but you may just help yourself with it in many projects. - εΔω 20:13, 4 February 2022 (UTC)

Voting

Translation of texts published in another language

Discussion

  • @JLVwiki: Thanks for your proposal! I've machine-translated it into English and moved your original text to the Spanish subpage, in preparation for it being translated to other languages. SWilson (WMF) (talk) 03:45, 21 January 2022 (UTC)[reply]
  • Note that not all Wikisource projects permit user-made translations. It is a source of edit wars in communities where they are not permitted, and a source of potentially poor-quality content. So I recommend checking whether there is wide consensus across Wikisource projects for such an extension. --Ruthven (msg) 10:59, 25 January 2022 (UTC)[reply]

Voting

Stick the toolbar in the page namespace

Discussion

  • This seems like it would already be fulfilled by Vector 2022 (skin with sticky header) and the 2017 wikitext editor. Although I guess the new wikitext editor doesn't support ProofreadPage at all... I would prefer to invest in the new wikitext editor however. —TheDJ (talkcontribs) 16:02, 7 February 2022 (UTC)[reply]

Voting

Export of modernised texts

Discussion

  • I was going to say that this is achievable via template styles, but then realised that the crux of it is that it needs to be user-selectable at the time of export. That's right isn't it? It seems like it might be doable by adding a system of wikis being able to define multiple stylesheets in the way that they can currently define ebook.css, and then showing those as options in the export form. SWilson (WMF) (talk) 05:16, 12 January 2022 (UTC)[reply]
It must be user-selectable, indeed: I thought of a new option in the export form. I don't know if it can be managed by multiple stylesheets, because it's not a question of formatting: when clicking the "Modernise" button, some text is found and replaced by other text (e.g. avoient, old spelling, by avaient, modern spelling). But I'm not a technical man; maybe I'm misunderstanding what you're saying. — ElioPrrl (talk) 17:19, 13 January 2022 (UTC)[reply]
  • There is a system in MediaWiki that converts text, Language converter. It is capable of converting text without creating a new page. It might be better suited than a module.--Snævar (talk) 18:44, 12 January 2022 (UTC)[reply]
    I have just tested WSexport with Chinese Wikipedia, and WSexport does not support Language Converter either. (In fact there is no such option.) C933103 (talk) 22:14, 15 January 2022 (UTC)[reply]
    Good point. LanguageConverter might be suitable for doing this for language variants (and actually we should probably look at implementing that anyway in WS Export) but I'm not sure it'd work for things like translating long s to normal s (that's not a language variant but a typographical archaism). That's done in some Wikisources by having a template output HTML for both variants and then hiding one or the other via JavaScript: <span class="long-s">ſ</span><span class="normal-s">s</span>, which is why I wonder if it could be done by making multiple stylesheets available in WS Export. This whole topic could definitely take some more investigation though! SWilson (WMF) (talk) 01:40, 17 January 2022 (UTC)[reply]
    Maybe I did not choose my example judiciously. The typical action of modernisation modules is not replacing typographical variants (like s/ſ), but replacing words with other words (in French, avoient/avaient or tems/temps; in English, shew/show or reflexion/reflection). — ElioPrrl (talk) 12:45, 17 January 2022 (UTC)[reply]
    Unfortunately, doing this with CSS has two disadvantages: (1) you replace a 2-byte character with 56 bytes of HTML, so you can much more easily hit the transclusion limit (note, this would be transcluded twice: once from a template and a second time via ProofreadPage page transclusion), and (2) the words containing this code will likely be non-searchable due to the HTML markup inside the words. I personally would also appreciate it if this could be done without creating extra pages or an extra namespace, like it is done here+here. Ankry (talk) 20:30, 17 January 2022 (UTC)[reply]
    These are wise remarks. And indeed, on French WS, modernisation does not create extra pages, and we do not want this behaviour to change. ElioPrrl (talk) 10:04, 18 January 2022 (UTC)[reply]
    In Polish WS we tended to create separate pages only because of current ws-export limitations. However, we would happily withdraw from this. Ankry (talk) 11:36, 18 January 2022 (UTC)[reply]
  • If you want only the text replacement, maybe the one easy way is to replace the words in the frontend of the page via JavaScript. ✍️ Dušan Kreheľ (talk) 16:36, 26 January 2022 (UTC)[reply]
    I agree, that is one way to do this (and I think it's how it's already done on some Wikisources). But it has shortcomings for other things, such as exporting to other formats via WS Export (or any other tool that uses the rendered HTML). I'm sure we'll figure something out though! :-) SWilson (WMF) (talk) 00:57, 27 January 2022 (UTC)[reply]
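The client-side replacement idea discussed above can be sketched as a small mapping function. This is only an illustration of the approach, not an existing Wikisource gadget; the character table, function name, and the `.prp-pages-output` hook in the comment are all assumptions.

```javascript
// Illustrative table of typographical archaisms and their modern
// equivalents. Which characters (if any) to modernise is a per-wiki
// editorial decision, not something this sketch prescribes.
const ARCHAISM_MAP = {
  'ſ': 's',  // long s
  'ꝛ': 'r',  // r rotunda
};

// Replace every mapped character; anything not in the table passes through.
function modernise(text) {
  return Array.from(text)
    .map((ch) => ARCHAISM_MAP[ch] ?? ch)
    .join('');
}

// In a gadget, this might be applied to the rendered page, e.g.:
// document.querySelectorAll('.prp-pages-output').forEach((el) => {
//   el.textContent = modernise(el.textContent);
// });
```

As Ankry notes above, doing this client-side avoids the per-word HTML bloat of the template approach, but it would not help exports or tools that consume the raw rendered HTML.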

Voting

Index creation wizard

Discussion

Thanks for creating this proposal! I went ahead and added a more robust description of the problem, based on the previous wishes you linked to -- feel free to edit if you feel the problem statement seems outdated or needs more details! The more detailed the description of the problem, the better, hence my edit. Thanks so much! NRodriguez (WMF) (talk) 17:36, 17 January 2022 (UTC)[reply]

Voting

Complete the development of the integrated feature to display other language editions in toolbar from wikidata

Discussion

Voting

Bibliographic Structured Data on Wikisource

Discussion

Hi. This is the renewed and partly enriched proposal from the Wishlist Survey 2021. We got some strong support last year. I hope some planning and development will take place step by step. --Jeb (talk) 09:36, 22 January 2022 (UTC)[reply]
WE-Framework is helpful for adding metadata, but Ankry and others have said it's a dangerous tool. Matlin (talk) 11:34, 22 January 2022 (UTC)[reply]

Voting

Ability to perform batch tasks on all (or selected) pages in a work

Discussion

  • You may also be interested in AutoWikiBrowser. It can find and replace texts in several pages, add code to the beginning or end of several pages, add/remove/substitute categories on several pages, etc. — ElioPrrl (talk) 20:24, 15 January 2022 (UTC)[reply]
  • @Pigsonthewing: Thank you for your proposal. Would you mind being a bit more specific about what kind of changes you are referring to? Thanks DMaza (WMF) (talk) 16:01, 19 January 2022 (UTC)[reply]
    • I'm thinking initially of complex find & replace actions (as I said, like those in Commons 'perform batch task'). For example, I recently had a work with something like {{rh||LOREM IPSUM DOLOR SIT AMET|}} on half the pages and {{rh||{{uc|Lorem Ipsum Dolor Sit Amet}}|}} on the other half, and I wanted to standardise them to the latter. In another example, suppose one transcriber had used {{ls}} in half the pages, and another transcriber had just used "s". We might want to replace all the templates with plain text. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 19 January 2022 (UTC)[reply]
  • Just a side note, this can be done easily with WP:JWB and some basic knowledge of RegExp. AFAIK, JWB works on all public Wikimedia wikis as long as the pages are technically editable and you have AWB access. NguoiDungKhongDinhDanh 22:00, 30 January 2022 (UTC)[reply]
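The running-header standardisation described in the discussion above is exactly the kind of rule one would feed to AWB/JWB as a regex. Here is a sketch of such a transform for the {{rh}} example given; the pattern and function names are illustrative assumptions, not a shipped tool.

```javascript
// Lowercase the text, then uppercase the first letter of each word.
function toTitleCase(s) {
  return s.toLowerCase().replace(/\b\p{L}/gu, (c) => c.toUpperCase());
}

// Turn {{rh||ALL CAPS TITLE|}} into {{rh||{{uc|Title Case}}|}}, leaving
// headers that already use {{uc|...}} untouched (the pattern only matches
// a run of plain capital letters and spaces between the pipes).
function standardiseRh(wikitext) {
  return wikitext.replace(
    /\{\{rh\|\|([A-Z][A-Z ]*[A-Z])\|\}\}/g,
    (m, caps) => `{{rh||{{uc|${toTitleCase(caps)}}}|}}`
  );
}
```

A batch tool built into ProofreadPage could apply a rule like this across all Page: pages of an index, which is the convenience the proposal asks for over running JWB manually.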

Voting

Fix search and replace in the Page namespace editor

Discussion

I strongly support this proposal. Fixing known bugs should always be prioritized. --Alex brollo (talk) 14:11, 18 January 2022 (UTC)[reply]

Well, in fact fixing known bugs should be a routine, not a subject of annual pleading. --Jan Kameníček (talk) 00:33, 29 January 2022 (UTC)[reply]
Totally agree... — ElioPrrl (talk) 11:06, 29 January 2022 (UTC)[reply]

According to T183950, the problem here is that PRP's Page: content model breaks assumptions that other parts of MediaWiki make about the data of a wikipage. Specifically, Page: wikipages consist of three distinct sections: the header, the main content area, and the footer, while other parts of the stack fundamentally assume a wikipage is one complete unit. To handle this, PRP overrides (among other things) the text selection methods from the jquery.textSelection plugin, but so far it has only implemented the getSelection method because that was needed to make VisualEditor work (I am unclear on what specifically this need was; VE has at least some specific knowledge of PRP for other reasons, so it's entirely possible the getSelection override isn't even necessary any more). What it does is concatenate the header, body, and footer so that it can provide VE a single text unit to operate on. But when the 2010 WikiEditor's search and replace function runs, it gets the same concatenated text, meaning that when it finds matching substrings within the text it gets a range that is offset relative to the first character of the header rather than the first character of the body. PRP does not yet override setSelection, so when the 2010 WikiEditor selects the matched text it uses this offset and ends up selecting a range that is off by the number of characters in the header (which includes <noinclude> tags etc., so even an apparently empty header will throw it off). The same holds true when it tries to replace the found text.

As a working assumption, the fix is simply to implement overrides for the other jquery.textSelection methods in PRP, in the vicinity of /mediawiki/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js L385. The overrides should presumably just need to track the offset caused by the header and adjust the value when calling through to the original method. If we want search and replace to work in the header and footer fields we'd need to be a little fancier, keeping track of which ranges correspond to which text field and mapping to the correct offset depending on which field we're in. To be hyper-hyper fancy we'd need to add UI to the 2010 WikiEditor to allow toggling searching in the header/footer on and off; but I don't think there is any real need for this. Just getting search and replace working in the body will be a massive improvement.

This isn't really a bug per se. Multiple parts of the tech stack have changed over the years, leading to missing functionality (the setSelection override and friends) that didn't use to be a problem now showing up as apparent bugs in other components. In other words, AIUI this should be firmly within the CommTech CW scope and ought to be a nicely manageable task. tpt is also familiar with the existing code there (judging by T183950 and git blame) and has historically been very generous with their limited time in answering questions about such things. I also believe SWilson and Matmarex have touched this code for various reasons and may be able to assess whether my understanding expressed above is at least approximately correct. --Xover (talk) 11:24, 5 February 2022 (UTC)[reply]
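The off-by-header-length behaviour described in this analysis can be illustrated with a minimal range-mapping helper. This is a sketch of the idea only, not the actual ProofreadPage code; the function name is hypothetical.

```javascript
// PRP's getSelection reports ranges against the concatenation
// header + body + footer, while the body textarea expects ranges
// relative to its own first character. A setSelection-style override
// would need to subtract the header length before selecting.
function concatenatedToBodyRange(headerText, range) {
  const offset = headerText.length;
  return {
    start: Math.max(0, range.start - offset),
    end: Math.max(0, range.end - offset),
  };
}

// Example: with a 24-character header such as '<noinclude>…</noinclude>',
// a match reported at 30–35 in the concatenated text corresponds to
// characters 6–11 of the body textarea.
```

Handling search and replace in the header and footer fields themselves would need the fancier bookkeeping described above, mapping each range to whichever field it falls in.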

On French Wikisource, there is another button (RegEx, in the Aide à la relecture ("Proofread tools") tab) that performs search and replace without this bug. Maybe it would suffice to implement it on every Wikisource. — ElioPrrl (talk) 19:21, 12 February 2022 (UTC)[reply]

Voting

Allow side by side display of different version of same text

Discussion

Voting

2023



IIIF support

Discussion

  • I wonder what it'd take to implement this in Thumbor? It was requested once, in 2016. Sam Wilson 07:33, 3 February 2023 (UTC)[reply]
  • It would be very nice if we can get support for at least the base API, to provide tiling etc. Even for the "Support 360° photo viewing" proposal, tiling is a requirement we probably cannot go without. Thumbor with an additional /iiif/ endpoint providing support for the Image API would in my opinion be very interesting. Implementing region/size/rotation/quality/format etc. should all be relatively doable with existing Thumbor and ImageMagick functionality (even though we have not implemented any of those Thumbor APIs for our own plugins yet). I think it would bring lots of benefits to the movement at large and to Commons, GLAM and Wikisource in particular. —TheDJ (talkcontribs) 14:18, 3 February 2023 (UTC)[reply]
    For Wikisource specifically, the Presentation API might also be interesting, as it allows you to describe a book, for instance. There is a nice API example in that specification. —TheDJ (talkcontribs) 14:25, 3 February 2023 (UTC)[reply]
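The region/size/rotation/quality/format segments mentioned above follow the IIIF Image API's fixed URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. Here is a sketch of building such a URL; the `/iiif/` endpoint is the hypothetical Thumbor addition suggested in the comment, not something that exists today.

```javascript
// Build an IIIF Image API request URL from its standard segments.
// Defaults request the full, unrotated image at maximum size.
function iiifImageUrl(base, identifier, opts = {}) {
  const {
    region = 'full',      // 'full', 'square', 'x,y,w,h', or 'pct:x,y,w,h'
    size = 'max',         // 'max', 'w,', ',h', 'pct:n', or 'w,h'
    rotation = '0',       // degrees; '!n' mirrors then rotates
    quality = 'default',  // 'default', 'color', 'gray', 'bitonal'
    format = 'jpg',
  } = opts;
  return `${base}/${encodeURIComponent(identifier)}/${region}/${size}/${rotation}/${quality}.${format}`;
}
```

Because each segment maps onto a single crop/resize/rotate/encode operation, an endpoint like this could plausibly translate requests onto Thumbor's existing operations, as TheDJ suggests.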

Voting

Allow Thumbor to generate images at higher DPI

Discussion

Voting

Automatically detect blank pages

Discussion

Magnoliasouth (talk) 23:20, 10 February 2023 (UTC)[reply]
  • @Sohom Datta: What do you mean by auto-mark? At what level - index NS or Page NS? An automatic status change to "Without text"? What about scans containing, for example, illustrations, scores, notes, etc., which we mark "Without text" but to which we also add other required data? Zdzislaw (talk) 14:53, 12 February 2023 (UTC)[reply]
    @Zdzislaw By auto-mark, I mean changing the status to "without text" for a specific page.
    The way I picture this working is via a prompt in the Pagelist Widget/Index edit screen (at the Index: level) or at the top of a Page: page (at the Page: namespace level) that you can click to auto-mark a page/a set of pages as "Without text". For now, I think it makes sense to include only pages that are blank or are wholly composed of illustrations; however, we can look into encompassing other types of "without text" pages later as well. Sohom Datta (talk) 16:16, 13 February 2023 (UTC)[reply]
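One possible heuristic for the detection itself, sketched under the assumption that both OCR text and grayscale pixel data for the scan are available. The function name and threshold are illustrative guesses, not a proposed implementation.

```javascript
// Treat a page as a candidate for "Without text" when its OCR output is
// effectively empty AND the scan is nearly uniform (low pixel variance).
// The variance check keeps illustration-only pages out of this bucket;
// per the discussion above, those would need separate handling because
// they are "without text" but carry other required data.
function looksBlank(ocrText, grayPixels, varianceThreshold = 50) {
  const textEmpty = ocrText.replace(/\s+/g, '').length === 0;
  const mean = grayPixels.reduce((a, b) => a + b, 0) / grayPixels.length;
  const variance =
    grayPixels.reduce((a, b) => a + (b - mean) ** 2, 0) / grayPixels.length;
  return textEmpty && variance < varianceThreshold;
}
```

Pages flagged this way would still go through the confirmation prompt described above rather than being marked silently.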

Voting

Implement Extension:ProofreadPage functionality for Parsoid (aka native support of tag:pages in Parsoid)

Discussion

Voting

Integrate PageImages with Wikisource

Discussion

Voting

Remember OCR column/region profiles

Discussion

  • @Sohom Datta: Just to clarify: the 'profile' here would be e.g. an index-page-level store of rectangular page regions that could then be used on all pages to do the OCR, without users having to re-select the same regions over and over again on each page (as they can currently do, and what's more they do it via the Advanced options form and then have to copy and paste the text). Is that right? SWilson (WMF) (talk) 00:50, 1 February 2023 (UTC)[reply]
    Yep, that is what I meant by profiles :) Sohom Datta (talk) 05:35, 1 February 2023 (UTC)[reply]
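The index-level profile idea, as clarified in the exchange above, could be sketched as crop regions stored once as fractions of the page size and then resolved to pixel rectangles for each page's actual scan dimensions. All names and values here are hypothetical illustrations.

```javascript
// A stored profile: two columns, expressed as fractions of page width
// and height so the same profile works for scans of differing resolution.
const twoColumnProfile = [
  { x: 0.05, y: 0.08, w: 0.43, h: 0.84 }, // left column
  { x: 0.52, y: 0.08, w: 0.43, h: 0.84 }, // right column
];

// Resolve fractional regions to pixel rectangles for one page's scan,
// ready to hand to the OCR tool as crop regions.
function resolveRegions(profile, pageWidth, pageHeight) {
  return profile.map((r) => ({
    x: Math.round(r.x * pageWidth),
    y: Math.round(r.y * pageHeight),
    w: Math.round(r.w * pageWidth),
    h: Math.round(r.h * pageHeight),
  }));
}
```

Storing fractions rather than absolute pixels is one plausible design choice here, since page images in a single index can vary slightly in size.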

Voting