Community Wishlist Survey 2020/Wikisource

Wikisource
28 proposals, 224 contributors, 781 support votes
The survey has closed. Thanks for your participation :)



UI improvements on Wikisource

  • Problem: A large part of the work on Wikisource is proofreading OCR text. The 2010 wikitext editor (Wikitexteditor2010) has some useful functions, but they are spread across several tabs:
    • Advanced: has a very useful search-and-replace button
    • Special characters: has many characters that are not on the keyboard
    • Proofread tools (Page namespace only): has some more tools
    When I work on a longer OCR text, there are typical errors that can be fixed by search and replace (e.g. " -> “ or ii -> n), so I must use the first tab. Then a character from another language is missing, so I must switch to the second tab and find it. Then I find the next typical error, so I must switch back to the first tab, and so on.
  • Who would benefit: Wikisource editors, but useful for other projects too.
  • Proposed solution: Proofreading is probably done mainly on desktops and notebooks, whose monitors are wide enough to show all these tools on one tab, with no need to switch back and forth.
  • More comments:
  • Phabricator tickets:
  • Proposer: JAn Dudík (talk) 20:59, 22 October 2019 (UTC)
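
The search-and-replace cleanup described in the problem statement can be sketched outside the editor as a small Python script. This is only an illustration: the two kinds of rule come from the examples in the proposal, while the function name `fix_ocr` and the exact patterns are assumptions, not part of any existing Wikisource tool. A blanket rule like ii -> n will also hit legitimate words, so a real rule set would have to be chosen per work.

```python
import re

# Illustrative rules only, based on the examples in the proposal.
# A real rule set depends on the work being proofread.
RULES = [
    (re.compile(r'"(?=\w)'), "“"),          # straight quote before a word -> opening quote
    (re.compile(r'(?<=[\w.,!?])"'), "”"),   # straight quote after a word or punctuation -> closing quote
    (re.compile(r"ii"), "n"),               # OCR sometimes reads "n" as "ii"
]


def fix_ocr(text: str) -> str:
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(fix_ocr('"Hello," he said to maiiy people.'))
```

Keeping the rules in one ordered list makes it easy to review them before running them over a whole page.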

Discussion

Hi, did you know that you can customize the edit toolbar to your liking? See https://www.mediawiki.org/wiki/Manual:Custom_edit_buttons. Also, I use a search-and-replace plugin directly in the browser, as this works better for me. See e.g. https://chrome.google.com/webstore/detail/find-replace-for-text-edi/jajhdmnpiocpbpnlpejbgmpijgmoknnl and https://addons.mozilla.org/en-US/firefox/addon/find-replace-for-text-editing/?src=search I use the Chrome one and it works alright for simple stuff. For more advanced stuff I copy the text to Notepad++/notepadqq/LibreOffice Writer and do the regex stuff there.--So9q (talk) 11:26, 25 October 2019 (UTC)

Very late to the party, but nothing seems to have changed. There is a need to customize the edit bar per book and not per user. Every book has special requirements for proofreading that are static. A book on Cicero’s letters may need quick access to Greek polytonic letters, but a book on Mediaeval poetry will need thorn/wynn/yogh and possibly long s. A book of poetry will need immediate access to the poem tag, but a novel will not. Therefore, we need a scripting language to be able to set the edit bar according to the needs of the work. This is how typesetting used to work and it did an outstanding job for centuries. Languageseeker (talk) 19:04, 10 March 2021 (UTC)

Voting

Repair Index finder

  • Problem: It's rather similar to the first proposal on this page; that is, for at least a month, the Index finding thingy is broken; whatever title you put in it, it says something along the lines of "The index finder is broken. Sorry for the inconvenience." (This is just from memory!) It also gives a list of indexes, from the largest to the smallest. The compromise I at any rate am using now is the index-finder installed in the search engine.
  • Who would benefit: Everybody who wants to find an index.
  • Proposed solution: Somebody who has a good knowledge about bugs? I'm not good at wikicode!
  • More comments: Excuses for any vague terminology - I am writing via mobile.
  • Phabricator tickets: task T232710
  • Proposer: Orlando the Cat (talk) 07:00, 5 November 2019 (UTC)

Discussion

Voting

Enable book2scroll that works for all Wikisources

  • Problem: book2scroll is not enabled for all Wikisources and does not work on any non-Latin Wikisource. It is very useful for marking up page numbering in Index: pages.
  • Who would benefit: The whole Wikisource community.
  • Proposed solution: The problem is that the code is very old (as in Toolserver-old) and only works with some site naming schemes. It also fails for many titles in other languages.
  • More comments: Same as in the previous year's list.
  • Phabricator tickets: phab:T205549
  • Proposer: Jayantanth (talk) 15:58, 26 October 2019 (UTC)

Discussion

Voting

Migrate Wikisource specific edit tools from gadgets to Wikisource extension

  • Problem: There are many useful edit tool gadgets on some Wikisources. Many of these should be usable everywhere, but...
    • Not every user knows that scripts can be imported from another wiki.
    • Some of these scripts cannot simply be imported; they must be translated or localised.
    • The majority of users will look for these tools on en.wikisource, but there are many scripts elsewhere too, e.g. on it.wikisource.
  • Who would benefit: Editors on other Wikisources
  • Proposed solution: Select the best tools across wikisources and integrate them as new functions.
  • More comments:
  • Phabricator tickets:
  • Proposer: JAn Dudík (talk) 13:24, 5 November 2019 (UTC)

Discussion

It would be good to point to these gadgets or describe the proposed process to choose and approve propositions of gadgets to integrate. --Wargo (talk) 21:35, 17 November 2019 (UTC)
1) Ask communities for the best tools on their Wikisource
2) Make a list of them, with comments, and merge potential duplicates
3) Ask communities again which ones should be integrated.
4) Make a global version and integrate it (e.g. as a beta feature)
There is one problem: single-wiki gadgets are often hidden from others due to the language barrier etc. JAn Dudík (talk) 21:31, 18 November 2019 (UTC)

Voting

Batch move API

  • Problem: On Wikisource, the "atomic unit" is a work, consisting of a scanned book in the File: namespace, a set of transcribed pages in the Page: namespace, an index in the Index: namespace, and hopefully also one or more pages in mainspace that transclude the pages for presentation. This is unlike something like a Wikipedia, where the atomic unit is the (single) page in mainspace, period.
    ProofreadPage ties these together using the pagename: an Index: page looks for its own pagename (i.e. without namespace prefix) in the File: namespace, and creates virtual pages at Page:filenameoftheuploadedfile.PDF/1 (and …/2 etc.). If any one of these is renamed, the whole thing breaks down.
    A work can easily have 1000+ pages: if it needs to be renamed, all 1000 pages have to be renamed. This is obviously not something you would ever undertake manually. But API:Move just supports moving a single page, leading to the need for complicated hacks like w:User:Plastikspork/massmove.js.
    The net result is that nothing ever gets renamed on Wikisource, and when it's done it's only done by those running a personal admin-bot (so of the already very few admins available, only the subset that run their own admin-bots can do this, and that's before taking availability into account).
  • Who would benefit: All projects, but primarily the Wikisources; it would be used (via scripts) by +sysop, but it would benefit all users who can easily have consistent page names for, say, a multi-volume work or whatever else necessitates renaming.
  • Proposed solution: It would vastly simplify this if API:Move supported batch moves of related pages, at worst by an indexed list of from/to titles; better with from/to provided by a generator function; and ideally by intelligently moving by some form of pattern. For example, Index:vitalrecordsofbr021916brid.djvu would probably move to Index:Vital records of Bridgewater, Massachusetts - Vol. 2.djvu, and Page:-namespace pages from Page:vitalrecordsofbr021916brid.djvu/1 would probably move to Page:Vital records of Bridgewater, Massachusetts - Vol. 2.djvu/1
    It would also be of tremendous help if mw.api actually understood ProofreadPage and offered a convenience function that treated the whole work as a unit (Index:filename, Page:filename/pagenum, and, if local, File:filename) for purposes of renaming (moving) them.
  • More comments: For the purposes of this proposal, I consider cross-wiki moves out of scope, so, e.g., renaming a File: at Commons as part of the process of renaming the Index:/Page: pages on English Wikisource would be a separate problem (too complicated). Ditto fixing any local mainspace transclusions that refer to the old name (that's a manageable manual or semi-automated/user-tools job).
  • Phabricator tickets:
  • Proposer: Xover (talk) 12:41, 5 November 2019 (UTC)
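
As a rough illustration of the "indexed list of from/to titles" idea above, the following Python sketch builds the list of title pairs for one work. It makes no API calls; the function name `plan_work_move` is hypothetical, and the only assumption is the Index:name and Page:name/number naming scheme described in the problem statement. How such a list would actually be submitted is exactly what this proposal asks for.

```python
def plan_work_move(old_name: str, new_name: str, page_count: int) -> list[tuple[str, str]]:
    """Return (from, to) title pairs for an Index: page and its Page: subpages.

    Hypothetical helper: it only computes titles and does not move anything.
    """
    moves = [(f"Index:{old_name}", f"Index:{new_name}")]
    for number in range(1, page_count + 1):
        moves.append((f"Page:{old_name}/{number}", f"Page:{new_name}/{number}"))
    return moves


if __name__ == "__main__":
    pairs = plan_work_move(
        "vitalrecordsofbr021916brid.djvu",
        "Vital records of Bridgewater, Massachusetts - Vol. 2.djvu",
        3,
    )
    for source, target in pairs:
        print(f"{source} -> {target}")
```

A work with 1000 pages yields 1001 pairs, which is why moving them one request at a time is impractical.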

Discussion

@Xover: Why is the sysop bit needed here? I think the bot flag is enough unless the pages are fully protected. Ankry (talk) 20:45, 9 November 2019 (UTC)
@Ankry: Because page-move vandalism rises to a whole `nother level when you can do it in batches of 1k pages at a time. And for the volume we're talking about, having to go through a request and waiting for an admin to handle it is not a big deal: single page moves happen all the time, but batch moves of entire works would typically top out at a couple per week tops (ignore a decade's worth of backlog for now). Given these factors, requiring +sysop (or, if you want to be fancy, some other bit that can be assigned to a given user group like "mass movers" or whatever) seems like a reasonable tradeoff. You really don't want inexperienced users doing this willy nilly!
But so long as I get an API that lets me do this in a sane way (and w:User:Plastikspork/massmove.js is pretty insane), I'd be perfectly happy imposing limitations like that in the user script or gadget implementing it (unless full "Move work" functionality is implemented directly in core, of course). Different projects will certainly have different views on that issue. --Xover (talk) 21:28, 9 November 2019 (UTC)
  • «Problem: On Wikisource, the "atomic unit" is a work». In an ideal world yes, but not for MediaWiki until phabricator:T17071 is fixed. Nemo 09:07, 22 November 2019 (UTC)

Voting

Activate templatestyles by Index page css field

  • Problem: The TemplateStyles extension is almost magic in the Wikisource environment, but there is a need for an easy way to activate it on all pages of an Index.
  • Who would benefit: all contributors
  • Proposed solution: Optionally allow the Index page css field to be filled with a valid TemplateStyles page. A simple regex could be used to tell whether the css field content is valid CSS or a valid page name.
  • More comments: Presently it.wikisource and nap.wikisource are testing other tricks to load work-specific templatestyles into all pages of an Index, with very interesting results.
  • Phabricator tickets: phab:T226275, phab:T215165
  • Proposer: Alex brollo (talk) 07:24, 9 November 2019 (UTC)
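
The "simple regex" check mentioned in the proposed solution could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the function name `classify_css_field`, the rule that a page name is a single line ending in `.css`, and the rule that inline CSS contains at least one `selector { ... }` block. It is a heuristic, not a CSS validator, and not how ProofreadPage actually treats the field.

```python
import re

# At least one "selector { declarations }" block somewhere in the value.
CSS_RULE = re.compile(r"[^{}]+\{[^{}]*\}")
# A single-line title ending in ".css" with no characters that are illegal in titles.
PAGE_NAME = re.compile(r"^[^{}<>\[\]|#\n]+\.css$")


def classify_css_field(value: str) -> str:
    """Guess whether the css field holds a page name, inline CSS, or neither."""
    value = value.strip()
    if not value:
        return "empty"
    if PAGE_NAME.match(value):
        return "page"
    if CSS_RULE.search(value):
        return "css"
    return "unknown"


if __name__ == "__main__":
    print(classify_css_field("Index:Example.djvu/styles.css"))
    print(classify_css_field(".poem { margin-left: 2em; }"))
```

Checking for a page name first keeps existing inline CSS working, since real CSS always contains braces and so never matches the page-name pattern.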

Discussion

  • Reproducing original books is inherently layout and formatting heavy, presenting books to readers is inherently layout and formatting heavy. Inline formatting templates are carrying a lot of weight right now, with somewhat severe limitations and very little semantics. Getting a good system for playing with the power of CSS would help a lot. --Xover (talk) 11:08, 9 November 2019 (UTC)

Voting

  •   Support Liuxinyu970226 (talk) 15:24, 25 November 2019 (UTC)
  •   Support Nice tools Marknamz8931 (talk) 15:39, 25 November 2019 (UTC)
  •   Support 16:36, 25 November 2019 (UTC)
  •   Support Ciao • Bestoernesto 17:43, 25 November 2019 (UTC)
  •   Support Xover (talk) 12:35, 27 November 2019 (UTC)
  •   Support Paperoastro (talk) 11:34, 1 December 2019 (UTC)
  •   Support (NB: some patches for this exist on nap.source) Ruthven (msg) 12:37, 2 December 2019 (UTC)
  •   Support Novak Watchmen (talk) 17:52, 2 December 2019 (UTC)

Make content of Special:IndexPages up-to-date and available to wikicode

  • Problem: 1. The content of Special:IndexPages (e.g. s:pl:Special:IndexPages) is not updated after the status of some pages in an index changes until the corresponding index page is purged. 2. The data from this page is not available to wikicode. Making it available would allow users to create various statistics, sortable lists or graphical tools showing the status of index pages. On plwikisource, we make this data available to wikicode via a bot that regularly updates specific templates; these extra edits could then be avoided.
  • Who would benefit: All Wikisources, mainly those with a large number of indexes
  • Proposed solution: Make the per-index numbers of pages with various statuses from Special:IndexPages available via a mechanism like a magic function, a Lua function or something similar.
  • More comments:
  • Phabricator tickets:
  • Proposer: Ankry (talk) 19:12, 9 November 2019 (UTC)

Discussion

Voting

Transcluded book viewer with book pagination

 
  • Problem: When we view a transcluded (NS0) book, we get the normal wiki-style page view. Most book readers and book lovers do not like this kind of view and navigation. They would like a book-like view, page by page or two pages side by side like a physical book, instead of having to go to the next subpage every time. Italian Wikisource created a JavaScript tool for this: Vis, View In Sequence (a two-sided view of the pages).
  • Who would benefit: Wikisource editors and readers
  • Proposed solution: Create a Vis-like default viewer: View In Sequence (a two-sided view of the pages).
  • More comments:
  • Phabricator tickets:
  • Proposer: Jayantanth (talk) 15:43, 11 November 2019 (UTC)

Discussion

Voting

Repair Book Uploader Bot

  • Problem: Book Uploader Bot was a valuable tool for uploading books from Google Books to Commons for Wikisource. It has not been working for a long time, and uploading a book from Google Books now takes a long time (you need to download the book as a PDF, run OCR, convert it to DjVu, upload it to Commons and then fill in the information). For IA, we have IA Upload. It works, but it also has some issues from time to time.
  • Who would benefit: Contributors of Wikisources
  • Proposed solution: Repair the tool or build a new one
  • More comments:
  • Phabricator tickets:
  • Proposer: Shev123 (talk) 14:58, 10 November 2019 (UTC)

Discussion

Voting

Inter-language link support via Wikidata

  • Problem: Wikidata's inter-language link system does not work well for Wikisource, because it assumes that pages are structured the same way as Wikipedia pages are structured, and this is not the case.
  • Who would benefit: Editors and readers of all Wikisources, and editors and readers of Wikidata
  • Proposed solution:
    1. Support linking from Wikidata to Multilingual Wikisource
    2. Support automatic interlanguage links between multiple editions that are linked to different items on Wikidata, where these items are linked by "has edition" and "edition or translation of"
  • More comments: This was also proposed last year
  • Phabricator tickets: phab:T138332, phab:T128173, phab:T180304, phab:T54971
  • Proposer: —Beleg Tâl (talk) 15:47, 23 October 2019 (UTC)
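
Point 2 of the proposed solution amounts to a short walk over the edition/work relations. The Python sketch below shows the idea on in-memory dictionaries only; the item IDs, the sample data and the function name `sibling_editions` are made up for illustration, and nothing here queries Wikidata or reflects how mw:Extension:Wikisource implements it.

```python
def sibling_editions(item, edition_of, has_edition, sitelinks):
    """Collect Wikisource links of all other editions that share a work with `item`.

    edition_of:  {edition item: [work items]}   ("edition or translation of")
    has_edition: {work item: [edition items]}   ("has edition")
    sitelinks:   {item: {wiki code: page title}}
    """
    links = {}
    for work in edition_of.get(item, []):
        for edition in has_edition.get(work, []):
            if edition == item:
                continue  # do not link an edition to itself
            for wiki, title in sitelinks.get(edition, {}).items():
                links.setdefault(wiki, []).append(title)
    return links


if __name__ == "__main__":
    # Hypothetical items: one work with an English and a French edition.
    edition_of = {"Q_en": ["Q_work"], "Q_fr": ["Q_work"]}
    has_edition = {"Q_work": ["Q_en", "Q_fr"]}
    sitelinks = {
        "Q_en": {"enwikisource": "Example (1900)"},
        "Q_fr": {"frwikisource": "Exemple (1901)"},
    }
    print(sibling_editions("Q_en", edition_of, has_edition, sitelinks))
```

Returning a list per wiki matters because one Wikisource can host several editions of the same work.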

Discussion

This issue causes a lot of confusion for new editors on Wikisource and Wikidata, who frequently set up the interwiki links incorrectly in order to bypass this limitation. —Beleg Tâl (talk) 16:12, 23 October 2019 (UTC)

@Beleg Tâl: great proposal ! For information @Tpt: is working on something quite similar (Tpt: can you confirm?), we should keep this proposal as this is important and any help is welcome but still we should keep that in mind ;) Cdlt, VIGNERON * discut. 14:47, 27 October 2019 (UTC)
HI! Yes, indeed, I am working on it as part of mw:Extension:Wikisource. It's currently in the process of being deployed on the Wikimedia test cluster before a deployment on Wikisource. It should be done soon, so, hopefully no need from the Foundation on this (except helping the deployment). Tpt (talk) 13:59, 30 October 2019 (UTC)
@Tpt: Fantastic, thank you!! —Beleg Tâl (talk) 17:22, 2 November 2019 (UTC)
  • FYI I repeated T54971, which I asked for several decades to try to support it. --Liuxinyu970226 (talk) 13:17, 3 November 2019 (UTC)
  • I would just notify that in svwikisource and plwikisource there are javascript-based implementations of multi-version interwiki and they seem to work fine if appropriate structures are available in Wikidata. Ankry (talk) 20:09, 9 November 2019 (UTC)

Voting

Index creation wizard

  • Problem: The process of turning a PDF or DjVu file into an index for transcription and proofreading is quite complicated and confusing. See Help:Index pages and Help:Page numbers for the basics.
  • Who would benefit: Anyone wanting to start a Wikisource transcription
  • Proposed solution: Create a wizard that walks an editor through the process of creating an index from a PDF or DjVu file (that has already been uploaded). Most importantly, it will facilitate creating the pagelist, by allowing the editor to go through the pages and identify the cover, title page, table of contents, etc., as well as where the page numbering begins.
  • More comments: This is similar to a proposal from the 2016 Wishlist, but more limited in scope, i.e. this proposal only deals with the index creation process, not uploading or importing files.
  • Phabricator tickets: task T154413 (related)
  • Proposer: Kaldari (talk) 15:32, 30 October 2019 (UTC)

Update June 2020: a project page has been set up for this at Wikisource Pagelist Widget.

Discussion

  • A wizard for initial setup is a good start, but an interactive visual editor for Index: pages, and especially for <pagelist … /> tags, would be even better. The pagelist is often edited multiple times and by multiple people, and currently requires a lot of jumping between the scan and the browser, mental arithmetic and mapping between physical and logical page numbers, multiple numbering schemes and ranges in a single work, etc. etc. A visual editor oriented around thumbnails of each page in the book and allowing you to tag pages: “This thumbnail, physically in position 7 in the file, is logically the ‘Title’ page”; “In these 24 pages (physical 13–37) the numbering scheme is roman numerals, and numbering starts on the first page I've selected”; “On this page (physical 38) the logical numbering resets to 1, and we're now back to default arabic numerals”; “This page (physical 324) is not included in the logical numbering sequence, so it should be skipped and logical numbering should resume on the subsequent page, and this page should get the label ‘Plate’”. All this stuff is much easier to do in a visual / direct-manipulation way than by writing rules describing it in a custom mini-syntax. --Xover (talk) 11:40, 9 November 2019 (UTC)
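
The mental arithmetic described in this comment (physical position versus logical page label) can be sketched in Python as follows. The data model is an assumption made for illustration: three dictionaries keyed by physical page number, and the hypothetical function names `to_roman` and `page_labels`. It is not the pagelist syntax and not how ProofreadPage computes labels.

```python
def to_roman(number: int) -> str:
    """Convert a positive integer to lower-case roman numerals."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
        (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def page_labels(total, resets, styles, labels):
    """Map each physical page (1..total) to its logical label.

    resets: {physical: logical number} where the logical numbering (re)starts
    styles: {physical: "roman" or "arabic"} where the numbering style changes
    labels: {physical: text} pages with a fixed label, skipped in the numbering
    """
    result = {}
    number, style = 1, "arabic"
    for physical in range(1, total + 1):
        style = styles.get(physical, style)
        number = resets.get(physical, number)
        if physical in labels:
            result[physical] = labels[physical]  # labelled page does not consume a number
            continue
        result[physical] = to_roman(number) if style == "roman" else str(number)
        number += 1
    return result


if __name__ == "__main__":
    print(page_labels(
        total=8,
        resets={3: 1, 5: 1},
        styles={3: "roman", 5: "arabic"},
        labels={1: "Cover", 2: "Title", 6: "Plate"},
    ))
```

A visual editor would let the user produce these three mappings by clicking on thumbnails instead of typing rules.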

Voting

Vertical display for classical Chinese content

  • Problem: Most content on Chinese Wikisource is classical Chinese, which has been printed or written vertically for thousands of years.
  • Who would benefit: Chinese and Japanese Wikisource, and other Wikimedia projects in languages that are displayed vertically (like Manchu).
  • Proposed solution: Add vertical text support to the Wikimedia software. To the proposer's knowledge, MediaWiki already supports right-to-left display of Arabic and Hebrew.

    A switch button on each page and a "force" setting in Special:Preferences should be added to allow readers to switch the display mode between traditional vertical text 傳統直寫 and modern horizontal text 新式橫寫. A magic word would be added that allows a page to set its own default display mode.

    Hypothetical vertical Chinese Wikisource as follows. (In this picture, some characters are rotated, but they should not be.)

     

  • More comments:
  • Phabricator tickets:
  • Proposer: 維基小霸王 (talk) 13:59, 1 November 2019 (UTC)

Discussion

Voting

Improve workflow for uploading books to Wikisource

  • Problem:
Uploading books to Wikisource is difficult.
In the current workflow you need to upload the file to Commons, then go to Wikisource and create the Index page (and you need to know the exact URL). The files need to be DJVU, which has different layers for the scan and the text. This is important for tools like Match & Split (if the file is a PDF, this tool doesn't work).
More importantly, the current workflow (especially for library uploads) includes Internet Archive and the famous IA-Upload tool. This tool is now fundamental for many libraries and uploaders, but it has several issues.
As Internet Archive stopped creating DJVU files from its scans, the international community has struggled to solve the problem of automatically creating a DJVU for upload to Commons and then Wikisource.
This has created a situation where libraries love Internet Archive and want to use it, but then get stuck because they don't know how to create a DJVU for Wikisource, and IA-Upload is buggy and often fails.
Summary
    • IA-Upload tool is bugged and fails often when creating DJVU files.
    • M&S doesn't work with PDF files.
    • Users do not expect to upload to Commons when transferring files from Internet Archive to Wikisource.
    • Upload to Internet Archive is an important feature, especially for GLAMs (i.e. libraries).
  • Who would benefit:
    • all Wikisource communities, especially new users
    • new GLAMs (libraries and archives), which at the moment have a hard time coping with the Wiki ecosystem.
  • Proposed solution:
Improve the IA-Upload tool: https://tools.wmflabs.org/ia-upload/commons/init
The tool should be able to create good-quality DJVU files from Archive files, and should not fail as often as it does now.
It should also hide the upload-to-Commons phase from the end user. The user should be able to upload a file to Internet Archive and then use the ID of the file to create the Index page on Wikisource directly. We could have an "Advanced mode" that shows all the steps for experienced users, and a "Standard" one that makes things simpler.
  • More comments:
  • Phabricator tickets: related: phab:T154413
  • Proposer: originally proposed by Aubrey (talk) in 2017 - re-proposed by Candalua (talk) 16:15, 6 November 2019 (UTC)

Discussion

Voting

Ajax editing of nsPage text

  • Problem: When editing simple pages, much user time is lost in the cycle: save, load in view mode, go to the next page (which opens in view mode), load it in edit mode.
  • Who would benefit: experienced users
  • Proposed solution: it.wikisource implemented an Ajax environment that saves the edited text and loads the next page in edit mode (and much more) very quickly through Ajax calls: it:s:MediaWiki:Gadget-eis.js (eis means Edit In Sequence). It's far from refined, but it works, and it has been tested on other Wikisource projects too. IMHO the idea should be refined and developed.
  • More comments:
  • Phabricator tickets:
  • Proposer: Alex brollo (talk) 07:16, 25 October 2019 (UTC)

Discussion

  • I enthusiastically support - I have often wished that I could move directly from page to page while staying in Edit mode - it would be particularly useful for error checking: making sure, for instance, that every page in a range which could have been proofread by different people over a number of months or even years all conform to the latest format/structure etc. CharlesSpencer (talk) 11:03, 25 October 2019 (UTC)
  • I think this is a very good project specific improvement that can be made within the remit of community wishlist. Seems feasible as well. —TheDJ (talkcontribs) 12:55, 4 November 2019 (UTC)
  • This would be a great first step towards something like a full-featured dedicated "transcription mode", that would likely involve popping into full screen (hiding page chrome, navbar, etc.; use all available space inside the browser window, but don't let the page scroll because it conflicts with the independently scrolling text field and scanned page display, in practice causing your whole editing UI to "jump around" unpredictably), some more flexibility and intelligence in coarse layout (i.e. when previewing a page, the text field and scanned page are side by side, but the rendered text you are trying to compare to the scanned page is about a screenworths of vertical scrolling away), prefetching of the next scanned page (cf. the gadget mentioned at the last Wikimania), and possibly other refinements (line by line highlighting on the scanned page? We often have pixel coordinates for that from the OCR process). Alex brollo's proposal is one great first change under a broader umbrella that is adapting the tools to the typical workflow on Wikisource, versus the typical workflow on Wikipedia-like projects: the difference makes tools that are perfectly adequate for Wikipedia-likes really clunky and awkward for the Wikisources. Usable, but with needlessly high impedance. --Xover (talk) 12:53, 5 November 2019 (UTC)
    @Samwilson: Could s:User:Samwilson/FullScreenEditing.js be a piece of this larger puzzle? I haven't played with it, but it looks like a good place to start. If this kind of thing (a separate focussed editing mode) were implemented somewhere core-adjacent, it might also provide an opportunity to clean up the markup used ala. that attempt last year(ish) that failed due to reasons (I'm too fuzzy on the details. Resize behaviour for the text fields got messed up, I think.). Could something like that also have hooks for user scripts? There's lots of little things that are suitable for user scripting to optimize the proofreading process. Memoized per-work snippets of text or regex substitutions; refilling header/footer from the values in the associated Index:; magic comment / variables (think Emacs variables or linter options) for stuff like curly/straight quote marks. In a dedicated editing mode, where the markup is clean (unlike the chaos of a full skin and multiple editors), both the page and the code could have API-like hooks that would make that kind of thing easier. --Xover (talk) 11:20, 9 November 2019 (UTC)
  • Thanks for the appreciation :-). The it.wikisource eis tool, even if its code is rough, really is appreciated by many users. I would also like to mention its "ajax-preview" option, which shows the result of the current editing/formatting very quickly (<1 sec) and also allows some simple editing of brief chunks of text nodes (immediately editing the underlying textarea). Some text mistakes are much more evident in "view" mode than in "edit" mode, but at present Visual Editor is too slow to be used for typical fast editing on Wikisource. --Alex brollo (talk) 09:43, 7 November 2019 (UTC)

Voting

New OCR tool

  • Problem: 1) Wikisource has to rely on external OCR tools. The most widely used one has been out of service for many months, and all that time we have been waiting to see whether its creator will appear and repair it. The other external OCR tools do not work well (they either respond extremely slowly or generate poor-quality text). Also, none of these tools can handle text divided into columns on magazine pages, and they often have problems with non-English characters and diacritics; the OCR output needs to be improved.
    2) The hOCR tool does not work for Wikisources based on non-Latin scripts. PheTool hOCR creates a Tesseract OCR text layer for Wikisources based on Latin script. For Indic Wikisources, for example, there is a temporary Google OCR to do this, but integrating non-Latin scripts into our tool would be more useful.
  • Who would benefit: Wikisource contributors handling scanned texts which do not have an original OCR layer or whose original OCR layer is poor, and contributors to wikisources based on non-Latin scripts.
  • Proposed solution: Create an integral OCR tool that the Wikimedia programmers would be able to maintain without relying on help of one specific person. The tool should:
    • be quick
    • generate good quality OCR text
    • be able to handle text written in columns
    • be able to handle non-English characters of Latin script including diacritics
    • be able to handle non-Latin languages

Tesseract, which is an open source application, also has a specific procedure for training its OCR, which requires the corrected text of a page and an image of the page itself. On the Wikisource side, pages that have been marked as proofread belong to books that have been transcribed and reviewed fully. So what needs to be done is to strip the formatting from the text of these finished transcriptions, expand template transclusions and move references to the bottom, then take the text along with an image of the page in question and run it through Tesseract's training procedure. The improvement would then be updated on ToolLabs. The better the OCR, the easier the process becomes with each book, allowing Wikisource editors to become more productive and complete more pages than they could previously. This would also motivate users on Wikisource.
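
A very rough Python sketch of the text-preparation step described above (strip formatting, move references to the bottom) is given here. It is an assumption-laden illustration: the function name `plain_text_for_training` is hypothetical, it only handles bold/italic quote markup and simple `<ref>…</ref>` tags, and it does not expand templates, which would need the wiki's own parser. It says nothing about the Tesseract training step itself.

```python
import re

# Matches <ref>...</ref> and <ref name="x">...</ref>, but not self-closing <ref ... />.
REF = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL)
# Wikitext italic/bold markers: two to five apostrophes in a row.
QUOTES = re.compile(r"'{2,5}")


def plain_text_for_training(wikitext: str) -> str:
    """Strip bold/italic markup and move reference text to the bottom."""
    notes = []

    def collect(match):
        notes.append(match.group(1).strip())
        return ""  # remove the reference from the running text

    body = REF.sub(collect, wikitext)
    body = QUOTES.sub("", body)
    notes = [QUOTES.sub("", note) for note in notes]
    return "\n".join([body.strip()] + notes)


if __name__ == "__main__":
    print(plain_text_for_training("A ''fine'' day.<ref>See '''p. 3'''.</ref> Next."))
```

The output would still have to be paired with the page image and checked against the line layout of the scan before being used as ground truth.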

Some concerns have been raised that WMF nearly always uses open source software, which excludes e.g. ABBYY FineReader and Adobe, and that the problem with free OCR engines is their lack of language support, so they are never really going to replace Phe's tools fully. I do not know whether free OCR engines suffice for this task, but I hope the new tool will be as good as or even better than Phe's tools, and ideological reasons that would be an obstacle to quality should be put aside.

Discussion

I think this is the #1 biggest platform-related problem we are facing on English Wikisource at this time. —Beleg Tâl (talk) 15:09, 27 October 2019 (UTC)

Yeah. For some reason neither Google Cloud nor phetools support all of the languages of Tesseract. Tesseract, in comparison to the Wikisources, is missing Anglo-Saxon, Faroese, Armenian, Limburgish, Neapolitan, Piedmontese, Sakha, Venetian and Min nan.--Snaevar (talk) 15:12, 27 October 2019 (UTC)

Note that you really don't want a tool that scans all pages for all languages as that is so compute-intensive that you'd wait minutes for every page you tried to OCR. Tesseract supports a boatload of languages and scripts, and can be trained for more, but you still need a sensible way to pick which ones are relevant on any given page. --Xover (talk) 07:27, 31 October 2019 (UTC)
I know. Both the Google Cloud and phetools gadgets pull the language from the language code of the Wikisource that the button is pressed on and thus only use one language. The same thing applies here. These languages are mentioned, however, so that it is clear which Wikisources this proposal could support and which ones it would not. P.S. I am not American, so I will never try to word things to cover all bases.--Snaevar (talk) 23:01, 2 November 2019 (UTC)

Even aside from the OCR aspect, being able to extract the formatting out of a PDF into wikitext would be highly valuable for converting PDFs (and other formats via PDF) into wikimarkup. T.Shafee(Evo﹠Evo)talk 11:19, 29 October 2019 (UTC)

I am not sure about formatting. Some scans or even originals are quite poor and in such cases the result of trying to identify italics or bold letters may be much worse than if the tool extracted just pure text. I would support adding such feature only if it were possible to be turned on and off. --Jan.Kamenicek (talk) 22:05, 30 October 2019 (UTC)

Many pages require only simple automatic OCR. But there are pages with another font (italics, Fraktur) or pages with mixed languages (e.g. a missal in both the local language and Latin), where it would be useful to have some recognition options. This is easier to do on a local PC, but not everybody has this option. JAn Dudík (talk) 11:21, 31 October 2019 (UTC)

Would also be great to default the OCR formatting to match the MOS, rather than having to change it all to conform to the MOS manually. --YodinT 14:19, 25 November 2019 (UTC)

Voting

  •   Support Bodhisattwa (talk) 06:45, 21 November 2019 (UTC)
  •   Support JAn Dudík (talk) 07:15, 21 November 2019 (UTC)
  •   Support Le ciel est par dessus le toit (talk) 13:00, 21 November 2019 (UTC)
  •   Support Lyokoï (talk) 17:32, 21 November 2019 (UTC)
  •   Support Tpt (talk) 19:36, 21 November 2019 (UTC)
  •   Support: impossible to contribute since Phe’s tool is down. —Pols12 (talk) 21:03, 21 November 2019 (UTC)
  •   Support Pamputt (talk) 21:38, 21 November 2019 (UTC)
  •   Support Sadads (talk) 21:41, 21 November 2019 (UTC)
  •   Support Balajijagadesh (talk) 05:24, 22 November 2019 (UTC)
  •   Support Libcub (talk) 08:13, 22 November 2019 (UTC)
  •   Support Jahl de Vautban (talk) 09:22, 22 November 2019 (UTC)
  •   Support Lionel Scheepmans Contact French native speaker, sorry for my dysorthography 10:47, 22 November 2019 (UTC)
  •   Support Alan Talk 12:46, 22 November 2019 (UTC)
  •   Support JLTB34 (talk) 13:29, 22 November 2019 (UTC)
  •   Support GPSLeo (talk) 21:10, 22 November 2019 (UTC)
  •   Support DraconicDark (talk) 02:29, 23 November 2019 (UTC)
  •   Support FreeCorp (talk) 05:25, 23 November 2019 (UTC)
  •   Support Pavithra.A (talk) 12:14, 23 November 2019 (UTC)
  •   Support Emptyfear (talk) 17:12, 23 November 2019 (UTC)
  •   Support @ջեօ 17:15, 23 November 2019 (UTC)
  •   Support --Armenmir (talk) 17:27, 23 November 2019 (UTC)
  •   Support আফতাবুজ্জামান (talk) 23:18, 23 November 2019 (UTC)
  •   Support Liuxinyu970226 (talk) 10:26, 24 November 2019 (UTC)
  •   Support VIGNERON * discut. 10:40, 24 November 2019 (UTC)
  •   Support Pymouss Tchatcher - 11:38, 24 November 2019 (UTC)
  •   Support Eatcha (talk) 12:22, 25 November 2019 (UTC)
  •   Support --Bander7799 (talk) 12:34, 25 November 2019 (UTC)
  •   Support JogiAsad (talk) 13:27, 25 November 2019 (UTC)
  •   Support Murma174 (talk) 13:27, 25 November 2019 (UTC)
  •   Support Also in rtl language wikisource, do not insert ltr tags before punctuation marks. This causes problems. Naḥum (talk) 13:37, 25 November 2019 (UTC)
  •   Support --YodinT 14:19, 25 November 2019 (UTC)
  •   Support Blue Rasberry (talk) 15:32, 25 November 2019 (UTC)
  •   Support MJL Talk 15:35, 25 November 2019 (UTC)
  •   Support Husky (talk) 16:12, 25 November 2019 (UTC)
  •   Support A garbage person (talk) 16:19, 25 November 2019 (UTC)
  •   Support 16:43, 25 November 2019 (UTC)
  •   Support Sgvijayakumar (talk) 19:09, 25 November 2019 (UTC)
  •   Support Ninovolador (talk) 21:27, 25 November 2019 (UTC)
  •   Support Vkalaivani (talk) 22:46, 25 November 2019 (UTC)
  •   Support Risker (talk) 05:03, 26 November 2019 (UTC)
  •   Support Geonuch (talk) 05:32, 26 November 2019 (UTC)
  •   Support Hsarrazin (talk) 14:31, 26 November 2019 (UTC)
  •   Support β16 - (talk) 15:08, 26 November 2019 (UTC)
  •   Support Thibaut120094 (talk) 16:51, 26 November 2019 (UTC)
  •   Support Noting that Community Tech forking and fixing Phe's tools will help precisely nothing in the long run. We need a WMF-supported tool that's within some WMF team's responsibilities to maintain and properly integrated into MediaWiki release cycles. Make use of volunteers where available, certainly, but someone at the WMF needs to own the OCR tool or it might as well stay a community gadget. Do please feel free to use this Wish to spend the necessary time kicking Phe's OCR tools until they start working again though. It's bound to be something stupid that's making it fail: like, has anybody tried to simply restart the tool? It could be hanging on a stale NFS file handle for all we know! Xover (talk) 06:10, 27 November 2019 (UTC)
    That is exactly what I hope is going to be solved. In this proposal I stated the problem: "Wikisource has to rely on external OCR tools" and proposed the solution: "Create an integral OCR tool that the Wikimedia programmers would be able to maintain without relying on help of one specific person." --Jan Kameníček (talk) 10:14, 1 December 2019 (UTC)
  •   Support Acélan (talk) 13:19, 27 November 2019 (UTC)
  •   Support Harkawal Benipal (talk) 16:08, 27 November 2019 (UTC)
  •   Support Indic Wikisource community members at Wiki Advanced Training 2019 asked for a Bulk OCR tool not dependent on platform (Linux, Windows etc.). I hope this tool allows Bulk OCRing pages. Satdeep Gill (talk) 16:43, 27 November 2019 (UTC)
  •   Support WhatamIdoing (talk) 16:55, 27 November 2019 (UTC)
  •   Support Pyb (talk) 18:05, 27 November 2019 (UTC)
  •   Support This would be my #1 for Wikisource. Of course it should be open source. Wellparp (talk) 19:03, 28 November 2019 (UTC)
  •   Support Peter Alberti (talk) 19:54, 28 November 2019 (UTC)
  •   Support 94rain Talk 12:53, 30 November 2019 (UTC)
  •   Support Satpal Dandiwal (talk) 21:07, 30 November 2019 (UTC)
  •   Support while also agreeing with Xover's thoughts. Mahir256 (talk) 07:37, 1 December 2019 (UTC)
  •   Support Candalua (talk) 16:35, 1 December 2019 (UTC)
  •   Support Rahmanuddin (talk) 06:49, 2 December 2019 (UTC)
  •   Support सुबोध कुलकर्णी (talk) 12:25, 2 December 2019 (UTC)
  •   Support Ruthven (msg) 12:41, 2 December 2019 (UTC)
  •   Support Sannita - not just another it.wiki sysop 13:19, 2 December 2019 (UTC)
  •   Support Jberkel (talk) 13:22, 2 December 2019 (UTC)
  •   Support Saederup92 (talk) 13:24, 2 December 2019 (UTC)
  •   Support Omshivaprakash (talk) 14:14, 2 December 2019 (UTC)
  •   Support Novak Watchmen (talk) 17:54, 2 December 2019 (UTC)
  •   Support --Yoosef Pooranvary (talk) 11:38, 19 November 2020 (UTC)

Repair search and replace in Page editing

  • Problem: Currently, "Search and replace", as provided by the code editor (top-left option in the Advanced editing tab), just doesn't work when used in the "Page" namespace.

This is the basic tool to... search and replace text when editing, mass correct OCR mistakes, etc. It is simply not working.

  • Who would benefit: All editing users
  • Proposed solution: Reimplement the function, or fix the bug in the MediaWiki software.
  • More comments: There are some workarounds, as implemented in it.source, but they are new gadgets that mimic this basic functionality of MediaWiki.
  • Phabricator tickets: phabricator:T183950, phab:T198688 and phab:T212347
  • Proposer: Ruthven (msg) 11:44, 29 October 2019 (UTC)

Discussion

  • Extending the proposal: This would benefit all wiki projects.
    • I would suggest something more general: when I use Search and replace, I can no longer step back (undo) if my replacement (or, more importantly, something before it) was wrong. This is a general problem with the text editor. Every time I use any of the existing buttons (such as Bold or math), I cannot undo past that step. So if I'm editing for some time, make a mistake, and then use one of these buttons (or Search and replace), I must redo the whole work from the beginning, because I cannot go back to the mistake I made before using the button. This is not the case with the visual editor, so I think it should be possible to change this in the text editor rather easily.
    • There are only two options in Search and replace: you can replace matches one after the other, or replace throughout the whole text. I would be really grateful if I could use Search and replace only within selected text (and not the whole text). Yomomo (talk) 22:24, 8 November 2019 (UTC)
    • About Search and replace: if I want to replace something that spans several lines, the newline character is not included. I don't know how difficult it is to change this, but it would be useful to be able to replace passages when they (and the replacement) span several lines. Yomomo (talk) 14:52, 1 November 2019 (UTC)
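
The three requests above (undo after a replace, replacing only within a selection, and search strings that contain line breaks) can be illustrated with a minimal sketch. It is not how the MediaWiki editor is implemented; the class `EditBuffer` and its methods are hypothetical names invented for this example, and the selection is assumed to be given as character offsets.

```python
class EditBuffer:
    """Toy text buffer with undoable, selection-limited search and replace."""

    def __init__(self, text):
        self.text = text
        self._history = []  # earlier versions of the text, oldest first

    def replace(self, search, replacement, start=0, end=None):
        """Replace every occurrence of `search` between offsets start and end.

        Both `search` and `replacement` are literal strings and may contain
        newline characters, so multi-line passages can be replaced.
        """
        if end is None:
            end = len(self.text)
        self._history.append(self.text)  # remember the state for undo
        selected = self.text[start:end]
        self.text = (
            self.text[:start]
            + selected.replace(search, replacement)
            + self.text[end:]
        )

    def undo(self):
        """Restore the text as it was before the most recent replace."""
        if self._history:
            self.text = self._history.pop()
```

For example, with the typical OCR error "ii" for "n", `buf.replace("ii", "n", start=0, end=6)` fixes only the first six characters of the buffer, and `buf.undo()` brings the previous text back. The design choice here is to store whole snapshots for undo, which is simple but memory-hungry; a real editor would more likely record individual changes.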

Voting

Offer PDF export of original pagination of entire books

Français: Pouvoir exporter en pdf en respectant la pagination de l'édition source.
  • Problem: At present, PDF conversion of proofread Wikisource books doesn't mirror the original pagination and page design of the original edition, since it comes from ns0 transclusion.
    Français: La conversion en PDF des livres Wikisource ne reflète pas la pagination et le design original des pages de l’édition originale, car la conversion provient de la transclusion et non des pages.
  • Who would benefit: Offline readers.
    Français: Lecteurs hors ligne.
  • Proposed solution: To build an alternative PDF coming from conversion, page for page, of nsPage namespace.
    Français: Élaborer un outil pour générer un PDF alternatif provenant d’une conversion page par page.
  • More comments: Some Wikisource contributors think that nsIndex and nsPage are simply "transcription tools"; I think that they are much more - they are the true digitization of an edition, while ns0 transclusion is something like a new edition.
    Français: Certains contributeurs de wikisource pense que nsIndex et nsPage sont simplement des « outils de transcription » ; je pense qu’ils sont beaucoup plus que cela – ce sont la vraie numérisation d’une édition, tandis que la transclusion ns0 constitue une nouvelle édition.
  • Phabricator tickets: T179790
  • Proposer: last year's proposer, Alex brollo, received 57 votes; Jayantanth (talk) 16:03, 26 October 2019 (UTC)

Discussion

  • I think I would have actually Opposed this: I don't want to reproduce original pagination, we have the original PDF for that. For this proposal to make sense, to me, it would need to be about having some way to control PDF generation in the same way transclusion to mainspace controls wikitext rendering. I wouldn't necessarily want to reproduce each original page in a PDF page there (often, yes, but not always), and I might want to tweak some formatting specifically for a paged medium (PDF) that doesn't apply in a web page, or vice versa. In other words, I'm going to abstain from voting on this proposal but I might support something like it in the future if it was better fleshed out. --Xover (talk) 06:03, 27 November 2019 (UTC)

Voting