More comments: When extracting the JPEG for an individual file, that JPEG can be uploaded. But when the JPEG is baked into a PDF, no thumbnail is generated. Is it because of its size? Small pages (books) work fine, but newspapers (large pages) fail.
The problem is very easy to understand. I find a free, digitized PDF and upload it to Commons, then start to proofread in Wikisource. This always works fine for normal books, but when I try the same for newspapers, no image is generated. Apparently this is because the image has a larger number of pixels. I haven't tried to figure out what the limit is. --LA2 (talk) 21:36, 25 October 2019 (UTC)
For File:Finlands_Allmänna_Tidning_1878-00-00.pdf at least, ghostscript correctly rendered the file locally, but took a ridiculous amount of time (evince seems to render it instantly, so I don't know why ghostscript takes so long). So at a first guess, I suppose it's hitting time limits. Bawolff (talk) 20:20, 25 October 2019 (UTC)
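To check the time-limit guess locally, one could time a single-page render. The following is a minimal sketch, not the actual thumbnailing pipeline: the 60-second timeout, the 150 dpi resolution and the helper names are assumptions chosen for illustration, and the real server-side limit is not stated in this discussion.

```python
import subprocess
import time


def build_gs_command(pdf_path, page, out_path, dpi=150):
    """Build a Ghostscript command that renders one page to a JPEG file."""
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dSAFER", "-dQUIET",
        "-sDEVICE=jpeg",
        f"-r{dpi}",                 # render resolution (assumed value)
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


def time_render(command, timeout_s=60):
    """Run the command; return elapsed seconds, or None if it timed out."""
    start = time.monotonic()
    try:
        subprocess.run(command, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start
```

For example, `time_render(build_gs_command("newspaper.pdf", 1, "page1.jpg"))` (hypothetical file names) returns `None` when the render exceeds the timeout, which would be consistent with the time-limit explanation.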
Maybe the solution is to fix ghostscript? Another way is to bypass ghostscript and use pdfimages to extract the embedded JPEG images and render them, since JPEG rendering seems to work fine. I don't know. --LA2 (talk) 21:34, 25 October 2019 (UTC)
pdfimages is not a solution, as a PDF page may consist of multiple images and it is hard to extract their relative locations (at least, it is not possible with pdfimages). Ankry (talk) 20:23, 9 November 2019 (UTC)
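Whether this objection applies to a given file can be checked from the text output of `pdfimages -list`, which prints one row per embedded image with the page number in the first column. The sketch below only counts rows per page; treating a page with exactly one image as a plain full-page scan is an assumption, and the function names are made up for illustration.

```python
from collections import Counter


def images_per_page(listing):
    """Count embedded images per page from `pdfimages -list` text output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a page number; header and rule lines do not.
        if fields and fields[0].isdigit():
            counts[int(fields[0])] += 1
    return dict(counts)


def single_image_pages(listing):
    """Return the pages that contain exactly one embedded image."""
    return sorted(p for p, n in images_per_page(listing).items() if n == 1)
```

Pages missing from the result of `single_image_pages` contain several images (or none), so extracting them directly would lose the placement information mentioned above.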
I was going to write to the National Library about this (I think I know at least one of the persons involved) but I don't observe this slowness on gs 9.27, I think: phabricator:P9760. Maybe I should try a non-dummy command. Nemo 09:06, 27 November 2019 (UTC)
What about providing a more compact design for proofreading overall? Those seconds of scrolling add up. If we had, on one side, a window with the extracted text and, on the other side, a window of the same size with the scan, in which you can zoom and move quickly, that would save time and be more attractive for newbies. The way it is now, it looks kind of techy and is in some cases difficult to handle. E.g. there should also be more help content, or a link to the discussion page, presented in a more attractive design. Juandev (talk) 09:22, 4 November 2019 (UTC)
I think that a tool that allows generating such thumbnails manually / on request / offline, with much higher limits and available to a specific group of users (Commons admins? a dedicated group?), may be a workaround for this problem. Ankry (talk) 20:23, 9 November 2019 (UTC)
@LA2: -- This problem occurs in highly compressed files and is linked to the OCR layer. The fix consists of decompressing the file (so that the size in MB increases) and either flattening or removing the OCR layer. I first tried flattening; it usually works but did not in this case, so I removed the OCR. Now it works. And yes, it is potentially usable for other files in your category: extract the pages as PNG/JPG and rebuild the PDF. Hrishikes (talk) 01:39, 11 December 2019 (UTC)
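The extract-and-rebuild step could be scripted roughly as follows. This is a sketch under assumptions, not necessarily the tools used above: it assumes Poppler's `pdftoppm` and the `img2pdf` command-line tool are installed, the 150 dpi value is arbitrary, and the rebuilt PDF has no OCR text layer at all.

```python
def build_extract_command(pdf_path, out_prefix, dpi=150):
    """pdftoppm (Poppler): render every page to <out_prefix>-<n>.jpg."""
    return ["pdftoppm", "-jpeg", "-r", str(dpi), pdf_path, out_prefix]


def build_rebuild_command(image_paths, out_pdf):
    """img2pdf: wrap the page images into a new PDF without a text layer."""
    # Lexical sorting assumes the page numbers in the names are zero-padded.
    return ["img2pdf", *sorted(image_paths), "-o", out_pdf]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; the image paths for the second command would come from something like `glob.glob("page-*.jpg")` after the first command has finished.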
Voting
Support Important issue for every project which relies on multi-page documents (PDF is a notoriously bad format but that's what we have in practice). It probably doesn't require much coding, but the Community Tech team could help by lobbying the appropriate WMF departments to get more resources assigned to thumbnail generation. Nemo 09:16, 22 November 2019 (UTC)
Support Not that I want to encourage more use of PDF, but we run into too many pointless problems with all multi-page formats (the majority with PDF it seems) and reducing this will reduce both wasted time and frustration (which often hits new contributors: the old hands have learned to avoid the pain points). Xover (talk) 05:54, 27 November 2019 (UTC)