User:Tbayer (WMF)/Converting Google Slides to wikitext

Some notes on efficiently converting a Google Slides deck into a wiki page (developed in this case: Google Slides version ---> wiki version), while roughly preserving font styles, illustrations, tables, text/table coloring etc.

Here is the method I used successfully for this and this report:

Text and tables

  • Copy and paste the deck (in edit mode) slide by slide into a LibreOffice Writer (not LibreOffice Impress) document. (Select a slide in the film strip on the left, click somewhere outside the slide in the content area on the right, select the slide's entire content with e.g. Ctrl+A (see Google Slides keyboard shortcuts), copy it, and paste it into the LibreOffice document.)
  • Use the approach described here to export the resulting ODT file from LibreOffice as wikitext. This will preserve bold and italic text, hyperlinks and tables (including cell background colors, but borders need some more tweaks, see below).
  • Font sizes get lost and need to be redone manually.
  • Improve table formatting e.g. by adding the wikitable class (example) and correcting the border thickness (example)
  • The slide structure gets lost. One may want to manually add horizontal lines (----) or section headings, in order to separate different slides.
  • While the content of text boxes is exported fine, the boxes themselves are not preserved as elements; in particular, their formatting (e.g. background color) gets lost. So if they carry extra formatting, one needs to recreate the boxes manually in the wiki version, e.g. as spans and divs, and re-add the formatting.
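
Adding the wikitable class to every exported table by hand gets tedious for decks with many tables. Below is a minimal Python sketch of how that step could be automated. It assumes that each table in the exported wikitext opens with a line starting with `{|`; the function name `add_wikitable_class` is made up for illustration and is not part of any tool mentioned above.

```python
import re


def add_wikitable_class(wikitext):
    """Add class="wikitable" to every table opener ({|) that lacks it.

    Assumption: table openers sit at the very start of a line, which is
    what hand-written wikitext usually looks like; verify this against
    the actual export before relying on it.
    """
    out = []
    for line in wikitext.splitlines():
        if line.startswith("{|"):
            attrs = line[2:].strip()
            match = re.search(r'class\s*=\s*"([^"]*)"', attrs)
            if match is None:
                # No class attribute yet: prepend one.
                attrs = ('class="wikitable" ' + attrs).strip()
            elif "wikitable" not in match.group(1).split():
                # A class attribute exists: add wikitable to it.
                classes = ("wikitable " + match.group(1)).strip()
                attrs = (attrs[:match.start()]
                         + 'class="' + classes + '"'
                         + attrs[match.end():])
            line = "{| " + attrs
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = '{| border="1"\n|-\n| a || b\n|}'
    print(add_wikitable_class(sample))
```

Border thickness and font sizes still need manual attention; the sketch only touches the opening line of each table.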

Images

  • Images can be extracted in bulk (as PNG or JPG) by exporting the deck as PDF and using the pdftohtml command line tool from Poppler. NB: Cropping of images done inside Google Drive will not be preserved, i.e. one needs to crop the exported images again.
  • For vector images, it's preferable to export them as SVG instead. One possibility is to copy and paste each image into a Google Drawing document (File -> New -> Drawing), adjust the page setup so that the image fits the canvas (not sure whether this can be done automatically), and export the resulting drawing as SVG via File -> Download.
  • Upload the images to Commons manually and insert them into the new wiki page.
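
The bulk extraction step can be scripted. The following Python sketch wraps the basic `pdftohtml <PDF-file> <output-prefix>` invocation and then lists the image files found in the output directory. The helper names, the `deck_export` directory and the `deck` prefix are illustrative assumptions, and it also assumes that pdftohtml writes the extracted images next to the generated HTML; check the local Poppler version's behavior and options before relying on it.

```python
import shutil
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_pdftohtml_command(pdf_path, out_prefix):
    """Return the argument list for a plain pdftohtml run (no options)."""
    return ["pdftohtml", pdf_path, out_prefix]


def extract_images(pdf_path, out_dir="deck_export"):
    """Run pdftohtml on the exported PDF and return the image files found.

    Assumption: images land in the same directory as the HTML output.
    """
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml (Poppler) was not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    command = build_pdftohtml_command(pdf_path, str(out / "deck"))
    subprocess.run(command, check=True)
    return sorted(p for p in out.iterdir()
                  if p.suffix.lower() in IMAGE_SUFFIXES)


if __name__ == "__main__":
    import sys
    for image in extract_images(sys.argv[1]):
        print(image)
```

The extracted files still need to be cropped where necessary and uploaded to Commons by hand.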

Notes about what doesn't work

  • Google Slides does not offer an HTML export feature as of February 2015. After a quick glance into the abyss that is the HTML code of a Google Slides deck as usually displayed in the browser, any attempt to directly use that HTML seems futile (e.g. even simple text seems to be present only in the form of HTML5 Canvas path drawings there). NB: This also applies to the "lightweight" version one gets by using "Publish to the web".
  • As of May 2015, Google Slides actually offers an "HTML View" option (accessible under "View" in the menu bar, or by appending the "htmlpresent" suffix to the URL, example: "htmlpresent" vs. "pub"). It seems to be quite beta, e.g. tables that look OK in the normal presentation view are severely broken, with content and borders mispositioned and overlapping. It's possible to copy and paste tables from there into VisualEditor (where their positioning looks fine again), but formatting such as colors and fonts gets lost.
  • One can export tables as described here, by copypasting them from Google Slides into Google Sheets, exporting the resulting spreadsheet as ODF (LibreOffice file), "Save as" HTML from LibreOffice, excerpting the table's HTML code, and using it in a wiki page either directly or as converted by html2wiki. But this approach does not preserve (e.g.) cell colors and links.
  • Exporting as PDF and using pdftohtml:
    • export the deck as a PDF, using the PDF download feature of Google Slides
    • convert this PDF into HTML and extract the images at the same time, using the pdftohtml command line tool from Poppler
    • use html2wiki, or simply copy and paste into VisualEditor, to convert the resulting HTML into a wiki page
    • Upload the images to Commons manually and insert them into the new wiki page
    • Unfortunately table structures get lost in the exported PDF, so this requires too much manual cleanup in case the deck contains many tables that need to be preserved.

Comments and tips welcome (I did not find anything useful at w:Wikipedia:Tools#Importing (converting) content to Wikipedia (MediaWiki) format or mw:Manual:Importing external content).