Grants:IdeaLab/Djvu text layer editor

status: idea
project:
please add a title
idea creator:
project contact:
alex.brollo(_AT_)gmail.com
participants:
summary:
Use some VisualEditor (VE) features to edit the DjVu text layer
created on: 12:10, 13 March 2014

Project idea

What is the problem you're trying to solve?

Wikisource makes heavy use of the OCR text layer, but effectively uses only a small part of it (the bare text). The DjVu text layer contains much more information (words, lines, paragraphs, regions, columns, and page text coordinates); unfortunately, this information is more easily exported in a Lisp-like format or as XML than as hOCR.
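
As a rough illustration of how much structure sits behind the bare text, the sketch below walks a small hand-written sample in the style of the XML exported from a DjVu text layer. The sample itself, the element nesting (PAGECOLUMN, REGION, PARAGRAPH, LINE, WORD) and the coords attribute are assumptions made for illustration, not output from a real file, and the order of the four coordinate values is deliberately left uninterpreted.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble an exported DjVu text layer.
SAMPLE = """\
<HIDDENTEXT>
  <PAGECOLUMN>
    <REGION>
      <PARAGRAPH>
        <LINE>
          <WORD coords="100,220,180,200">Hello</WORD>
          <WORD coords="190,220,260,200">wrold</WORD>
        </LINE>
        <LINE>
          <WORD coords="100,250,170,230">again</WORD>
        </LINE>
      </PARAGRAPH>
    </REGION>
  </PAGECOLUMN>
</HIDDENTEXT>
"""


def words_with_coords(xml_text):
    """Return (text, [four ints]) for every WORD element, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for word in root.iter("WORD"):
        coords = [int(v) for v in word.get("coords", "").split(",") if v]
        result.append((word.text or "", coords))
    return result


def bare_text(xml_text):
    """Return only the text, one output line per LINE element."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        " ".join(w.text or "" for w in line.iter("WORD"))
        for line in root.iter("LINE")
    )
```

Here bare_text(SAMPLE) corresponds to what Wikisource uses today, while words_with_coords(SAMPLE) shows the per-word mapping that is currently discarded.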

What is your solution?

  • To test VE, or other simpler WYSIWYG HTML/XML editors, for editing the text only, while keeping the information wrapped in XML tags;
  • to test extraction, conversion and upload of the text layer from and into DjVu files using a simple web interface.
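
The first point can be sketched in a few lines: only the text of each word changes, while the tags and their attributes are written back untouched. This is a minimal sketch under assumptions; the WORD element, its coords attribute, the sample line and the replace_word_texts helper are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET


def replace_word_texts(xml_text, corrections):
    """Apply {word_index: new_text} to WORD elements, leaving attributes as they are."""
    root = ET.fromstring(xml_text)
    words = list(root.iter("WORD"))
    for index, new_text in corrections.items():
        words[index].text = new_text
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-line page with an OCR error in the first word.
page = (
    '<LINE><WORD coords="10,40,60,20">Wikisuorce</WORD> '
    '<WORD coords="70,40,120,20">text</WORD></LINE>'
)
fixed = replace_word_texts(page, {0: "Wikisource"})
```

An editor front end would only need to produce the corrections mapping; the coordinates never pass through the user's hands.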

Ideas for a test tool

A test could be done with existing tools:

  • DjVuLibre (running on Tool Labs), and particularly:
    • djvutoxml, which extracts the internal mapped text of DjVu pages as an XML file;
    • djvuxmlparser, which loads the modified mapped text back into the DjVu file;
  • tinyEditor, to edit the XML text in a comfortable WYSIWYG interface (XML tags are hidden; only the editable text is shown, in any HTML textarea);
  • a little bit of CGI on Tool Labs to manage such a web editing interface.
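
The round trip through the two DjVuLibre tools listed above could be driven from a small script such as the one below. It is a sketch, not a tested tool: the command-line forms (djvutoxml with an optional --page, djvuxmlparser with -o) follow the DjVuLibre manual pages as far as I know and should be checked against the installed version, and the function names are hypothetical.

```python
import subprocess


def extract_command(djvu_path, xml_path, page=None):
    """Build the djvutoxml call that writes the mapped text to an XML file."""
    cmd = ["djvutoxml"]
    if page is not None:
        # Assumed option: restrict the export to a single page.
        cmd += ["--page", str(page)]
    cmd += [djvu_path, xml_path]
    return cmd


def upload_command(djvu_path, xml_path):
    """Build the djvuxmlparser call that loads the edited XML into the DjVu file."""
    # Assumed option: -o names the DjVu file to be modified.
    return ["djvuxmlparser", "-o", djvu_path, xml_path]


def run(cmd):
    """Run one command and raise if the tool reports an error."""
    subprocess.run(cmd, check=True)
```

A CGI script would call run(extract_command(...)) before showing the editor and run(upload_command(...)) after the user saves.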

Project goals

  • to split proofreading into two steps:
    • DjVu text editing (saving the result into the DjVu text layer);
    • text formatting.

Get involved

Welcome, brainstormers! Your feedback on this idea is welcome. Please click the "discussion" link at the top of the page to start the conversation and share your thoughts.

See also

