Transkribus Model Creation and Training Guide


This page introduces the Transkribus interface, which can be used to transcribe documents, create and train new models, or test existing models.

General Overview of the Procedure

Creating and training a new model is an extensive process. The flowchart below outlines the steps in the workflow, from gathering the required training data to making the model available on your Wikisource.

Flowchart depicting the workflow involved in creating and training a model

NOTE: Certain advanced processes, such as customizing the shapes of polygons or editing baseline data, are left out of the flowchart for the sake of simplicity. They are covered in their respective sections.

Prerequisites

The following are the prerequisites for creating and training a new model:

Have a functional account on Transkribus with enough credits to perform OCR operations
Keep at least 5,000 to 15,000 words (around 25-75 pages) of transcribed material in your desired language ready to be uploaded
- If you are working with printed text rather than handwritten text, less training data is needed (around 50 pages)
- The number of pages of the particular type the model is being created for is crucial to the model's performance
When creating a model for a particular style of handwritten text, make sure that all the manuscripts you use are in that style only
And a lot of patience, for this is going to take some time!
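As a rough check on the word-count prerequisite: the suggested 5,000 to 15,000 words over about 25 to 75 pages works out to roughly 200 words per page. The sketch below is a hypothetical helper, not part of Transkribus. It assumes you have your transcribed pages as plain-text strings and that words can be counted by splitting on whitespace, and it estimates how many more pages you need to reach the 5,000-word minimum if future pages average the same length as the ones you already have.

```python
import math

# Lower end of the word range suggested in this guide (assumed as the target).
MIN_WORDS = 5_000


def words_on_page(text: str) -> int:
    # Naive word count: whitespace-separated tokens.
    return len(text.split())


def pages_still_needed(pages: list[str], target_words: int = MIN_WORDS) -> int:
    """Estimate how many more pages are needed to reach target_words,
    assuming future pages average the same word count as existing ones."""
    total = sum(words_on_page(p) for p in pages)
    if total >= target_words:
        return 0
    if total == 0:
        raise ValueError("need at least one non-empty transcribed page to estimate")
    average = total / len(pages)
    return math.ceil((target_words - total) / average)
```

For example, 20 pages of about 200 words each give 4,000 words, so the function estimates that 5 more pages of similar length are needed.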

Transkribus Tools

Transkribus

Formerly known as Transkribus Lite, the latest version of the web app was released on 30 August 2023 and is billed as a feature-rich editor that doubles as a workspace for creating and testing models. It allows users to transcribe documents, apply existing models to them, create and train models, and eventually publish a model of their own. All processes and workflows in this guide refer to the web interface unless mentioned otherwise.

Transkribus Expert

The Transkribus desktop client offers everything Transkribus can do, including some features the web interface does not yet support. Modifying polygon data and the advanced options for exporting and importing data to collections and models are available only in the desktop client.

Overview of the Transkribus Interface

Transkribus has a feature-rich web interface that supports proofreading, text recognition, access to models covering multiple languages, and experimentation with manuscripts. This is where you will spend most of your time as you prepare ground truth documents, build a new model, train it on the relevant documents, and validate its accuracy. After you log in with a Transkribus account, you will be directed to a dashboard similar to the one shown below. Don’t worry if you do not have any collections yet!

Important terminology

When working with Transkribus, it is important to be familiar with a few terms. Not all of them are immediately relevant, but you can always come back here for reference!

Document

Any image or page of a manuscript that is uploaded to Transkribus is considered a document

Collection

A collection is a group of related documents (e.g. of a particular language or style) that helps you to organize your work desk better

Baseline model

A Transkribus model that detects only the baselines (the lines on which each line of text sits) in a document, rather than recognizing the text itself

Note: Having a dedicated baseline model is helpful in some cases, such as scripts whose letters and lines are irregularly placed

HTR model

A Handwritten Text Recognition (HTR) model performs the actual text recognition: it reads the handwritten text and generates the corresponding output text

Note: It is often used in tandem with a baseline model

Ground truth

Documents that have already been proofread and carry a correct transcription can be labeled as ground truth, which forms the basis for building a new model

Job

Any process run on Transkribus, like performing text recognition on a document, is classified as a job and is queued on the Transkribus server

Training set

Usually about 90% of the entire data set, the training set contains the documents the algorithm uses to train a new model on a particular handwriting

Validation set

Usually about 10% of the entire data set, the validation set contains documents that are not used for training and on which the model's ability to recognize the handwriting is measured

Epoch

An epoch is one complete pass of the model over the training data; the number of epochs determines how long the model is trained

Note: Too many epochs can cause the model to be over-trained (overfitted) on the training set, so that it performs poorly on new data

Uploading documents to a collection

The easiest way to add documents to Transkribus is by creating a new collection. Once you create a new collection on the Collections tab, you will be redirected to a screen as shown below.

The interface includes the following options (numbered accordingly):

Name of the collection you are currently working with
Click on Upload Document to upload new documents to the collection
An option to choose whether you are uploading an image or a PDF
Set the title of the document you are uploading
Upload the selected file(s) to the collection

After the document is successfully uploaded, the collection screen should display the list of documents in that collection. Clicking on any of the documents will take you to a list of individual pages of the document, as shown in the figure below.

From here you can add or delete a page, run handwriting recognition (using an HTR model) on a page, set the page to one of the four allowed page statuses, and export a subset of the pages. The Filter option on the right side of the toolbar lets you narrow down which pages are displayed.

Your Transkribus work area

Once you click on any of the documents under the Work Desk section on your Transkribus interface, you will be redirected to a screen as shown below.

This is where all the work on your manuscript takes place. The interface includes the following options (numbered accordingly):

Cursor tool for moving the manuscript around
Pen tool to indicate baselines for your manuscript
Region selector tool to define the various regions in your manuscript
A tool to add tables to the manuscript regions
A button to provide more information and keyboard shortcuts
A layout editor that allows you to see your lines and regions in one place
Zoom controllers
Center your document with respect to the viewing area
Fit the document to the viewing area
Rotate your document
Change the view to full screen
Start transcription with an existing model
Option to download the existing document
A drop-down to change the status of the page to one of the following:
1. In Progress
2. Ground Truth
3. Done
4. Final
Save progress on your current document

Apart from these, there are also buttons to undo/redo changes, a virtual keyboard, and options to share your work.

Adding ground truth

Before training a model, you need to prepare your training data: enough images together with their correct transcriptions. This process, known as adding ground truth, ensures that the model is trained on validated data.

It involves transcribing manuscripts in the Transkribus editor and saving each page with the Ground Truth status, which marks the page as usable for training your model. The process of transcribing on Transkribus using Wikisource as a reference is outlined below:

Open a manuscript of your choice on Wikisource and have a local copy ready to upload to your Transkribus collection
Once the document is uploaded to the collection, you can proceed to open the first page (or any page of your choice) to begin transcription. The page will open in an editor as shown below

Open the corresponding Wikisource page in another window. You will need to keep both windows open side by side
Once you have completed drawing regions and marking baselines satisfactorily, you can proceed with adding the corresponding text
Every line drawn in the Transkribus editor should have a matching line on the Wikisource page. Copy the text of each Wikisource line and paste it into the corresponding line in the Transkribus editor. Continue until every line in every region marked in the Transkribus editor has its transcribed text
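The copy-and-paste step above depends on a one-to-one match between the lines drawn in Transkribus and the lines on the Wikisource page. The sketch below illustrates that correspondence and a sanity check you might do before pasting; it is not a Transkribus feature. It assumes the Wikisource page text is available as a plain string, that blank lines can be ignored, and that each Transkribus line is identified by a hypothetical line ID listed in reading order.

```python
def pair_lines(line_ids: list[str], wikisource_text: str) -> dict[str, str]:
    """Map each Transkribus line (by a hypothetical line ID, in reading order)
    to the matching non-empty Wikisource line."""
    # Drop blank lines and surrounding whitespace from the Wikisource text.
    ws_lines = [ln.strip() for ln in wikisource_text.splitlines() if ln.strip()]
    if len(ws_lines) != len(line_ids):
        # A mismatch usually means a line was missed or split when drawing baselines.
        raise ValueError(
            f"{len(line_ids)} lines drawn in Transkribus but "
            f"{len(ws_lines)} non-empty lines on the Wikisource page"
        )
    return dict(zip(line_ids, ws_lines))
```

A count mismatch is a signal to revisit the regions and baselines before transcribing further.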

The video above shows how to transcribe text using Transkribus and Wikisource side by side.

Training a custom model

Layout Recognition Model (optional)

This is an optional activity. If you are not sure whether your language requires a layout recognition model, please raise a ticket on Phabricator. A layout recognition (line detection) model is mainly worth building when the handwriting or script is difficult to train on directly and the placement of letters or characters varies. By default, Transkribus uses the Mixed Line Orientation model for layout detection, which works well for most Western scripts.

Training the layout model begins on the screen shown below.

Screenshot showing the beginning of the model creation workflow

Go to the Training section and choose a collection as prompted. Select the Baselines model option, as shown in Fig 2.
In the dialog box that appears, fill in the required details such as the model name (numbered 3 in the figure above) and description (numbered 4 in the figure above). The Epochs field (numbered 5 in the figure above) determines how many passes the model makes over the provided data set.
Next, select the training data containing the corrected baselines prepared earlier. Select all relevant documents or collections that you want the model to learn from, then select the data set to be used for validation.
- NOTE: Ideally, 90% of the entire data available should be used for training while 10% should be used for validation.
Trigger the model training process
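The 90/10 recommendation above can be illustrated with a simple random split. In Transkribus you make this selection by hand in the dialog; the sketch below only shows one way to decide which pages go where before selecting them. The page IDs, the fixed seed, and the rule of keeping at least one validation page are assumptions for illustration.

```python
import random


def split_train_validation(page_ids, validation_fraction=0.1, seed=0):
    """Shuffle hypothetical page IDs and split them into (training, validation)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    ids = list(page_ids)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * validation_fraction)
    if len(ids) >= 2:
        # Keep at least one page for validation when possible.
        n_val = max(1, n_val)
    return ids[n_val:], ids[:n_val]
```

With 50 pages, this yields 45 training pages and 5 validation pages, and no page appears in both sets. Shuffling matters: taking the last 10% of a manuscript as validation could leave validation pages that differ systematically from the training pages.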

The training process takes a few minutes to complete, and you can follow its progress in the Jobs tab. Once the job finishes, the layout recognition model is ready and can be used when creating the main model!

Correcting layouts (optional)

After training, Transkribus represents the generated text regions as polygons whose shapes can be modified. This functionality is available only in the Transkribus Expert client, which provides advanced features for more intricate document processing.

Highlighted polygonal ground truths

The region marked 1 in the figure above shows a selected polygon. Each such shape is made of individual points joined by straight lines: the dots form the outline of the polygon, and each line segment connects two adjacent dots.

The tool marked 2 adds new points to an already selected shape. Points can be added to either the text region itself or its baseline, allowing finer adjustments.

The tool marked 3 in the figure above removes a selected point from the shape. It is particularly useful for refining or shortening baselines so that they match the layout of the document.

To reshape a polygon, move the points that define it so that the outline better matches the contour of the corresponding text block.
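The three editing operations described above (adding, removing, and moving points) can be modelled on a simple list of points. The sketch below is an illustration of the idea, not how Transkribus stores regions internally. It assumes integer pixel coordinates and treats a text region as a closed outline in which the last point connects back to the first; a baseline would instead be an open line with no closing edge.

```python
from dataclasses import dataclass, field

Point = tuple[int, int]


@dataclass
class Polygon:
    """A closed outline: consecutive points are joined by straight lines,
    and the last point connects back to the first."""
    points: list[Point] = field(default_factory=list)

    def edges(self) -> list[tuple[Point, Point]]:
        # Each edge joins a point to the next one, wrapping around at the end.
        n = len(self.points)
        return [(self.points[i], self.points[(i + 1) % n]) for i in range(n)]

    def add_point(self, after_index: int, point: Point) -> None:
        # Insert a new point between points[after_index] and the following point.
        self.points.insert(after_index + 1, point)

    def remove_point(self, index: int) -> None:
        # Fewer than three points would no longer enclose an area.
        if len(self.points) <= 3:
            raise ValueError("a polygon needs at least three points")
        del self.points[index]

    def move_point(self, index: int, new_position: Point) -> None:
        # Dragging a point only changes its coordinates; its neighbours stay the same.
        self.points[index] = new_position
```

Adding a point splits one edge into two, which is why extra points give finer control over the outline.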

In short, the Expert client lets you adjust polygonal ground truth by adding, moving, and removing the points that define each polygon, which gives fine-grained control over its shape.

For languages such as Balinese and Javanese, this feature is particularly helpful because the script and its baselines are more irregular than in Western scripts. Correcting the polygons improves the accuracy of the model being trained and, in turn, of the transcribed text.