User:A ka es/OpenRefine/wikimania2019 postersession

Poster Session at #wikimania2019 - empower yourself: first steps

Description File
* Wikimania 2019 - Poster Session
* The Magic of OpenRefine
 
The Real Magic of OpenRefine

Installation

Description Screenshot
Sources: Linux kit, Mac kit, Windows kit

Documentation for users, Installation Instructions:
"... it runs as a small web server on your own computer and you point your web browser at that web server in order to use Refine. So, think of Refine as a personal and private web application." Installation Instructions
 
start desktop

Acquiring Data

stored on your own computer

Source for data examples: (the-nerd.be)

Notes: You can select and upload several files at once (this is easiest if the files are in the same directory on your computer). This works well if all files have the same data structure.
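Before uploading several files together, it can help to check that they really share the same structure. The following is a minimal sketch outside of OpenRefine, assuming the files are CSV and that "same structure" means an identical header row; the function names and sample data are hypothetical.

```python
import csv
import io


def header_of(csv_text, delimiter=","):
    """Return the first row (the column names) of a CSV text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return next(reader, [])


def same_structure(csv_texts, delimiter=","):
    """Return True if all CSV texts have exactly the same header row."""
    headers = [header_of(text, delimiter) for text in csv_texts]
    return all(header == headers[0] for header in headers)


# Hypothetical sample data standing in for the content of three files.
file_a = "name,country\nAnna,BE\n"
file_b = "name,country\nBen,FR\n"
file_c = "country,name\nDE,Cara\n"

print(same_structure([file_a, file_b]))  # True: identical headers
print(same_structure([file_a, file_c]))  # False: column order differs
```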

"flat" data formats like .csv, .tsv, .xls, .xlsx, .ods

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - This Computer"
* main column: click the "Browse..." button
* choose the file from your local directory
* click "Next"
* the data is uploaded and a preview appears
* choose the data format (below the columns on the left side; it is usually detected automatically)
* check the options below the columns, try out combinations until the preview looks right, and update the preview
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import "flat" data formats
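To get a feeling for what the automatic format detection and the preview do with a flat file, here is a rough sketch using Python's standard `csv.Sniffer`. It is only an analogy, not OpenRefine's own detection logic; the function name `preview` and the sample text are assumptions made for illustration.

```python
import csv
import io


def preview(text, max_rows=3):
    """Guess the delimiter of a flat text and return it with the first rows."""
    # Only comma, semicolon and tab are considered as candidates here.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows[:max_rows]


# Hypothetical tab-separated sample.
sample = "name\tcity\tyear\nAnna\tBrussels\t2019\nBen\tStrasbourg\t2019\n"

delimiter, rows = preview(sample)
print(repr(delimiter))
for row in rows:
    print(row)
```

If the guessed delimiter is wrong, the columns in the preview look wrong as well, which is the same signal you use in the OpenRefine preview to adjust the options.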

structured data formats like .xml, .json

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - This Computer"
* main column: click the "Browse..." button
* choose the file from your local directory
* click "Next"
* specify the record path in the preview window (hover over the curly brackets and click to select the element that contains all the data you need)
* check the preview - if something is missing, click the "Please specify a record path first" button and start again
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import .json
import .xml
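The idea behind the record path can be sketched in a few lines of Python: you pick the element that holds the repeating records, and each record becomes one row. This is only an illustration under assumptions; the JSON sample, the helper names and the " - " column naming are hypothetical and not OpenRefine's exact behaviour.

```python
import json


def rows_at(data, path):
    """Follow a list of keys (the 'record path') and return what is found there."""
    node = data
    for key in path:
        node = node[key]
    return node


def flatten(record, prefix=""):
    """Turn nested objects into one flat row with joined column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + " - "))
        else:
            flat[name] = value
    return flat


# Hypothetical structured data: the records sit under the key "parliaments".
raw = """
{"meta": {"count": 2},
 "parliaments": [
   {"name": "Parliament A", "dates": {"start": "2019"}},
   {"name": "Parliament B", "dates": {"start": "2017"}}
 ]}
"""

data = json.loads(raw)
rows = [flatten(record) for record in rows_at(data, ["parliaments"])]
for row in rows:
    print(row)
```

Choosing a path that is too deep (for example only "dates") would drop the "name" column, which is why the preview should be checked before creating the project.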

special case .html

Description
* open the .html file in a browser
* copy the table structure
* paste it into the OpenRefine clipboard window

(see the next section)
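If copying from the browser is not practical (for example with many tables), the table can also be converted to tab-separated text with a small script and then pasted into the clipboard window. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it assumes simple tables without nested tables or merged cells.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_tsv(html):
    """Return the table cells as tab-separated lines."""
    parser = TableExtractor()
    parser.feed(html)
    return "\n".join("\t".join(row) for row in parser.rows)


# Hypothetical sample table.
sample = (
    "<table>"
    "<tr><th>name</th><th>city</th></tr>"
    "<tr><td>Anna</td><td>Brussels</td></tr>"
    "</table>"
)
print(html_table_to_tsv(sample))
```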

copy & paste from tables

Description Screencast
* copy a table structure from a source (e.g. a website, .pdf file, text file or spreadsheet)
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - Clipboard"
* paste the copied table structure into the clipboard window
* click the "Next" button below
* the data is uploaded and a preview appears
* choose the data format (below the columns on the left side; it is usually detected automatically)
* check the options below the columns, try out combinations until the preview looks right, and update the preview
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
copy & paste from tables

load data via API or URL

Source for data examples: abgeordnetenwatch.de API parliaments

Notes: You can request more than one URL at the same time: click the "Add Another URL" button and enter the next URL. Once all URLs are entered, click the "Next" button. This works well if you are sure that the data structure behind all URLs is the same.

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - Web Addresses (URLs)"
* paste or type the URL into the field
* click "Next"
* the next step depends on the data format: select a record path or check the options
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import from a single URL
import from several URLs at the same time
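Loading several URLs into one project amounts to fetching each address and stacking the records into one table. Here is a minimal sketch of that idea, assuming every URL returns JSON with the records under the same key. The URLs, the key name "records" and the extra "source_url" column are hypothetical; the demo uses a stand-in for the download so that it runs without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one URL and parse the response as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def load_all(urls, record_key, fetch=fetch_json):
    """Fetch every URL and combine the records into one list of rows."""
    rows = []
    for url in urls:
        payload = fetch(url)
        for record in payload[record_key]:
            # Remember where each row came from (own addition for this sketch).
            rows.append({**record, "source_url": url})
    return rows


# Hypothetical responses standing in for two real downloads.
fake_payloads = {
    "https://example.org/page1.json": {"records": [{"name": "A"}]},
    "https://example.org/page2.json": {"records": [{"name": "B"}, {"name": "C"}]},
}

rows = load_all(list(fake_payloads), "records", fetch=fake_payloads.__getitem__)
for row in rows:
    print(row)
```

If one of the payloads used a different key or structure, `load_all` would fail or produce uneven rows, which mirrors the note above about making sure the structure behind all URLs is the same.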

Exploring Data

Description Screencasts
Once you have a data project in OpenRefine, you can explore and edit the content in many ways; the easiest are facets and filters.
facets
editing directly in facets
filter
You can cluster values to find inconsistencies and correct them.
clustering values
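What a text facet and a key-collision clustering do can be approximated in a few lines of Python. This is a simplified sketch in the spirit of the "fingerprint" method, not OpenRefine's actual implementation; the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def text_facet(values):
    """Count how often each distinct value occurs, like a text facet."""
    return Counter(values)


def fingerprint(value):
    """Build a normalized key so that near-identical spellings collide."""
    value = value.strip().lower()
    # Reduce accented characters to plain ASCII where possible.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then sort and de-duplicate the words.
    value = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group different spellings that share the same fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical column values with small inconsistencies.
cities = ["Brussels", "brussels ", "Brussels.", "Strasbourg", "Strasbourg"]

print(text_facet(cities))
print(cluster(cities))  # [['Brussels', 'Brussels.', 'brussels ']]
```

The facet shows that "Brussels" appears in three spellings; the cluster groups them so that they can be merged into one value.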

Preparing Data

Description Screencast
The file in the example came with the following note:
"Brussels phone numbers start with +32(0)228 45; change the 5 to 9 for the fax.
Strasbourg phone numbers start with +33(0)388 1 75; again, change the 5 to 9 for the fax."

We need to create the fax numbers from the phone numbers and remove the "@" from the Twitter user names.
enriching and changing data
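The two transformations from the note can be written down as a small sketch. It assumes that the phone numbers are stored with exactly the prefixes quoted above and that the "@" only appears at the start of the Twitter name; the function names and sample values are hypothetical.

```python
# Phone prefix -> fax prefix, taken from the note: the trailing 5 becomes a 9.
FAX_PREFIXES = {
    "+32(0)228 45": "+32(0)228 49",      # Brussels
    "+33(0)388 1 75": "+33(0)388 1 79",  # Strasbourg
}


def fax_from_phone(phone):
    """Derive the fax number from a phone number, or None if the prefix is unknown."""
    for phone_prefix, fax_prefix in FAX_PREFIXES.items():
        if phone.startswith(phone_prefix):
            return fax_prefix + phone[len(phone_prefix):]
    return None


def clean_twitter(handle):
    """Remove surrounding whitespace and a leading '@' from a Twitter user name."""
    return handle.strip().lstrip("@")


# Hypothetical sample values.
print(fax_from_phone("+32(0)228 45123"))    # +32(0)228 49123
print(fax_from_phone("+33(0)388 1 75123"))  # +33(0)388 1 79123
print(clean_twitter("@example_mep"))        # example_mep
```

Only the prefix is changed, so a 5 that occurs later in the extension is left untouched.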

Combining Data

Description Screencast
There are two OpenRefine projects: the file from the European Parliament, enriched with the Q-numbers of the MEPs, and a Wikidata query.
We want to combine both to find out which MEPs have a parliamentary term entry in Wikidata and where the gaps are.
We use the Q-numbers as the key.
combining two projects
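Conceptually, combining the two projects is a lookup by key. The following sketch shows the idea with plain Python dictionaries; it is not how OpenRefine does it internally, and the column names ("qid", "name", "term") and the Q-numbers are hypothetical placeholders.

```python
def combine(meps, wikidata_rows, key="qid"):
    """Attach the Wikidata term entries to each MEP, matched by Q-number."""
    terms_by_qid = {}
    for row in wikidata_rows:
        terms_by_qid.setdefault(row[key], []).append(row["term"])

    combined = []
    for mep in meps:
        terms = terms_by_qid.get(mep[key], [])
        combined.append({**mep, "terms": terms, "has_term": bool(terms)})
    return combined


# Hypothetical rows: project 1 (Parliament file) and project 2 (Wikidata query).
meps = [
    {"name": "MEP One", "qid": "Q0000001"},
    {"name": "MEP Two", "qid": "Q0000002"},
]
wikidata_rows = [
    {"qid": "Q0000001", "term": "term entry A"},
]

combined = combine(meps, wikidata_rows)
gaps = [mep["name"] for mep in combined if not mep["has_term"]]
print(gaps)  # ['MEP Two']
```

The list of gaps contains exactly those MEPs whose Q-number has no matching row in the Wikidata result.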

Exporting Data

Description Screencast
You can export your data with one click to many formats: as an OpenRefine project to share with others, as a common spreadsheet format or csv/tsv, or as an HTML file. You can also make your own selection of columns with an exporter.
data export
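Exporting a selection of columns boils down to writing only the chosen fields in the chosen format. Here is a minimal sketch with Python's standard `csv` module; the function name and sample rows are hypothetical, and it only covers csv/tsv, not the other export formats.

```python
import csv
import io


def export_rows(rows, columns, delimiter=","):
    """Write only the selected columns as csv (or tsv with delimiter='\\t')."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip columns that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical project rows with one column we do not want to export.
rows = [
    {"name": "Anna", "city": "Brussels", "note": "internal"},
    {"name": "Ben", "city": "Strasbourg", "note": "internal"},
]

print(export_rows(rows, ["name", "city"]))
print(export_rows(rows, ["name", "city"], delimiter="\t"))
```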

"Magic" (Bonus)

regex

Description Screencast
first impressions ... (content and screencast are coming soon)

GREL

Description Screencast
first impressions ... (content and screencast are coming soon)

reconciliation services

Description Screencast
first impressions ... (content and screencast are coming soon)

"about" section => editing meta data

Description Screencast
If you work with many projects, using the metadata and tags to organize them is very useful. In case you missed this function: you can edit both in the "about" section of every project.
organizing OR-projects