Research:Wikipedia Edit Types
This project seeks to reboot past work on the automated classification of edit diffs (namely Halfaker and Taraborelli) in order to identify a basic taxonomy of edit types and a set of language-agnostic detectors for each edit type that can be used to analyze edits on Wikipedia.
Edit Diffs and Detectors
The initial phase of the project focused on the technical work of processing Wikipedia diffs and mapping the changes to basic edit types. The resulting Python library (mwedittypes) can identify insertions, removals, changes, and moves of the following types of nodes: tables, references, lists, formatting, categories, media, wikilinks, external links, templates, headings, comments, whitespace, punctuation, words (or characters), sentences, paragraphs, and sections.
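The sketch below is not the mwedittypes API. It is a hypothetical, much-simplified illustration of what "diffing node types" means: it extracts wikilinks, templates, and categories from two revisions with regular expressions and counts how many of each were inserted or removed. The patterns and the names `NODE_PATTERNS`, `count_nodes`, and `diff_node_types` are assumptions made for illustration; real wikitext parsing is far more involved.

```python
import re
from collections import Counter

# Hypothetical, simplified node patterns (not how mwedittypes parses wikitext).
NODE_PATTERNS = {
    "category": re.compile(r"\[\[Category:[^\]]+\]\]"),
    "wikilink": re.compile(r"\[\[(?!Category:)[^\]]+\]\]"),  # skip category links
    "template": re.compile(r"\{\{[^{}]+\}\}"),  # non-nested templates only
}


def count_nodes(wikitext):
    """Count occurrences of each node string, grouped by node type."""
    return {name: Counter(p.findall(wikitext)) for name, p in NODE_PATTERNS.items()}


def diff_node_types(prev, curr):
    """Return {node_type: {"insert": n, "remove": m}} for node types that differ."""
    before, after = count_nodes(prev), count_nodes(curr)
    result = {}
    for name in NODE_PATTERNS:
        inserted = after[name] - before[name]  # Counter subtraction keeps positives only
        removed = before[name] - after[name]
        counts = {}
        if inserted:
            counts["insert"] = sum(inserted.values())
        if removed:
            counts["remove"] = sum(removed.values())
        if counts:
            result[name] = counts
    return result


prev = "Paris is the [[capital]] of [[France]].{{Infobox city}}[[Category:Cities]]"
curr = "Paris is the [[capital city|capital]] of [[France]].{{Infobox city}}[[Category:Cities in France]]"
print(diff_node_types(prev, curr))
# {'category': {'insert': 1, 'remove': 1}, 'wikilink': {'insert': 1, 'remove': 1}}
```

Because this sketch compares multisets of node strings, an edited link appears as one removal plus one insertion, and a node that only moved is not detected at all; telling changes and moves apart requires aligning nodes between the two revisions.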
Edit Categories
The second phase of the project focuses on taking the core edit types and mapping them to higher-order categories of edits. For instance, this might involve identifying combinations of edit types that distinguish edits that generate content from those that maintain or annotate existing content.
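As a rough illustration of mapping combinations of edit types to higher-order categories, here is a minimal rule-based sketch. The input shape `{node_type: {action: count}}`, the groupings of node types, the category labels, and the function name `categorize_edit` are all hypothetical assumptions, not the project's actual taxonomy.

```python
# Hypothetical groupings of node types; not the project's actual taxonomy.
CONTENT_TYPES = {"word", "sentence", "paragraph", "section", "media", "table", "list"}
ANNOTATION_TYPES = {"reference", "category", "template", "wikilink", "external link"}
MAINTENANCE_TYPES = {"whitespace", "punctuation", "formatting", "comment", "heading"}


def categorize_edit(edit_types):
    """Map {node_type: {action: count}} to a coarse, hypothetical edit category."""
    # Node types that the edit actually touched.
    touched = {node for node, actions in edit_types.items() if sum(actions.values()) > 0}
    # Any insertion of a content-bearing node counts as generating content.
    added_content = any(
        edit_types[node].get("insert", 0) > 0 for node in touched & CONTENT_TYPES
    )
    if added_content:
        return "content generation"
    if touched & ANNOTATION_TYPES:
        return "annotation"
    if touched and touched <= MAINTENANCE_TYPES:
        return "maintenance"
    return "other"


print(categorize_edit({"word": {"insert": 40}, "reference": {"insert": 1}}))
# content generation
```

Rules are checked in order, so an edit that both adds text and adds a reference is labeled as content generation. A real classifier would likely weigh counts and combinations more carefully than these fixed rules do.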
Edit Summaries
The third phase of the project examines how edit types might be used to help improve edit summaries. It focuses on the hard case of auto-generating edit summary recommendations for edits that changed textual content on English Wikipedia.
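How the project generates summaries is not described here. As a hypothetical point of comparison, structural edits (references, categories, templates) can be summarized almost mechanically from edit-type counts, which helps explain why edits to textual content are the hard case. The sketch assumes an input of the form `{node_type: {action: count}}`; `summarize`, `pluralize`, and `VERBS` are illustrative names, not part of mwedittypes.

```python
# Hypothetical verbs for each action type.
VERBS = {"insert": "added", "remove": "removed", "change": "changed", "move": "moved"}


def pluralize(noun, count):
    """Naive English pluralization, adequate for simple node-type names."""
    if count == 1:
        return noun
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def summarize(edit_types):
    """Build a summary such as 'removed 2 categories; added 1 reference'."""
    parts = []
    for node, actions in sorted(edit_types.items()):
        for action, count in actions.items():
            if count <= 0 or action not in VERBS:
                continue  # skip empty counts and unknown actions
            parts.append(f"{VERBS[action]} {count} {pluralize(node, count)}")
    return "; ".join(parts)


print(summarize({"reference": {"insert": 1}, "category": {"remove": 2}}))
# removed 2 categories; added 1 reference
```

A template like this says nothing about what an added sentence means, so it cannot produce a useful summary for a textual change; that gap is what the textual-content case has to fill.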
Use Cases
The mwedittypes library can be used for a wide variety of use cases, some of which are listed below: