Research:MediaWiki events: a generalized public event datasource
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
Wiki-tool builders and researchers rely on various sources of information about what has happened and what is currently happening in Wikipedia. These datasources tend to be structured differently and to contain incomplete or poorly structured information. Some datasources can be queried, but "listening" to ongoing events through them is complex; others are intended only for "listening" to current events. In this project, we'll describe a common structure for public events in MediaWiki that mimics recentchanges but also contains historical information. We'll also explore means of implementing this functionality on top of existing datasources and propose infrastructure changes that would improve the efficiency and completeness of the data.
Events
Available datasources
- API
  - list=recentchanges -- Gathers a joined set of revision/logging events and does some event metadata parsing
- MySQL database
  - recentchanges -- Sequences both revision and logging events.
  - revision -- Revision and page creation events.
  - logging -- All non-revision and page creation events.
- RCStream -- see https://wikitech.wikimedia.org/wiki/RCStream
- IRC Stream -- see Research:Data#IRC_Feeds
- EventLogging -- see mw:Extension:EventLogging
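For historical data, recentchanges-style sequencing has to be rebuilt from the revision and logging tables, because recentchanges itself only keeps a limited window of recent activity. The following is a minimal Python sketch of that merge. It assumes both tables have already been read in timestamp order and that each row's timestamp has been copied to a common "timestamp" key (a hypothetical convention). It uses the standard library's heapq.merge, which lazily interleaves already-sorted iterables.

```python
import heapq
from typing import Dict, Iterable, Iterator

# Hypothetical row shape: a dict with a 14-digit MediaWiki timestamp
# ("YYYYMMDDHHMMSS") under "timestamp". Real rows use rev_timestamp /
# log_timestamp; renaming them is assumed to happen upstream.
Row = Dict[str, str]


def tag_rows(rows: Iterable[Row], source: str) -> Iterator[Row]:
    """Annotate each row with the table it came from."""
    for row in rows:
        yield {**row, "source": source}


def merge_history(revisions: Iterable[Row], log_entries: Iterable[Row]) -> Iterator[Row]:
    """Interleave two timestamp-ordered streams into one ordered stream.

    Both inputs must already be sorted by timestamp. 14-digit MediaWiki
    timestamps sort correctly as strings, so no parsing is needed.
    """
    return heapq.merge(
        tag_rows(revisions, "revision"),
        tag_rows(log_entries, "logging"),
        key=lambda row: row["timestamp"],
    )


if __name__ == "__main__":
    revisions = [
        {"timestamp": "20140729000100", "rev_id": "1"},
        {"timestamp": "20140729000300", "rev_id": "2"},
    ]
    log_entries = [{"timestamp": "20140729000200", "log_id": "10"}]
    for row in merge_history(revisions, log_entries):
        print(row["timestamp"], row["source"])
```

Because heapq.merge is lazy, this can stream over arbitrarily long table scans without loading them into memory.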
Relevant events
- RevisionSaved
- RevisionsDeleted
- PageCreated
- PageMoved
- PageDeleted
- PageRestored
- PageProtectionModified
- UserRegistered
- UserRenamed
- UserRightsModified
- UserBlocked
- UserUnblocked
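Much of the domain knowledge this project consolidates amounts to a mapping from raw records to the events listed above. A minimal sketch of such a mapping for rows returned by api.php list=recentchanges follows. It assumes rows were requested with rcprop=loginfo so that log rows carry logtype and logaction. The classify function and LOG_EVENTS table are hypothetical, and the table is deliberately partial.

```python
from typing import Any, Dict, Optional

# Hypothetical, partial mapping from (logtype, logaction) pairs, as reported
# by list=recentchanges with rcprop=loginfo, to the event names listed above.
LOG_EVENTS = {
    ("delete", "revision"): "RevisionsDeleted",
    ("move", "move"): "PageMoved",
    ("move", "move_redir"): "PageMoved",
    ("delete", "delete"): "PageDeleted",
    ("delete", "restore"): "PageRestored",
    ("protect", "protect"): "PageProtectionModified",
    ("protect", "modify"): "PageProtectionModified",
    ("protect", "unprotect"): "PageProtectionModified",
    ("newusers", "create"): "UserRegistered",
    ("newusers", "autocreate"): "UserRegistered",
    ("renameuser", "renameuser"): "UserRenamed",
    ("rights", "rights"): "UserRightsModified",
    ("block", "block"): "UserBlocked",
    ("block", "unblock"): "UserUnblocked",
}


def classify(rc_row: Dict[str, Any]) -> Optional[str]:
    """Return the event name for one recentchanges row, or None if unmapped."""
    rc_type = rc_row.get("type")
    if rc_type == "edit":
        return "RevisionSaved"
    if rc_type == "new":
        # A page creation is also a saved revision; a fuller implementation
        # might emit both PageCreated and RevisionSaved here.
        return "PageCreated"
    if rc_type == "log":
        return LOG_EVENTS.get((rc_row.get("logtype"), rc_row.get("logaction")))
    return None
```

Returning None for unmapped rows keeps unknown log types visible to the caller instead of silently forcing them into an existing event type.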
Desired functionality
Listening
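import mw_events  # proposed library, not yet implemented
from mw_events import RevisionSaved, RevisionsDeleted
for event in mw_events.listen(start="20140729000000"):
    # do thing with event
    if isinstance(event, RevisionSaved):
        revision_saved = event
        # do thing with revision_saved
    elif isinstance(event, RevisionsDeleted):
        revision_deleted = event
        # do thing with revision_deleted
    else:
        pass  # ignore other event types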
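One way to offer listening on top of a queryable source is to poll it. The sketch below is a hypothetical helper, not the proposed mw_events.listen(). It assumes a fetch callable that returns recentchanges rows newer than a given rcid, oldest first, and uses rcid as a resume cursor. This is a simplification: concurrent writes can make rows visible out of ID order, which a production listener would need to handle.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical signature: fetch(since_rcid) returns recentchanges rows whose
# "rcid" is greater than since_rcid, oldest first. In practice this could
# wrap an api.php list=recentchanges request; it is injected here so the
# polling logic can be shown without network access.
Fetch = Callable[[int], List[Dict[str, Any]]]


def listen(fetch: Fetch, since_rcid: int = 0, interval: float = 5.0,
           max_polls: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Turn a queryable source into a stream by polling it repeatedly."""
    polls = 0
    while max_polls is None or polls < max_polls:
        rows = fetch(since_rcid)
        for row in rows:
            # Advance the cursor so the next poll only asks for newer rows.
            since_rcid = max(since_rcid, row["rcid"])
            yield row
        polls += 1
        if not rows:
            time.sleep(interval)  # nothing new yet; wait before polling again
```

The max_polls parameter exists only to make the generator terminate in tests; a real listener would leave it as None.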
Querying
import mw_events  # proposed library, not yet implemented
from mw_events import RevisionSaved
events = mw_events.query(start="20140729000000", end="20140731000000", types={RevisionSaved})
for revision_saved in events:
    # do thing with revision_saved
    pass
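A query over the live API has to page through results. The sketch below builds a chronological list=recentchanges request and follows the API's continuation protocol, in which the response's "continue" object is merged into the next request's parameters. The HTTP call is injected as a hypothetical get_json function so the paging logic stands alone; a real client should also send a descriptive User-Agent header, as Wikimedia's API etiquette asks. The chosen rcprop values are an assumption.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def recentchanges_params(start: str, end: str) -> Dict[str, str]:
    """Parameters for a chronological list=recentchanges query."""
    return {
        "action": "query",
        "list": "recentchanges",
        "format": "json",
        "rcdir": "newer",  # oldest first; rcstart must be before rcend
        "rcstart": start,
        "rcend": end,
        "rcprop": "ids|title|timestamp|user|comment|loginfo",
        "rclimit": "max",
    }


def query_all(get_json: Callable[[str], Dict[str, Any]],
              params: Dict[str, str]) -> Iterator[Dict[str, Any]]:
    """Yield every recentchanges row, following API continuation."""
    cont: Dict[str, str] = {}
    while True:
        data = get_json(API_URL + "?" + urlencode({**params, **cont}))
        yield from data.get("query", {}).get("recentchanges", [])
        if "continue" not in data:
            break  # no continuation object means this was the last page
        cont = data["continue"]  # e.g. {"rccontinue": ..., "continue": "-||"}
```

Note that recentchanges only covers a limited recent window, so a query reaching further back would have to fall back to the revision and logging tables.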
Dumps
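from mw_events import MWEventReader, UserRegistered  # proposed library, not yet implemented
mw_event_reader = MWEventReader("event_dump.enwiki.1.json.7z")
for user_registered in mw_event_reader.filter(types={UserRegistered}):
    # do thing with user_registered
    pass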
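The dump format is not specified on this page. Assuming, hypothetically, that a dump decompresses to JSON lines with one event object per line and a "type" field naming the event, reading and filtering it could look like the sketch below. Decompressing the .7z archive is left out, since Python's standard library has no 7z support; read_events and filter_events are illustrative names.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Set


def read_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse one JSON event per non-blank line (hypothetical dump format)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def filter_events(events: Iterable[Dict[str, Any]],
                  types: Optional[Set[str]] = None) -> Iterator[Dict[str, Any]]:
    """Keep only events whose "type" is in types (all events if types is None)."""
    for event in events:
        if types is None or event.get("type") in types:
            yield event
```

Since an open text file iterates line by line, a decompressed dump could be streamed with filter_events(read_events(open(path, encoding="utf-8")), {"UserRegistered"}).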
Relevant bugs
Standardization
- MediaWiki events
  - consolidates domain knowledge and wiki archaeology
  - hides complexity -- produces standardized data structures
  - reads from the MySQL database and api.php; extensible to new formats
  - produces JSON
  - provides a special Unavailable datatype to flag critical data that is not currently available
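Distinguishing "unavailable" from "empty" matters when serializing to JSON, where both would otherwise collapse into null. A minimal sketch, assuming a singleton sentinel and a hypothetical {"unavailable": true} JSON encoding; a username hidden by revision deletion is one plausible case where data exists but cannot currently be read.

```python
import json
from typing import Any


class Unavailable:
    """Sentinel for critical data that exists but cannot currently be read,
    as distinct from None (which means the value is genuinely absent)."""

    _instance = None

    def __new__(cls) -> "Unavailable":
        # Singleton: every Unavailable() call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "Unavailable"


UNAVAILABLE = Unavailable()


class EventEncoder(json.JSONEncoder):
    """Serialize Unavailable as a tagged object so consumers can tell it from null."""

    def default(self, o: Any) -> Any:
        if isinstance(o, Unavailable):
            return {"unavailable": True}  # hypothetical wire representation
        return super().default(o)
```

Usage: json.dumps(event_dict, cls=EventEncoder). Using a tagged object rather than a magic string avoids colliding with legitimate field values.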
Support needed
- DBAs at the Wikimedia Foundation to explore ways of publishing events through the EventLogging infrastructure
- Developers working in languages other than Python to discuss cross-language API similarities