Research:MediaWiki events: a generalized public event datasource
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
Wiki-tool builders and researchers rely on various sources of information about what has happened and what is currently happening in Wikipedia. These data sources tend to be structured differently and contain incomplete or poorly structured information. Some datasources are queryable but make it complex to "listen" to ongoing events, while others are intended only for "listening" to current events. In this project, we'll describe a common structure for public events in MediaWiki that mimics recentchanges but also contains historical information. We'll also explore means of implementing this functionality on top of existing datasources and propose changes to infrastructure that would allow us to improve the efficiency and completeness of the data.
Events
Available datasources
- API
  - list=recentchanges -- Gathers a joined set of revision/logging and does some event metadata parsing
- MySQL db
  - recentchanges -- Sequences both revision and logging events.
  - revision -- Revision and page creation events.
  - logging -- All non-revision and page creation events.
- RCStream -- see https://wikitech.wikimedia.org/wiki/RCStream
- IRC Stream -- see Research:Data#IRC_Feeds
- EventLogging -- see mw:Extension:EventLogging
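As a rough illustration of the API datasource, the sketch below builds a list=recentchanges request URL that starts at a given timestamp and returns changes oldest first. The helper name `recentchanges_url` and the chosen property list are assumptions made for this example; only the api.php parameters themselves come from MediaWiki.

```python
from urllib.parse import urlencode


def recentchanges_url(api_url, start, limit=50):
    """Build an api.php URL listing recent changes from `start` onward."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start,          # MediaWiki timestamp, e.g. 20140729000000
        "rcdir": "newer",          # oldest first, so events arrive in order
        "rctype": "edit|new|log",  # revisions, page creations and log events
        "rcprop": "ids|title|timestamp|user|comment|loginfo",
        "rclimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


url = recentchanges_url("https://en.wikipedia.org/w/api.php", "20140729000000")
print(url)
```

Requesting `rcdir=newer` keeps the results in chronological order, which is what a listener that resumes from the last seen event would most likely want.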
Relevant events
- RevisionSaved
- RevisionsDeleted
- PageCreated
- PageMoved
- PageDeleted
- PageRestored
- PageProtectionModified
- UserRegistered
- UserRenamed
- UserRightsModified
- UserBlocked
- UserUnblocked
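One way the event names listed above might be derived from a recentchanges row is sketched below. The mapping is an assumption made for illustration, not the project's definition: it relies on the `type`, `logtype` and `logaction` values that the recentchanges API returns, it is not exhaustive, and it treats a "new" row as a page creation only, although a page creation also saves a revision.

```python
# (logtype, logaction) pairs where the action decides the event name.
# Hypothetical, non-exhaustive mapping onto the event names in this section.
LOG_ACTION_EVENTS = {
    ("delete", "revision"): "RevisionsDeleted",
    ("delete", "delete"): "PageDeleted",
    ("delete", "restore"): "PageRestored",
    ("block", "block"): "UserBlocked",
    ("block", "unblock"): "UserUnblocked",
}

# Log types where every action is treated as the same event (a simplification).
LOG_TYPE_EVENTS = {
    "move": "PageMoved",
    "protect": "PageProtectionModified",
    "newusers": "UserRegistered",
    "renameuser": "UserRenamed",
    "rights": "UserRightsModified",
}


def event_name(rc_row):
    """Return the event name for a recentchanges row, or None if unmapped."""
    rc_type = rc_row.get("type")
    if rc_type == "edit":
        return "RevisionSaved"
    if rc_type == "new":
        return "PageCreated"
    if rc_type == "log":
        key = (rc_row.get("logtype"), rc_row.get("logaction"))
        if key in LOG_ACTION_EVENTS:
            return LOG_ACTION_EVENTS[key]
        return LOG_TYPE_EVENTS.get(rc_row.get("logtype"))
    return None


print(event_name({"type": "log", "logtype": "block", "logaction": "unblock"}))
```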
Desired functionality
Listening
for event in mw_events.listen(start="20140729000000"):
    # do thing with event
    if isinstance(event, RevisionSaved):
        revision_saved = event
        # do thing with revision_saved
    elif isinstance(event, RevisionsDeleted):
        revisions_deleted = event
        # do thing with revisions_deleted
    else:
        pass
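A listener like the one above could plausibly be built on top of a queryable datasource by polling. The following is a minimal sketch under that assumption: `fetch` is a hypothetical callable that returns (timestamp, event) pairs newer than a given timestamp, and `max_polls` exists only so the example terminates.

```python
import time


def listen(fetch, start, poll_interval=1.0, max_polls=None):
    """Yield events from `fetch`, resuming after the last timestamp seen.

    `fetch(since)` is a hypothetical callable returning a list of
    (timestamp, event) pairs strictly newer than `since`, oldest first.
    """
    since = start
    polls = 0
    while max_polls is None or polls < max_polls:
        batch = fetch(since)
        for timestamp, event in batch:
            since = timestamp  # remember where to resume on the next poll
            yield event
        polls += 1
        if not batch:
            time.sleep(poll_interval)  # nothing new yet; wait before retrying


# Stand-in datasource for the example; timestamps sort lexicographically.
log = [("20140729000001", "a"), ("20140729000002", "b")]


def fake_fetch(since):
    return [(ts, ev) for ts, ev in log if ts > since]


received = list(listen(fake_fetch, "20140729000000", poll_interval=0, max_polls=3))
print(received)
```

Resuming by timestamp alone can skip events that share a timestamp with the last one seen, so a real implementation would probably resume from a unique, increasing identifier instead.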
Querying
events = mw_events.query(start="20140729000000", end="20140731000000", types={RevisionSaved})
for revision_saved in events:
    # do thing with revision_saved
    pass
Dumps
mw_event_reader = MWEventReader("event_dump.enwiki.1.json.7z")
for user_registered in mw_event_reader.filter(types={UserRegistered}):
    # do thing with user_registered
    pass
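The dump format is not specified here, so the sketch below makes two assumptions: that the dump has already been decompressed into one JSON object per line, and that each object carries a hypothetical "type" field naming its event. Under those assumptions, filtering by event type might look like this:

```python
import io
import json


def read_events(lines, types=None):
    """Yield decoded events from JSON lines, optionally filtered by type name."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        event = json.loads(line)
        if types is None or event.get("type") in types:
            yield event


# In-memory stand-in for a decompressed dump file (contents are made up).
dump = io.StringIO(
    '{"type": "UserRegistered", "user": "Example"}\n'
    '{"type": "RevisionSaved", "rev_id": 1}\n'
)

registered = list(read_events(dump, types={"UserRegistered"}))
print(registered)
```

Because the reader is a generator over lines, it processes a dump in a single streaming pass without loading the whole file into memory.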
Relevant bugs
Standardization
- MediaWiki events
  - consolidates domain knowledge and wiki archaeology
  - hides complexity -- produces standardized data structures
  - reads from MySQL database and api.php. Extendable to new formats.
  - produces JSON
  - provides a special Unavailable datatype to flag critical data that is not currently available
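The Unavailable datatype could be modeled as a sentinel that is distinct from None, so that "known to be empty" and "not currently obtainable" stay separate in the JSON output. The class name, the `encode` hook and the `{"unavailable": true}` encoding below are all assumptions made for this sketch.

```python
import json


class UnavailableType:
    """Singleton marking a value that exists but cannot currently be obtained."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Unavailable"

    def __bool__(self):
        return False


Unavailable = UnavailableType()


def encode(value):
    """`default` hook for json.dumps: give Unavailable an explicit encoding."""
    if value is Unavailable:
        return {"unavailable": True}
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


# Example event with made-up field names: page_id could not be recovered,
# while comment is known to be empty.
event = {"type": "PageMoved", "page_id": Unavailable, "comment": None}
text = json.dumps(event, default=encode, sort_keys=True)
print(text)
```

Encoding the sentinel as an object rather than as null lets a consumer tell missing data apart from empty data after a round trip through JSON.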
Support needed
- DBAs at the Wikimedia Foundation to explore means of publishing EventLogging infrastructure
- Developers in non-Python languages to talk over cross-language API similarities
See also