Research talk:Revision scoring as a service/Archived

Work log



Progress report: 2015-10-17

Hello all, your weekly report.

That was your weekly report. -- とある白い猫 chi? 14:34, 2 November 2015 (UTC)

Progress report: 2015-10-24

Hello all, your weekly report.

  • Travis CI fixed for revscoring. T116397 [1]
  • Testing coverage reports for revscoring. T116402 [2]
  • Draft implementation of SigClust in Python. T113761
  • Features from reverted edits extracted to be used in clustering. T110580
  • All Wikibase-related parts of pywikibot are pulled out to pywikibot/wikibase and used as a submodule. T108440

That was your weekly report. -- とある白い猫 chi? 14:41, 2 November 2015 (UTC)

Progress report: 2015-10-31

Hello all, your weekly report.

  • Features for a balanced set of reverted/not-reverted edits in Wikidata extracted. T116983
  • Wikidata revert detection model trained and tested. T116980
  • Wikidata revert model deployed to ORES just in time as a present for the Wikidata:Third Birthday. T116984
  • Configurable logging setup support for ORES implemented. T108421

That was your weekly report. -- とある白い猫 chi? 14:50, 2 November 2015 (UTC)

Thread on Toxic communities from wikimedia-l


There's a thread that started recently about aggressive behaviors in community spaces. See the wikimedia-l thread "On toxic communities". I replied to say that I thought there was some untapped potential in getting a dataset of on-wiki discussions -- possibly labeled by "toxicity" or "aggressiveness" -- out there for researchers to study and for machine learning projects like ours to try to make predictions with. Fluffernutter offered to help us gather a labeled dataset. I wanted to start a thread here to gauge interest in three components of this:

Open "conversations" dataset
I've already started work on building a common talk page parser for Wikimedia projects. See github.com/halfak/talk-parser. If we could finish that parser and start releasing regular datasets it generates, that would lubricate the gears of science in this area.
Labeled data
If we have a good dataset of interactions between editors in discussion spaces, we could run various subsets through Wikilabels to gather human judgement about the aggressiveness of conversations. This is something that Fluffernutter has volunteered to help us with. Such a labeled dataset would both help basic research into aggressiveness/toxicity and help us potentially build useful models for inclusion into ORES.
Revscoring/ORES model
We have a lot of options in revscoring since the "Scorer" pattern is very general. We don't just have to make basic probabilistic predictions about whether a discussion posting is "toxic" or not. A Scorer could also flag words/phrases that a user should be cautious about using when posting a message. As long as we can fit this "score" in a JSON document, we won't have to change revscoring or ORES to handle it.
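To make the idea concrete, here is a minimal sketch of what a scorer with a JSON-serializable output might look like. The CautionScorer class and its word list are hypothetical illustrations, not part of revscoring:

```python
import json
import re

# Hypothetical word list -- for illustration only.
CAUTION_WORDS = {"idiot", "stupid", "shut up"}


class CautionScorer:
    """A sketch of a scorer that flags words/phrases a user might want to
    reconsider before posting. Anything JSON-serializable works as a score."""

    def score(self, text):
        flagged = sorted(
            w for w in CAUTION_WORDS
            if re.search(r"\b" + re.escape(w) + r"\b", text, re.I)
        )
        return {
            "caution": len(flagged) > 0,
            "flagged_phrases": flagged,
        }


if __name__ == "__main__":
    scorer = CautionScorer()
    # The score fits in a JSON document, so ORES could serve it unchanged.
    print(json.dumps(scorer.score("Don't be an idiot about this."), indent=2))
```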

Thoughts? --EpochFail (talk) 16:41, 19 November 2015 (UTC)

Glad to hear it. See also Fluffernutter's comments here: 2015_Community_Wishlist_Survey#Machine-learning_tool_to_reduce_toxic_talk_page_interactions. --Andreas JN466 05:29, 24 November 2015 (UTC)

Progress report: 2015-11-07

Hello all, your weekly report.

  • We added language features for Dutch, German and Italian. T107590 T109367 T107591
  • Parallelism added to feature extraction. T117422
  • Duplicated the clustering with the old k-means strategy. T117253
  • Added a trim() function for gathering basic (non-modified) features. T117424
  • [Spike] Trained a model on a sample of 100K edits for wb-vandalism. T117258
  • [Spike] Figured out why clustering was behaving oddly. T118003
  • Compared the R sigclust implementation to the Python one. T118004

That was your weekly report. -- とある白い猫 chi? 07:38, 30 November 2015 (UTC)

Progress report: 2015-11-14

Hello all, your weekly report.

  • We generated revert models for the German, Hebrew, Indonesian, Italian, Dutch and Vietnamese Wikipedias. T118314 T118316 T118317 T118318 T116937 T118319
  • We deployed the edit quality model generated for Turkish Wikipedia to ORES. T118008
  • We launched an edit quality campaign on Wikilabels for the Russian, Ukrainian, Spanish and Dutch Wikipedias. T116478 T114502 T114507 T115210
  • We set up backpressure for ORES to limit queue sizes in Celery (see the sketch after this list). T115534
  • We deployed new revert models to ORES. T118564
  • We implemented soft thresholding in the Python sigclust implementation. T118583
  • Tested Python sigclust for the relationship between the full cluster & damaging clusters. T116403
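The backpressure idea, roughly: check the depth of the task queue before accepting new work and fail fast when it is too deep, so clients get a quick "overloaded" error instead of a timeout. A minimal sketch with illustrative names and limits (the real ORES implementation differs):

```python
import redis

# Illustrative values -- the real ORES configuration differs.
MAX_QUEUE_SIZE = 100
QUEUE_NAME = "celery"  # Celery's default Redis queue key

conn = redis.StrictRedis(host="localhost", port=6379)


def accept_scoring_request():
    """Refuse new work when the Celery queue is already too deep."""
    if conn.llen(QUEUE_NAME) > MAX_QUEUE_SIZE:
        raise RuntimeError("Server overloaded -- please retry later")
    # ...otherwise enqueue the scoring task as usual...
```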

That was your weekly report. -- とある白い猫 chi? 08:07, 30 November 2015 (UTC)

Progress report: 2015-11-21

Hello all, your weekly report.

  • We expanded the number of features of the Wikidata revert detector. T117254
  • Security review of revscoring and some dependencies. T110072

More to come! -- とある白い猫 chi? 06:26, 2 December 2015 (UTC)

Progress report: 2015-11-28

Picking up on a number of ongoing tasks that did not make it into last week's report...

That was your weekly report. -- とある白い猫 chi? 06:45, 2 December 2015 (UTC)

Media coverage for Revscoring/ORES

Pageviews to ORES and Revscoring docs: a graph of daily pageviews for m:Objective Revision Evaluation Service and m:Research:Revision scoring as a service shows a sudden burst in interest after a post on the Wikimedia blog.

Hey folks! Over the past couple of weeks, I have been working with the WMF Communications department and User:DarTar to write a blog post about our project. After going through a few iterations, the comms team got kind of excited about the potential for media attention, so we reached out to a couple of reporters that we knew. Well, coverage of the project has blown up. I've lost count of how many interviews I have given. I'll use this post as a sort of summary of the articles that are out there about the project. Please feel free to extend the list if you find any more articles. --Halfak (WMF) (talk) 17:12, 2 December 2015 (UTC)

I created a dedicated subpage, Research:Revision_scoring_as_a_service/Media, for easy transclusion, cross-linking, etc. --DarTar (talk) 18:11, 6 December 2015 (UTC)

Progress report: 2015-12-04

Hello all, your weekly report of our progress:

  • Flake8 fixes for aetilley/sigclust committed. T118730 [3] [4]
  • Parameter tuning utility implemented in Revscoring. T119769 [5]

That was your weekly report. -- とある白い猫 chi? 15:05, 1 January 2016 (UTC)

Progress report: 2015-12-11

Weekly report for your consumption!

  • Edit quality campaign launched for Wikidata. T120531 Wikidata:Edit labels
  • Implemented an ORES testing server that can be run against any wiki in a Vagrant testing environment. T120956 [6]
  • Edit quality campaign for Italian Wikipedia launched! T114505 w:it:Wikipedia:Labels
  • Revscoring hyperparameter tuning for all of the feature/label sets in editquality datasets. T121009
  • We ran a spike experimenting with bag-of-words badword features and general NLP strategies. T102343
  • We investigated an anomaly with vandalism detection on Water (Q283) caused by bad scaling in some features; fixed with wb-vandalism PR #17. T118731

That was the weekly report. -- とある白い猫 chi? 15:42, 1 January 2016 (UTC)

Progress report: 2015-12-18

Our weekly progress is detailed as follows.

  • We deployed a tuned random forest model for Wikidata. Tuning reports suggest that we can get a very high amount of fitness out of an RF model; see [7] (a sketch of the tuning approach follows this list). T121350
  • Initialized the edit type campaign for English Wikipedia! w:en:Wikipedia:Labels/Edit types T117237
  • Edit quality campaign for Indonesian Wikipedia launched! w:id:Wikipedia:Labels T114506
  • Completed a beta version of pcfg_scorer and approximated its overhead. T121258
  • We deployed edit types pilot campaign for English Wikipedia to gather initial user feedback. T121713
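Hyperparameter tuning for an RF model amounts to a search over the model's parameter grid, scored by a fitness measure. A minimal sketch using scikit-learn; the grid and scoring metric here are illustrative, not the actual values from the tuning report:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid -- not the actual grid from the tuning report.
param_grid = {
    "n_estimators": [80, 320, 640],
    "max_features": ["log2", "sqrt"],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    scoring="roc_auc",  # one common fitness measure
    cv=5,
)
# search.fit(feature_matrix, labels)   # features/labels from the editquality data
# search.best_params_, search.best_score_
```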

That was your weekly report! -- とある白い猫 chi? 15:51, 1 January 2016 (UTC)

Progress report: 2015-12-25

Presenting our progress for your consumption.

  • We implemented SemanticOperationsSelector for the edit types campaign. T121403
  • We implemented config merging for ORES (passwords and connection details). ORES should now be able to read multiple config files so that it can merge private or location-specific information into public configuration (see the sketch after this list). T122272 [8] This also required a new release of yamlconfig. [9]
  • We switched from the AOF+RDB to the RDB-only persistence strategy for the ORES Redis cache to minimize disk usage. T121658
  • We fixed the "monolingualtext datatype is not supported yet" bug. T118565
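The config merging amounts to a recursive dictionary merge across YAML files, with later (private) files overriding earlier (public) ones. A minimal sketch of the idea; the real implementation lives in the yamlconfig package, and the filenames below are hypothetical:

```python
import yaml  # PyYAML


def merge(base, overlay):
    """Recursively merge `overlay` into `base`; overlay values win."""
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        else:
            base[key] = value
    return base


def load_configs(*paths):
    """Later files (e.g. a private config with passwords) override earlier, public ones."""
    config = {}
    for path in paths:
        with open(path) as f:
            merge(config, yaml.safe_load(f) or {})
    return config


# e.g. load_configs("ores.yaml", "ores-private.yaml")  # hypothetical filenames
```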

That was your last weekly report of 2015! -- とある白い猫 chi? 16:03, 1 January 2016 (UTC)

Feedback, churnalism and 32C3 video on Watching Algorithms

Prediction and Control - Watching Algorithms. Helsby (32c3)

"Autorenschwund in der Wikipedia: Algorithmen als Ursache und Lösung?". Netzpolitik. 2015-12-18.

I am quite frustrated with the awful en:Churnalism on ORES by German media, which is mostly blindly copying WMF blog posts and American media reports. German Wikipedia DOES NOT use revert bots like enwiki, which makes all reference to Research:The Rise and Decline ("In order to maintain the quality of encyclopedic content in the face of exponential growth in the contributor community, Wikipedians developed automated (bots) and semi-automated tools (Huggle, Twinkle, etc.) to make the work of rejecting undesirable contributions waste as little effort as possible. (...) it was the successful implementation of algorithmic bureaucracy in form of bots that turned away larger portions of potential future editors.") pretty pointless with regard to German Wikipedia. --Atlasowa (talk) 20:36, 1 January 2016 (UTC)

Hey Atlasowa, we don't say that automated tools are the cause, but rather an exacerbating symptom of a larger switch toward restrictive quality control and primarily negative feedback for newcomers. German Wikipedia has flagged revisions and Huggle (and other tools, I imagine). IMO, Huggle's quality control dynamics are much more problematic than auto-revert bots that mostly deal in egregious damage, because Huggle users interact with what's left -- mostly good-faith newcomers. But I don't want to just point at this. I think that new page patrol is equally problematic. Any time we have a filter in place that primarily affects newcomers and is not designed to help them learn and contribute productively, they won't.
Regarding en:Churnalism, I agree. While it may be good for me and ORES that the media parrots our framing of what we are doing, I don't think it suggests good things for humanity/society as a whole. I reached out for comment to a lot of people who are critical of the politics of algorithms in social spaces. Regretfully, none of them chimed in, so the media is a Wikimedia Blog echo chamber for this round. --EpochFail (talk) 15:24, 21 January 2016 (UTC)

Progress report: 2016-01-01

Presenting our progress for your consumption.

  • We investigated an issue with the Dutch Wikipedia edit quality campaign: it had been loaded with the wrong revision IDs. T122511
  • We looked into error-correcting output codes in scikit-learn. (Spike) T105517
  • We investigated how Chinese writing variants are stored in Chinese Wikipedia. (Spike) T119687

That was your first report of 2016! -- とある白い猫 chi? 19:03, 21 January 2016 (UTC)

Progress report: 2016-01-08

Weekly progress report is as follows.

  • We investigated the Wikidata revert model's precision and recall to determine what portion of human edits will need to be reviewed. T122687
  • We introduced quality control and newcomer socialization tools with revscoring and ORES. These are as follows: T114246
    • Quality control tools
    • Newcomer socialization tools
    • MediaWiki integration
    • New model types
  • We created a MediaWiki extension for Wikilabels. This eliminates the reliance on the custom user script, which has proven confusing for some users. [10] T120664

That was the weekly report. -- とある白い猫 chi? 19:03, 21 January 2016 (UTC)

Progress report: 2016-01-15

Weekly progress report is summarized below.

  • We resolved a bug in Wikilabels where some revisions would not load and stale revision info remained. T122815
  • We implemented word frequency diff features. This way, badwords and the like are weighted by how frequently they already appear in the article: for example, inserting a specific curse word into an article about that word, or the word "Nazi" into an article on WW2, is not treated the same as inserting it into articles where such additions are typically disruptive (see the sketch after this list). T121003
  • We implemented common features between languages as a meta-language feature that goes beyond simple space-delimited words. This paves the way for features for the Chinese, Japanese and Korean (CJK) languages. T121008
  • We merged wb-vandalism features/datasources into revscoring. T122304
  • We implemented meta datasource/feature refactoring for revscoring, reducing code duplication. T121005
  • We implemented a balanced not-damaging/maybe-damaging edit extractor for "editquality". This is very useful for wikis dominated by bot edits, particularly smaller wikis. T120999
  • We added documentation on what Wikilabels "Campaigns" are for. T123129
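A minimal sketch of the word frequency diff idea described above: weight inserted badwords by how often the word already appears in the article. The function name and the discounting formula below are illustrative, not the actual revscoring feature code:

```python
from collections import Counter


def weighted_badword_delta(old_words, new_words, badwords):
    """Score badword insertions, discounted by how often the word already
    appears in the article. An article about a curse word (or 'Nazi' in a
    WW2 article) already uses the term, so adding it again contributes
    little; in other articles the same insertion scores fully."""
    old_freq = Counter(w.lower() for w in old_words)
    new_freq = Counter(w.lower() for w in new_words)
    delta = 0.0
    for word in badwords:
        added = max(new_freq[word] - old_freq[word], 0)
        delta += added / (1 + old_freq[word])  # discount by prior frequency
    return delta


# weighted_badword_delta("the war began".split(), "the nazi war began".split(), {"nazi"})
```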

That was the weekly report. -- とある白い猫 chi? 19:03, 21 January 2016 (UTC)

Progress report: 2016-01-22

Documenting our progress for the week.

  • We created Rule and Symbol objects in pcfg.py. This generalizes the types of rules that can be read into a PCFG object. T123759
  • We resolved a bug where the ORES "r" flag did not work when grouping in recent changes is disabled. T122766
  • We determined how to build a WP phrase-structure treebank. [11] T122728
  • We built a simple GUI for ORES. [12] T123348
  • We worked out issues with Sphinx in generating Revscoring docs where attributes were not being documented. T123124 T123758
  • We investigated and resolved RDB snapshot issue on ORES T122666

This was your weekly dose of our progress. -- とある白い猫 chi? 19:03, 21 January 2016 (UTC)

ORES UI visual JSON representation

Visual JSON mockup: a mockup of a JSON visualization presented next to an ORES prediction.

Ladsgroup has been working on a nice UI to sit on top of ORES. Currently, the UI uses a table to represent hierarchical data. I suggested we try some nested HTML divs or tables. I wanted to share a mockup of what I had in mind. See the mockup on the right. --EpochFail (talk) 15:53, 22 January 2016 (UTC)
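Rendering hierarchical JSON as nested divs is a small recursive transformation. A minimal sketch in Python (the CSS class names are hypothetical):

```python
import html


def render(value):
    """Recursively render a parsed JSON value as nested <div>s."""
    if isinstance(value, dict):
        items = "".join(
            '<div class="pair"><span class="key">%s</span>%s</div>'
            % (html.escape(str(k)), render(v))
            for k, v in value.items()
        )
        return '<div class="object">%s</div>' % items
    if isinstance(value, list):
        items = "".join('<div class="item">%s</div>' % render(v) for v in value)
        return '<div class="array">%s</div>' % items
    return '<span class="value">%s</span>' % html.escape(str(value))


# e.g. render({"damaging": {"prediction": False, "probability": {"true": 0.03}}})
```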

@EpochFail: That looks similar to e.g. Schema:Analytics. Maybe there is some code which can be reused? Helder 19:07, 10 February 2016 (UTC)
Ooh! Good point. I was digging around looking for a library that would do this for us. I'll dig into the code that presents schemas to see if there's something we can re-use. Thanks for pointing that out! --EpochFail (talk) 20:09, 10 February 2016 (UTC)

Why real-time catch is important

I'm writing this short essay: Research:Revision scoring as a service/Why real-time catch is important. Please read and comment :) Amir (talk) 17:34, 23 January 2016 (UTC)

Two proposals for new ORES behaviors

We should provide a means to give features/datasources to the ORES API that it will use when scoring. This will allow users to, for example, see how an editquality score changes for the same edit between anon & registered users, or see how an articlequality score changes with a few more references. I've posted two proposals (both of which I think are good) for how this could be accomplished. See also arlolra's WIP pull request: https://github.com/wiki-ai/ores/pull/115 --EpochFail (talk) 21:21, 9 March 2016 (UTC)
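To illustrate the first proposal, here is a hypothetical client interaction. The URL pattern and the injection parameter syntax are illustrative only -- settling the exact interface is what the proposals are for:

```python
import requests

# Hypothetical request -- revision ID, URL pattern and parameter name are
# illustrative; the exact syntax was still under discussion.
base = "https://ores.wikimedia.org/v2/scores/enwiki/damaging/123456"

as_is = requests.get(base).json()
as_anon = requests.get(
    base, params={"feature.revision.user.is_anon": "true"}
).json()

# Comparing the two scores shows how much the model's prediction depends
# on the editor being anonymous.
```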
This one has been bugging me for a long time. A user of ORES should never be surprised when we switch from, e.g., a LinearSVC model trained on a balanced set to a GradientBoosting model trained on a representative set, but these two models produce very different score ranges. Still, we should have the flexibility to deploy new modeling strategies. This proposal describes supporting multiple models for the same "modeling problem" in the form of "variants", which would allow ORES users to continue using the same URL pattern they know and love as well as giving them the ability to specify a "variant" that will provide better guarantees against sudden changes. This strategy would also allow us to continue updating and refining "variants" as we add new sources of signal (a sketch of the URL scheme follows). --EpochFail (talk) 21:21, 9 March 2016 (UTC)
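A sketch of what the "variants" URL scheme might look like, in the spirit of the proposal; these patterns are hypothetical, not a live API:

```python
# Bare model name: always the best current model; its score range may
# shift when we deploy a new modeling strategy.
#   /v2/scores/enwiki/damaging/123456
#
# Pinned variant (hypothetical syntax): a specific modeling strategy with
# a stable score range across deployments.
#   /v2/scores/enwiki/damaging:linear_svc_balanced/123456
#
# Clients that need stable probabilities pin a variant; clients that just
# want "the best current model" use the bare name.
```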

FA questioned

I have nominated some articles based on revscoring, but one got summarily rejected. en:Talk:Elizabeth_Catlett#GA_nomination. Duckduckstop (talk) 19:39, 4 April 2016 (UTC)

Hey Duckduckstop! Sorry for the delay. Thanks for letting us know about the false-positive. Generally, I would rely on the prediction models to help give a gist of the quality of an article. In the end, a real review from human eyes will be necessary. Still, it looks like there's some work that we could do in looking for obvious grammatical mistakes and other issues that were brought up. I've filed a task for that. See Phab:T132533. --EpochFail (talk) 23:49, 12 April 2016 (UTC)

Weekly update (April 8th)

Hey folks,

This is the weekly update for the Revision Scoring project for the week of April 2nd through April 8th.

New developments:

  • Solved some issues that blocked a major performance improvement for score requests using multiple models phab:T134781
  • Improved the performance of feature extraction for features that use mwparserfromhell phab:T134780
  • We applied regex performance optimizations to badwords and informal word detection for many languages phab:T134267

Maintenance and robustness:

  • Solved a regression in ScoredRevisions that caused most revisions in RecentChanges to not be scored phab:T134601
  • Set ORES load balancer to rebalance on 500 responses from a web node phab:T111806
  • Enabled CORS for error responses from ORES -- this makes it easier to report errors from a gadget on a wiki (see the sketch after this list) phab:T119325
  • Made the staging instance of Wikilabels look a lot more like the production instance phab:T134627
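On the CORS item above: the fix amounts to attaching the Access-Control-Allow-Origin header to error responses too, so that an on-wiki gadget can read the error body instead of getting an opaque cross-origin failure. A minimal sketch, assuming a Flask app like ORES'; the handler shown is illustrative:

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.errorhandler(500)
def server_error(e):
    """Attach the CORS header to the error response so cross-origin
    JavaScript (e.g. an on-wiki gadget) can read it."""
    response = jsonify({"error": {"code": "internal error", "message": str(e)}})
    response.status_code = 500
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response
```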

Stay tuned --EpochFail (talk) 21:11, 10 May 2016 (UTC)Reply

[Cross-post] Including new filter interface in ORES review tool

The new filtering interface demo

Hey folks,

I made a post at mw:Topic:Tflhjj5x1numzg67 about including the new advanced filtering interface that the Collaboration Team is working on in the ORES beta feature. See the original post and add any discussion points there. --EpochFail (talk) 23:05, 18 November 2016 (UTC)

Moved this page!

The new home for this team is mw:Wikimedia Scoring Platform team. See you there! --Halfak (WMF) (talk) 22:25, 16 May 2017 (UTC)

Join my Reddit AMA about ORES

Hey folks, I'm doing an experimental Reddit AMA ("ask me anything") in r/IAmA on June 1st at 21:00 UTC. For those who don't know, I create artificial intelligences, like ORES, that support the volunteers who edit Wikipedia. I've been studying the ways that crowds of volunteers build massive, high-quality information resources like Wikipedia for over ten years.

This AMA will allow me to channel that for new audiences in a different (for us) way. I'll be talking about the work I'm doing with the ethics and transparency of the design of AI, how we think about artificial intelligence on Wikipedia, and ways we’re working to counteract vandalism. I'd love to have your feedback, comments, and questions—preferably when the AMA begins, but also on the ORES flow board.

If you'd like to know more about what I do, see my WMF staff user page, this Wired piece about my work, or my paper, "The Rise and Decline of an Open Collaboration System: How Wikipedia’s reaction to popularity is causing its decline". --EpochFail (talk) 15:42, 24 May 2017 (UTC)
