Community Tech/Page Curation and New Pages Feed improvements
This page documents a project the Wikimedia Foundation's Community Tech team has worked on or declined in the past. Technical work on this project is complete.
We invite you to join the discussion on the talk page.
Background
This project was proposed in the Community Wishlist 2019 and was voted #1 with 157 votes. The Community Tech team has committed to addressing as many of the project goals as possible.
In 2018, the Community Tech and Growth teams worked on a project aimed at general improvement of the AfC process. For more information, see the project page and background research for the project.
Problem statement
The wishlist proposal presents a broad goal of improving the New Page Review process and lists key Phabricator tickets that are important for the project. These tickets were prioritized and deemed important by the NPP community. They are listed below (in no particular order).
Task title | Phabricator link | Notes |
---|---|---|
Redirects with RfD tags should still display in the New Pages Feed as 'Nominated for deletion' | task T157046 | Done |
'Potential Issues' flagged in Page Curation Toolbar Page Info flyout | task T207847 | Done |
Allow filtering by no citations in page curation | task T169120 | Done |
Send Message to creator without needing to 'unreview'/'re-review' the article | task T207442 | Done |
Page curation adds text to first deletion discussion page if it already exists | task T169441 | Done |
Implement addition of un-redirected pages to Special:NewPages and Special:NewPagesFeed | task T92621 | Declined after analysis (details below) |
Redirects converted into articles should appear in the New Pages Feed indexed by the date of creation and creator of the article, not of the redirect | task T157048 | Declined after analysis (details below) |
Adding a "Potential COI" alert to the feed | task T207757 | Declined. Solution proposed in T233115. No consensus reached. |
Add "previously deleted" as a possible issue (flagged in red) in the New Pages Feed/Page Curation Tool | task T189929 | Done |
Allow filtering by date range in Special:NewPagesFeed | task T167475 | Done |
Special:NewPageFeed - add option to filter by pageviews | task T207238 | Declined. Original proposal was out of scope. Alternative proposed in T230567. No consensus reached. |
Keyword Search for New Pages Feed | task T207761 | Declined after analysis (details below) |
Enable page curation tools to be loaded on any page (optionally) | task T207485 | Done |
Reviewer Notes system in Page Curation Tools: system for reviewers to flag talk page comments on new pages to other reviewers | task T207452 | Done |
Tagging Feedback in Page Curation Tools should also be sent to talk page | task T207443 | Done |
Page Curation Tools to add userspace CSD Log/PROD Log functionality | task T207237 | Done |
Dragable Corners on Page Curation toolbar windows (for resizing) | task T207439 | Done |
Page Curation toolbar: do not mark pages as 'reviewed' when adding CSD and PROD tags | task T208685 | Done |
Make PageTriage wiki agnostic | task T50552 | Declined after analysis (details below) |
Status updates
December 17, 2019
Hello, everyone! We have an important update for the Page Curation community: The Page Curation Improvements project is now complete, after 7+ months of work. As of today, we have addressed all 19 requests (with explanations below). This project was a huge endeavor! We released 17 changes that fulfilled 13 separate requests. Additionally, we dedicated substantial time and resources to this project, and we collaborated with many community members.
For this project, our goal was to address all 19 requests. This meant that, whenever possible, we tried our hardest to fulfill each wish, which required: early investigations and mockups, finalization of the requirements (i.e., technical, product, and design), implementation of the work by engineers, technical review and testing, and release of the changes. This was followed by community outreach in order to validate the changes.
Unfortunately, some wishes were out of scope. They were simply too large or complex, so they were inappropriate for our team. However, we still wanted to address these wishes. This meant that we shared technical analyses of each wish, which outlined the primary challenges that we faced. We also tried to propose an alternative approach that was technically feasible, whenever possible, to the Page Curation community.
In total, we dedicated considerable effort to each request. We have shared the details of the remaining work below, and we thank you for all of your feedback!
Recently Completed Work:
- T207443: Feedback for creator should also be posted to article talk page: You can now post messages to the article creator within the Page Curation toolbar, which will be posted to both the article talk page and the creator’s talk page.
- T167475: Allow filtering by date range in Special:NewPagesFeed: You can now filter by date range in the New Pages Feed.
- T231357: Page Curation should create a new AfD discussion page if one already exists: A new AfD discussion page is now created for each AfD discussion, rather than appending to the original one. In order to do this work, we first conducted an investigation, which determined that the work was feasible and within the scope of the team.
- T207485: Enable page curation tools to be loaded on any page (optionally): You can now see a new link under the Tools menu on article pages: “Add article to NewPagesFeed.” This link is on all article pages that are not already in the feed. When a user clicks on that link, the article gets added to the PageTriage feed and the Curation toolbar becomes visible.
- T207237: Page Curation Tools to add userspace CSD Log/PROD Log functionality: We have completed our portion of the work, which is to add a hook for PageTriage after specific actions are taken (to allow access from gadgets).
- Explanation from Engineers: When you tag a page for speedy or proposed deletion using Twinkle, an entry is added to a log page in your userspace. It was requested that Page Curation do the same thing, and respect the user’s Twinkle preferences. It was determined that this kind of reliance on a gadget can’t happen from Page Curation. Instead, we added “hooks” to allow scripts and gadgets to integrate with Page Curation. Twinkle developers are now discussing adding the necessary code, which would implement the CSD/PROD logging functionality for users of Page Curation.
- T233729: Page Curation: Provide Customized Message to Previous Reviewer: After feedback from the Page Curation community related to T207442, we have restored the ability to send custom messages to previous reviewers (when a page is unreviewed) via Page Curation. Thank you, everyone, for explaining the importance of this functionality and working with us to find a reasonable solution.
- T239749: Update Informational Text (custom message to previous reviewer): This ticket was complementary to T233729. Now, users can see targeted messaging when a message is only posted to the previous reviewer's talk page.
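As an illustration of the T231357 behavior described above, here is a minimal Python sketch of choosing a title for a new AfD discussion page when earlier ones already exist. It is not PageTriage code: the function names, the title prefix, and the "(2nd nomination)" suffix pattern are assumptions made for this example.

```python
# Assumed title prefix for AfD discussion pages (illustrative only).
AFD_PREFIX = "Wikipedia:Articles for deletion/"


def ordinal(n: int) -> str:
    """Return the English ordinal for n, e.g. 2 -> '2nd', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def next_afd_title(article: str, existing_titles: set[str]) -> str:
    """Pick the first discussion title for `article` that is not in use.

    `existing_titles` is a hypothetical stand-in for a page-existence check.
    """
    base = AFD_PREFIX + article
    if base not in existing_titles:
        return base
    n = 2
    while f"{base} ({ordinal(n)} nomination)" in existing_titles:
        n += 1
    return f"{base} ({ordinal(n)} nomination)"
```

With no earlier discussion, the sketch returns the base title. Otherwise it counts upward until it finds an unused title, so an earlier discussion page is never appended to.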
Work That Presented Challenges:
- T207757: Adding a Potential COI Alert: As a team, we wanted to find a way to make this work, so we communicated a proposed solution, which we shared on Phabricator, Meta-Wiki, and Wikipedia. This proposal came out of careful discussion (between engineers, product, and UX/design) about how we could implement a manageable solution, after first broadly discussing the request on Phabricator. However, this proposal never reached a general consensus among the Page Curation community, with some users expressing ambivalence regarding its usefulness. We didn’t want to continue with the work unless we felt there was strong community support, so we did not proceed with this request.
- T207238: Special:NewPageFeed - add option to filter by pageviews: The original request was too large in scope, and we outlined the reasons (from an engineering standpoint) in the August 20th update. At that time, we also presented an alternative proposal: Display the number of pageviews in the article record, without allowing for sorting or filtering. This alternative was shared on Phabricator, Meta-Wiki, and Wikipedia, and a ticket was written to potentially take on this work (T230567: Display Number of Pageviews in New Pages Feed). Like the COI alert alternative, this proposal came out of careful discussion (between engineers, product, and UX/design) about how we could implement a manageable solution. However, this proposal never reached a general consensus among the Page Curation community, with some users expressing ambivalence regarding its usefulness. We didn’t want to continue with the work unless we felt there was strong community support, so we were unable to proceed with this request.
- T207761: Keyword Search for New Pages Feed: This work was ruled as out of scope, after analysis from the team.
- Explanation from engineers: Like all NPP requests, we hoped that we could make this work, but it’s unfortunately beyond the scope of the team. Keyword search is an extremely volatile operation with significant performance implications. To search through the entire content of a revision for each page, stored in PageTriage, is unmanageable (in terms of performance). This is because it requires combining data from two or more MediaWiki tables into the PageTriage table and searching on the combined data. Moreover, searching through a database depends on fields that are “indexable.” Searching through text fields is an extremely heavy operation; search within MediaWiki is not done through the text fields, but rather through other systems like CirrusSearch, which have internal mechanisms to allow users to search for text content. This cannot be easily used with PageTriage pages. Even if we only search through the snippet of content that PageTriage stores, this means the database operation will need to sift through tens of thousands of rows, looking into text-based content, which is a significant performance challenge, and cannot be easily implemented.
- T157048: Redirects converted into articles should appear in the New Pages Feed indexed by the date of creation and creator of the article, not of the redirect: This work was ruled as out of scope, after analysis from the team. In addition, an engineer attempted to work on it for two weeks, but it was found to be infeasible.
- Explanation from engineers: The main issue is with the MediaWiki software itself. It does not record when an article was created from a redirect, so we would need to build this system from scratch. Page Curation involves a lot of very complex code, and after a thorough analysis, it was decided the request was beyond the scope of the team. There were also concerns about performance. A task has been created to add this functionality into MediaWiki at T240065: Introduce redirect log, but we cannot guarantee that it will be accepted by any teams. Further notes related to this work can be found in the original Phabricator ticket.
- T92621: Implement addition of un-redirected pages to Special:NewPages and Special:NewPagesFeed: This work was ruled as out of scope, after analysis from the team. In addition, an engineer conducted an investigation of the work to see if it was feasible (findings can be found in the original ticket), though it turned out to be out of scope.
- Here’s an explanation from our engineers: This work is complicated, given PageTriage's current architecture. It can be broken into two parts: 1) storing a hash of a reviewed revision at the time at which it's marked reviewed; and 2) when a page in the queue is edited, checking to see if the new revision's hash matches the already-reviewed one and if it does then mark it reviewed. The first part seems reasonably solid and would only add complexity to the review process, although there looks likely to be an issue around wanting to have the same review details (reviewer, date, etc.) as the earlier review. It's the second part that is more complicated and may have performance implications. Because of the way things are processed and the structure of PageTriage, we don't have access to some of the required data in the required places, and so would have to query it even when it's not strictly required. For more details, please refer to notes in the Phabricator ticket.
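The two parts described for T92621 can be sketched roughly as follows. This is a simplified Python illustration and not PageTriage's design: the `ReviewQueue` class, its in-memory dictionaries, and the choice of SHA-1 are all assumptions. It also leaves out the performance concern raised above, namely that the data needed for the comparison would have to be queried on every edit.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    """Hash revision text (SHA-1 is an arbitrary choice for this sketch)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class ReviewQueue:
    """Hypothetical in-memory stand-in for the review queue."""
    reviewed_hashes: dict[str, str] = field(default_factory=dict)
    reviewed: dict[str, bool] = field(default_factory=dict)

    def mark_reviewed(self, title: str, text: str) -> None:
        # Part 1: store a hash of the revision at the time it is reviewed.
        self.reviewed_hashes[title] = content_hash(text)
        self.reviewed[title] = True

    def on_edit(self, title: str, new_text: str) -> bool:
        # Part 2: if the new revision matches the reviewed one, mark the
        # page reviewed again; otherwise leave its status unchanged.
        if self.reviewed_hashes.get(title) == content_hash(new_text):
            self.reviewed[title] = True
            return True
        return False
```

For example, after `mark_reviewed("Foo", text)`, a later call to `on_edit("Foo", text)` with identical text returns `True`, while an edit with different text returns `False` and changes nothing.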
Overall, this work is now complete. We’re thrilled that we had the opportunity to improve Page Curation for the English Wikipedia community. We learned a great deal about the important work you all do, and we’re glad that we could address all 19 requests. It is our hope that Page Curation is now a more flexible and manageable process. Thank you all for your help during this process, and we look forward to seeing how Page Curation continues to evolve in the future.
August 20, 2019
It’s been a few months, and we’re excited to post some updates. We’ve been continuously working on Page Curation & New Pages Feed improvements, and the team has made solid progress. With that in mind, we’ll share some recent news, both highlights and challenges, at this stage in the process. We look forward to your feedback on the Talk page. Thank you!
Work We’ve Completed So Far
We’ve been updating the project page table with the requests marked as “Done.” Before this update, we had 5 requests completed (T189929, T169120, T207439, T208685, and T157046). We now have a few more completed items:
- T207847: 'Potential Issues' from ORES should be flagged in Page Curation Toolbar Page Info flyout: You can now see the potential issues added by the Growth team (“Spam,” “Vandalism,” and “Attack”) flagged in the Page Curation toolbar. Also, the logic used to pull ORES data for the feed is the same logic used in the toolbar flyout. For this reason, the labels should have the same behavior as those displayed in Special:NewPagesFeed.
- T227218: Surface copyvio as possible issue in Page Curation toolbar "Page info" flyout: This work is complementary to T207847. With these changes, “Copyvio” was included in the list of "Possible issues" in the Page Curation toolbar.
- T207442: Send Message to creator without needing to 'unreview'/'re-review' the article: The development work is complete, and the template (Sentnote-NPF) has now been updated by the community (and thanks for that!). With this work, we have decoupled two actions that were previously tied together: (a) marking a page as reviewed, and (b) sending a message to the creator.
- T229779: Toolbar: Don't show "Add a message for the creator" without a textbox: This work is now in production. With this change, the “Add a message to the creator” text is only shown when a textbox is provided for the message.
- T207452: Flag talk page feedback in toolbar: The development work is complete, and it will be released to production after Wikimania. Once it’s live, you’ll see talk page feedback flagged (and linked) within the info flyout. In addition, you’ll see a notification regarding the number of new Talk page messages within the toolbar itself (thanks to the great suggestion from Barkeep49).
Work That is Almost Complete
- T207443: Feedback for creator should also be posted to article talk page: The development work is complete, and the changes will be released to production soon. Once these changes are live, reviewers will be able to post messages to the article Talk page in a new section.
- T207485: Enable page curation tools to be loaded on any page (optionally): This work is in advanced stages of development. We’re reviewing the code. When it’s complete, reviewers will be able to access the “Add article to NewPagesFeed” link in the Tools menu. When they click on the link, the article will be added to the PageTriage feed and the Curation Toolbar will become visible.
Work that Presents Challenges
- T50552: Make PageTriage wiki agnostic: We’ve discussed this request, and we unanimously feel that it’s beyond our scope. Here’s why (according to analysis from the engineering team): PageTriage, while a useful extension, is written in a way that’s completely based on English Wikipedia processes. In order to convert the extension to work on other wikis, the extension would need to be adjusted, not only for other processes, but also to have a configurable process definition that each wiki could define for itself, based on each community’s needs. Consequently, this request would require a slew of analyses and decisions, such as: what it means to tag an article for deletion (e.g. which pages messages go to, which templates are used, whether there are follow-ups the system should be aware of, etc.), the way we tag articles, which articles show up in the queue, and more. Moreover, we couldn’t easily trim down the scope by disabling some features. The internal workings of the extension are deeply intertwined with English Wikipedia. We would still need to do a significant amount of development work to ensure that the behavior remained stable and useful to other wikis. For these reasons, this request is unfortunately too big, so we cannot take it on.
- T207238: Special:NewPageFeed - add option to filter by pageviews and the associated spike: T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested): This work presents significant challenges, but there may be an alternative solution.
- First, the challenges (according to analysis from the engineering team): In order to filter/sort by inputted numbers, the numbers must be stored in the database in a specific manner. This first step alone would take several weeks, if not months, according to the estimates provided by Wikimedia database experts. Then, we would need to populate the sortable cells with pageview data, which comes from an external service. To do this, we would need to create a process that pulls the data from the external service and stores it in MediaWiki’s PageTriage table. Then, we would do this work repeatedly, so that the numbers would remain up-to-date, over the entire PageTriage database (which consists of tens of thousands of rows, if not more). This process is both uncommon (in MediaWiki servers) and complex; we would need to define this process and identify the correct way to implement it, in collaboration with Operations and Database experts. In total, we do not find the request, in its current form, within our scope. For more details on the technical analysis and discussion with the database administrators, you can check out the associated investigation ticket.
- Second, the alternative solution (as described in the T225169 investigation): We could display the number of pageviews in the article record, without allowing for sorting or filtering. Would this be a satisfactory alternative to the community? And, if so, how would you like the number of pageviews displayed (e.g. average per day, median per day, total views in the last 30 days, etc)? Note that the results displayed will be from 24 hours earlier than the display time, and we’ll want to query from a maximum of 30 days ago (for the sake of general efficiency and manageability of this feature). We do not yet know if we can do this work — but, if we could, would it be worth our time and effort, in your opinion?
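To make the display options mentioned above concrete (average per day, median per day, total over the last 30 days), here is a small Python sketch that computes all three from a list of daily pageview counts. The function name, the input format, and the returned keys are assumptions for illustration. Fetching the counts from the external pageview service is not shown, and the caller is assumed to pass data ending at least 24 hours before display time.

```python
from statistics import mean, median

# Maximum look-back window, following the 30-day limit proposed above.
MAX_WINDOW_DAYS = 30


def summarize_pageviews(daily_views: list[int]) -> dict[str, float]:
    """Summarize the most recent daily counts (oldest first, newest last)."""
    window = daily_views[-MAX_WINDOW_DAYS:]
    if not window:
        # A page with no recorded views yet has nothing to average.
        return {"total": 0, "mean_per_day": 0.0, "median_per_day": 0.0}
    return {
        "total": sum(window),
        "mean_per_day": mean(window),
        "median_per_day": median(window),
    }
```

Because the summary is computed only for display, none of these values would need to be stored in a sortable database column, which is the part of the original request that was estimated to be costly.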
Requesting Feedback
We want to know your thoughts. Please share them on the Talk page. Thank you!
April 30, 2019
The Community Tech team has kicked off development work on this project. You may follow progress on the project tickets by looking at the Phabricator board. I will also be updating the ticket status in the table above as things progress.
5 February, 2019
This project is in its early stages of research to investigate project goals, dependencies and potential roadblocks. Your feedback is welcome on the talk page.
12 March, 2019
We are beginning to assess technical feasibility of tickets prioritised in the wishlist proposal. The technical work on this project is slated to start in late April/early May.