Grants:Programs/Wikimedia Community Fund/Rapid Fund/Comprehensive anti-spam external link service (the Citron project) (ID: 22667747)

status: Under review
Comprehensive anti-spam external link service (the Citron project)
proposed start date: 2024-07-20
proposed end date: 2024-10-20
budget (local currency): 4950 USD
budget (USD): 4950 USD
grant type: Individual
funding region: ESEAP
decision fiscal year: 2023-24
applicant: Plantaest
organization (if applicable): N/A

This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the grantmaking web service of Wikimedia Foundation where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.

Applicant Details

Main Wikimedia username. (required)

Plantaest

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)


Main Proposal

1. Please state the title of your proposal. This will also be the Meta-Wiki page title.

Comprehensive anti-spam external link service (the Citron project)

2. and 3. Proposed start and end dates for the proposal.

2024-07-20 - 2024-10-20

4. Where will this proposal be implemented? (required)

Vietnam

5. Are your activities part of a Wikimedia movement campaign, project, or event? If so, please select the relevant project or campaign. (required)

Not applicable

6. What is the change you are trying to bring? What are the main challenges or problems you are trying to solve? Describe this change or challenges, as well as main approaches to achieve it. (required)

External link spamming is a serious issue for wiki communities. As of now, the number of links blocked as spam has reached 15,000 on Meta's "Spam blacklist" (m:Spam blacklist), 6,000 on English Wikipedia (en:MediaWiki:Spam-blacklist), and nearly 4,000 on Vietnamese Wikipedia (vi:Đặc biệt:BlockedExternalDomains).

On Vietnamese Wikipedia, patrollers currently deal with external link spam by manually checking each edit and reviewing every link it contains. This is difficult and time-consuming, and it cannot guarantee that every instance is caught, especially as the methods of inserting spam links become increasingly sophisticated. For example, a recent check conducted by one of our community's checkusers confirmed that over 140 accounts were involved in a network providing link insertion services on Wikipedia (see vi:Wikipedia:Yêu cầu kiểm định tài khoản/Nhóm seeding quảng cáo). This is a very serious incident for the community.

To help resolve this issue, I propose developing a software project called Citron. Citron is a web service that runs continuously and assists in evaluating wiki edits. Its evaluation process uses heuristic rules, machine learning, and some supplementary data to determine whether the links in an edit are good or bad. If they are bad, Citron reports those links to the community, and administrators can add them to the blacklist. Because Citron runs continuously, it can check every edit that meets its criteria, so that cases are not missed the way they can be in manual patrolling.

Currently, the Vietnamese Wikipedia community has reached a consensus on the proposal to develop the Citron project. For more details, see here: vi:Wikipedia:Thảo luận/Dự án Thanh yên.

7. What are the planned activities? (required) Please provide a list of main activities. You can also add a link to the public page for your project where details about your project can be found. Alternatively, you can upload a timeline document. When the activities include partnerships, include details about your partners and planned partnerships.

I have written a brief description of the Citron project on this page: vi:Thành viên:Plantaest/Citron (in Vietnamese). On that page, I have listed important ideas to guide the future development of the project.

I plan to develop and complete the project within 3 months.

In the first month, I will undertake the following tasks:

  • Analyze the software requirements
  • Develop the basic functions of the back-end, including features such as continuously monitoring the RecentChanges stream, filtering edits that need to be checked, extracting links from edits, and generating reports for community pages.
  • Develop the link evaluation process and algorithms using heuristic rules such as whitelists, Whois information of domains, website rankings, as well as applying machine learning to support the evaluation process by examining the occurrence of characteristic phrases in a link.
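As a rough illustration of the first-month back-end tasks listed above, the Python sketch below filters recent-change events, extracts newly added links from an edit, and scores a link with two simple heuristics. It is only a sketch under stated assumptions: the event field names (`wiki`, `type`, `namespace`, `bot`) are assumed to follow the public recent-changes event format, while the checking criteria, the `needs_check`, `added_links`, and `spam_score` names, the URL pattern, and the phrase-count scoring (a stand-in for the planned machine learning step) are all hypothetical and not taken from the Citron design.

```python
import re
from urllib.parse import urlsplit

# Hypothetical checking criteria; the proposal does not fix them.
WATCHED_WIKI = "viwiki"
WATCHED_TYPES = {"edit", "new"}

# Simplified URL pattern: stops at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r'https?://[^\s\[\]<>|}"]+')


def needs_check(event: dict) -> bool:
    """Decide whether a recent-change event should be inspected."""
    return (
        event.get("wiki") == WATCHED_WIKI
        and event.get("type") in WATCHED_TYPES
        and event.get("namespace") == 0  # articles only (assumption)
        and not event.get("bot", False)
    )


def added_links(old_text: str, new_text: str) -> list[str]:
    """Return URLs present in the new revision but not in the old one."""
    before = set(URL_PATTERN.findall(old_text))
    seen: set[str] = set()
    result = []
    for url in URL_PATTERN.findall(new_text):
        if url not in before and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def spam_score(url: str, whitelist: set[str], phrases: list[str]) -> float:
    """Score a link in [0, 1]; whitelisted domains always score 0."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in whitelist:
        return 0.0
    lowered = url.lower()
    hits = sum(1 for phrase in phrases if phrase in lowered)
    # Placeholder scoring: two or more characteristic phrases give the maximum.
    return min(1.0, hits / 2)
```

Whois data and website rankings, which the plan also mentions, would enter as further inputs to the score; they are left out here because they require external lookups.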

In the second month, I will undertake the following tasks:

  • Develop a collaborative evaluation process through a user interface so that people can help verify links that Citron deems uncertain, ensuring that no suspicious links are overlooked.
  • Deploy the project on the Toolforge platform
  • Introduce it to the Vietnamese Wikipedia community to gather feedback
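The collaborative evaluation step in the list above could route links roughly as in the following Python sketch: a score is mapped to a verdict, and links that fall in between are settled by reviewer votes. The thresholds, the `triage` and `resolve` names, and the majority-vote rule are illustrative assumptions rather than part of the proposal.

```python
from collections import Counter

# Hypothetical thresholds; real values would be tuned on labelled data.
GOOD_BELOW = 0.3
BAD_FROM = 0.8


def triage(score: float) -> str:
    """Map a spam score in [0, 1] to 'good', 'bad', or 'uncertain'."""
    if score < GOOD_BELOW:
        return "good"
    if score >= BAD_FROM:
        return "bad"
    return "uncertain"


def resolve(votes: list[str], min_votes: int = 3) -> str:
    """Settle an uncertain link once enough reviewers agree.

    Returns the majority label when at least min_votes votes exist and
    one label has a strict majority; otherwise returns 'uncertain' so
    the link stays in the review queue.
    """
    if not votes or len(votes) < min_votes:
        return "uncertain"
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else "uncertain"
```

Keeping undecided links in the queue, instead of forcing a verdict, matches the stated goal that suspicious links should not be overlooked.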

In the third month, I will continue to improve the software to complete the project:

  • Explore and develop the ability to deploy the software for multiple language wiki projects beyond Vietnamese Wikipedia
  • Write user guides for end-users
  • Gather feedback from the community to adjust features and fix bugs


8. Describe your team. Please provide their roles, Wikimedia Usernames and other details. (required) Include more details of the team, including their roles, usernames, Wikimedia group, and whether they are salaried, volunteers, consultants/contractors, etc. Team members involved in the grant application need to be aware of their involvement in the project.

I am the sole person working on this project: Plantaest (Vietnamese Wikipedia administrator and interface administrator).

9. Who are the target participants and from which community? How will you engage participants before and during the activities? How will you follow up with participants after the activities? (required)

I am the sole person working on this project. However, during the implementation process, I will need assistance, feedback, and review from the Vietnamese Wikipedia community.

To attract participation from community members, I will organize discussions in the community’s common area, which is this page: vi:Wikipedia:Thảo luận.

I will receive and address feedback from the community.

10. Does your project involve work with children or youth? (required)

No

10.1. Please provide a link to your Youth Safety Policy. (required) If the proposal indicates direct contact with children or youth, you are required to outline compliance with international and local laws for working with children and youth, and provide a youth safety policy aligned with these laws. Read more here.

N/A

11. How did you discuss the idea of your project with your community members and/or any relevant groups? Please describe steps taken and provide links to any on-wiki community discussion(s) about the proposal. (required) You need to inform the community and/or group, discuss the project with them, and involve them in planning this proposal. You also need to align the activities with other projects happening in the planned area of implementation to ensure collaboration within the community.

I have informed the Vietnamese Wikipedia community about this proposal, and the community has agreed to allow the project to be implemented. For details, see here: vi:Wikipedia:Thảo luận/Dự án Thanh yên.

12. Does your proposal aim to work to bridge any of the content knowledge gaps (Knowledge Inequity)? Select one option that most applies to your work. (required)

Not applicable

13. Does your proposal include any of these areas or thematic focus? Select one option that most applies to your work. (required)

Open Technology

14. Will your work focus on involving participants from any underrepresented communities? Select one option that most applies to your work. (required)

Not applicable

15. In what ways do you think your proposal most contributes to the Movement Strategy 2030 recommendations? Select one that most applies. (required)

Improve User Experience

Learning and metrics

17. What do you hope to learn from your work in this project or proposal? (required)

With this project, I hope to learn the following:

  • Understanding how to reasonably apply machine learning to support evaluation processes
  • Gaining insights into collaborating with the community to develop software
  • Gaining a better understanding of Wikimedia's software systems

18. What are your Wikimedia project targets in numbers (metrics)? (required)

Number of participants, editors, and organizers

| Other Metrics | Target | Optional description |
|---|---|---|
| Number of participants | 10 | They are volunteers who participate by providing feedback and reviewing the software's operation. |
| Number of editors | 1 | In this context, I think that 'editor' refers to a software developer, meaning only myself. |
| Number of organizers | 1 | I am the sole organizer of this project. |

Number of content contributions to Wikimedia projects

| Wikimedia project | Number of content created or improved |
|---|---|
| Wikipedia | 50 |
| Wikimedia Commons | |
| Wikidata | |
| Wiktionary | |
| Wikisource | |
| Wikimedia Incubator | |
| Translatewiki | |
| MediaWiki | |
| Wikiquote | |
| Wikivoyage | |
| Wikibooks | |
| Wikiversity | |
| Wikinews | |
| Wikispecies | |
| Wikifunctions or Abstract Wikipedia | |

Optional description for content contributions.

The software is expected to monitor edits that meet the checking criteria, and I anticipate that it can significantly improve articles by reporting bad links inserted into them. I have set the target number of articles potentially improved during the initial testing phase at 50. After the testing phase, the software will continue to improve articles for as long as it remains operational.

19. Do you have any other project targets in numbers (metrics)? (optional)

No

Main Open Metrics Data

| Main Open Metrics | Description | Target |
|---|---|---|
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |

20. What tools would you use to measure each metric? Please refer to the guide for a list of tools. You can also write that you are not sure and need support. (required)

To allow everyone to track the software development process, I plan to post project activities on this page: vi:Thành viên:Plantaest/Citron.

To evaluate the effectiveness of the machine learning model in checking and classifying links, I plan to use basic metrics such as accuracy, precision, recall, and F1 score.
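For reference, the four metrics named above can be computed from a set of labelled links as in this minimal Python sketch. It assumes binary labels where 1 means "spam" and 0 means "not spam"; the `classification_metrics` name and the convention of returning 0.0 when a denominator is zero are illustrative choices, not part of the proposal.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = spam)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good links left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good links wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For a spam detector, precision reflects how often a reported link is really spam, while recall reflects how much of the spam is caught, so the two are worth tracking separately rather than relying on accuracy alone.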

Financial proposal

21. Please upload your budget for this proposal or indicate the link to it. (required)
22. and 22.1. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

4950 USD

22.2. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

4950 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback


Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analyzing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).
