Grants talk:Programs/Wikimedia Community Fund/Rapid Fund/Comprehensive link checking tool (ID: 22451570)

Follow-up Questions from ESEAP Regional Funds Committee and Programme Officer

Hello @Plantaest,

Thank you for your interest in the rapid fund programme.

Your grant application has moved from the grant eligibility phase to the grant review phase and is now undergoing review.

Would you be able to provide responses to the following questions? Thank you and looking forward to hearing from you.

Regards, Jacqueline on behalf of the ESEAP Funds Committee and Technical Engagement team

--

1) Would you be able to provide more details (i.e. context and explanation - what, why, when, how etc) to facilitate our understanding of your vision of building a “modern”, “comprehensive” “web application”?


2) In your proposed first month plan, could you provide further information on how you plan to carry out the task and how much time you would take to complete each segment of work listed?


3) Would you be able to share in greater detail how the “machine learning classification algorithms to categorize links” fits into the goal of a comprehensive link checker?


4) We observed (though we could be mistaken) that the current engagement is targeting the Vietnamese community and could be from 3 years ago. Would you be able to provide updates on recent feedback from the wider community (e.g. Village pump (technical)) to help us understand the importance of this project, since it will be transferable to many languages? Would you also be able to provide other references showing that this proposal has been raised with the global technical community and the ESEAP community?


5) Would you be able to share your thought process behind your proposed budget? For example, how did you derive the unit cost (i.e. point of reference based on local context?), how do you estimate the number of work hours? How do you decide which elements of your contributions are voluntary and which parts are paid as a wikimedian? Are there tax implications for individuals receiving a grant for this type of work in Vietnam?


6) Do you have a preliminary idea of who the 10 voluntary participants will be, and how do you plan to select or invite them to participate in the feedback process? And how do you plan to engage them throughout your project in providing advice, feedback, and trying out the application?


7) In your report, apart from time spent on the project, how would the wider community be able to learn about a) the process of creating this application and b) the effectiveness of using the application? JChen (WMF) (talk) 07:36, 19 February 2024 (UTC)

Thank you, Jacqueline, and all involved parties. I would like to respond to the questions as follows:
1) Would you be able to provide more details (i.e. context and explanation - what, why, when, how etc) to facilitate our understanding of your vision of building a "modern", "comprehensive" "web application"?
I think this is a rather complex question, as it seems you need specific information about what the tool is. Here are some details to make my intentions clearer:
What: It is a link checker tool. Users provide the name of an article, the tool retrieves the content of the article (i.e., source code, such as wikitext and/or HTML-parsed text), extracts the links from that source code, and checks the status of those links. This is useful when you want to ensure the quality of an article, which could be used for purposes such as nominating it for "Good Article" or "Featured Article" status, or simply for maintaining articles you've written.
Why: As I mentioned in my proposal, the link checker people used to rely on, Dispenser's Checklinks, has been defunct for a long time and is closed source. Therefore, I want to create an open-source tool with similar functionality, and possibly better (with a more modern interface), to help Wikipedia users check links. I've had this idea for a long time, and now I have the time and knowledge to pursue and build it.
When: I will start building the tool according to the timeline outlined in the proposal.
How: It is a typical web application with two main parts, a back-end and a front-end, both of which I will deploy on Toolforge. I am developing the back-end in Java with the Quarkus framework (quarkus.io). Quarkus is a popular Java framework developed by Red Hat, suitable for building modern enterprise-grade web applications. I am developing the front-end in TypeScript with the React library (react.dev). TypeScript is a typed superset of JavaScript that makes reliable and maintainable applications easier to build. React is a front-end library developed by Facebook (now Meta) that makes user interfaces easy to create, manage, and maintain. I am using the Mantine library (mantine.dev) to build the tool's interface. Deployment to the server will be automated with a CI/CD tool such as GitHub Actions, so that new versions can be released continuously.
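The "What" step above (retrieve an article's wikitext, then extract its links) can be sketched as follows. This is a minimal illustration in Python, not the tool's Java code. It assumes the wikitext comes from the MediaWiki Action API (`action=query`, `prop=revisions`). The function names and the simplified rule for where a URL ends are hypothetical. Real wikitext also builds URLs inside templates, uses `<nowiki>`, and has other edge cases that this regex ignores.

```python
import re
from typing import List
from urllib.parse import urlencode


# Hypothetical helper: builds a MediaWiki Action API request for the current
# wikitext of one page. Sending it (with a descriptive User-Agent) is left to the caller.
def build_wikitext_request_url(api_base: str, title: str) -> str:
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_base}?{urlencode(params)}"


# HTML comments are not rendered, so links inside them are skipped.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Simplified rule: a URL ends at whitespace or at a character that closes it in
# wikitext ("]" of a bracketed link, "|" or "}" in a template, "<" of a tag).
_URL = re.compile(r"https?://[^\s\[\]<>\"|{}]+")


def _trim(url: str) -> str:
    """Drop trailing sentence punctuation and unbalanced closing parentheses."""
    while True:
        stripped = url.rstrip(".,;:!?'")
        # Keep ")" when it balances a "(" inside the URL, e.g. .../Foo_(bar).
        if stripped.endswith(")") and stripped.count(")") > stripped.count("("):
            stripped = stripped[:-1]
        if stripped == url:
            return url
        url = stripped


def extract_external_links(wikitext: str) -> List[str]:
    """Return unique http(s) URLs in order of first appearance."""
    text = _COMMENT.sub("", wikitext)
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for match in _URL.finditer(text):
        seen.setdefault(_trim(match.group(0)), None)
    return list(seen)
```

For example, `extract_external_links("[https://example.com/a Example]")` returns `["https://example.com/a"]`. Deduplicating here means each URL is requested only once, even when an article cites it several times.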
My tool is a modern web application because: (1) it uses modern web technologies actively developed by the community like Quarkus and React; (2) it adheres to modern development processes and patterns (applying RESTful API, single-page application, optimistic UI, CI/CD pipeline); (3) it has a modern, visually appealing, and clean interface with the support of Mantine.
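To illustrate the RESTful API between the single-page front-end and the back-end, here is a minimal Python sketch of what a JSON response for one checked article might look like. The endpoint mentioned in the comment, the field names, and the category values are all hypothetical. They are not a published API of the tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


# Hypothetical payload of an endpoint such as GET /api/articles/{title}/links
@dataclass
class LinkResult:
    url: str
    status_code: Optional[int]  # None when no HTTP response was received
    category: str               # e.g. "working", "broken" or "indeterminate"


@dataclass
class CheckResponse:
    title: str
    links: List[LinkResult] = field(default_factory=list)


def to_json(response: CheckResponse) -> str:
    # asdict() converts nested dataclasses and lists into plain dicts and lists;
    # ensure_ascii=False keeps Vietnamese titles readable in the payload.
    return json.dumps(asdict(response), ensure_ascii=False)
```

Because the front-end depends only on a JSON contract like this one, the Quarkus back-end could produce the same shape, and the React side could type it as a matching TypeScript interface.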
My tool is a comprehensive web application because it will address the need for "link checking" of Wikipedia users comprehensively. Users can use the tool to check, review the status of links and related information, and fix broken links in articles with correct (live) links suggested by the tool or filled in by users as they wish.
2) In your proposed first month plan, could you provide further information on how you plan to carry out the task and how much time you would take to complete each segment of work listed?
Details on how I will carry out the tasks of the first month and the estimated time needed to complete them:
  1. Survey cases of broken links
    How: Research the different kinds of broken links using online resources.
    Estimated time: 1 day
  2. Analyze the application's requirements.
    How: Brainstorm to list the functions that the tool must have and should have. From there, select the necessary functions to develop and draw a Use Case diagram.
    Estimated time: 4 days
  3. Develop the back-end of the application, including exploring the mechanism for analyzing the source code of articles to extract a list of proper links, researching the use of machine learning classification algorithms to categorize links, implementing link-editing functionality, and creating RESTful APIs for client use.
    How: Developing the back-end is a complex task. In general, I need to thoroughly understand the goal of each task above and break it down into concrete specifications to make development easier. These tasks require research, analysis, and programming skills.
    Estimated time: 4 weeks
  4. Deploy the back-end to Toolforge.
    How: I have no experience with Toolforge yet, so I need time to explore it and learn how to use it. I have experience deploying web applications to servers, so I expect Toolforge to be similar.
    Estimated time: 2 days
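The link-editing part of step 3 (replacing a broken URL in the wikitext with a working or archived one) could start from something like the following minimal Python sketch. The function name is hypothetical. The rule for where a URL ends is a simplified assumption and is not how MediaWiki itself parses links. The archive URL in the usage note follows the Wayback Machine pattern used by the link cited in this discussion, and its timestamp is made up.

```python
import re
from typing import Tuple

# Simplified assumptions: punctuation that may follow a URL at the end of a
# sentence, and characters that terminate a URL in wikitext.
_TRAILING_PUNCT = r"[.,;:!?)']*"
_TERMINATOR = r"(?:[\s\[\]<>\"|{}]|$)"


def replace_url(wikitext: str, old_url: str, new_url: str) -> Tuple[str, int]:
    """Replace whole occurrences of old_url and return (new_text, count).

    A match must be followed by optional trailing punctuation and then a
    terminator, so replacing .../x leaves .../xy and .../x.html untouched.
    """
    pattern = re.compile(
        re.escape(old_url) + r"(?=" + _TRAILING_PUNCT + _TERMINATOR + r")"
    )
    # A function replacement stops re from interpreting backslashes in new_url.
    return pattern.subn(lambda _match: new_url, wikitext)
```

Usage: `replace_url(text, "https://a.example/x", "https://web.archive.org/web/20230101000000/https://a.example/x")` returns the edited wikitext and the number of replacements. The tool could show that count to the user before saving the edit.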
3) Would you be able to share in greater detail how the “machine learning classification algorithms to categorize links” fits into the goal of a comprehensive link checker?
I think this is an interesting question. I took this idea from Dispenser's similar tool, although I don't know exactly how he implemented the machine learning mechanism (his software is closed source). Dispenser used machine learning to classify links into 3 groups: Working links (white and yellow labels), Broken links (orange and red labels), and Indeterminate (green and blue labels); according to the tool's page, this mechanism achieved 98% accuracy (see: https://web.archive.org/web/20230818203737/http://69.142.160.183/~dispenser/view/Checklinks).
Similarly to Dispenser's tool, my tool will also apply machine learning models to classify links into separate groups, helping users visualize the status of links in articles more easily.
I have applied classic machine learning algorithms such as SVM, Logistic Regression, k-Means, Gradient Boosting, Random Forest, and k-NN to classify Wikipedia edits based on information obtained from the ORES system, and compared the effectiveness of each model. You can read more about this here: vi:Thành viên:Plantaest/Blog/Phát hiện sửa đổi phá hoại trên Wikipedia tiếng Việt (in Vietnamese; "Detecting vandalism on Vietnamese Wikipedia"). This was an internal research project at my university that I led. It was published in 2021 at a university-level conference, and I posted it on Wikipedia in 2023 so that the community can follow my step-by-step process.
For this project, I will use Python libraries such as pandas, numpy, and scikit-learn to create the model, and the Tribuo library (tribuo.org) to use the resulting model on the Java platform. I still need time to investigate the model's features; candidates include the HTTP status returned when requesting a link, the request time, the response structure, etc.
However, I am not a machine learning expert, so the implementation of this part will mainly be at the "usable" level.
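Before any model is trained, a rule-based baseline over the candidate features listed above (HTTP status, request time, response details) would provide labels to compare a model against. The following Python sketch is illustrative only. The rules are assumptions; they are not Dispenser's method and not a trained model. The `features` vector is the kind of numeric input a scikit-learn classifier could be trained on. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

WORKING, BROKEN, INDETERMINATE = "working", "broken", "indeterminate"


@dataclass
class ProbeResult:
    status: Optional[int]     # final HTTP status after redirects; None if no response
    elapsed_s: float          # request time in seconds
    redirected_to_root: bool  # final URL is the site's home page (possible "soft 404")
    body_length: int          # size of the response body in bytes


def features(p: ProbeResult) -> List[float]:
    """Numeric feature vector that a trained classifier could consume."""
    status_class = p.status // 100 if p.status is not None else 0
    return [float(status_class), p.elapsed_s,
            float(p.redirected_to_root), float(p.body_length)]


def baseline_label(p: ProbeResult) -> str:
    """Assumed rules mapping one probe to the three Checklinks-style groups."""
    if p.status is None:
        return INDETERMINATE  # may be transient; retry before calling it broken
    if p.status in (404, 410):
        return BROKEN         # "Not Found" / "Gone"
    if 200 <= p.status < 300:
        return INDETERMINATE if p.redirected_to_root else WORKING
    return INDETERMINATE      # e.g. 403, 429, 5xx: often bot blocking or temporary
```

Running `baseline_label` and a trained model on the same links would show whether the model improves on simple rules. That comparison would help when the implementation aims at the "usable" level described above.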
4) We observed (though could be mistaken) that the current engagement is targeting the Vietnamese community and could be from 3 years ago. Would you be able to provide updates on the recent feedback from wider community (maybe Village pump (technical)) to understand the importance of this project since it will be transferable to many languages? Would you also be able to provide other references that this proposal has been raised to the global technical community and also the ESEAP community?
Regarding the Vietnamese Wikipedia community: as a stopgap, it currently uses an external tool called DeadLinkChecker (deadlinkchecker.com) that I introduced 4 years ago (see vi:Wikipedia:Ứng cử viên bài viết chọn lọc#Gợi ý and vi:Wikipedia:Ứng cử viên bài viết tốt#Gợi ý, the featured and good article candidate pages). That tool is not suited to the specific environment of Wikipedia: it checks unnecessary links, links are difficult to find, individual links cannot be edited, and its features cannot be extended. Two members, Nguyenhai314 and ThiênĐế98 (bureaucrat), complained about this in the discussion I linked in the proposal: vi:Thảo luận Thành viên:Plantaest/Lưu 4#Công cụ check link mới.
Regarding the international community, some related information that I have seen includes:
In general, I only raised this proposal on Vietnamese Wikipedia because that is where I am mainly active. As for other communities, based on the popularity and current status of Dispenser's Checklinks tool, I believe they need a similar tool for their operational processes.
5) Would you be able to share your thought process behind your proposed budget? For example, how did you derive the unit cost (i.e. point of reference based on local context?), how do you estimate the number of work hours? How do you decide which elements of your contributions are voluntary and which parts are paid as a wikimedian? Are there tax implications for individuals receiving a grant for this type of work in Vietnam?
I calculated the unit cost based on my current hourly income (my current job title at my company is Technical Architect).
The number of working hours is based on the free time I currently have: about 3 to 8 hours a day. Given my management experience over the years, 450 hours is a reasonable estimate for this project. The risk of delay, for example from illness or problems at work, is low: I am currently in good health, and my job is relatively stable.
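As a quick consistency check, here is simple arithmetic on the figures stated above and on the grant period approved later in this discussion (1 April to 30 June 2024, i.e. 91 days):

```latex
\frac{450\ \text{h}}{8\ \text{h/day}} \approx 56\ \text{days}, \qquad
\frac{450\ \text{h}}{3\ \text{h/day}} = 150\ \text{days}, \qquad
\frac{450\ \text{h}}{91\ \text{days}} \approx 4.9\ \text{h/day}
```

The estimate therefore fits the grant period if the average stays near 5 hours a day, which is within the stated 3 to 8 hour range. At the low end of 3 hours a day, the same work would take about 150 days.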
Regarding the question "How do you decide which elements of your contributions are voluntary and which parts are paid as a wikimedian?", I don't quite understand this question, but I believe that this entire project is supported by the WMF.
Regarding taxes, I will consult a lawyer. As far as I know, I need to pay a small part of the grant to the tax authorities under Vietnam's Personal Income Tax regulations.
6) Do you have a preliminary idea of who the 10 voluntary participants will be, how do you plan to select/ invite them to participate in the feedback process? And how do you plan to engage them throughout your project duration re providing support in terms of advice, feedback, and trying out the application?
Ten people is an estimate; some names I have in mind are:
  • vi:User:NguoiDungKhongDinhDanh (eliminator on viwiki): He has good technical knowledge and often gives me a lot of advice when working together on Vietnamese Wikipedia.
  • vi:User:Hide on Rosé (eliminator on viwiki): He loves technical issues and often chats with me on Vietnamese Wikipedia.
  • vi:User:Baoothersks (eliminator), vi:User:GDAE (eliminator), vi:User:Nguyenmy2302 (eliminator), vi:User:Mongrangvebet (eliminator), vi:User:TUIBAJAVE: These are skilled and experienced editors of quality articles, with tens of thousands of edits on Vietnamese Wikipedia; I need their perspective as end users of the tool.
  • vi:User:Nguyentrongphu (administrator on viwiki): He is an administrator that I often interact with, he can give me some opinions about administrative perspectives in applying the tool.
  • en:User:Novem Linguae (administrator on enwiki), en:User:SD0001: Kind Wikipedia users who have supported me on English Wikipedia, thanks to whom my TwinkleMobile tool has spread widely in the English Wikipedia community, with hundreds of users.
I will invite these people by sending them messages and by posting announcements in public venues such as en:Wikipedia:Village pump (technical) and vi:Wikipedia:Thảo luận (the Vietnamese Wikipedia village pump).
I will engage them by asking specific questions so that the relevant people can evaluate whether the tool is usable and whether it solves the problems they have. I will also ask about their expectations and about any errors they run into while trying the tool, so that I can fix them. I think this kind of communication is normal in the Wikipedia community, where volunteering to contribute is simple and comfortable.
7) In your report, apart from time spent on the project, how would the wider community be able to learn about a) the process of creating this application and b) the effectiveness of using the application?
Regarding (a), I will keep a log of what I do while building the tool. At the appropriate time, I will write a blog post describing, step by step, how I carried out this project, like some of my articles at vi:Thành viên:Plantaest/Blog.
Regarding (b), this lies in the future, but it can be anticipated. Dispenser's Checklinks was popular and deeply embedded in the operational processes of many different wikis, so I expect the tool I build to reach a similar level of popularity and effectiveness. I believe it will be a useful and meaningful tool for Wikipedia communities. I may conduct a survey to let the community know how effective the tool is.
Sincerely, Plantaest (talk) 13:36, 19 February 2024 (UTC)
Hello @Plantaest,
Thank you for your quick and detailed response.
On question 5, we noticed that you would like further clarification on "How do you decide which elements of your contributions are voluntary and which parts are paid as a wikimedian?" We would like to understand which parts of your contribution to the movement you think should be paid (e.g. funded by a grant) and which parts you think should be unpaid (i.e. pro bono). How do you decide?
Thank you.
Regards,
Jacqueline on behalf of the ESEAP regional funds committee JChen (WMF) (talk) 03:10, 21 February 2024 (UTC)
Hello Jacqueline and the Committee,
I think this question is similar to the question: "Why do you contribute for free to Wikipedia in particular and the Wikimedia movement in general?"
I contribute to the Wikimedia movement for free because of my passion for knowledge. I have been contributing to Wikipedia for 14 years, starting anonymously from an IP address when I was a middle school student, and I created an account 8 years ago (2016). I can spend weeks translating a long article of more than 20,000 words, such as en:Chloroplast into its Vietnamese version vi:Lục lạp, and I also carefully translate the vector illustrations using Inkscape. Such translation work is relatively expensive on the current market, especially for scientific documents (I have a bachelor's degree in biology in addition to my main field, IT). Yet I do this work for free because I am passionate about knowledge and want to share it with everyone.
Over time, in addition to writing articles, I have continued to take on additional management tasks with administrator and interface administrator rights as nominated by the community. At this point, I realize that there are many issues that need to be addressed, but there are not enough resources (time, knowledge, manpower) to do so. These tasks are often technical issues, such as fixing bugs in templates, Lua modules, scripts, software, tools, or developing new tools or features for existing tools. These are labor-intensive tasks that take a lot of time because they are often unpredictable, influenced by external factors outside of Wikipedia, or may affect a certain policy, requiring community input to resolve. For example, there was a time when someone asked me about changing the name of a category, but it was related to the Unicode CLDR database, and if I wanted to change it, I would have to propose it to CLDR, or implement stopgap solutions, but my time was limited so I couldn't do it (source: vi:Thảo luận Thành viên:Plantaest/Lưu 4#Nhờ một chút).
Since Dispenser's Checklinks tool stopped working in 2020, the community has struggled to find a replacement. First, people used a copy of Dispenser's tool hosted at a bare IP address, but it was often down and hard to reach, and last year it became inaccessible altogether. Second, some communities, like Vietnamese Wikipedia, used external tools such as DeadLinkChecker, which I introduced when Dispenser's tool stopped working in 2020, but it has the many drawbacks and complaints mentioned above. Even if some copy of Dispenser's tool works again, its author has been absent for 4 years, so bug fixes seem impossible; the Vietnamese Wikipedia community has reported bugs many times, but no one has been able to fix them.
That's why I wanted to build a specialized link checker tool for the Wikipedia community. And this is a complex technical task, it takes time and effort to learn knowledge, understand technologies, and exchange with the community. My time is limited, I need time to work, relax, and socialize outside; I can't spend too much time on Wikipedia, except for simple tasks (reverting a disruptive edit, blocking a disruptive IP), or out of passion (like writing and translating articles). I could spend more time if I were a student, but unfortunately, I am working; and even if I were a student with more free time, I wouldn't have enough knowledge to do projects like this. I'm just an ordinary person, not a genius.
Therefore, in my opinion, relatively complex technical tasks should be funded by the WMF because they consume a lot of resources, are not easy to do, and are often quite tedious, in the same way that the Foundation has funded technical projects such as Grants:Project/Rapid/Chlod/Contributor copyright investigation tool, Grants:Project/Rapid/SD0001/Twinkle localisation, etc. Such tools make Wikipedia management more convenient, help editors easily check the status of articles, and thereby contribute to the development of the Wikimedia movement.
Sincerely, Plantaest (talk) 15:55, 21 February 2024 (UTC)

Your grant application has been approved

Hello @Plantaest,

Thank you for your comprehensive response to the follow-up questions.

Congratulations! Your grant application has been approved in the amount of VND 103,500,000 from 1 April 2024 to 30 June 2024.

Let’s continue having regular conversations over the course of your grant implementation. Please let me know if you require support in any way or would like to share your experiences with a wider community through the Let's Connect Programme or at ESEAP community meetings.

The reporting requirements and templates for the grant can be found here. Timelines for reporting can be found in your grant agreement or on Fluxx. All reports are to be completed and submitted via Fluxx.


We thank you for your participation in the grant application process and hope to continue to journey with you as you embark on this project.

Regards, Jacqueline JChen (WMF) (talk) 01:16, 22 February 2024 (UTC)

Thank you, Jacqueline and the Committee. I have followed the instructions in the email on the Fluxx platform. If anything is incorrect, please let me know so I can adjust. Sincerely, Plantaest (talk) 09:23, 23 February 2024 (UTC)
Hello @Plantaest,
Hope you are well.
We would like to circle back on the documentation of this project and would like to explore how you plan to make the learnings and metrics on usage from this project more widely available. Do you have any thoughts on this?
Thank you.
Regards,
Jacqueline JChen (WMF) (talk) 04:05, 4 March 2024 (UTC)
Hello Jacqueline, I apologize for the delayed response; I have had some issues in real life recently. I have expressed some thoughts on this question in previous responses. As for sharing the knowledge from this project widely, I think it is a great opportunity for the community to better understand how to build a typical tool that serves the operational processes of the wikis. I can write an article on how I carried out this project, which would be valuable for others who want to create similar tools to meet community needs. I have also considered presenting my solutions at an online or in-person conference, but due to privacy concerns I am not interested in doing so at present, though I may change my mind in the coming years.
Regarding the issue of "usage metrics" for the project, I understand that you need this information to determine if the tool I am creating is useful or not. Evaluating this is challenging at the initial stage when the tool is new. Usually, after some time, around six months, the new metrics gradually become meaningful, and at this point, the data can be objectively analyzed. In general, we can measure traffic, queries, error occurrences, etc. These metrics are essential for improving the tool to achieve a better version. However, this can pose challenges in data collection, requiring careful consideration and discussion. I can implement this but will need to consider how to collect the data reasonably, perhaps by seeking users' opinions through a pop-up, for example. Additionally, in a previous response, I also suggested conducting surveys, but this method is not as proactive as automated data collection and is quite labor-intensive.
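The metrics mentioned here (traffic, queries, error occurrences) could be aggregated without storing personal data. The following minimal Python sketch assumes a hypothetical log record that keeps only the day, the wiki, and whether the request succeeded. It stores no IP addresses or usernames. The record and function names are illustrative, not part of the tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class RequestLog:
    day: str         # "YYYY-MM-DD"; no IP address or username is recorded
    wiki: str        # e.g. "viwiki"
    succeeded: bool  # False if the check request ended with an error


def summarize(logs: Iterable[RequestLog]) -> Dict[str, object]:
    """Aggregate anonymous request logs into simple usage metrics."""
    logs = list(logs)
    errors = sum(1 for r in logs if not r.succeeded)
    return {
        "requests": len(logs),
        "error_rate": errors / len(logs) if logs else 0.0,
        "requests_per_wiki": dict(Counter(r.wiki for r in logs)),
        "active_days": len({r.day for r in logs}),
    }
```

Aggregates like these could be published periodically, for example on the progress page, without exposing individual users. An opt-in pop-up survey would still be needed for qualitative feedback.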
These are some of my thoughts. I always hope that my product can reach many users. If there is any way to accomplish this, I will consider implementing it to achieve this goal.
Sincerely, Plantaest (talk) 15:01, 6 March 2024 (UTC)
Hello @Plantaest,
Thank you for sharing. Would any of these be useful avenues for documenting your work: Wikimedia Toolforge, GitHub, MediaWiki? These are widely used by contributors to the movement.
Alternatively, you can also create a meta page documenting the tools and progress that you have made and linking it to your userpage?
What do you think?
Thank you.
Regards,
Jacqueline JChen (WMF) (talk) 21:53, 6 March 2024 (UTC)
Hello Jacqueline,
The pages you've sent are suitable for documentation, but in my experience they might receive fewer visits than English Wikipedia, as I found with the en:User:Plantaest/TwinkleMobile tool. Therefore, I will write the documentation on English Wikipedia. The English Wikipedia community is vibrant and extensive, which makes dissemination easier than on other platforms.
I hope we can agree that the documentation I need to write includes the following:
  • A page to track the progress of the application development. I'm not entirely sure about its format, but I think we can refer to this page: vi:Thảo luận Thành viên:Plantaest/Zinnia; this is a discussion page documenting the progress of another open-source tool of mine, although it's currently on hold for some reasons. Every one or two weeks (the average time of one sprint in the Scrum methodology), I will post project progress information here, to show what has been accomplished during that time.
  • A page documenting my experiences during the development process, which I proposed in a previous response. In general, I will share anything useful I learn while building this tool here, such as deploying the back-end and front-end to Toolforge or connecting to Translatewiki. This page can be split into smaller pages, one per topic.
I think that covers it. What do you think? Feel free to suggest changes; this is my first grant, and I welcome constructive feedback to improve my grant-writing skills. Thank you.
Best regards, Plantaest (talk) 00:47, 7 March 2024 (UTC)
Hello @Plantaest,
Thank you for taking time to think through the documentation process.
We really appreciate your thought partnership in this.
Yes, your suggestions work well.
Thank you for your quick response.
Good luck with the project implementation, and keep in close contact.
Regards,
Jacqueline JChen (WMF) (talk) 01:07, 7 March 2024 (UTC)

Report

Dear Jacqueline,

I apologize for bothering you. I hope it's okay if I ask a question. I've just submitted the report for this grant on Fluxx, and CR-FluxxBot may post it here tomorrow: Grants:Programs/Wikimedia Community Fund/Rapid Fund/Comprehensive link checking tool (ID: 22451570)/Final Report. Is there anything else I need to do?

Regarding my recent second grant application, which was rejected (Comprehensive anti-spam external link service, the Citron project), I have requested a reconsideration. However, upon reviewing the timeline on the Grants:Project/Rapid page, I noticed that July 1st was the deadline for decision notifications. Therefore, I believe this second grant opportunity has expired for this round. I will resubmit it for the next round and arrange my schedule to implement it by the end of the year if it is approved.

Thank you for your support during this time.

Best regards, Plantaest (talk) 02:13, 2 July 2024 (UTC)