Grants talk:IEG/WikiBrainTools
Some algorithmically intensive tools that already exist
This is a neat idea! Here are a few tools that already use rich algorithms and might be helpful to look into/talk to their developers:
- ClueBot (anti-vandalism): https://en.wikipedia.org/wiki/User:ClueBot_NG
- STiki (anti-vandalism/spam): https://en.wikipedia.org/wiki/Wikipedia:STiki
- CorenSearchBot (anti-plagiarism): https://en.wikipedia.org/wiki/User:CorenSearchBot
Useful people to talk to:
- Coren: https://en.wikipedia.org/wiki/User:Coren
- Stu Geiger: https://en.wikipedia.org/wiki/User:Staeiou
- Aaron Halfaker: https://en.wikipedia.org/wiki/User:EpochFail
Useful places to seek feedback and post notifications:
- Bot Approvals Group: https://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group
- Research Committee: https://meta.wikimedia.org/wiki/Research:Committee
- Research Mailing list: https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Looking forward to hearing more about your idea! Cheers, Jake Ocaasi (talk) 17:16, 25 September 2014 (UTC)
- Thanks for the suggestions, User:Ocaasi! I've already made some of the content-based changes you suggested (e.g. useful mailing lists to tap). I've also been in touch with User:EpochFail. Once the feedback period is open on Wed, I'll email the remaining people to see what kinds of improvements they'd suggest. Shilad (talk) 19:31, 28 September 2014 (UTC)
Finalize your proposal this week!
Hi Shilad and Brenthecht. Thanks for drafting this proposal!
- We're hosting one last IEG proposal help session in Google Hangouts this weekend, so please join us if you'd like to get some last-minute help or feedback as you finalize your submission.
- Once you're ready to submit it for review, please update its status (in your page's Probox markup) from DRAFT to PROPOSED, as the deadline is September 30th.
- If you have any questions at all, feel free to contact me (IEG committee member) or Siko (IEG program head), or just post a note on this talk page and we'll see it.
Cheers, Ocaasi (talk) 20:04, 25 September 2014 (UTC)
Promoting WikiBrain?
@Shilad and Brenthecht: Hey there. Very pleased to read this as someone who reads up on Wikipedia-related research. I'm not surprised to hear many researchers stray away from research in the area due to the interface-related obstacles they face that prevent data extraction. As this is one of the main problems you identify, though, I wanted to ask what this team might consider doing to inform and attract potential researchers to this new treasure trove of algorithms. Presumably (but correct me if I am wrong), many in the WikiTools community are already familiar with extracting data to inform their work. I get that this proposal would make their jobs easier and would open up new research avenues, but how will you be reaching out beyond the WikiTools community? I JethroBT (talk) 22:08, 29 September 2014 (UTC)
- Good question! We've already taken some first steps to promote WikiBrain to algorithmic researchers. We've published a paper describing WikiBrain, and several other papers that use WikiBrain and refer algorithmic and community researchers of Wikipedia to WikiBrain. We have begun to make some inroads, and have received algorithmic contributions from some other research groups. In addition, this grant would also support traveling to two major algorithmic conferences (SIGIR and WWW), where we would present demo posters and organize "Birds of a Feather" sessions. I'd also be interested to hear any other ideas you have! Shilad (talk) 03:18, 30 September 2014 (UTC)
Suggestions
Here are a couple of suggestions based on a short poke around the website(s):
- It seems like WikiBrain is entirely based on the Wikipedia dumps. If it is, it needs to be made clear that data not in the dumps is not accessible via WikiBrain.
- It seems like WikiBrain relies on downloads of the wikipedia files, which are huge downloads. The pitfalls of this need to be made clear.
- https://github.com/shilad/wikibrain has contributions from 13 contributors, which is better than I expected.
- It seems to me that to make non-trivial use of WikiBrain, an intricate Java development environment needs to be installed. This needs to be made clearer.
- Is there a continuous integration server? That seems like the kind of thing that would be very useful
- The mailing list needs to show active use. You may need to encourage your co-located devs to switch to communicating via it.
- The beginner's example at https://shilad.github.io/wikibrain/# links to https://github.com/shilad/wikibrain/blob/master/wikibrain-cookbook/src/main/java/org/wikibrain/phrases/cookbook/ResolveExample.java which returns a 404.
- I STRONGLY recommend that you move from a co-located team to a geographically diverse team.
cheers Stuartyeates (talk) 01:41, 3 October 2014 (UTC)
- Stuartyeates, thanks for all the great feedback! I want to follow up on a few of your suggestions.
- WikiBrain installation needs: WikiBrain makes use of a few other data sources (page view data, Natural Earth GIS data, several public NLP datasets), but you are correct that it primarily uses WikiDumps. One of the primary goals of this project is to eliminate the need for tool developers to install WikiBrain at all. We would install WikiBrain on Wikimedia Labs, preprocess the data, and provide a web API for bots and researchers. I think this point should address your first few concerns.
- The primary issue with reliance on WikiDumps is not size (as you point out, that can be overcome) but the fact that not all information is there, so some questions can't be answered. Defining the data available defines the nature of the research that can be undertaken and helps researchers identify early whether the project is right for them. (Deleted articles? Old versions of live articles? User edit traces? etc.) Stuartyeates (talk)
- Integration tests: At the moment, we do have a continuous unit test server (Travis CI), but not an integration test server. I have a short term (next month) goal to revive our integration tests.
- Testing against the versions of Java and the Java libraries on the Wikimedia servers, as well as the defaults on other servers, is very important for getting things to 'just work' for a large group of people. Stuartyeates (talk) 07:57, 3 October 2014 (UTC)
- Mailing list: Totally agreed! I'll use your suggestion as a catalyst to encourage this change.
- 404: Thanks for the tip. Looks like the link didn't survive a recent refactoring. I've now fixed it.
- Geographically diverse team: YES! Are you volunteering? :) I'm only partially kidding. I do hope that a side-effect of the engagement plan for this IEG is to build a broader coalition of developers. I understand we'll need to be better about communication patterns to make this work (e.g. the mailing list).
- Maybe, but I'm still genuinely confused as to whether the data I'm interested in is available via WikiBrain. I'm interested in tracking the growth of individual articles over time and the long-term edit patterns of users ('edit traces'). Stuartyeates (talk) 08:53, 3 October 2014 (UTC)
Community engagement
I don't understand whether or not you have discussed it within the community. You stress that the community engagement is crucial, but the main action seems to be done. --Ilario (talk) 21:20, 10 October 2014 (UTC)
- Hi Ilario. Thanks for the question. Which community are you referring to? The research community, tools developers, or both? Shilad (talk) 03:40, 16 October 2014 (UTC)
Eligibility confirmed, round 2 2014
This Individual Engagement Grant proposal is under review!
We've confirmed your proposal is eligible for round 2 2014 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.
The committee's formal review for round 2 2014 begins on 21 October 2014, and grants will be announced in December. See the schedule for more details.
Jtud (WMF) (talk) 17:17, 7 October 2014 (UTC)
Other similar projects
How does this relate to other projects doing similar things, like ContentMine and Spa? I think it would be important that this proposal finds a way of working with them instead of duplicating the work. Both of them are open source/open science projects.
- Greetings! Those both look like great projects. At a high-level they seem like projects that are more closely related to Wikidata and Wikidata Toolkit. While WikiBrain is open source / open science, it is closely tied to Wikipedia as a knowledge base. It would be fun to integrate those other projects as knowledge bases for WikiBrain, but that seems beyond the scope of this IEG. Shilad (talk) 03:50, 16 October 2014 (UTC)
The proposal mentions some people you'll contact. I have some more, please add to the lists on openhub (ex ohloh) when you find any missing one:
- https://www.openhub.net/p/mediawiki-webtools
- https://www.openhub.net/p/mediawiki-scripts
- https://www.openhub.net/p/wikibots
- https://www.openhub.net/p/mediawiki-clients
Additionally, any tool particularly relevant for research should be documented on wikipapers, including yours.
--Nemo 15:08, 17 October 2014 (UTC)
- Nemo, Thanks for your suggestions. If the proposal is funded, we'll definitely contact these constituencies. Shilad (talk) 20:45, 21 October 2014 (UTC)
Conference travels
@Shilad: I'm not sure if the three conference travels, especially SIGIR and WWW, are an effective way to promote the tool. SIGIR and WWW are both big and busy conferences, where many participants may be busy catching up on what is hot in their own domain of expertise. Do you have a rough estimate of the number of people you can directly reach? Also, would there be a need to pass peer review to present a demo? [1] If it depends on the results of the reviews, you might want to mention that in the proposal. whym (talk) 13:34, 1 November 2014 (UTC)
- Whym, thanks for your question. It's a good one that we did think carefully about. Both Brent and I are active members of the WWW and SIGIR communities. Between the two of us, we've first-authored papers at, peer reviewed for, and attended both conferences. You are right that these are big conferences with lots going on. We considered (and will pursue) several other options for engaging algorithmic researchers. However, in-person engagement is a tremendously efficient way to increase adoption of WikiBrain. We saw this at WikiSym 2014, where we encouraged several key researchers of the Wikipedia community to become WikiBrain promoters. SIGIR and WWW would provide a venue to similarly engage algorithmic NLP, IR, and GIS researchers, a key constituency for WikiBrain's success. The people who can convince their industry and research teams to use and participate in WikiBrain will be at these conferences. We want to be there to persuade them to do so!
- You are correct that the project would have to pass peer review to be officially presented at these conferences, and this does present a small risk. However, we do not perceive this to be a major problem for several reasons. First, we have been quite successful at these and related conferences in the past, and there are several tracks at both of these conferences (particularly demo sessions) that have reasonably high acceptance rates. Second, most of these conferences now support "birds of a feather" (BoF) sessions. A session about Wikipedia-processing organized and promoted by us would be a tremendous format for promoting the library. Third, inclusion of acceptance-contingent conference funding is standard in computer science (at least in the United States). The typical approach here is that if something does not work out on the first try, researchers present/demo their work at the next relevant conference (e.g. for us, that might be AAAI, ICML, KDD or IJCAI). Shilad (talk) 19:28, 2 November 2014 (UTC)
- @Shilad: Thank you for answering my blunt question. I'm mostly convinced that the face-to-face approach could be irreplaceable by other means, but $3,000 and 5 days for reaching out to (let's say) a dozen people still sounds a bit expensive in the context of IEG. Would it be possible to share the attendance (and thus the expenditure, either literally or as a barter) in part with other research projects you might have in a related topic? whym (talk) 13:57, 4 November 2014 (UTC)
- @Whym: Small note: I'd sincerely hope that we would reach far more than a dozen researchers at both those conferences. Big note: I was trying to think of a cautious way of saying this, but I couldn't, so I'll be blunt. As a professor at a small liberal arts college, my travel budget is severely limited. In fact, my yearly travel budget pays for about one half of one of those conferences. Sadly, this means I need to reserve my travel funds for conference appearances where I am first author on a full (non-demo) paper. It's possible that SIGIR or WWW will be those conferences (I plan on submitting full papers to both), but the main tracks are very selective. So I can't really say yet. Shilad (talk) 01:02, 6 November 2014 (UTC)
- @Shilad: Thank you for your reply. I believe even the uncertain possibility of sharing attendance is a positive piece of information, as we always look for efficient ways to use the money donated to the Wikimedia movement. That said, I now feel that I went a bit too far (with a not well-grounded number estimate I did late at night, as you pointed out), and that I could have tried other channels, including suggesting that WMF staff discuss it with you in their due diligence check. My apologies for this. whym (talk) 08:55, 6 November 2014 (UTC)
- @Whym: No worries! These are all good questions. I'm grateful that the proposal is being considered. Shilad (talk) 14:55, 6 November 2014 (UTC)
Aggregated feedback from the committee for WikiBrainTools
Scoring criteria (see the rubric for background) | Score (1 = weak alignment, 10 = strong alignment)
(A) Impact potential | 7.9
(B) Innovation and learning | 7.7
(C) Ability to execute | 7.3
(D) Community engagement | 7.9
Comments from the committee: |
Thank you for submitting this proposal. The committee is now deliberating based on these scoring results, and WMF is proceeding with its due-diligence. You are welcome to continue making updates to your proposal pages during this period. Funding decisions will be announced by early December. — ΛΧΣ21 16:56, 13 November 2014 (UTC)
Round 2 2014 decision
Congratulations! Your proposal has been selected for an Individual Engagement Grant.
The committee has recommended this proposal and WMF has approved funding for the full amount of your request, $29,500.
Comments regarding this decision:
We are thrilled that the research community wants to feed its formulas back into Wikipedia. Really looking forward to seeing good outcomes from your conference presentations too, noting that the disbursement of travel funds depends on paper acceptance or other concrete presentation plans.
Next steps:
- You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
- Review the information for grantees.
- Use the new buttons on your original proposal to create your project pages.
- Start work on your project!