Grants talk:Project/DBpedia/CrossWikiFact/Archive 1

September 26 Proposal Deadline: Reminder to change status to 'proposed'

As posted on the Project Grants startpage, the deadline for submissions this round is September 26, 2017. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed." As soon as you’re ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talkpage.

Warm regards,
--Marti (WMF) (talk) 04:40, 26 September 2017 (UTC)


Eligibility confirmed, round 2 2017

 
This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 2 2017 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through 17 October 2017.

The committee's formal review for round 2 2017 begins on 18 October 2017, and grants will be announced 1 December. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 21:50, 3 October 2017 (UTC)

Questions and concerns from Ruslik0

I have a few questions for you:

  • Can you give a short overview of the present state of data (and reference) mining of Wikipedias, of the WWW and of other databases for the purpose of their inclusion into Wikidata? I mean such projects as Librarybase, WikiFactMine, StrepHit and maybe some others. In what way is your proposal different from them? What additional benefits will it provide?
  • You mentioned the Primary Sources Tool, but is it in a usable state now? I remember it had problems with usability. Is it widely used now?
  • Community outreach appears to be quite limited. You have basically reached out only to the Wikidata community. The wider Wikipedia community probably knows nothing about your website. So there is a real risk that only a few editors will use it.

Ruslik (talk) 17:53, 28 October 2017 (UTC)

Hi @Ruslik0, sorry for only answering now. I had this page on my watchlist but seem to have overlooked the notifications.

  • An overview of which data is available can be found here: http://wiki.dbpedia.org/downloads-2016-10 . We have been extracting the infobox information from all larger Wikipedias for over 10 years now. The software for this is here: https://github.com/dbpedia/extraction-framework/ . We have around 200 editors who write refinement and parsing rules, since all Wikipedia language versions are different: http://mappings.dbpedia.org/ . For around 2 years we have also been working on mining the references and the text itself from Wikipedia. Here is around 2.5 GB of citation data for the English Wikipedia: http://downloads.dbpedia.org/2016-10/core-i18n/en/ . Compared to Librarybase/WikiFactMine/StrepHit, we have a very different foundation. Wikidata would greatly benefit from having the info and references transferred from Wikipedia (and vice versa), where they were already added and validated by humans. Yes, there is a technical part to it, but the most important thing is to aggregate information so that humans can make an informed decision on which reference to trust and which fact to choose among the different variants. So our first focus is to unlock info from Wikipedia (facts and references). Later we can also integrate approaches like StrepHit and WikiFactMine.
  • We are currently not fixated on the exact approach for helping editors find and fix issues. We are aware that an external website is not ideal. From a technical perspective, it is quite easy to see how to break it down into components or microservices and then feed these into user scripts or the VisualEditor. So our strategy would be rapid prototyping of the website, to have a concrete basis for discussion with editors on how to make edits most effectively. This takes a lot of feedback. You are right that we didn't discuss the proposal on Wikipedia. We were considering posting it on the 10 largest Wikipedias, but then the deadline was close. We will definitely do this. In hindsight, we could have discussed it on the English Wikipedia at least. Discussing usability is also a big part of the work in software engineering, so it would have been like doing the work the proposal is supposed to fund. Overall, we are aware that an external website is not ideal. So if anybody has a better idea of how to integrate it more tightly and make it more accessible, we would definitely switch to that. The project is calculated to leave room for some unforeseen ideas. After all, we are using agile processes for development.

SebastianHellmann (talk) 00:49, 30 November 2017 (UTC)

The Primary Sources Tool is currently under development. It requires an individual setup and is not integrated into the default Wikidata view. This might also be the reason it is not very widely used. Some of the tool's statistics are available here [1]. Such a tool is not only required by the project proposed here, but also strongly needed by the Wikidata community for various other datasets. Hence, we trust that it will be available in the near future. We are also in contact with Marco Fossati, the main developer, and will be happy to contribute to and give feedback on the tool's future development.

Magnus (talk) 09:49, 1 December 2017 (UTC)

Aggregated feedback from the committee for DBpedia/CrossWikiFact

Scoring rubric and scores
(A) Impact potential
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
Score: 7.8
(B) Community engagement
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
Score: 7.4
(C) Ability to execute
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
Score: 6.4
(D) Measures of success
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Score: 5.6
Additional comments from the Committee:
  • The feeding of Wikidata is interesting. The project follows the strategy.
  • The proposal fits with Wikimedia's strategic priorities and has great potential for online impact. However, its sustainability and scalability are less clear because the proposed website may quickly fall into irrelevance.
  • The project has potential for online impact on Wikidata with several statements now on infoboxes moving into Wikidata. It is important to ensure code hygiene to be able to adapt and modify it later on.
  • There is a really good potential for online impact. DBpedia already have tools to extract (and compare!) data from infoboxes on different wikis, a logical next step is to analyse this data and work on adding it to Wikidata and fixing errors in local wikis. Looks like DBpedia association secures sustainability of the output.
  • Innovative but some concerns to feed Wikidata from DBPedia
  • The project looks innovative. Its potential impact is high but there are some risks. The main risk is that few people will use their website. The success can be measured, though the measures of success can be improved.
  • The approach is innovative, and benefits seem to outweigh the risks.
  • The solution is innovative, but it is well-planned and impact relative to investment is high. I think that having just 500 people use the website per month is a bit low for a measure of success, worth checking with corresponding wmflabs tools like Mix'n'match (I don't know their stats either)
  • The project can be accomplished in 12 months and the budget appears to be realistic. The participants probably have necessary skills.
  • I am unable to determine if the scope can be accomplished on time, since there can be several unforeseen obstacles in the course of development. It is always better to stretch the project a little longer and write better code than to finish poor code on time. Extra funding may be allotted in case the project needs more time to complete.
  • They seem to have relevant experience and to be able to accomplish the project. The plan is good, the budget is reasonable.
  • The community engagement appears to be limited - basically only the Wikidata community.
  • Community support does not appear to be high enough. Most users who supported the project seem to be new users or inactive users. For a project of this kind, it is important to seek consensus on Wikidata. It is unsure if the project leader has contacted the WMF staff working on Wikidata to know their take on this.
  • I am not a big fan of funding external tools or projects, and this is one. It has rather low community engagement given the size of Wikidata community, so I really expect this will be a powerful tool with a strategy of broader involvement of Wikidata community.
  • Too expensive, in my opinion, to give full support at the moment. A more detailed analysis and feedback from the tech team would be required.
  • Lately there have been a number of proposals related to data mining for the purpose of adding new data to Wikidata. So there may be some duplication between them. This is not necessarily a bad thing, but this proposal still needs a review by the Wikidata developer community to be sure that they are doing something that is really necessary. I am also concerned about the lack of a response to my talk page questions.
  • I support this project after hearing what WMF staffers working on Wikidata are thinking about this project.
  • The entire project is reasonable and impactful, there is nothing worth cutting, hence full funding.
 

This proposal has been recommended for due diligence review.

The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.


Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
  2. Following due diligence review, a final funding decision will be announced on Thursday, May 27, 2021.
Questions? Contact us at projectgrants@wikimedia.org.


Answer to selected feedback points

First of all, thank you for the feedback, very helpful!

Innovative but some concerns to feed Wikidata from DBPedia
  • DBpedia is only middleware here. DBpedia tries to capture the information in Wikipedia as well as possible. We have been refining the process for 10 years now. Text, references and facts have been curated manually by editors in Wikipedia and are quality-controlled, but are currently not easily accessible for Wikidata. There is a certain degree of loss in extraction, but DBpedia is state-of-the-art here.
The project looks innovative. Its potential impact is high but there are some risks. The main risk is that few people will use their website. The success can be measured, though the measures of success can be improved.
  • The website is easy to prototype and experiment with in the early stages. It is just the tip of the iceberg: the underlying data analysis can be exposed easily via web services and used in other tools (Gadgets, VisualEditor, etc.). We did not write too much about it as it is very technical. From our experience, any inclusion into the main software and adoption needs several months of discussion to build consensus and find the best way to engineer it. (Magnus, Julia and I will also be involved in this discussion, although our work time is not listed in the budget.) Later the website might be hosted on WMF Labs. We are not sure about the data analytics, as this would be a huge commitment for WMF.
I think that having just 500 people use the website per month is a bit low for a measure of success, worth checking with corresponding wmflabs tools like Mix'n'match (I don't know their stats either)
  • Our goal is to build a power tool which aggregates information and lets editors make an informed decision: proper ranking (a simplified ranking is already implemented in the overview page), which articles have the most issues/errors to guide users to where editing is most useful, and fast and easy comparison and reference evaluation. If we manage to bring the time editors need to decide which fact is the best down to 20-30 seconds including the edit, then 500 users can insert or correct 60,000 facts in one hour of editing.
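The 60,000 figure above can be checked with a minimal sketch. It assumes every editor works for a full hour at a constant pace; the function name `facts_per_hour` is hypothetical and only serves the illustration.

```python
SECONDS_PER_HOUR = 3600


def facts_per_hour(editors: int, seconds_per_fact: int) -> int:
    """Facts handled in one hour, assuming a constant pace per editor (an assumption)."""
    facts_per_editor = SECONDS_PER_HOUR // seconds_per_fact
    return editors * facts_per_editor


# The two ends of the 20-30 second range quoted in the answer, for 500 editors.
at_30_seconds = facts_per_hour(500, 30)
at_20_seconds = facts_per_hour(500, 20)
print(at_30_seconds, at_20_seconds)
```

Under these assumptions the 60,000 figure corresponds to the slower end of the range (30 seconds per fact); at 20 seconds per fact the same 500 editors would handle 90,000 facts.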
The project can be accomplished in 12 months and the budget appears to be realistic. The participants probably have necessary skills.
  • The KILT group and the DBpedia Association in Leipzig have 15 (mainly) data science researchers, including students. They are embedded on the one hand in AKSW (http://aksw.org/Team.html, 50 PhD students) and on the other hand in the wider DBpedia community. The DBpedia Association maintains relations with all major Semantic Web research groups, companies and AI labs worldwide.
It is unsure if the project leader has contacted the WMF staff working on Wikidata to know their take on this.
I support this project after hearing what WMF staffers working on Wikidata are thinking about this project.
  • We have had quite a lot of discussion over the last years with Wikidata staff, including Lydia Pintscher, Daniel Kinzler, Anja Jentzsch, Denny Vrandečić and, recently in October in SF, Stas Malyshev, John Vandenberg, Dario Taraborelli and Amanda Bittaker. As far as we understand, WMF staff are not supposed to directly influence grant decisions, which is why we didn't ask for their recommendations.
The community engagement appears to be limited - basically only the Wikidata community.
Community support does not appear to be high enough. Most users who supported the project seem to be new users or inactive users. For a project of this kind, it is important to seek consensus on Wikidata. 
I am not a big fan of funding external tools or projects, and this is one. It has rather low community engagement given the size of Wikidata community, so I really expect this will be a powerful tool with a strategy of broader involvement of Wikidata community.
  • We are aware of these issues. However, we see the proposal as a chance to build a bridge between the communities. DBpedia's community is very much focused on data technology, and we are living in this technology bubble in co-existence. This project will give us the chance to work more actively in the direction of Wikipedia / Wikidata and its community and to foster understanding. Naturally, the Wikidata community is easier to approach and understand, but we will reach out to Wikipedia as well (probably with a better prototype, to have a more targeted discussion). Furthermore, DBpedia can reach out via its 20 language chapters, where people have good contacts with Wikipedians.