Project summary
- Project Name
- Adding support of DBLP and OpenCitations to Wikidata
- Start/End dates
- 1 December 2020 - 30 April 2021
- Amount requested (and the currency you wish to receive it in)
- 11130.79 TND
- Amount requested (in US$ equivalent)
- 4000 USD
The people
- Contact person name/Wikimedia username
- Mohamed Ali Hadj Taieb (User:Mohamedalihaj)
- Contact person e-mail address
- mohamedali.hajtaieb fss.usf.tn
- Organisation (optional)
- University of Sfax, Tunisia
- Project participants
- Who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.
- Houcemeddine Turki, Research Assistant, University of Sfax, Tunisia
- Research scientist in Library and Information Science with publications in Scientometrics and other venues.
- A long-term Wikimedian familiar with the Wikidata API and interface (User:Csisc).
- Mohamed Ali Hadj Taieb, Assistant Professor, University of Sfax, Tunisia
- Research scientist in Semantic Technologies and Natural Language Processing with publications in Engineering Applications of Artificial Intelligence and other venues.
- Experience in conducting a research project related to the use of wikis for the construction of semantic resources.
- Mohamed Ben Aouicha, Associate Professor, University of Sfax, Tunisia
- Research scientist in Semantic Technologies and Natural Language Processing with publications in Engineering Applications of Artificial Intelligence and other venues.
- Experience in conducting a research project related to the use of wikis for the construction of semantic resources.
The project
Description
Describe the project or event.
The project aims to create two bots that mass-import into Wikidata the bibliographic information that DBLP and OpenCitations release under the CC0 license:
- DBLP: a computer science bibliography website launched in 1993 at the University of Trier, Germany. It is currently the most complete bibliographic database for computer science research. Its author disambiguation methods are robust, as shown in https://link.springer.com/article/10.1007/s11192-018-2824-5, so they can be reliably used to achieve full coverage of computer scientists in Wikidata.
- OpenCitations: an open science project that publishes free bibliographic and citation data in RDF. It is run by Infrastructure Services for Open Access (IS4OA), a non-profit charitable company founded in the United Kingdom in 2012 by open access advocates Caroline Sutton and Alma Swan.
The project will use deep learning algorithms to infer new information (research topics, affiliations) from the extracted data and thereby further enrich Wikidata with bibliographic information. Once the project is finished, the bots will keep running for years to regularly curate and update scholarly information in Wikidata.
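As a rough illustration of what the OpenCitations bot has to do, the sketch below turns citing/cited DOI pairs into Wikidata "cites work" (P2860) claims. It is a minimal sketch under stated assumptions, not the project's implementation: the record shape mimics rows of the OpenCitations COCI REST API (fields "citing" and "cited"), doi_to_qid is a hypothetical lookup built beforehand from Wikidata's DOI property (P356), and existing is a hypothetical set of citation pairs already present on Wikidata. The actual upload step (for example through Pywikibot) is left out.

```python
"""Sketch: plan Wikidata "cites work" (P2860) claims from OpenCitations pairs.

Assumptions (hypothetical, for illustration only):
- each record is a dict with "citing" and "cited" DOIs, like a COCI API row;
- doi_to_qid maps upper-case DOIs to Wikidata QIDs (built from P356 beforehand);
- existing holds (citing_qid, cited_qid) pairs already stated on Wikidata.
"""
from typing import Dict, Iterable, List, Set, Tuple

CITES_WORK = "P2860"  # Wikidata property "cites work"


def normalize_doi(doi: str) -> str:
    """Strip a resolver prefix and upper-case the DOI (Wikidata stores DOIs upper-case)."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.upper()


def plan_citation_claims(
    records: Iterable[Dict[str, str]],
    doi_to_qid: Dict[str, str],
    existing: Set[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str, str]], List[Dict[str, str]]]:
    """Return (claims to add, records whose DOIs have no Wikidata item yet)."""
    to_add: List[Tuple[str, str, str]] = []
    unresolved: List[Dict[str, str]] = []
    seen: Set[Tuple[str, str]] = set(existing)
    for rec in records:
        citing = doi_to_qid.get(normalize_doi(rec["citing"]))
        cited = doi_to_qid.get(normalize_doi(rec["cited"]))
        if citing is None or cited is None:
            unresolved.append(rec)  # the work must be created (or skipped) first
            continue
        if (citing, cited) in seen:
            continue  # already on Wikidata, or repeated within this batch
        seen.add((citing, cited))
        to_add.append((citing, CITES_WORK, cited))
    return to_add, unresolved
```

Pairs whose DOIs have no Wikidata item are returned separately, since those works would have to be created before their citations can be added; checking against existing claims keeps reruns of the bot from duplicating statements.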
Motivation
Why is this project needed? What will it solve or improve?
Currently, Wikidata lacks full coverage of scholarly citations and computer science publications, which gives a distorted picture of worldwide research productivity and quality. This project will enrich the Wikidata citation graph and significantly improve the coverage of computer science researchers, conferences and journals in Wikidata.
Activities
Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project?
- Development of an OpenCitations bot to enrich Wikidata with bibliographic information and citations of publications from OpenCitations (one month)
- Development of a DBLP bot to enrich Wikidata with bibliographic information about scientists, venues and journals (two months)
- Applying for bot flags (one month)
- Running the bots on a server (one month)
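For the DBLP bot, one building block is mapping a dblp record to Wikidata statements. The function below is a minimal sketch that takes a single record in the dblp.xml layout (an article or inproceedings element with author, title, year and ee children) and returns plain (property, value, qualifiers) tuples: instance of scholarly article (P31 = Q13442814), title (P1476), publication date (P577), DOI (P356) and author name strings (P2093) with a series ordinal (P1545) to preserve author order. The tuple format, the property choices and the function name are assumptions for illustration; a real bot would also match authors to existing Wikidata items (for example through the DBLP author ID property) instead of only adding name strings.

```python
"""Sketch: map one dblp XML record to Wikidata-style statements.

Assumptions (hypothetical, for illustration only): the input is a single record
element in the dblp.xml layout, and statements are (property, value, qualifiers)
tuples rather than objects from a real bot library.
"""
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Statement = Tuple[str, str, Dict[str, str]]  # (property, value, qualifiers)

# dblp disambiguates homonymous authors with a four-digit suffix, e.g. "Name 0001"
HOMONYM_SUFFIX = re.compile(r"\s\d{4}$")
DOI_PREFIX = "https://doi.org/"


def dblp_record_to_statements(xml_text: str) -> List[Statement]:
    rec = ET.fromstring(xml_text)
    statements: List[Statement] = [("P31", "Q13442814", {})]  # instance of: scholarly article

    title = (rec.findtext("title") or "").strip().rstrip(".")  # dblp titles end with "."
    if title:
        statements.append(("P1476", title, {}))

    year = rec.findtext("year")
    if year:
        # year only; a real bot would send a Wikidata time value with year precision
        statements.append(("P577", year.strip(), {}))

    for ee in rec.findall("ee"):  # electronic editions; keep the first DOI link
        url = (ee.text or "").strip()
        if url.lower().startswith(DOI_PREFIX):
            statements.append(("P356", url[len(DOI_PREFIX):].upper(), {}))
            break

    for position, author in enumerate(rec.findall("author"), start=1):
        name = HOMONYM_SUFFIX.sub("", (author.text or "").strip())
        statements.append(("P2093", name, {"P1545": str(position)}))  # name + series ordinal
    return statements
```

The full dblp.xml dump relies on character entities declared in its DTD, so processing the whole dump would need a DTD-aware, streaming parser (for instance lxml) rather than this single-record approach.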
Measures of success
What criteria will you use to define success for your project, and how do you intend to measure them? What are your targets for these measurements?
- Edit count for the two bots: more than 100,000 edits
- Significant increase in the number of citations per paper (as reported by https://scholia.toolforge.org/)
- Significant increase in the number of publications (as reported by https://scholia.toolforge.org/)
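These measures could be tracked by comparing snapshots taken before and after the bots run. The sketch below assumes a hypothetical snapshot format, a mapping from each scholarly item's QID to its number of "cites work" (P2860) statements (obtained, for example, from a SPARQL query), plus per-bot edit counts. It also assumes the 100,000-edit target applies to both bots combined. Scholia's own statistics may be computed differently, and all values used with it are placeholders.

```python
from statistics import mean
from typing import Dict

EDIT_TARGET = 100_000  # assumption: target counts edits of both bots combined


def citations_per_paper(snapshot: Dict[str, int]) -> float:
    """Mean number of P2860 statements per scholarly item in a snapshot."""
    return float(mean(snapshot.values())) if snapshot else 0.0


def relative_increase(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Relative change in citations per paper between two snapshots."""
    b, a = citations_per_paper(before), citations_per_paper(after)
    if b == 0:
        return float("inf") if a > 0 else 0.0
    return (a - b) / b


def edit_target_met(edit_counts: Dict[str, int], target: int = EDIT_TARGET) -> bool:
    """True if the bots' combined edit count exceeds the target."""
    return sum(edit_counts.values()) > target
```

Newly imported papers with few citations can lower the mean even while the citation graph grows, so the citations-per-paper and publication-count measures are best read together.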
Community
Who is your target audience for this project? How will you engage the community you’re aiming to serve at various points during your project?
The target audience is the WikiCite community and the Wikidata community. I plan to engage the community by:
- Inviting them to review the source code of the two bots, implemented in Python, through the mailing lists or Telegram channels of WikiCite, LD4, the Wikimedia and Libraries User Group, and Wikidata.
- Inviting them to comment on the two bot flag requests on Wikidata.
The Budget
How will you use the funds you are requesting? List bullet points for each expense. (You can create a table or link to a separate (public) document if needed.)
The items listed here will be used for years to regularly import DBLP and OpenCitations data into Wikidata, not only during the five months of the project:
- High-Performance Computer with GPU and CPU: 3500 USD
- Internet Connection: 500 USD
The High-Performance Computer will be hosted at the Faculty of Sciences of Sfax, a public scholarly institution in Tunisia. It will be used by a large team of scientists to develop Wikimedia-related applications, including the two bots. It will not be the personal property of any team member.
COVID risk assessment (for in-person events)
If the project is for an in-person event, you must complete the risk assessment tool and checklist, and provide a link to copies of these documents here. Events must not include any international travel, and must follow all applicable local health guidelines.
The project is a bot development initiative for the WikiCite project. All activities are remote, and no in-person event will be organized.
Feedback
Community notification
You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc.
Please provide links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
Endorsements
Optional: Community members are encouraged to endorse your proposal and leave a rationale here.
- Sounds like a good plan to me, would be nice to get a more complete database of authors and works and to improve identifiers for existing items. Iwan Aucamp (talk) 17:20, 25 September 2020 (UTC)
- Support DBLP is the best open source on the web of CS publications. This looks like a valuable contribution! Jodi.a.schneider (talk) 20:08, 25 September 2020 (UTC)
- Support Absolutely endorse this. Those two databases are essential to the value of the citation graph in Wikidata and will allow accurate tracking of the flow of citations between publications. Even though initially just computer science works, I hope that this lays further groundwork and protocols for other fields to be better represented.
- Oppose The mission is good but I need more detail before I would endorse such a project. See my questions on the talk page. Support conditioned on the concerns regarding Wikidata handling the data volume being resolved. BrokenSegue 15:07, 26 September 2020 (UTC)
- User:BrokenSegue: See discussion for answers. --Csisc (talk) 15:54, 26 September 2020 (UTC)
- Support I strongly encourage these kind of initiatives. Being actively involved in the research of innovative uses of bibliographic open data I definitely endorse it. Please, eventually, consider also adding the Semantic Scholar Open Research Corpus to the harvested dataset. ALoopingIcon (talk) 07:33, 27 September 2020 (UTC)
- Support Absolutely endorse this initiative.--Alessandra Boccone (talk) 10:10, 29 September 2020 (UTC)
- Support. Valuable contributions to the scholarly citations Aliaretiree (talk) 02:29, 30 September 2020 (UTC)
- Support This project promises both to make a material contribution and to provide a model for disciplines beyond computer science. Clifford Anderson (talk) 02:36, 30 September 2020 (UTC)
- Support Orly Simon (talk) 04:33, 30 September 2020 (UTC)
Questions
Any questions about this proposal and feedback from reviewers should be placed on the associated discussion page.
Report
- Status
- closed