Grants:Project/DBpedia/GlobalFactSyncRE/Timeline


Timeline for DBpedia

Timeline | Date
Study (choose two initial sync targets and analyse the lack of references in Wikidata) | Day Month Year
GlobalFactSync tool (extend the current prototype with new features) | Day Month Year
Mapping Refinements | Day Month Year
GlobalFactSync Wikidata ingest | Day Month Year
GlobalFactSync Sprints | Day Month Year


Monthly updates

Please prepare a brief project update each month, in a format of your choice, to share progress and learnings with the community along the way. Submit the link below as you complete each update.

Current tasks

A log of current tasks is kept here. Ongoing discussions should be held using the corresponding discussion page.

(Preparation) April/May

June 2019 (official start)

July 2019

First Release Report: A first release containing detailed information about our micro-services is published on the DBpedia Blog

Containing:

  • First success story
  • Deployment of first micro-services on the server
    1. Initial User Interface
    2. PreFusion JSON API (user: read, pw: gfs)
    3. Reference Extraction Service
    4. Reference Data Download
    5. Infobox Extraction Service
    6. ID service
  1. definition of a set of problems with different layers of complexity
  2. analysis of various groups of subjects with respect to these synchronization problems

August 2019

  • Continued improvement of the first deployments, which will be an ongoing process. The GFS Data Browser in particular is being worked on:
    • users can now insert any Wikipedia URL into the subject search field
    • overall layout improvements
    • reference information is being added
  • Johannes Frey presented the GFS project at Wikimania
  • We created a news page within our Meta-Wiki project page framework to keep volunteers in the loop and encourage exchange. So far this has led to three more volunteers signing up for our 'GFS Feedback Squad' and two users leaving feedback about our sync target study.

September 2019

  • more work on the sync target study, focusing on targets that were brought up by Wikidata users (e.g., geo coordinates, employer, Nobel Prize)
  • intensive work on creating the complement to Wikidata and Wikipedia by collecting and providing data that is currently missing in both

October 2019

November 2019

  • re-extraction of GFS data and fusion
  • some work on the UI
  • identifying and testing ways to generate lists of the Wikipedia articles related to selected topics: categories, infoboxes, Wikidata queries and other articles (lists).

December 2019

  • extraction of reference data for Polish cities; studied sources: BDL - Bank Danych Lokalnych, Wikipedia, Wikidata
  • analysis of available mappings between various geographical identifiers for Polish administrative units
  • showing current understanding of the fusion challenge

January 2020

February 2020

March 2020

  • experiment prototype for improved harvesttemplate
    • index Infoboxes / Templates

April 2020

  • experiment prototype for improved harvesttemplate
    • index Infoboxes / Templates

May 2020

  • watch for feedback on the new mockup

June 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

Planned Next Steps for July, August and September 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump
  • GFS browser features
    • include mapping management to allow search for properties of new external sources





Extension request

September 30, 2020

Over the last months the output of our project was barely visible because we (1) worked extensively on the data and (2) had to deal with the coronavirus pandemic and its consequences, such as missing child care. On the positive side, we have a considerable budget (9000€) left and would like to extend the project by four months as a budget-neutral extension. We still need time until the end of September 2020. Project-wise we found this dump: enwiki-20200401-wbc_entity_usage.sql.gz

- It tracks which pages use which Wikidata items or properties and which aspect (e.g. item label) is used. We therefore consider it realistic to provide the following:

- We have one of the best infobox parsers and we have full information about all properties there. This means we can produce a reliable Wikidata adoption report, which shows how widely Wikidata is adopted, where it is already well adopted in Wikipedia and where adoption can be improved.

- We can use this to calculate "good imports" from Wikipedia to Wikidata, i.e. where data in WP infoboxes is especially plentiful and well referenced, but missing in Wikidata

- With the improvements on https://tools.wmflabs.org/pltools/harvesttemplates/ we would have a powerful user interface to tackle exactly these spots
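As a rough illustration of the adoption report idea, the following minimal sketch counts, per property, how many pages read a value from Wikidata versus how many only state it locally in the infobox. It assumes the dump has already been parsed into (page_id, entity_id, aspect) tuples and that statement usage is encoded as an aspect such as "C.P1082"; both the input shape and the helper names (statement_usage, adoption_report, local_infobox_props) are assumptions made for this example, not the project's actual pipeline.

```python
from collections import defaultdict


def statement_usage(usage_rows):
    """Map property ID -> set of page IDs that read that property from Wikidata."""
    pages_by_property = defaultdict(set)
    for page_id, _entity_id, aspect in usage_rows:
        # Assumed aspect encoding: "C.P1082" marks statement usage of property P1082.
        if aspect.startswith("C.P"):
            pages_by_property[aspect[2:]].add(page_id)
    return pages_by_property


def adoption_report(usage_rows, local_infobox_props):
    """Return {property: (pages_from_wikidata, pages_local_only, adoption_ratio)}."""
    from_wikidata = statement_usage(usage_rows)

    # Invert the (hypothetical) infobox extraction result: property -> pages.
    local = defaultdict(set)
    for page_id, props in local_infobox_props.items():
        for prop in props:
            local[prop].add(page_id)

    report = {}
    for prop in sorted(set(from_wikidata) | set(local)):
        wd_pages = from_wikidata.get(prop, set())
        local_only = local.get(prop, set()) - wd_pages
        total = len(wd_pages) + len(local_only)
        report[prop] = (len(wd_pages), len(local_only), len(wd_pages) / total)
    return report


# Toy data, invented for illustration only.
usage_rows = [
    (1, "Q1", "C.P1082"),
    (2, "Q2", "C.P1082"),
    (2, "Q2", "L.pl"),  # label usage, ignored by the report
    (3, "Q3", "S"),     # sitelink usage, ignored by the report
]
local_infobox_props = {3: {"P1082"}, 4: {"P1082", "P625"}}

report = adoption_report(usage_rows, local_infobox_props)
for prop, (from_wd, local_only, ratio) in report.items():
    print(prop, from_wd, local_only, round(ratio, 2))
```

Pages counted as local-only are merely a starting point for finding "good imports"; whether the value is actually missing in Wikidata and well referenced in the infobox would still need to be checked separately.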

In addition, we started to index authoritative datasets that are often referenced in WP and WD. Taking this data from the source, we can build an interface, e.g. a user script, that suggests relevant data points from these datasets to users for inclusion. This part might be experimental, but it would work like this: the infobox on https://pl.wikipedia.org/wiki/Pozna%C5%84 states "Populacja (30.06.2019) • liczba ludności 535 802[3]" (population as of 30 June 2019: 535,802 inhabitants).

[3] is the population count from stat.gov.pl holding the official census for Poland. If this gets updated, we might be able to autodetect that a change is required either in the infobox or on Wikidata (that is up to the community policy).
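The detection step described above could look roughly like the sketch below. It compares a value taken from the authoritative source with the values currently held in the infobox and in Wikidata and flags each target that is missing, older, or in disagreement. The Observation type, the function names, and the "newer" source figure are hypothetical and only serve the example; the actual decision of what to change remains with the community.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    value: int
    as_of: str  # ISO date the value refers to, e.g. "2019-06-30"


def needs_update(source: Observation, current: Optional[Observation]) -> bool:
    """True if the target is missing, older than the source, or disagrees with it."""
    if current is None:
        return True
    if source.as_of > current.as_of:  # ISO dates compare correctly as strings
        return True
    return source.as_of == current.as_of and source.value != current.value


def check_targets(source, infobox, wikidata):
    """Flag, per target, whether a change should be suggested to editors."""
    return {
        "infobox": needs_update(source, infobox),
        "wikidata": needs_update(source, wikidata),
    }


# Infobox value quoted in the text above.
infobox = Observation(value=535802, as_of="2019-06-30")
# Invented newer source figure, for illustration only.
newer_source = Observation(value=535000, as_of="2019-12-31")

flags = check_targets(newer_source, infobox, wikidata=None)
print(flags)
```

Keeping the reference date next to the value avoids flagging a difference that is only due to two figures describing different points in time.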

This will not be complete, but it will probably work for 10-50 million entries in Wikipedia and Wikidata, depending on the quality of the source and how official it is. In the next few months we need to work on the following topics:

- incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

- GFS browser features

- include mapping management to allow search for properties of new external sources

@Juliaholze: Hi Julia, thanks for this request and context over your remaining budget as well as the disruptions you experienced due to the pandemic. We can appreciate that work on the project needed to be paused in order to focus on other, more important priorities, as we have experienced these same needs at the Wikimedia Foundation as well. This extension until 30 September 2020 to complete the above activities is formally approved. Your final report will be due on 30 October 2020. I JethroBT (WMF) (talk) 21:25, 6 July 2020 (UTC)
@JethroBT (WMF): Hi Chris, many thanks for your reply. We will complete the above activities and tasks.

Extension request

November 30, 2020

We would like to request another budget-neutral extension. The main reason is very similar to the previous one. We are currently in the process of adding many authoritative datasets to the GFS browser, which will then enable "official" data from the appropriate sources to be included in Wikipedia/Wikidata. In the next two months we need to work on the following topics:

  • GFS browser features
  • include mapping management to allow search for properties of new external sources

Please also see our email to the WMF Grants Administrator.

Extension request approved

This request is approved. Your new Project end date is November 30, 2020, and your Final Report is due on December 30, 2020.

Marti (WMF) (talk) 19:08, 15 October 2020 (UTC)