Community Wishlist Survey 2019/Wikidata/Improvements to the reliability of Wikidata

Improvements to the reliability of Wikidata

  • Problem: Wikidata is often considered unreliable as a data source due to vandalism and a lack of adequate sourcing. In spite of these issues, Wikidata is still used in infoboxes across Wikipedias, even though it can be difficult for Wikipedia editors to edit Wikidata; and on many wikis references are omitted entirely from Wikidata infoboxes, making it more complicated to check the veracity of data.
  • Who would benefit: Wikidata; Wikipedia articles (infoboxes, descriptions, external database links, etc.) and other users of Wikidata data
  • Proposed solution: In order to verify existing data, it would be helpful to make sure that references used in Wikidata are shown when statements are used in Wikipedia infoboxes and the like,[note 1] and it could also be helpful to allow editors of both Wikipedia and Wikidata to add maintenance tags with a script (e.g. "[this statement] might not be true" or "needs a source"), replicating the utility of Wikipedia's venerable [citation needed]-style tags. Particularly for data which can't be verified easily using secondary sources or constraint violations, like geographic coordinates, it would also be beneficial to have tools to easily find errors (e.g. wrong coordinates) or oddities (e.g. weird precision) in the data.

Notes

  1. This would probably involve either editing the Wikidata-related Lua modules for each wiki or creating a new module that incorporates all of their features and localizing it across all wikis (there are currently multiple disparate modules in use, even on individual wikis).
  • More comments: Originally page protection enhancements were part of this proposal. These are now part of a separate proposal.
  • Phabricator tickets:
    • phab:T209242 – For Lua modules across Wikipedias, allow display of sources from Wikidata and filtering of unreferenced statements across all Wikidata infoboxes (11 November 2018)
    • phab:T209237 – Gadget for Wikidata and Wikipedia users to add maintenance tags to Wikidata items (11 November 2018)
    • phab:T209241 – Creation of software to auto-detect errors or oddities in internal or unreferenceable Wikidata statements, e.g. images, geographic coordinates (11 November 2018)
    • Related: phab:T148928 – Wikidata integration for proveit gadget (23 October 2016)
  • Proposer: Jc86035 (talk) 20:24, 29 October 2018 (UTC)

Discussion

Many of the comments discuss page protection because it was originally part of the proposal before being split into Partial and multi-item protection for Wikidata items. Jc86035 (talk) 16:28, 16 November 2018 (UTC)
Discussion from before the start of voting. Jc86035 (talk) 11:39, 17 November 2018 (UTC)

One issue that I hit recently is outdated Wikipedia imports. For instance, a (wrong) set of coordinates was imported from de.wp. In the meantime, the data was corrected on Wikipedia, but not on Wikidata. A lot of additional work can be done on coordinates, such as supporting areas, which in turn allows checks such as "is this coordinate within the area indicated by the P31 of the item?" --Strainu (talk) 21:55, 29 October 2018 (UTC)

@Strainu: I agree; I've added a sentence about this to the proposal. Jc86035 (talk) 05:53, 30 October 2018 (UTC)
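The area check Strainu describes could be sketched roughly as follows. This is only a minimal illustration, assuming the area is available as a simple polygon of (longitude, latitude) pairs; the function name and the planar ray-casting approach are assumptions made for this example, not an existing Wikidata feature, and the sketch ignores the antimeridian and the poles.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: is `point` inside the simple polygon `polygon`?

    Hypothetical helper for illustration. Coordinates are treated as
    planar, which is only a rough approximation for small areas.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Illustrative area: a 10-by-10 degree square.
area = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
suspicious = not point_in_polygon((15.0, 5.0), area)
```

A statement whose coordinate falls outside the expected area would then be flagged for human review rather than changed automatically, since the area data itself may be the part that is wrong.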
  • One possibility would be to add a button "doubtful" to the right of the "edit" button. Then others could query the doubtful statements (possibly using Wikidata Query) and fix the problems by deleting or editing the low-quality or incorrect statements. Geert Van Pamel (WMBE) (talk) 10:47, 4 November 2018 (UTC)
    @Geertivp: I think that sounds like a good idea; such a feature could be used to add preset tags like in Wikipedia articles (e.g. citation needed, needs update, doubtful/dubious), possibly by making the script add qualifiers to statements. I have proposed a property for this information, since one does not yet exist. Jc86035 (talk) 11:16, 4 November 2018 (UTC)
    See also phabricator:T139583 --Lydia Pintscher (WMDE) (talk) 13:51, 9 November 2018 (UTC)
  • There is a way in Lua to ensure that only sourced properties are displayed in templates. That could be a first move for critical pages. Another idea could be to convert URLs listed as sources into special items that could be reviewed to assess the quality of the source. This could be partially automated using the source's base URL. Pubmed is reliable. CNN is quite reliable. The Onion is fake. -- Thibdx (talk) 00:08, 5 November 2018 (UTC)
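The base-URL idea in the last comment could be sketched roughly as below. The domain table, the rating labels, and the function name are all assumptions made for this illustration (the three ratings simply mirror the examples in the comment); a real list would have to be curated and reviewed by the community.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical ratings keyed by base domain, mirroring the examples above.
# The domains are assumptions chosen for illustration only.
SOURCE_RATINGS = {
    "ncbi.nlm.nih.gov": "reliable",   # assumed host for Pubmed
    "cnn.com": "quite reliable",
    "theonion.com": "fake",
}


def rate_source(url: str) -> Optional[str]:
    """Return the rating for the URL's base domain, or None if unrated."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, rating in SOURCE_RATINGS.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return rating
    return None
```

Unrated URLs return None instead of a default rating, so that unknown sources are queued for review rather than silently treated as good or bad.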

Define Quality of external source and double check

I think the most important component of Open Free Knowledge is having references to trusted external sources. I would therefore like to see

  • in the Wikipedia infobox
    • that the reader can easily see
      • that a fact is based on an external source
      • the quality of the source used - we need to find a way to describe the quality of a source and our experience using it
      • that we have confirmed that this fact matches the external source (compare the Listeria list that runs daily, comparing Wikidata with Nobelprize.org; see example diff)
    • that we also visualize in the Wikipedia infobox any mismatch with an external source that is found in Wikidata

- 09:56, 30 October 2018 (UTC)

@Salgo60: Added a sentence and a note to the proposal. The English Wikipedia's WikidataIB Lua module is able to filter statements based on whether or not they have a source other than a WMF project, but I don't think it's able to display sources from Wikidata. Jc86035 (talk) 13:21, 30 October 2018 (UTC)
I think the quality of a source might be difficult to categorize, although perhaps one could filter out sources which are blacklisted by the English Wikipedia for being unusable. Jc86035 (talk) 13:37, 30 October 2018 (UTC)
Thanks. If you listen to the vision of en:Tim Berners-Lee, then in an en:linked data world I as a reader should be able to select which sources I trust... I feel trust is very important, and we need to think about how Wikipedia explains the quality of the sources used to the reader. One vision could also be that I, as a reader of Wikipedia, should be able to see which facts in the article I am reading are based on sources I trust - Salgo60 (talk) 13:44, 30 October 2018 (UTC)
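The kind of filtering discussed in this thread (keeping only statements with a source other than a Wikimedia project) might look roughly like the sketch below. It is not the WikidataIB implementation, which is written in Lua; it assumes statements in the Wikidata JSON layout, where each statement has a "references" list and each reference has a "snaks" mapping keyed by property ID, and it treats only "imported from Wikimedia project (P143)" as a Wikimedia import marker. The function names are hypothetical.

```python
from typing import Dict, List

# Assumption: P143 is the only import marker considered here; other
# import-related properties would need to be added to this set.
WIKIMEDIA_IMPORT_PROPS = {"P143"}


def has_external_reference(statement: Dict) -> bool:
    """True if at least one reference is not a Wikimedia-project import."""
    for ref in statement.get("references", []):
        props = set(ref.get("snaks", {}))
        # A reference counts as external if it is non-empty and carries
        # none of the import-marker properties.
        if props and props.isdisjoint(WIKIMEDIA_IMPORT_PROPS):
            return True
    return False


def filter_sourced(statements: List[Dict]) -> List[Dict]:
    """Keep only statements backed by at least one external reference."""
    return [s for s in statements if has_external_reference(s)]
```

A reader-selected trust list, as suggested above, could be layered on top by also checking the referenced URL or item against the sources the reader has chosen.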
  • Basically your solution to vandalism is to prevent Wikipedia editors without existing Wikidata edits from removing vandalism that might show up on Wikipedia for critical fields. That sounds to me like a bad plan. ChristianKl (talk) 18:03, 2 November 2018 (UTC)
    @ChristianKl: Not necessarily (although that specifically would probably not be a good idea). Obviously what exactly to protect would be decided by Wikidata admins. You might
    • prevent a class of users from editing a group of pages;
    • prevent a class of users from editing specific parts of a group of pages;
    • prevent a class of users from adding statements without references on a group of pages;
    • prevent a class of users from adding statements without references on specific parts of a group of pages;
    • prevent a class of users from editing existing data for a group of pages;
    • prevent a class of users from editing some existing data for a group of pages.
    The latter two would probably be quite difficult to implement properly without preventing some registered users from reverting vandalism and without preventing some users from editing the data that they just added, although perhaps they could be limited to the scope of statements with existing references which have URLs or which cite another item.
    I think some of these would also be particularly useful for items about scientific articles, since usually after import they shouldn't need to be edited manually (and edits are likely to come from Special:Random). Jc86035 (talk) 05:19, 3 November 2018 (UTC)
I would forbid the use of "imported from Wikimedia project (P143)" as a reference, because we are not a primary source and cannot serve as a reference for certain claims. --Vanbasten 23 (talk) 11:11, 4 November 2018 (UTC)

Jc86035 Hello. Thanks for submitting a proposal for the wishlist survey. This proposal is very broad and proposes a number of different solutions. I'd encourage you to make the problem statement more focused on specific issues (like Ability to easily import references) that are individually actionable. Otherwise it will be very hard for us to focus our work on what's important. Thank you. -- NKohli (WMF) (talk) 23:40, 5 November 2018 (UTC)

@NKohli (WMF): Would it be sufficient for me to order the Phabricator tickets (once I create them) from most important to least? I think they're all important, although some of them would certainly have a bigger impact than others. Jc86035 (talk) 14:08, 6 November 2018 (UTC)
@Jc86035: Just writing down what you think are the steps to solve this in order of importance would be fine. Phabricator tickets would be nice but aren't necessary. I can't say we will be able to do everything but having the proposed solution in order of importance according to you would be helpful. I see Geert Van Pamel and Lydia suggested a solution above. You could also incorporate that in the proposal if you think it's a good idea. Thank you. -- NKohli (WMF) (talk) 22:49, 9 November 2018 (UTC)
@Jc86035: Thanks for trimming this down. I think the remaining phab tickets are more on the scale of what Community Tech could work on. Ryan Kaldari (WMF) (talk) 21:04, 15 November 2018 (UTC)
@Jc86035: I think taking out the crossed-out sentences will make this a lot more readable. Thank you. -- NKohli (WMF) (talk) 21:42, 15 November 2018 (UTC)

Voting