Wikidebat


Overview

This is a proposal for a new WMF sister project.
Status of the proposal
Status: under discussion
Details of the proposal
Project description
What is the project purpose?

This project proposes to create a single, unified discussion platform that would allow citizens, associations, experts, companies, etc. to exchange views spontaneously on societal topics by writing contributions. Two technologies in particular would be used: semantic web technology and text analysis tools, i.e. NLP (Natural Language Processing) methods.

Semantic web tools would make it possible to link the contributions in a knowledge graph, highlighting all the dimensions of a problem (economic, social, environmental, etc.) as well as the argumentative structures.

NLP tools would make it possible to group contributions by meaning, avoiding redundancy and making the debates easier to read.

More details on these two technologies are given in the Proposal section.


What will be its scope?

The main use cases would be:

- Provide a graphical interface to navigate the knowledge graphs.
- Provide an input window where users can write a contribution or propose a new topic for debate.
- Group contributions according to their semantic proximity.
- Offer services (SPARQL queries, https://www.w3.org/TR/rdf-sparql-query/ ) to extract the desired information.
- Feed a database and update it as contributions are added and classified.


How would it benefit from being part of Wikimedia?

Being able to substantiate an argument with at least one article or publication is essential to the quality of the discussions. Mobilizing and reactivating knowledge is central to this project, so being part of Wikimedia would be a real opportunity. Users could draw on, among others, Wikipedia, Wikisource, Wikiversity, Wiktionary and Wikinews pages to support an argument; conversely, identifying active debates that lack precise references could indicate pages that need to be created or expanded.

The collaborative platform envisaged would use the semantic web, so there would be close links with the Wikidata project, which provides URIs for concepts, including those described in Wikipedia. These URIs could be used within this project to identify resources and aggregate knowledge about them. In return, the project would produce data and metadata that could feed Wikidata: the number of debates, the classification of debates, the structure of debates (in terms of participation), the articles cited, the debates requiring additional resources, the characteristics of contributors if possible, etc.

If this project materializes, it could benefit from the moderation rules already established by Wikimedia. Wikidebat could also benefit from the image of Wikipedia, which everyone knows, and be easily identified as a place for consultation and debate in a collaborative and open mode.

In return, the text analysis tools developed and researched for Wikidebat could also benefit other projects.
Potential number of languages
This would be a matter for further discussion. Some knowledge and debates would make sense in several languages, but not all: some debates only make sense locally.
Proposed tagline: Connect and organize to understand
Technical requirements
New features to require
This project would require databases: either relational databases or, within semantic web technologies, triplestores.

Much of the same processing can be done with either relational databases or triplestores. In this presentation, I assume the ideal case in which the project is built on semantic web technologies.

Choosing the semantic web would allow other users, researchers, journalists and others to add their own layers of information on top of existing triples, increasing the value and uses of the data. It would also ensure that shared information (such as reference pages, definitions, etc.) is identical for every contribution that refers to it. Finally, the data model could evolve more flexibly than a relational database would allow.
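To illustrate that a relational database can hold the same kind of data as a triplestore, here is a minimal sketch that stores triples in an ordinary SQLite table. The extra `layer` column is an assumption, not part of the proposal: it stands in for the idea that other users could add their own layers of information on existing triples. All URIs under `http://example.org/` are hypothetical.

```python
import sqlite3

EX = "http://example.org/wikidebat/"  # hypothetical namespace

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE triples (
           subject   TEXT NOT NULL,
           predicate TEXT NOT NULL,
           object    TEXT NOT NULL,
           layer     TEXT NOT NULL,  -- who added the triple (assumed column)
           PRIMARY KEY (subject, predicate, object, layer)
       )"""
)

rows = [
    # Core layer: a thesis, the question it answers and its theme.
    (EX + "thesis/1", EX + "answers", EX + "question/1", "core"),
    (EX + "thesis/1", EX + "hasTheme", EX + "theme/biodiversity", "core"),
    # A researcher's layer adds information about the same resource.
    (EX + "thesis/1", EX + "citedInStudy", EX + "study/42", "researcher"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?, ?)", rows)

def describe(subject, layers):
    """Return (predicate, object) pairs for a subject, limited to some layers."""
    placeholders = ", ".join("?" for _ in layers)
    query = (
        "SELECT predicate, object FROM triples "
        f"WHERE subject = ? AND layer IN ({placeholders}) "
        "ORDER BY predicate"
    )
    return conn.execute(query, [subject, *layers]).fetchall()

print(len(describe(EX + "thesis/1", ["core"])))                # 2
print(len(describe(EX + "thesis/1", ["core", "researcher"])))  # 3
```

A triplestore would offer named graphs for the same purpose. The point here is only that the underlying data stays in subject-predicate-object form either way.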

Raw contributions to the discussions will have to be processed with NLP (Natural Language Processing) tools, so an API would be needed to call the Python programs used.
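One possible shape for that API, sketched under assumptions: each request arrives as JSON with an `action` field, and each action is mapped to a Python function. The action name, the `HANDLERS` table and the placeholder `normalize` step are hypothetical. A real deployment would put this behind an HTTP service, which is not shown.

```python
import json

def normalize(text):
    """Placeholder NLP step (hypothetical): lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

# Map of API actions to the Python programs that implement them (assumed names).
HANDLERS = {
    "normalize": lambda payload: {"text": normalize(payload["text"])},
}

def handle_request(raw_json):
    """Decode a JSON request, dispatch it to a handler, return a JSON response."""
    try:
        request = json.loads(raw_json)
        handler = HANDLERS[request["action"]]
        result = handler(request.get("payload", {}))
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown action or missing field: report, do not crash.
        return json.dumps({"ok": False, "error": type(exc).__name__})

print(handle_request('{"action": "normalize", "payload": {"text": "  Biodiversity   LOSS "}}'))
# {"ok": true, "result": {"text": "biodiversity loss"}}
```

Keeping the NLP programs behind a small dispatch table means new processing steps (grouping, theme detection, moderation) could be added without changing the interface.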

A graphical user interface would be needed to explore the knowledge graphs.
Development wiki: Not yet, but I hope so!
Interested participants
Only me at the moment. If the proposal receives a favourable opinion, the goal would be to build collaborations and bring into the project everyone who wants to take part, in order to produce a demo.

Proposal

Debate tools based on information technology already exist, but they can have limitations: for example, their siloed organization makes it difficult to identify interactions between debates, or they restrict which subjects can be debated. These projects are clearly valuable, but the tool proposed here would not select issues and topics a priori. All debates would develop in a single place, with the objective of identifying the different dimensions (social, economic, environmental, etc.) of an issue and their ramifications with other issues, so as to give the most complete overview possible of the issues and stakes.

This proposal is based on semantic web technologies. The RDF data model (Resource Description Framework, https://www.w3.org/RDF/ ) represents data as subject - predicate - object triples, where subjects, predicates and objects are resources identified by URIs (objects may also be literals). An example of a triple: a book (subject, identified by its URI) - has author (predicate, identified by its URI) - the author (object, identified by its URI, or simply the author's name given as a literal).

By aggregating triples step by step, graphs of any size can be built. Several languages build on the RDF model to enrich what triples can express; one of them, OWL (Web Ontology Language, https://www.w3.org/OWL/ ), makes it possible to define one's own classes of objects and relationships.
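A minimal sketch of the triple model in plain Python, using tuples instead of an RDF library so it runs without dependencies. The URIs are hypothetical (`example.org` is reserved for examples) and "Jane Doe" is a placeholder name stored as a literal, as RDF allows for objects. Aggregating triples is just adding them to a set; following predicates from node to node is what turns that set into a navigable graph.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/"  # hypothetical namespace

graph = set()
# Book example from the text: book - has author - author (a URI resource).
graph.add(Triple(EX + "book/1", EX + "hasAuthor", EX + "person/7"))
# Objects may also be literals, e.g. the author's name (placeholder value).
graph.add(Triple(EX + "person/7", EX + "name", "Jane Doe"))
# Aggregating further triples extends the graph step by step.
graph.add(Triple(EX + "book/1", EX + "subject", EX + "theme/biodiversity"))

def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return {t.object for t in graph if t.subject == subject and t.predicate == predicate}

def follow(start, *predicates):
    """Walk a path of predicates from a start node and return the nodes reached."""
    nodes = {start}
    for p in predicates:
        nodes = {o for n in nodes for o in objects(n, p)}
    return nodes

print(follow(EX + "book/1", EX + "hasAuthor", EX + "name"))  # {'Jane Doe'}
```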

The classes envisaged, which would be defined in OWL and whose instances would be used in the discussions, are:

- C1 Question: opens a new debate.
- C2 Definition: links every occurrence of a term used in the contributions to the same definition.
- C3 Thesis: a short sentence (possibly limited in number of characters) presenting the main idea of a contribution.
- C4 Argument: a short sentence (possibly limited in number of characters) supporting the thesis.
- C5 Justification: contains links to source articles, together with text added by contributors, to support arguments.
- Ci More technical classes that are not displayed directly but matter for the processing flow (for example, the classes of thematic membership).
- …

The set (Ck) of necessary classes would have to be completed if the proposal receives a favourable opinion.

In practice, contributions would develop around Questions. In its most complete form, a Contribution would consist of instances of the Thesis, Argument and Justification classes. To encourage everyone to participate, incomplete contributions (only a Thesis, for example) could also be accepted; these would then express an opinion.
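To make the class list and the distinction between complete and incomplete Contributions concrete, here is a sketch using Python dataclasses rather than OWL. Each dataclass mirrors one of C1 to C5, and the rule that a Contribution reduced to a Thesis expresses an opinion comes from the text. The 280-character limit (`MAX_SENTENCE_LENGTH`) and the example question are assumptions, since the proposal leaves the limit open.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SENTENCE_LENGTH = 280  # assumed limit; the proposal leaves the value open

@dataclass
class Question:          # C1: opens a new debate
    text: str

@dataclass
class Definition:        # C2: shared definition, identified by a URI
    term: str
    uri: str

@dataclass
class Justification:     # C5: links to sources plus contributor text
    links: List[str]
    text: str = ""

@dataclass
class Contribution:
    question: Question
    thesis: str                            # C3: short sentence, main idea
    argument: Optional[str] = None         # C4: short sentence supporting the thesis
    justifications: List[Justification] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the (assumed) length limit on the short sentences.
        for sentence in (self.thesis, self.argument):
            if sentence is not None and len(sentence) > MAX_SENTENCE_LENGTH:
                raise ValueError("thesis/argument exceeds the sentence length limit")

    @property
    def is_complete(self):
        """Complete form: Thesis + Argument + at least one Justification."""
        return self.argument is not None and bool(self.justifications)

    @property
    def is_opinion(self):
        """An incomplete contribution reduced to a Thesis expresses an opinion."""
        return self.argument is None and not self.justifications

q = Question("Is biodiversity loss the most serious environmental problem?")
opinion = Contribution(q, "Biodiversity loss is an even more dramatic problem than climate change.")
print(opinion.is_opinion, opinion.is_complete)  # True False
```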

To relate contributions to each other and reveal polarities among the ideas expressed, the following relationships could also be defined in OWL:

- P1 Contradict: a contribution is opposed to an existing contribution.
- P2 RequestPrecision: an argument is not precise enough.
- P3 Confirms: a thesis based on a different set of sources and justifications confirms a similar thesis.
- P4 Complete: new arguments complement an existing thesis.
- P5 HasSource: relates an argument to its sources.
- P6: relates a definition used in a contribution to its official URI, shared by all.
- …

The set (Pn) of necessary predicates would have to be completed if the proposal receives a favourable opinion.
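The relationships are what would let the graph reveal polarities. One way to sketch this assumes, beyond what the proposal states, that Contradict flips a contribution's stance while Confirms and Complete keep it. A breadth-first traversal then places every reachable contribution on one side or the other of a starting contribution. RequestPrecision carries no stance and is ignored. The contribution identifiers are hypothetical.

```python
from collections import deque

# Sign carried by each stance relationship (assumption for this sketch).
STANCE = {"Contradict": -1, "Confirms": +1, "Complete": +1}

# (source, relation, target): the source contribution relates to the target.
relations = [
    ("c2", "Contradict", "c1"),
    ("c3", "Confirms", "c1"),
    ("c4", "Complete", "c2"),
    ("c5", "RequestPrecision", "c1"),  # no stance: ignored
]

def polarities(start, edges):
    """Assign +1 / -1 to each contribution reachable from `start` via stance edges.

    If the graph contains inconsistent cycles, the first assignment wins.
    """
    neighbours = {}
    for src, rel, dst in edges:
        if rel in STANCE:
            # Treat stance relations as symmetric for side assignment.
            neighbours.setdefault(src, []).append((dst, STANCE[rel]))
            neighbours.setdefault(dst, []).append((src, STANCE[rel]))
    side = {start: +1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other, sign in neighbours.get(node, []):
            if other not in side:
                side[other] = side[node] * sign
                queue.append(other)
    return side

print(polarities("c1", relations))  # {'c1': 1, 'c2': -1, 'c3': 1, 'c4': -1}
```

Here c4 ends up opposite c1 because it completes c2, which contradicts c1.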

NLP (natural language processing) tools would be used on theses and arguments, which for the sake of simplicity would be short sentences, which we would try to group according to their meaning. An example is given below to explain the main idea:

The two theses are the two theses below:

-S1 "Biodiversity loss is an even more dramatic problem than climate change".

-S2 "The decrease in biodiversity is the major current risk."

The project would need to group these two semantically very close sentences together, to avoid duplicates that would reduce overall readability.

This text analysis step would need further methodological work to determine the appropriate processing. A first idea would be to normalize these sentences and extract their essential grammatical groups, then use word embeddings or dictionaries to compute a distance between the two sentences and group them if the distance is small enough. Tests would be needed to determine the thresholds at which contributions can be grouped.
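A minimal sketch of the "dictionaries" variant of this idea, applied to S1 and S2. A small hand-made synonym dictionary maps words to canonical forms, and dropping stop words stands in, crudely, for extracting the essential grammatical groups. The Jaccard distance between the resulting word sets then decides whether two theses are grouped. The synonym table, stop-word list and 0.5 threshold are assumptions chosen for this example; word embeddings and real parsing would replace them, and the threshold is exactly what the tests mentioned above would calibrate.

```python
import re

# Hypothetical resources for this sketch only.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "than", "even", "more"}
SYNONYMS = {"decrease": "loss", "dramatic": "serious", "major": "serious",
            "risk": "problem", "issue": "problem"}
THRESHOLD = 0.5  # assumed; to be calibrated by tests

def normal_form(sentence):
    """Lowercase, drop stop words, and map synonyms to a canonical word."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {SYNONYMS.get(w, w) for w in words if w not in STOP_WORDS}

def distance(s1, s2):
    """Jaccard distance between normal forms (0 = same word set, 1 = disjoint)."""
    a, b = normal_form(s1), normal_form(s2)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def same_group(s1, s2, threshold=THRESHOLD):
    return distance(s1, s2) <= threshold

S1 = "Biodiversity loss is an even more dramatic problem than climate change."
S2 = "The decrease in biodiversity is the major current risk."
print(round(distance(S1, S2), 2), same_group(S1, S2))  # 0.43 True
```

After normalization, the two sentences share four of seven distinct words ("biodiversity", "loss", "serious", "problem"), which is what brings their distance under the threshold.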

Similarly, NLP processing could determine, at a more global level, the general theme of theses and arguments. For S1 and S2, it would be "biodiversity".

To continue the example, a sentence S3, "Biodiversity is not a major issue.", would belong to the same general theme, "biodiversity", but should not be grouped with S1 and S2 because it expresses the opposite idea.
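For S3, a sketch of the two checks involved: a theme detected from a keyword list, and a crude negation check that keeps S3 apart from S1 and S2 even though all three share a theme. Both the theme lexicon and the negation cues are hypothetical. Negation detection this naive would miss many formulations, which is one reason the text calls for further methodological work. Sharing a theme and a polarity is treated here only as a necessary condition for grouping; a distance test would still apply on top.

```python
import re

# Hypothetical theme lexicon and negation cues for this sketch.
THEMES = {"biodiversity": {"biodiversity", "species", "ecosystem"}}
NEGATIONS = {"not", "no", "never", "isn't", "aren't"}

def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def theme_of(sentence):
    """Return the first theme whose keywords appear in the sentence, else None."""
    tokens = set(words(sentence))
    for theme, keywords in THEMES.items():
        if tokens & keywords:
            return theme
    return None

def is_negated(sentence):
    return any(w in NEGATIONS for w in words(sentence))

def may_group(s1, s2):
    """Necessary condition only: same (known) theme and same polarity."""
    t1, t2 = theme_of(s1), theme_of(s2)
    return t1 is not None and t1 == t2 and is_negated(s1) == is_negated(s2)

S1 = "Biodiversity loss is an even more dramatic problem than climate change."
S2 = "The decrease in biodiversity is the major current risk."
S3 = "Biodiversity is not a major issue."
print(theme_of(S3), may_group(S1, S2), may_group(S1, S3))  # biodiversity True False
```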

Unfortunately, there is no demonstration tool yet. One objective of the demo would be to determine, in a realistic context, which classes and relationships need to be defined, i.e. to define the two sets (Ck) and (Pn).

The aim of this tool would be to link the spheres of citizenship, academia, production, associations, etc. and, indirectly, perhaps the political sphere as well. Much is at stake in these debates. That scientific results can be known for many years without being taken into account in the political agenda can be seen as a dysfunction. Fake news and cognitive biases, for their part, pull us away from the methodical approaches (admittedly imperfect and always to be refined) that establish solid facts.

The aim of posting this proposal here is to gather as much critical feedback as possible on whether this project is appropriate. If the proposal receives a favourable opinion, the goal would be to build collaborations and bring into the project everyone who wants to take part, in order to produce a demo. To conclude this part, this tool would both mobilize the knowledge already accumulated by Wikimedia and identify new subjects that need to be investigated, thereby taking part in building knowledge.

What will be its scope?

The scope of this project is to create a platform for continuous debate in which Contributions are organized in knowledge graphs and grouped by meaning to improve readability. The use cases are the following:

NAVIGATE IN THE KNOWLEDGE GRAPH The user could navigate the knowledge graph and view the debates directly, either through a graphical interface or through an interface that represents the tree structure as text.

With a graphical user interface, one can imagine zooming in on parts of the knowledge graph. At the most general level, one would see the relationships between the general themes; one could then click on a theme to see the debates within it, and so on down to individual Contributions.
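For the textual variant of navigation, the zoom levels (themes, then debates, then Contributions) map naturally onto a nested structure rendered with indentation. "Zooming in" then just means choosing how many levels to display. The nested dictionary below is a hypothetical stand-in for data that would really come from the knowledge graph.

```python
# Hypothetical content: theme -> debate (Question) -> Contributions (theses).
TREE = {
    "biodiversity": {
        "Is biodiversity loss the main current risk?": [
            "Biodiversity loss is an even more dramatic problem than climate change.",
            "Biodiversity is not a major issue.",
        ],
    },
    "transport": {
        "Should public transport be free?": [],
    },
}

def render(node, depth, max_depth, indent="  "):
    """Return indented lines for `node`, showing at most `max_depth` levels."""
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(node, dict):
        for label, child in node.items():
            lines.append(indent * depth + label)
            lines.extend(render(child, depth + 1, max_depth, indent))
    else:  # a list of Contributions (leaves)
        lines.extend(indent * depth + item for item in node)
    return lines

# Zoomed out: themes only.
print("\n".join(render(TREE, 0, max_depth=1)))
# Zoomed in: themes, debates and Contributions.
print("\n".join(render(TREE, 0, max_depth=3)))
```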

SEARCH IN THE KNOWLEDGE GRAPH The user could search for debates by keyword. Underlying SPARQL queries (https://www.w3.org/TR/rdf-sparql-query/ ) would retrieve the corresponding sub-graphs and display them either graphically or textually.
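As a sketch of what such an underlying query might look like, the function below builds a SPARQL 1.1 CONSTRUCT query, which returns a sub-graph rather than a table, for debates whose label contains a keyword. The `wdb:` prefix, its namespace URI and the names `wdb:Question` and `wdb:hasContribution` are hypothetical placeholders, since the proposal has not yet fixed its classes and predicates. `rdfs:label`, `CONTAINS`, `LCASE` and `STR` are standard. The keyword is escaped before being placed in a string literal, so user input cannot alter the query.

```python
PREFIXES = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdb: <http://example.org/wikidebat#>
"""

def sparql_string(value):
    """Escape a Python string as a SPARQL double-quoted literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def keyword_search(keyword):
    """CONSTRUCT the sub-graph of debates whose label contains `keyword`."""
    literal = sparql_string(keyword.lower())
    return PREFIXES + f"""CONSTRUCT {{
  ?debate rdfs:label ?label .
  ?debate wdb:hasContribution ?contribution .
}}
WHERE {{
  ?debate a wdb:Question ;
          rdfs:label ?label .
  OPTIONAL {{ ?debate wdb:hasContribution ?contribution . }}
  FILTER(CONTAINS(LCASE(STR(?label)), {literal}))
}}"""

print(keyword_search("biodiversity"))
```

The query string would then be sent to the triplestore's SPARQL endpoint, and the returned triples displayed graphically or as text.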

PROPOSE A NEW DEBATE The user can propose a new topic for discussion. The NLP processing would check that a similar topic does not already exist. If similar topics already exist, the user would be asked to confirm that the topic is really different.
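A minimal sketch of the duplicate check. It uses the standard library's `difflib.SequenceMatcher` as a simple character-level stand-in for the semantic similarity the proposal actually calls for. The 0.6 cutoff, the existing topics and the `confirm` callback (standing in for the dialog that asks the user) are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical existing debate topics.
EXISTING_TOPICS = [
    "Should public transport be free?",
    "Is biodiversity loss the main current risk?",
]

def similar_topics(new_topic, existing, cutoff=0.6):
    """Return (score, topic) pairs at or above `cutoff`, most similar first."""
    scored = []
    for topic in existing:
        score = SequenceMatcher(None, new_topic.lower(), topic.lower()).ratio()
        if score >= cutoff:
            scored.append((round(score, 2), topic))
    return sorted(scored, reverse=True)

def propose_topic(new_topic, existing, confirm):
    """Add the topic unless similar ones exist and the user does not confirm it is different."""
    matches = similar_topics(new_topic, existing)
    if matches and not confirm([topic for _, topic in matches]):
        return False
    existing.append(new_topic)
    return True

topics = list(EXISTING_TOPICS)
print(similar_topics("Should public transport be free of charge?", topics))
```

In use, `confirm` would show the similar topics to the user and return their answer. A semantic measure would also catch rephrasings that share few characters, which this stand-in cannot.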

ADD A CONTRIBUTION The user could select a group of Contributions or a particular Contribution, open a dialog window to write a Contribution, specify its relation to the initial idea or group of ideas (contradict, request clarification, confirm, ...), and add links (to Wikipedia pages, newspaper articles or scientific articles) to build the Arguments.
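What the dialog would hand to the back end can be sketched as a function that validates the chosen relation and links and turns the new Contribution into triples. The relation names follow P1 to P5 from the proposal section. The namespace, the `hasThesis`/`hasArgument` predicate names and the sequential identifier scheme are hypothetical.

```python
import itertools
from urllib.parse import urlparse

NS = "http://example.org/wikidebat#"  # hypothetical namespace
RELATIONS = {"Contradict", "RequestPrecision", "Confirms", "Complete"}
_ids = itertools.count(1)  # hypothetical identifier scheme

def add_contribution(target, relation, thesis, argument=None, links=()):
    """Validate a dialog submission and return the triples describing it.

    `target` is the URI of the Contribution (or group) the user pointed at.
    """
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    for link in links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a web link: {link}")
    node = f"{NS}contribution/{next(_ids)}"
    triples = [
        (node, NS + relation, target),
        (node, NS + "hasThesis", thesis),
    ]
    if argument:
        triples.append((node, NS + "hasArgument", argument))
    triples.extend((node, NS + "HasSource", link) for link in links)
    return triples

for t in add_contribution(NS + "contribution/0", "Contradict",
                          "Biodiversity is not a major issue.",
                          links=["https://en.wikipedia.org/wiki/Biodiversity"]):
    print(t)
```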

GROUP CONTRIBUTIONS TOGETHER The NLP tools would be used to group Contributions according to their meaning. This implies:

- implementing NLP tools to compute the semantic proximity of new Contributions to existing ones;
- using SPARQL queries to assign (or modify and update) membership classes according to the results of the classification. For example, with the sentences from the proposal section, S1 would belong to the most general theme "biodiversity" and, at a finer level, to "biodiversity seen as a serious problem".

Here the classes are represented by labels, but in practice they would probably be several nested classes identified by non-significant codes associated with labels.
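The last remark (nested membership classes identified by non-significant codes and associated with labels) can be sketched as a small registry. Codes are opaque sequential identifiers, labels live beside them, and a Contribution's full thematic path is recovered by walking up the parent links. The code format (`K0001`, ...) and the registry itself are assumptions; in the proposal these would be OWL classes updated through SPARQL.

```python
import itertools

class ClassRegistry:
    """Nested membership classes: opaque codes, human-readable labels, parent links."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.label = {}    # code -> label
        self.parent = {}   # code -> parent code (None for a top-level theme)
        self.members = {}  # contribution id -> code of its most specific class

    def new_class(self, label, parent=None):
        code = f"K{next(self._counter):04d}"  # non-significant: carries no meaning
        self.label[code] = label
        self.parent[code] = parent
        return code

    def assign(self, contribution, code):
        """Assign (or reassign after a new classification) a contribution's class."""
        self.members[contribution] = code

    def path(self, contribution):
        """Labels from the most general theme down to the finest class."""
        labels, code = [], self.members[contribution]
        while code is not None:
            labels.append(self.label[code])
            code = self.parent[code]
        return list(reversed(labels))

reg = ClassRegistry()
theme = reg.new_class("biodiversity")
finer = reg.new_class("biodiversity seen as a serious problem", parent=theme)
reg.assign("S1", finer)
print(theme, finer, reg.path("S1"))
# K0001 K0002 ['biodiversity', 'biodiversity seen as a serious problem']
```

Because codes carry no meaning, a label can be reworded or a class split without renaming anything that refers to it.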

MANAGING ABUSIVE CONTENT NLP tools would be used to filter out abusive content (insults and, as far as possible, hate speech, defamation, etc.). Automatically handling insults is probably the simplest part. Other kinds of moderation could be explored with NLP, but human review may be necessary where NLP is not sufficient.
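The split between automatic filtering and human review can be sketched as a three-way decision. A match against one word list rejects the text, a match against a second list flags it for a human moderator, and everything else is accepted. Both word lists are placeholders (real lists would be language-specific and much larger), and word-list matching is a deliberately simple stand-in for the NLP classifiers the text envisages.

```python
import re
from enum import Enum

class Decision(Enum):
    ACCEPT = "accept"
    REVIEW = "review"   # sent to a human moderator
    REJECT = "reject"

# Placeholder lists for illustration only.
INSULTS = {"idiot", "moron"}
SENSITIVE = {"liar", "fraud"}  # possible defamation: needs human judgement

def moderate(text):
    """Classify a contribution as accepted, rejected, or needing human review."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & INSULTS:
        return Decision.REJECT
    if words & SENSITIVE:
        return Decision.REVIEW
    return Decision.ACCEPT

print(moderate("Biodiversity loss is a serious problem.").value)
```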

SPECIFY SERVICES A knowledge graph can be very large. Services will have to be designed to facilitate access to the content and enhance its value: for example, SPARQL queries could identify the most active debates, the debates with the most ramifications, or the newest debates. Some SPARQL queries could be predefined with a parameter that users modify, and advanced users could write their own SPARQL queries directly.
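A predefined query with a parameter might look like the template below, which ranks debates by number of Contributions. Only the limit can be modified by the user, and it is validated as a positive integer before substitution. The `wdb:` prefix, its namespace URI and the names `wdb:Question` and `wdb:hasContribution` are hypothetical placeholders. `COUNT`, `GROUP BY`, `ORDER BY DESC` and `LIMIT` are standard SPARQL 1.1.

```python
from string import Template

# Predefined "most active debates" query; $limit is the only parameter.
MOST_ACTIVE = Template("""PREFIX wdb: <http://example.org/wikidebat#>
SELECT ?debate (COUNT(?contribution) AS ?contributions)
WHERE {
  ?debate a wdb:Question ;
          wdb:hasContribution ?contribution .
}
GROUP BY ?debate
ORDER BY DESC(?contributions)
LIMIT $limit""")

def most_active_debates(limit=10):
    """Fill the predefined query; the only parameter is a positive integer."""
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return MOST_ACTIVE.substitute(limit=limit)

print(most_active_debates(5))
```

Restricting parameters to validated values is what makes it safe to expose such queries to all users, while advanced users would go through a separate endpoint for free-form SPARQL.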

DATABASE MAINTENANCE The databases would need to be updated to reflect newly added contributions and new classifications of triples (which determine membership categories). Ideally, these updates would be made in real time so that users can check that their contributions have been taken into account.
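One way to keep a new Contribution and its reclassification consistent is to send them as a single SPARQL 1.1 Update request. The `INSERT DATA` operation adds the Contribution, and the `DELETE`/`INSERT`/`WHERE` operation replaces the contribution's previous membership class, if any. The SPARQL 1.1 Update specification recommends that each request be treated atomically, but this is worth checking for the chosen triplestore. The namespace and the `hasThesis`/`memberOf` names are hypothetical placeholders.

```python
NS = "http://example.org/wikidebat#"  # hypothetical namespace

def iri(local):
    return f"<{NS}{local}>"

def literal(text):
    """Escape a Python string as a SPARQL double-quoted literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def add_and_classify(contribution, thesis, class_code):
    """One update request: add the contribution, then replace its membership class."""
    c = iri(contribution)
    return f"""INSERT DATA {{
  {c} {iri('hasThesis')} {literal(thesis)} .
}} ;
DELETE {{ {c} {iri('memberOf')} ?old . }}
INSERT {{ {c} {iri('memberOf')} {iri(class_code)} . }}
WHERE  {{ OPTIONAL {{ {c} {iri('memberOf')} ?old . }} }}"""

print(add_and_classify("contribution/1", "Biodiversity loss is a serious problem.", "K0002"))
```

The `OPTIONAL` in the `WHERE` clause lets the same request work for a first classification (nothing to delete) and for a reclassification.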



Proposed by

Wiikkkiiii (talk)

Alternative names

Related projects/proposals

Domain names

Mailing list links

Demos

People interested

Discussion

  Comment @Wiikkkiiii: I can't really give a direct vote here. Can you provide a demo website if it is not much trouble? Arep Ticous 14:04, 4 May 2020 (UTC)


  Comment @Arepticous: Hi Arepticous, thanks for your message! I hope to provide a demo soon. I’ll give an update in a month or two.