Grants:Project/Frimelle and Hadyelsahar/Scribe: Supporting Under-resourced Wikipedia Editors in Creating New Articles

status: selected
Scribe: Supporting Under-resourced Wikipedia Editors in Creating New Articles
summary: Scribe is an editing tool to support underserved Wikipedia editors, helping them to plan the structure of their new articles and to find references in their language.
target: Under-resourced Wikipedias, such as Arabic and Hindi
type of grant: tools and software
amount: 41,500 €
type of applicant: individual
grantees: Frimelle, Hadyelsahar
contact: lucie.kaffee(_AT_)gmail.com, hadyelsahar(_AT_)gmail.com
volunteer: زكريا
created on: 13:00, 13 November 2018 (UTC)


Project idea


What is the problem you're trying to solve?

 
All Wikipedia language versions, in terms of articles (red) and active editors (blue). A long tail can be seen, with most languages having few articles and editors.

Bias of language support in Wikipedias


Wikipedia is a widely used resource to access information about a variety of topics. However, while a few language versions cover a variety of topics, other languages are barely supported. As a speaker of Arabic, one of the five most widely spoken languages in the world, you will find a drastic lack of information even on basic topics. Compared to English's almost 6 million articles, Arabic Wikipedia has just over 600,000 articles. This reflects a general lack of information on the web: only 0.6% of online content is in Arabic [1], effectively excluding a large proportion of the non-English-speaking world from access to knowledge.

This problem is not confined to Arabic. Hindi is spoken by 370 million people, making it the second most commonly spoken language in the world [2]. However, Hindi Wikipedia has only around 120,000 articles - a significant information gap that needs to be addressed urgently to reach a larger share of the world's population.

The vast majority of Wikipedias have fewer than 100,000 articles. For an encyclopedia containing general knowledge, that is a very small number of topics covered. These underserved Wikipedias need technological and community-based solutions to make information accessible to every person in the world, regardless of their native language.

When looking at underserved language communities, we find a severe lack of information. This lack of content in the native language is a considerable barrier to participation online. In the case of Wikipedia, this creates a vicious cycle: few articles mean little attention for that language's Wikipedia; with few people browsing it, there is only a small pool of readers from which editors could emerge, which results in few articles, which in turn leads to little attention, and so on.

Number of Articles | Number of Wikipedias
> 5M | 1
1M - 4M | 14
100K - 1M | 45
10K - 100K | 79
1K - 10K | 111
0 - 100 | 48

Bias in the number of editors for Wikipedias


Mirroring the small number of articles, there are vast differences in the number of editors maintaining Wikipedias, too. The large Wikipedias, such as English or German, have large, very active communities contributing to their content. But as visible in Figure 1, most Wikipedias have a small number of editors maintaining them, correlating with the low number of articles.

Most Wikipedias are maintained by fewer than 50 active editors (including 11 Wikipedias with one or no active editor). This places a very heavy load on those editors: they have to fill all the various roles of maintaining a Wikipedia and consistently ensuring its high quality, while dividing those time-consuming tasks among a very small number of people.

Having editors predominantly from a small set of language communities introduces a certain level of topic bias. Topics important in parts of the world with underserved languages find their way into Wikipedia with considerably more difficulty.

Source: http://wikistats.wmflabs.org/display.php?t=wp

Number of Active Editors | Number of Wikipedias
> 100K | 1
10K - 100K | 5
1K - 10K | 21
500 - 1K | 11
100 - 500 | 40
50 - 100 | 30
0 - 50 | 195
Limitations of the content translation tool
 
Screenshot of the Content Translation Tool, English to Turkish translation

It can be challenging for the limited number of editors in the underserved Wikipedias to create and maintain a large set of high-quality articles. The community has built a set of tools to facilitate article creation, such as the Content Translation Tool. This tool enables editors to easily translate an existing article from one language to another.

However, the content translation tool has the following limitations:

  • The articles that can be translated are selected by their importance to the source Wikipedia community. Topics of significance to the target community do not necessarily have an equivalent in the source Wikipedia; in those cases, there is no starting point when using the content translation tool. It has been shown that the English Wikipedia is not a superset of all Wikipedias, and that the overlap of content between the different languages is relatively small, indicating cultural differences in the content. Editors should be encouraged to avoid a topical bias and cover articles important to their communities.
  • From a series of interviews with editors, we found that editors tend to keep the references from the source language (or delete them) when creating articles with the content translation tool. In practice, searching for those references and assigning them to the equivalent article sections is a time-consuming task, and references are considered the backbone of any high-quality article on Wikipedia.
  • Machine translation is especially limited for underserved languages. There are few documents available online aligned with English, and even fewer aligned with other languages, on which a translation engine can be trained. This leads to frequently criticised translation quality.
  • Monolingual speakers are disadvantaged, as they cannot verify the translation against the source language or check the references used.

What is your solution?


We propose a content editing tool (Scribe) for underserved language Wikipedias. The tool gives editors a base to start from when translation is not possible, allowing them to choose a subject to write about according to their community's interests and notability criteria, regardless of whether that topic exists in other Wikipedias.

Overview of the enhanced editing experience


Scribe will be implemented as a gadget and will provide editors in underserved communities with: section planning, reference collection, and important key points for each section.

It supports editors by planning the layout of a typical Wikipedia article based on existing articles in their language. Further, users can discover and interact with references in their language, encouraging the writing of high-quality articles. These references, collected from online sources, are summarized into important key points for each section. We will rely on Wikidata both for planning the sections and for integrating references and facts from its existing information.

Our tool will use techniques from information retrieval and natural language processing research, such as reference collection, document planning, and extractive summarization.

 
Document Planning with Scribe
 
Content selection and reference suggestion with Scribe

Document planning


In the content translation tool, the structure of the source article is translated and suggested to the editor in the target Wikipedia. To achieve the same, we will generate a suggested structure from similar Wikipedia articles in the underserved community's Wikipedia. The similarity score will be calculated from the overlap of Wikidata relationships between the two entities. In the academic literature, there has been plenty of work on automatically suggesting structure for Wikipedia articles. We intend to follow a line of research similar to Sauper et al. [1] and Piccardi et al. [2], adapted to newly created articles: we fall back on Wikidata to calculate similarity between the newly created article and existing articles in the target language, from which we recommend a structure. Calculating similarity scores between knowledge base entities is a well-studied area of research [3][4][5], which we will exploit to rank section recommendations for new articles.
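
To illustrate the idea, here is a minimal sketch (in Python) of how a Wikidata-overlap similarity and section ranking could work. The simple Jaccard measure and the function names are illustrative assumptions only; the actual system would build on the embedding-based similarity methods cited above [3][4][5].

```python
from collections import Counter

def claim_signature(claims):
    """Represent a Wikidata item as a set of (property, value) pairs,
    e.g. {("P31", "Q5"), ("P106", "Q82955")}."""
    return {(prop, value) for prop, value in claims}

def jaccard(a, b):
    """Overlap between the Wikidata relationships of two items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_sections(new_item_claims, existing_articles, top_k=10):
    """Rank section headings by how often they occur in the target-language
    articles whose Wikidata items are most similar to the new topic.

    existing_articles: list of (claims, section_headings) tuples for
    articles that already exist in the target-language Wikipedia."""
    target = claim_signature(new_item_claims)
    ranked = sorted(
        existing_articles,
        key=lambda article: jaccard(target, claim_signature(article[0])),
        reverse=True,
    )
    counts = Counter()
    for claims, headings in ranked[:top_k]:
        counts.update(headings)
    return [heading for heading, _ in counts.most_common()]
```

Each existing article is reduced here to its Wikidata claims plus the section headings it already uses; the most frequent headings among the closest neighbours become the suggested structure.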

Important key information with references in the target language


Wikidata is used in many Wikipedias already [3] (e.g. the Fromage Infobox on French Wikipedia). While many small Wikipedias lack content, Wikidata contains information on over 50 million data items. Of those entities, over 30 million have at least 5 statements (excluding external identifiers). While this data is a good starting point, Wikidata only supports factual information, not the extensive contextual information usually contained in a Wikipedia article. This information therefore covers only a small part of any high-quality Wikipedia article. On the other hand, there are various online sources, typically used as references for articles, that contain more (contextual) information. These references are the backbone of any high-quality article on Wikipedia.
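
As an illustration of the "at least 5 statements, excluding external identifiers" figure, here is a minimal sketch of how such a count can be obtained for a single item from the standard Wikidata wbgetentities API. The helper name is ours; a full analysis would of course run over a database dump rather than the live API.

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def substantive_statement_count(qid):
    """Count an item's statements, excluding external identifiers,
    as a rough measure of how much factual material Wikidata offers."""
    response = requests.get(WIKIDATA_API, params={
        "action": "wbgetentities",
        "ids": qid,
        "props": "claims",
        "format": "json",
    })
    claims = response.json()["entities"][qid].get("claims", {})
    return sum(
        1
        for statements in claims.values()
        for statement in statements
        if statement["mainsnak"].get("datatype") != "external-id"
    )

# Example: Douglas Adams (Q42)
print(substantive_statement_count("Q42"))
```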

Therefore, we propose to support editors in under-resourced Wikipedias by suggesting information from external sources in addition to the facts existing in Wikidata. When editing, we display extracts from online resources in the form of bullet points, together with the link to the respective resource. This has the advantage that all external information has to be validated by the editor before it is published on Wikipedia.

We plan to follow techniques from the multi-document summarization literature [6], especially those which rely on a structure-aware approach to content selection [7]. To eliminate information redundancy, we will rely on techniques from sentence aggregation [8].

For sentence realisation we rely on an extractive summarization approach, following the line of work by Nallapati et al. [9] and Kedzie et al. [10]. Our choice of an extractive rather than an abstractive summarization approach is due to the following reasons: abstractive approaches are more prone to generating semantically incorrect information (hallucinations [11]), so the quality of the summaries cannot be guaranteed, especially when applying the same techniques to under-resourced languages with less training data. Additionally, for each generated word an abstractive model has to compute scores over a large target vocabulary, which is slow in practice. Finally, and most importantly, we design our tool so that generated summaries are not imported into the target article as they are, but rather aid editors by highlighting the key points in the suggested references; editors can then reuse those points in creating their article and rephrase them according to their community's editing standards.
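
The following is a minimal sketch of extractive key-point selection with a simple bag-of-words relevance score and an MMR-style redundancy penalty. It illustrates the general approach only; the actual system would follow the neural extractive models of Nallapati et al. [9] and Kedzie et al. [10].

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    shared = set(a) & set(b)
    numerator = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return numerator / norm if norm else 0.0

def extract_key_points(sentences, section_topic, max_points=5, redundancy_penalty=0.7):
    """Greedily pick sentences relevant to the section topic while
    penalising overlap with points already selected (MMR-style)."""
    topic_vec = bag_of_words(section_topic)
    vectors = [bag_of_words(s) for s in sentences]
    selected = []
    while len(selected) < min(max_points, len(sentences)):
        best_index, best_score = None, float("-inf")
        for i, vec in enumerate(vectors):
            if i in selected:
                continue
            relevance = cosine(vec, topic_vec)
            redundancy = max((cosine(vec, vectors[j]) for j in selected), default=0.0)
            score = relevance - redundancy_penalty * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return [sentences[i] for i in selected]
```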

Technical Details


The tool will mainly make use of existing computational resources; no additional computational power will be needed for this tool alone. The computation splits into two aspects: (1) client-side computation and (2) service-based computation. (1) The client-side computation (i.e. in the browser) will mainly handle lightweight frontend functionality, such as the generation of Wikitext and drag and drop. These functionalities are similar to existing ones and can be processed by any computer. (2) The gadget will make API calls to web services hosted on an external server, similar to what existing gadgets such as ProveIt [4] already do. The server side will be responsible for tasks such as querying and filtering references, calculating similarity between Wikidata entities for section suggestion, and performing extractive summarization. In order to reduce the online computation load, we intend to pre-compute and cache results for potential topics in each target language (existing Wikidata IDs without articles) before the service goes public.
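
A minimal sketch of what such a cached server-side service could look like, using Flask purely for illustration; the endpoint path, cache layout, and example entries are hypothetical and not part of the final design.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Offline-precomputed suggestions for topics that have a Wikidata item but
# no article in the target language, keyed by (language code, Wikidata ID).
SECTION_CACHE = {
    ("ar", "Q12345"): ["مقدمة", "التاريخ", "المراجع"],  # hypothetical example entry
}

@app.route("/suggest/<lang>/<qid>")
def suggest(lang, qid):
    """Return cached section suggestions, falling back to an empty list."""
    cached = SECTION_CACHE.get((lang, qid))
    if cached is not None:
        return jsonify({"sections": cached, "cached": True})
    # On a cache miss, the real service would compute suggestions on the fly
    # (section ranking, reference search, summarization), omitted here.
    return jsonify({"sections": [], "cached": False})

if __name__ == "__main__":
    app.run()
```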

We will build up the server-side computational facilities incrementally. First, we will make maximum use of the resources already available to us, such as our personal machines and servers we have access to. If more computational power is needed, we will seek collaborations with people who already run tools with similar capabilities whose services we can reuse in our project, e.g. StrepHit [5].

Scribe interface


This is a preliminary design of our gadget. Over the span of our project, we will go through several iterations of this design, testing it and gathering feedback from the editor community. Scribe will be designed to have the following key features:

  • Display of suggested section headers, important key points, references, and images
  • Hovering to expand the references content
  • Drag and drop of the suggested headers, references, images into the editing field (in Visual Editor as well as Wikitext)
  • Generation of appropriate Wikitext for the dragged and dropped content
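
To make the last feature concrete, here is a minimal sketch of the kind of Wikitext the gadget could generate for a dropped key point and its reference. It is shown in Python for consistency with the other sketches; the gadget itself would do this in JavaScript, and citation template names differ between language Wikipedias.

```python
def keypoint_to_wikitext(point_text, ref_title, ref_url, ref_name):
    """Turn a dragged key point and its source into a Wikitext bullet
    with an inline <ref> citation (the template name is illustrative)."""
    citation = (
        f'<ref name="{ref_name}">'
        f"{{{{cite web |title={ref_title} |url={ref_url}}}}}"
        "</ref>"
    )
    return f"* {point_text}{citation}"

print(keypoint_to_wikitext(
    "The festival has been held annually since 1987.",
    "Festival history",
    "https://example.org/history",
    "festhist",
))
```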
 
Preliminary Design of the Scribe interface, displaying the main content
 
Scribe's drag and drop functionality
 
English translation of the Scribe interface design

Project impact


How will you know if you have met your goals?


We measure our success in two different ways. First, we have a set of deliverables that we can measure through the products we produce. Second, we measure success in terms of the support for and satisfaction of editors.

Deliverables
Study | Study on under-resourced Wikipedia editors
Gadget | Editing tool as a gadget on at least three underserved Wikipedias
Research Paper | Paper on supporting editors using multilingual document summarization
Dataset/Task | Publication of all datasets developed in the process

Editor Satisfaction
Gadget Usage | 30 users across 3 languages; editors spend less time editing and less time researching resources; a visible increase in article length
Articles Created | 100 articles created through our project

Deliverables


(1) Large-scale study with Wikipedia editors from different under-resourced Wikipedias

To start the development of our tool, we aim to conduct a study focusing on community needs in terms of article creation and reference discovery. This study will be published and will be a good general starting point for further investigation of the topic of editor support. We aim to interview a variety of editors from a wide range of languages and levels of Wikipedia experience.

(2) Scribe: a gadget to enhance editors' experience in underserved communities

Based on the results of the study, we will iteratively create an editing tool. The main goal of the project is to develop a tool that supports the editors in underserved Wikipedias. We will implement the tool as a gadget, giving editors the option to switch it on and off as they prefer. We aim to find at least 3 Wikipedias that will enable the gadget.

To ensure that our tool meets editors' needs, we will design an iterative development process for the gadget, collecting feedback and keeping editors in the loop for further development without disrupting the editing process on Wikipedia itself. The gadget will be tested with members of the community, both remotely and during community events. The resulting articles, if they fulfil the quality requirements, will be encouraged to be added to Wikipedia even at early stages.

We will do active outreach to involve at least one volunteer developer, who is interested in working with us on the project.

(3) Research output

We will publish the work on Scribe at a research venue as open access publications. This work will give the Wikipedia and research communities insight into our tool and its implications for the people editing Wikipedia. It will include an extensive evaluation of how well both the technical requirements and the community's requirements are met.

All datasets created will be published to encourage reuse and further work in this topic area.

Editor Satisfaction


In addition to the deliverables, we want to measure the success of the tool through its usefulness to the community. Besides the feedback we will collect while iterating on Scribe's development, we will measure the tool's usage and the articles created with it.

Do you have any goals around participation or content?


The metric relevant to our project is (3) number of content pages created or improved, across all Wikimedia projects. As we are developing a tool, the content pages produced are a long-term goal that will continue beyond the project. However, as part of our outreach and testing of the tool, we aim to enable users to create 100 articles across all tested language Wikipedias. We expect to reach this number given that we will run multiple iterations of working with the tool with experienced as well as new editors, who will work on real missing articles. We aim to integrate the pages that are of high quality at the end of the experiments. With the deployment of the tool, we aim to recruit editors to use Scribe, and we will measure the articles created through the tool directly.

Fit with strategy


One of the two key points of Wikimedia’s strategy focuses on “Knowledge equity”, including “communities that have been left out by structures of power and privilege”.

Supporting under-resourced language communities serves this purpose, as these communities have been left out of consideration in research as well as in practice. Little research has been conducted on document summarization in languages other than English and a few Western European languages. With our project, we focus on community members who may speak one of the most widely spoken languages in the world (such as Arabic) but have little to no access to the information available online due to a language barrier.

Increasing the content of their Wikipedias by supporting editors can have a huge impact on speakers of those languages.

Sustainability


We will make our project sustainable on multiple levels. On a software level, we integrate our project as a gadget on Wikipedia, following the community guidelines. We plan to involve volunteer developers from the beginning so that we can ensure a wider audience is familiar with the code base. We will do that by attending the Wikimedia Hackathon 2019.

Further, we will publish all produced datasets to encourage research on related topics in the research community. A good dataset can spark interest in the investigation of a problem. We believe that by providing datasets for our task, we can spark similar interest in the research community for the challenges that under-resourced languages pose.

The studies we will conduct with members of the community will give us broad insight into the editing behaviour of members of typically overlooked communities. With a mix of quantitative and qualitative studies, we will build a baseline for further understanding of how to support those communities. All studies will be published, and we will maintain a page for output, where we will encourage translation by the community.

Project plan


Activities

Activity | Active months (out of M1-M12)
Planning of the research | 1
Community outreach | 10
Conduct interviews and quantitative community study | 9
Gadget development | 9
Testing of Gadget on Wikipedia | 3
R: Document planning | 2
R: References collection and key point extraction | 3
Writing Research publications | 3

We will set up a project page to collect feedback from the community of each Wikipedia and link them to a meta page in English. This will help us stay connected with the communities even after the project has ended. As we will involve the communities in each step of our development from the beginning, we can ensure that there is interest in the direction of our research.

Budget


Our budget includes the costs of two people involved with project management and research for the whole span of the project. These positions will be filled by the grantees. The positions split as follows: one person working on research, project management, and outreach at 15 hours/week, and one person focusing on research and software development support at 8 hours/week, each for 12 months. The rate for the researchers is 18.12 €/hour before tax, totalling 20,000 € for the whole timespan.

We plan to hire a part-time developer to make sure that the project's development goes smoothly and the code is well maintainable, following MediaWiki standards. The developer will be responsible for the development of the gadget (JavaScript), as well as the code for retrieving and displaying the references and the interaction with Wikidata. As the first three months of the project will focus on research and outreach, we will hire the developer for 9 months. The rate will be 20.83 € per hour before taxes, totalling 15,000 € for 9 months.

For dissemination, we are planning to attend two Wikimedia events: one at the beginning of the project (such as WikiArabia or Celtic Knot), to talk with editors in person about their needs and wishes for the tool. The second will be Wikimania, where we plan to present the state of the project and again gather input from the Wikipedia communities. Wikimania is the right venue for this, as it attracts a large pool of editors from different Wikipedia language versions.

As dissemination and recruitment will target editors of various Wikipedia languages, we will have to translate calls for participation in studies, results, and other outreach materials. We therefore include fees for professional translation in our budget.

Further, for additional outreach, we plan to organize events such as editathons for communities that are under-resourced on Wikipedia. This part of the budget also contains a buffer for unforeseen costs in running the project.

Budget item | Amount
Project Management, Research (2 people) | 20,000 €
Part-time developer (9 months) | 15,000 €
Visits to Wikimedia events (2 people) | 3,000 €
Professional translation fees | 1,500 €
Event organization, giveaways & buffer | 2,000 €

Total: 41,500 € (before tax)

Community engagement


The project is focused on the community, and we are therefore dependent on a strong collaboration with them. In previous work, we were able to show that underserved communities are interested in collaborating on new tools. This is something we want to deepen and strengthen in this project.

We plan to involve the community at each level of our project. Most importantly, we will collect feedback from editors before, during, and after development.

We have built relationships with community members of different underserved language Wikipedias through previous involvement in supporting their reading and editing experience. We will deepen these relationships and reach out to further interested community members by: (1) updating interested community members on a project page/blog, (2) reaching out on mailing lists and project pages, and (3) holding in-person discussions at Wikimedia events.

To ensure the sustainability of the project, we will reach out to the technical community (by existing connections and attending the Wikimedia Hackathon 2019) and ensure a close collaboration with volunteer developers.

Get involved


Participants


Lucie is a Ph.D. researcher at the University of Southampton. Her work focuses on under-resourced languages and linked data (as in Wikidata) [12][13], and she has worked on references in Wikipedia and Wikidata [14]. She was an employee of Wikimedia Deutschland from 2014 to 2016 in the Wikidata team and is active in the technical community as a member of the Wikimedia Code of Conduct Committee. She developed the ArticlePlaceholder extension, which is now deployed on 14 low-resource Wikipedias.

Hady is a postdoctoral researcher at Naver Labs, working in the areas of Natural Language Processing and Machine Learning. His main research interests are natural language generation [15] and knowledge graph representations, with applications including text summarization, knowledge-base-to-text generation, and question generation [16]. During his studies he gained industrial experience through several internships at Bloomberg, Microsoft, and IBM, working on projects ranging from multi-document summarization to multilingual sentiment analysis [17] and social media analysis.

Together, we have published work on text generation from Wikidata triples for Wikipedia's ArticlePlaceholder [18][19]. Our research has shown that an approach generating text from Wikidata triples for underserved languages achieves high quality; it was evaluated with members of the Arabic and Esperanto Wikipedia communities. We have extended this work with semi-structured interviews that are to be published.

  • Volunteer Testing Zack (talk) 21:51, 3 December 2018 (UTC)

Community notification


Endorsements


Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  1. I've been in contact with the researchers for a long time and I have seen their progress over time. I have also witnessed their community engagement efforts taking place (surveys and interviews). So, their progress is real. I've also been a long-time supporter to the research because it offers a missing link between the potentials of Wikidata and the Arabic Wikipedia community that I am part of.Reem Al-Kashif (talk) 09:46, 23 November 2018 (UTC)
  2. Obviously, closing the language gap is perhaps one of the most important points to reach knowledge equity. A tool like this one could have an incredible value for all languages, but especially for the under-resourced ones. Cheers, VIGNERON * discut. 13:32, 26 November 2018 (UTC)
  3. I see some similarities to the efforts on Wikidata+infoboxes, but of course an infobox will never be able to inform as much as prose can, so this project would be great to do. Jane023 (talk) 16:38, 26 November 2018 (UTC)
  4. That could be a solution for articles with the same structure. See in discussion for further information. --Helmoony (talk) 17:09, 26 November 2018 (UTC)
  5. Definitely something that will help us in South Africa with our many languages. No need for new editors to learn Wikidata!Michaelgraaf (talk) 12:22, 27 November 2018 (UTC)
  6. The approach suggested here by a researcher who has worked a long time on this topic seems effective in order to create more content in Wikipedias that are still very small. This is a very underserved area, and we need to try out more things there! Lucie and Hady could really make a difference by implementing this proposal. -- denny (talk) 16:42, 29 November 2018 (UTC)
  7. The ArticlePlaceholder was a great first step for helping low-resource language communities. A tool like this that enables editors to more quickly produce good results using the information already available would be very useful. And I know Lucie is a great person for exactly such a task. --WiseWoman (talk) 23:37, 29 November 2018 (UTC)
  8. I work as a Data Scientist for Wikidata and Wiktionary (as a contractor for WMDE), and this is by far the most interesting and potentially useful information retrieval/NLP initiative in this domain that I have encountered recently! GoranSM (talk) 14:18, 1 December 2018 (UTC)
  9. Exciting project! Lucie has experience both on the research side of things and on the implementation in production (with the ArticlePlaceholder) so I can't wait to see the results! − Pintoch (talk) 12:55, 2 December 2018 (UTC)
  10. Scribe will complement ContentTranslation - and provide equivalents better than current machine translation for low-resource languages. Thanks, makers :^] Zack (talk) 21:36, 3 December 2018 (UTC)
  11. Giving more people more access to more knowledge is what we strive for and this project directly works towards it! I think it shows pretty clearly how it will be able to tangibly benefit many people, especially ones that have not been at the center of our focus so far. --Incabell (talk) 16:18, 5 December 2018 (UTC)
  12.   Support Yet another good idea to address and tackle the content gap, that would hopefully result in a better capacity of engaging the less-represented communities. Sannita - not just another it.wiki sysop 01:19, 6 December 2018 (UTC)
  13.   Support The project addresses the important and widely known social issue of underserved communities by lowering the technical barriers to their participation. The proposer has demonstrated expertise in such technical matters and understands how to engage diverse wiki communities. -- Daniel Mietchen (talk) 16:43, 6 December 2018 (UTC)
  14.   Support Exactly what I had in mind, totally agree with Sannita. Ircpresident (talk) 00:54, 12 December 2018 (UTC)
  15.   Support The project is twofold: it works towards better knowledge access equity, while also developing incentives for users to create content in underserved languages. I am absolutely convinced that Lucie and Hady have the required technical and scientific competencies to drive the project, but also the human skills to interact with the communities to make it successful. Disclaimer: I am Hady's PhD co-supervisor.
  16.   Support --Isaac (talk) 23:29, 21 December 2018 (UTC)
  17.   Support If we want to make significant progress towards knowledge equity we need to make progress on this. Lucie and Hady are the right people to get us started. --Lydia Pintscher (WMDE) (talk) 19:46, 30 December 2018 (UTC)
  18.   Support Well-structured and promising project. Sky xe (talk) 00:49, 6 January 2019 (UTC)
  19.   Support I believe this idea is not only crucial for small Wikipedias, but also valuable for technological progress in the whole Wikimedia landscape. As I've been working on automatic reference suggestion for Wikidata statements, I'd be pleased to give the applicants some specific feedback. Hjfocs (talk) 14:13, 15 January 2019 (UTC)
  20.   Support This is an important area for improvement, and the grantees’ prior work inspires confidence in the success of this project. --Lucas Werkmeister (talk) 15:11, 18 January 2019 (UTC)
  21.   Support -- Maxlath (talk) 07:11, 19 January 2019 (UTC)
  22. I endorse this project! Amrapaliz (talk) 19:09, 21 January 2019 (UTC)
  • This is a worthwhile project that will benefit minority language wikipedias and contribute to knowledge diversity and equity. Doctor 17 (talk) 00:59, 27 May 2019 (UTC)

References

  1. Sauper, Christina; Barzilay, Regina (2009). "Automatically Generating Wikipedia Articles: A Structure-Aware Approach" (PDF). Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP: 208–216. 
  2. Piccardi, Tiziano; Catasta, Michele (2018). "Structuring Wikipedia Articles with Section Recommendations" (PDF). SIGIR '18 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval: 665–674. doi:10.1145/3209978.3209984. Retrieved 2018-06-27. 
  3. Bordes, Antoine (2013). "Translating Embeddings for Modeling Multi-relational Data". Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. 
  4. Lin, Yankai (2015). "Learning Entity and Relation Embeddings for Knowledge Graph Completion". Proceedings of the Twenty-Ninth (AAAI) Conference on Artificial Intelligence. 
  5. Nickel, Maximilian (2016). "A Review of Relational Machine Learning for Knowledge Graphs". Proceedings of the IEEE. doi:10.1109/JPROC.2015.2483592. 
  6. Goldstein, Jade (2000). "Multi-document summarization by sentence extraction". NAACL-ANLP-AutoSum '00 Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization. 
  7. Banerjee, Siddhartha (2016). "WikiWrite: Generating Wikipedia Articles Automatically" (PDF). Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16). 
  8. Hercules, Dalianis (1996). "Aggregation in natural language generation". Trends in Natural Language Generation An Artificial Intelligence Perspective. 
  9. Nallapati, Ramesh (2017). "SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents". Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. 
  10. Kedzie, Chris (2018). "Content Selection in Deep Learning Models of Summarization". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018. 
  11. Chisholm, Andrew (2017). "Learning to generate one-sentence biographies from Wikidata". Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 1: Long Papers. 
  12. Kaffee, Lucie-Aimée; Piscopo, Alessandro; Vougiouklis, Pavlos; Simperl, Elena; Carr, Leslie; Pintscher, Lydia (2017). "A Glimpse into Babel: An Analysis of Multilinguality in Wikidata" (PDF). Proceedings of the 13th International Symposium on Open Collaboration: 14. 
  13. Kaffee, Lucie-Aimée (2018). "Analysis of Editors' Languages in Wikidata" (PDF). Proceedings of the 14th International Symposium on Open Collaboration. 
  14. Piscopo, Alessandro; Kaffee, Lucie-Aimée; Phethean, Chris; Simperl, Elena (2017). "Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References" (PDF). International Semantic Web Conference: 14. 
  15. Vougiouklis, Pavlos (2018). "Neural wikipedian: Generating textual summaries from knowledge base triples" (PDF). Web Semantics: Science, Services and Agents on the World Wide Web 52–53 (2018) 1–15. 
  16. Elsahar, Hady (2018). "Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types". Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA. 
  17. Elsahar, Hady (2015). "Building Large Arabic Multi-domain Resources for Sentiment Analysis". Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II. doi:10.1007/978-3-319-18117-2_2. 
  18. Kaffee, Lucie-Aimée; Elsahar, Hady; Vougiouklis, Pavlos; Gravier, Christophe; Laforest, Frédérique; Hare, Jonathon; Simperl, Elena (2018). "Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders". European Semantic Web Conference. 
  19. Kaffee, Lucie-Aimée; Elsahar, Hady; Vougiouklis, Pavlos; Gravier, Christophe; Laforest, Frédérique; Hare, Jonathon; Simperl, Elena (2018). "Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata". Proceedings of NAACL-HLT 2018: 640–645.