WIKIMOVE/Podcast/Transcript Episode 9

Nicole: Welcome to episode 9 of WIKIMOVE. In this podcast, we discuss the future of the Wikimedia Movement. I'm Nicole Ebber and with me is Nikki Zeuner. We are both part of Wikimedia Deutschland's Movement Strategy and Global Relations team.

Nikki: This episode was recorded at 17:00 UTC on February 17th, 2023. Things may have changed since we recorded this show, but what we still know…

Nicole: …is that by 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge. And anyone who shares our vision will be able to join us.

Nikki: We have a meta page and a web page, and all the relevant links are available there and also in the show notes.

Nicole: Today's topic has actually been a request from some of our listeners who filled in the feedback survey, and they suggested that we speak about Abstract Wikipedia in our show. Normally I would now introduce what Abstract Wikipedia actually is, but I will leave this question to our guests to answer. So we will talk about Abstract Wikipedia, about Wikifunctions, how they relate to and build upon Wikidata, and of course how they contribute to the implementation of Movement strategy, to the infrastructure of the ecosystem of free knowledge that I just mentioned, and also to knowledge equity and knowledge as a service. A note also: if you want to hear and learn more about knowledge as a service, listen to our first WIKIMOVE episode, because that's exactly what it is about.

Nikki: So the good thing about this show is that we as hosts get to ask all the dumb questions that we never had a chance to ask, and learn things. So today I'm going to be the child who wanders into the middle of a movie and wants to know, “What's going on here?” Let's see if anybody knows what movie reference that is. So today, our guests answering the dumb questions are our colleague from Wikimedia Germany, Lydia Pintscher, and our former colleague, Denny Vrandečić. Lydia is a German computer scientist and free software and open culture supporter. She studied computer science with a focus on innovation, medicine, and language at Karlsruhe Institute of Technology. She's the portfolio lead for Wikidata at Wikimedia Germany and has been with the project since its beginnings in 2012. In her free time, she supports the KDE community as vice president of KDE e.V., the charitable organization that supports KDE. Welcome, Lydia. Great to have you on the show.

Nicole: And welcome Denny, Denny Vrandečić. He is a Croatian-American computer scientist. He's a founder of Wikidata, and that's also why Nikki said he's a former colleague of ours at Wikimedia Deutschland. He's also the co-developer of Semantic MediaWiki. He was also a member of the Board of Trustees of the Wikimedia Foundation a couple of years ago, and he worked at Google on the Knowledge Graph. He was the first administrator of the Croatian Wikipedia and has been a Wikipedian for almost 20 years. Today, he is leading the development of Wikifunctions and Abstract Wikipedia at the Wikimedia Foundation. And Denny lives in Berkeley, California. So good morning, welcome Denny.

Denny: Thank you so much for having me.

Nikki: Okay, Denny, what is Abstract Wikipedia? Can you explain it in a simple way to me, the child wandering into the middle of the movie, and maybe talk a little bit about what the goals are and who would ultimately benefit from this project?

Denny: The idea behind Abstract Wikipedia is to create and maintain the content of a Wikipedia article only once, but make it available in all the different languages of the Movement. So right now we have about 300 language editions of Wikipedia and all of those have articles that are independently maintained, written and created. And here the idea is, well, maybe we can somehow put the content in some abstract representation, abstracting from an individual natural language, and then use functions to turn this abstract representation back into natural language text, so that when something happens or when there is an error in the content, you have to update it in only one place, but it becomes available in all the different languages at once. So that's the rough idea of Abstract Wikipedia. Wikifunctions then is a catalog of functions that has functions for generating natural language, but also for all other kinds of things. And well, the hope is that Abstract Wikipedia will basically benefit everyone who wants to read Wikipedia, particularly in languages that are currently underrepresented. And also, and this is where it ties in with the knowledge as a service idea, to make the content of Wikipedia more accessible to machine processing and, with Wikifunctions, to introduce a new form of knowledge that can be used in all kinds of ways.
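
To make the idea concrete, here is a minimal sketch in Python of content that is stored once in a language-independent form and turned into text by one function per language. Everything in it is an assumption made for illustration: the `CapitalOf` class, the `LEXICON` table, and the renderer functions are hypothetical and do not reflect the actual format or API of Abstract Wikipedia or Wikifunctions.

```python
from dataclasses import dataclass


# Hypothetical abstract content: it names concepts, not words in any language.
@dataclass(frozen=True)
class CapitalOf:
    city: str
    country: str


# Hypothetical per-language labels, typed in by hand for this sketch.
LEXICON = {
    "en": {"berlin": "Berlin", "germany": "Germany"},
    "de": {"berlin": "Berlin", "germany": "Deutschland"},
}


def render_en(c: CapitalOf) -> str:
    """Turn the abstract content into an English sentence."""
    words = LEXICON["en"]
    return f"{words[c.city]} is the capital of {words[c.country]}."


def render_de(c: CapitalOf) -> str:
    """Turn the same abstract content into a German sentence."""
    words = LEXICON["de"]
    return f"{words[c.city]} ist die Hauptstadt von {words[c.country]}."


# One renderer function per language; adding a language means adding a function.
RENDERERS = {"en": render_en, "de": render_de}


def render(content: CapitalOf, language: str) -> str:
    return RENDERERS[language](content)


# The content exists once; every language edition is generated from it.
content = CapitalOf(city="berlin", country="germany")
for lang in ("en", "de"):
    print(lang, render(content, lang))
```

If the abstract content were wrong, correcting the single `content` object would correct the output in every language at once, which is the property Denny describes.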

Nikki: So, Lydia, I have to ask you: he talked about maintaining the content of Wikipedia, but isn't the content in Wikidata? Can you explain how all this relates to Wikidata?

Lydia: Yes. So Denny painted this beautiful picture of making it much easier to write Wikipedia articles in many, many languages. And Wikidata is a fundamental building block for that, by giving you both statements about the world, so things like the number of inhabitants of Berlin, and lexicographical data, so data like you would find in a dictionary. Taking those two things together is, as I said, a fundamental building block for Abstract Wikipedia to actually be able to turn this abstract content that Denny was talking about into human-readable text that we all can enjoy in a Wikipedia.

Nikki: So the data, the actual information, comes from Wikidata, and then Wikifunctions and Abstract Wikipedia turn it into human-readable text, correct? I'm just trying to regurgitate what I understand. Okay, cool. So maybe to make it easier to understand, can we use an example throughout this show? Is there one specific topic that we can take as an example, Denny?

Denny: Since this is an audio show, let's keep it very simple. Let's take a sentence like "Jupiter is the largest planet in the solar system." That's something that Abstract Wikipedia can represent, but Wikidata really cannot. Wikidata has a lot of information about Jupiter. It knows that it's a gas giant. It knows it's in the solar system. It knows its size, its volume, its mass; it has all this data. But the fact that it is the largest planet in the solar system is not explicitly stored in Wikidata. You could make a query and ask, give me all the planets sorted by mass or whatever, and you would see Jupiter in first place, but it's nowhere explicitly stored. But this kind of information is really available in Wikipedia articles. This is how the article about Jupiter opens in basically every language of Wikipedia. So how do we tell the system that this is a piece of information that we want to see in the article? This is where Abstract Wikipedia comes in. Abstract Wikipedia gives you the means to tell the system: create this piece of information at the beginning of the article, and have it then displayed in all the different languages.
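
The kind of query Denny mentions can be sketched in a few lines of Python. The masses below are approximate values in units of 10^24 kg, typed in by hand for illustration rather than fetched from Wikidata, and the helper names `planets_by_mass` and `largest_planet` are hypothetical. The point of the sketch is that "largest" is computed from the stored numbers each time it is asked for; it is never stored as a statement of its own.

```python
# Approximate planetary masses in 10^24 kg (illustrative, not a Wikidata export).
PLANET_MASS = {
    "Mercury": 0.330,
    "Venus": 4.87,
    "Earth": 5.97,
    "Mars": 0.642,
    "Jupiter": 1898.0,
    "Saturn": 568.0,
    "Uranus": 86.8,
    "Neptune": 102.0,
}


def planets_by_mass(masses: dict[str, float]) -> list[str]:
    """Return planet names ordered from most to least massive."""
    return sorted(masses, key=masses.get, reverse=True)


def largest_planet(masses: dict[str, float]) -> str:
    """Derive the 'largest planet' fact from the data instead of storing it."""
    return planets_by_mass(masses)[0]


print(planets_by_mass(PLANET_MASS))
print(largest_planet(PLANET_MASS))  # Jupiter
```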

Nicole: Lydia, I would like to ask you, Wikidata and Movement Strategy, how do they connect? I'm sure you've answered that question multiple times, but for our listeners, it would be I think very interesting to better understand how does Wikidata contribute to Wikimedia becoming the essential infrastructure of free knowledge and by that also contribute to knowledge equity and knowledge as a service.

Lydia: Wikidata is very, very important for us to realize our strategy. Wikidata helps both with knowledge as a service and knowledge equity. I would say maybe a bit more with knowledge as a service. But let's start with knowledge equity. For knowledge equity, Wikidata, for example, helps people store information once and then make use of it in all Wikipedias and outside Wikipedia: in all the other Wikimedia projects, in your digital personal assistant on your phone, and much more. Instead of having to do that 300 times for 300 different Wikipedias, and many, many more times on top of that for anyone else who would want that data. So that takes a lot of burden in terms of work away, but also makes that knowledge available in many more languages than it currently is. On the knowledge as a service side, Wikidata makes our content, our knowledge, available in a machine-readable way. That makes it so much easier to build applications, visualizations, and other cool experiences on top of our knowledge than it ever was before Wikidata existed.

Nicole: And I'm giving that same kind of question now also to Denny. How would you explain the connection of Abstract Wikipedia to knowledge equity and knowledge as a service and the infrastructure of free knowledge?

Denny: Well, it's basically almost the same answer, making more knowledge accessible in more languages to more people. And there we want to even push further than what Wikidata is doing. So Wikidata can be used, for example, to power infoboxes of the different languages and so on. And with Abstract Wikipedia, we really hope to actually create article text that can be read to create at least simple articles, but to create a common baseline of knowledge that's available in many more languages than it is today. So this is how, this is the one side of knowledge equity. The other side of knowledge equity, which we tend to sometimes forget, but which I think is even more important, is that we want to allow everyone to contribute to the sum of all knowledge. And this is the same thing as in Wikidata. In Wikidata, you don't have to be an English speaker to contribute. We have, in fact, many people who are contributing in other languages. And it's the first project that we have where people with different language backgrounds actually contribute on the same items, on the same data together and collaboratively, without having to have a shared language. Abstract Wikipedia is aiming for the same thing, but will meet the more difficult task because the content is more complex and more prone to be discussed. So we'll see how this will work out, but it is a very important goal that we want to have everyone be able to share in the common baseline of knowledge as well, not just to read it, but also to contribute to it. On the side of knowledge as a service, it is also similar to Wikidata. It is more machine readable than the content in Wikipedia. We still don't have a good API for text, even though the LLMs are getting closer, but there's a lot of information which is still not easily processable for high fidelity and with a high precision. And this is where Abstract Wikipedia comes in. 
And in fact, I think that Abstract Wikipedia will help with making completely novel product ideas, completely novel ideas for features that Wikipedia can have, because the content can be processed before it's then being used to generate the text. Not just inside of Wikipedia, but also outside and therefore power many more tools.

Nikki: I think I'm going to ask a question that I was going to ask at the bottom of the show, but it's popping into my mind now. A lot of this sounds like artificial intelligence already. Can you explain to me the difference between what you're working on here and something like ChatGPT, for example? Because to me, with ChatGPT, I also give it a query and it writes a text in natural language. And yes, I know all the limitations around verifying what's in that text. But how is what you're building different from a chatbot?

Denny: The goals might sound similar, but the approach is fundamentally opposed to what large language models like ChatGPT are doing. In particular, in Abstract Wikipedia, one of the main goals, which is very important because it's a Wikimedia project, is that if you see an error, if you see that something is wrong, you as a contributor will be able to click on the right edit button and fix that error and know that it's actually fixed. That's something you can't do with large language models. You can't just go in and say, oh, let's push a few weights here, and from now on I'm confident it will never again claim that whatever is the case. There's no such way to actually edit large language models, not that we know of so far. Abstract Wikipedia and Wikifunctions are controllable by the community, and it can be understood what is going on, even though it might sometimes be complex. But we are used to this kind of complexity if we think about templates, Wikidata, Commons, and all these interactions between the different projects. There is a way for the community to control, down to the single letter, what is actually happening in Abstract Wikipedia. And this at the same time is also the disadvantage of Abstract Wikipedia, because the machine-learned models are more comprehensive. They claim to capture more. I mean, they can deal with every kind of question; whether the answer is truthful or not is a different question. But they claim they can answer every question, right? And you can use them for much more, because they are not so brittle. Abstract Wikipedia will always be more brittle than those language models. But with the language models you lose control, and therefore you sometimes lose fidelity with reality, which basically means those systems can lie. In Abstract Wikipedia, a lie can also happen, but then it's an intentional lie by a contributor who has put it into the system, and we can have other contributors going in and fixing it. Whereas the language models, well, they work differently.

Nikki: Okay. So. Where are you at currently with it? So when did you start? Where are you at? What are the next steps? What are some challenges you're facing maybe?

Denny: We started the project in 2020 and are working on the Wikifunctions part. So there are two stages to the project. The first one is to launch Wikifunctions, and then to add all the functionality that is needed to get Abstract Wikipedia running on top of that, because the functions in Wikifunctions are the necessary steps for Abstract Wikipedia to work. Wikifunctions is out on the beta. We still need to work on the user interface and on security and safety features so that people cannot break it too easily and so on. And so we are getting closer to the place where we actually want to launch the project, launching a new sister project, and have it out there so people can actually use it.

Nikki: Great. All right. Same question to you, Lydia. Where's Wikidata at and what are you tackling this year?

Lydia: Right. I'll get to that. Maybe first an addition to what Denny said about these large language models and how they differ from Abstract Wikipedia and other things we're doing. I think one of the very important things for us is knowledge equity. One crucial thing about these large language models is that they need a lot of data to be trained on, which by definition is not available in machine-readable digital form for a lot of the languages, for example, or the content that we want to represent. So with Wikidata and Abstract Wikipedia, people can bring in their language, their culture, the knowledge they care about, without having to amass massive amounts of data to train such a system.

Nikki: That makes sense.

Lydia: But let's dive into Wikidata. A ton is going on, but I think the most important thing at the moment is around data quality and specifically data modeling, making it easier for people to model data in a consistent way, so that it's easier to use our data in systems like Abstract Wikipedia. Because that is currently one of the big issues that people who try to build upon Wikidata's data still have: that it's kind of unevenly modeled, let's say.

Denny: Yeah, I'm so thankful to Lydia for taking on this job of making the data more consistent. Because in the end, you will hopefully have functions in Wikifunctions that actually use the data in Wikidata, but they can only do that if the data is there in a predictable way. So I'm super happy that she's doing it. And the other place that I'm really happy about is all the improvements going on in the lexicographical data on Wikidata.

Nicole: Denny, I would love to hear a little bit more about Wikifunctions and what it is about. And I have to admit that only in the preparation for this episode did I understand that Wikifunctions is actually the new sister project, and not Abstract Wikipedia. To be honest, I thought Abstract Wikipedia was the new sister project. So please talk a little bit about Wikifunctions, and maybe you can also give an easy example, because when I think of functions, I'm sure it's not just minus, plus, times, or divided by, but a little bit more complex. So what is it?

Denny: It does get a little bit more complex, but we will actually start with exactly this kind of function: times, minus, divided by, and all these things. But then you can easily think of more complex functions on top of that. For example, given we have two dates, how many days have passed between those two dates? Or what day of the week is it on a given date? Or, and this is where it gets interesting for Wikidata and Wikipedia, for example: this is the date we have now in the Gregorian calendar, but what date would it be in the Julian calendar? How do we actually translate from one to the other? And we have other calendar systems, like the Islamic calendar, or calendars that are used in China or in Thailand and so on. And we want to be able to translate from those different calendars to each other and compare the data. And then unit conversion. I mean, that's something that a lot of Wikipedias already have Lua modules for, to translate one unit into another. And we might, for example, have in Wikidata the size of a country or a county in one unit. And then we want to compare it to the parts of the country, for example in Germany with the federal states, and they might be in a different unit, probably not, but they might be, and then we want to add those up and see whether they actually fit together and so on. So there are a lot of things you can do with this kind of function and with units, and it's not just math, actually. There are also all the functions that can run on language, for example. So you can take a word and see what the plural of the word is, and therefore you can then make better text and so on, which you need for natural language generation. Or you can also do work with ideas, right? I mean, Wikifunctions will eventually be able to use Wikidata as a knowledge base, and then we can do, for example, things like compare terms with each other. Compare, for example: did those species live at the same time? Do they live in the same area? It can go to Commons and then look up the geoshape of where the animals live and then compare them and so on. There are so many things you can do with functions. Yeah, this is just a summary of the ideas.
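
A few of the functions Denny lists can be sketched in Python to show how small and self-contained each one is. This is an illustration only: the function names are hypothetical and none of this is actual Wikifunctions code. The calendar conversion assumes the standard Julian Day Number arithmetic (count days from a fixed origin, then read that count back in the other calendar), and the unit example assumes a conversion from square miles to square kilometres.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def days_between(start: date, end: date) -> int:
    """Number of days that have passed from start to end."""
    return (end - start).days


def day_of_week(d: date) -> str:
    """English weekday name; date.weekday() counts Monday as 0."""
    return WEEKDAYS[d.weekday()]


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def jdn_to_julian(jdn: int) -> tuple[int, int, int]:
    """Julian calendar (year, month, day) for a Julian Day Number."""
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def gregorian_to_julian(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Translate a Gregorian date into the Julian calendar."""
    return jdn_to_julian(gregorian_to_jdn(year, month, day))


# One square mile is exactly 1.609344 km squared.
SQ_KM_PER_SQ_MILE = 2.589988110336


def square_miles_to_square_km(area: float) -> float:
    return area * SQ_KM_PER_SQ_MILE


# Using the recording date of this episode, 17 February 2023:
recorded = date(2023, 2, 17)
print(days_between(date(2023, 1, 1), recorded))  # 47
print(day_of_week(recorded))                     # Friday
print(gregorian_to_julian(2023, 2, 17))          # (2023, 2, 4)
print(square_miles_to_square_km(100.0))
```

Each function takes plain values and returns a plain value, so functions like these can be combined, for example converting areas to a common unit before adding them up, as in the comparison of a country with its federal states.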

Nicole: Thanks. Super interesting. And I think it's almost endless, the number of ideas that you can generate with that. Does "almost endless" exist? I don't know, but never mind. We also talked in our prep meeting a little bit about where the abstract content of Abstract Wikipedia should actually live. You just talked about it. Where does this information live? And we asked, for example, will it live on Wikidata? Maybe Lydia, you can start answering that question and Denny can then jump in. Let's see if you both have the same thoughts on this.

Lydia: So Denny and I already talked extensively about the many different options we have for this. But one of them is definitely to store it on Wikidata. So you would have an item, for example, for Jupiter, and then attached to it, somehow, you would store the abstract content for a Wikipedia article about Jupiter. Just like right now, for example, you already link to all the Wikipedia articles. That is one option. But there are a few other options that we can think of. So for example, one of the things we discussed was that maybe it's not just Wikipedia that would, in the end, like an abstract version. How about a travel guide from Wikivoyage, for example? Why not have that in an abstract way as well? And then things get a tiny bit more complicated.

Nicole: And that's where Denny jumps in.

Denny: I'm with Lydia here. We don't really have answers. We have some options, and we have been discussing it now for, I think, two years, maybe three years. And for example, technically, the easiest thing would be to just put all the content into Wikifunctions. But I think that the people who will edit Wikifunctions are very different from the people who would be editing Abstract Wikipedia. And having them in the same project might be a bit troublesome. I mean, in the end, it also leads to the question: we have those 800 or so projects in the Wikimedia Movement. Do we really want them to be completely independent projects with their own structures and everything? Or do we want to bring them all a little bit together? Commons and Wikidata already brought them closer together. Abstract Wikipedia will certainly make it even tighter. And we have to answer all these questions about how these things relate to each other. Who has rights? Where? Which kind of community should be dealing with what kind of information? So having Abstract Wikipedia in Wikifunctions sounds technically easy, but I'm not sure it's socially the right way. So one way or the other, this is a conversation we really have to have with the wider community. Lydia and I are computer scientists by background, and it shouldn't be only computer scientists deciding these kinds of questions of how our socio-technical systems are set up. I really want to hear back from the community and have them help us with this decision. Usually I have a very strong opinion on things, but here I'm really clueless about what the best solution is. And I will need the input from the community, to hear where folks think the best place is.

Nikki: So you say you're inviting people to think with you and work with you on making those decisions. How should people do that? Are you going to reach out at some point? What are some avenues to connect or contribute?

Denny: Our plan is, in the same week when you publish this podcast, to also publish a newsletter for Abstract Wikipedia and to have this put into the Wikidata newsletter around that week or the next week. And just tell people: look, here are ideas, here are conversations. I don't think that it's the right time to make a decision, because a lot of the things aren't there yet; they haven't been published. And people don't really understand how Wikifunctions works, how the interaction works, and so on. But I think it's a good place to have the first few ideas, to see what the arguments are, and maybe listen a little bit. The actual decision we will then make at some point in the future, when people understand better what is actually up with the project and so on. But I think it's a good time now to start thinking about this and then later to actually come to a decision. So we won't make a decision now. I think it's just too early for that. But we can start actually thinking about it.

Nicole: Who do you think is going to be the community for Abstract Wikipedia and/or Wikifunctions? Do you only see computer scientists or developers being the community? And do you also see overlaps with the Wikidata community, for example?

Denny: So I very much hope that we won't have only computer scientists in Wikifunctions. They're kind of the most natural fit. And I hope that a lot of computer scientists will join, but we're putting a lot of effort into making the whole system easy to contribute to even if you're not a computer scientist. And there are some tasks where you definitely don't need a CS background. For example, translating items, writing documentation, helping people to use functions. Because, I mean, functions are useful for everyone, even without a CS background. But I expect that there will be a certain bias in the community, because some parts and some of the ideas are more accessible to people with a computer science or math background. Nevertheless, we're putting a lot of effort into UX to make it widely accessible. And so we hope that we can reach a bit of a wider community, but who knows? I mean, I tend to be surprised by the results of these things sometimes, because, for example, with Wikidata we were always thinking this is something a very specific type of people will be using, and certainly that it would be biased even more than the Wikipedias, but in the end it didn't turn out this way. Lydia, do you want to add a bit on this?

Lydia: Yeah, I'm actually really happy about that. And also, it taught me again to trust in the people, right, and to have faith in the people, because with Wikidata we've seen a lot of people without any programming background or without any computer science background and so on learn SPARQL, for example, to be able to do their editing work on Wikidata and create cool lists of paintings and things like that. And they learned this just for Wikidata because it is so useful and something they want to be a part of. And I really hope that at least a part of Abstract Wikipedia and Wikifunctions will manage to do the same, on top of amazing UX, of course.

Denny: And this is exactly why the question of where Abstract Wikipedia is actually located is so important, because Wikifunctions sounds like something that might be more biased towards a certain demographic. And I think Abstract Wikipedia, even if it works technically, should be considered a potential failure if our diversity, for example, is actually worse than in the Wikipedias. The Abstract Wikipedia content in the end will go to all the Wikipedias. So the contributors to Abstract Wikipedia, I really think, have to be as diverse as possible. And we have to increase the diversity of the people who are adding that content. And therefore, it is so important to decide where exactly this content will live, and which communities should be growing up together and working together. And this is something that we shouldn't decide only on a technical basis; rather, we have to think about which kinds of communities we want and need in order to contribute to Abstract Wikipedia, and how they are different from those of Wikifunctions. If the diversity of contributors on Wikifunctions is lower than on Wikipedia, I would be a bit unhappy, but it wouldn't be the end of the world. It wouldn't be terrible. But if the diversity of people contributing to the content of Abstract Wikipedia is lower than on the English Wikipedia, say, that would be devastating, because this just means we're using this as a tool to push out a limited number of potential points of view to even more people than before. That's exactly what we don't want to achieve.

Nicole: And that's, I think, a huge challenge for the whole Movement, right? We want input on the Movement strategy, on the Universal Code of Conduct, on all those things, and most of the time we ask the existing community or established communities, and they are often, I would say, the loudest voices. So how do we reach other corners of our Movement to contribute, and make the input and the communities more diverse? I think that's something that we are all really challenged by, and that's also something that we address in some of our shows here. So I think it's also good to think about this publicly so that people can join in and contribute to thinking about it. And I also like that you said it's, of course, not only a technical question, but similarly a social question, or many social questions, that we need to answer. I would like to draw our attention to one thing that we also spoke about in the prep work. Originally, Wikidata was, and maybe you can correct me if I state it wrongly, kind of thought of as the backbone of Wikipedia. We often explained it in the beginning like this: if you need to change the number of inhabitants of Berlin, then you have the central backbone, Wikidata. And now it's really used for a whole bunch of other use cases that go way beyond the encyclopedia. And I would like to hear a little bit about Abstract Wikipedia and Wikifunctions in that respect. What do both of you, also Lydia, think about what kind of an impact they will have beyond the Wikimedia projects?

Lydia: So for me, it was really great to see. When I joined Wikimedia, there was Wikipedia that was talked about and maybe sometimes Commons, but anything beyond that was rarely ever mentioned, right? And I think with the introduction of Wikidata, somehow the public perception has moved from “oh, there's only this one big Wikipedia, and English Wikipedia at that” to “hey, there's more. There's Commons, which is not just a backbone for Wikipedia, there's Wikidata, which is not just a backbone for Wikipedia, but they are both projects in their own right that have meaningful content that you can use completely independently of Wikipedia to build really cool stuff.” And I hope that with Abstract Wikipedia and Wikifunctions we will see similar things. I think Wikifunctions is more likely to get us there, but Denny?

Denny: No, I fully agree. Wikifunctions, I think, has a chance of becoming an important infrastructural project for the whole ecosystem of free knowledge and open source. I like to imagine that in the future, if you're using, for example, a spreadsheet or whatever, you will just use a function from Wikifunctions and pull it in, and you will have this huge library of functions. If you use a programming language, you can just use Wikifunctions as a library in the backend and so on, for developers. If you are doing anything with data on a website or whatever, you can just call a Wikifunctions function and pull it together and transform your data and use it in your way. And now this all sounds like a super CS thing, only for computer scientists and programmers. But you know, spreadsheets are actually used by a huge number of people. And there's a huge potential for impact, for example, to use the knowledge from Wikidata through the functions in Wikifunctions to tie this all together and make it interesting. Now, this is when I'm dreaming. How do we get to this place of actually doing that? That I don't know yet. We're having a hard time actually describing what a function is, how you can use it, what kind of questions you can answer with functions and so on. And I am totally convinced that functions are useful for everyone. I said it before in the show. But I would totally understand if a lot of people don't see that, if they actually don't even understand what a function is, how they can use it and so on. So there's a lot of narrative that we have to somehow cover in order to make it clearer to people how functions are useful for them. So I totally see that there's a potential for Wikifunctions to become at least as transformative as Wikidata is. But it's not a given that we will reach that point, because we have to make sure that it actually works right, that we get the narrative right, that we can actually tie in with the world at large, and that we can find this very important place in the ecosystem of free knowledge for Wikifunctions. But we will have a few challenges on the way to get there.

Nikki: So can we look forward to some demonstrations at Wikimania this year?

Denny: Demonstrations, definitely. I won't promise more than that, even though my hopes are certainly bigger.

Nikki: Yeah, great. Well, thank you guys so much for taking us out of the darkness around these exciting new projects. And we wish you all the best of success and luck and come back on the show and report back when you have made some milestones or some breakthroughs. We'd love to have you back and hear more about it.

Nicole: Yes, please.

Lydia: Thanks for having us.

Denny: Thank you so much for having us here.

Nikki: All right, so that's a wrap of the 9th episode of WIKIMOVE. Thank you for listening.

Nicole: WIKIMOVE is a production of Wikimedia Deutschland and its Movement Strategy and Global Relations team. Eva Martin pulls all the strings in the background so we can create excellent content with our guests. Our music was composed and produced by Rory Gregory and is available under Creative Commons CC BY-SA on Wikimedia Commons. And thank you to our wonderful guests Lydia and Denny, it's been such a pleasure speaking with you.

Nikki: We release new episodes every month. Visit our WIKIMOVE meta page to react to our podcast, connect with other listeners, and subscribe to always be notified of our new episode releases. If you missed previous episodes, check them out on our meta page. You can also contact us directly via good old email, wikimove@wikimedia.de to continue this discussion and share your suggestions for future episodes. Goodbye, ciao. Ciao for now, tschussi.