Abstract Wikipedia/Community:2023-03-17
This is only a community comment document created by Dušan Kreheľ.
Wikifunctions is evolving, but Abstract Wikipedia currently exists only in theory. What is the relationship between Abstract Wikipedia and Wikifunctions?
- If both are autonomous, will Wikifunctions also be used on local Wikipedias (like the Super-Template)?
- If Wikifunctions is being created for Abstract Wikipedia, shouldn't Abstract Wikipedia show some real, practical code progress (not just theoretical) while Wikifunctions is making such progress? There is a risk that the time and money invested in Wikifunctions will later dictate what Abstract Wikipedia can do, if Wikifunctions had to be "uncomfortably" modified because of requirements of Abstract Wikipedia that were not planned for.
- Will it be possible to use Wikifunctions on normal local Wikipedias as well?
How to link Wikidata to other projects is being discussed, but I have not read an analysis of the pros and cons of these existing technologies and their impact on Abstract Wikipedia. The current consideration is "how are they beneficial to us", but I think the view should be: "We have these technologies (Wikidata, …), with these pluses and minuses. Shall we use them? Shall we include them in the concept?".
We have a list of natural language generation systems. Are any of them used? If so, will they load data with a domino effect, depending on the request? Put simply, they are mainly built for English. How will they deal with the amount of data? Or will it turn out in the end that it is best to write a tailor-made solution (also for the sake of fitting the technical structure of Wikimedia)?
How technically demanding is Wikidata (the Wikidata export is ~122 GB)? How difficult will Abstract Wikipedia be? How much more difficult will Wikidata become? Will this affect the implementation of Abstract Wikipedia?
Is there a way to create Abstract Wikipedia as data-modular, flexible, maybe somewhat "networked" software in which one can pick the parts one needs, and not to create a "monster with one huge table of data"? (Rhetorical question)
Wikidata:
- Plus: friendly to technical users (e.g. the JSON export of an item).
- Minor minus: the size of the dumps.
- More extensive items:
- They load more slowly on weaker hardware.
- They are hard to read (i.e. less appealing to use).
- What, for example, is wikidata:Lexeme:L82668? Can it be freely used in Abstract Wikipedia?
If we have some results (I mean Wikifunctions), why isn't the content on Abstract_Wikipedia updated to reflect them?
I understand that creating/rendering a page will go through the process: Abstract Wikipedia → Wiki code → Output HTML. This means that rendering time will increase. That will only encourage technical "cache-mania", and people will get a slower live preview when editing an article.
Why does the project want to use non-standardized wiki code? Shouldn't the wiki code be standardized first, if we mean this professionally, so that in the end it is beneficial for everyone?
And why is the theoretical concept of Abstract Wikipedia (I think the Jupiter example) worked out on a simpler example? It should also be tried on a worst case (as a contrast) if we want a good analysis and do not want to be surprised later. Abstract Wikipedia would be useful for creating pages for municipalities (separate administrative units), where the page structure is largely similar, and also for easy editing of statistics. Analyze, for example, Vyšný Slavkov and Nižný Slavkov.
How will localized in-text references be implemented? Or will a "Wikipedia References" project eventually be created?
Test concept in source code
Even though work on this project has started, I have not seen a pseudo-demo or any toy code testing the resulting concept anywhere. So here is one of my attempts:
<?php
// as "Wikidata"
$list_dni_tyzdna=[
    "items" => [
        "pondelok", // as index
        "utorok",
        "streda",
        "stvrtok",
        "piatok",
        "sobota",
        "nedela"
    ],
    "count" => "all"
];

// as "Wiktionary"
$item_pondelok=[
    "genus_sk" => "man",
    "pattern_sk" => "otcov",
    "case" => [
        "N" => "pondelok"
    ],
];
$item_utorok=[
    "genus_sk" => "man",
    "pattern_sk" => "otcov",
    "case" => [
        "N" => "utorok"
    ],
];
$item_tyzden=[
    "genus_sk" => "man",
    "pattern_sk" => "stroj",
    "case" => [
        "N" => "týždeň",
        "G" => "týždňa"
    ],
    "lists" => [
        "dni_tyzdna"
    ]
];
$item_den=array(
    "genus_sk" => "man",
    "pattern_sk" => "duby",
    "case" => [
        "N" => "deň"
    ],
);

// as "function", SUPER FUNCTION
// Dispatches to the function named in $function_data['function'],
// passing the whole descriptor as its single argument.
function call_function($function_data)
{
    $raw_function=$function_data['function'];
    $raw_arguments=$function_data;
    return $raw_function($raw_arguments);
}

function declension_sk($arguments)
{
    $word=$arguments["function_arguments"]["item"];
    $variable="item_$word";
    // Only the nominative is implemented.
    global $$variable;
    return $$variable['case']['N'];
}

function association_declension_sk($arguments)
{
    $word=$arguments["function_arguments"]["item"];
    // subject
    $root="item_$word";
    global $$root;
    $output=$$root['case']['N'];
    // object
    if(isset($arguments["function_arguments"]['associations']))
    {
        foreach($arguments["function_arguments"]['associations'] as $association)
        {
            switch($association['type'])
            {
                case "post":
                    $ass_type=$association['type'];
                    $ass_term=$association['term'];
                    $object="item_$ass_term";
                    global $$object;
                    $output.=" ".$$object['case']['G'];
                    break;
                case "fixed_order":
                    $list=$association['list'];
                    $subject=$association['subject'];
                    $parameters=[
                        "function" => "fixed_order_sk",
                        "function_arguments" => [
                            "list" => $list,
                            "object" => $subject
                        ]
                    ];
                    $result=call_function($parameters);
                    if($result !== false)
                        $output="$result $output";
                    break;
            }
        }
    }
    return $output;
}

function associotion_sk($arguments)
{
    $subject=$arguments["function_arguments"]["item"];
    if(is_string($subject))
        $subject=array($subject);
    // Two or more subjects take the plural copula.
    if(count($subject) > 1)
        return "sú"; // are
    return "je"; // is
}

function fixed_order_sk($arguments)
{
    $list=$arguments["function_arguments"]["list"];
    $object=$arguments["function_arguments"]["object"];
    $variable="list_$list";
    global $$variable;
    $index=array_search($object, $$variable['items']);
    if($index === false)
        return false;
    $index++; // 0-index to human-index
    return "$index.";
}

function function_definicion($arguments)
{
    // DEV: More than one item in $subject/$object is not implemented.
    $subject=$arguments["function_arguments"]["subject"];
    $object=$arguments["function_arguments"]["object"];
    $elements=[
        'SUBJECT',
        'ASSOCIATION',
        'OBJECT',
    ];
    $lang="sk";
    /*
     * Constants for the Slovak (sk) language
     */
    // Warning: Another correct variant is "OBJECT ASSOCIATION SUBJECT".
    $format_sk="SUBJECT ASSOCIATION OBJECT.";
    $format_sk_SUBJECT=[
        "function" => "declension_sk",
        "function_arguments" => [
            "item" => $subject,
            "case" => "N" // Nominative (abbreviated)
        ]
    ];
    $format_sk_ASSOCIATION=[
        "function" => "associotion_sk",
        "function_arguments" => [
            "item" => $subject
        ]
    ];
    $format_sk_OBJECT=[
        "function" => "association_declension_sk",
        "function_arguments" => [
            "item" => $object,
            "case" => "N", // Nominative (abbreviated)
            "associations" => [
                [
                    /*
                     * "Type" in Slovak:
                     * - before
                     * - after
                     * - relative declension before
                     * - relative declension after
                     */
                    "type" => "post",
                    "term" => "tyzden"
                ],
                [
                    "type" => "fixed_order",
                    "list" => "dni_tyzdna",
                    "subject" => $subject
                ]
            ]
        ]
    ];
    /*
     * Building the text
     */
    $output=$format_sk;
    // "simple" rendering: replace each placeholder with its rendered element
    foreach($elements as $element)
    {
        // "{$lang}" instead of "${lang}", which is deprecated since PHP 8.2
        $parameter_variable="format_{$lang}_$element";
        $element_output=call_function($$parameter_variable);
        $output=str_replace($element, $element_output, $output);
    }
    return $output;
}

$frame_pondelok=[
    "function" => "function_definicion",
    "function_arguments" => [
        'subject' => "pondelok",
        'object' => "den",
    ]
];
$frame_utorok=[
    "function" => "function_definicion",
    "function_arguments" => [
        'subject' => "utorok",
        'object' => "den",
    ]
];

/* Note: Text must be processed sentence by sentence, because a period can also follow an abbreviation (example: "napr."). */
function fixing_centence_sk($text)
{
    // Capitalize the first character, unless it is an opening quotation mark.
    $first_char=mb_substr($text, 0, 1);
    if(strcmp($first_char, "„") != 0)
        $text=mb_strtoupper($first_char).mb_substr($text, 1);
    // Remove the period before the end of the sentence.
    // Caution: as written, this strips the whole `.“.` suffix, including the closing quotation mark.
    $wrong_end=".“.";
    $lengths_wrong_end=strlen($wrong_end);
    if(substr_compare($text, $wrong_end, -$lengths_wrong_end, $lengths_wrong_end) == 0)
        $text=substr($text, 0, -$lengths_wrong_end);
    return $text;
}

function call_frame($frame)
{
    $variable_name="frame_$frame";
    global $$variable_name;
    $output=call_function($$variable_name);
    return fixing_centence_sk($output);
}

/* main() */
echo call_frame("pondelok")."\n";
echo call_frame("utorok")."\n";
// Note: How should the program be modified if the term is in the plural?
--Dušan Kreheľ (talk) 23:05, 17 March 2023 (UTC)
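One possible answer to the closing question about plural terms, offered only as a rough sketch and not as part of the original program: store a singular and a plural form per case, and let the number of subjects select both the copula and the form of the object. The names `$lexicon`, `noun_form_sk`, `copula_sk`, `join_subjects_sk` and `definition_sk` are hypothetical, and the plural form "dni" is added here purely for illustration.

```php
<?php
// Hypothetical lexicon: every case holds a singular ("sg") and, where needed,
// a plural ("pl") form. Only the forms used below are filled in.
$lexicon = [
    "pondelok" => ["case" => ["N" => ["sg" => "pondelok"]]],
    "sobota"   => ["case" => ["N" => ["sg" => "sobota"]]],
    "nedela"   => ["case" => ["N" => ["sg" => "nedeľa"]]],
    "den"      => ["case" => ["N" => ["sg" => "deň", "pl" => "dni"]]],
    "tyzden"   => ["case" => ["G" => ["sg" => "týždňa"]]],
];

// Look up one inflected form of a lexicon entry.
function noun_form_sk(array $lexicon, string $item, string $case, string $number): string
{
    return $lexicon[$item]["case"][$case][$number];
}

// Singular copula for one subject, plural copula for two or more.
function copula_sk(array $subjects): string
{
    return count($subjects) > 1 ? "sú" : "je";
}

// Join subject forms with commas and "a" ("and") before the last one.
function join_subjects_sk(array $forms): string
{
    if (count($forms) === 1) {
        return $forms[0];
    }
    $last = array_pop($forms);
    return implode(", ", $forms) . " a " . $last;
}

// Build "<subjects> je/sú <object> týždňa." with number agreement.
function definition_sk(array $lexicon, array $subjects, string $object): string
{
    $number = count($subjects) > 1 ? "pl" : "sg";
    $subject_forms = array_map(
        fn(string $s): string => noun_form_sk($lexicon, $s, "N", "sg"),
        $subjects
    );
    $sentence = join_subjects_sk($subject_forms)
        . " " . copula_sk($subjects)
        . " " . noun_form_sk($lexicon, $object, "N", $number)
        . " " . noun_form_sk($lexicon, "tyzden", "G", "sg") . ".";
    // Capitalize the first character of the sentence.
    return mb_strtoupper(mb_substr($sentence, 0, 1)) . mb_substr($sentence, 1);
}

echo definition_sk($lexicon, ["pondelok"], "den") . "\n";
echo definition_sk($lexicon, ["sobota", "nedela"], "den") . "\n";
```

The sketch leaves out the ordinal ("1.") from the original frame, since an ordinal does not combine naturally with several subjects; a fuller solution would need a rule for that case as well.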
Hi Dušan. Thanks for sharing your thoughts and questions in detail. To begin, I'll give some brief answers and links to more details, so that you can read and ask any followup questions.
- What is the relationship between Abstract Wikipedia and Wikifunctions?
- In very simplified terms: Wikifunctions is both a standalone project (it will be useful by itself), and also a required component of Abstract Wikipedia.
- In complex technical detail, here are some details about how Abstract Wikipedia might work. The details are still being discussed because the technical possibilities and technical constraints become clearer as work progresses on Wikifunctions, and the linguistic needs are discussed and refined: Abstract_Wikipedia/Template_Language_for_Wikifunctions and Abstract_Wikipedia/Natural_language_generation_system_architecture_proposal
- If both are autonomous, will Wikifunctions also be used on local Wikipedias (like the Super-Template)? -- and -- Will it be possible to use Wikifunctions also on normal local wikipedias?
- Yes, see the various planned options at: Abstract_Wikipedia/Components
- If Wikifunction are created for Abstract Wikipedia, shouldn't Abstract Wikipedia have some real practical code progress (not just theoretical) when Wikifunction have such progress?
- Yes. You can see some functions relevant to natural language generation on the Wikifunctions Beta in the relevant section.
- We have a list of Natural language generation. Are any used? If so, will it load the data with a domino effect according to the request?
- Some of this is covered in this edition of the newsletter: Abstract_Wikipedia/Updates/2022-11-04
- How technical is Wikidata (Export Wikidata is ~122GB)? How difficult will Abstract Wikipedia be? How much will Wikidata become more difficult? Will this affect the implementation of Abstract Wikipedia?
- The total size of Wikidata shouldn’t be a constraint for Wikifunctions or Abstract Wikipedia, as we would only work on what is relevant in a given context at a time.
- If we have some results (I mean Wikifunction), why isn't the content on Abstract_Wikipedia updated about it?
- We make frequent updates on the Updates page, and relevant updates on the project page.
- Why does he want to use non-standardized wiki code? Shouldn't it be standardized first if we mean it professionally and in the end it will be beneficial for everyone?
- That’s a long-standing desire: mw:Parsoid/Parser_Unification#Related_docs - We didn’t want to depend on that.
- And why is the theoretical concept of Abstract Wikipedia (i.e. I think for Jupiter) done on a simpler example?
- Partially because simple examples are easier to understand for everyone. Partially because Abstract Wikipedia will inherently need to use relatively simple/clear sentences in order to work well in the largest number of languages. And partially because we didn’t want to presuppose certain solutions, as we would need to do for more complex examples.
- How will localized in-text references be implemented? Or even, eventually, Wikipedia References will be created.
- The same as any other generated content. That is one advantage of the results from the WikiCite project, which has created a lot of citable objects in Wikidata.
- Re: your "Test concept in source code"
- Thanks for the example! I expect a similar example should be implementable in the Wikifunctions Beta too. Do you want to try?
I hope that helps. (Note: Some of the more technical answers were drafted by my colleagues.) Best wishes, Quiddity (WMF) (talk) 21:13, 14 July 2023 (UTC)
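The point made above, that only the data relevant in a given context would be worked on at a time, can be illustrated with a small sketch. This is not how Wikifunctions or Wikidata actually load data; `LazyItemStore`, its loader callback and the placeholder backend array are all hypothetical, and the sketch only shows the general idea of fetching items on demand and memoizing them instead of loading a whole dump.

```php
<?php
// Hypothetical on-demand item store: items are fetched one at a time
// through a loader callback and memoized, so a full dump is never loaded.
final class LazyItemStore
{
    /** @var callable(string): array */
    private $loader;

    /** @var array<string, array> */
    private array $cache = [];

    /** Number of times the loader was actually invoked. */
    public int $loads = 0;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get(string $id): array
    {
        if (!isset($this->cache[$id])) {
            $this->cache[$id] = ($this->loader)($id);
            $this->loads++;
        }
        return $this->cache[$id];
    }
}

// Placeholder backend standing in for a real per-item data source.
$backend = [
    "item_pondelok" => ["case" => ["N" => "pondelok"]],
    "item_utorok"   => ["case" => ["N" => "utorok"]],
    "item_den"      => ["case" => ["N" => "deň"]],
];

$store = new LazyItemStore(fn(string $id): array => $backend[$id]);

// Only the items a sentence needs are loaded, and each one only once.
$first  = $store->get("item_pondelok");
$second = $store->get("item_pondelok");
$third  = $store->get("item_den");
echo $first["case"]["N"] . " / " . $third["case"]["N"] . "\n";
echo "loader calls: " . $store->loads . "\n";
```

In this arrangement the cost depends on the number of items a given text touches, not on the total size of the data set.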
- @Quiddity (WMF) and collective:
- 3) I can't find an example. But I devoted very little time to the attempt.
- 10) Not in my "free time". And what should I actually implement (rhetorical question)? My code also includes content generation for the final text (the frame call), which goes beyond what the Wikifunctions project covers.
- Abstract_Wikipedia/Components#F3:_Implicit_article_creation
- Why "magic words"? They are just special semantic words. Why add "another dimension", fancy "occultism", some "religion" or anything of that sort?
- F6 and F7 are brand building. That is fine for experts, but could they be named, for ordinary people, with terms that are closer to and more suitable for them? Why not F6 as {LINK} (assuming Wikidata is the standard, which it practically is)?
- The special terms are whole words. Wouldn't it be good to have abbreviated alias variants (rhetorical question)? Then there could be less text in the source code.
- https://wikifunctions.beta.wmflabs.org/wiki/Special:CreateZObject
- This editing is not for the "ordinary person".
- Why is it not in “labelized” representation (rhetorical question)?
- For whom is the "project" intended, i.e. what technical and language level is expected of a beginner, an advanced user and an expert? Which group does it want to attract? Which group should use it?
- https://wikifunctions.beta.wmflabs.org/wiki/Special:EvaluateFunctionCall
- This probably doesn't work.
- https://wikifunctions.beta.wmflabs.org/wiki/en/Z10224
- A simple operation (positive integer) takes about 1 CPU second? 5 seconds total time? ... How long will it realistically take to generate the content of a document (rhetorical question)?
- Speed, speed, even Parsoid speed…
- At the time of publishing this post, it is not working.
- Will it be possible to read user settings, i.e. key-value settings (such as the time zone)? They will probably need to be kept separate from the main ones.
- Will it be possible to change the language generator or will it be possible to switch between several?
- There is also a risk that "someone will screw up your project": it may be so complicated for users that they choose an easier way and write normal (plain) text, or have the text generated from the facts they enter (through primitive assignments, like "he's take take", "he's got XY") by some text generator or AI (Bard can do it, tested).
- Please implement the "project" as, or at least treat it as, an addition to the existing basic solution (ring 0 being the "blogging software", the Wikidata inter-page links and the wiki page language; ring 1 being e.g. a dark theme). However perfect (and possibly complex) the solution is, people should be able to use it in different ways, according to their needs (e.g. using only some part of it). For example, global functions could be implemented now.
- And please make it modular. I don't want to be a hostage to some plan or someone's idea where something unsatisfactory cannot be changed, e.g. because a lot of resources (human work, time, money) have already been spent, or because "you don't want to, since the content would need to be changed".
- How it could be evaluated, or rather how I see it:
- If I understand correctly, 3 things should be created:
- Wikifunctions, i.e. a "virtual machine" for calculations
- Abstract Wikipedia
- Content generation - a la "template"
- Data repository - similar to Wikidata (but CC BY-SA license)
- As a whole, it is only in development. It is not yet possible to create any article content with the "project", not even in the form of a simple "forwarding of data". Wikifunctions works somehow, but so far only for "playing". Wikifunctions is part of the "Wikipedia monster":
- link to Wikidata
- Its size is probably more than just a few megabytes (if you want to test it yourself).
- You can (un)test it at wikifunctions.beta.wmflabs.org:
- global login does not work
- working with it is for experts or developers.
- Some users will certainly be pleased that the "global template" will be implemented. Dušan Kreheľ (talk) 17:24, 17 July 2023 (UTC)