Abstract Wikipedia/First evaluation engine

Where and how should our first evaluation engine be deployed, and how should it be architected?

Eventually, the wiki of functions aims to have a healthy ecosystem of interoperable evaluation engines. We will provide a well-defined description of what an evaluation engine MUST and SHOULD do, and we hope that there will be evaluation engines running in many different contexts: from small embedded evaluation engines living in native applications that want to use certain functions, through evaluation engines running as mobile apps or local servers, to evaluation engines running in the cloud or on a distributed peer-to-peer network. The Wikimedia Foundation will need to run one or more evaluation engines in order to provide the functionality that Wikimedia projects can rely on. We will think about these and many others another time.

For now, we need to figure out where our first evaluation engine will live. While it would be great to eventually have all of these implemented, we need to be realistic about our resources and should aim to have only one of them at first. There seem to be three main possibilities to consider for the first implementation:

Embedded in the WikiLambda extension.
A standalone service.
Evaluating everything in the reader's browser.

We need to decide this before we start working on Phase δ, where we will implement this evaluation engine. Here we discuss the advantages and disadvantages of the three main approaches, as well as possible options within each approach.

Main options

WikiLambda extension

The WikiLambda extension itself not only provides the wiki extension to edit and maintain the ZObjects on wiki, but also embeds an evaluation engine. The extension exposes the evaluation engine through an API to the world and to its internal uses.

Advantages:

Deployment of the WikiLambda extension is easy as we don’t have to deploy a secondary stack for the evaluation engine (even if we don’t go this route now, it would make sense to have an embedded evaluation engine in the extension in order to allow easy deployment, which also makes it easier for people to contribute to the code base).
We already started the creation of objects with certain capabilities in the PHP code, and can continue building on top of that.
It is by far the fastest and easiest way to access all the other ZObjects that one would need to evaluate a given call.
As we want the extension to call to contributed functions living on wiki to provide some of its experiences, having it immediately available at hand is by far the most convenient solution with the least overhead.
Can reuse everything that already exists in MediaWiki to provide an API, e.g. authentication, tokens, etc.

Disadvantages:

From a security standpoint, it is high risk, as in deployment this would run on the main cluster. If someone manages to break out of the embedded system, they would have direct access to the production databases, as the wiki has access to the respective credentials.
Also from a security standpoint, it can be an attack vector for an intentional or even unintentional DoS attack, as expensive evaluations would run on, and could lock up, production instances of MediaWiki.
It is hard to sandbox the evaluation inside the extension as it all lives in the same PHP code.
Need to implement monitoring, time and memory constraints all within MediaWiki.

Standalone service

We develop a standalone service which can be called via REST to evaluate function calls. The service is used by internal clients as well as external ones.

Advantages:

Can develop everything from scratch. No constraints through MediaWiki, the service can be as lean as the stack we build on allows us.
The service can be sandboxed, monitored, and time- and memory-budgeted as a whole.
Can build a holistic and uniform approach to caching within the evaluation engine.
Can scale easily by running more services in more boxes. Even allows an architecture where a single server call may decide to call other servers and so parallelizes parts of the call, something which is hard to implement in the other two approaches discussed here.

Disadvantages:

Have to develop everything from scratch. Plenty of potential for bikeshedding regarding implementation language, stack, and deployment approach.
Also means many more opportunities to introduce new bugs.
Need to figure out how to start a new production-strength service for launch (but can run on WMFCloud or other infrastructure until then).
Need to read lots from the DB, so it is important to be able to do that fast. The access is read-only, however.
Incurs cost on anyone who deploys and wants to work on the extension, as they need to set up the service as well.

Browser

The function evaluations do not have to run on Wikimedia infrastructure at all, but might all run in the users' browsers.

Advantages:

Easy to deploy.
Hard to cause an adverse effect on the serving infrastructure.
No need to constrain resources for readers, as it is their own resources.

Disadvantages:

Have to be very careful not to allow attacks against readers.
Will likely lead to a sluggish experience on many clients, making this usable only for users with better equipment, which runs counter to our goals.
The evaluation engine might need to read a lot from the DB, with long round-trip times since the browser has to access the wiki.
No way to call the code server-side (unless we develop it in parallel on say node so that it can run in the browser and in the backend, but that’s basically developing and maintaining two solutions, even if they share some code).
Opportunity for caching is severely limited, especially across users, where huge benefits are expected.
Not search-engine friendly.
Unable to process further (such as {{Str left|{{#lambda:xxx}}|123}}).

Architecture for the standalone service

We are currently leaning towards developing the first evaluation engine as a standalone service. Deploying it as part of MediaWiki is viewed as having too many potential security and performance risks, as does running it in the browser. We hope to revisit the decision about the browser at a later point.

Ideally the evaluation engine can be spun up many times and orchestrated as a stateless service. The service is 'read-only', modulo caching and monitoring.

The evaluation engine exposes an API over HTTP that receives a ZObject as the input and returns a ZObject as the output. The returned ZObject is the result of evaluating the incoming ZObject until a fixpoint is reached. This is most interesting for function calls.
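
The "evaluate until a fixpoint is reached" behaviour can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ZObject` alias, the `step` callback, the `max_steps` budget, and the toy `demo_step` rule are hypothetical and not part of any existing API.

```python
from typing import Any, Callable

ZObject = Any  # hypothetical stand-in: any JSON-like value


def evaluate_to_fixpoint(
    zobject: ZObject,
    step: Callable[[ZObject], ZObject],
    max_steps: int = 1000,
) -> ZObject:
    """Apply `step` repeatedly until the object no longer changes."""
    current = zobject
    for _ in range(max_steps):
        nxt = step(current)
        if nxt == current:
            # Nothing left to evaluate: this is the fixpoint.
            return current
        current = nxt
    # Assumed safeguard: give up instead of looping forever.
    raise RuntimeError("no fixpoint reached within the step budget")


def demo_step(z: ZObject) -> ZObject:
    # Toy rewrite rule: unwrap one layer of {"identity": x} per step.
    if isinstance(z, dict) and "identity" in z:
        return z["identity"]
    return z


result = evaluate_to_fixpoint({"identity": {"identity": "Test"}}, demo_step)
```

A step budget such as `max_steps` is one simple way a stateless service could bound a single request; a real engine would more likely enforce time and memory limits around the whole call.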

The service will dramatically benefit from caching. There are (at least) three levels of caching regarding the evaluation engine:

HTTP cache: caching the whole API call to the evaluation engine at the level of HTTP calls. If the same call is made, the same result is returned.
intermediate results cache: when the evaluation engine evaluates a function call, it should check whether it has the given function call in its cache, and return that result instead. This would benefit from all evaluation engines spun up in parallel being able to use the same cache.
content cache: references are looked up in the wiki. This might also be pushed from the wiki, particularly if we have a shared cache. This can be part of the intermediate results cache.
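
For several evaluation engines to share one cache, they would all need to derive the same key for the same function call. One possible approach, shown here only as a sketch, is to hash a canonical JSON serialization of the call. The `cache_key` function and the choice of SHA-256 are assumptions, not something the design prescribes.

```python
import hashlib
import json


def cache_key(zobject) -> str:
    """Derive a stable key from a JSON-like ZObject (hypothetical helper)."""
    # Sorting keys and fixing separators makes the serialization independent
    # of dict ordering, so equal objects always yield the same string.
    canonical = json.dumps(
        zobject, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


same = cache_key({"call": "head", "args": [["1", "2"]]}) == cache_key(
    {"args": [["1", "2"]], "call": "head"}
)
```

Because the key depends only on the content of the call, it could serve both the HTTP cache and the intermediate results cache.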

Eventually we will need to allow for REST calls to other Wikimedia and external services (e.g. Wikidata, weather reports, etc.), as well as access to changing values such as the current time and random values.

What about race conditions when functions are being edited while a function is running?

Would evaluation engines be homogeneous or diverse? I.e. could there be an evaluation engine that can use a TPU while others cannot, and how do we route queries? A simple example: do all evaluation engines need to understand all programming languages, or can we make them lighter by having dedicated evaluation engines where some can run Python, others JavaScript, etc.?
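
If engines turn out to be diverse, routing could be as simple as matching the programming language of an implementation against what each engine declares it supports. The sketch below is purely illustrative: the `Engine` record, the engine names, and the `route` function are hypothetical, and picking the first match is a placeholder for any real load-balancing policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Engine:
    name: str                  # hypothetical engine identifier
    languages: FrozenSet[str]  # languages this engine can execute


def route(engines: List[Engine], language: str) -> Engine:
    """Return some engine able to run code in `language`."""
    candidates = [e for e in engines if language in e.languages]
    if not candidates:
        raise LookupError(f"no evaluation engine supports {language!r}")
    return candidates[0]  # placeholder policy: first match wins


engines = [
    Engine("engine-py", frozenset({"python"})),
    Engine("engine-js", frozenset({"javascript"})),
]
chosen = route(engines, "javascript")
```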

Some technologies to consider:

Draft for the steps of the first evaluation engine

The steps below do not necessarily have to be done in the order given. Built-ins go first, but most of the others are rather independent of each other.

Built-in functions

See phabricator:T260321.

Very small API — maybe even just start with a single evaluate method and that's it.
No caching.
Implements some (or all) of the first set of builtins as described by phabricator:T261474.
Everything that is not a function call gets returned as is (maybe canonicalized).
Example: "Test" gets evaluated to "Test".
Everything that is a function call gets evaluated until fixpoint.
Example: if(true, "Yes", "No") gets evaluated to "Yes".
If the arguments are function calls, they also get evaluated, until fixpoint.
Example: head(tail(["1", "2", "3"])) evaluates to "2".
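
The three examples above can be reproduced with a very small evaluator. This is only a sketch under assumptions: the `{"call": ..., "args": [...]}` encoding of a function call is a hypothetical simplification rather than the real ZObject format, and the built-ins are plain Python lambdas held in a local table instead of being looked up on the wiki.

```python
from typing import Any


def call(name: str, *args: Any) -> dict:
    """Build a function call in the simplified, hypothetical encoding."""
    return {"call": name, "args": list(args)}


# Hypothetical built-in table; a real engine would resolve these on the wiki.
BUILTINS = {
    "if": lambda cond, then, otherwise: then if cond else otherwise,
    "head": lambda lst: lst[0],
    "tail": lambda lst: lst[1:],
}


def is_call(z: Any) -> bool:
    return isinstance(z, dict) and "call" in z


def evaluate(z: Any) -> Any:
    # Anything that is not a function call is returned as is.
    # A call is evaluated repeatedly until the result is no longer a call.
    while is_call(z):
        # Arguments that are function calls are evaluated first
        # (eagerly, so both branches of "if" are evaluated here).
        args = [evaluate(a) for a in z["args"]]
        z = BUILTINS[z["call"]](*args)
    return z


r1 = evaluate("Test")
r2 = evaluate(call("if", True, "Yes", "No"))
r3 = evaluate(call("head", call("tail", ["1", "2", "3"])))
```

Here `r1`, `r2`, and `r3` correspond to the three examples in the list: `"Test"`, `"Yes"`, and `"2"`.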

Note that all of this already requires that references are being resolved to the wiki. The function head is a function in the wiki and needs to be looked up, its implementations need to be gathered (we only have built-in implementations for now), an implementation needs to be chosen, and evaluated.

The first version of choosing an implementation is to choose any. This will need to be refined later.
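
The lookup steps described above (resolve the function, gather its implementations, choose one) might look roughly like this. The in-memory `WIKI` dictionary stands in for real reads from the wiki, and its shape and the identifier `builtin_head` are invented for illustration.

```python
# Hypothetical stand-in for function definitions stored on the wiki.
WIKI = {
    "head": {"implementations": [{"kind": "builtin", "id": "builtin_head"}]},
    "broken": {"implementations": []},
}


def choose_implementation(function_name: str, wiki: dict) -> dict:
    """Resolve a function reference and pick one of its implementations."""
    function = wiki[function_name]                # resolve the reference
    implementations = function["implementations"]  # gather implementations
    if not implementations:
        raise LookupError(f"{function_name!r} has no implementation")
    # "Choose any": the first version simply takes the first one listed.
    return implementations[0]


impl = choose_implementation("head", WIKI)
```

Keeping the choice behind a single function leaves room to refine the selection strategy later without touching the rest of the evaluator.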

Sandboxing and monitoring

Implement all necessary steps for sandboxing and monitoring the evaluation engines (Phabricator:T261470).

A lot of this should be available through Kubernetes. But we probably would also like to keep statistics such as “how often was a function called?”, “how often did we have cache misses”, etc.
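
The application-level statistics mentioned above could be collected with a few counters inside the engine. This is a minimal in-process sketch; the `EngineStats` class and its method names are hypothetical, and a real deployment would presumably export such numbers to whatever monitoring system is in use.

```python
from collections import Counter


class EngineStats:
    """Hypothetical in-process counters for one evaluation engine."""

    def __init__(self) -> None:
        self.calls_per_function: Counter = Counter()
        self.cache_hits = 0
        self.cache_misses = 0

    def record(self, function_name: str, cache_hit: bool) -> None:
        # "How often was a function called?"
        self.calls_per_function[function_name] += 1
        # "How often did we have cache misses?"
        if cache_hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def miss_rate(self) -> float:
        total = self.cache_hits + self.cache_misses
        return self.cache_misses / total if total else 0.0


stats = EngineStats()
stats.record("head", cache_hit=False)
stats.record("head", cache_hit=True)
stats.record("tail", cache_hit=False)
```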

Intermediate caching

Introduce caching of intermediate results.

Before evaluating a function call, look up if this is in the cache and return that instead.

Allow the cache to be reset.
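
A minimal sketch of this step, assuming function calls are JSON-serializable values: the `IntermediateCache` class, its method names, and the use of a plain in-process dictionary are all illustrative assumptions. A shared cache across engines would need an external store instead.

```python
import json
from typing import Any, Callable, Dict


class IntermediateCache:
    """Hypothetical cache of function-call results."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    @staticmethod
    def _key(function_call: Any) -> str:
        # Canonical JSON so that equal calls map to the same entry.
        return json.dumps(function_call, sort_keys=True)

    def get_or_compute(
        self, function_call: Any, compute: Callable[[Any], Any]
    ) -> Any:
        key = self._key(function_call)
        if key not in self._results:
            # Cache miss: evaluate once and remember the result.
            self._results[key] = compute(function_call)
        return self._results[key]

    def reset(self) -> None:
        """Drop every cached result."""
        self._results.clear()


evaluations = []


def fake_evaluate(function_call: Any) -> str:
    # Stand-in for the real evaluator; records how often it is invoked.
    evaluations.append(function_call)
    return "2"


cache = IntermediateCache()
request = {"call": "head", "args": [["2", "3"]]}
first = cache.get_or_compute(request, fake_evaluate)
second = cache.get_or_compute(request, fake_evaluate)  # served from cache
```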

Editing invalidates the cache

Editing an existing ZObject invalidates the whole cache. We will gradually improve on this behaviour. (Ideally an edit only invalidates those caches which need to be invalidated, but that will become complex quickly. We will get to it later.)

Requires intermediate caching.

Composition implementations

Allow implementations in the form of compositions (Phabricator:T261468).
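
One way to picture a composition is as a function body built from calls to other functions, with placeholders for the arguments. The sketch below only shows the substitution step, using a hypothetical encoding (`{"call": ...}` for calls, `{"arg": ...}` for argument references) and a made-up function `second` defined as `head(tail(list))`. The real representation is defined in the linked task, not here.

```python
from typing import Any, Dict


def call(name: str, *args: Any) -> dict:
    return {"call": name, "args": list(args)}


def arg(name: str) -> dict:
    """Reference to an argument of the function being implemented."""
    return {"arg": name}


def substitute(body: Any, bindings: Dict[str, Any]) -> Any:
    """Replace argument references in a composition body with values."""
    if isinstance(body, dict) and "arg" in body:
        return bindings[body["arg"]]
    if isinstance(body, dict) and "call" in body:
        return {
            "call": body["call"],
            "args": [substitute(a, bindings) for a in body["args"]],
        }
    return body  # literals are left untouched


# Hypothetical composition: second(list) = head(tail(list))
SECOND_BODY = call("head", call("tail", arg("list")))

expanded = substitute(SECOND_BODY, {"list": ["1", "2", "3"]})
```

After substitution, `expanded` is an ordinary nested function call that the evaluator can process like any other.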

JavaScript implementations

Allow implementations in JavaScript.

Wikidata

Ability to call Wikidata.

Commons

Ability to call files in Commons.

Other Wikimedia wikis

Ability to call other Wikimedia projects.

Web calls

Ability to make external calls to the web in general.

Non-functional functions

In particular "current time" and "random".