Thoughts on language integration

Below are some thoughts on what we would like to see in Wikimedia's language integration. See proposal on language integration for a concrete proposal on how to bring this about in the software.

Linking

Interlanguage linking is IMHO the most important part of integration. At present this doesn't require the languages to run the same software (though it helps) or be on the same server, or even be managed by the same project organization. On the minus side, maintenance is a bit of a pain, as it requires manual inspection and insertion of links.

  • Possibilities for the future:
    • A maintenance tool that sucks up the links and points out inconsistencies. (Inconsistencies aren't necessarily wrong -- if articles are divided up differently on different wikis, it might be 100% correct for two English articles to point to one Dutch article, or one French article to point to a German disambiguation page. Humans need to look at these things.)
    • Interlanguage stubs -- articles that don't exist in one language should still be able to link out to existing articles on the same topic in the other languages, and this should be noted in the wiki interface.
    • Articles that don't exist in one language should be able to be linked to from the other languages, so long as it doesn't distract from the ability to easily find articles that do exist
    • Using a separate table instead of/in addition to in-text magic links
    • Flagging articles that are linked and differ a lot in length. There would need to be a "correction factor" (1 for en, 2 for de, 2.5 for fr), as different languages tend to say the same thing at different lengths.

I think the last feature would be almost meaningless, due to different organization of material, language issues etc. Taw 15:23 Dec 7, 2002 (UTC)
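
To make the maintenance-tool idea from the list above a bit more concrete, here is a minimal sketch in Python. It assumes the interlanguage links have already been extracted into a dictionary keyed by (language code, title); that data layout and the function names are made up for illustration and are not part of the current software. It only reports candidates, since a human still has to decide whether each one is actually wrong.

```python
from collections import defaultdict


def missing_backlinks(links):
    """Return (source, target) pairs where the target does not link back.

    `links` maps (lang, title) -> set of (lang, title) targets.
    This layout is an assumption made for the sketch.
    """
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if source not in links.get(target, set())
    )


def many_to_one(links):
    """Find foreign pages that several pages of one language point to.

    Returns {(target, source_lang): [source titles]}. This may be
    perfectly correct (e.g. two English articles covered by one Dutch
    article), so it is a list for humans to review, not an error list.
    """
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)

    report = {}
    for target, sources in incoming.items():
        titles_by_lang = defaultdict(list)
        for lang, title in sources:
            titles_by_lang[lang].append(title)
        for lang, titles in titles_by_lang.items():
            if len(titles) > 1:
                report[(target, lang)] = sorted(titles)
    return report


# Hypothetical example data: two English pages point at one Dutch page,
# but the Dutch page only links back to one of them.
example_links = {
    ("en", "Tulip"): {("nl", "Tulp")},
    ("en", "Tulip (flower)"): {("nl", "Tulp")},
    ("nl", "Tulp"): {("en", "Tulip")},
}

for source, target in missing_backlinks(example_links):
    print("no link back:", source, "->", target)
for (target, lang), titles in many_to_one(example_links).items():
    print("several", lang, "pages point at", target, ":", titles)
```

Both reports are deliberately kept separate: a missing backlink is usually just an omission, while a many-to-one mapping is often intentional.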

Common user space

A single user account space has been suggested. This could be nice, especially if cookie issues are resolved. There may be some naming conflicts?

Technical issues (see also Single login):

  • Need to add a 'common' database to store this stuff
  • Internal user ID numbers would have to be changed where recorded to avoid conflict
  • Options per-language or all over...?
    • Technical options (edit box size etc.) would be the same for all wikipedias.
  • User pages should be per-language; no entry lists/redirects to other languages ?
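
As a rough illustration of the ID renumbering and naming-conflict points above, the following Python sketch builds a mapping from per-language user IDs to new IDs in a common table. The input layout, the function name and the policy of never merging accounts automatically are all assumptions for the sake of the example; the real tables and the real conflict policy would have to be decided separately.

```python
from collections import defaultdict


def build_common_user_table(per_language):
    """Assign fresh global IDs to every per-language account.

    `per_language` maps lang -> {local_id: username} (assumed layout).
    Returns (id_map, conflicts):
      id_map    maps (lang, local_id) -> new global ID, for rewriting
                every place where the old ID was recorded;
      conflicts maps username -> languages where that name is taken,
                for names that exist on more than one wiki.
    Accounts with the same name are NOT merged here, because the
    software cannot know whether they belong to the same person.
    """
    id_map = {}
    languages_by_name = defaultdict(list)
    next_id = 1
    for lang in sorted(per_language):
        for local_id in sorted(per_language[lang]):
            name = per_language[lang][local_id]
            id_map[(lang, local_id)] = next_id
            languages_by_name[name].append(lang)
            next_id += 1
    conflicts = {
        name: langs
        for name, langs in languages_by_name.items()
        if len(langs) > 1
    }
    return id_map, conflicts


# Hypothetical accounts: "Bob" is registered on both en and de.
example_users = {
    "en": {1: "Alice", 2: "Bob"},
    "de": {1: "Bob", 2: "Carla"},
}
id_map, conflicts = build_common_user_table(example_users)
print(id_map)
print(conflicts)
```

Whether a conflicting name ends up as one shared account or gets renamed on one of the wikis is a policy question that the sketch leaves open.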

Combined recentchanges and other interface issues

The user interface localization contains some bits that deal with content, i.e. links to particular pages, the names of namespaces, and the names of log pages. These should be more cleanly extracted so that you can select the user interface language separately from the content language if desired.
Once that's done, increased mixing of content from different languages becomes more plausible, such as combining Recentchanges lists for multiple languages and meta.
On combining things... It may ultimately be simplest to combine everything into one database. The various page tables (cur, old, archive, etc.) could include another field to indicate language (by code, or by numeric index?) as they now do for namespaces; thus a single-language view can simply drop in "AND lang='XX'" to various DB queries to limit itself, or "AND lang IN ('XX','YY'...)" to show an arbitrary selection of languages. Of course, as with the namespaces, this would need to be carefully tested. Leave one out, and you're accidentally deleting or renaming pages in other languages.
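
Here is a small sketch of what the language filter might look like in practice. It uses Python with an in-memory SQLite database purely as a stand-in for the real MySQL setup; the `cur` table name comes from the text above, but the column names (`lang`, `namespace`, `title`) and the helper functions are assumptions made up for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for a combined multi-language cur table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur ("
        " lang TEXT NOT NULL,"
        " namespace INTEGER NOT NULL,"
        " title TEXT NOT NULL,"
        " PRIMARY KEY (lang, namespace, title))"
    )
    rows = [("en", "Paris"), ("fr", "Paris"), ("de", "Paris"), ("en", "London")]
    conn.executemany(
        "INSERT INTO cur (lang, namespace, title) VALUES (?, 0, ?)", rows
    )
    return conn


def pages_in_languages(conn, langs):
    """List (lang, title) rows limited to the selected languages."""
    langs = list(langs)
    if not langs:
        return []
    # One "?" placeholder per language, i.e. "AND lang IN ('XX','YY'...)"
    # built without pasting user input into the SQL string.
    placeholders = ", ".join("?" for _ in langs)
    sql = (
        "SELECT lang, title FROM cur"
        f" WHERE namespace = 0 AND lang IN ({placeholders})"
        " ORDER BY lang, title"
    )
    return conn.execute(sql, langs).fetchall()


def delete_page(conn, lang, title):
    """Delete one page in ONE language and return the number of rows removed.

    The "lang = ?" condition is exactly the part that must not be
    forgotten: without it, "Paris" would vanish from every language.
    """
    cursor = conn.execute(
        "DELETE FROM cur WHERE lang = ? AND namespace = 0 AND title = ?",
        (lang, title),
    )
    return cursor.rowcount


conn = make_demo_db()
print(pages_in_languages(conn, ["en", "fr"]))
print(delete_page(conn, "en", "Paris"))
print(pages_in_languages(conn, ["de", "en", "fr"]))
```

Putting the language condition inside a handful of shared helpers like these, rather than repeating it by hand in every query, is one way to reduce the risk of leaving it out.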

Common upload space?

Many photos and diagrams can be used across languages.

Pros:

  • Everyone sees the files being uploaded, so can make use in articles in their own languages
  • Don't need to keep a dozen copies of those maps

Cons:

  • Increased chance of naming conflicts?
  • A language-specific mirror/CD-ROM version needs to sort out more carefully which files it needs.
    • Can be resolved easily; the images know which articles link to them.
  • Disagreements over what uploads are appropriate?
    • Uploads would still be on the same server.
      • But in some countries, admins could be held legally responsible for not deleting things that they knew were illegal.
        • Then only the US residents should decide what to delete.
          • But US residents may not be able to handle deletion requests in 20-some languages. What if some image has suspicious copyright status? Who would investigate that?
  • Possibly complicates deletion procedure/privilege
  • A good file name in one language may be completely incomprehensible in others.
    • One file could have multiple names?
  • Maps and diagrams are not necessarily language-free.
    • But many photos may be.

Technical issues:

  • Need to make image usage known across languages so that we don't get false orphan results

etc

See also WikiImages.org and en:User talk:Clutch/mod_wiki for some discussion of how we might implement a combined database.
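
The false-orphan issue mentioned under the technical issues can be shown with a short Python sketch. It assumes that each language can report which articles use which shared file; the data layout and function names are invented for the example.

```python
def orphans_single_language(images, usage_by_lang, lang):
    """Files that look unused when only one language's links are known."""
    used = {
        image
        for image, articles in usage_by_lang.get(lang, {}).items()
        if articles
    }
    return sorted(set(images) - used)


def orphans_all_languages(images, usage_by_lang):
    """Files that are unused in every language, i.e. the real orphans."""
    used = {
        image
        for usage in usage_by_lang.values()
        for image, articles in usage.items()
        if articles
    }
    return sorted(set(images) - used)


# Hypothetical shared upload space: usage_by_lang maps
# lang -> {file name: [articles using it]}.
shared_images = ["Europe map.png", "Horse.jpg", "Unused.png"]
usage_by_lang = {
    "en": {"Horse.jpg": ["Horse"]},
    "de": {"Europe map.png": ["Europa"]},
}

# Seen from en alone, the map looks orphaned although de uses it.
print(orphans_single_language(shared_images, usage_by_lang, "en"))
print(orphans_all_languages(shared_images, usage_by_lang))
```

The same usage data would also answer the mirror/CD-ROM question above, since it tells a language-specific export which files it actually needs.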

Alternative

  • Keep localized uploads, accessible from the same language with [[image:]], from others with [[lang:image:]] with lang=en, de,...
  • Add a new "virtual language" for uploads that are shared by all languages.

URLs

That's a whole nother can o' worms.

Ultimately there are three sets of URLs:

  • Canonical URLs - the form the software links to itself with
    • Presently, http://{langcode}.wikipedia.org/wiki/Title (see exceptions below)
  • Behind-the-scenes URLs - what the software actually looks at. The /wiki/Title URLs actually turn into these on the server, and things like edit links use this expanded form:
    • http://{langcode}.wikipedia.org/w/wiki.phtml?title=Title&additional=stuff for anything else
  • Convenience URLs - pretty names
    • All redirects from .com to .org
    • Alternate domains like www.wikipedia.de, etc
    • presently, redirect of bare wikipedia.org to English encyclopedia

where present exceptions to the language codes are 'www' instead of 'en' for the English encyclopedia section, and 'meta' for the project meta-discussion, which is officially a combined multilingual space.

Possible changes are:

  • Use en instead of www for English

And/or:

  • Move everything to wikipedia.org or www.wikipedia.org. Canonical URL might turn into:
    • http://wikipedia.org/{langcode}/Title
  • With full language integration, the above would probably become something like this behind the scenes:
    • http://wikipedia.org/w/wiki.phtml?lang={langcode}&title=Title

For example, one might see:

in place of:

I like this personally as it looks cleaner, dumping the redundant 'wiki' from the URL, and puts the "Wikipedia" project name in front of the language, emphasizing that it's a single project which covers many languages. (All old URLs can and would be preserved as redirects; could be considered convenience URLs.)
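
To show how old URLs could be kept alive as redirects, here is a minimal Python sketch that maps a present canonical URL onto the proposed http://wikipedia.org/{langcode}/Title form, including the 'www' to 'en' exception. The function name and the example titles are hypothetical, and how 'meta' should be treated under the new scheme is left open; the sketch simply passes it through like a language code.

```python
from urllib.parse import urlsplit

OLD_HOST_SUFFIX = ".wikipedia.org"
OLD_PATH_PREFIX = "/wiki/"


def new_canonical_url(old_url):
    """Translate http://{langcode}.wikipedia.org/wiki/Title to the proposed
    http://wikipedia.org/{langcode}/Title, or return None if the URL does
    not match the present canonical pattern."""
    parts = urlsplit(old_url)
    host = parts.hostname or ""
    if not host.endswith(OLD_HOST_SUFFIX):
        return None
    if not parts.path.startswith(OLD_PATH_PREFIX):
        return None

    lang = host[: -len(OLD_HOST_SUFFIX)]
    title = parts.path[len(OLD_PATH_PREFIX):]
    if not lang or "." in lang or not title:
        return None

    # Present exception: the English encyclopedia lives on 'www'.
    if lang == "www":
        lang = "en"
    return f"http://wikipedia.org/{lang}/{title}"


# "Foo" and "Berlin" are placeholder titles for illustration only.
print(new_canonical_url("http://www.wikipedia.org/wiki/Foo"))
print(new_canonical_url("http://de.wikipedia.org/wiki/Berlin"))
print(new_canonical_url("http://wikipedia.org/"))
```

URLs that don't fit the pattern return None here, so the bare-domain question stays a separate decision.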

As far as the frightful question of what to do with the main bare URL, with or without 'www', see What to do with www.wikipedia.org.

Comments

There is currently a site (http://www.logilogi.org) that implements multilanguage support along some of the lines discussed here. However, help is needed there to make it run faster.

An interlanguage link feature for images would be a nice idea, in my opinion.

  • Duplication of images would be avoided
  • The editors/users in one language could rename the image while it still links to the same original file
  • They can choose between using a local version of an image or cross-linking to an image in another language
  • If somebody updates the source image, all other Wikis that use that image will also stay updated
  • Any violation of copyright can be swiftly dealt with at the Wiki where the source image resides, without having to examine each Wiki that uses the "bad" image separately.

In short, something similar to what we currently have with the interlanguage linking of articles.

-- Vyasa 11:14, 13 Nov 2003 (UTC)



Old comments, resolved issues

"Increased chance of naming conflicts?" for images -- we really need to lay down the law on this one -- far too many people upload images named "Washington1.jpg" -- is it the state, the president, the city, the monument, or the face on Mount Rushmore? In this day and age of 256-letter names, we can afford to go crazy!

I think I'm the poster child for long descriptive names -- en:Image:California map showing Contra Costa County.png etc.

I think that there should be separate language areas for image uploads (such as maps which are labeled in a particular language), as well as one place (meta?) where language-independent images (such as horses and cheetahs) can be uploaded. It should be possible to include an image from any language, as well as link to a page in another language in the body of the article. -phma

That would be a good solution. Alternatively, have a convention that any image with labelling text has a prefix or suffix of the language code, e.g. "en-map of europe.png" or "map of europe-en.png". Of course we'd need to bludgeon people into following conventions like that. But with two upload spaces we'd need to keep checking too, since the software can't tell if an image has text or not. -- tarquin
Personally I don't see much point to that... using descriptive names gives you natural disambiguators; you'll have "map of europe.png", "carte d'europe.png", "yooropa no chizu.png", etc. --Brion VIBBER
Yes, but this isn't very future-proof. If we needed to split them up into separate language areas at a later date then this would have to be done manually. Also it won't necessarily always be that clear (particularly if short names are still being used). -- HappyDog

My opinion is that user accounts should be integrated, some better interwiki links management facilities should be available, and that's about as much integration as I'd want to see. Putting everything into one database shouldn't be even thought of as long as we use MySQL. There is no need to change anything with images right now imho.

And one more thing - URLs need to be changed to www.wikipedia.<langcode> as local search engines don't index different sites by default (this is the case at least in Poland, and it has something to do with licensing of web engines afaik). Taw 15:23 Dec 7, 2002 (UTC)

Those aren't language codes at the end, they're country codes. This is why we can't make that switch in general: languages and countries are different. Even national languages don't generally have the same codes as their countries. That doesn't mean that http://www.wikipedia.pl/ (or whatever it is) can't be used as a convenience redirect, however. -- Toby Bartels 14:15 Dec 11, 2002 (UTC)

This maybe doesn't exactly belong on this page, but...
Some people complain that the list of languages is taking too much room at the top of the homepage. Hopefully, one day, it will also take a *lot* of room at the top of numerous pages! Would it be possible to make the display of languages a user option? By default, all languages would be shown. However, a registered user could choose from the list the languages he wishes to display. Or, alternatively, some languages could be displayed as they are right now, and others could be made available in an easily accessible list (so a user could know there is an equivalent page in a given language, but not have his screen full of international links).
What do you think? anthere

It would be nice to have an option to display the language links as "long" (like now) or "short", e.g. only as 2-letter language codes.

I think it's a great idea that should be implemented in the future, probably when the encyclopedia databases are integrated into one. If I'm reading the encyclopedia, I personally gain no advantage from articles in Dutch, German, Japanese... -- Stephen Gilbert 00:47 Dec 24, 2002 (UTC)