Botopedia/Chatlog 11 September 2005

Session Start: Mon Sep 12 01:43:27 2005
Session Ident: #wikidatadiscussion
[01:43] * Now talking in #wikidatadiscussion
[01:43] * kornbluth.freenode.net sets mode: +ns
[01:43] * waerth has joined #wikidatadiscussion
[01:43] * Ucucha has joined #wikidatadiscussion
[01:43] * CyeZ sets mode: +oo Ucucha waerth
[01:43] <Ucucha> hi :)
[01:43] <CyeZ> hi hi
[01:44] * daniel-fpc has joined #wikidatadiscussion
[01:44] <CyeZ> I'll hang around here so I can read back what was discussed. But first I'm going to get some sleep.
[01:44] <CyeZ> it's already almost 2 a.m. here
[01:44] <waerth> sleep well CyeZ
[01:44] <Ucucha> good night CyeZ
[01:44] <CyeZ> night
[01:44] * CyeZ is now known as CyeZzZz
[01:45] * Ausir has joined #wikidatadiscussion
[01:45] <Ucucha> hi Ausir
[01:45] * waerth goes to study the toilet for 10 minutes
[01:45] * Datrio has joined #wikidatadiscussion
[01:45] <Ucucha> hi Datrio
[01:45] * tsca has joined #wikidatadiscussion
[01:45] <Datrio> hey, hey
[01:46] * elian has joined #wikidatadiscussion
[01:46] * TOR_CNR has joined #wikidatadiscussion
[01:46] * yannf has joined #wikidatadiscussion
[01:47] <yannf> hi all
[01:47] <Ausir> hi
[01:47] <tsca> hi
[01:47] <Ucucha> hi tsca, elian, TOR_CNR, yannf :)
[01:47] <yannf> hi Ucucha 
[01:47] <TOR_CNR> hello
[01:48] <yannf> Ucucha, where are you from ? which project are you working on ?
[01:48] <Ucucha> nl.wikipedia and wikispecies a bit
[01:48] <yannf> ok
[01:48] <Ucucha> I think wikidata will be interesting for taxonomic information
[01:48] <yannf> yes, sure
[01:49] <TOR_CNR> isn't wikispecies doing that?
[01:49] <Ausir> well, it's not actually a discussion about wikidata
[01:50] <Ausir> (the Wikidata, the new software project)
[01:50] <elian> can someone tell us what it's about then?
[01:50] <Ausir> the channel name isn't very fortunate - it's about an international project for generating articles about towns of the world in wikipedias from statistical data
[01:51] <Ausir> many wikipedias already do it, but it's not very coordinated and they rarely share the data
[01:51] <Ausir> well, I suppose it will eventually be integrated into wikidata
[01:51] <Ausir> and just taken from the common database by all wikipedias
[01:51] <waerth> I know the name might not be optimally chosen
[01:51] <waerth> it is what I came up with in a flash
[01:51] <waerth> sorry
[01:51] <Ausir> #wikitowns , maybe?
[01:52] <waerth> want to move everyone there?
[01:52] <Ausir> well, not really
[01:52] <Ausir> since we're already here anyway
[01:52] <tsca> who cares what the channel is called
[01:52] <Ausir> tsca: the people who thought it's about wikidata :)
[01:52] <tsca> oh
[01:53] * waerth changes topic to 'the name of this channel is unfortunately choosen. this is a discussion about an international proct for generating articles about towns of the world in wikipedias from statistical data'
[01:54] * waerth changes topic to 'The name of this channel is unfortunately choosen, the discussion is not about the wikidata project. This is a discussion about an international proct for generating articles about towns of the world in wikipedias from statistical data'
[01:54] * waerth changes topic to 'The name of this channel is unfortunately choosen, the discussion is not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[01:54] <tsca> is this a one-time conference or is this channel here to stay?
[01:54] <waerth> that should be correct
[01:54] <waerth> one time channel
[01:54] <waerth> as far as i am concerned
[01:54] <waerth> others might feel differently ;)
[01:55] <Ausir> well, this channel could be useful in the future as well
[01:55] <waerth> yes
[01:56] * WikiWichtel has joined #wikidatadiscussion
[01:56] <waerth> but I am afraid there is a naming convention rule for wikipedia channels ausir
[01:56] <waerth> at least someone told me that once
[01:56] <Ausir> waerth: towns.wikipedia ?
[01:56] * WikiWichtel has left #wikidatadiscussion
[01:57] <waerth> something like that ausir
[01:57] <tsca> wikicities :-)
[01:57] <waerth> no ;)
[01:57] <waerth> it is a shame dannyisme isn't here yet
[02:00] <tsca> who chairs this meeting?
[02:00] <waerth> well I started it
[02:01] <waerth> so if no-one objects ;)
[02:01] <waerth> I have it one minute before one here
[02:02] <waerth> which makes it almost 20.00 cet
[02:02] <waerth> so there are a few issues in my opinion
[02:02] <waerth> where to get the data from
[02:02] <waerth> what data to consider reliable
[02:03] <Ausir> waerth: mostly data from government websites
[02:03] <waerth> where to store it in the projects (meta?) until a software solution is found
[02:03] <waerth> yes ausir let's start on the first point ;)
[02:03] <Ausir> Polish Regioset website is not made by the government, IIRC, but also quite reliable, though
[02:04] <waerth> ok
[02:04] * Anthere has joined #wikidatadiscussion
[02:04] <Ausir> waerth: well, we could always send the data to commons in open office file format, since commons already accepts those
[02:04] <Ucucha> hi Anthere
[02:04] <waerth> yes 
[02:04] <Ausir> hi Anthere
[02:05] <waerth> it should be in a file format that is easily readable by robots
[02:05] <waerth> as they would mainly use it
[02:05] <elian> XML
[02:05] <elian> ?
[02:05] <Ausir> well, just a text file, probably, but with sxv extension, so that it's uploadable to commons
[02:05] <waerth> you mean with ; ?
[02:05] <elian> why not put the data on commons?
[02:05] <elian> and let the bots gather the data and create the articles
[02:05] <waerth> comma separated files right ?
[02:05] <Ausir> it'd be good if we made a standardized format for those files
[02:06] <elian> seems better for updates
[02:06] <tsca> why upload when you can give links to where the data is on the Web?
[02:06] <waerth> because we would want the data with us
[02:06] <Ausir> tsca: but it's not always available in English, or even available at all
[02:06] <tsca> the collection of data can be (c), can't it?
[02:06] <Ausir> tsca: and the files could be compiled from various sources
[02:06] <Ausir> true
[02:06] <waerth> they are if you implement it as is
[02:07] <waerth> if you take it and put it into an article it is not
[02:06] <waerth> these are plain facts
[02:07] <waerth> otherwise all our country/municipality etc articles are copyrightviolations
[02:08] <tsca> so, no Commons.
[02:08] <tsca> just a collection of links
[02:08] <waerth> anyway I wanted to start the discussion with what kind of data do we want to use ?
[02:08] <Ausir> tsca: well, sometimes it's a pain in the ass to download and parse the files info into a decent format
[02:09] <Ucucha> should we only discuss towns here or also other data?
[02:09] <waerth> for some countries like the netherlands data from all villages up to the smallest ones is available online for free
[02:09] <Ausir> so if we had them already in a format that can be used easily by the bots, it'd be easier
[02:09] <waerth> for other countries you have to pay
[02:09] <Anthere> hi
[02:09] <Ausir> Ucucha: like what kind of data?
[02:09] <Ausir> waerth: well, it depends on the country
[02:09] <Ucucha> biological
[02:09] <Ucucha> there are databases about insects, for example
[02:09] <tsca> yeah sure, we can just as well generate xml files, aside from generating the articles
[02:09] <Ucucha> maybe about planetoids or so
[02:10] <Ausir> Ucucha: true
[02:10] <Ausir> we'll be generating articles about all Polish MPs out of the data from the parliament website at pl: :)
[02:10] <waerth> ok but lets focus on one area for now
[02:10] <waerth> otherwise it gets too splintered
[02:10] <Ausir> but let's focus on the towns first
[02:11] <waerth> I feel that at meta we could start a page where we collect links per country
[02:11] <Ausir> yeah
[02:11] <waerth> where we can find the official data
[02:11] <waerth> with the emphasis on official
[02:11] * dittaeva has joined #wikidatadiscussion
[02:11] <Ucucha> hi dittaeva
[02:12] <waerth> and we would put notes next to it per country
[02:12] <Ausir> waerth: or from other reliable institutions, not necessarily government
[02:12] <dittaeva> hi
[02:12] <waerth> ausir I am reluctant to take other sources
[02:12] <waerth> because you would need a good verification where they got their data from
[02:12] <dittaeva> are we early (I'm missing Eloquence)?
[02:13] <dittaeva> is there a log?
[02:13] * dittaeva changes topic to 'The discussion is not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[02:14] <Ucucha> dittaeva: I'm logging now
[02:14] <tsca> I can publish the log later
[02:14] * dittaeva changes topic to 'This channel is currently not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[02:14] <tsca> ok
[02:15] <dittaeva> yeah, did I miss much?
[02:15] <Ausir> waerth: well, Regioset is a Polish database of regional information, and it's not official, but rather reliable - anyway, we should have references for every bit of data exactly to what source it comes from
[02:18] * lode has joined #wikidatadiscussion
[02:19] <tsca> we need some focus in this discussion, it's going nowhere :-)
[02:19] <Ausir> and their database is based on official sources anyway, it's just those sources are not available on-line... so I'd say official sources should be preferred
[02:19] <Ausir> but not necessarily only official - just verifiable
[02:21] <dittaeva> is there a page on meta for the project?
[02:21] <Ausir> where's dannyisme? :(
[02:21] <Ausir> dittaeva: we're going to create one
[02:21] * gpvos has joined #wikidatadiscussion
[02:22] <Ausir> we're just thinking now about what should the project be like...
[02:22] <tsca> my suggestion is:
[02:23] <tsca> bot operators who create series of articles, create xml files at the same time (for other bot operators) and put the files online
[02:23] <tsca> then the files are announced on some ml.
[02:23] <tsca> that is all...
[02:23] <Ausir> tsca: or the meta project page
[02:24] <Anthere> here is a message from waerth
[02:24] <Anthere> though he appears to be there, he is not
[02:24] <Anthere> he was disconnected
[02:25] <tsca> let's agree on the title of the page so that we can add it to our watchlists
[02:25] <Anthere> he phoned his internet company and they said they were doing maintenance tonight
[02:25] <Anthere> he just called me to tell me
[02:25] <Ucucha> that's a big pity :(
[02:25] <Anthere> so, he apologizes very much, but he won't be with you
[02:25] <Anthere> could someone at least log the discussion for him ?
[02:25] <Ausir> well, discussing it is sort of pointless anyway
[02:26] <Ucucha> he was just announced to be the chair
[02:26] <Ausir> someone should {{be bold}} and just create the project page:)
[02:26] <Ucucha> someone willing to be the new chair?
[02:26] <Ausir> and then we'll discuss it and improve it
[02:27] <elian> Wikitowns?
[02:27] <Datrio> Wikibots
[02:27] <elian> no
[02:27] <Ausir> no
[02:27] <elian> bots are just the tools
[02:27] <Ucucha> not wikitowns
[02:27] <Datrio> well, it would be the best to eventually get the databases for everything
[02:27] <Datrio> not only towns
[02:27] <Ucucha> we shouldn't limit it to towns
[02:27] <Datrio> so not wikitowns
[02:27] <Ausir> Bot-generated articles
[02:27] <Datrio> heh
[02:27] <Datrio> Botopedia
[02:27] <elian> Botbase?
[02:27] <Ucucha> yeah
[02:27] <elian> Botopedia
[02:27] <elian> *lol*
[02:28] <Datrio> "A free encyclopedia that YOU can edit. If you're a bot, that is."
[02:28] <elian> my boyfriend wanted a LanguageBot.php for his bot, but brion refused
[02:28] <Ausir> [[Database sharing project]]
[02:29] <elian> Ausir: too boring
[02:29] <Ausir> elian: but more descriptive :)
[02:29] <Ucucha> Botbase seems a good idea :)
[02:29] <Ausir> and anti-botters are going to protest anyway :)
[02:30] <Ausir> just like they did ever since rambot doubled en: overnight
[02:30] <tsca> call it [[asdrfgu3a,c a45:"qad]], a good bot name, and create some redirs
[02:31] <lode> when it is on a different db/wp you lose the fact that people who see those stubs edit them into something more
[02:31] <tsca> OK, what can you expect on that page:
[02:31] <tsca> 1. ~8000 Italian municipalities
[02:31] <Datrio> [[$bot->new("Pedia")]]
[02:31] <Ausir> a list of all such projects
[02:32] <Ausir> in all wikipedias
[02:32] <tsca> 2. Swedish/Norwegian/Danish municipalities
[02:32] <Ausir> who to contact etc.
[02:32] <tsca> 3. French (?towns)
[02:32] <Ausir> yeah, towns 
[02:32] <tsca> what else?
[02:32] <Ausir> and villages
[02:32] <Ucucha> 4. Fish
[02:32] <Ausir> tsca: Rambot
[02:32] <Ucucha> from FishBase, possibly
[02:32] <tsca> Ucucha: Fish are problematic
[02:32] * waerth has quit IRC (No route to host)
[02:32] <tsca> they have local names
[02:33] <Ucucha> yes, but many just have scientific names ;)
[02:33] <tsca> unless one finds a digital dictionary/glossary...
[02:33] <Ausir> tsca: and of course, Polish towns/communes/counties/voivodships
[02:33] * Kristof has joined #wikidatadiscussion
[02:33] <Ucucha> hi Kristof
[02:33] <Kristof> hi all
[02:33] <Datrio> don't forget Pokemons from en
[02:33] <Ucucha> tsca: probably, most species don't have common names at all
[02:33] <Ausir> and Czech districts
[02:34] <Ausir> Ucucha: but you can never be sure, at least for many languages
[02:34] <Ausir> Ucucha: towns are easier
[02:35] <Ucucha> they are
[02:35] <Ucucha> but we don't have to do easy things only
[02:35] <Ausir> but your other proposal, asteroids, was interesting :)
[02:35] <Ausir> maybe we could find a good NASA database for generating those?
[02:39] <Ucucha> http://nssdc.gsfc.nasa.gov/planetary/factsheet/asteroidfact.html
[02:39] <Ucucha> a few
[02:40] * henna has joined #wikidatadiscussion
[02:42] * lode has left #wikidatadiscussion
[02:42] <yannf> i think that, quite often, the data need processing before being useable by bots
[02:43] <yannf> so it's better doing that only once
[02:43] <yannf> not for every language
[02:44] <tsca> sure
[02:44] <tsca> we should work out some standard format
[02:44] <yannf> yes, that's an important part of this project
[02:46] * GerardM has joined #wikidatadiscussion
[02:47] <yannf> i can help working on some conversion tools
[02:48] <yannf> and defining a common format
[02:48] * TOR_CNR has left #wikidatadiscussion
[02:49] * henna is now known as hennaNoInternet
[02:49] <tsca> yes, please publish the format proposal on meta so that we can discuss it 
[02:51] <yannf> actually, i am not a bot expert
[02:53] <yannf> but formatting data to a defined format interests me
[02:54] <yannf> i think something like CSV would be appropriate
[02:54] <yannf> csv = comma separated values
[02:57] <tsca> either this or xml
[02:57] <tsca> but field names must be standardised
[02:58] <tsca> so that we don't need to modify our bots all the time
[02:58] <dittaeva> csv would perhaps be easier for people to get/make
[02:59] <dittaeva> but we are planning to put the data on commons once wikidata is ready and works for the purpose, right?
[03:00] <yannf> i would propose upload the files to commons
[03:00] <tsca> why not meta?
[03:01] <yannf> either is ok for me
[03:01] <tsca> it's just like the logfiles for interwiki bot operators; no need to put them on commons
[03:02] <yannf> ok
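Editorial sketch, not part of the log: the exchange above settles on a shared CSV (or XML) file with standardised field names, so that bot operators do not have to adapt their bots to each data set. A minimal illustration of that idea in Python follows. The field names (`name`, `country`, `population`, `source`), the sample row, and the helper functions are all hypothetical; the participants did not define any format in this session.

```python
import csv
import io

# Hypothetical standard field names; the log does not define any.
REQUIRED_FIELDS = ("name", "country", "population", "source")

# Made-up sample data in the assumed shared format.
SAMPLE = """name,country,population,source
Exampletown,Exampleland,1234,https://example.org/stats
"""


def read_records(text):
    """Parse CSV text and check that every agreed field name is present."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [field for field in REQUIRED_FIELDS if field not in header]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return list(reader)


def make_stub(record):
    """Render one record as a one-sentence stub that cites its source."""
    return (
        "{name} is a town in {country} with {population} inhabitants. "
        "Source: {source}"
    ).format(**record)


records = read_records(SAMPLE)
stubs = [make_stub(record) for record in records]
```

Because the header is validated once, a bot for any language edition would only need its own sentence template in `make_stub`; the parsing step stays the same.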
[03:05] <yannf> so first we need a list of sources
[03:05] <tsca> yeah
[03:06] <yannf> waerth already made one
[03:07] <dittaeva> so where is it?
[03:07] <dittaeva> :-)
[03:07] <yannf> http://nl.wikipedia.org/wiki/Gebruiker:Waerth/Handigelinksvooriedereen
[03:15] * tsca is now known as tsca_away
[03:33] * Ucucha has quit IRC ("Chatzilla 0.9.68.5 [Firefox 1.0.6/20050717]")
[04:05] * Kristof has quit IRC (Read error: 110 (Connection timed out))
[04:06] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[04:06] * dittaeva has joined #wikidatadiscussion
[04:07] <dittaeva> http://meta.wikimedia.org/wiki/Botopedia
[04:07] * dittaeva changes topic to 'This channel is currently not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data. http://meta.wikimedia.org/wiki/Botopedia'
[04:29] * dittaev1 has joined #wikidatadiscussion
[04:32] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[04:44] * gpvos has left #wikidatadiscussion
[04:53] * tsca_away is now known as tsca
[05:04] * Datrio has quit IRC
[05:28] * dittaev1 is now known as dittaeva
[05:32] <Ausir> dittaeva: I edited it a bit
[05:32] <Ausir> dittaeva: I made a list of existing projects like that
[05:33] * tsca has quit IRC (Read error: 110 (Connection timed out))
[05:34] <dittaeva> nice
[05:37] <dittaeva> Ausir: you're the one who has been operating the bot at the polish wikipedia? Where did you get the norwegian communes data?
[05:38] <Ausir> tsca is operating the bot
[05:38] <Ausir> I'm mostly working on the data - but tsca got the scandinavian data himself
[05:38] <dittaeva> ok, too bad he went off zzzzz (I suppose)
[05:38] <Ausir> from the norwegian authorities website
[05:39] <Ausir> some statistical office
[05:39] <Ausir> from here: http://www.ssb.no/ and here: http://www.kommunenokkelen.no/
[05:39] <Ausir> see: http://pl.wikipedia.org/wiki/Finn%C3%B8y
[05:40] <Ausir> a sample article
[05:40] <dittaeva> thanks!
[05:40] <Ausir> one has statistics about population etc.
[05:41] <Ausir> and the other has addresses and stuff like names of the mayors etc.
[05:41] <dittaeva> cool
[05:41] <dittaeva> real cool
[05:41] <dittaeva> awesome
[05:41] <dittaeva> :-)
[05:42] <Ausir> we now have more data on Norwegian towns than no: :P
[05:42] <dittaeva> you should probably have appended them all with the equivalent of "kommune" in polish cause most names also refer to things that are not kommunes
[05:43] <Ausir> like what?
[05:43] <dittaeva> and there's a lot more at ssb.no
[05:43] <Ausir> well, the equivalent of kommune in Polish is gmina
[05:43] <Ausir> dittaeva: what else is there?
[05:44] <Ausir> oh, I can see there's a lot of it there
[05:44] <Ausir> although I don't think much of it is needed in a wikipedia article
[05:46] <dittaeva> yeah I just looked at it, you'd probably have to work a lot with the framework to make articles from the rest of the data
[05:46] <dittaeva> f. ex. [[pl:Leikanger]] is not only a kommune it is also a "township"
[05:47] <dittaeva> and Vik is a kommune and a lot of other places
[05:48] <Ausir> well, at pl: it says in Polish "Leikanger is a town and a commune in Norway..."
[05:48] <dittaeva> ok, :-)
[05:49] <Ausir> although if we had enough content or stats for the towns themselves, we could have separate articles for "Leikanger commune" and "Leikanger", I suppose
[05:50] <Ausir> like we have for Polish towns and communes
[05:50] <dittaeva> hm, it might be in there somewhere
[05:50] <dittaeva> at least there is inhabitant statistics for them, but I suppose you need something more
[05:51] <Ausir> well, we don't need to separate it for now :)
[05:52] <dittaeva> does tsca have the same username on wikipedia too, should I contact him on pl.?
[05:53] <Ausir> yes
[05:54] <Ausir> he's also active at sv: and da:
[05:54] <Ausir> he lives in Denmark
[05:55] <dittaeva> cool
[05:59] <Ausir> I suppose it's fairly easy to translate articles from one Scandinavian Wikipedia to another?
[06:00] <dittaeva> yes, it's regularly done
[06:04] <Ausir> tsca is online at #pl.wikipedia
[06:20] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[07:00] * GerardM has quit IRC ("Chatzilla 0.9.68a [Firefox 1.0.6/20050716]")
[07:00] * Ausir has quit IRC (Read error: 104 (Connection reset by peer))
[07:00] * Ausir has joined #wikidatadiscussion