Grants:Project/Rapid/Akeron/Wikiscan/Report

Report accepted
This report for a Rapid Grant approved in FY 2019-20 has been reviewed and accepted by the Wikimedia Foundation.
  • To read the approved grant submission describing the plan for this project, please visit Grants:Project/Rapid/Akeron/Wikiscan.
  • You may still comment on this report on its discussion page, or visit the discussion page to read the discussion about this report.
  • You are welcome to email rapidgrants at wikimedia dot org at any time if you have questions or concerns about this report.

Note: this text is essentially an automatic translation of the French version, which is the reference.

Goals

Did you meet your goals? Are you happy with how the project went?

1) Migrate wikiscan.org to new servers

Task | Result | Comments
Receiving and checking new servers | OK | One server had a strange CPU power reduction problem that took a long time to diagnose, but it was eventually fixed.
HTTPS support for all subdomains (wildcard) | OK |
Setting up and configuring servers | OK | I saved time configuring the new servers by adapting scripts that I use for other servers. I also took the opportunity to update the system of the old server.
Ensure compatibility with PHP 7 | OK | Notably replacing list() = each(), count(null)...
Adapt the Wikiscan code to distribute wikis to two external databases | OK |
Move databases to new servers | OK |
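
The report does not say how wikis are assigned to the two external databases. As an illustration only, one simple approach is a deterministic mapping from the wiki name to a host; the host names and the `db_host_for` function below are hypothetical and are not taken from the Wikiscan code.

```python
import zlib

# Hypothetical host names: the report only states that wikis are
# distributed across two external databases.
DB_HOSTS = ["db1.example.org", "db2.example.org"]


def db_host_for(wiki: str, hosts=DB_HOSTS) -> str:
    """Map a wiki name to one database host, deterministically."""
    # crc32 is stable across runs, unlike Python's built-in hash() on strings.
    return hosts[zlib.crc32(wiki.encode("utf-8")) % len(hosts)]
```

A fixed mapping like this keeps every update of a given wiki on the same server; an explicit per-wiki table would serve equally well and makes it easier to move a wiki by hand.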

2) Include all Wikimedia wikis and make the necessary optimizations

Task | Result | Comments
Optimize statistical queries of small wikis to minimize the number of queries | OK | Queries against the Tools replicas are quite slow, so for small wikis the data are now read month by month instead of day by day. The time for a complete update of a small wiki has decreased from about 45 minutes to 3-9 minutes. I have extended this system to larger wikis, namely all those whose total revisions and logs are less than 750,000 (last column with day or month visible here). Dropping the per-day data required further adaptations and introduced a bug that has not been fixed: the monthly user graph is missing from the calendar menu.
Optimize the servers that will have to manage more than 800 sites in parallel | OK | I have given priority to speed over data safety because the data are not critical and are regularly recalculated. For example, MySQL uses innodb_flush_log_at_trx_commit = 2 and transaction_isolation = READ-COMMITTED.
Add all Wikimedia wikis not currently included | OK | The total number of wikis has increased from 381 to 896.
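
The day-versus-month choice described in the first row can be sketched as follows. Only the 750,000 threshold comes from the report; the function names and the window format are assumptions made for illustration and do not reflect the actual Wikiscan implementation.

```python
from datetime import date

# Threshold from the report: wikis whose revisions plus log entries total
# less than 750,000 are read month by month instead of day by day.
MONTHLY_THRESHOLD = 750_000


def granularity(total_revisions: int, total_logs: int) -> str:
    """Return "month" for wikis under the threshold, otherwise "day"."""
    return "month" if total_revisions + total_logs < MONTHLY_THRESHOLD else "day"


def month_windows(start: date, end: date):
    """Yield (first_day, first_day_of_next_month) pairs covering start..end."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield date(year, month, 1), date(next_year, next_month, 1)
        year, month = next_year, next_month
```

Each window would correspond to one query on the replica, so a year of history costs 12 queries rather than about 365.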

3) Adapt and optimize the code to the recent restructuring of the MediaWiki tables

Task | Result | Comments
Support for new comment and actor tables | OK |
Optimization of requests, especially for large wikis | Partial | This problem is more complex than expected: some updates fail when Tools replication is slow. This mainly concerns very large wikis, but also smaller wikis when bot activity is high. Automatically adjusting the size of queries to the number of rows to read would take several restructurings. For now I have disabled the comment table joins for wikis that often remained blocked, and some updates are less frequent.
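
The automatic adjustment of query size mentioned above was not implemented in this project. A minimal sketch of the idea, assuming the updater measures how many rows each query read and how long it took, could look like this; the function name, the 30-second target and the size limits are all hypothetical.

```python
def next_batch_size(current: int, rows_read: int, elapsed_s: float,
                    target_s: float = 30.0,
                    min_size: int = 1_000, max_size: int = 500_000) -> int:
    """Scale the next query's row budget so it should take about target_s."""
    if rows_read <= 0 or elapsed_s <= 0:
        # Nothing was measured (for example, the query failed): halve to be safe.
        return max(min_size, current // 2)
    rows_per_second = rows_read / elapsed_s
    proposed = int(rows_per_second * target_s)
    # Cap growth at 2x per step so one fast batch cannot trigger a huge query.
    proposed = min(proposed, current * 2)
    return max(min_size, min(max_size, proposed))
```

Because the budget follows the measured throughput rather than the overall size of the wiki, a small wiki with a burst of bot edits would be slowed down in the same way as a very large one.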

Outcome

Please report on your original project targets.


Target outcome | Achieved outcome | Explanation
Wikis supported from 381 to 800+ | 896 supported wikis | All public wikis have been added; the list is visible from this page [1]. Wikis with fewer than 100,000 edits were not included before; they correspond to size 1 at the end of the table.


Learning

Projects do not always go according to plan. Sharing what you learned can help you and others plan similar projects in the future. Help the movement learn from your experience by answering the following questions:

  • What worked well?
    Reusing and adapting personal scripts to configure servers was very useful. It allowed me to configure the two new servers in the same way, and I was able to reuse the scripts on the old server once its system update was done. The scripts use simple SSH and rsync commands; I no longer use Puppet.
  • What did not work so well?
    Changing the calculation mode from day to month for small wikis required other adaptations, and a bug remains unsolved in the calendar menu with the monthly user graph.
    The slowness of queries on the Wikimedia replicas will require more development to build a system that can adapt to the situation. The overall size of a wiki is not a sufficient indicator: small wikis sometimes receive a very large influx of automated contributions, and I have the impression that queries are even slower in this case.
    The project/grant management took me longer than expected but I was able to save a little time on the installation of the servers.
  • What would you do differently next time?
    Allow more time for project management.

Finances

Grant funds spent

Please describe how much grant money you spent for approved expenses, and tell us what you spent it on.

Total spent: 1780 €

Description | Scheduled time | Hours used
1) Migrating wikiscan.org to new servers | 21 h | 18 h
2) Include all Wikimedia wikis | 10 h | 9.5 h
3) Adapt and optimize the code to MediaWiki restructuring | 10 h | 11 h
4) Project management | 7 h | 15.5 h
Total | 48 h | 54 h

Remaining funds

Do you have any remaining grant funds?

No.