Grants:Project/Hydriz/Balchivist 2.0/Final



Welcome to this project's final report! This report shares the outcomes, impact and learnings from the grantee's project.

Part 1: The Project

Summary

The new Balchivist 2.0 application has been developed with the core features that allow users to browse and download Wikimedia datasets. However, the project was not fully completed, as some proposed features (such as the watchlist feature) have not yet been implemented. Development of the application will continue after the grant ends.

Project Goals

The goals of the project as described in the original proposal are as follows:

The main objective of this proposed project is to improve the searchability of the datasets published by Wikimedia, so that it is easier for new and existing users to find and download Wikimedia datasets. Wikimedia users will also be able to benefit from an improved API for working with the datasets compared to the existing system which is only limited to the database dumps, overall streamlining the process of using Wikimedia datasets. Finally, improved reliability in the archiving infrastructure provides researchers with the assurance that the datasets will be archived timely and in full.

The project has not deviated from the original aim, which is to allow Wikimedia datasets to be easily searchable. However, due to the large scope of the project and changing circumstances, this aim was only partially achieved. Through Balchivist 2.0, Wikimedia datasets are now easily browsable within a single portal; search functionality will be added to the system in the future. The system will also progressively support more Wikimedia datasets, as well as OpenStreetMap datasets.

Secondly, the project adopted an "API-first" development philosophy. Hence, the API for working with the Wikimedia datasets is now available as part of Balchivist 2.0. At present, the API supports a basic RESTful interface for getting metadata on the Wikimedia datasets, and it will be progressively enhanced to provide more information that may be useful for users of the datasets.
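As a rough illustration of how a client might consume a RESTful metadata interface of this kind, here is a minimal Python sketch. It is not the actual Balchivist API: the base URL, the route layout and the JSON field names (`name`, `files`) are all assumptions made for the example, and the response body is a made-up sample rather than the result of a live request. The documented routes are in the OpenAPI documentation linked later in this report.

```python
import json
from urllib.parse import quote, urljoin

# Placeholder base URL. The real API base is NOT stated in this report.
API_BASE = "https://example.org/api/"


def dataset_metadata_url(base: str, dataset_type: str, name: str) -> str:
    """Build a REST-style resource URL of the form <base><type>/<name>.

    The route layout is hypothetical and used only for illustration.
    """
    path = f"{quote(dataset_type, safe='')}/{quote(name, safe='')}"
    return urljoin(base, path)


def summarize(payload: str) -> dict:
    """Pull a few fields out of a JSON metadata document.

    The field names "name" and "files" are assumed, not taken from the API.
    """
    data = json.loads(payload)
    return {
        "name": data.get("name"),
        "file_count": len(data.get("files", [])),
    }


# Made-up response body standing in for a real HTTP response.
sample = '{"name": "enwiki", "files": [{"path": "a.xml.bz2"}, {"path": "b.sql.gz"}]}'

url = dataset_metadata_url(API_BASE, "dumps", "enwiki")
summary = summarize(sample)
print(url)
print(summary)
```

Keeping URL construction separate from response parsing means either half can be swapped out once the real routes and fields are known.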

Finally, the goal of improving the reliability of the archiving infrastructure was only partially achieved. It was mostly an internal requirement that the existing Balchivist 1.0 program already meets to some extent, so it was deprioritized during the development of the project.

Overall, the project will continue to be developed beyond the closure of this grant. Many of the features and functionality that were originally intended to be implemented during the project will instead be implemented progressively over the coming months.

Project Impact

Important: The Wikimedia Foundation is no longer collecting Global Metrics for Project Grants. We are currently updating our pages to remove legacy references, but please ignore any that you encounter until we finish.

Targets

  1. In the first column of the table below, please copy and paste the measures you selected to help you evaluate your project's success (see the Project Impact section of your proposal). Please use one row for each measure. If you set a numeric target for the measure, please include the number.
  2. In the second column, describe your project's actual results. If you set a numeric target for the measure, please report numerically in this column. Otherwise, write a brief sentence summarizing your output or outcome for this measure.
  3. In the third column, you have the option to provide further explanation as needed. You may also add additional explanation below this table.
Planned measure of success (include numeric target, if applicable) | Actual result | Explanation
All critical tasks completed | This goal has been achieved. | The critical task of the project was to develop a system that allows users to easily browse Wikimedia datasets. This was achieved, as the final product can now be used by end-users, although the system may remain unstable while further improvements are made.
50% code coverage | This goal was not achieved. | The scope of the project was significantly underestimated at the start, which made it necessary to prioritize other, more important tasks. As the project stabilizes, this goal will be progressively achieved.
Reduction of incidences to <1% | This goal has been achieved. | Based on preliminary tests, the system did not report any issues in the uploading process to the Internet Archive.
Include at least 5 new datasets | This goal was partially achieved. | Currently, only the Wikimedia database dumps and the Wikidata entity dumps are supported in the system. However, it would be trivial to add support for more datasets, as the system has been designed to be easily extensible.


Story

Looking back over your whole project, what did you achieve? Tell us the story of your achievements, your results, your outcomes. Focus on inspiring moments, tough challenges, interesting anecdotes or anything that highlights the outcomes of your project. Imagine that you are sharing with a friend about the achievements that matter most to you in your project.

  • This should not be a list of what you did. You will be asked to provide that later in the Methods and Activities section.
  • Consider your original goals as you write your project's story, but don't let them limit you. Your project may have important outcomes you weren't expecting. Please focus on the impact that you believe matters most.

The project was originally intended to run from June 2021 to December 2021, a period of about 7 months. However, the project was delayed, primarily for two unforeseen reasons:

  1. Personal issues arose during the project, which left less time to commit to finishing it.
  2. The scope of the project was significantly underestimated when crafting the initial proposal.

The scope of the project was underestimated as the original proposal was crafted based on how Balchivist 1.0 was developed, which was just a Python application to archive the datasets. Balchivist 1.0 was developed over 3 months, and when estimating the duration of this project, an additional 3 to 4 months were added for the development of the frontend and backend components of the web application.

Hence, during the development of the project, there was significant difficulty in trying to build the web application with the features that were originally intended. Thus, even with the approved extension of the project, it was decided that only the core features would be built and that the additional features would be introduced progressively after the grant project has concluded. The goal was thus adjusted to building a minimum viable product first, then progressively adding new features to the application while gaining users.

Survey(s)

If you used surveys to evaluate the success of your project, please provide a link(s) in this section, then briefly summarize your survey results in your own words. Include three interesting outputs or outcomes that the survey revealed.

No surveys were conducted for this project.

Other

Is there another way you would prefer to communicate the actual results of your project, as you understand them? You can do that here!

The project source code is hosted on GitHub: https://github.com/balchivist. Additionally, the API is documented using OpenAPI and displayed using Swagger at https://balchivist.github.io/api-docs/.

The project is also tracked on GitHub Projects: https://github.com/orgs/balchivist/projects/1

Methods and activities

Please provide a list of the main methods and activities through which you completed your project.

  • The frontend and backend of the Balchivist 2.0 web application have been developed with the core features and published on GitHub.
  • The API has been documented, and the documentation is publicly viewable.
  • The application is currently available at https://dumpsstaging.wmcloud.org (the URL will change to https://dumps.wmcloud.org once it is ready for final deployment).

Project resources

Please provide links to all public, online documents and other artifacts that you created during the course of this project. Even if you have linked to them elsewhere in this report, this section serves as a centralized archive for everything you created during your project. Examples include: meeting notes, participant lists, photos or graphics uploaded to Wikimedia Commons, template messages sent to participants, wiki pages, social media (Facebook groups, Twitter accounts), datasets, surveys, questionnaires, code repositories... If possible, include a brief summary with each link.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you took enough risks in your project to have learned something really interesting! Think about what recommendations you have for others who may follow in your footsteps, and use the below sections to describe what worked and what didn’t.

What worked well

What did you try that was successful and you'd recommend others do? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

What didn’t work

What did you try that you learned didn't work? What would you think about doing differently in the future? Please list these as short bullet points.

  • The project was too ambitious and included a lot of scope that could not be realistically achieved within the project timeframe. Developing a new project from scratch is hard, and I should have allocated more time and resources to building up the project.
  • Don't strive for perfection right from the start. I regret trying to design the project to cover many different scenarios, which cost a lot of time spent thinking about the best way to structure the project instead of on actual implementation. It would have been easier to build a minimum viable product first, then refine it and implement new features.

Other recommendations

If you have additional recommendations or reflections that don’t fit into the above sections, please list them here.

  • If building a new project from scratch, it would be good to consult a senior engineer to get a better estimate of how long the development should take.

Next steps and opportunities

Are there opportunities for future growth of this project, or new areas you have uncovered in the course of this grant that could be fruitful for more exploration (either by yourself, or others)? What ideas or suggestions do you have for future projects based on the work you’ve completed? Please list these as short bullet points.

  • The project currently only has the basic features implemented. The future improvements and extensions are tracked on the GitHub project linked above and will be progressively worked on in the coming months.

Part 2: The Grant

Finances

Actual spending

Please copy and paste the completed table from your project finances page. Check that you’ve listed the actual expenditures compared with what was originally planned. If there are differences between the planned and actual use of funds, please use the column provided to explain them.

Expense | Approved amount | Actual funds spent | Difference
Work hours | $16,000 | $16,000 | $0
Total | $16,000 | $16,000 | $0


Remaining funds

Do you have any unspent funds from the grant?

Please answer yes or no. If yes, list the amount you did not use and explain why.

  • No

If you have unspent funds, they must be returned to WMF. Please see the instructions for returning unspent funds and indicate here if this is still in progress, or if this is already completed:

  • (not applicable)

Documentation

Did you send documentation of all expenses paid with grant funds to grantsadmin@wikimedia.org, according to the guidelines here?

Please answer yes or no. If no, include an explanation.

  • No, as all the grant funds were for time spent on the project.

Confirmation of project status

Did you comply with the requirements specified by WMF in the grant agreement?

Please answer yes or no.

  • Yes

Is your project completed?

Please answer yes or no.

  • No

Grantee reflection

We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being a grantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the Project Grant experience? Please share it here!

This grant project was a great learning experience for me, especially as this was my first time as a grantee and my first time proposing a project. There were many factors that threatened to derail the project, but it was particularly pleasant to have the support of the grant officers and the Wikimedia Foundation to keep this project going. Although it is unfortunate that some of the features are not yet implemented, the work will continue after the grant ends to achieve the goals originally laid out in the grant proposal.