Grants:Project/Rapid/Hjfocs/soweego 1.1/Report

Report accepted
This report for a Rapid Grant approved in FY 2019-20 has been reviewed and accepted by the Wikimedia Foundation.
  • To read the approved grant submission describing the plan for this project, please visit Grants:Project/Rapid/Hjfocs/soweego 1.1.
  • You may still comment on this report on its discussion page, or read the existing discussion there.
  • You are welcome to email rapidgrants at wikimedia dot org at any time if you have questions or concerns about this report.

Goals

Did you meet your goals? Are you happy with how the project went?

Absolutely yes. The main goal was to produce the highest-quality Wikidata links, in the form of identifier statements. This yielded an improvement in average precision across all target catalogs. The table below compares the performance of the best standalone algorithm with that of the ensemble ones; all values are averaged over the target datasets.

The higher the better.

Algorithm | Precision | Recall | F1
multi-layer perceptron | 0.916 | 0.934 | 0.925
soft voting | 0.919 | 0.930 | 0.924
gated | 0.922 | 0.926 | 0.924
hard voting | 0.914 | 0.934 | 0.923
stacked | 0.923 | 0.924 | 0.923

See chapter 7 of Tupini07's MSc thesis for a detailed explanation.[1]
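
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how such an evaluation could look with scikit-learn. It is an illustration only, not soweego's actual pipeline: the data is synthetic, the hyperparameters are placeholders, and the gated ensemble from the thesis has no direct scikit-learn counterpart, so it is omitted.

  # Minimal sketch (not soweego's pipeline): compare a standalone multi-layer
  # perceptron with soft voting, hard voting, and stacked ensembles, scored
  # with the same precision/recall/F1 metrics reported above.
  from sklearn.datasets import make_classification
  from sklearn.ensemble import StackingClassifier, VotingClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import f1_score, precision_score, recall_score
  from sklearn.model_selection import train_test_split
  from sklearn.neural_network import MLPClassifier
  from sklearn.tree import DecisionTreeClassifier

  # Synthetic stand-in data: in soweego, features come from comparing
  # Wikidata items against target catalog records.
  X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Base classifiers shared by all ensembles.
  base = [
      ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
      ("lr", LogisticRegression(max_iter=1000)),
      ("dt", DecisionTreeClassifier(random_state=0)),
  ]
  models = {
      "multi-layer perceptron": MLPClassifier(max_iter=1000, random_state=0),
      "soft voting": VotingClassifier(base, voting="soft"),
      "hard voting": VotingClassifier(base, voting="hard"),
      "stacked": StackingClassifier(base, final_estimator=LogisticRegression()),
  }

  for name, model in models.items():
      model.fit(X_train, y_train)
      pred = model.predict(X_test)
      print(f"{name}: P={precision_score(y_test, pred):.3f} "
            f"R={recall_score(y_test, pred):.3f} "
            f"F1={f1_score(y_test, pred):.3f}")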

Outcome

Please report on your original project targets.

Target outcome | Achieved outcome | Explanation
Release of soweego version 1.1 | Ready to go | We will announce the release as soon as the last minor comments on pull request #372[2] are addressed
Documentation | Tupini07's MSc thesis | The resource is publicly available[1]
Developer engagement | 5 forks,[3] feedback raised by third parties,[4] 60 stars[5] | We managed to attract more potential contributors


Learning

Projects do not always go according to plan. Sharing what you learned can help you and others plan similar projects in the future. Help the movement learn from your experience by answering the following questions:

  • What worked well?
The project scope and time span were small, which allowed us to focus effectively on specific activities.
  • What did not work so well?
The proposal involved experimental research, so there was a risk that the results would fall short of our forecast. Indeed, we were expecting slightly better performance.
  • What would you do differently next time?
Perhaps focus on more low-hanging fruit, rather than pursue further experiments.

Finances

Grant funds spent

Please describe how much grant money you spent for approved expenses, and tell us what you spent it on.

We spent the whole budget on work time. The task breakdown follows.

Task | Timeline
Make the single-layer perceptron (SLP) and multi-layer perceptron (MLP) compatible with the current hyperparameter grid search | September 2
Add decision trees as a classifier | September 2
Explore different ways to ensemble the current classifiers | September 13
Super-confident predictions (see the sketch after this table) | September 25
Add logistic regression as a classifier | September 30
Evaluate the performance of ensemble methods | October 1
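
The "super-confident predictions" task refers to keeping only the candidate links whose classifier confidence clears a high threshold. The sketch below illustrates the idea; the function name and threshold value are assumptions for illustration, not soweego's actual code.

  # Hypothetical illustration of the "super-confident predictions" task.
  import numpy as np

  def super_confident(probabilities: np.ndarray,
                      threshold: float = 0.99) -> np.ndarray:
      """Return the indices of predictions at or above the threshold."""
      return np.flatnonzero(probabilities >= threshold)

  # Example: only the 2nd and 4th predictions are confident enough.
  scores = np.array([0.42, 0.995, 0.87, 0.999])
  print(super_confident(scores))  # prints: [1 3]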

Remaining funds

Do you have any remaining grant funds?

No.

References
