Research and Decision Science/Resources
Documentation and materials for new data practitioners
This page compiles resources (onboarding guides, links to presentations, etc.) useful to someone joining the Foundation as an analyst or data scientist and learning about our many datasets and systems. It is not meant to teach new hires those topics itself but to serve as a springboard for learning about them; in other words, its only content should be an index.
- Policies
- Guides
- Data and systems
- Data Glossary: definitions for core and essential metrics
- Product Analytics Glossary: definitions for terms related to product analytics
- Data Dictionary: covers some important datasets (needs to be deprecated because that information is now available in our Data Catalog)
- Datasets in Data Lake
- MediaWiki database layout
- Tools
- Superset (WMF internal dashboards and reports)
- Obtaining access to Superset/Turnilo, with explanation of LDAP/Developer Account terminology
- Turnilo (WMF internal tool for pivoting and exploring data)
- Querying: Hive, Presto, Spark, Differences between SQL engines
- Google Search Console access
- Analytics instrumentation
- Lifecycle of an Event
- Event*: disambiguates between the many event services and links to their more extensive documentation
- Metrics Platform (a suite of tools that helps Wikimedia teams make data-driven decisions about product experiences)
- Event Schemas and Guidelines
- Hadoop Event Ingestion Lifecycle
- Event Sanitization
- Matomo/Piwik (JavaScript tracking client used for wikimediafoundation.org and other smaller-scale sites)
- Best practices
- Other
- Wikimedia GitLab for version control
Miscellaneous
Global Data and Insights: links to some of our projects that measure Movement trends, demographics, feelings of safety, and development.