Research:Revision scoring as a service/Revscoring library

Key features

Scorer abstraction

...todo...

Feature extraction garden

When supporting an ecosystem of multiple models that use similar features, it is important that features (1) are well defined and (2) do not duplicate work. The "Feature dependencies" figure below depicts a set of example features and their dependencies on datasources and on other features. By using a dependency injection strategy to specify and resolve the relationships between features and datasources, we make it easy to develop new features on top of existing features and datasources. The same strategy minimizes the work the system must perform when building feature sets for many different models, because a datasource or feature shared by several models only needs to be computed once.

 
Feature dependencies. Dependencies for features and datasources are presented. Datasources can depend on other datasources. Features can depend on both datasources and other features.
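The dependency injection strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the revscoring API: the `Dependent` class, the `solve` function, and the example datasources are hypothetical names chosen for this sketch. It assumes the dependency graph has no cycles. Each datasource or feature declares what it depends on, and a shared cache ensures that a value needed by several features is computed only once.

```python
class Dependent:
    """A named value (datasource or feature) computed from other dependents.

    Hypothetical class for illustration; not part of any library.
    """

    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process            # callable that computes the value
        self.depends_on = tuple(depends_on)

    def __repr__(self):
        return f"<{self.name}>"


def solve(dependent, cache=None):
    """Return the value of `dependent`, computing each dependency at most once.

    Assumes the dependency graph is acyclic (no cycle detection is done).
    """
    if cache is None:
        cache = {}
    if dependent not in cache:
        # Resolve dependencies first, then inject them as arguments.
        args = [solve(d, cache) for d in dependent.depends_on]
        cache[dependent] = dependent.process(*args)
    return cache[dependent]


calls = []  # records how often the expensive tokenizing step runs


def tokenize(text):
    calls.append("tokens")
    return text.split()


# Datasources (toy stand-ins for revision text and a dictionary).
revision_text = Dependent(
    "revision_text", lambda: "teh quick brown fox jumps ovr teh dog")
dictionary = Dependent(
    "dictionary", lambda: {"quick", "brown", "fox", "jumps", "dog"})
tokens = Dependent("tokens", tokenize, [revision_text])

# Features built on datasources and on other features.
words = Dependent("words", len, [tokens])
misspellings = Dependent(
    "misspellings",
    lambda toks, known: sum(t not in known for t in toks),
    [tokens, dictionary])
misspelling_ratio = Dependent(
    "misspelling_ratio",
    lambda m, w: m / w if w else 0.0,
    [misspellings, words])

cache = {}
ratio = solve(misspelling_ratio, cache)
```

Here `tokens` is needed by both `words` and `misspellings`, yet `tokenize` runs only once. Passing the same `cache` to later `solve` calls for other features reuses everything already computed for that revision.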

Example Makefile-style dependency expression for MisspellingRatioDifferential

WordsAdded: RevisionDiff
	<parse revision diff> \
	return count

MisspellingsAdded: RevisionDiff Dictionary
	<parse revision diff and use Dictionary to find misspellings> \
	return count

PreviousWords: ParsedPreviousRevisionText
	<parse non-markup content> \
	return count

PreviousMisspellings: ParsedPreviousRevisionText Dictionary
	<parse non-markup content and use Dictionary to find misspellings> \
	return count

MisspellingRatioDifferential: WordsAdded MisspellingsAdded PreviousWords PreviousMisspellings
	return (MisspellingsAdded/WordsAdded) / \
	       ((MisspellingsAdded/WordsAdded)+(PreviousMisspellings/PreviousWords))
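The final rule in the expression above can be worked through numerically. The following sketch is a plain Python function with a hypothetical name, not library code. It takes the four counts as arguments and returns the share of the combined misspelling rate that comes from the added text. The guards against division by zero are an assumption of this sketch; the expression above does not say how empty diffs or empty previous revisions should be handled.

```python
def misspelling_ratio_differential(words_added, misspellings_added,
                                   previous_words, previous_misspellings):
    """Misspelling rate of the added text relative to the sum of the
    added and previous misspelling rates.

    Returns 0.0 whenever a denominator would be zero (an assumption
    made for this sketch).
    """
    added_rate = misspellings_added / words_added if words_added else 0.0
    previous_rate = (previous_misspellings / previous_words
                     if previous_words else 0.0)
    total = added_rate + previous_rate
    return added_rate / total if total else 0.0


# An edit adds 10 words, 4 of them misspelled (rate 0.4), to a revision
# that had 1000 words with 20 misspellings (rate 0.02).
suspicious = misspelling_ratio_differential(10, 4, 1000, 20)

# An edit whose misspelling rate matches the previous revision's rate.
neutral = misspelling_ratio_differential(10, 1, 100, 10)
```

In the first case the value is 0.4 / 0.42, roughly 0.95, so the added text accounts for almost all of the combined misspelling rate. In the second case both rates are 0.1 and the value is 0.5. Values near 1 therefore indicate an edit that misspells far more often than the text it modifies.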

Model files

...todo...