Differential privacy/Docs/DP error metrics

Applied differential privacy is a new field. Within that field, there are many different (and at times discordant) ways of measuring the success or failure of a given data release. This page is intended to provide a general-purpose list of common differential privacy error metrics, along with how they are calculated.

These ideas and mathematical derivations are not set in stone; they are just examples of how we at WMF have calculated error in some of our ongoing DP projects.

Mathematical definitions

For the purposes of this page, we define the following variables:

  • $D$ is the input dataset.
  • $f$ is a mechanism that takes in a dataset and performs some non-differentially-private transformation on it.
  • $M$ is a mechanism that takes in a dataset and performs some differentially private transformation on it.
  • $X$ is the set of values making up the true output of an aggregation (e.g. a groupby-count or a groupby-sum). Each element of the set is represented by $x_i$.
  • $Y$ is the set of values making up the differentially private output of an aggregation (the true output with some random noise added). Each element of the set is represented by $y_i$, and values less than 0 are cast to 0.
  • $\theta$ is the publication threshold. If a specific element $y_i \geq \theta$, it is publicly released.

Therefore, $f(D) = X$ and $M(D) = Y$.

Note: some error formulas below use the "|" notation to denote set cardinality, while others use it to denote absolute value.

Counts

It is useful to ground discussions of error metrics in an understanding of the underlying number of elements that would be released given a specific value of $\theta$.

True count

We can calculate the true count for a dataset as follows:

$C_{\text{true}} = |\{\, x_i \in X \mid x_i \geq \theta \,\}|$

This is the number of data points that would be published at a threshold $\theta$ if no differential privacy were used, and serves as a kind of ground truth.

Published count

We can calculate the published count for a dataset as follows:

$C_{\text{pub}} = |\{\, y_i \in Y \mid y_i \geq \theta \,\}|$

This is the number of data points that would be published at a threshold $\theta$ while using differential privacy.
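
As a concrete illustration, here is a minimal Python sketch of both counts. The values of X, Y, and theta are hypothetical, made up for this example rather than drawn from any real release:

```python
# Hypothetical data: X holds true counts, Y holds their noisy
# counterparts (already clamped at 0), theta is the release threshold.
X = [0, 3, 12, 45, 7]
Y = [2, 0, 14, 43, 9]
theta = 5

# True count: elements that would clear the threshold with no noise.
true_count = sum(1 for x in X if x >= theta)

# Published count: elements that clear the threshold after noise.
published_count = sum(1 for y in Y if y >= theta)

print(true_count)       # 3 (12, 45, and 7 clear the threshold)
print(published_count)  # 3 (14, 43, and 9 clear the threshold)
```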

Errors

We can also calculate different sorts of error on an element-by-element basis by comparing differentially private values to true values. These error metrics can then be looked at in the aggregate.

Some useful questions to ask if you've calculated these metrics for all elements of a DP data release:

  • What is the median error of published elements?
  • What is the mean error of published elements?
  • What fraction of published elements are below some error threshold?

Absolute error

We can calculate the absolute error for an element $y_i$, corresponding to the true element $x_i$, as:

$\text{absolute error}_i = |y_i - x_i|$

The absolute error is calculated in the units of $X$, and is the actual difference between a DP value and its corresponding true value.

Relative error

We can calculate the relative error for an element $y_i$, corresponding to the true element $x_i$, as:

$\text{relative error}_i = \frac{|y_i - x_i|}{x_i} \times 100\%$

The relative error is calculated as a percentage, and is the percent difference between a DP value and its corresponding true value. Note that it is undefined when $x_i = 0$.
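
As a sketch of how these per-element errors feed the aggregate questions listed above (median, mean, and fraction under an error threshold), again with hypothetical values:

```python
import statistics

# Hypothetical (x, y) pairs: x is the true value, y its noisy counterpart.
pairs = [(10, 12), (40, 37), (8, 8), (100, 104)]

# Absolute error per element, in the units of the data.
abs_errors = [abs(y - x) for x, y in pairs]

# Relative error per element, as a percentage; it is undefined when
# x == 0, so zero-valued true elements are skipped here.
rel_errors = [abs(y - x) / x * 100 for x, y in pairs if x != 0]

print(statistics.median(abs_errors))  # 2.5
print(statistics.mean(abs_errors))    # 2.25
# Fraction of elements within a 5% relative-error budget:
print(sum(1 for e in rel_errors if e <= 5) / len(rel_errors))  # 0.5
```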

Other metrics

Besides counts and errors, there are other, more specific metrics of error that apply to differential privacy.

Bias

Differential privacy often involves transforming input data through normalization and the removal of outliers. These transformations can have effects on the published data as a whole. Bias seeks to quantify those effects, and can be calculated as:

$\text{bias} = \frac{\sum_i y_i - \sum_i x_i}{\sum_i x_i} \times 100\%$

Bias sums the elements of the DP release and of the true release, takes the difference between the two sums, and divides by the true sum, yielding a percentage. If the percentage is negative, the sum of the DP release is less than that of the true data; if the percentage is positive, the sum of the DP release is greater than that of the true data. The closer it is to 0, the closer the two sums are to each other.
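
A minimal sketch of the bias calculation, using hypothetical releases:

```python
# Hypothetical true (X) and noisy (Y) releases over the same elements.
X = [10, 40, 8, 100]
Y = [12, 37, 8, 104]

# (DP sum - true sum) / true sum, expressed as a percentage.
bias = (sum(Y) - sum(X)) / sum(X) * 100
print(f"{bias:+.2f}%")  # +1.90%: the DP release slightly overcounts
```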

Drop rate

The drop rate is relatively analogous to the idea of a false negative rate in machine learning/statistics. In the context of DP, a data point that is dropped is one where the true count is greater than 0, but the noisy count falls below the release threshold (i.e. $x_i > 0$ and $y_i < \theta$). The drop rate is the extension of this element-by-element principle to the entire dataset, and can be calculated, over counts or over sums, as

$\text{drop rate}_{\text{count}} = \frac{|\{\, x_i \mid x_i > 0 \wedge y_i < \theta \,\}|}{|\{\, x_i \mid x_i > 0 \,\}|}$

$\text{drop rate}_{\text{sum}} = \frac{\sum_{\{i \,\mid\, x_i > 0 \,\wedge\, y_i < \theta\}} x_i}{\sum_{\{i \,\mid\, x_i > 0\}} x_i}$

The drop rate seeks to quantify the data that would be released without noise, but is instead not included in a differentially private data release, and can either look at the count or the sum of the elements that are dropped.
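
A minimal sketch, assuming (as in the formulas above) that the denominator ranges over all elements with a nonzero true count; the data is hypothetical:

```python
# Hypothetical (x, y) pairs and threshold. An element is dropped when
# its true count is positive but its noisy count falls below theta.
pairs = [(3, 1), (12, 14), (45, 2), (7, 9), (0, 0)]
theta = 5

present = [(x, y) for x, y in pairs if x > 0]        # nonzero true elements
dropped = [(x, y) for x, y in present if y < theta]  # lost to noise + threshold

drop_rate_count = len(dropped) / len(present)
drop_rate_sum = sum(x for x, _ in dropped) / sum(x for x, _ in present)

print(drop_rate_count)  # 0.5    (2 of 4 nonzero elements are dropped)
print(drop_rate_sum)    # ~0.716 (48 of 67 units of true mass are dropped)
```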

Spurious rate

The spurious rate is relatively analogous to the idea of a false positive rate in machine learning/statistics. In the context of DP, a data point that is spurious is one where the true count is 0, but the noisy count is at or above the release threshold (i.e. $x_i = 0$ and $y_i \geq \theta$). The spurious rate is the extension of this element-by-element principle to the entire dataset, and can be calculated, over counts or over sums, as

$\text{spurious rate}_{\text{count}} = \frac{|\{\, y_i \mid x_i = 0 \wedge y_i \geq \theta \,\}|}{|\{\, y_i \mid y_i \geq \theta \,\}|}$

$\text{spurious rate}_{\text{sum}} = \frac{\sum_{\{i \,\mid\, x_i = 0 \,\wedge\, y_i \geq \theta\}} y_i}{\sum_{\{i \,\mid\, y_i \geq \theta\}} y_i}$

The spurious rate seeks to quantify the data that wouldn't be released without noise, but is instead included in a differentially private data release, and can either look at the count or the sum of the elements that are spurious. It is the data that is, in effect, "hallucinated" in the data release because of the random noise added.
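
A minimal sketch, where (as in the formulas above) the denominator is taken to be the set of published elements; the data is hypothetical:

```python
# Hypothetical (x, y) pairs and threshold. An element is spurious when
# its true count is 0 but its noisy count clears the threshold.
pairs = [(0, 6), (12, 14), (0, 2), (7, 9), (45, 43)]
theta = 5

published = [(x, y) for x, y in pairs if y >= theta]  # what gets released
spurious = [(x, y) for x, y in published if x == 0]   # noise-only elements

spurious_rate_count = len(spurious) / len(published)
spurious_rate_sum = sum(y for _, y in spurious) / sum(y for _, y in published)

print(spurious_rate_count)  # 0.25   (1 of 4 published elements is spurious)
print(spurious_rate_sum)    # ~0.083 (6 of 72 published units are spurious)
```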

Maintenance of ordering

To come