User:MPopov (WMF)/Notes/Log transform in regression

Linear regression edit

The multiple linear regression model is often written as:

 

Where   is the independent, identically distributed error term and is usually assumed to come from a Normal distribution with mean 0 and standard deviation  .

The quantity   is  , the expected value of   given the values of  .

After the intercept  , the coefficients   represent changes in the expected value of   from unit changes in the corresponding  .

For example, suppose we had 2 predictors   and we held the value of   constant while increasing   by 1 unit:

 

Binary (0/1) indicators are especially nice because when   then the corresponding   doesn't contribute to expected value of  , but when   then y changes by the corresponding  . So if   is "received new UI/UX" then the corresponding   represents the effect of receiving the new UI/UX on whatever metric/KPI   is.

Log transformation edit

Sometimes it's necessary to transform either the response/outcome variable   and/or some of the   regressors/predictors  .

If we apply the transformation to the dependent variable   we end up with:

 
where   is usually the natural log (  aka  ). Changes in regressors/predictors correspond to changes in expected value of the response/outcome on the log scale, which can pose difficulties. To talk about changes on the original scale, we have to apply the inverse of the natural log:  . Then

 

Notice how exponent of the summed terms becomes product of exponentiated terms, turning what used to be additive changes in expected value to multiplicative changes in expected value.

To see the effect of a unit increase in  , we repeat the example from earlier but this time with a log-transformed  :

 

Log-Log transformation edit

If we apply the transformation to both the dependent and the independent variable, we end up with:

 
Suppose we increase   by one percent (1%):

 

Therefore a 1% increase in   corresponds to a   multiplicative change, which will be less than 1 (a decrease) if   if negative and will be more than 1.01 (an increase) if   is positive. The closer to 0 that   is, the closer to 1 (no change)   will be.