[R-lang] Re: How to handle missing data when I try to log-transform my data

Roger Levy rlevy@ling.ucsd.edu
Wed Jun 23 20:25:33 PDT 2010


On Jun 23, 2010, at 12:10 PM, Hugo Quené wrote:

> BTW, typical RTs are not normally distributed, so that a 
> transformation is often necessary, using e.g. log(RT) or 1/RT as 
> your dependent variable.

I am going to say something that may be somewhat controversial, but I hope it spurs discussion.  I tend to advise *against* indiscriminately applying log or other transforms to RTs (or to other continuously distributed dependent variables, for that matter) before regression analysis in the name of compensating for non-normality.  In my view, the most compelling reason to transform a variable is to get as close as possible to the functional form relating the independent and dependent variables, whether that form is theoretically motivated or known to hold empirically.  Applying a non-linear transform such as log or inverse to a dependent variable can easily break such a functional form.  As a concrete example, consider a controlled 2x2 experiment with the following condition-mean RTs:

     B1   B2
A1   400  600
A2   600  800

the base-10 log-transformed means would be approximately

     B1   B2
A1   2.60 2.78
A2   2.78 2.90
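The values in the second table are base-10 logarithms of the cell means in the first table (natural logs would run from about 6.0 to 6.7 instead). A quick check in Python; the dictionary layout is just an illustrative choice, not anything from an R analysis:

```python
import math

# Condition-mean RTs from the 2x2 example, keyed by (A level, B level).
raw_means = {("A1", "B1"): 400, ("A1", "B2"): 600,
             ("A2", "B1"): 600, ("A2", "B2"): 800}

# Base-10 log of each cell mean (the transform is applied to the means
# themselves, exactly as in the table, not to individual trials).
log_means = {cell: math.log10(rt) for cell, rt in raw_means.items()}

# Print the table row by row, rounded to two decimals.
for a in ("A1", "A2"):
    print(a, " ".join(f"{log_means[(a, b)]:.2f}" for b in ("B1", "B2")))
```

This prints the rows `A1 2.60 2.78` and `A2 2.78 2.90`, matching the table.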

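On the raw scale, the effect of A is 200 at both levels of B (600 - 400 and 800 - 600), so there is no interaction; on the log scale, the effect of A is about 0.18 at B1 but only about 0.12 at B2, so an interaction appears.  To assess whether there is an interaction between factors A and B, then, one has to answer the following question: is RT or log-RT the "correct" scale on which to interpret the effects?  My belief is that the answer to this question should be the chief guiding principle in deciding whether to transform RT before regression analysis or ANOVA.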
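One way to make the scale question concrete is to compute the interaction contrast (A2B2 - A2B1) - (A1B2 - A1B1) on each scale. The sketch below is illustrative Python; `interaction_contrast` is a hypothetical helper introduced here, and, as in the tables, the transform is applied to the cell means rather than to individual trials. A contrast of zero means the effect of A is the same at both levels of B; it is a point estimate, not a significance test.

```python
import math


def interaction_contrast(means, f=lambda x: x):
    """Hypothetical helper: (A2B2 - A2B1) - (A1B2 - A1B1) after applying
    transform f to each cell mean. Zero means no interaction on that scale."""
    t = {cell: f(v) for cell, v in means.items()}
    return (t[("A2", "B2")] - t[("A2", "B1")]) - (t[("A1", "B2")] - t[("A1", "B1")])


# Condition-mean RTs from the 2x2 example, keyed by (A level, B level).
raw_means = {("A1", "B1"): 400, ("A1", "B2"): 600,
             ("A2", "B1"): 600, ("A2", "B2"): 800}

print(interaction_contrast(raw_means))                         # 0 (raw scale)
print(round(interaction_contrast(raw_means, math.log10), 3))   # -0.051 (log10)
print(round(interaction_contrast(raw_means, math.log), 3))     # -0.118 (natural log)
```

The contrast is exactly zero on the raw scale but nonzero under either log base. Changing the base only rescales the contrast by a constant, so whether the means show an interaction depends on choosing raw versus log scale, not on which logarithm is used.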

As for the consequences of non-normality for the inferences drawn from the analysis, there is probably some loss of power from using linear regression on raw RT data, which are skewed and heavy-tailed.  But in my own experience, with both empirical data and artificial simulations, the loss of power is fairly small.  If anyone else has compelling simulations that demonstrate a substantial loss of statistical power, though, I'd be interested in seeing them!
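For anyone who wants to try, here is a minimal sketch of the kind of power simulation meant here, written in Python purely for illustration. It makes several assumptions not in the post: a simple two-group comparison stands in for full regression, RTs are drawn from a lognormal distribution with an effect placed on the log scale, the parameter values (`mu`, `sigma`, `shift`, group size) are hypothetical, and the critical value uses the normal approximation 1.96 rather than an exact t cutoff. It reports no results; what it returns depends entirely on those choices.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = math.fsum(xs) / n
    v = math.fsum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two samples."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def simulate_power(n_per_group=40, n_sims=2000, mu=6.0, shift=0.1,
                   sigma=0.4, seed=1):
    """Fraction of simulated experiments with |t| > 1.96, analysing raw RTs
    versus log RTs. All parameter values are hypothetical. 1.96 is the
    large-sample two-sided 5% cutoff, so with small groups it is slightly
    liberal compared with an exact t critical value."""
    rng = random.Random(seed)
    hits_raw = hits_log = 0
    for _ in range(n_sims):
        # Lognormal RTs; the group difference is a shift on the log scale.
        a = [rng.lognormvariate(mu, sigma) for _ in range(n_per_group)]
        b = [rng.lognormvariate(mu + shift, sigma) for _ in range(n_per_group)]
        if abs(welch_t(a, b)) > 1.96:
            hits_raw += 1
        if abs(welch_t([math.log(v) for v in a], [math.log(v) for v in b])) > 1.96:
            hits_log += 1
    return hits_raw / n_sims, hits_log / n_sims


if __name__ == "__main__":
    raw_power, log_power = simulate_power()
    print(f"power on raw RTs: {raw_power:.3f}, on log RTs: {log_power:.3f}")
```

Setting `shift=0.0` turns the same function into a check of the Type I error rate. Because the simulated effect lives on the log scale, this setup tilts toward the log analysis; generating the effect as an additive shift in raw RT instead would be the natural comparison, in keeping with the point above about choosing the scale on which effects are interpreted.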

Best

Roger

--

Roger Levy                      Email: rlevy@ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
