[R-lang] Re: How to handle missing data when I try to log-transform my data
Roger Levy
rlevy@ling.ucsd.edu
Wed Jun 23 20:25:33 PDT 2010
On Jun 23, 2010, at 12:10 PM, Hugo Quené wrote:
> BTW, typical RTs are not normally distributed, so that a
> transformation is often necessary, using e.g. log(RT) or 1/RT as
> your dependent variable.
I am going to say something that may be somewhat controversial but I hope that it spurs discussion. I tend to advise *against* indiscriminately applying log or other transforms of RTs (or other continuously distributed dependent variables, for that matter) prior to regression analysis in the name of compensating for non-normality. In my mind, the most compelling reason to transform a variable is to get as close as possible to the functional form between the independent and dependent variables that is either theoretically relevant or is known to exist empirically. Applying a non-linear transform such as log or inverse to a dependent variable can easily break such a functional form. As a concrete example, in a controlled 2x2 experiment with the following condition-mean RTs:
       B1     B2
A1    400    600
A2    600    800
the log-transformed (base-10) means would be around
       B1     B2
A1   2.60   2.78
A2   2.78   2.90
On the raw scale the two factors are exactly additive (each adds 200 ms), whereas on the log scale the effect of B is about 0.18 at A1 but only about 0.12 at A2. So in order to assess whether there is an interaction between factors A and B, one has to answer the following question: is RT or log-RT the "correct" scale in which to interpret the effects? My belief is that the answer to this question should be the chief guiding principle in determining whether to transform RT before regression analysis or ANOVA.
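To make the scale dependence concrete, here is a minimal sketch that computes the interaction contrast (the difference of differences) for the 2x2 condition means above, once on raw RT and once on log10 RT. The names `means` and `interaction_contrast` are just illustrative, and the sketch works on the condition means only, not on trial-level data.

```python
import math

# Condition-mean RTs (ms) from the 2x2 example above.
means = {("A1", "B1"): 400.0, ("A1", "B2"): 600.0,
         ("A2", "B1"): 600.0, ("A2", "B2"): 800.0}

def interaction_contrast(cells, transform=lambda x: x):
    """Difference of differences: (A2B2 - A2B1) - (A1B2 - A1B1)."""
    t = {k: transform(v) for k, v in cells.items()}
    return (t[("A2", "B2")] - t[("A2", "B1")]) - (t[("A1", "B2")] - t[("A1", "B1")])

raw_contrast = interaction_contrast(means)              # additive on the raw scale
log_contrast = interaction_contrast(means, math.log10)  # not additive after the log

print(f"raw RT contrast:   {raw_contrast:.1f} ms")  # 0.0
print(f"log10 RT contrast: {log_contrast:.3f}")     # -0.051
```

The same four means show no interaction on the raw scale and a nonzero one on the log scale, so the choice of scale decides what "interaction" means here.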
In terms of the negative consequences of deviation from normality on the inferences coming out of the analysis, there is probably some loss of power that comes with using linear regression on raw RT data, which are heavy-tailed and skewed. But my own experience -- both with empirical data and with artificial simulations -- is that the loss of power is pretty minimal. If anyone else has compelling simulations that demonstrate a substantial loss of statistical power, though, I'd be interested in seeing them!
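For anyone who wants to try this, a rough sketch of the kind of simulation I have in mind follows. Everything in it is an assumption made for illustration: the RT model (a 200 ms floor plus a lognormal component), the 50 ms additive effect, the sample sizes, and the use of a simple two-group comparison with a normal approximation to Welch's t-test rather than a full regression. It only shows how to set up the comparison; it is not evidence for any particular result.

```python
import math
import random
import statistics

def welch_p(x, y):
    """Two-sided p-value for a mean difference (normal approximation to Welch's t)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

def estimate_power(effect_ms, n=50, n_sims=500, alpha=0.05, seed=1):
    """Share of simulated experiments that reject H0, on raw and on log RT."""
    rng = random.Random(seed)
    hits_raw = hits_log = 0
    for _ in range(n_sims):
        # Hypothetical skewed RTs: 200 ms floor plus a lognormal component.
        a = [200.0 + rng.lognormvariate(5.7, 0.5) for _ in range(n)]
        b = [200.0 + effect_ms + rng.lognormvariate(5.7, 0.5) for _ in range(n)]
        hits_raw += welch_p(a, b) < alpha
        hits_log += welch_p([math.log(v) for v in a],
                            [math.log(v) for v in b]) < alpha
    return hits_raw / n_sims, hits_log / n_sims

power_raw, power_log = estimate_power(effect_ms=50.0)
print(f"estimated power on raw RT: {power_raw:.2f}, on log RT: {power_log:.2f}")
```

Varying the skew, the effect size, and whether the effect is additive or multiplicative should show how sensitive the comparison is to those choices.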
Best
Roger
--
Roger Levy                      Email: rlevy@ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy