[R-lang] Re: How to handle missing data when I try to log-transform my data
bogartz@psych.umass.edu
Thu Jun 24 07:18:38 PDT 2010
Alternatively, one might consider fitting the ex-Gaussian
using the gamlss package.
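A minimal sketch of such a fit, assuming a data frame mydata with an RT
column rt and a condition factor cond (all names hypothetical):

```r
# Sketch: fitting an ex-Gaussian to RTs with gamlss.
# exGAUS() parameterizes the distribution by mu (Gaussian mean),
# sigma (Gaussian SD), and nu (mean of the exponential component).
library(gamlss)

fit <- gamlss(rt ~ cond,            # mu depends on condition
              sigma.formula = ~ 1,  # Gaussian SD held constant
              nu.formula = ~ 1,     # exponential component held constant
              family = exGAUS(),
              data = mydata)
summary(fit)
```

Letting sigma or nu also depend on cond (e.g. sigma.formula = ~ cond) tests
whether conditions differ in spread or skew rather than just in the mean.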
Quoting Reinhold Kliegl <kliegl@uni-potsdam.de>:
> In response to the transformation issue raised by Roger (different RE?)
>
> We just had a paper come out that addresses this issue in the
> context of LMMs (Kliegl, Masson, & Richter, Visual Cognition; data
> and R scripts are available, too). We report analyses in the
> original metric as well as log and reciprocal transforms in the
> context of frequency effects in masked repetition priming.
> (1) Various assessments (boxcox, LMM residual plots) suggested that
> the reciprocal transformation leads to an acceptable distribution of
> errors; original and log values clearly do not. We have replicated
> this result. (It does not hold for all types of RT experiments!)
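The boxcox assessment mentioned in (1) can be run with the MASS package; a
sketch, assuming hypothetical variables rt, prime, and freq in a data frame
mydata:

```r
# Sketch: Box-Cox assessment of candidate transformations.
# lambda near -1 favours the reciprocal, near 0 the log, near 1 no transform.
library(MASS)

bc <- boxcox(rt ~ prime * freq, data = mydata,
             lambda = seq(-2, 1, by = 0.1))
bc$x[which.max(bc$y)]  # lambda with the highest profile log-likelihood
```

Residual diagnostics for a fitted lme4 model m give a complementary check,
e.g. qqnorm(resid(m)) and plot(fitted(m), resid(m)).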
> (2) The transformations had negligible consequences
> for inferences about fixed effects (main effects and interactions);
> this supports Roger's intuition and that of much of the community.
> (3) Here is the important point to consider: The transformations had
> a very strong effect on the correlations of intercept, priming
> effect, and frequency effect (random-effect correlations in the LMM).
> Some of these correlations even changed sign. So depending on the
> transformation you infer a positive, non-significant, or negative
> correlation between mean RT and the frequency effect. (The correlation
> between the priming and frequency effects stayed positive for all
> three variants.)
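This comparison can be reproduced in outline with lme4; a sketch under
hypothetical names (rt in ms, factors prime and freq, grouping factor subj):

```r
# Sketch: the same random-effect structure fitted on three metrics.
# The point of interest is how the random-effect correlations in
# VarCorr() shift (and can change sign) across transformations.
library(lme4)

m_raw <- lmer(rt       ~ prime * freq + (prime + freq | subj), data = mydata)
m_log <- lmer(log(rt)  ~ prime * freq + (prime + freq | subj), data = mydata)
m_rec <- lmer(-1000/rt ~ prime * freq + (prime + freq | subj), data = mydata)

lapply(list(raw = m_raw, log = m_log, reciprocal = m_rec), VarCorr)
```

(-1000/rt is a common scaling of the reciprocal that keeps the sign of
effects aligned with raw RT and the numbers in a readable range.)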
>
> So what do you choose when it matters? Our recommendation was that,
> IN THE ABSENCE OF A THEORY OR MODEL, it may be best to choose the
> transformation that is in agreement with the statistical model we
> want to apply. So if we use an LMM, we also have to use the
> reciprocal transformation. Basically, in this case, the
> homogeneity of the error distribution along the dimension of the
> dependent variable is used as a criterion for equidistant units on
> the DV. (We do not want the "yardstick" to become less precise when
> we measure large values.)
>
> Now if your theory or model or conviction forces you to stay with a
> different metric (say the original one), not all is lost either. You
> could choose a GLMM, for example with a gamma distribution, which
> generates the typical skew observed for RTs, that is, one where
> variances increase with RT. Or you could use a Bayesian framework for
> distributions outside the exponential family (e.g., Rouder et al.,
> 2005, Psychonomic Bulletin & Review, argue for a Weibull).
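The gamma GLMM option can be sketched with lme4::glmer; variable names are
hypothetical, and the identity link keeps effects additive in milliseconds:

```r
# Sketch: a GLMM on the raw RT metric with a Gamma response, whose
# variance grows with the mean, matching the typical RT skew.
library(lme4)

g <- glmer(rt ~ prime * freq + (1 | subj),
           family = Gamma(link = "identity"),
           data = mydata)
summary(g)
```

Identity-link Gamma models can be fiddly to fit; a log link (the glmer
default for Gamma is inverse) is a common fallback, at the price of effects
becoming multiplicative.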
>
> At the outset and in the absence of a theory/model, I consider the
> original metric, the log transform, and the reciprocal transform
> (and possibly others) as equally plausible starting points. The
> metric we are most familiar with has been handed down by cultural
> evolution. If I subscribe to a linear model at this point, a
> defensible starting point to me is to ensure equal standard
> deviations along the measurement scale. I am happy to stand
> corrected.
>
> Reinhold Kliegl
>
>
> On 24.06.2010, at 05:25, Roger Levy wrote:
>
>>
>> On Jun 23, 2010, at 12:10 PM, Hugo Quené wrote:
>>
>>> BTW, typical RTs are not normally distributed, so that a
>>> transformation is often necessary, using e.g. log(RT) or 1/RT as
>>> your dependent variable.
>>
>> I am going to say something that may be somewhat controversial but
>> I hope that it spurs discussion. I tend to advise *against*
>> indiscriminately applying log or other transforms of RTs (or other
>> continuously-distributed dependent variables, for that matter)
>> prior to regression analysis in the name of compensating for
>> non-normality. In my mind, the most compelling reason to transform a
>> variable is to get as close as possible to the functional form
>> between the independent and dependent variables that is either
>> theoretically relevant or is known to exist empirically. Applying
>> a non-linear transform such as log or inverse to a dependent
>> variable can easily break such a functional form. As a concrete
>> example, in a controlled 2x2 experiment with the following
>> condition-mean RTs:
>>
>> B1 B2
>> A1 400 600
>> A2 600 800
>>
>> the (base-10) log-transformed means would be around
>>
>> B1 B2
>> A1 2.60 2.78
>> A2 2.78 2.90
>>
>> In order to assess whether there is an interaction between factors
>> A and B, one has to answer the following question: is RT or log-RT
>> the "correct" scale in which to interpret the effects? My belief
>> is that the answer to this question should be the chief guiding
>> principle in determining whether to transform RT before regression
>> analysis or ANOVA.
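Roger's 2x2 example can be checked numerically; a short sketch:

```r
# Sketch: additive condition means on the raw ms scale, but a
# (negative) interaction appears after log transformation.
rt <- matrix(c(400, 600, 600, 800), nrow = 2,
             dimnames = list(A = c("A1", "A2"), B = c("B1", "B2")))

# The 2x2 interaction contrast: m[1,1] - m[1,2] - m[2,1] + m[2,2]
interaction_contrast <- function(m) m[1, 1] - m[1, 2] - m[2, 1] + m[2, 2]

interaction_contrast(rt)         # 0: perfectly additive in ms
interaction_contrast(log10(rt))  # about -0.05: interaction on the log scale
```

So whichever scale is "correct" decides whether this data pattern counts as
an interaction at all.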
>>
>> In terms of the negative consequences of deviation from normality
>> on the inferences coming out of the analysis, there is probably
>> some loss of power that comes with using linear regression on raw
>> RT data, which are heavy-tailed and skewed. But my own experience
>> -- both with empirical data and with artificial simulations -- is
>> that the loss of power is pretty minimal. If anyone else has
>> compelling simulations that demonstrate a substantial loss of
>> statistical power, though, I'd be interested in seeing them!
>>
>> Best
>>
>> Roger
>>
>> --
>>
>> Roger Levy Email: rlevy@ling.ucsd.edu
>> Assistant Professor Phone: 858-534-7219
>> Department of Linguistics Fax: 858-534-4789
>> UC San Diego Web: http://ling.ucsd.edu/~rlevy
>
> ----
> Reinhold Kliegl, Dept. of Psychology, University of Potsdam,
> Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
> phone: +49 331 977 2868, fax: +49 331 977 2793
> http://www.psych.uni-potsdam.de/people/kliegl/
>
Richard S. Bogartz
Professor of Psychology
UMASS, Amherst 01003
More information about the ling-r-lang-L mailing list