[R-lang] Re: How to handle missing data when I try to log-transform my data

Xiao He praguewatermelon@gmail.com
Wed Jun 23 11:33:11 PDT 2010


Hi all,

Thank you for the helpful explanations. I really do not know the theoretical
reasons behind replacing values smaller than 100ms with 0. I vaguely recall
that I read a couple of journal articles where the authors resorted to this
method. After reading what you guys wrote, I do agree that such replacement
would introduce more bias. The reason why I was thinking about replacement
was also partly because I just started using lmer() not long ago, and coming
from aov(), I guess I was more concerned about unbalanced data and missing
values and did not think about the flexibility and power of lmer(). But
thank you again for your help. I will do what you guys have suggested. :-)
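
For the archive, here is a minimal base-R sketch of the options discussed in
this thread. The data frame `d`, its `subj` and `RT` columns, and the RT
values are all made up for illustration; only the 100ms cutoff comes from my
original question.

```r
# Hypothetical self-paced reading data (names and values are invented).
d <- data.frame(subj = c("s1", "s1", "s2", "s2", "s3"),
                RT   = c(350, 80, 420, 95, 510))

# Why replacing too-short RTs with 0 fails once you log-transform:
log(0)                                  # -Inf

# Option 1: reject the too-short trials outright, then transform.
d_kept <- subset(d, RT >= 100)
d_kept$logRT <- log(d_kept$RT)

# Option 2: keep the rows but mark the too-short RTs as missing.
d_na <- d
d_na$RT[d_na$RT < 100] <- NA
d_na$logRT <- log(d_na$RT)              # NA stays NA, so no -Inf appears

# The replacement form of is.na() gives the same column on this data.
d_na2 <- d
is.na(d_na2$RT[d_na2$RT < 100]) <- TRUE
identical(d_na$RT, d_na2$RT)
```

Either data frame could then be passed to lmer(). As far as I understand, the
default na.action there omits rows with a missing response, so the two options
should lead to the same fitted model, but that is worth checking on your own
setup.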



Best,
Xiao


On Wed, Jun 23, 2010 at 11:08 AM, Scott Jackson <scottuba@gmail.com> wrote:

> I may be missing something, but I'm not sure why you would want to
> replace RTs below 100ms with 0ms.  Surely that introduces much more
> bias than if you had just left the too-low RTs in?
>
> I agree with the others that simply rejecting trials with too-low RTs
> is probably the way to go.  At least, that's a common practice,
> especially if you're using lmer or another analysis that does not
> require balanced data.
>
> Alternatively, depending on what kind of data you have, you might try
> a more sophisticated missing-data imputation technique, like multiple
> imputation.  There's a very nice R package for imputation called
> "mice" that I have been using extensively lately, which has very
> readable and helpful documentation, even if you're new to multiple
> imputation.  There's another package called "Amelia" that I have not
> used, but which also has excellent-looking documentation.
>
> If you go the route of substituting NAs for too-short times instead of
> completely deleting that observation from your data.frame, here's a
> tip.  Plain subset assignment works on a numeric column:
>
> data$RT[data$RT < 100] <- NA
>
> The replacement form of is.na() does the same thing:
>
> is.na(data$RT[data$RT < 100]) <- TRUE
>
> good luck,
> -scott
>
> On Wed, Jun 23, 2010 at 1:24 PM, Xiao He <praguewatermelon@gmail.com>
> wrote:
> > Dear R-lang users
> > I have a question that is, I suppose, less related to the use of R.
> > I have a set of self-paced reading data, and all the RTs that are below
> > 100ms are to be discarded. What I used to do when analyzing raw data was
> > to replace discarded values with 0. That was all simple and easy. But I
> > recently started to analyze log-transformed data. An issue then arises as
> > to how to handle missing data. Obviously, if I replace the discarded raw
> > data points with 0, log transformation does not work, as it will return
> > "-Inf" for obvious reasons. So I would like to know what you would
> > suggest I do in my case. Thank you very much in advance.
> >
> >
> > Xiao He
> >
>


More information about the ling-r-lang-L mailing list