[R-lang] Re: How to handle missing data when I try to log-transform my data

Michael T Hammond hammond@U.Arizona.EDU
Wed Jun 23 11:14:13 PDT 2010


all

This is my own approach as well; it's worked well for me.

mike h.

On Wed, 23 Jun 2010, Nathaniel Smith wrote:

> A slightly fancier approach would be to use R's built-in support for
> missing values, "NA". NA is a special value that means "this data
> point is not available". Anywhere R accepts a number or a string or
> whatever, it also accepts NA, and most R functions know how to do
> something more-or-less sensible with it.
>
> For example:
> > log(c(150, 300, NA, 250))
> [1] 5.010635 5.703782       NA 5.521461
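
A minimal sketch of how this might look for the RT case, assuming a
made-up vector called rt and the 100 ms cutoff from Xiao's message
(the variable names and values are only for illustration):

```r
# Hypothetical reading times in ms; the values are made up.
rt <- c(150, 300, 80, 250)

# Mark RTs below the 100 ms cutoff as missing instead of replacing them with 0.
rt[rt < 100] <- NA

# NA stays NA under log(), so no -Inf appears.
log_rt <- log(rt)

# Summaries propagate NA unless told to skip missing values.
mean(log_rt)                # NA, because one value is missing
mean(log_rt, na.rm = TRUE)  # mean of the three remaining log RTs
```

Coding the discarded points as NA rather than 0 keeps them in the
vector, so positions still line up with the other columns, but they
no longer feed a fake value into the transform.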
>
> The way that modeling functions (lm, lmer, etc.) handle NA by default
> is that they throw out any rows that have an NA in a variable the
> model uses, so for something like reaction times this will end up
> being basically the same as what Matt suggested. But it can be more
> useful in other cases
> -- like say you have two predictors Age and TimeOfDay (like, I don't
> know, you think older people will have slower reaction times, and you
> think people are fastest in the morning). You always know what time
> you ran each subject, but some people declined to tell you their age.
> If you code those people's ages as NA, then
>     lm(RT ~ TimeOfDay)
> will automatically analyze *all* subjects' data, while a formula that
> includes "Age" like
>     lm(RT ~ TimeOfDay + Age)
> will automatically analyze just the subset of your subjects who were
> willing to tell you their age.
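
A quick way to check this on a toy example, assuming a made-up data
frame d with the column names from the formulas above and R's default
na.action setting (na.omit); the numbers are invented:

```r
# Made-up data: six subjects, two of whom did not report their age.
d <- data.frame(
  RT        = c(420, 380, 510, 450, 390, 470),
  TimeOfDay = c(9, 10, 14, 16, 11, 15),
  Age       = c(23, NA, 41, 35, NA, 29)
)

# No NA in the variables this model uses, so every row is kept.
m1 <- lm(RT ~ TimeOfDay, data = d)

# Rows with NA in Age are dropped before fitting.
m2 <- lm(RT ~ TimeOfDay + Age, data = d)

nobs(m1)  # number of rows used in the first fit
nobs(m2)  # number of rows used in the second fit
```

Comparing nobs() across the two fits is a cheap sanity check that the
models were fit to the subsets you expect.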
>
> -- Nathaniel
>
> On Wed, Jun 23, 2010 at 10:48 AM, Matt Goldrick
> <matt-goldrick@northwestern.edu> wrote:
> > Dear Xiao
> > Why don't you simply discard the missing values? I don't know what type of
> > analyses you're doing, but mixed-effects regression models are generally
> > robust to imbalanced data, so there's no need to keep the number of
> > observations fixed across conditions.
> > HTH,
> > Matt
> > On Wed, Jun 23, 2010 at 12:24 PM, Xiao He <praguewatermelon@gmail.com>
> > wrote:
> >>
> >> Dear R-lang users
> >> I have a question that is, I suppose, less related to the use of R.
> >> I have a set of self-paced reading data, and all the RTs that are below
> >> 100ms are to be discarded. What I used to do when analyzing raw data was to
> >> replace discarded values with 0. That was all simple and easy. But I
> >> recently started to analyze log-transformed data. An issue then arises as to
> >> how to handle missing data. Obviously, if I replace the discarded raw data
> >> points with 0, log transformation does not work, as it will return "-Inf"
> >> for obvious reasons. So I would like to know what you would suggest I do
> >> in my case. Thank you very much in advance.
> >>
> >>
> >> Xiao He
> >
> >
>
>

