[R-lang] Re: Random slopes in LME

T. Florian Jaeger tiflo@csli.stanford.edu
Sun Feb 20 13:37:37 PST 2011


Hi Philip,

> One issue in finding the maximum random effect structure justified by the
> data is that it's not straightforward how to compare models with random
> effect structures which have differing random slope adjustments, but do not
> differ in the number of random slopes.
>
> For instance, in a 2 x 2 design, with factors A & B, one could do as
> Florian suggests and compare (1+A*B | Participant) to (1 | Participant). If
> it works out that the more complex model is unjustified, all is hunky dory.
>

Let me first repeat that my point 1) was not the procedure itself but merely
a tip on how to *start* the procedure, which, on average, saves time. That's
really important.


> But if it *is* justified, then one would want to compare (1+A*B |
> Participant) with (1+A | Participant).  And here's the problem: if (1+A |
> Participant) turns out to be as good as (1+A*B | Participant), i.e. no
> significant improvement in the model, it's also possible that (1+B |
> Participant) is as good as (1+A*B | Participant).  So now the question is,
> which is preferred --- (1+A | Participant) or (1+B | Participant)? This
> doesn't seem to be an unapproachable problem for this simple case, but once
> one adds in random slopes for items, and the possibility of structures like
> (1 | Participant) + (0 + A | Participant), the computational space of the
> problem seems to explode.
>

I wouldn't call this an explosion for a 2x2 ;). But, of course, there may be
several equally good models. First, it's important to point out that in my
experience that's the exception rather than the norm, given sufficient
amounts of data (for little data, of course, there are many ways to overfit
the model to the data).

Now, what to do if this situation occurs? I still think the answer is
simple: Make sure that your effect survives under all parameterizations. If
it doesn't, report that (i.e. be honest). There might be one simplification
that many of us are fine with: sometimes one model will be "simpler" than
the other in that it contains fewer higher order terms. E.g. I would go with
(1 | Participant) over (0 + A | Participant) unless there is a *strong
theoretical reason to do the latter*. That's the reason why I suggested
starting by removing the highest order terms at each step. It's a frequently
employed practice in regression modeling.
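
For example, in a 2x2 design with factors A and B, the competing
parameterizations could be compared like this (a minimal sketch; the data
frame "d", the response "RT", and the sum coding of A and B are placeholders
of my own, not anyone's actual data):

library(lme4)
m.full = lmer(RT ~ A*B + (1 + A*B | Participant), data = d, REML = FALSE)
m.A    = lmer(RT ~ A*B + (1 + A   | Participant), data = d, REML = FALSE)
m.B    = lmer(RT ~ A*B + (1 + B   | Participant), data = d, REML = FALSE)
anova(m.A, m.full)  # do the B and A:B slopes improve the fit?
anova(m.B, m.full)  # same question for the A and A:B slopes
# If m.A and m.B both do as well as m.full, check that the effect of
# interest is significant under both parameterizations before reporting it.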

> I'm not sure if Hal's program addresses this (which I haven't been able to
> get to work, sadly), so apologies if this is all worked out already. But in
> the interest of formulating some conventions, I would be interested in
> hearing others' thoughts on this issue. I know that Roger has recommended
> using the full random effect structure (A*B), so long as it converges, but
> if others have bumped up against this problem, I'd like to hear how it's
> been handled.
>

I agree with Roger. If you have no reason to be concerned about
over-fitting, why not keep the full structure? After all, lmer can tell
you that the estimated variance is zero (or close to it).
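
For instance, with a fitted model like "m.full" from the sketch above:

print(VarCorr(m.full))  # random-effect standard deviations near zero point
                        # to slopes that aren't doing any work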

Also, my email was not meant as a full solution; it was meant as a guideline
that will do perfectly fine in most scenarios. HTH,

Florian

>
> philip
>
> On Feb 20, 2011, at 12:48 PM, Daniel Ezra Johnson wrote:
>
> When comparing models with different random effect structure don't we have
> to grapple with the "boundary problem" - in other words, not just compare
> change in deviance to a chi^2(1)?
>
> On Feb 20, 2011, at 3:28 PM, "T. Florian Jaeger" <tiflo@csli.stanford.edu>
> wrote:
>
> Hi R-lang,
>
> maybe this is a good time to comment on a more general issue that Roger
> also brought up. As more and more people employ mixed models to analyze
> their data, we will need some conventions in terms of the procedures
> employed and the aspects of the analysis that are reported. While that is
> probably unattainable for the most general case, I think it's a reasonably
> achievable goal for standard 2 x 2 (or, by extension, factorial)
> psycholinguistic experiments with more or less balanced data.
>
> In the meantime, I have some comments on the specific issue of random
> slopes. For a factorial design, I would submit that it should be the norm
> that we test the full factorial random effect structure with crossed subject
> and item random effects. The goal of this procedure should be what I have
> called the "maximal random effect structure justified by the data" (which,
> of course, is a bit of a shorthand, since it's really the maximal random
> effect structure justified by the data under a set of assumptions, such as
> that the assumptions of the generalized linear mixed model and the
> particular family of distributions are met to a sufficient extent, that
> there is sufficient data to justify model comparison using a chisq over
> differences in deviances, that there is no overfitting issue, etc.).
>
> To see why we should test whether random slopes are required (and include
> them in the model, if they are required), it might be helpful to recall what
> random effect structure in a linear mixed model would most closely
> correspond to our good old friends, the F1 and F2 (by-subject and by-item)
> ANOVA. For example, the F1 ANOVA is intended to not just capture
> between-subject differences in terms of the intercept (e.g. overall reading
> time differences, if we assume sum-coding). It's also meant to account for
> between-subject differences in the sensitivity to *all* of the design
> conditions (e.g. the two main effects and the interaction of a 2x2 design).
> So, the linear mixed model that most closely corresponds to the F1 ANOVA has
> the following random effect structure:
>
> (1 + Condition1 + Condition2 + Condition1:Condition2 | Participant)
>
> or shorter
>
> (1 + Condition1*Condition2 | Participant)
>
> and, similarly, the closest thing to an F2 analysis would be
>
> (1 + Condition1*Condition2 | Item)
>
> Hence, to take maximal advantage of the fact that linear mixed models allow
> us to combine both analyses into one (avoiding, among other things, the
> conservativity of minF analyses), the starting model should probably have
> both of these as crossed random effects, plus, obviously, the fixed effects
> for the design:
>
> 1 + Condition1*Condition2 + (1 + Condition1*Condition2 | Participant) + (1
> + Condition1*Condition2 | Item)
>
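> In lmer syntax, such a starting model might look as follows (a sketch only;
> the data frame "d" and the dependent variable "RT" are placeholders, and
> REML = FALSE requests the ML fits assumed below):
>
> library(lme4)
> fit.full = lmer(RT ~ Condition1*Condition2 +
>                 (1 + Condition1*Condition2 | Participant) +
>                 (1 + Condition1*Condition2 | Item), data = d, REML = FALSE)
>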
> I'll call this the full model (not the same as the model with "the maximal
> random effect structure justified by the data"!). Now, not all of the random
> effects may be justified, as we can test by model comparison. For example,
> the model might not even converge. So, we can simplify the model stepwise,
> starting with the higher order random effect terms (the highest
> interaction). I am assuming that we will leave the fixed effect structure
> untouched since it's directly justified by the theoretical interests that
> led us to create the 2x2 design.
>
> As we simplify the random effect structure stepwise, there are some tips
> that I find useful (see also
> http://hlplab.wordpress.com/2009/05/14/random-effect-structure/ for a
> simple recipe):
>
> 1) Let's start by comparing the full model with a model with just the
> random intercepts:
>
> 1 + Condition1*Condition2 + (1 | Participant) + (1 | Item)
>
> Let's call this the intercept-only model. I'll assume that we have enough
> data to justify model comparison using a chisq over differences in deviances
> (that will usually be the case for a reasonably sized psycholinguistic
> experiment). I'll also assume ML-fitted models. If the difference in
> deviance between the full model and the intercept-only model is small, i.e. if
> the chisq of the model comparison has a value of less than 3 (or 2, if you
> want to be conservative; strictly speaking, even a chisq(1) < 3.84 is
> not significant), we don't really need to look further. There is no way then
> that any of the intermediate models could be significant. That's because
> differences in deviance between nested models are additive: if model A is
> nested in B and B is nested in C (and hence A is nested in C), then the
> difference in deviance between A and B plus the difference between B and C
> equals the difference between A and C.
>
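> In code, this comparison is simply (continuing the sketch from above):
>
> fit.int = lmer(RT ~ Condition1*Condition2 + (1 | Participant) + (1 | Item),
>                data = d, REML = FALSE)
> anova(fit.int, fit.full)  # a tiny Chisq means no intermediate model can win
>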
> Even if the chisq between the full and the intercept-only model is larger,
> you'll immediately get an idea of whether a lot or just a little of the
> model improvement will be achieved by adding random slopes. So, even if you
> need to continue your comparisons to find the best model, you've already
> learned something valuable.
>
> 2) In our typical psycholinguistic experiment, participants are exposed to
> a barrage of lexically and structurally rather homogeneous stimuli.
> Researchers and paradigms, of course, differ in how a typical set of stimuli
> looks, but if you live in the "The plumber that the janitor knew is from New
> York." stimulus world and you 'successfully' created a stimulus set that
> seeks its match in terms of homogeneity (a match it most certainly won't
> find in actual language use ;), then, in my experience, you can expect your
> item random effects to be rather small. I don't necessarily think that's a
> great thing, but it's convenient with regard to the statistical issues at
> hand. If you work with such stimuli, I recommend starting by comparing the
> full model to a model with only random subject effects:
>
> 1 + Condition1*Condition2 + (1 + Condition1*Condition2 | Participant)
>
> Again, if the chisq is really small, you don't need to look into item
> effects any further.
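>
> Concretely (same placeholder names as in the sketches above):
>
> fit.subj = lmer(RT ~ Condition1*Condition2 +
>                 (1 + Condition1*Condition2 | Participant),
>                 data = d, REML = FALSE)
> anova(fit.subj, fit.full)  # a small Chisq means the item effects add little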
>
>
> 3) Hal Tily has developed a function (which he posted a while ago to this
> list) that automatically performs stepwise model comparison. While I am
> *not* advocating blind reliance on automatic model simplification, I think
> this procedure, if restricted to random effects, could be acceptable for
> balanced designs. I think Roger Levy and Hal are working on a new and
> improved implementation, which hopefully will soon be available to this list
> (yippie!).
>
> Ok, sorry for the unsolicited, long, and perhaps trivial (?) email. Also, I
> am sure I've screwed up somewhere, so feel free to correct or expand. I just
> had the feeling that the general procedure outlined above wasn't clear to
> some (many?) readers of this list.
>
> Florian
>
>
>
> On Sun, Feb 20, 2011 at 8:08 AM, Zhenguang Garry Cai <z.cai@ed.ac.uk> wrote:
>
>> Hi Ariel,
>>
>> Sorry for the somewhat misleading information. I do not recall any formal
>> requirement from JML for random slopes, but in a recent submission to JML, I
>> was required to include random slopes. Also, as Roger Levy said, including
>> random slopes is a sensible thing to do.
>>
>>
>> Garry
>>
>>
>> On 20/02/2011 12:37, Ariel M. Goldberg wrote:
>>
>>> Garry,
>>>
>>> Does JML have a set of specific requirements for mixed-effects analyses?
>>>  I wasn't able to find anything in their author instructions.
>>>
>>> Thanks,
>>> AG
>>>
>>> On Feb 13, 2011, at 1:17 PM, Zhenguang Cai wrote:
>>>
>>>  Hi all,
>>>>
>>>> I have a question concerning random slopes in mixed effects modeling. So
>>>> I ran a structural priming experiment with a 4-level variable (prime type,
>>>> A, B, C and D). The dependent variable is response construction (DO dative
>>>> vs. PO dative).  The following is a summary of the experiment results.
>>>>
>>>> Prime           A       B       C       D
>>>> DOs             85      24      38      59
>>>> POs             82      144     128     109
>>>> Prop. DOs       .51     .14     .23     .35
>>>>
>>>> I am interested in whether different prime types induced different
>>>> priming, e.g., whether A led to more DO responses than B, C or D. Initially,
>>>> I ran LME analyses with random intercepts only. For instance, I did the
>>>> following to see whether there was a main effect of prime type.
>>>>
>>>> fit.0 = lmer(Response~1+(1|Subject)+(1|Item), family=binomial)
>>>> fit.p = lmer(Response~Prime+(1|Subject)+(1|Item), family=binomial)
>>>> anova (fit.0, fit.p)
>>>>
>>>> Then, I did pairwise comparison by changing the reference level for
>>>> Prime, e.g.,
>>>>
>>>> fit.p = lmer(Response~relevel(Prime,ref="B")+(1|Subject)+(1|Item),
>>>> family=binomial)
>>>>
>>>> It seems that all the levels differed from each other. In particular,
>>>> the comparison between C and D results in Estimate = -1.02, SE = .32, z
>>>> = -3.21, p < .01.
>>>>
>>>> But it seems I have to consider whether the slope for Prime differs
>>>> across subjects or items (at least this is a requirement from JML). So the
>>>> following is the way I considered whether a random slope should be included
>>>> in a model. I wonder whether I did the correct thing. I first determined
>>>> whether subject random slope should be included by comparing the following
>>>> two models.
>>>>
>>>> fit.p = lmer(Response~Prime+(1|Subject)+(1|Item), family=binomial)
>>>> fit.ps = lmer(Response~Prime+(Prime+1|Subject)+(1|Item),
>>>> family=binomial)
>>>> anova (fit.p, fit.ps)
>>>>
>>>> I did the same thing about item random slopes.
>>>>
>>>> fit.p = lmer(Response~Prime+(1|Subject)+(1|Item), family=binomial)
>>>> fit.pi = lmer(Response~Prime+(1|Subject)+(Prime+1|Item),
>>>> family=binomial)
>>>> anova (fit.p, fit.pi)
>>>>
>>>> The subject random slope had a significant effect, so I included it in
>>>> the final model (e.g., fit.ps). But pairwise comparison returned
>>>> something that is different from my initial analyses (when random slope was
>>>> not considered). That is, the comparison between C and D became only
>>>> marginally significant (Estimate = -.85, SE = .47, z = -1.79, p = .07). It
>>>> is a bit strange because the 9% difference between B and C turned out to be
>>>> significant, but the 12% difference between C and D was not.
>>>>
>>>> Or did I do anything wrong in the analyses?
>>>>
>>>> Thanks,
>>>>
>>>> Garry
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>
> -----------------------------------------------
> Philip Hofmeister
> University of California - San Diego
> La Jolla, CA 92093-0526
> phofmeister@ucsd.edu
>
>
>
>
>