From jvandyke at haskins.yale.edu Fri May 1 13:54:35 2009
From: jvandyke at haskins.yale.edu (Julie Van Dyke)
Date: Fri, 1 May 2009 16:54:35 -0400
Subject: [R-lang] Post-doc using mixed-effects modeling
Message-ID: <903B2E94-978F-4D4A-B621-558A35909EE7@haskins.yale.edu>

POST-DOCTORAL POSITION IN MEMORY AND READING

Applications are invited for a post-doctoral position at Haskins Laboratories, New Haven, CT.

This research is funded by an NIH/NICHD grant directed by Dr. Julie Van Dyke, and involves investigations of individual differences in memory retrieval and sensitivity to interference in reading comprehension. The primary method of investigation is the speed-accuracy tradeoff (SAT) technique; however, eye-tracking and computational modeling methods will also be used. The project targets the population of non-college-bound adolescents who have reading difficulties serious enough to interfere with successful completion of high school, but who have not received a diagnosis of dyslexia. More information about the project can be found at http://www.haskins.yale.edu/staff/vandyke.html.

We are particularly interested in postdoctoral fellows with expertise in mixed-effects modeling using the R statistical package. We encourage applications from candidates with expertise in the areas of memory, psycholinguistics, computational linguistics, or reading. The position is available for one year, renewable for a second year.

Haskins Laboratories is an independent, world-renowned center for the study of the cognitive and neurobiological foundations of spoken and written language, disability, and instruction. Additional information about the institution can be found at http://www.haskins.yale.edu.

Interested applicants should send a personal statement describing their graduate training and how their research interests relate to those of the project (described at the website above) and a curriculum vitae, and should arrange to have three letters of recommendation sent to Julie Van Dyke at jvandyke at haskins.yale.edu. Materials may also be sent to the mailing address at the bottom of this message. Review of applications will begin on June 1 and continue until the position has been filled; anticipated start date is Sept 1.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Julie Van Dyke, Ph.D., Senior Research Scientist
Haskins Laboratories
THE SCIENCE OF THE SPOKEN AND WRITTEN WORD
300 George Street / New Haven, CT 06511 USA
(203) 865-6163 x 214 / FAX (203) 865-8963
http://www.haskins.yale.edu/staff/vandyke.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From Linda.Mortensen at psy.ku.dk Mon May 4 02:36:09 2009
From: Linda.Mortensen at psy.ku.dk (Linda Mortensen)
Date: Mon, 4 May 2009 11:36:09 +0200
Subject: [R-lang] How to use mixed-effects models on multinomial data
Message-ID: <7A47FC91544BDC44B54C6807E6995019CE8D1E@ibtmail1.ibt.ku.dk.ad>

Dear R-language experts,

I'm trying to run a logistic regression on ordered multinomial data. - The dependent variable is number of correct items with accuracy ranging from 0 to 5. I want to use a mixed-effects model, but am unsure about how to use this model to fit multinomial data. Can you help?

Thanks

Linda

Linda Mortensen
Post-doctoral research fellow
Department of Psychology
University of Copenhagen
Øster Farimagsgade 2A
1353 Copenhagen K
Denmark
Tel.: +45 3532 4889
E-mail: linda.mortensen at psy.ku.dk

From andy.fugard at sbg.ac.at Tue May 5 00:20:54 2009
From: andy.fugard at sbg.ac.at (Andy Fugard)
Date: Tue, 05 May 2009 09:20:54 +0200
Subject: [R-lang] How to use mixed-effects models on multinomial data
In-Reply-To: <7A47FC91544BDC44B54C6807E6995019CE8D1E@ibtmail1.ibt.ku.dk.ad>
References: <7A47FC91544BDC44B54C6807E6995019CE8D1E@ibtmail1.ibt.ku.dk.ad>
Message-ID: <49FFE8D6.7070505@sbg.ac.at>

Linda Mortensen wrote:
>
> I'm trying to run a logistic regression on ordered multinomial data. - The dependent variable is number of correct items with accuracy ranging from 0 to 5. I want to use a mixed-effects model, but am unsure about how to use this model to fit multinomial data. Can you help?

Dear Linda,

The most common package used by psycholinguists, lme4, doesn't fit such models, but apparently you can fit them using glmmADMB. (I don't yet know how.) Anyway, venture to:

http://otter-rsch.com/admbre/examples/glmmadmb/glmmADMB.html

and follow the instructions for download. This example should work when all is installed properly:

--8<----------------------------------------------------------------------
require(glmmADMB)
# Two negative-binomial mixed models with a by-subject random effect of Visit
M0 = glmm.admb(y~Base*trt+Visit,random=~Visit,
               group="subject",data=epil2,family="nbinom")
# M1 adds Age as a fixed effect
M1 = glmm.admb(y~Base*trt+Age+Visit,random=~Visit,
               group="subject",data=epil2,family="nbinom")
# Compare the two nested models
anova(M0,M1)
---------------------------------------------------------------------->8--

There's a message board on which you can request help.

If you're at home in an MCMCed up Bayesian framework you could also try MCMCglmm.

--8<----------------------------------------------------------------------
# Install MCMCglmm together with the packages it depends on
install.packages("MCMCglmm", dependencies=TRUE)
require(MCMCglmm)
# Open the tutorial vignette shipped with the package
vignette("Tutorial", "MCMCglmm")
---------------------------------------------------------------------->8--

Cheers,

Andy

--
Andy Fugard, Post-doc, ESF LogICCC (LcpR) project
Fachbereich Psychologie, Universitaet Salzburg
Hellbrunnerstr. 34, 5020 Salzburg, Austria
+43 (0)680 2199 346
http://figuraleffect.googlepages.com

From austin.frank at gmail.com Thu May 7 01:19:36 2009
From: austin.frank at gmail.com (Austin Frank)
Date: Thu, 07 May 2009 04:19:36 -0400
Subject: [R-lang] [SPAM] Re: How to use mixed-effects models on multinomial data
References: <7A47FC91544BDC44B54C6807E6995019CE8D1E@ibtmail1.ibt.ku.dk.ad>
Message-ID:

On Mon, May 04 2009, Linda Mortensen wrote:

> Dear R-language experts,
>
> I'm trying to run a logistic regression on ordered multinomial data. - The dependent variable is number of correct items with accuracy ranging from 0 to 5. I want to use a mixed-effects model, but am unsure about how to use this model to fit multinomial data. Can you help?

Hi Linda--

I've posted my response to this question at http://hlplab.wordpress.com/2009/05/07/multinomial-random-effects-models-in-r/.

While I've recently had some success using MCMCglmm for unordered multinomial data, I have no experience with analyzing ordered categorical data in R. The relevant kinds of models are proportional odds regression (the polr function from the MASS package, for example) and multinomial probit regression, but I don't know which R packages (if any) allow random effects in those frameworks. The post linked above includes a list of candidate packages that you might want to check out. At least one, mprobit, seems to allow structured variance based on a cluster term.

Thanks,
/au

--
Austin Frank
http://aufrank.net
GPG Public Key (D7398C2F): http://aufrank.net/personal.asc

From H.Quene at uu.nl Thu May 7 05:06:49 2009
From: H.Quene at uu.nl (Hugo Quené)
Date: Thu, 07 May 2009 14:06:49 +0200
Subject: [R-lang] How to use mixed-effects models on multinomial data [R-lang Digest, Vol 23, Issue 2]
In-Reply-To:
References:
Message-ID: <4A02CED9.2010405@uu.nl>

Dear Linda,

A few years ago we solved a similar problem by means of a two-stage bootstrap analysis. Our data consisted of mispronunciations in various categories (no error, error interrupted, error noninterrupted, other error, etc).

First, we did a two-stage resampling of the data set (i.e. first resampling subjects, and then resampling responses within resampled subjects). Then we ran a multinomial logistic regression on the selected responses, yielding a regression coefficient for each term in the model.

The above steps were repeated 250 times. The resulting regression coefficients (250 for each term in the model) were used to determine the 95% CI (i.e. the P025 to P975 interval) for each term.

For more details and results, see:
Nooteboom, S.G., & Quené, H. (2008). Self-monitoring and feedback: a new attempt to find the main cause of lexical bias in phonological speech errors. Journal of Memory and Language, 58 (3), 837-861. [doi:10.1016/j.jml.2007.05.003].

Hope this helps! With kind regards, Hugo Quené

> Date: Mon, 4 May 2009 11:36:09 +0200
> From: "Linda Mortensen"
> Subject: [R-lang] How to use mixed-effects models on multinomial data
> To:
> Message-ID:
>    <7A47FC91544BDC44B54C6807E6995019CE8D1E at ibtmail1.ibt.ku.dk.ad>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Dear R-language experts,
>
> I'm trying to run a logistic regression on ordered multinomial data. - The dependent variable is number of correct items with accuracy ranging from 0 to 5. I want to use a mixed-effects model, but am unsure about how to use this model to fit multinomial data.
> Can you help?
>
> Thanks
>
> Linda
>
> Linda Mortensen
> Post-doctoral research fellow
> Department of Psychology
> University of Copenhagen
> Øster Farimagsgade 2A
> 1353 Copenhagen K
> Denmark
> Tel.: +45 3532 4889
> E-mail: linda.mortensen at psy.ku.dk
>

--
Dr Hugo Quené | assoc prof in Phonetics | Utrecht inst of Linguistics OTS | Utrecht University | Trans 10 | 3512 JK Utrecht | The Netherlands | T +31 30 253 6070 | F +31 30 253 6000 | H.Quene at uu.nl | www.hugoquene.nl | www.hum.uu.nl

From jvandyke at haskins.yale.edu Wed May 13 08:21:29 2009
From: jvandyke at haskins.yale.edu (Julie Van Dyke)
Date: Wed, 13 May 2009 11:21:29 -0400
Subject: [R-lang] REVISED ANNOUNCEMENT: POST-DOC IN MEMORY AND READING
Message-ID:

POST-DOCTORAL POSITION IN MEMORY AND READING

Applications are invited for a post-doctoral position at Haskins Laboratories, New Haven, CT. Applications from NON-US CITIZENS are welcome.

This research is funded by an NIH/NICHD grant directed by Dr. Julie Van Dyke, and involves investigations of individual differences in memory retrieval and sensitivity to interference in reading comprehension. The primary method of investigation is the speed-accuracy tradeoff (SAT) technique; however, eye-tracking and computational modeling methods will also be used. The project targets the population of non-college-bound adolescents who have reading difficulties serious enough to interfere with successful completion of high school, but who have not received a diagnosis of dyslexia. More information about the project can be found at http://www.haskins.yale.edu/staff/vandyke.html.

We are particularly interested in postdoctoral fellows with expertise in mixed-effects modeling using the R statistical package. We encourage applications from candidates with expertise in the areas of memory, psycholinguistics, computational linguistics, or reading. The position is available for one year, renewable for a second year.

Haskins Laboratories is an independent, world-renowned center for the study of the cognitive and neurobiological foundations of spoken and written language, disability, and instruction. Additional information about the institution can be found at http://www.haskins.yale.edu.

Interested applicants should send a personal statement describing their graduate training and how their research interests relate to those of the project (described at the website above) and a curriculum vitae, and should arrange to have three letters of recommendation sent to Julie Van Dyke at jvandyke at haskins.yale.edu. Materials may also be sent to the mailing address at the bottom of this message. Review of applications will begin IMMEDIATELY and continue until the position has been filled; anticipated start date is Sept 1.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Julie Van Dyke, Ph.D., Senior Research Scientist
Haskins Laboratories
THE SCIENCE OF THE SPOKEN AND WRITTEN WORD
300 George Street / New Haven, CT 06511 USA
(203) 865-6163 x 214 / FAX (203) 865-8963
http://www.haskins.yale.edu/staff/vandyke.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From Linda.Mortensen at psy.ku.dk Wed May 20 03:34:40 2009
From: Linda.Mortensen at psy.ku.dk (Linda Mortensen)
Date: Wed, 20 May 2009 12:34:40 +0200
Subject: [R-lang] How to compare mixed logit models with crossed random effects
Message-ID: <7A47FC91544BDC44B54C6807E6995019CE8D59@ibtmail1.ibt.ku.dk.ad>

Dear LanguageR users,

I'm trying to fit a mixed logit model using the lmer function in the lme4 package. My question concerns the random effects part of this model (i.e., the random effects for my subjects and items) and how I decide between models that differ in the number of random effect terms that are estimated. So far, I have used two procedures:

1.
For a given model, I remove a random effect term if it correlates very strongly with either the intercept or any of the other random effect terms. Eventually, I end up with a model in which all correlations are modest.

2. I compare the quasi-log likelihood (logLik) values of a model with a given random effect term (e.g. an interaction term, ... (1 + a * b | sub)) and of a model without that term (... (1 + a + b | sub)). If the logLik values are very similar (i.e., if the value is not, or at least not much, smaller for the model without the term than for the model with the term), I go for the former model.

Is it acceptable to select a model on the basis of this comparison? Or, when the logLik values are similar (which they usually are for my models), should I instead look at the measures of likelihood that take into account the number of parameters in a model when evaluating its fit (i.e., AIC, BIC, deviance)? According to these other measures, a simple model seems always to be better than a more complex one, but if I want to rule out that my fixed effects can be explained, in part, by random effects for subjects and items, then a simple model (with few random effects) is not necessarily better than a complex one, I would think.

From prior postings on this list and from other sources, I get the impression that a direct comparison of the likelihoods of two mixed logit models that differ in the random effect part (or in the fixed effect part) using the anova() function is not recommended. Please correct me if I'm wrong in assuming this.

Any advice on how I go about making these model comparisons is much appreciated.

Linda

Linda Mortensen
Post-doctoral research fellow
Department of Psychology
University of Copenhagen
Øster Farimagsgade 2A
1353 Copenhagen K
Denmark
Tel.: +45 3532 4889
E-mail: linda.mortensen at psy.ku.dk

From rlevy at ling.ucsd.edu Sun May 24 10:52:48 2009
From: rlevy at ling.ucsd.edu (Roger Levy)
Date: Sun, 24 May 2009 10:52:48 -0700
Subject: [R-lang] How to compare mixed logit models with crossed random effects
In-Reply-To: <7A47FC91544BDC44B54C6807E6995019CE8D59@ibtmail1.ibt.ku.dk.ad>
References: <7A47FC91544BDC44B54C6807E6995019CE8D59@ibtmail1.ibt.ku.dk.ad>
Message-ID: <661E0318-5787-4275-816C-7F0E30E630BD@ling.ucsd.edu>

Dear Linda,

On May 20, 2009, at 3:34 AM, Linda Mortensen wrote:

> Dear LanguageR users,
>
> I'm trying to fit a mixed logit model using the lmer function in the lme4 package. My question concerns the random effects part of this model (i.e., the random effects for my subjects and items) and how I decide between models that differ in the number of random effect terms that are estimated.

First of all, in my assessment the problem of which random effects terms to include in your model when the primary target of inference is the fixed effects is still open.

> So far, I have used two procedures:
>
> 1. For a given model, I remove a random effect term if it correlates very strongly with either the intercept or any of the other random effect terms. Eventually, I end up with a model in which all correlations are modest.

This is an interesting idea, but I would emphasize two things:

1) it's important to distinguish between positive and negative correlations. A strong negative correlation is telling you something very important about your dataset. Imagine a word recognition task where the response variable is correct answer and the covariate x1 is word frequency.
A strong negative correlation between intercept and x1 is telling you that participants who answer more correctly overall are less sensitive to word frequency, and vice versa, and that this is a very reliable generalization. You can see this in model log-likelihoods too: compare the two lmer model fits below.

set.seed(9)
library(mvtnorm)
library(lme4)
k <- 10                # number of clusters (e.g. participants)
n <- 1000              # total number of observations
cl <- gl(k,1,n)        # cluster label for each observation
x1 <- runif(n)         # covariate
# random-effects covariance: intercept SD 1, slope SD 2, correlation -0.9
sigma <- matrix(c(1,-1.8,-1.8,4),2,2)
b <- rmvnorm(k,mean=c(0,0),sigma)   # per-cluster intercept and slope
eta <- b[cl,1] + b[cl,2]*x1         # linear predictor on the logit scale
y <- rbinom(n,1,exp(eta)/ (1+exp(eta)))
# random intercept only vs. random intercept plus random slope for x1
# (recent lme4 releases expect glmer() for binomial models)
lmer(y ~ 1 + (1 | cl),family="binomial")
lmer(y ~ 1 + (x1 | cl),family="binomial")

2) When you say "remove a term", what really would be justified is this: if the random parameters for covariates x1 and x2 are correlated at >0.99, create a third, "proxy" parameter x12=x1+x2, add x12 to the random-effects structure, and drop x1 and x2. This would save you two parameters at basically no modeling cost.

> 2. I compare the quasi-log likelihood (logLik) values of a model with a given random effect term (e.g. an interaction term, ... (1 + a * b | sub)) and of a model without that term (... (1 + a + b | sub)). If the logLik values are very similar (i.e., if the value is not, or at least not much, smaller for the model without the term than for the model with the term), I go for the former model.

This is OK, and more of the recommended practice (see Baayen et al., 2008, for discussion with respect to linear mixed-effect models). You can actually do a likelihood-ratio test, though with the dual caveats that (a) Laplace-approximated log-likelihood is not true log-likelihood; and (b) the test is conservative.

> Is it acceptable to select a model on the basis of this comparison? Or, when the logLik values are similar (which they usually are for my models), should I instead look at the measures of likelihood that take into account the number of parameters in a model when evaluating its fit (i.e., AIC, BIC, deviance)?
> According to these other measures, a simple model seems always to be better than a more complex one, but if I want to rule out that my fixed effects can be explained, in part, by random effects for subjects and items, then a simple model (with few random effects) is not necessarily better than a complex one, I would think.

Well, first of all the deviance is just -2*logLik. The AIC and BIC are still dominated by log-likelihood too. And it's not always going to be the case that the logLik will not be appreciably better for more complex models -- see my above example. Finally, I'd agree with you that it's better to be cautious and include the extra, more complex terms if you want to be sure that you have a "real" fixed effect.

Hope this helps.

Best

Roger

--

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

From rlevy at ling.ucsd.edu Sun May 24 11:47:39 2009
From: rlevy at ling.ucsd.edu (Roger Levy)
Date: Sun, 24 May 2009 11:47:39 -0700
Subject: [R-lang] How to compare mixed logit models with crossed random effects
In-Reply-To: <661E0318-5787-4275-816C-7F0E30E630BD@ling.ucsd.edu>
References: <7A47FC91544BDC44B54C6807E6995019CE8D59@ibtmail1.ibt.ku.dk.ad> <661E0318-5787-4275-816C-7F0E30E630BD@ling.ucsd.edu>
Message-ID: <538498AB-7747-4AA4-9148-5D5FBD40D1E8@ling.ucsd.edu>

There is a minor error in my post from earlier today that I should correct (see below):

On May 24, 2009, at 10:52 AM, Roger Levy wrote:

> Dear Linda,
>
> On May 20, 2009, at 3:34 AM, Linda Mortensen wrote:
>
>> Dear LanguageR users,
>>
>> I'm trying to fit a mixed logit model using the lmer function in the lme4 package. My question concerns the random effects part of this model (i.e., the random effects for my subjects and items) and how I decide between models that differ in the number of random effect terms that are estimated.
>
> First of all, in my assessment the problem of which random effects terms to include in your model when the primary target of inference is the fixed effects is still open.
>
>> So far, I have used two procedures:
>>
>> 1. For a given model, I remove a random effect term if it correlates very strongly with either the intercept or any of the other random effect terms. Eventually, I end up with a model in which all correlations are modest.
>
> This is an interesting idea, but I would emphasize two things:
>
> 1) it's important to distinguish between positive and negative correlations. A strong negative correlation is telling you something very important about your dataset. Imagine a word recognition task where the response variable is correct answer and the covariate x1 is word frequency. A strong negative correlation between intercept and x1 is telling you that participants who answer more correctly overall are less sensitive to word frequency, and vice versa, and that this is a very reliable generalization. You can see this in model log-likelihoods too: compare the two lmer model fits below.
>
> set.seed(9)
> library(mvtnorm)
> library(lme4)
> k <- 10
> n <- 1000
> cl <- gl(k,1,n)
> x1 <- runif(n)
> sigma <- matrix(c(1,-1.8,-1.8,4),2,2)
> b <- rmvnorm(k,mean=c(0,0),sigma)
> eta <- b[cl,1] + b[cl,2]*x1
> y <- rbinom(n,1,exp(eta)/ (1+exp(eta)))
> lmer(y ~ 1 + (1 | cl),family="binomial")
> lmer(y ~ 1 + (x1 | cl),family="binomial")
>
> 2) When you say "remove a term", what really would be justified is if the random parameters for covariates x1 and x2 are correlated at >0.99, create a third, "proxy" parameter x12=x1+x2, add x12 to the random-effects structure, and drop x1 and x2. This would save you two parameters at basically no modeling cost.

This proxy parameter x12 should be equal to x1+C*x2, for some value of C which you could read off of the old model fit where x1 and x2 are separate (divide the standard deviation of the random effect for x2 by the standard deviation for x1).

>
>
>> 2. I compare the quasi-log likelihood (logLik) values of a model with a given random effect term (e.g. an interaction term, ... (1 + a * b | sub)) and of a model without that term (... (1 + a + b | sub)). If the logLik values are very similar (i.e., if the value is not, or at least not much, smaller for the model without the term than for the model with the term), I go for the former model.
>
> This is OK, and more of the recommended practice (see Baayen et al., 2008, for discussion with respect to linear mixed-effect models). You can actually do a likelihood-ratio test, though with the dual caveats that (a) Laplace-approximated log-likelihood is not true log-likelihood; and (b) the test is conservative.
>
>> Is it acceptable to select a model on the basis of this comparison? Or, when the logLik values are similar (which they usually are for my models), should I instead look at the measures of likelihood that take into account the number of parameters in a model when evaluating its fit (i.e., AIC, BIC, deviance)? According to these other measures, a simple model seems always to be better than a more complex one, but if I want to rule out that my fixed effects can be explained, in part, by random effects for subjects and items, then a simple model (with few random effects) is not necessarily better than a complex one, I would think.
>
> Well, first of all the deviance is just -2*logLik. The AIC and BIC are still dominated by log-likelihood too. And it's not always going to be the case that the logLik will not be appreciably better for more complex models -- see my above example.
> Finally, I'd agree with you that it's better to be cautious and include the extra, more complex terms if you want to be sure that you have a "real" fixed effect.
>
> Hope this helps.
>
> Best
>
> Roger
>
> --
>
> Roger Levy                      Email: rlevy at ling.ucsd.edu
> Assistant Professor             Phone: 858-534-7219
> Department of Linguistics       Fax:   858-534-4789
> UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy

--

Roger Levy                      Email: rlevy at ling.ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
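
A minimal sketch, in base R, of the proxy-covariate construction from the correction above (x12 = x1 + C*x2, with C the ratio of the two random-slope standard deviations). The standard deviations, the data frame d and the slope b1 are made-up values for illustration only; in practice the two standard deviations would be read off the random-effects summary of the model in which x1 and x2 have separate random slopes.

```r
# Hypothetical values, as if read off the random-effects summary of a model
# fitted with (1 + x1 + x2 | sub); none of these numbers come from the thread.
sd_x1 <- 0.50   # std. dev. of the by-subject random slope for x1
sd_x2 <- 1.25   # std. dev. of the by-subject random slope for x2

# Scaling constant: SD of the x2 slope divided by SD of the x1 slope
C <- sd_x2 / sd_x1

# Toy data frame standing in for the real data set (an assumption)
set.seed(1)
d <- data.frame(x1 = runif(8), x2 = runif(8))

# Proxy covariate that replaces x1 and x2 in the random-effects term
d$x12 <- d$x1 + C * d$x2

# Check: if the two slopes are perfectly positively correlated, b2 = C * b1,
# so a single slope on x12 reproduces the contribution b1 * x1 + b2 * x2
b1 <- 0.3
b2 <- C * b1
stopifnot(isTRUE(all.equal(b1 * d$x1 + b2 * d$x2, b1 * d$x12)))
```

With C in hand, the refit would use a random-effects term such as (1 + x12 | sub) in place of (1 + x1 + x2 | sub). The identity checked at the end holds only when the two slopes are (almost) perfectly positively correlated, as in the >0.99 case discussed in the thread. With lme4, the standard deviations can be extracted with attr(VarCorr(fit)$sub, "stddev"), where fit and sub are placeholder names for the fitted model and the grouping factor.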