[R-lang] Re: word frequency covariate question

Steven Piantadosi piantado@mit.edu
Thu Jan 13 07:24:55 PST 2011


Hi Ray,

I think the trick you are looking for is that you can estimate the
effect of frequency using a larger data set. For instance, you can take
the entire set of first-pass durations on *all* words in the corpus, and
regress them on frequency and any other control variables you might want,
such as length, subject effects, etc. This gives a much richer and more
accurate model of how subjects respond to changes in frequency, since it
is based on the whole range of frequencies and many words, not just
two.

This regression gives you residuals for every word in the corpus,
including your instances of "before" and "after." If you compare the
residual reading times for "before" and "after," you will have
measurements from which the effects of frequency have been regressed out.

By the way, you probably want log frequency. 
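
As a rough illustration of the procedure described above, here is a
minimal sketch in plain Python (standard library only). Everything in it
is an assumption made for the example: the function names, the toy
words, their frequencies, and the durations are invented, and only log
frequency and word length are used as predictors. Subject effects are
left out to keep it short; in practice you would fit this with whatever
regression or mixed-model tool you normally use.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def residual_reading_times(observations):
    """observations: list of (word, corpus_frequency, duration_ms).

    Fits duration ~ intercept + log(frequency) + word length over ALL
    observations and returns (coefficients, residuals).
    """
    rows = [[1.0, math.log(freq), float(len(word))]
            for word, freq, _ in observations]
    y = [float(duration) for _, _, duration in observations]
    beta = fit_ols(rows, y)
    residuals = [yi - sum(b * x for b, x in zip(beta, row))
                 for row, yi in zip(rows, y)]
    return beta, residuals

# Invented toy data: (word, corpus frequency, first-pass duration in ms).
toy = [
    ("the", 1000000, 180), ("cat", 12000, 215), ("house", 50000, 228),
    ("after", 154000, 221), ("before", 57000, 240),
    ("window", 9000, 262), ("quixotic", 300, 330),
]
beta, residuals = residual_reading_times(toy)

# Residuals for the two words of interest, with frequency and length
# already regressed out using the whole (toy) data set.
of_interest = {word: res for (word, _, _), res in zip(toy, residuals)
               if word in ("before", "after")}
```

The point of fitting on all words is that the frequency slope is
estimated from the full range of frequencies, so the residuals for
"before" and "after" still carry variance you can compare.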

Best,

++Steve



> 
> I am studying the first-pass gaze duration for sentences beginning with
> either "Before" or "After"; these word regions are my interest areas.
> However, the two words differ in frequency: 154,000 hits in the corpus
> for "After" and only 57,000 for "Before". How do I partial out the
> variance due to word frequency? I tried to do simple random regressions
> for each participant, as I usually do when comparing one group of
> items/sentences with another, but when there are only two words, it
> essentially amounts to dummy coding the variables that I am contrasting,
> using their frequency as the coding scheme and leaving no residual
> variance.
> 
> Any help here would be appreciated.
> 
> Best,
> -Ray Becker
> 



