<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Emily et al,<br>
<br>
That sounds like an interesting topic as well. I had significant
exposure to Bayesian surprise in the context of modeling saccades in a
cognitive modeling seminar I took last quarter, so I'd like to see how
it might be effectively integrated into a hierarchical model for
reading-time analysis. The Demberg and Keller model makes sense,
though I haven't yet had a chance to look at their results and analysis.<br>
<br>
To everyone who's planning on doing a final project, I'd like to
suggest that we convene for a short meeting after class tomorrow to
figure out how we want to form groups. Hopefully that will give us all
enough time to submit proposals to Roger in a timely
manner.<br>
<br>
Best,<br>
Randy<br>
<br>
Emily Morgan wrote:
<blockquote
cite="mid:5e3c89101002231756t54fbd8a3qcc7c11198a578340@mail.gmail.com"
type="cite">Hi Randy et al,<br>
As it turns out, we're mostly all behind on this... Here's an idea I
had for a project:<br>
I'm interested in eye movements during reading. There's a dataset
called the Dundee corpus with eye-tracking data for people reading
large amounts of text. There's been some work on predicting reading
times from this dataset using hierarchical models, e.g.
<a href="http://dx.doi.org/10.1016/j.cognition.2008.07.008">doi:10.1016/j.cognition.2008.07.008</a>.
One of the big questions is the role of frequency versus contextual
probability as a predictor of reading time. (Of course, both higher
frequency and higher predictability lead on average to faster reading
times, but what about a low-frequency word that's highly predictable
in context, or a high-frequency word that's unexpected given the
context?) The linked paper above begins to answer this question.
Another angle on this question (with credit to Nathaniel Smith for
suggesting it to me) is whether frequency and predictability play
different roles across different ranges of fixation durations: he
suggests that frequency plays a larger role for very short fixations,
while contextual probability plays a larger role for longer
fixations. Hierarchical models seem
like a great tool to address this question. Anyone interested in
working on it with me? (I'm a first-year student in the linguistics PhD
program, by the way.)<br>
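<br>
To make this concrete, here's a minimal sketch of the kind of
hierarchical (mixed-effects) regression I have in mind, using
statsmodels in Python. The file and column names (subject, log_freq,
surprisal, fixation_dur) are hypothetical stand-ins for however we'd
end up preprocessing the Dundee data:<br>
<pre>
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical preprocessed Dundee data: one row per fixation, with
# log word frequency, surprisal (contextual probability), and duration.
data = pd.read_csv("dundee_fixations.csv")

# Hierarchical model: fixed effects for frequency and surprisal,
# random intercepts grouped by subject to capture reader variability.
model = smf.mixedlm("fixation_dur ~ log_freq + surprisal",
                    data, groups=data["subject"])
result = model.fit()
print(result.summary())

# To get at Nathaniel's point, one could refit within quantile bins of
# fixation_dur and compare the two coefficients across bins.
</pre>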
<br>
~Emily<br>
<br>
<br>
<div class="gmail_quote">2010/2/22 Randy West <span dir="ltr"><<a
href="mailto:rdwest@ucsd.edu" target="_blank">rdwest@ucsd.edu</a>></span><br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi
All,<br>
<br>
I might be a little late to the party here, but I'm still looking
for group members for our final project. Briefly, I'm a first-year
Master's student in computer science with a focus on artificial
intelligence. I have a little bit of formal training in English syntax,
but other than that I have very little linguistics background. I've
included an outline of my idea for a final project below, but if
everyone has already formed groups then please let me know so that I
can help out with one of your projects.<br>
<br>
Here's my idea:<br>
<br>
I'd like to analyze how effective various methods we've covered in
class would be for search engine technology. The dataset could be any
collection of documents, but I was thinking of building a web-crawling
script to run over, say, Google News for a few days and building up a
database that way. The idea would be to have each model (or mixture of
models) produce, given a vector of search terms, an ordering of the
documents in the dataset, i.e., the documents ranked by p(document |
search terms). The simplest such model would be the product of unigram
likelihoods of each search term in the document, while something more
complex might use LDA to infer topic distributions over words and
documents and leverage those distributions for search.<br>
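<br>
As a very rough sketch of the unigram baseline, in Python (toy
documents, naive whitespace tokenization, and an arbitrary smoothing
constant, just to show the ranking idea):<br>
<pre>
import math
from collections import Counter

def log_likelihood(doc_tokens, query_tokens, vocab_size, alpha=0.1):
    """Log p(query | document) under an add-alpha smoothed unigram model."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in query_tokens
    )

docs = {
    "d1": "the treaty was signed in geneva".split(),
    "d2": "stocks fell as the treaty talks stalled".split(),
}
vocab = {w for tokens in docs.values() for w in tokens}
query = "treaty geneva".split()

# With a uniform prior over documents, ranking by p(query | document)
# gives the same ordering as p(document | query) by Bayes' rule.
ranking = sorted(docs, reverse=True,
                 key=lambda d: log_likelihood(docs[d], query, len(vocab)))
print(ranking)
</pre>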
<br>
Please let me know what you all think, and again, if everyone has
already settled into groups then please let me know so that I can help
out.<br>
<br>
Best,<br>
Randy<br>
<br>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>