<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Emily et al,<br>
<br>
That sounds like an interesting topic as well. I had significant
exposure to Bayesian surprise in the context of modeling saccades in a
cognitive modeling seminar I took last quarter, so I'd like to see how
it might be effectively integrated into a hierarchical model for
reading-time analysis. The Demberg and Keller model makes sense,
though I haven't yet had a chance to look at their results and analysis.<br>
<br>
To everyone who's planning on doing a final project, I'd like to
suggest that we convene for a short meeting after class tomorrow to
figure out how we want to form groups. Hopefully that will give us all
enough time to submit proposals to Roger in a timely
manner.<br>
<br>
Best,<br>
Randy<br>
<br>
Emily Morgan wrote:
<blockquote
cite="mid:5e3c89101002231756t54fbd8a3qcc7c11198a578340@mail.gmail.com"
type="cite">Hi Randy et al,<br>
As it turns out, we're mostly all behind on this... Here's an idea I
had for a project:<br>
I'm interested in eye movements during reading. There's a dataset
called the Dundee corpus with eye-tracking data for people reading
large amounts of text. There's been some work on predicting reading
times from this dataset using hierarchical models, e.g.
<a href="http://dx.doi.org/10.1016/j.cognition.2008.07.008">doi:10.1016/j.cognition.2008.07.008</a>.
One of the big questions is the role of frequency versus contextual
probability as a predictor of reading time. (Of course, both higher
frequency and higher predictability lead on average to faster reading
times, but what about a low-frequency word that's highly predictable
in context, or a high-frequency word that's unexpected given the
context?) The linked paper above begins to answer this question.
Another angle on this question (with credit to Nathaniel Smith for
suggesting it to me) is whether frequency and predictability play
different roles across different ranges of fixation durations: he
suggests that frequency plays a larger role for very short fixations,
while contextual probability plays a larger role for longer
fixations. Hierarchical models seem
like a great tool to address this question. Anyone interested in
working on it with me? (I'm a first-year student in the linguistics PhD
program, by the way.)<br>
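<br>
To make this concrete, here's a minimal sketch of the kind of
hierarchical (mixed-effects) regression I have in mind, using
statsmodels in Python. The file and column names (subject, log_freq,
surprisal, fixation_dur) are hypothetical stand-ins for however we'd
end up preprocessing the Dundee data:<br>
<pre>
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical preprocessed Dundee data: one row per fixation, with
# log word frequency, surprisal (contextual probability), and duration.
data = pd.read_csv("dundee_fixations.csv")

# Hierarchical model: fixed effects for frequency and surprisal,
# random intercepts grouped by subject to capture reader variability.
model = smf.mixedlm("fixation_dur ~ log_freq + surprisal",
                    data, groups=data["subject"])
result = model.fit()
print(result.summary())

# To get at Nathaniel's point, one could refit within quantile bins of
# fixation_dur and compare the two coefficients across bins.
</pre>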
<br>
~Emily<br>
<br>
<br>
<div class="gmail_quote">2010/2/22 Randy West <span dir="ltr"><<a
href="mailto:rdwest@ucsd.edu" target="_blank">rdwest@ucsd.edu</a>></span><br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi
All,<br>
<br>
I might be a little late to the party here, but I'm still looking
for group members for our final project. Briefly, I'm a first-year
Master's student in computer science with a focus on artificial
intelligence. I have a little bit of formal training in English syntax,
but other than that I have very little linguistics background. I've
included an outline of my idea for a final project below, but if
everyone has already formed groups then please let me know so that I
can help out with one of your projects.<br>
<br>
Here's my idea:<br>
<br>
I'd like to analyze how effective various methods we've covered in
class would be for search engine technology. The dataset could be any
collection of documents, but I was thinking of building a web-crawling
script to run over, say, Google News for a few days and building up a
database that way. The idea would be to have each model (or mixture of
models) produce, given a vector of search terms, an ordering of the
documents in the dataset, i.e., the documents ranked by p(document |
search terms). The simplest such model would be the product of unigram
likelihoods of each search term in the document, while something more
complex might use LDA to infer topic distributions over words and
documents and leverage those distributions for search.<br>
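<br>
As a very rough sketch of the unigram baseline, in Python (toy
documents, naive whitespace tokenization, and an arbitrary smoothing
constant, just to show the ranking idea):<br>
<pre>
import math
from collections import Counter

def log_likelihood(doc_tokens, query_tokens, vocab_size, alpha=0.1):
    """Log p(query | document) under an add-alpha smoothed unigram model."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in query_tokens
    )

docs = {
    "d1": "the treaty was signed in geneva".split(),
    "d2": "stocks fell as the treaty talks stalled".split(),
}
vocab = {w for tokens in docs.values() for w in tokens}
query = "treaty geneva".split()

# With a uniform prior over documents, ranking by p(query | document)
# gives the same ordering as p(document | query) by Bayes' rule.
ranking = sorted(docs, reverse=True,
                 key=lambda d: log_likelihood(docs[d], query, len(vocab)))
print(ranking)
</pre>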
<br>
Please let me know what you all think, and again, if everyone has
already settled into groups then please let me know so that I can help
out.<br>
<br>
Best,<br>
Randy<br>
<br>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>