[Lign274] Final project topic and groups
Randy West
rdwest at ucsd.edu
Mon Feb 22 18:29:05 PST 2010
Hi All,
I might be a little bit late to the punch here, but I'm still looking
for group member(s) for our final project. Briefly, I'm a first-year
Master's student in computer science with a focus on artificial
intelligence. I have a little bit of formal training in English syntax,
but other than that I have very little linguistics background. I've
included an outline of my idea for a final project below, but if
everyone has already formed groups then please let me know so that I can
help out with one of your projects.
Here's my idea:
I'd like to do an analysis of the efficacy of various methods that we've
covered in class for use in search engine technology. The dataset could
be any collection of documents, but I was thinking of building a
web-crawling script to run over, say, Google News for a few days and
build up a database that way. The idea would be to produce for each
model (or mixture of models) an ordering of documents in the dataset
based on an ordered vector of search terms, i.e. a vector of documents
ordered by p(document | search terms). The simplest such model would be
the product of the unigram likelihoods of each search term in the document,
while a more complex one might use LDA to infer topic
distributions over words and documents and leverage those
distributions for search.
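To make the simplest model concrete, here is a rough sketch of what the unigram ranking might look like. It assumes a uniform prior over documents, so that ordering by p(document | search terms) is the same as ordering by p(search terms | document), and it assumes add-alpha smoothing so that a missing term does not zero out the whole product. The names tokenize, rank_documents and alpha are placeholders for illustration, not anything from class.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; a real system would do more).
    return text.lower().split()


def rank_documents(documents, query, alpha=1.0):
    """Return document indices ordered by unigram query likelihood.

    Scores are log p(query | document) under a smoothed unigram model.
    With a uniform document prior this gives the same ordering as
    p(document | query).
    """
    tokenized = [tokenize(d) for d in documents]
    query_terms = tokenize(query)
    vocab = {w for doc in tokenized for w in doc} | set(query_terms)
    vocab_size = len(vocab)

    scored = []
    for idx, doc in enumerate(tokenized):
        counts = Counter(doc)
        total = len(doc)
        # Sum of logs equals the log of the product of unigram likelihoods.
        log_p = sum(
            math.log((counts[t] + alpha) / (total + alpha * vocab_size))
            for t in query_terms
        )
        scored.append((log_p, idx))

    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored]


if __name__ == "__main__":
    docs = [
        "the senate passed the budget bill",
        "the team won the game last night",
        "budget talks stall in the senate",
    ]
    print(rank_documents(docs, "senate budget"))
```

Working in log space avoids underflow for longer queries, and the same scoring loop could in principle be swapped for a topic-model score later without changing the ranking code.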
Please let me know what you all think, and again, if everyone has
already settled into groups then please let me know so that I can help out.
Best,
Randy