[Lign274] Final project topic and groups

Randy West rdwest at ucsd.edu
Mon Feb 22 18:29:05 PST 2010


Hi All,

I might be a little late here, but I'm still looking for group 
member(s) for our final project. Briefly, I'm a first-year master's 
student in computer science with a focus on artificial intelligence. I 
have a little formal training in English syntax, but other than that I 
have very little linguistics background. I've included an outline of my 
idea for a final project below, but if everyone has already formed 
groups then please let me know so that I can help out with one of your 
projects.

Here's my idea:

I'd like to analyze how well various methods we've covered in class 
work for search engine technology. The dataset could be any collection 
of documents, but I was thinking of building a web-crawling script to 
run over, say, Google News for a few days and build up a database that 
way. The idea would be to produce, for each model (or mixture of 
models), an ordering of the documents in the dataset given an ordered 
vector of search terms, i.e., a vector of documents ordered by 
p(document | search terms). The simplest such model would score each 
document by the product of the unigram likelihoods of the search terms 
under that document's word distribution, while something more complex 
might use LDA to determine topic distributions over words and documents 
and leverage those distributions for search.
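
To make the simplest model concrete, here is a minimal Python sketch of 
the unigram ranking. It rests on assumptions that are not part of the 
proposal itself: a uniform prior over documents (so that ordering by 
p(search terms | document) gives the same ordering as 
p(document | search terms)), add-alpha smoothing so that a missing term 
does not zero out the product, and a whitespace tokenizer. The names 
tokenize, rank_documents, and alpha are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def rank_documents(docs, query, alpha=1.0):
    """Return document indices, best match first, under a unigram model.

    Each document is scored by the sum of log smoothed unigram
    probabilities of the query terms, i.e. the log of the product of
    unigram likelihoods. Add-alpha smoothing is an assumption here.
    """
    tokenized = [tokenize(d) for d in docs]
    terms = tokenize(query)
    # Vocabulary covers every document word plus the query terms.
    vocab = {w for toks in tokenized for w in toks} | set(terms)
    v = len(vocab)

    scored = []
    for i, toks in enumerate(tokenized):
        counts = Counter(toks)
        total = len(toks)
        score = sum(
            math.log((counts[t] + alpha) / (total + alpha * v))
            for t in terms
        )
        scored.append((score, i))

    # Highest log-likelihood first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


if __name__ == "__main__":
    docs = [
        "the senate passed the budget bill",
        "the team won the final game",
        "budget talks stall in the senate again",
    ]
    print(rank_documents(docs, "senate budget"))  # [0, 2, 1]
```

Working in log space avoids underflow when queries get long. An LDA-based 
scorer could plug into the same interface by swapping out how the 
per-term probability is computed.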

Please let me know what you all think, and again, if everyone has 
already settled into groups then please let me know so that I can help out.

Best,
Randy


