[Lign274] [Fwd: Sharon Goldwater, April 9, 2:00pm: From sounds to words: Bayesian modeling of early language acquisition]
Roger Levy
rlevy at ucsd.edu
Sun Apr 8 21:33:21 PDT 2007
On Monday, April 9, at 2:00pm, Sharon Goldwater (Stanford University;
http://www.stanford.edu/~sgwater/) will give a colloquium at the UCSD
Linguistics Department, in AP&M 4301. Title and abstract follow.
***
From sounds to words: Bayesian modeling of early language acquisition
The child learning language is faced with a difficult problem: given a
set of specific linguistic observations, the learner must infer some
abstract representation that generalizes correctly to novel
observations and productions. In this talk, I argue that Bayesian
computational models provide a principled way to examine the
representations, biases, and sources of information that lead to
successful learning. As an example, I discuss my work on modeling word
segmentation. First, I discuss a computational study exploring the
effects of context on statistical word segmentation. I present two
models of segmentation developed using techniques from nonparametric
Bayesian statistics. These models make different assumptions about
what defines a word. In the first model, words are assumed to be
statistically independent (as in the experimental stimuli of Saffran
et al. 1996 and many previous computational models). In the second
model, words are defined as units that help predict following words. I
show that the context-independent model undersegments the data, while
the contextual model yields much more accurate segmentations,
outperforming previous models on this task. This difference suggests
the need to consider contextual effects in infant word segmentation.
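[Editor's note: the abstract does not spell out the models' form, but one
common way to set up such nonparametric Bayesian word models is with
Dirichlet process priors, whose predictive probabilities take a
Chinese-restaurant-process form. The Python sketch below is only an
illustration under that assumption -- the function names, the toy
50-phoneme base distribution, and the simplified bigram backoff are not
from the talk -- and it shows how a context-independent and a contextual
model differ in what they condition on.]

from collections import Counter

def phoneme_base_prob(word, phone_prob=1.0 / 50, stop_prob=0.5):
    """Toy base distribution over phoneme strings: symbols drawn
    uniformly from an assumed 50-phoneme inventory, with a geometric
    distribution over word lengths."""
    n = len(word)
    return stop_prob * (1 - stop_prob) ** (n - 1) * phone_prob ** n

def unigram_word_prob(word, counts, total, alpha, base_prob):
    """Predictive probability of `word` under a Dirichlet-process
    unigram model (Chinese-restaurant-process form): a previously seen
    word is reused in proportion to its count, and a (possibly novel)
    word is generated from the base distribution with weight alpha."""
    return (counts[word] + alpha * base_prob(word)) / (total + alpha)

def bigram_word_prob(word, prev, bigram_counts, prev_totals, beta,
                     counts, total, alpha, base_prob):
    """Contextual (bigram) model: the probability of `word` given the
    previous word backs off to the unigram distribution above.  The
    full hierarchical model is more involved; this is a simplification."""
    backoff = unigram_word_prob(word, counts, total, alpha, base_prob)
    return ((bigram_counts[(prev, word)] + beta * backoff)
            / (prev_totals[prev] + beta))

# Tiny illustration with made-up counts from a partially segmented corpus.
counts = Counter({"the": 5, "doggie": 2})
bigram_counts = Counter({("the", "doggie"): 2})
prev_totals = Counter({"the": 5})
total = sum(counts.values())
p_uni = unigram_word_prob("doggie", counts, total,
                          alpha=20.0, base_prob=phoneme_base_prob)
p_bi = bigram_word_prob("doggie", "the", bigram_counts, prev_totals,
                        beta=10.0, counts=counts, total=total,
                        alpha=20.0, base_prob=phoneme_base_prob)
print(p_uni, p_bi)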
Simulations like these can provide insight into the usefulness of
different sources of information during learning, but do not directly
address the question of whether human generalizations are consistent
with the models developed. I discuss another project that addresses
this question by examining human performance in a word segmentation
experiment where utterance length is varied, and comparing results to
the predictions of a number of different word segmentation models. The
Bayesian model described above correlates better with the human data
than any other model tested, suggesting that this model captures
important properties of human segmentation.
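[Editor's note: as a rough sketch of the kind of model-to-human
comparison described here -- with purely hypothetical numbers, not
results from the experiment -- one could correlate per-condition human
segmentation accuracy with a model's predictions as follows.]

from statistics import correlation  # Python 3.10+

# Hypothetical per-condition values, e.g. one entry per utterance-length
# condition; these numbers are illustrative, not data from the study.
human_accuracy = [0.82, 0.74, 0.69, 0.61]
model_prediction = [0.79, 0.71, 0.64, 0.60]

print(correlation(human_accuracy, model_prediction))  # Pearson r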
If time allows, I will also discuss some work in progress in which I
am examining the acquisition of syntactic categories using similar
Bayesian modeling techniques.
***
For further information about the Linguistics Department colloquium
series, including the schedule of future events, please visit
http://ling.ucsd.edu/events/colloquia.html