<OT> New Posting: ROA-811

roa at ruccs.rutgers.edu roa at ruccs.rutgers.edu
Sat Mar 18 10:35:03 PST 2006


ROA 811-0306

Learning underlying forms by searching restricted lexical subspaces

Nazarre Merchant <nazarrem at eden.rutgers.edu>
Bruce Tesar <tesar at rutgers.edu>

Direct link: http://roa.rutgers.edu/view.php3?roa=811


Abstract:
Two intertwined tasks that face a learner are learning a
lexicon and learning a constraint ranking.  Given a set
of surface forms, it is not possible in general to determine
what ranking produces the given forms without knowing what
the underlying forms are and similarly the underlying forms
are not discernable without information about the ranking
that produced the given surface forms (Albright and Hayes
2002; Hale and Reiss 1997; Tesar and Smolensky 2000; Tesar
et al. 2003).  Exhaustive searching of the possible underlying
forms and constraint rankings is untenable.  There simply
are too many possible combinations of lexica and constraint
rankings; the search space is too large.  Because of this
the learner must adopt some strategy to search the space
non-exhaustively.


In this paper we propose an algorithm whereby the learner
attends to contrast pairs.  Contrast pairs are pairs of
surface forms that contrast in some surface feature and
differ in exactly one morpheme, as described in Tesar 2004.
Because these pairs contrast it follows that they must differ
underlyingly in some featural specification; the surface
contrast could not obtain if the pair had identical underlying
specifications.  Furthermore, because they differ in only
one morpheme the difference in underlying specification
of the words must be a consequence of different underlying
specifications for the differing morphemes.


The two English words ‘cats’ and ‘cads’ form a contrast
pair.  Here the roots ‘cat’ and ‘cad’ are in the morphological
environment plural.  Because the plural morpheme has one
underlying specification the surface contrast must come
about from differing underlying specifications of the morphemes
‘cat’ and ‘cad’.


(1)	kæt + s ~ kæd + z


In this algorithm, for each contrast pair, the learner tests
all possible underlying forms for consistency.  The learner
then sets the underlying values of those features that have
the same value for all consistent underlying forms.  The
learner processes contrast pairs in order by the number
of undetermined features.  This way the learner processes
pairs with the fewest undetermined features (the ones requiring
the least computational effort) first.

 
There are two main ideas captured in the algorithm presented
here.  First, relevant information about the lexicon can
be extracted by only considering contrast pairs.  At no
time does the learner consider more than two surface forms
simultaneously, nor does it consider single forms in isolation,
as does the Surgery Learning Algorithm (henceforth SLA)
(Tesar et al 2003).  By using contrast pairs, the algorithm
can be shown to generalize the capabilities of the CA procedure,
being able to set all features that the CA procedure would
set and others as well.


The second main idea is that processing with contrast pairs
permits a meaningful reduction in the computational effort
required of the learner, relative to a simultaneous evaluation
of all the data.  For each contrast pair, the learner is
only considering combinations of values for the unset features
of the morphemes in that pair.  At no time does the learner
consider all combinations of unset features of all morphemes
simultaneously.  It is in this way that the learner avoids
the combinatorial nightmare of exhaustive search.


The results reported here support the claim that using contrast
pairs in learning provides a good trade-off between information
and computational effort.  Contrast pairs provide enough
information about underlying forms to allow the learner
to ultimately learn the correct grammar without requiring
excessive computational effort to process.

Comments: To appear in CLS 41.
Keywords: learnability, contrast, phonology
Areas: Learnability,Phonology
Type: Conference Proceedings Chapter

Direct link: http://roa.rutgers.edu/view.php3?roa=811


More information about the Optimal mailing list