[R-lang] Re: Determining whether a model fits a second data set

Maria Wolters maria.wolters@ed.ac.uk
Wed Oct 13 07:13:00 PDT 2010


Coming from a machine learning background, I would definitely use the trained model to predict the outcomes on the unseen data sets. This is how we validate classifiers, and in the end regression, no matter how fancy, can be viewed in this way. I would prefer the model in which the predictors have been normed, and I would norm the predictors from the new data sets using the mean and standard deviation of the original data.
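A minimal sketch of that norming step, assuming a simple linear model with hypothetical predictors x1 and x2 and outcome y (all names and the simulated data are illustrative, not from the original analysis):

```r
set.seed(1)
orig <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
orig$y <- 1 + 2 * orig$x1 - orig$x2 + rnorm(100)
new  <- data.frame(x1 = rnorm(50, mean = 0.5), x2 = rnorm(50, sd = 2))
new$y <- 1 + 2 * new$x1 - new$x2 + rnorm(50)

## Norm the original predictors, keeping their means and SDs
mu  <- sapply(orig[c("x1", "x2")], mean)
sdv <- sapply(orig[c("x1", "x2")], sd)
orig_n <- orig
orig_n[c("x1", "x2")] <- scale(orig[c("x1", "x2")], center = mu, scale = sdv)

fit <- lm(y ~ x1 + x2, data = orig_n)

## Norm the NEW predictors with the ORIGINAL means/SDs, not their own
new_n <- new
new_n[c("x1", "x2")] <- scale(new[c("x1", "x2")], center = mu, scale = sdv)

pred <- predict(fit, newdata = new_n)
rmse <- sqrt(mean((new$y - pred)^2))
```

The key point is that `mu` and `sdv` come from the training data only, so the new observations are expressed on exactly the scale the model was fitted on.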

Having said that, this is but one tool, to be used alongside detailed error analysis (as Daniel suggests), simulations, and the usual model diagnostics (residuals).
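On the question Daniel raises below about a per-observation measure of fit: one option, sketched here with illustrative simulated data, is the average deviance per observation, which is comparable across data sets of different sizes:

```r
set.seed(2)
orig <- data.frame(x = rnorm(100))
orig$y <- rbinom(100, 1, plogis(orig$x))
new <- data.frame(x = rnorm(60))
new$y <- rbinom(60, 1, plogis(0.5 * new$x))

fit <- glm(y ~ x, data = orig, family = binomial)

## Average deviance per observation: -2 * log-likelihood, divided by n
dev_per_obs <- function(y, p) mean(-2 * (y * log(p) + (1 - y) * log(1 - p)))

p_orig <- predict(fit, type = "response")
p_new  <- predict(fit, newdata = new, type = "response")

dev_in  <- dev_per_obs(orig$y, p_orig)  # in-sample
dev_out <- dev_per_obs(new$y, p_new)    # out-of-sample
```

A markedly higher out-of-sample value would suggest the second data set fits the model worse, though where to draw the line is a judgment call.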

Cheers

Maria

On 13 Oct 2010, at 14:05, Daniel Ezra Johnson wrote:

> That's a great question. I'm jumping in without having a great answer.
> 
> 1) Does anyone know if you can compute something like deviance per
> observation, or something else that would be comparable across models,
> maybe only at the level of "this data fits the model better than that
> data"?
> 
> 2) I think simulations would be very useful for answering all the
> questions you raise here:
> 
>> I'd like to
>> be able to comment on whether the relative contributions of the
>> predictors were an artifact of the data set, the particular children
>> involved, or hold true more generally.
> 
> I could talk more off-list about constructing such simulations, but
> it's not very hard.
> 
> Dan
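One way to construct the kind of simulation Dan mentions, sketched with illustrative data: simulate new outcomes from the fitted model, refit on each simulated response, and look at how much the coefficients vary. Stable coefficients across simulations suggest the relative contributions of the predictors are not an artifact of one particular sample.

```r
set.seed(3)
orig <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
orig$y <- 1 + 2 * orig$x1 - orig$x2 + rnorm(80)
fit <- lm(y ~ x1 + x2, data = orig)

## Parametric bootstrap: simulate responses from the fitted model
sims <- simulate(fit, nsim = 200)

## Refit the model to each simulated response and collect coefficients
coefs <- sapply(seq_len(200), function(i) {
  d <- orig
  d$y <- sims[[i]]
  coef(lm(y ~ x1 + x2, data = d))
})

## Sampling variability of each coefficient (intercept, x1, x2)
coef_sd <- apply(coefs, 1, sd)
```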


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




More information about the ling-r-lang-L mailing list