Comparing assessments on a subjective scale across countries or socio-economic groups is often hampered by differences in response scales between groups. Anchoring vignettes help to correct for such differences, either in parametric models (the compound hierarchical ordered probit (CHOPIT) model and its extensions) or non-parametrically, by comparing rankings of vignette ratings and self-assessments across groups. We construct specification tests of parametric models by comparing non-parametric rankings with the rankings obtained from the parametric estimates. Applied to six domains of health, the test always rejects the standard CHOPIT model, but an extended CHOPIT model performs better. This implies a need for more flexible (parametric or semiparametric) models than the standard CHOPIT model.