Accounting for Multidimensionality in Item Responses in Patient-Centered and Patient Reported Outcomes Measurement


Clement Stone
Brian Leventhal


Background: More robust and rigorous psychometric models, such as Item Response Theory (IRT) models, have been advocated for measuring health sciences outcomes. However, applying IRT models to health assessments poses challenges. In particular, item responses on measures of health-related outcomes are typically determined by multiple traits or dimensions. This multidimensionality can arise from several sources: a multidimensional structure designed into the instrument, heterogeneity in item content, differential item functioning in subpopulations, and individual differences in response styles to survey items and rating scales.

Objectives: This paper discusses extensions to IRT models that can account for different types of multidimensionality, as well as the use of Bayesian methods in person-centered medicine research.

Methods: The SAS PROC MCMC procedure for Bayesian analysis is used to illustrate the estimation and analysis of IRT models applied to health-related assessments.

Results: Implementing a model in PROC MCMC involves a straightforward translation of the response probability model, along with specification of the model parameters and their prior distributions.

Conclusions: Bayesian analysis of multidimensional IRT models is thereby made more accessible to researchers and scale developers measuring health sciences outcomes in person-centered medicine research.


Regular Articles

