Glossary
Classical Test Theory (CTT)
A scale development framework that assumes a participant's observed responses or overall score on a measure are a linear combination of their true ability plus random error. The goal in CTT is to get as close to the true score as possible by minimizing noise (De Ayala, 2013; McCoach et al., 2013; Revelle, 2009).
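The true-score assumption above is conventionally written as follows (standard CTT notation, not drawn from the cited sources):

```latex
% Classical test theory: observed score = true score + random error
X = T + E
% E is assumed to have mean zero and to be uncorrelated with T,
% so reliability can be defined as the proportion of observed-score
% variance attributable to true-score variance:
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}
```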
Construct
A construct refers to the unobserved (i.e., latent) attitude, cognition, or attribute that is the target of the study (Bollen & Hoyle, 2012; Kline, 2023). Unobserved (or latent) in this context simply means that the construct exists in the mind of the participant and cannot easily be directly observed. The term construct can be used interchangeably with other terms such as domain and latent variable (Field et al., 2012).
Concurrent Validity
Measures the degree to which performance on the current scale predicts performance on a criterion (gold standard) measurement (Cronbach & Meehl, 1955). Typically, the two measures are administered at the same time or consecutively (hence “concurrent”). It is common, however, that no gold standard measure exists, making evaluation of concurrent validity impossible (Boateng et al., 2018); this is especially true in human-robot interaction.
Construct Validity
Construct validity refers to the extent to which the scale measures what it was developed to measure and how much it is associated with other factors within the domain (Boateng et al., 2018; Borsboom et al., 2004).
Convergent Validity
Convergent validity refers to how well the new scale correlates with other variables that are designed to measure similar constructs (Campbell & Fiske, 1959).
Criterion Validity
Criterion validity refers to the degree to which there is a relationship between the construct on the current scale and the same construct on another similar measure, or in another context, that is of interest to the researcher (DeVellis & Thorpe, 2021; Raykov & Marcoulides, 2011).
Custom Scale
Any scale that has not been validated.
Dimension
A psychological variable that represents a component of the construct that is captured by the items within a scale (Furr, 2011). This term is also used interchangeably with factor.
Discriminant Validity
Discriminant validity refers to the extent to which the scale differs from other unrelated constructs (Raykov & Marcoulides, 2011). Discriminant validity is assessed by analyzing correlations between the measure of interest and other measures that do not measure the same domain or concept (Boateng et al., 2018), where weaker correlations are expected.
Domain
A domain refers to the unobserved (i.e., latent) attitude, cognition, or attribute that is the target of the study (Bollen & Hoyle, 2012; Kline, 2023). It is often used interchangeably with “construct” (Field et al., 2012).
Factor
A psychological variable that represents a component of the construct that is captured by the items within a scale (Furr, 2011). This term is also used interchangeably with dimension.
Item
An item is an individual question, directive, or statement that makes up part of a scale. Each item within a scale is intended to capture the construct (i.e., attitude or behavior) either in part or in full (Zumbo et al., 2002).
Item Response Theory (IRT)
Item response theory uses an item-level approach, modeling responses to individual items rather than total scores, to determine item and person fit within a scale (De Ayala, 2013; McCoach et al., 2013; Revelle, 2009).
Latent variable
The unobservable behavior, attitude, or attribute that is being measured (Bollen & Hoyle, 2012; Kline, 2023). This term is often used interchangeably with construct and domain (Field et al., 2012).
Predictive Validity
Predictive validity is a validation method that measures the degree to which performance on the current scale predicts performance on another measure administered at a later time (Anastasi, 1985).
Psychometrics or Psychometric theory
The scientific study of testing, measurement, and assessment in the social and behavioral sciences (Revelle, 2009).
Rasch
The Rasch model is one of the most common IRT models. It prioritizes invariance in measurement (Wind & Hua, 2021) and can be thought of as a theory for how the data should be structured, which can then be used to identify deviations in the observed data. In other words, the Rasch model is a process for fitting data to a model rather than fitting a model to data (Aryadoust et al., 2021; Wind & Hua, 2021).
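For reference, the dichotomous Rasch model (standard notation, not drawn from the cited sources) gives the probability that person n endorses or answers item i correctly as:

```latex
% Dichotomous Rasch model: person ability \theta_n, item difficulty b_i
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}
```

Misfit is then identified by comparing observed response patterns against the expectations this model generates.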
Reliability
Reliability refers to the principle that a measurement produces similar results under similar conditions (Menon et al., 2025).
Scale
The term “scale” refers to any instrument that measures a behavior, attitude, or other latent construct that is not directly observable (DeVellis & Thorpe, 2021).
Subscale
Subscales refer to complete sets of items that load onto one factor in an existing validated scale. For example, the competence subscale in the RoSAS consists of six items that are related to the intelligence or ability of the robot (Carpinella et al., 2017).
References
Anastasi, A. (1985). Psychological testing: Basic concepts and common misconceptions. In The G. Stanley Hall lecture series, Vol. 5. (pp. 87–120). American Psychological Association. https://doi.org/10.1037/10052-003
Aryadoust, V., Ng, L. Y., & Sayama, H. (2021). A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research. Language Testing, 38(1), 6–40. https://doi.org/10.1177/0265532220927487
Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health, 6. https://doi.org/10.3389/fpubh.2018.00149
Bollen, K. A., & Hoyle, R. H. (2012). Latent variables in structural equation modeling. In Handbook of structural equation modeling (pp. 56–67). The Guilford Press.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Carpinella, C. M., Wyman, A. B., Perez, M. A., & Stroessner, S. J. (2017). The Robotic Social Attributes Scale (RoSAS): Development and validation. Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’17, 254–262. https://doi.org/10.1145/2909824.3020208
De Ayala, R. J. (2013). The Theory and Practice of Item Response Theory. Guilford Publications. https://www.guilford.com/books/The-Theory-and-Practice-of-Item-Response-Theory/R-de-Ayala/9781462547753
DeVellis, R. F., & Thorpe, C. T. (2021). Scale Development: Theory and Applications. SAGE Publications. https://collegepublishing.sagepub.com/products/scale-development-5-269114
Field, A., Field, Z., & Miles, J. (2012). Discovering Statistics Using R. SAGE Publications. https://collegepublishing.sagepub.com/products/discovering-statistics-using-r-1-236067
Furr, R. M. (2011). Scale Construction and Psychometrics for Social and Personality Psychology. SAGE Publications. https://doi.org/10.4135/9781446287866
Kline, R. B. (2023). Principles and Practice of Structural Equation Modeling. Guilford Publications. https://www.guilford.com/books/Principles-and-Practice-of-Structural-Equation-Modeling/Rex-Kline/9781462551910
McCoach, D. B., Gable, R. K., & Madura, J. (2013). Instrument Development in the Affective Domain. Springer. https://doi.org/10.1007/978-1-4614-7135-6
Menon, V., Grover, S., Gupta, S., Indu, P., Chacko, D., & Vidhukumar, K. (2025). A primer on reliability testing of a rating scale. Indian Journal of Psychiatry, 67(7), 725–729. https://doi.org/10.4103/indianjpsychiatry_584_25
Raykov, T., & Marcoulides, G. A. (2011). Introduction to Psychometric Theory. Routledge. https://www.routledge.com/Introduction-to-Psychometric-Theory/Raykov-Marcoulides/p/book/9780415878227
Revelle, W. (2009). An introduction to psychometric theory with applications in R. https://personality-project.org/r/book/
Zumbo, B., Gelin, M., & Hubley, A. (2002). The construction and use of psychological tests and measures. In Encyclopedia of Life Support Systems (EOLSS). Eolss Publishers.