Classical Test Theory (CTT)

A scale development framework that assumes a participant's responses, or overall score on a measure, are a linear combination of their true score plus random error. The goal in CTT is to get as close to the true score as possible by minimizing this error (De Ayala, 2013; McCoach et al., 2013; Revelle, 2009).
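Under CTT, an observed score X decomposes as X = T + E, where T is the true score and E is mean-zero random error. The following sketch (all numbers hypothetical) illustrates why averaging many parallel measurements pushes the observed mean toward the true score:

```python
import random
import statistics

# CTT decomposition: observed score X = T (true score) + E (random error).
# Because E has mean zero, averaging repeated parallel measurements of the
# same person cancels the error, so the mean approaches the true score.

random.seed(42)

TRUE_SCORE = 25.0   # the person's (unobservable) true score, hypothetical
ERROR_SD = 4.0      # standard deviation of the random measurement error

def observe(true_score: float, error_sd: float) -> float:
    """One measurement occasion: true score plus mean-zero random error."""
    return true_score + random.gauss(0.0, error_sd)

scores = [observe(TRUE_SCORE, ERROR_SD) for _ in range(10_000)]
estimate = statistics.mean(scores)  # close to TRUE_SCORE
```

Any single `observe` call can miss by several points, yet the long-run average is stable, which is the intuition behind CTT's emphasis on minimizing noise.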

Construct

A construct refers to the unobserved (i.e., latent) attitude, cognition, or attribute that is the target of the study (Bollen & Hoyle, 2012; Kline, 2023). Unobserved (or latent) in this context simply refers to a type of construct that exists in the mind of the participant and cannot easily be directly observed. The term construct can be used interchangeably with other terms such as domain and latent variable (Field et al., 2012).

Concurrent Validity

Measures the degree to which performance on the current scale predicts performance on a criterion (gold standard) measurement (Cronbach & Meehl, 1955). Typically, the two measures are administered at the same time or consecutively (hence “concurrent”). Often, however, no gold standard measure exists, making evaluation of concurrent validity impossible (Boateng et al., 2018); this is especially true in human-robot interaction.

Construct Validity

Construct validity refers to the extent to which the scale measures what it was developed to measure and how much it is associated with other factors within the domain (Boateng et al., 2018; Borsboom et al., 2004).

Convergent Validity

Convergent validity refers to how well the new scale correlates with other variables that are designed to measure similar constructs (Campbell & Fiske, 1959).

Criterion Validity

Criterion validity refers to the degree to which there is a relationship between the construct on the current scale and the same construct on another similar measure, or in another context of interest to the researcher (DeVellis & Thorpe, 2021; Raykov & Marcoulides, 2011).

Custom Scale

Any scale that has not been validated.

Dimension

A psychological variable that represents a component of the construct that is captured by the items within a scale (Furr, 2011). This term is also used interchangeably with factor.

Discriminant Validity

Discriminant validity refers to the extent to which the scale differs from other, unrelated constructs (Raykov & Marcoulides, 2011). Discriminant validity is assessed by analyzing correlations between the measure of interest and measures of unrelated domains or concepts (Boateng et al., 2018), where weak correlations are expected.
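Both convergent and discriminant validity are commonly summarized with correlation coefficients: strong correlations with similar measures support convergent validity, weak correlations with unrelated measures support discriminant validity. A minimal sketch with entirely hypothetical scores:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical participant scores on three measures.
new_scale       = [3.2, 4.1, 2.8, 4.5, 3.9, 2.5]  # the scale under development
similar_scale   = [3.0, 4.3, 2.6, 4.4, 4.0, 2.7]  # measures a similar construct
unrelated_scale = [4.0, 3.5, 2.7, 3.1, 2.2, 3.2]  # measures something unrelated

r_convergent = pearson(new_scale, similar_scale)     # strong, supports convergent validity
r_discriminant = pearson(new_scale, unrelated_scale)  # near zero, supports discriminant validity
```

In practice these correlations are interpreted jointly, for example in a multitrait-multimethod matrix (Campbell & Fiske, 1959).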

Domain

A domain refers to the unobserved (i.e., latent) attitude, cognition, or attribute that is the target of the study (Bollen & Hoyle, 2012; Kline, 2023). It is often used interchangeably with “construct” (Field et al., 2012).

Factor

A psychological variable that represents a component of the construct that is captured by the items within a scale (Furr, 2011). This term is also used interchangeably with dimension.

Item

An item refers to the direct questions, directives, or statements that make up a scale. Each item within a scale is intended to capture the construct (i.e., attitude or behavior) either in part or in full (Zumbo et al., 2002).

Item Response Theory (IRT)

Item response theory is a scale development framework that models responses at the item level, relating the probability of a given response to person ability and item characteristics, and using that model to determine item and person fit within the scale (De Ayala, 2013; McCoach et al., 2013; Revelle, 2009).

Latent variable

The unobservable behavior, attitude, or attribute that is being measured (Bollen & Hoyle, 2012; Kline, 2023). This term is often used interchangeably with construct and domain (Field et al., 2012).

Predictive Validity

Predictive validity is a type of validation method. It measures the degree to which performance on the current scale predicts performance on another measure administered at a later time (Anastasi, 1985).

Psychometrics or Psychometric theory

The scientific study of testing, measurement, and assessment in the social and behavioral sciences (Revelle, 2009).

Rasch

One of the more common IRT models is the Rasch model. The Rasch model prioritizes invariance in measurement (Wind & Hua, 2021) and can be thought of as a theory of how the data should be structured, which can then be used to identify deviations in the observed data. In other words, Rasch measurement is a process of fitting data to the model (Aryadoust et al., 2021; Wind & Hua, 2021).
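For a dichotomous (e.g., correct/incorrect) item, the Rasch model gives the probability of a correct response as a logistic function of the difference between person ability (theta) and item difficulty (b); only that difference matters, which is one expression of the model's emphasis on invariant measurement. A minimal sketch, with hypothetical parameter values:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """P(correct | theta, b) = exp(theta - b) / (1 + exp(theta - b))
    under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# it rises toward 1 as ability exceeds difficulty and falls toward 0
# as difficulty exceeds ability. Values below are hypothetical.
p_matched = rasch_probability(0.0, 0.0)   # ability == difficulty -> 0.5
p_easy    = rasch_probability(2.0, -1.0)  # able person, easy item -> high
p_hard    = rasch_probability(-1.0, 2.0)  # less able person, hard item -> low
```

Fit analysis then compares these model-implied probabilities against the observed response patterns to flag misfitting items or persons.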

Reliability

Reliability refers to the principle that a measurement produces similar results under similar conditions (Menon et al., 2025).
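One widely used reliability index for multi-item scales is Cronbach's alpha, which summarizes internal consistency from the item variances and the variance of the total score. A minimal sketch using entirely hypothetical response data:

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for a participants-by-items response matrix:
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score))."""
    k = len(rows[0])
    item_vars = [statistics.variance([row[i] for row in rows]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses: 5 participants x 4 items.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)  # high: items vary together across people
```

Alpha is only one reliability index; test-retest and inter-rater approaches address other senses of "similar results under similar conditions" (Menon et al., 2025).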

Scale

The term “scale” refers to any instrument that measures a behavior, attitude, or other latent construct that isn’t directly observable (DeVellis & Thorpe, 2021).

Subscale

Subscales refer to complete sets of items that load onto one factor in an existing validated scale. For example, the competence subscale in the RoSAS consists of six items that are related to the intelligence or ability of the robot (Carpinella et al., 2017).

References

Anastasi, A. (1985). Psychological testing: Basic concepts and common misconceptions. In The G. Stanley Hall lecture series, Vol. 5. (pp. 87–120). American Psychological Association. https://doi.org/10.1037/10052-003

Aryadoust, V., Ng, L. Y., & Sayama, H. (2021). A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research. Language Testing, 38(1), 6–40. https://doi.org/10.1177/0265532220927487

Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Frontiers in Public Health, 6. https://doi.org/10.3389/fpubh.2018.00149

Bollen, K. A., & Hoyle, R. H. (2012). Latent variables in structural equation modeling. In Handbook of structural equation modeling (pp. 56–67). The Guilford Press.

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061

Carpinella, C. M., Wyman, A. B., Perez, M. A., & Stroessner, S. J. (2017). The Robotic Social Attributes Scale (RoSAS): Development and Validation. Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’17, 254–262. https://doi.org/10.1145/2909824.3020208

De Ayala, R. J. (2013). The Theory and Practice of Item Response Theory. Guilford Publications. https://www.guilford.com/books/The-Theory-and-Practice-of-Item-Response-Theory/R-de-Ayala/9781462547753

DeVellis, R. F., & Thorpe, C. T. (2021). Scale Development: Theory and Applications. SAGE Publications. https://collegepublishing.sagepub.com/products/scale-development-5-269114

Field, A., Field, Z., & Miles, J. (2012). Discovering Statistics Using R. SAGE Publications. https://collegepublishing.sagepub.com/products/discovering-statistics-using-r-1-236067

Furr, R. M. (2011). Scale Construction and Psychometrics for Social and Personality Psychology. SAGE Publications. https://doi.org/10.4135/9781446287866

Kline, R. B. (2023). Principles and Practice of Structural Equation Modeling. Guilford Publications. https://www.guilford.com/books/Principles-and-Practice-of-Structural-Equation-Modeling/Rex-Kline/9781462551910

McCoach, D. B., Gable, R. K., & Madura, J. (2013). Instrument Development in the Affective Domain. Springer. https://doi.org/10.1007/978-1-4614-7135-6

Menon, V., Grover, S., Gupta, S., Indu, P., Chacko, D., & Vidhukumar, K. (2025). A primer on reliability testing of a rating scale. Indian Journal of Psychiatry, 67(7), 725–729. https://doi.org/10.4103/indianjpsychiatry_584_25

Raykov, T., & Marcoulides, G. A. (2011). Introduction to Psychometric Theory. Routledge. https://www.routledge.com/Introduction-to-Psychometric-Theory/Raykov-Marcoulides/p/book/9780415878227

Revelle, W. (2009). An introduction to psychometric theory with applications in R. https://personality-project.org/r/book/

Zumbo, B., Gelin, M., & Hubley, A. (2002). The Construction and Use of Psychological Tests and Measures. In Encyclopedia of Life Support Systems (EOLSS). EOLSS Publishers.