Guideline
How the guideline works
The guideline was developed to help you, the reader, pick the appropriate scale, evaluate whether it was developed adequately, and determine its validity. The guideline includes 13 high-level questions to assist you in this process. These guideline questions should be applied to the scale development or validation paper. If the scale does not meet the acceptable guideline criteria (as outlined below or in the paper), then it does not get the point for that guideline question. The maximum number of points a scale can achieve is 13/13, or 100%. Below, we provide a high-level overview of the information contained in the guideline.
The guideline questions are grouped into the three stages of the scale development and validation process: item development, scale development, and scale evaluation.
Stage 1: Item development
Stage summary: Item development refers to the process by which the items of the scale are created. Each item is intended to capture the construct of interest either in part or in full. There are three main questions the reader can ask of the scale at this stage:
- Is the construct that the scale is attempting to measure clearly defined somewhere in the paper? The reader should look for a clear and precise definition of the construct. Scale developers can develop construct definitions using a theory-driven approach when an agreed-upon theoretical framework of the construct exists, or a data-driven approach when it does not.
- Is the item generation process discussed (e.g., via a literature review, the Delphi method, or crowd-sourcing)? The reader should look for any information (e.g., a description of the procedure, pilot study reports, or preliminary analyses) about how the items were generated.
- Does the final version of the items capture the construct as it has been defined by the authors? The reader should ensure that the items are related to the reported definition of the construct and also that they are clear and unambiguous.
Stage 2: Scale development
Stage summary: Though there are many different methods for developing a scale (e.g., classical test theory or item response theory), there are some components of the process that are consistent across methods. This section of the guideline includes seven questions.
- Did the scale developers report the full initial set of items? The reader should ensure that the developers of the scale made the full initial set of items publicly available, either by reporting them in the main text of the paper, in an appendix, or in an online repository.
- Does the test sample size meet the 10:1 minimum criterion? The reader should ensure that sample sizes for scale development studies follow the 10:1 (participants to initial number of items) rule, though a larger sample is considered a positive feature. This rule pertains to the initial set of items, not the final version of the scale.
- Did the scale developers perform an EFA, PCA, Rasch analysis, or similar test to determine the item-to-factor relationship? The reader should determine whether the scale developers reported using at least one scale development method (such as EFA, PCA, or Rasch analysis) in their paper.
- Did the scale developers describe how they determined the number of factors? The reader should verify that the scale developers reported how they determined or verified the number of factors that exist within the construct.
- Did the scale developers provide factor loadings (EFA/CFA) or item fits (Rasch) for all items? The reader should look for quantitative values that indicate how the items in the scale relate to the construct of interest. These values can take the form of factor loadings or communalities (if the scale development process used an EFA or CFA) or infit/outfit values (if Rasch analysis was used).
- Is there a description of the item removal process (e.g., using infit/outfit, factor loading minimum values, or cross-loading values)? The reader should determine whether items were removed from the final version of the scale. If items were removed, the reason (e.g., lack of fit or redundancy) should be explicitly mentioned; quantitative criteria should also be reported when possible.
- Did the scale developers report the complete list of items included in the final version of the scale? The reader should look for the final version of the scale in the main text of the publication, in an appendix, or in an online repository.
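Two of the quantitative checks above — the 10:1 sample-size rule and eigenvalue-based factor retention (the Kaiser criterion, one common heuristic for determining the number of factors) — can be sketched in Python. The numbers and data below are entirely hypothetical, and this is only an illustration with numpy, not a substitute for a full EFA or Rasch analysis:

```python
import numpy as np

# Hypothetical example: 42 initial items administered to 500 participants.
n_items, n_participants = 42, 500

# The 10:1 rule applies to the INITIAL item pool, not the final scale.
meets_10_to_1 = n_participants >= 10 * n_items  # 500 >= 420

# Simulated responses on a 5-point scale (rows = participants, columns = items).
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(n_participants, n_items)).astype(float)

# Eigenvalues of the item correlation matrix; the Kaiser criterion
# retains factors whose eigenvalue exceeds 1.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))
n_factors_kaiser = int(np.sum(eigenvalues > 1))
```

In practice, scree plots or parallel analysis are often preferred over the bare Kaiser criterion; the point here is only that the retention rule a paper reports should be explicit and reproducible.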
Stage 3: Scale evaluation
Stage summary: Scale evaluation occurs after the original scale is created and attempts to answer the following three questions.
- Did the scale developers include a factor structure test (e.g., a second EFA, CFA, DIF, a test of unidimensionality if using Rasch, or similar)? The reader should check whether there is a test of the factor structure. A confirmatory factor analysis on a new sample or Differential Item Functioning (DIF; Rasch) are common approaches.
- Was a measure of reliability (e.g., Cronbach's α, McDonald's ωt or ωh, or Tarkkonen's rho) reported? The reader should look for some test of the scale's reliability in the paper. This can be completed using metrics such as McDonald's ωt or ωh in addition to Cronbach's coefficient α. A reasonable minimum threshold for reliability is ≥ 0.80.
- Was a test of validity (e.g., predictive, concurrent, convergent, discriminant) reported? The reader should look for assessments of the scale's validity. Typically, this includes comparing the scale of interest to other measures in the field to see whether the expected relationships exist.
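To make the reliability and convergent-validity checks concrete, here is a minimal numpy sketch on simulated, hypothetical data. Cronbach's α is computed directly from its definition: α = k/(k−1) × (1 − Σ item variances / variance of the total score):

```python
import numpy as np

# Simulated data: 200 respondents, 6 items sharing a common "construct" signal.
rng = np.random.default_rng(1)
common = rng.normal(size=(200, 1))
scores = common + 0.5 * rng.normal(size=(200, 6))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Convergent validity sketch: correlate the scale total with a
# hypothetical established measure of the same construct.
established = common[:, 0] + 0.8 * rng.normal(size=200)
r = np.corrcoef(scores.sum(axis=1), established)[0, 1]
```

Because the simulated items share a strong common signal, α here comfortably exceeds the 0.80 heuristic; with real data, the reader should check which reliability coefficient the authors report and how it was computed.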
Note: the guideline includes recommendations for minimum acceptable criteria. Where possible, we provide citations for recommendations with exact values. These values can and should be interpreted as heuristics. We do not encourage you to discard a scale simply because it does not meet a specific threshold suggested here.