When developing a new questionnaire, scale, or test, researchers typically ask a panel of subject-matter experts to rate each candidate item for relevance to the construct being measured. The expert ratings are then summarized into content validity indices that quantify how well the items represent the intended construct.
The contentValidity package implements the standard set of content validity indices used in nursing, education, psychology, and health sciences research: the item-level content validity index (I-CVI) and its chance-corrected modified kappa, Aiken's V, the scale-level indices S-CVI/Ave and S-CVI/UA, and Lawshe's content validity ratio (CVR) with the corresponding critical values.
The package ships with cvi_example, a simulated set of
expert ratings for a 10-item depression screening instrument, with 6
expert raters using a 4-point relevance scale (1 = not relevant, 4 =
highly relevant).
The simplest place to start is icvi(), which gives the
proportion of experts rating each item as 3 or 4:
```r
icvi(cvi_example)
#>     item1     item2     item3     item4     item5     item6     item7     item8
#> 1.0000000 1.0000000 1.0000000 0.8333333 0.6666667 1.0000000 0.8333333 1.0000000
#>     item9    item10
#> 0.5000000 1.0000000
```

According to Polit and Beck (2006), an I-CVI ≥ 0.78 is considered excellent with six or more experts. Items 5 and 9 in our example (0.67 and 0.50) would be flagged for revision.
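Under the hood, each I-CVI is just the proportion of ratings of 3 or 4, so for a single item it reduces to a one-liner (the ratings below are illustrative, not taken from cvi_example):

```r
one_item <- c(4, 3, 4, 2, 4, 4)  # hypothetical ratings from six experts
mean(one_item >= 3)              # proportion rating the item 3 or 4
#> [1] 0.8333333
```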
Plain I-CVI doesn’t correct for chance agreement. With small panels, a high I-CVI can be partly luck. Modified kappa addresses this:
```r
mod_kappa(cvi_example)
#>     item1     item2     item3     item4     item5     item6     item7     item8
#> 1.0000000 1.0000000 1.0000000 0.8160920 0.5646259 1.0000000 0.8160920 1.0000000
#>     item9    item10
#> 0.2727273 1.0000000
```

Notice that item 9 drops sharply (0.50 → 0.27): its I-CVI was inflated by chance agreement among only six raters.
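The correction follows Polit, Beck, and Owen (2007): the probability that exactly A of N experts endorse an item by chance is p_c = choose(N, A) / 2^N, and the modified kappa is k* = (I-CVI - p_c) / (1 - p_c). A quick by-hand check reproduces the value for item 9:

```r
# item 9: 3 of 6 experts rated it relevant
n <- 6; a <- 3
i_cvi <- a / n                # I-CVI = 0.5
p_c <- choose(n, a) * 0.5^n   # probability of chance agreement
(i_cvi - p_c) / (1 - p_c)     # modified kappa
#> [1] 0.2727273
```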
Aiken’s V (Aiken, 1985) uses the full rating scale rather than dichotomizing into relevant/not relevant, so a “4” contributes more to the index than a “3”.
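For an item rated by n experts on a k-point scale with minimum value lo, V = sum(r_i - lo) / (n * (k - 1)), which runs from 0 to 1. A minimal by-hand sketch (again with illustrative ratings, not drawn from cvi_example):

```r
ratings <- c(4, 4, 3, 4, 3, 4)     # six experts, 4-point scale
n <- length(ratings); k <- 4; lo <- 1
sum(ratings - lo) / (n * (k - 1))  # Aiken's V
#> [1] 0.8888889
```

The per-item values for cvi_example appear in the aiken_v column of the content_validity() output below.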
Two scale-level indices summarize content validity across all items:
```r
scvi_ave(cvi_example)  # average of I-CVIs
#> [1] 0.8833333
scvi_ua(cvi_example)   # proportion of items with universal agreement
#> [1] 0.6
```

Polit and Beck (2006) recommend reporting both. S-CVI/Ave ≥ 0.90 indicates excellent overall content validity; S-CVI/UA gives a stricter view of how many items achieved unanimous endorsement.
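Both are simple summaries of the item-level I-CVIs, so they are easy to verify by hand:

```r
iv <- icvi(cvi_example)
mean(iv)       # S-CVI/Ave: mean of the item-level I-CVIs
#> [1] 0.8833333
mean(iv == 1)  # S-CVI/UA: share of items endorsed by every expert
#> [1] 0.6
```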
content_validity() is the workhorse function for routine
analysis. It returns the complete set of item-level and scale-level
indices in one tidy structure:
```r
result <- content_validity(cvi_example)
result
#> Content Validity Analysis
#> -------------------------
#> Experts: 6
#> Items: 10
#>
#> Item-level indices:
#>      item   icvi mod_kappa aiken_v
#>     item1 1.0000    1.0000  1.0000
#>     item2 1.0000    1.0000  0.8889
#>     item3 1.0000    1.0000  0.7778
#>     item4 0.8333    0.8161  0.6667
#>     item5 0.6667    0.5646  0.6111
#>     item6 1.0000    1.0000  0.9444
#>     item7 0.8333    0.8161  0.6667
#>     item8 1.0000    1.0000  0.9444
#>     item9 0.5000    0.2727  0.5000
#>    item10 1.0000    1.0000  0.9444
#>
#> Scale-level indices:
#> scvi_ave  scvi_ua mean_kappa
#>   0.8833   0.6000     0.8470
```

The result is an object you can subset, just like a list:
```r
result$items
#>      item      icvi mod_kappa   aiken_v
#> 1   item1 1.0000000 1.0000000 1.0000000
#> 2   item2 1.0000000 1.0000000 0.8888889
#> 3   item3 1.0000000 1.0000000 0.7777778
#> 4   item4 0.8333333 0.8160920 0.6666667
#> 5   item5 0.6666667 0.5646259 0.6111111
#> 6   item6 1.0000000 1.0000000 0.9444444
#> 7   item7 0.8333333 0.8160920 0.6666667
#> 8   item8 1.0000000 1.0000000 0.9444444
#> 9   item9 0.5000000 0.2727273 0.5000000
#> 10 item10 1.0000000 1.0000000 0.9444444
```
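Since result$items prints like a standard data frame (an assumption based on the row numbers above), the usual subsetting tools apply. For example, to pull out the items falling below the 0.78 benchmark:

```r
subset(result$items, icvi < 0.78)
#>    item      icvi mod_kappa   aiken_v
#> 5 item5 0.6666667 0.5646259 0.6111111
#> 9 item9 0.5000000 0.2727273 0.5000000
```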
```r
result$scale
#>  scvi_ave   scvi_ua mean_kappa
#> 0.8833333 0.6000000  0.8469537
```

apa_table() formats the result for journal manuscripts:
```r
apa_table(result)
#>      Item I-CVI Modified Kappa Aiken's V Interpretation
#> 1   item1  1.00           1.00      1.00      Excellent
#> 2   item2  1.00           1.00      0.89      Excellent
#> 3   item3  1.00           1.00      0.78      Excellent
#> 4   item4  0.83           0.82      0.67      Excellent
#> 5   item5  0.67           0.56      0.61           Fair
#> 6   item6  1.00           1.00      0.94      Excellent
#> 7   item7  0.83           0.82      0.67      Excellent
#> 8   item8  1.00           1.00      0.94      Excellent
#> 9   item9  0.50           0.27      0.50           Poor
#> 10 item10  1.00           1.00      0.94      Excellent
```

For R Markdown output (HTML, PDF, Word), use the appropriate format argument. The function returns a knitr::kable() object that renders correctly in your document:
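For example, a call along these lines (assuming the format values mirror knitr::kable() conventions, e.g. "pipe" for pipe tables) produces the table below:

```r
apa_table(result, format = "pipe")
```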
| Item | I-CVI | Modified Kappa | Aiken’s V | Interpretation |
|---|---|---|---|---|
| item1 | 1.00 | 1.00 | 1.00 | Excellent |
| item2 | 1.00 | 1.00 | 0.89 | Excellent |
| item3 | 1.00 | 1.00 | 0.78 | Excellent |
| item4 | 0.83 | 0.82 | 0.67 | Excellent |
| item5 | 0.67 | 0.56 | 0.61 | Fair |
| item6 | 1.00 | 1.00 | 0.94 | Excellent |
| item7 | 0.83 | 0.82 | 0.67 | Excellent |
| item8 | 1.00 | 1.00 | 0.94 | Excellent |
| item9 | 0.50 | 0.27 | 0.50 | Poor |
| item10 | 1.00 | 1.00 | 0.94 | Excellent |
The content validity ratio (CVR; Lawshe, 1975) uses a different rating convention: each expert classifies items as essential, useful but not essential, or not necessary. Use Lawshe-style coding (1 = essential, 2 = useful, 3 = not necessary) and call cvr() directly:
```r
# 10 experts rating 3 items on Lawshe's scale
lawshe_ratings <- matrix(
  c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2,   # item1: 8 of 10 essential
    1, 1, 1, 2, 2, 2, 2, 3, 3, 3,   # item2: 3 of 10 essential
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1),  # item3: 10 of 10 essential
  nrow = 10,
  dimnames = list(NULL, paste0("item", 1:3))
)
cvr(lawshe_ratings)
#> item1 item2 item3
#>   0.6  -0.4   1.0
```

Compare each item’s CVR to the critical value for the panel size, using the corrected Wilson, Pan, and Schumsky (2012) thresholds:
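Lawshe’s formula is CVR = (n_e - N/2) / (N/2), where n_e is the number of experts calling the item essential and N is the panel size. The values above can be reproduced directly:

```r
n_e <- colSums(lawshe_ratings == 1)  # experts rating each item essential
N <- nrow(lawshe_ratings)            # panel size
(n_e - N / 2) / (N / 2)              # Lawshe's CVR
#> item1 item2 item3
#>   0.6  -0.4   1.0
```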
```r
cvr_critical(n_experts = 10)  # one-tailed alpha = 0.05
#> [1] 0.8
cvr_critical(n_experts = 10, alpha = 0.01)
#> [1] 1
```

In this example, only item 3 (CVR = 1.0) reaches the critical value of 0.8 at α = 0.05; items 1 and 2 (CVR = 0.6 and -0.4) would be revised or dropped.
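The screening step can be written as a single vectorized comparison:

```r
cvr(lawshe_ratings) >= cvr_critical(n_experts = 10)
#> item1 item2 item3
#> FALSE FALSE  TRUE
```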
If you use contentValidity in published research, please
run:
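```r
citation("contentValidity")
```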
to get a current citation block in BibTeX or plain-text form.
Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131–142. https://doi.org/10.1177/0013164485451012
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–385. https://doi.org/10.1097/00006199-198611000-00017
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459–467. https://doi.org/10.1002/nur.20199
Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197–210. https://doi.org/10.1177/0748175612440286