Getting Started with contentValidity

library(contentValidity)

Background

When developing a new questionnaire, scale, or test, researchers typically ask a panel of subject-matter experts to rate each candidate item for relevance to the construct being measured. The expert ratings are then summarized into content validity indices that quantify how well the items represent the intended construct.

The contentValidity package implements the standard set of content validity indices used in nursing, education, psychology, and health sciences research: the item-level content validity index (I-CVI), the chance-corrected modified kappa, Aiken's V, the scale-level indices S-CVI/Ave and S-CVI/UA, and Lawshe's content validity ratio (CVR) with its critical values.

The example dataset

The package ships with cvi_example, a simulated set of expert ratings for a 10-item depression screening instrument, with 6 expert raters using a 4-point relevance scale (1 = not relevant, 4 = highly relevant).

data(cvi_example)
head(cvi_example)
#>         item1 item2 item3 item4 item5 item6 item7 item8 item9 item10
#> expert1     4     3     3     2     3     4     3     4     2      4
#> expert2     4     4     3     3     2     4     3     4     3      4
#> expert3     4     4     4     3     3     4     2     4     2      3
#> expert4     4     4     3     4     3     3     3     4     3      4
#> expert5     4     3     4     3     4     4     3     4     3      4
#> expert6     4     4     3     3     2     4     4     3     2      4

Item-level analysis

The simplest place to start is icvi(), which gives the proportion of experts rating each item as 3 or 4:

icvi(cvi_example)
#>     item1     item2     item3     item4     item5     item6     item7     item8 
#> 1.0000000 1.0000000 1.0000000 0.8333333 0.6666667 1.0000000 0.8333333 1.0000000 
#>     item9    item10 
#> 0.5000000 1.0000000
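
Under the hood, each I-CVI is just the column-wise proportion of ratings of 3 or 4, so a hand check is one line (assuming cvi_example is the plain numeric matrix or data frame shown above):

colMeans(cvi_example >= 3)   # identical to icvi(cvi_example)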

Following Polit and Beck (2006), an I-CVI of at least 0.78 is considered excellent when the panel has six or more experts. Items 5 and 9 in our example (0.67 and 0.50) would be flagged for revision.

A plain I-CVI does not correct for chance agreement: with a small panel, a high I-CVI can arise partly by luck. The modified kappa of Polit, Beck, and Owen (2007) adjusts the I-CVI for the probability of chance agreement:

mod_kappa(cvi_example)
#>     item1     item2     item3     item4     item5     item6     item7     item8 
#> 1.0000000 1.0000000 1.0000000 0.8160920 0.5646259 1.0000000 0.8160920 1.0000000 
#>     item9    item10 
#> 0.2727273 1.0000000

Notice that item 9 drops sharply (0.50 to 0.27): with only six raters, half of whom rated it relevant, much of its I-CVI is attributable to chance agreement.
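
To see where the adjustment comes from, here is item 9 by hand, following the Polit, Beck, and Owen (2007) formula (a sketch of the calculation, not necessarily the package's internals):

N  <- 6                        # panel size
A  <- 3                        # experts who rated item 9 as relevant (3 or 4)
pc <- choose(N, A) * 0.5^N     # probability of this split arising by chance
(A / N - pc) / (1 - pc)        # modified kappa = (I-CVI - pc) / (1 - pc)
#> [1] 0.2727273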

Aiken’s V uses the full rating scale rather than dichotomizing ratings into relevant versus not relevant, so a “4” contributes more than a “3”:

aiken_v(cvi_example, lo = 1, hi = 4)
#>     item1     item2     item3     item4     item5     item6     item7     item8 
#> 1.0000000 0.8888889 0.7777778 0.6666667 0.6111111 0.9444444 0.6666667 0.9444444 
#>     item9    item10 
#> 0.5000000 0.9444444
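
Each V is the item’s mean rating rescaled to the 0–1 interval. As a hand check of Aiken’s (1985) formula, V = Σ(r − lo) / (n(hi − lo)), take item 3 (rated 3, 3, 4, 3, 4, 3):

r <- cvi_example[, "item3"]          # the six expert ratings for item 3
sum(r - 1) / (length(r) * (4 - 1))   # lo = 1, hi = 4
#> [1] 0.7777778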

Scale-level analysis

Two scale-level indices summarize content validity across all items:

scvi_ave(cvi_example)   # average of I-CVIs
#> [1] 0.8833333
scvi_ua(cvi_example)    # proportion of items with universal agreement
#> [1] 0.6

Polit and Beck (2006) recommend reporting both. An S-CVI/Ave of 0.90 or higher indicates excellent overall content validity (our 0.88 falls just short); S-CVI/UA gives a stricter view of how many items achieved unanimous endorsement.
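
Both are simple summaries of the item-level I-CVIs, so they can be reproduced directly from the icvi() vector shown earlier:

i <- icvi(cvi_example)
mean(i)        # S-CVI/Ave: the average I-CVI
#> [1] 0.8833333
mean(i == 1)   # S-CVI/UA: share of items every expert rated relevant
#> [1] 0.6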

All indices at once

content_validity() is the workhorse function for routine analysis. It returns the complete set of item-level and scale-level indices in one tidy structure:

result <- content_validity(cvi_example)
result
#> Content Validity Analysis
#> -------------------------
#> Experts: 6
#> Items:   10
#> 
#> Item-level indices:
#>    item   icvi mod_kappa aiken_v
#>   item1 1.0000    1.0000  1.0000
#>   item2 1.0000    1.0000  0.8889
#>   item3 1.0000    1.0000  0.7778
#>   item4 0.8333    0.8161  0.6667
#>   item5 0.6667    0.5646  0.6111
#>   item6 1.0000    1.0000  0.9444
#>   item7 0.8333    0.8161  0.6667
#>   item8 1.0000    1.0000  0.9444
#>   item9 0.5000    0.2727  0.5000
#>  item10 1.0000    1.0000  0.9444
#> 
#> Scale-level indices:
#>   scvi_ave    scvi_ua mean_kappa 
#>     0.8833     0.6000     0.8470

The result is an object you can subset, just like a list:

result$items
#>      item      icvi mod_kappa   aiken_v
#> 1   item1 1.0000000 1.0000000 1.0000000
#> 2   item2 1.0000000 1.0000000 0.8888889
#> 3   item3 1.0000000 1.0000000 0.7777778
#> 4   item4 0.8333333 0.8160920 0.6666667
#> 5   item5 0.6666667 0.5646259 0.6111111
#> 6   item6 1.0000000 1.0000000 0.9444444
#> 7   item7 0.8333333 0.8160920 0.6666667
#> 8   item8 1.0000000 1.0000000 0.9444444
#> 9   item9 0.5000000 0.2727273 0.5000000
#> 10 item10 1.0000000 1.0000000 0.9444444
result$scale
#>   scvi_ave    scvi_ua mean_kappa 
#>  0.8833333  0.6000000  0.8469537
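
Because result$items prints like an ordinary data frame, routine follow-up filtering is one line; for example (assuming it is in fact a plain data frame), flag the items below the 0.78 I-CVI benchmark:

subset(result$items, icvi < 0.78)
#>    item      icvi mod_kappa   aiken_v
#> 5 item5 0.6666667 0.5646259 0.6111111
#> 9 item9 0.5000000 0.2727273 0.5000000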

Publication-ready tables

apa_table() formats the result for journal manuscripts:

apa_table(result)
#>      Item I-CVI Modified Kappa Aiken's V Interpretation
#> 1   item1  1.00           1.00      1.00      Excellent
#> 2   item2  1.00           1.00      0.89      Excellent
#> 3   item3  1.00           1.00      0.78      Excellent
#> 4   item4  0.83           0.82      0.67      Excellent
#> 5   item5  0.67           0.56      0.61           Fair
#> 6   item6  1.00           1.00      0.94      Excellent
#> 7   item7  0.83           0.82      0.67      Excellent
#> 8   item8  1.00           1.00      0.94      Excellent
#> 9   item9  0.50           0.27      0.50           Poor
#> 10 item10  1.00           1.00      0.94      Excellent

For R Markdown output (HTML, PDF, Word), use the appropriate format argument. The function returns a knitr::kable() object that renders correctly in your document:

apa_table(result, format = "markdown")
Content validity indices (N = 6 experts, 10 items; S-CVI/Ave = 0.88, S-CVI/UA = 0.60).

|Item   | I-CVI| Modified Kappa| Aiken’s V|Interpretation |
|:------|-----:|--------------:|---------:|:--------------|
|item1  |  1.00|           1.00|      1.00|Excellent      |
|item2  |  1.00|           1.00|      0.89|Excellent      |
|item3  |  1.00|           1.00|      0.78|Excellent      |
|item4  |  0.83|           0.82|      0.67|Excellent      |
|item5  |  0.67|           0.56|      0.61|Fair           |
|item6  |  1.00|           1.00|      0.94|Excellent      |
|item7  |  0.83|           0.82|      0.67|Excellent      |
|item8  |  1.00|           1.00|      0.94|Excellent      |
|item9  |  0.50|           0.27|      0.50|Poor           |
|item10 |  1.00|           1.00|      0.94|Excellent      |

Lawshe’s CVR

CVR uses a different rating convention: each expert classifies items as essential, useful but not essential, or not necessary. Use Lawshe-style coding (1 = essential, 2 = useful, 3 = not necessary) and call cvr() directly:

# 10 experts rating 3 items on Lawshe's scale
lawshe_ratings <- matrix(
  c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2,    # 8 of 10 essential
    1, 1, 1, 2, 2, 2, 2, 3, 3, 3,    # 3 of 10 essential
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1),   # 10 of 10 essential
  nrow = 10,
  dimnames = list(NULL, paste0("item", 1:3))
)

cvr(lawshe_ratings)
#> item1 item2 item3 
#>   0.6  -0.4   1.0
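
These values follow Lawshe’s (1975) formula, CVR = (n_e − N/2) / (N/2), where n_e of the N panelists rate the item essential. A hand check:

n_e <- colSums(lawshe_ratings == 1)   # "essential" votes per item
N   <- nrow(lawshe_ratings)           # panel size
(n_e - N / 2) / (N / 2)
#> item1 item2 item3 
#>   0.6  -0.4   1.0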

Compare each item’s CVR to the critical value for the panel size (Wilson, Pan, and Schumsky, 2012, discuss corrected critical values for Lawshe’s CVR):

cvr_critical(n_experts = 10)        # one-tailed alpha = 0.05
#> [1] 0.8
cvr_critical(n_experts = 10, alpha = 0.01)
#> [1] 1

In this example, only item 3 (CVR = 1.0) reaches the critical value of 0.8 at α = 0.05. Items 1 and 2 would be revised or dropped.
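
The same decision in code, assuming cvr() and cvr_critical() return plain numeric values as shown above:

cvr(lawshe_ratings) >= cvr_critical(n_experts = 10)
#> item1 item2 item3 
#> FALSE FALSE  TRUE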

Citing the package

If you use contentValidity in published research, please run:

citation("contentValidity")

to get a current citation block in BibTeX or plain-text form.

References

Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131–142. https://doi.org/10.1177/0013164485451012

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x

Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–385. https://doi.org/10.1097/00006199-198611000-00017

Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147

Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459–467. https://doi.org/10.1002/nur.20199

Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197–210. https://doi.org/10.1177/0748175612440286