Type: Package
Title: Digital Epidemiological Analysis and Visualization Tools
Version: 0.1.2
Description: Integrates methods for epidemiological analysis, modeling, and visualization, including functions for summary statistics, SIR (Susceptible-Infectious-Recovered) modeling, DALY (Disability-Adjusted Life Years) estimation, age standardization, diagnostic test evaluation, NLP (Natural Language Processing) keyword extraction, clinical trial power analysis, survival analysis, SNP (Single Nucleotide Polymorphism) association, and machine learning methods such as logistic regression, k-means clustering, Random Forest, and Support Vector Machine (SVM). Includes datasets for prevalence estimation, SIR modeling, genomic analysis, clinical trials, DALY, diagnostic tests, and survival analysis. Methods are based on Gelman et al. (2013) <doi:10.1201/b16018> and Wickham et al. (2019, ISBN:9781492052040>.
License: MIT + file LICENSE
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 4.0.0)
Imports: deSolve, sp, tm, glmnet, caret, survival
Suggests: kernlab, randomForest, stats, knitr, rmarkdown, quarto, usethis, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
NeedsCompilation: no
Author: Esther Atsabina Wanjala [aut, cre]
Maintainer: Esther Atsabina Wanjala <digitalepidemiologist23@gmail.com>
Packaged: 2025-11-03 07:06:11 UTC; atsab
Repository: CRAN
Date/Publication: 2025-11-05 20:30:13 UTC

EpidigiR: Digital Epidemiological Analysis and Visualization Tools

Description

Provides tools for epidemiological analysis, modeling, and visualization. Includes functions for summary statistics, SIR (Susceptible-Infectious-Recovered) modeling, DALY (Disability-Adjusted Life Years) calculations, age standardization, diagnostic test evaluation, NLP (Natural Language Processing) keyword extraction, clinical trial power analysis, survival analysis, SNP (Single Nucleotide Polymorphism) association, logistic regression, k-means clustering, Random Forest, and Support Vector Machine (SVM) modeling. Supplies datasets for prevalence, SIR modeling, genomic analysis, machine learning, NLP, clinical trials, DALY, age standardization, diagnostic tests, and survival analysis.

Author(s)

Maintainer: Esther Atsabina Wanjala digitalepidemiologist23@gmail.com


Clinical Trials Data for Epidemiological Analysis

Description

A dataset containing simulated clinical trial data for analyzing treatment outcomes, suitable for power calculations, logistic regression, Random Forest, and SVM.

Usage

clinical_data

Format

A data frame with 200 rows and 6 columns:

trial_id

Character, unique identifier for each trial participant.

arm

Character, treatment arm (e.g., Treatment, Control).

outcome

Numeric, binary outcome (0 = no response, 1 = response).

age

Numeric, patient age (years).

health_score

Numeric, baseline health score (0 to 100).

dose

Numeric, treatment dose level (e.g., 0 for control, 1 for low dose, 2 for high dose).

Source

Simulated data for demonstration purposes.

Examples

data("clinical_data")
clinical_data$outcome <- as.factor(clinical_data$outcome)
epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "logistic")
epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "rf")
epi_visualize(clinical_data, x = "arm", y = "outcome", type = "boxplot")

DALY Data for Global Health Burden

Description

A dataset containing simulated data for calculating Disability-Adjusted Life Years (DALY) in epidemiological studies.

Usage

daly_data

Format

A data frame with 20 rows and 3 columns:

group

Character, population group (e.g., region or age group).

yll

Numeric, years of life lost due to premature mortality.

yld

Numeric, years lived with disability.

Source

Simulated data for demonstration purposes.

Examples

data("daly_data")
epi_analyze(daly_data, outcome = NULL, type = "daly")

Diagnostic Test Data for Evaluation

Description

A dataset containing simulated data for evaluating diagnostic tests in epidemiological studies.

Usage

diagnostic_data

Format

A data frame with 10 rows and 5 columns:

test_id

Character, unique identifier for each test.

true_positives

Numeric, number of true positive results.

false_positives

Numeric, number of false positive results.

true_negatives

Numeric, number of true negative results.

false_negatives

Numeric, number of false negative results.

Source

Simulated data for demonstration purposes.

Examples

data("diagnostic_data")
epi_analyze(diagnostic_data, outcome = NULL, type = "diagnostic")

Performs summary statistics, SIR modeling, DALY calculation, age standardization, diagnostic test evaluation, or NLP keyword extraction.

Description

Performs summary statistics, SIR modeling, DALY calculation, age standardization, diagnostic test evaluation, or NLP keyword extraction.

Usage

epi_analyze(
  data,
  outcome,
  population,
  group = NULL,
  type = c("summary", "sir", "daly", "age_standardize", "diagnostic", "nlp"),
  ...
)

Arguments

data

Input data frame with relevant columns (e.g., cases, population, yll, yld, text).

outcome

Outcome column name (character, e.g., "cases").

population

Population column name (character, e.g., "population", required for summary).

group

Grouping column name (character, e.g., "region", optional).

type

Analysis type: "summary", "sir", "daly", "age_standardize", "diagnostic", "nlp".

...

Additional parameters (e.g., N, beta, gamma for SIR).

Value

A data frame with analysis results.


Unified Epidemiological Modeling

Description

Performs clinical trial power calculation, survival analysis, SNP association, logistic regression, k-means clustering, Random Forest, or SVM.

Usage

epi_model(
  data,
  formula = NULL,
  type = c("power", "survival", "snp", "logistic", "kmeans", "rf", "svmRadial"),
  ...
)

Arguments

data

Input data frame with relevant columns (e.g., outcome, genotypes).

formula

Model formula (optional, for survival/logistic/rf/svmRadial, e.g., "outcome ~ x").

type

Model type: "power", "survival", "snp", "logistic", "kmeans", "rf", "svmRadial".

...

Additional parameters (e.g., n, effect_size for power; k for kmeans).

Value

A data frame or list with model results.


Disease Prevalence Data by Region and Age Group

Description

A dataset containing disease prevalence data across different regions and age groups, including spatial coordinates.

Usage

epi_prevalence

Format

A data frame with 12 rows and 7 columns:

region

Character, region name (e.g., North, South, East, West).

age_group

Character, age group (e.g., 0-19, 20-59, 60+).

cases

Numeric, number of disease cases.

population

Numeric, population size in the region and age group.

prevalence

Numeric, prevalence percentage (cases / population * 100).

lat

Numeric, latitude for spatial mapping.

lon

Numeric, longitude for spatial mapping.

Source

Simulated data for demonstration purposes.

Examples

data("epi_prevalence")
library(sp)
coordinates(epi_prevalence) <- ~lon+lat
epi_visualize(epi_prevalence, x = "prevalence", type = "map")
epi_analyze(epi_prevalence,outcome    = "cases",population = "population",type       = "summary")
if (interactive()) {
epi_prevalence$region_id <- as.numeric(factor(epi_prevalence$region))
epi_visualize(epi_prevalence, x = "region_id", y = "prevalence", type = "scatter")
with(epi_prevalence, axis(1, at = unique(region_id), labels = levels(factor(region))))
}

Flexible Epidemiological Visualization

Description

Creates visualizations for prevalence mapping, epidemic curves, or general plots (scatter, boxplot).

Usage

epi_visualize(
  data,
  x,
  y = NULL,
  type = c("map", "curve", "scatter", "boxplot"),
  ...
)

Arguments

data

Input data frame or SpatialPolygonsDataFrame with relevant columns.

x

X-axis column name (character, e.g., "region").

y

Y-axis column name (character, e.g., "prevalence", optional).

type

Plot type: "map", "curve", "scatter", "boxplot".

...

Additional plotting parameters (e.g., main, xlab).

Value

A plot (spplot for maps, base R for others).


Genomic SNP-Case Data

Description

A dataset containing simulated genotypes and case-control status for SNP association analysis.

Usage

geno_data

Format

A data frame with 100 rows and 2 columns:

genotypes

Numeric, genotype (0 = AA, 1 = Aa, 2 = aa).

cases

Numeric, case (1) or control (0) status.

Source

Simulated data for demonstration purposes.

Examples

data("geno_data")
epi_model(geno_data, type = "snp")

Machine Learning Data for Disease Risk Prediction

Description

A dataset containing simulated patient data for predicting disease risk, suitable for logistic regression, clustering, Random Forest, and SVM.

Usage

ml_data

Format

A data frame with 100 rows and 5 columns:

outcome

Numeric, binary disease status (0 = healthy, 1 = diseased).

age

Numeric, patient age (years).

exposure

Numeric, exposure level (0 to 1, e.g., environmental risk).

genetic_risk

Numeric, genetic risk score (0 to 1).

region

Character, region name (e.g., North, South, East, West).

Source

Simulated data for demonstration purposes.

Examples

data("ml_data")
ml_data$outcome <- as.factor(ml_data$outcome)
epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "logistic")
epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "rf")
epi_visualize(ml_data, x = "age", y = "outcome", type = "scatter")

NLP Data for Epidemiological Text Analysis

Description

A dataset containing simulated epidemiological text data, such as outbreak reports or health alerts, for NLP analysis.

Usage

nlp_data

Format

A data frame with 100 rows and 2 columns:

id

Character, unique identifier for each text entry.

text

Character, text content (e.g., outbreak descriptions, health reports).

Source

Simulated data for demonstration purposes.

Examples

data("nlp_data")
epi_analyze(nlp_data, outcome = NULL, type = "nlp", n = 5)

SIR Model Simulation Data

Description

A dataset containing simulated SIR model outputs for a population of 1000.

Usage

sir_data

Format

A data frame with 50 rows and 4 columns:

time

Numeric, time point (1 to 50 days).

Susceptible

Numeric, number of susceptible individuals.

Infected

Numeric, number of infected individuals.

Recovered

Numeric, number of recovered individuals.

Source

Generated using epi_analyze(type = "sir", N = 1000, beta = 0.3, gamma = 0.1, days = 50).

Examples

data("sir_data")
epi_visualize(sir_data, x = "time", y = "Infected", type = "curve")

Survey Data for Age Standardization

Description

A dataset containing simulated survey data for age standardization in epidemiological studies.

Usage

survey_data

Format

A data frame with 20 rows and 3 columns:

age_group

Character, age group (e.g., 0-19, 20-39, 40-59, 60+).

rates

Numeric, disease rates (e.g., cases per 1000).

pop_weights

Numeric, population weights for standardization.

Source

Simulated data for demonstration purposes.

Examples

data("survey_data")
epi_analyze(survey_data, outcome = NULL, type = "age_standardize")

Survival Analysis Data

Description

A dataset containing simulated data for survival analysis in epidemiological studies.

Usage

survival_data

Format

A data frame with 100 rows and 3 columns:

id

Character, unique identifier for each individual.

time

Numeric, time to event (e.g., years until death or censoring).

status

Numeric, event status (0 = censored, 1 = event occurred).

Source

Simulated data for demonstration purposes.

Examples

data("survival_data")
epi_model(survival_data, type = "survival")
epi_visualize(survival_data, x = "time", y = "status", type = "scatter")