Title: An R Package for Extended Behavior Genetics Analysis
Version: 1.5.0
Description: Provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see [Garrison et al. (2024) <doi:10.21105/joss.06203>].
License: GPL-3
URL: https://github.com/R-Computing-Lab/BGmisc/, https://r-computing-lab.github.io/BGmisc/
BugReports: https://github.com/R-Computing-Lab/BGmisc/issues
Depends: R (≥ 3.5.0)
Imports: data.table, igraph, Matrix, stats, stringr, methods
Suggests: corrplot, discord, dplyr, EasyMx, ggpedigree, ggplot2, kinship2, knitr, OpenMx, rmarkdown, testthat (≥ 3.0.0), tidyverse, withr
VignetteBuilder: knitr
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-07-19 01:56:19 UTC; smaso
Author: S. Mason Garrison [aut, cre], Michael D. Hunter [aut], Xuanyu Lyu [aut], Rachel N. Good [ctb], Jonathan D. Trattner [aut] (url: https://www.jdtrat.com/), S. Alexandra Burt [aut]
Maintainer: S. Mason Garrison <garrissm@wfu.edu>
Repository: CRAN
Date/Publication: 2025-07-19 02:40:08 UTC

BGmisc: An R Package for Extended Behavior Genetics Analysis

Description

Provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) doi:10.1007/s10519-021-10055-x], calculation of relatedness coefficients using path-tracing methods [Wright (1922) doi:10.1086/279872; McArdle & McDonald (1984) doi:10.1111/j.2044-8317.1984.tb00802.x], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) doi:10.1101/2024.12.19.629449]. For a full overview, see [Garrison et al. (2024) doi:10.21105/joss.06203].

Author(s)

Maintainer: S. Mason Garrison garrissm@wfu.edu (ORCID)

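Authors:

- Michael D. Hunter
- Xuanyu Lyu
- Jonathan D. Trattner
- S. Alexandra Burt

Other contributors:

- Rachel N. Good [contributor]

See Also

Useful links:

- https://github.com/R-Computing-Lab/BGmisc/
- https://r-computing-lab.github.io/BGmisc/
- Report bugs at https://github.com/R-Computing-Lab/BGmisc/issues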


Construct Adjacency Matrix for Parent-Child Relationships Using Beta Method

Description

This function constructs an adjacency matrix for parent-child relationships using a beta method. It identifies parent-child pairs based on the specified component of relatedness.

Usage

.adjBeta(
  ped,
  component,
  adjBeta_method = 5,
  parList = NULL,
  lastComputed = 0,
  lens = NULL,
  saveable = FALSE,
  resume = FALSE,
  save_path = NULL,
  verbose = FALSE,
  save_rate_parlist = NULL,
  update_rate = NULL,
  checkpoint_files = NULL,
  config,
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

adjBeta_method

numeric. The method to use for building the adjacency matrix when using the "beta" build.

parList

a list of parent-child relationships

lastComputed

the last computed index

lens

a vector of the lengths of the parent-child relationships

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_path

character. The path to save the checkpoint files

verbose

logical. If TRUE, print progress through stages of algorithm

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

update_rate

numeric. The rate at which to print progress

checkpoint_files

a list of checkpoint files

config

a configuration list that passes parameters to the function

...

additional arguments to be passed to ped2com


Construct Adjacency Matrix for Parent-Child Relationships Using Direct Method

Description

This function constructs an adjacency matrix for parent-child relationships using a direct method. It identifies parent-child pairs based on the specified component of relatedness.

Usage

.adjDirect(
  ped,
  component,
  saveable,
  resume,
  save_path,
  verbose,
  lastComputed,
  checkpoint_files,
  update_rate,
  parList,
  lens,
  save_rate_parlist,
  config,
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_path

character. The path to save the checkpoint files

verbose

logical. If TRUE, print progress through stages of algorithm

lastComputed

the last computed index

checkpoint_files

a list of checkpoint files

update_rate

numeric. The rate at which to print progress

parList

a list of parent-child relationships

lens

a vector of the lengths of the parent-child relationships

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

config

a configuration list that passes parameters to the function

...

additional arguments to be passed to ped2com
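
To illustrate what a parent-child adjacency matrix looks like, the following minimal sketch builds one in base R from a toy pedigree. It is not the package's implementation. The helper name build_adjacency_sketch, the orientation (rows are children, columns are parents), the dense base matrix, and the value 0.5 per parent-child link are all assumptions made for illustration.

```r
# Toy pedigree with the three required columns (illustrative data only).
ped <- data.frame(
  ID    = 1:5,
  momID = c(NA, NA, 2, 2, 4),
  dadID = c(NA, NA, 1, 1, NA)
)

# Hypothetical helper: returns a dense child-by-parent adjacency matrix.
# par_value = 0.5 is an assumed weight for each parent-child link.
build_adjacency_sketch <- function(ped, par_value = 0.5) {
  n <- nrow(ped)
  adj <- matrix(0, n, n, dimnames = list(ped$ID, ped$ID))
  for (parent_col in c("momID", "dadID")) {
    # Row position of each individual's parent (NA when unknown or absent).
    parent_idx <- match(ped[[parent_col]], ped$ID)
    has_parent <- !is.na(parent_idx)
    # Mark the (child, parent) cells.
    adj[cbind(which(has_parent), parent_idx[has_parent])] <- par_value
  }
  adj
}

adj <- build_adjacency_sketch(ped)
```

Each non-zero cell marks one parent-child pair, so a child with two known parents contributes two entries to its row.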


Construct Adjacency Matrix for Parent-Child Relationships Using Indexed Method

Description

Construct Adjacency Matrix for Parent-Child Relationships Using Indexed Method

Usage

.adjIndexed(
  ped,
  component,
  saveable,
  resume,
  save_path,
  verbose,
  lastComputed,
  checkpoint_files,
  update_rate,
  parList,
  lens,
  save_rate_parlist,
  config
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_path

character. The path to save the checkpoint files

verbose

logical. If TRUE, print progress through stages of algorithm

lastComputed

the last computed index

checkpoint_files

a list of checkpoint files

update_rate

numeric. The rate at which to print progress

parList

a list of parent-child relationships

lens

a vector of the lengths of the parent-child relationships

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

config

a configuration list that passes parameters to the function


Construct Adjacency Matrix for Parent-Child Relationships

Description

Construct Adjacency Matrix for Parent-Child Relationships

Usage

.adjLoop(
  ped,
  component,
  saveable,
  resume,
  save_path,
  verbose,
  lastComputed,
  checkpoint_files,
  update_rate,
  parList,
  lens,
  save_rate_parlist,
  config,
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_path

character. The path to save the checkpoint files

verbose

logical. If TRUE, print progress through stages of algorithm

lastComputed

the last computed index

checkpoint_files

a list of checkpoint files

update_rate

numeric. The rate at which to print progress

parList

a list of parent-child relationships

lens

a vector of the lengths of the parent-child relationships

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

config

a configuration list that passes parameters to the function

...

additional arguments to be passed to ped2com


Assign parent values based on component type

Description

Assign parent values based on component type

Usage

.assignParentValue(component)

Arguments

component

character. Which component of the pedigree to return. See Details.


Collapse Names

Description

This function combines the 'name_given' and 'name_given_pieces' columns in a data frame.

Usage

.collapseNames.legacy(verbose, df_temp)

Arguments

verbose

A logical value indicating whether to print messages.

df_temp

A data frame containing the columns to be combined.


Convert Sparse Relationship Matrices to Kinship Links

Description

Convert Sparse Relationship Matrices to Kinship Links

Usage

.com2links.legacy(
  rel_pairs_file = "dataRelatedPairs.csv",
  ad_ped_matrix = NULL,
  mit_ped_matrix = mt_ped_matrix,
  mt_ped_matrix = NULL,
  cn_ped_matrix = NULL,
  write_buffer_size = 1000,
  update_rate = 1000,
  gc = TRUE,
  writetodisk = TRUE,
  verbose = FALSE,
  legacy = FALSE,
  outcome_name = "data",
  drop_upper_triangular = TRUE,
  ...
)

Arguments

rel_pairs_file

File path to write related pairs to (CSV format).

ad_ped_matrix

Matrix of additive genetic relatedness coefficients.

mit_ped_matrix

Matrix of mitochondrial relatedness coefficients. Alias: mt_ped_matrix.

mt_ped_matrix

Matrix of mitochondrial relatedness coefficients.

cn_ped_matrix

Matrix of common nuclear relatedness coefficients.

write_buffer_size

Number of related pairs to write to disk at a time.

update_rate

Numeric. Frequency (in iterations) at which progress messages are printed.

gc

Logical. If TRUE, performs garbage collection via gc to free memory.

writetodisk

Logical. If TRUE, writes the related pairs to disk; if FALSE, returns a data frame.

verbose

Logical. If TRUE, prints progress messages.

legacy

Logical. If TRUE, uses the legacy branch of the function.

outcome_name

Character string representing the outcome name (used in file naming).

drop_upper_triangular

Logical. If TRUE, drops the upper triangular portion of the matrix.

...

Additional arguments to be passed to com2links
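
The conversion from a relatedness matrix to a table of related pairs can be sketched in base R as follows. This is a simplified illustration, not the package's code: the helper name matrix_to_links_sketch and the output column names (ID1, ID2, addRel) are hypothetical, only the additive matrix is handled, nothing is written to disk, and the diagonal is dropped together with the upper triangle as an assumption.

```r
# Toy additive relatedness matrix for two unrelated parents and their child.
ids <- c("mom", "dad", "kid")
ad <- matrix(c(1,   0,   0.5,
               0,   1,   0.5,
               0.5, 0.5, 1),
             nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical helper: one row per pair with a non-zero coefficient.
matrix_to_links_sketch <- function(ad_ped_matrix, drop_upper_triangular = TRUE) {
  ids <- rownames(ad_ped_matrix)
  keep <- ad_ped_matrix != 0
  if (drop_upper_triangular) {
    # Keep each unordered pair once by using only the strict lower triangle.
    keep <- keep & lower.tri(ad_ped_matrix, diag = FALSE)
  }
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    ID1    = ids[idx[, "row"]],
    ID2    = ids[idx[, "col"]],
    addRel = ad_ped_matrix[idx],
    stringsAsFactors = FALSE
  )
}

links <- matrix_to_links_sketch(ad)
```

Dropping the upper triangle avoids listing each pair twice, since a relatedness matrix is symmetric.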


Convert Pedigree Matrices to Related Pairs File (Legacy)

Description

This legacy function converts pedigree matrices into a related pairs file.

Usage

.com2links.og(
  rel_pairs_file = "dataRelatedPairs.csv",
  ad_ped_matrix = NULL,
  mit_ped_matrix = mt_ped_matrix,
  mt_ped_matrix = NULL,
  cn_ped_matrix = NULL,
  update_rate = 500,
  verbose = FALSE,
  outcome_name = "data",
  ...
)

Arguments

rel_pairs_file

File path to write related pairs to (CSV format).

ad_ped_matrix

Matrix of additive genetic relatedness coefficients.

mit_ped_matrix

Matrix of mitochondrial relatedness coefficients. Alias: mt_ped_matrix.

mt_ped_matrix

Matrix of mitochondrial relatedness coefficients.

cn_ped_matrix

Matrix of common nuclear relatedness coefficients.

update_rate

Numeric. Frequency (in iterations) at which progress messages are printed.

verbose

Logical. If TRUE, prints progress messages.

outcome_name

Character string representing the outcome name (used in file naming).

...

Additional arguments to be passed to com2links


Combine Columns

Description

This function combines two columns, handling conflicts and merging non-conflicting data.

Usage

.combine_columns.legacy(col1, col2)

Arguments

col1

The first column to combine.

col2

The second column to combine.

Value

A list with the combined column and a flag indicating if the second column should be retained.
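
One way to read "handling conflicts and merging non-conflicting data" is sketched below. The exact rules of the legacy function are not stated here, so this is an assumption: gaps in the first column are filled from the second, and the second column is flagged for retention only when both columns hold different non-missing values in some row. The helper name and the list element names (combined, retain_col2) are hypothetical.

```r
# Hypothetical helper illustrating one possible merge rule.
combine_columns_sketch <- function(col1, col2) {
  both_present <- !is.na(col1) & !is.na(col2)
  # A conflict exists when both values are present but differ.
  conflict <- any(col1[both_present] != col2[both_present])
  # Fill missing entries of col1 from col2; keep col1 otherwise.
  combined <- ifelse(is.na(col1), col2, col1)
  list(combined = combined, retain_col2 = conflict)
}

merged   <- combine_columns_sketch(c("a", NA, "c"), c("a", "b", NA))
clashing <- combine_columns_sketch(c("a"), c("b"))
```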


Compute the transpose multiplication for the relatedness matrix

Description

Compute the transpose multiplication for the relatedness matrix

Usage

.computeTranspose(r2, transpose_method = "tcrossprod", verbose = FALSE)

Arguments

r2

a relatedness matrix

transpose_method

character. The method to use for computing the transpose. Options are "tcrossprod", "crossprod", or "star"

verbose

logical. If TRUE, print progress through stages of algorithm

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.
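
For reference, the two base R options named in transpose_method differ in which side is transposed. The short sketch below uses an arbitrary toy matrix to show the difference; the "star" option is not illustrated because its definition is not given here.

```r
# Arbitrary 2 x 3 toy matrix standing in for r2.
r2 <- matrix(c(1, 0.5, 0,
               0, 1,   0.5),
             nrow = 2, byrow = TRUE)

tcp <- tcrossprod(r2)  # r2 %*% t(r2): a 2 x 2 matrix
cp  <- crossprod(r2)   # t(r2) %*% r2: a 3 x 3 matrix
```

Both functions return a symmetric matrix and avoid forming the transpose explicitly.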


Check for Pattern Rows

Description

This function counts the number of rows containing specific patterns.

Usage

.countPatternRows.legacy(file)

Arguments

file

A data frame containing the GEDCOM file.

Value

A list with the number of rows containing each pattern.


Extract Information from Line

Description

This function extracts information from a line based on a specified type.

Usage

.extract_info.legacy(line, type)

Arguments

line

A character string representing a line from a GEDCOM file.

type

A character string representing the type of information to extract.

Value

A character string with the extracted information.


Fisher's r to z transformation and back

Description

These functions convert correlation coefficients (r) to Fisher's z-scores and back.

Usage

.fisherz(rho)

Arguments

rho

A numeric vector of correlation coefficients.

Details

Credit to the psych package for the Fisher's r to z transformation.

Value

A numeric vector of transformed values.


Fisher's r to z transformation and back

Description

Fisher's r to z transformation and back

Usage

.fisherz2r(z)

Arguments

z

A numeric vector of Fisher's z-scores.

Value

A numeric vector of transformed values.
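
The standard Fisher transformation and its inverse can be written in a few lines of base R. The sketch below uses the textbook formulas, z = 0.5 * log((1 + r) / (1 - r)) and r = (exp(2z) - 1) / (exp(2z) + 1); the function names are hypothetical and the package's internal code may differ in detail.

```r
# Textbook Fisher r-to-z transformation (equivalent to atanh).
fisher_r_to_z_sketch <- function(rho) {
  0.5 * log((1 + rho) / (1 - rho))
}

# Inverse transformation (equivalent to tanh).
fisher_z_to_r_sketch <- function(z) {
  (exp(2 * z) - 1) / (exp(2 * z) + 1)
}

rho <- c(-0.8, 0, 0.3, 0.5)
z <- fisher_r_to_z_sketch(rho)
rho_back <- fisher_z_to_r_sketch(z)
```

Applying one function after the other recovers the original correlations up to floating-point error.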


Get the Modal Value of a Vector

Description

This function calculates the modal value of a vector, which is the most frequently occurring value. If the vector is empty or contains only NA values, it returns NA.

Usage

.getModalValue(x)

Arguments

x

A vector of values.

Value

The modal value of the vector. If the vector is empty or contains only NA values, returns NA.
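
A minimal base R sketch of the behavior described above follows. The helper name is hypothetical, and the tie-breaking rule (the first value encountered wins) is an assumption, since the documentation does not say how ties are resolved.

```r
# Hypothetical helper: most frequent non-missing value, or NA if none exists.
modal_value_sketch <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(NA)
  }
  unique_values <- unique(x)
  # Count how often each unique value occurs, in order of first appearance.
  counts <- tabulate(match(x, unique_values))
  # which.max returns the first maximum, so ties go to the earliest value.
  unique_values[which.max(counts)]
}

mode_numeric   <- modal_value_sketch(c(1, 2, 2, 3, NA))
mode_character <- modal_value_sketch(c("a", "b", "b"))
mode_empty     <- modal_value_sketch(c(NA, NA))
```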


Load or compute the inverse diagonal matrix

Description

Load or compute the inverse diagonal matrix

Usage

.loadOrComputeInverseDiagonal(r, isChild, checkpoint_files, config)

Arguments

r

The relatedness matrix.

config

A list containing configuration parameters such as 'resume', 'verbose', and 'saveable'.

Value

The computed inverse diagonal matrix.


Load or compute the isChild matrix

Description

Load or compute the isChild matrix

Usage

.loadOrComputeIsChild(ped, checkpoint_files, config)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

checkpoint_files

A list of checkpoint file paths.

config

A list containing configuration parameters such as 'resume', 'verbose', and 'saveable'.


Load or compute the isPar matrix

Description

Load or compute the isPar matrix

Usage

.loadOrComputeIsPar(iss, jss, parVal, ped, checkpoint_files, config)

Arguments

iss

The row indices of the sparse matrix.

jss

The column indices of the sparse matrix.

parVal

The value to assign to the non-zero elements of the sparse matrix.

ped

The pedigree dataset.

checkpoint_files

A list of checkpoint file paths.

config

A list containing configuration parameters such as 'resume', 'verbose', and 'saveable'.


parent-child adjacency data

Description

parent-child adjacency data

Usage

.loadOrComputeParList(
  checkpoint_files,
  config,
  ped = NULL,
  parList = NULL,
  lens = NULL
)

Arguments

checkpoint_files

A list of checkpoint file paths.

config

A list containing configuration parameters such as 'resume', 'verbose', and 'saveable'.

ped

a pedigree dataset. Needs ID, momID, and dadID columns

parList

A list of parent-child adjacency data.

lens

A vector of lengths for each parent-child relationship.

Value

A list containing the parent-child adjacency data either loaded from a checkpoint or initialized.


Assign momID and dadID based on family mapping

Description

This function assigns mother and father IDs to individuals in the data frame based on the mapping of family IDs to parent IDs.

Usage

.mapFAMC2parents.legacy(df_temp, family_to_parents)

Arguments

df_temp

A data frame containing individual information.

family_to_parents

A list mapping family IDs to parent IDs.

Value

A data frame with added momID and dadID columns.


Create a mapping of family IDs to parent IDs

Description

This function creates a mapping from family IDs to the IDs of the parents.

Usage

.mapFAMS2parents.legacy(df_temp)

Arguments

df_temp

A data frame containing information about individuals.

Value

A list mapping family IDs to parent IDs.


Post-process GEDCOM Data Frame

Description

Post-process GEDCOM Data Frame

Usage

.postProcessGedcom.legacy(
  df_temp,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  add_parents = TRUE,
  skinny = TRUE,
  verbose = FALSE
)

Arguments

df_temp

A data frame containing information about individuals.

remove_empty_cols

A logical value indicating whether to remove columns with all missing values.

combine_cols

A logical value indicating whether to combine columns with duplicate values.

add_parents

A logical value indicating whether to add parents to the data frame.

skinny

A logical value indicating whether to return a skinny data frame.

verbose

A logical value indicating whether to print messages.

Value

A data frame with processed information.


Process parents information

Description

This function processes the dataframe to add momID and dadID columns.

Usage

.processParents.legacy(df_temp, datasource)

Arguments

df_temp

A data frame containing information about individuals.

Value

A data frame with added momID and dadID columns.


Process a GEDCOM Tag

Description

Extracts and assigns a value to a specified field in 'vars' if the pattern is present. Returns both the updated variable list and a flag indicating whether the tag was matched.

Usage

.process_tag.legacy(
  tag,
  field_name,
  pattern_rows,
  line,
  vars,
  extractor = NULL,
  mode = "replace"
)

Arguments

tag

The GEDCOM tag (e.g., "SEX" or "CAST").

field_name

The name of the variable to assign to in 'vars'.

pattern_rows

Output from 'countPatternRows()'.

line

The GEDCOM line to parse.

vars

The current list of variables to update.

Value

A list with updated 'vars' and a 'matched' flag.


Read a GEDCOM File

Description

This function reads a GEDCOM file and parses it into a structured data frame of individuals. Inspired by https://raw.githubusercontent.com/jjfitz/readgedcom/master/R/read_gedcom.R

Usage

.readGedcom.legacy(
  file_path,
  verbose = FALSE,
  add_parents = TRUE,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  skinny = FALSE,
  update_rate = 1000,
  post_process = TRUE,
  ...
)

Arguments

file_path

The path to the GEDCOM file.

verbose

A logical value indicating whether to print messages.

add_parents

A logical value indicating whether to add parents to the data frame.

remove_empty_cols

A logical value indicating whether to remove columns with all missing values.

combine_cols

A logical value indicating whether to combine columns with duplicate values.

skinny

A logical value indicating whether to return a skinny data frame.

update_rate

numeric. The rate at which to print progress

...

Additional arguments to be passed to the function.

Value

A data frame containing information about individuals, with the following potential columns:
- 'id': ID of the individual
- 'momID': ID of the individual's mother
- 'dadID': ID of the individual's father
- 'sex': Sex of the individual
- 'name': Full name of the individual
- 'name_given': First name of the individual
- 'name_surn': Last name of the individual
- 'name_marriedsurn': Married name of the individual
- 'name_nick': Nickname of the individual
- 'name_npfx': Name prefix
- 'name_nsfx': Name suffix
- 'birth_date': Birth date of the individual
- 'birth_lat': Latitude of the birthplace
- 'birth_long': Longitude of the birthplace
- 'birth_place': Birthplace of the individual
- 'death_caus': Cause of death
- 'death_date': Death date of the individual
- 'death_lat': Latitude of the place of death
- 'death_long': Longitude of the place of death
- 'death_place': Place of death of the individual
- 'attribute_caste': Caste of the individual
- 'attribute_children': Number of children of the individual
- 'attribute_description': Description of the individual
- 'attribute_education': Education of the individual
- 'attribute_idnumber': Identification number of the individual
- 'attribute_marriages': Number of marriages of the individual
- 'attribute_nationality': Nationality of the individual
- 'attribute_occupation': Occupation of the individual
- 'attribute_property': Property owned by the individual
- 'attribute_religion': Religion of the individual
- 'attribute_residence': Residence of the individual
- 'attribute_ssn': Social security number of the individual
- 'attribute_title': Title of the individual
- 'FAMC': ID(s) of the family where the individual is a child
- 'FAMS': ID(s) of the family where the individual is a spouse


Compute the null space of a matrix

Description

Compute the null space of a matrix

Usage

Null(M)

Arguments

M

a matrix of which the null space is desired

Details

The method uses the QR factorization to determine a basis for the null space of a matrix. This is sometimes also called the orthogonal complement of a matrix. As implemented, this function is identical to the function of the same name in the MASS package.
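
The QR-based approach described in Details can be sketched in base R as follows. The helper name null_space_sketch is hypothetical. The sketch follows the same idea as the MASS function: the columns of the complete Q factor beyond the rank of M form an orthonormal basis N such that crossprod(N, M) is numerically zero.

```r
# Hypothetical helper: QR-based null space basis.
null_space_sketch <- function(M) {
  decomposition <- qr(M)
  matrix_rank <- decomposition$rank
  # Full orthogonal factor, with as many columns as M has rows.
  Q <- qr.Q(decomposition, complete = TRUE)
  # Columns past the rank are orthogonal to the column space of M.
  Q[, seq_len(ncol(Q)) > matrix_rank, drop = FALSE]
}

# 3 x 2 matrix of rank 1: the second column is twice the first.
M <- matrix(c(1, 2, 3,
              2, 4, 6), nrow = 3)
N <- null_space_sketch(M)
```

Here M has three rows and rank 1, so the basis N has two columns.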


Create a properly formatted parent row for the pedigree

Description

Create a properly formatted parent row for the pedigree

Usage

addParentRow(template_row, newID, sex, momID = NA, dadID = NA)

Arguments

template_row

A single row from ped, used as a template for column structure

newID

The new parent's ID

sex

The new parent's sex value (e.g., 0 for female, 1 for male, or "F"/"M")

momID

The new parent's mother ID (default is NA)

dadID

The new parent's father ID (default is NA)

Value

A single-row dataframe for the new parent


addPersonToPed A function to add a new person to an existing pedigree data.frame.

Description

addPersonToPed A function to add a new person to an existing pedigree data.frame.

Usage

addPersonToPed(
  ped,
  name = NULL,
  sex = NULL,
  momID = NA,
  dadID = NA,
  twinID = NULL,
  personID = NULL,
  zygosity = NULL,
  notes = NULL,
  url = NULL,
  overwrite = FALSE
)

Arguments

ped

A data.frame representing the existing pedigree.

name

Optional. A character string representing the name of the new person. If not provided, the name will be set to NA.

sex

A value representing the sex of the new person.

momID

Optional. The ID of the mother of the new person. If not provided, it will be set to NA.

dadID

Optional. The ID of the father of the new person. If not provided, it will be set to NA.

twinID

Optional. The ID of the twin of the new person. If not provided, it will be set to NA.

personID

Optional. The ID of the new person. If not provided, it will be generated as the maximum existing personID + 1.

zygosity

Optional. A character string indicating the zygosity of the new person. If not provided, it will be set to NA.

notes

Optional. A character string for notes about the new person. If not provided, it will be set to NA.

url

Optional. A URL column for the new person. If not provided, it will be set to NA.

overwrite

Logical. If TRUE, the function will overwrite an existing person with the same personID. If FALSE, it will stop if a person with the same personID already exists.

Value

A data.frame with the new person added to the existing pedigree.


Add Rowless Parents

Description

This function adds parents who appear in momID or dadID but are missing from ID.

Usage

addRowlessParents(ped, verbose, validation_results)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

validation_results

validation results
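
The idea of adding "rowless" parents can be sketched in base R as below. This is an illustration under stated assumptions rather than the package's code: the helper name is hypothetical, the verbose and validation_results arguments are omitted, a 'sex' column coded "F"/"M" is assumed, and all other fields of the new rows are left as NA.

```r
# Toy pedigree: individuals 1 and 2 are referenced as parents but have no row.
ped <- data.frame(
  ID    = c(3, 4),
  momID = c(2, 2),
  dadID = c(1, NA),
  sex   = c("F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append one row per parent ID that is missing from ID.
add_rowless_parents_sketch <- function(ped) {
  missing_moms <- setdiff(ped$momID[!is.na(ped$momID)], ped$ID)
  missing_dads <- setdiff(ped$dadID[!is.na(ped$dadID)], ped$ID)

  make_rows <- function(ids, sex) {
    if (length(ids) == 0) {
      return(ped[0, , drop = FALSE])
    }
    # Indexing with NA yields all-NA rows that keep the column types.
    rows <- ped[rep(NA_integer_, length(ids)), , drop = FALSE]
    rows$ID <- ids
    rows$sex <- sex
    rows
  }

  out <- rbind(ped, make_rows(missing_moms, "F"), make_rows(missing_dads, "M"))
  rownames(out) <- NULL
  out
}

ped_complete <- add_rowless_parents_sketch(ped)
```

After the call, every non-missing momID and dadID also appears in the ID column.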


Generate or Adjust Number of Kids per Couple Based on Mating Rate

Description

This function generates or adjusts the number of kids per couple in a generation based on the specified average and whether the count should be randomly determined.

Usage

adjustKidsPerCouple(nMates, kpc, rd_kpc = TRUE)

Arguments

nMates

Integer, the number of mated pairs in the generation.

kpc

Number of kids per couple. An integer >= 2 that determines how many kids each fertilized mated couple will have in the pedigree. Default value is 3. Returns an error when kpc equals 1.

rd_kpc

logical. If TRUE, the number of kids per mate will be randomly generated from a Poisson distribution with mean kpc. If FALSE, the number of kids per mate will be fixed at kpc.

Value

A numeric vector with the generated or adjusted number of kids per couple.
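
The two modes controlled by rd_kpc can be sketched as follows. This simplified version only shows the random versus fixed draw; any further adjustment the package applies to the generated counts is not described here and is therefore omitted. The helper name is hypothetical.

```r
# Hypothetical helper: number of kids for each of nMates couples.
kids_per_couple_sketch <- function(nMates, kpc, rd_kpc = TRUE) {
  if (rd_kpc) {
    # Random counts from a Poisson distribution with mean kpc.
    stats::rpois(nMates, lambda = kpc)
  } else {
    # Every couple gets exactly kpc kids.
    rep(kpc, nMates)
  }
}

set.seed(1)
random_kids <- kids_per_couple_sketch(5, kpc = 3, rd_kpc = TRUE)
fixed_kids  <- kids_per_couple_sketch(5, kpc = 3, rd_kpc = FALSE)
```

Note that a Poisson draw can be zero, so some couples may have no children when rd_kpc is TRUE in this sketch.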


Apply Tag Mappings to a Line

Description

Iterates over a list of tag mappings and, if a tag matches the line, updates the record.

Usage

applyTagMappings(line, record, pattern_rows, tag_mappings)

Arguments

line

A character string from the GEDCOM file.

record

A named list representing the individual's record.

pattern_rows

A list with GEDCOM tag counts.

tag_mappings

A list of lists. Each sublist should define: - tag: the GEDCOM tag, - field: the record field to update, - mode: either "replace" or "append", - extractor: (optional) a custom extraction function.

Value

A list with the updated record (record) and a logical flag (matched).
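
The structure of tag_mappings and the "replace" and "append" modes can be illustrated with a small base R sketch. It is not the package's implementation: the helper name is hypothetical, the pattern_rows argument is omitted, the field names and the line-matching pattern (a level number, the tag, then the value) are assumptions, and appended values are joined with a single space.

```r
# Example mappings following the documented sublist structure.
tag_mappings <- list(
  list(tag = "SEX",  field = "sex",   mode = "replace"),
  list(tag = "NOTE", field = "notes", mode = "append")
)

# Hypothetical helper: apply the first mapping whose tag matches the line.
apply_tag_mappings_sketch <- function(line, record, tag_mappings) {
  for (mapping in tag_mappings) {
    pattern <- paste0("^[0-9]+[[:space:]]+", mapping$tag, "[[:space:]]+")
    if (!grepl(pattern, line)) {
      next
    }
    # Use the custom extractor when supplied, otherwise strip the prefix.
    value <- if (is.null(mapping$extractor)) {
      sub(pattern, "", line)
    } else {
      mapping$extractor(line)
    }
    current <- record[[mapping$field]]
    if (identical(mapping$mode, "append") && !is.null(current) && !is.na(current)) {
      value <- paste(current, value)
    }
    record[[mapping$field]] <- value
    return(list(record = record, matched = TRUE))
  }
  list(record = record, matched = FALSE)
}

record <- list(sex = NA_character_, notes = NA_character_)
step1 <- apply_tag_mappings_sketch("1 SEX M", record, tag_mappings)
step2 <- apply_tag_mappings_sketch("1 NOTE first", step1$record, tag_mappings)
step3 <- apply_tag_mappings_sketch("1 NOTE second", step2$record, tag_mappings)
step4 <- apply_tag_mappings_sketch("1 BIRT", step3$record, tag_mappings)
```

Returning the matched flag lets a caller stop testing further patterns once a line has been handled.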


Assign Couple IDs

Description

This subfunction assigns a unique couple ID to each mated pair in the generation. Unmated individuals are assigned NA for their couple ID.

Usage

assignCoupleIDs(df_Ngen)

assignCoupleIds(df_Ngen)

Arguments

df_Ngen

The dataframe for the current generation, including columns for individual IDs and spouse IDs.

Value

The input dataframe augmented with a 'coupleId' column, where each mated pair has a unique identifier.
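
A minimal sketch of couple ID assignment is shown below. The column names 'id' and 'spID' and the key format (the smaller ID, an underscore, then the larger ID) are assumptions made for illustration; the package may label couples differently.

```r
# Toy generation: 1 and 2 are a couple, 4 and 5 are a couple, 3 is unmated.
df_Ngen <- data.frame(id = 1:5, spID = c(2, 1, NA, 5, 4))

# Hypothetical helper: both members of a pair get the same key.
assign_couple_ids_sketch <- function(df_Ngen) {
  first  <- pmin(df_Ngen$id, df_Ngen$spID)
  second <- pmax(df_Ngen$id, df_Ngen$spID)
  couple_key <- paste(first, second, sep = "_")
  # Unmated individuals have no spouse and therefore no couple ID.
  couple_key[is.na(df_Ngen$spID)] <- NA_character_
  df_Ngen$coupleId <- couple_key
  df_Ngen
}

coupled <- assign_couple_ids_sketch(df_Ngen)
```

Ordering the two IDs before pasting ensures that both spouses receive an identical identifier regardless of which row is processed.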


Process Generation Connections

Description

This function processes connections between each pair of adjacent generations in a pedigree simulation. It marks individuals as parents, sons, or daughters based on their generational position and relationships. The function also assigns couple IDs, handles single and coupled individuals, and establishes parent-offspring links across generations.

Usage

buildBetweenGenerations(
  df_Fam,
  Ngen,
  sizeGens,
  verbose = FALSE,
  marR,
  sexR,
  kpc,
  rd_kpc,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  code_male = "M",
  code_female = "F"
)

Arguments

df_Fam

A data frame containing the simulated pedigree information up to the current generation. Must include columns for family ID, individual ID, generation number, spouse ID (spID), and sex. This data frame is updated in place to include flags for parental status (ifparent), son status (ifson), and daughter status (ifdau), as well as couple IDs.

Ngen

Number of generations. An integer >= 2 that determines how many generations the simulated pedigree will have. The first generation is always a fertilized couple. The last generation has no mated individuals.

sizeGens

A numeric vector containing the sizes of each generation within the pedigree.

verbose

logical. If TRUE, print progress messages through stages of the algorithm

marR

Mating rate. A numeric value ranging from 0 to 1 which determines the proportion of mated (fertilized) couples in the pedigree within each generation. For instance, marR = 0.5 means that 50 percent of the offspring in a given generation will be mated and have offspring of their own.

sexR

Sex ratio of offspring. A numeric value ranging from 0 to 1 that determines the proportion of males in all offspring in this pedigree. For instance, 0.4 means 40 percent of the offspring will be male.

kpc

Number of kids per couple. An integer >= 2 that determines how many kids each fertilized mated couple will have in the pedigree. Default value is 3. Returns an error when kpc equals 1.

rd_kpc

logical. If TRUE, the number of kids per mate will be randomly generated from a Poisson distribution with mean kpc. If FALSE, the number of kids per mate will be fixed at kpc.

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

code_male

The value to use for males. Default is "M"

code_female

The value to use for females. Default is "F"

Details

The function iterates through each generation, starting from the second, to establish connections based on mating and parentage. For the first generation, it sets the parental status directly. For subsequent generations, it calculates the number of couples, the expected number of offspring, and assigns offspring to parents. It handles gender-based assignments for sons and daughters, and deals with the nuances of single individuals and couple formation. The function relies on external functions 'assignCoupleIds' and 'adjustKidsPerCouple' to handle specific tasks related to couple ID assignment and offspring number adjustments, respectively.

Value

The function updates the 'df_Fam' data frame in place, adding or modifying columns related to parental and offspring status, as well as assigning unique couple IDs. It does not return a value explicitly.


Parse Tree

Description

Parse Tree

Usage

buildTreeGrid(tree_lines)

Arguments

tree_lines

A character vector containing the lines of the tree structure.

Value

A data frame containing the tree structure.


Process Generations for Pedigree Simulation

Description

This function iterates through generations in a pedigree simulation, assigning IDs, creating data frames, determining sexes, and managing pairing within each generation.

Usage

buildWithinGenerations(
  sizeGens,
  marR,
  sexR,
  Ngen,
  verbose = FALSE,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  code_male = "M",
  code_female = "F"
)

Arguments

sizeGens

A numeric vector containing the sizes of each generation within the pedigree.

marR

Mating rate. A numeric value ranging from 0 to 1 which determines the proportion of mated (fertilized) couples in the pedigree within each generation. For instance, marR = 0.5 means that 50 percent of the offspring in a given generation will be mated and have offspring of their own.

sexR

Sex ratio of offspring. A numeric value ranging from 0 to 1 that determines the proportion of males in all offspring in this pedigree. For instance, 0.4 means 40 percent of the offspring will be male.

Ngen

Number of generations. An integer >= 2 that determines how many generations the simulated pedigree will have. The first generation is always a fertilized couple. The last generation has no mated individuals.

verbose

logical. If TRUE, print progress messages through stages of the algorithm

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

code_male

The value to use for males. Default is "M"

code_female

The value to use for females. Default is "F"

Value

A data frame representing the simulated pedigree, including columns for family ID ('fam'),


calcAllGens A function to calculate the number of individuals in each generation. This is a supporting function for simulatePedigree.

Description

calcAllGens A function to calculate the number of individuals in each generation. This is a supporting function for simulatePedigree.

Usage

calcAllGens(kpc, Ngen, marR)

allGens(kpc, Ngen, marR)

Arguments

kpc

Number of kids per couple (integer >= 2).

Ngen

Number of generations (integer >= 1).

marR

Mating rate (numeric value ranging from 0 to 1).

Value

Returns a vector containing the number of individuals in every generation.
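
One plausible way to account for generation sizes is sketched below. It rests on assumptions drawn from the parameter descriptions and is not confirmed to match the package's formula or rounding: the first generation is a single couple, every mated couple has kpc children, a proportion marR of each generation's offspring mate and each brings in one spouse from outside the family, and nobody in the last generation is mated. The helper name is hypothetical.

```r
# Hypothetical helper: expected number of individuals in each generation.
gen_sizes_sketch <- function(kpc, Ngen, marR) {
  stopifnot(Ngen >= 1, kpc >= 2, marR >= 0, marR <= 1)
  sizes <- numeric(Ngen)
  sizes[1] <- 2  # the founding couple
  if (Ngen == 1) {
    return(sizes)
  }
  for (i in 2:Ngen) {
    # Offspring born into generation i under the assumptions above.
    offspring <- kpc^(i - 1) * marR^(i - 2)
    # Spouses marrying in; none in the last generation.
    spouses <- if (i < Ngen) offspring * marR else 0
    sizes[i] <- ceiling(offspring + spouses)
  }
  sizes
}

sizes <- gen_sizes_sketch(kpc = 2, Ngen = 3, marR = 0.5)
total <- sum(sizes)
```

With kpc = 2, Ngen = 3, and marR = 0.5, the sketch gives generation sizes of 2, 3, and 2; summing the vector gives a total pedigree size of 7 under the same assumptions.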


calcFamilySize A function to calculate the total number of individuals in a pedigree given parameters. This is a supporting function for simulatePedigree.

Description

calcFamilySize A function to calculate the total number of individuals in a pedigree given parameters. This is a supporting function for simulatePedigree.

Usage

calcFamilySize(kpc, Ngen, marR)

famSizeCal(kpc, Ngen, marR)

Arguments

kpc

Number of kids per couple (integer >= 2).

Ngen

Number of generations (integer >= 1).

marR

Mating rate (numeric value ranging from 0 to 1).

Value

Returns a numeric value indicating the total pedigree size.


calcFamilySizeByGen An internal supporting function for simulatePedigree.

Description

calcFamilySizeByGen An internal supporting function for simulatePedigree.

Usage

calcFamilySizeByGen(kpc, Ngen, marR)

sizeAllGens(kpc, Ngen, marR)

Arguments

kpc

Number of kids per couple (integer >= 2).

Ngen

Number of generations (integer >= 1).

marR

Mating rate (numeric value ranging from 0 to 1).

Value

Returns a vector including the number of individuals in every generation.


Calculate Confidence Intervals for Correlation Coefficients

Description

This function calculates confidence intervals for correlation coefficients using different methods.

Usage

calculateCIs(
  tbl,
  rho_var,
  se_var,
  doubleentered = FALSE,
  method = "raykov",
  adjust_base = 1,
  design_effect_m = NULL,
  design_effect_rho = NULL,
  design_effect_m_col = NULL,
  design_effect_rho_col = NULL,
  conf_level = 0.95
)

Arguments

tbl

A data frame or tibble containing the correlation coefficient and standard error variables.

rho_var

The name of the column in tbl that contains the correlation coefficients.

se_var

The name of the column in tbl that contains the standard errors.

doubleentered

Logical. If TRUE, the function assumes that the correlation coefficients are double-entered, which adjusts the standard errors accordingly. Default is FALSE.

method

The method to use for calculating the confidence intervals. Options are "raykov", "fisherz", "doubleenteredconserv", or "doubleentered".

adjust_base

A numeric value to adjust the standard errors. Default is 1.

design_effect_m

A numeric value for the design effect related to the mean. Default is NULL.

design_effect_rho

A numeric value for the design effect related to the correlation. Default is NULL.

design_effect_m_col

A character string specifying the column name for the design effect related to the mean. Default is NULL.

design_effect_rho_col

A character string specifying the column name for the design effect related to the correlation. Default is NULL.

conf_level

The confidence level for the intervals. Default is 0.95.

Value

A modified version of tbl with additional columns for the confidence intervals and related statistics. Everything uses adjusted standard errors, including confidence intervals, z-tests, and p-values.

Examples

tbl <- data.frame(rho = c(0.5, 0.7, 0.3), se = c(0.1, 0.2, 0.05))
calculateCIs(tbl, rho_var = "rho", se_var = "se", method = "raykov")
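
The "fisherz" option named above refers to the Fisher z-transformation. As a rough illustration of that general technique only, the sketch below builds an interval on the z scale and transforms it back. The helper name fisher_z_ci and the delta-method conversion of the standard error are assumptions made for this example; calculateCIs may compute its intervals differently.

```r
# Hypothetical helper (not part of BGmisc): textbook Fisher z interval.
# Assumption: the standard error of rho is carried to the z scale with the
# delta method, se_z = se / (1 - rho^2).
fisher_z_ci <- function(rho, se, conf_level = 0.95) {
  z <- atanh(rho)                          # Fisher z-transformation
  se_z <- se / (1 - rho^2)                 # delta-method standard error
  crit <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  data.frame(
    rho = rho,
    lower = tanh(z - crit * se_z),         # back-transform the bounds
    upper = tanh(z + crit * se_z)
  )
}

ci <- fisher_z_ci(rho = c(0.5, 0.7, 0.3), se = c(0.1, 0.2, 0.05))
print(ci)
```

Because the bounds are back-transformed with tanh, they always stay inside (-1, 1), which is the usual motivation for working on the z scale.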


Falconer's Formula

Description

Use Falconer's formula to solve for H using the observed correlations for two groups at any two levels of relatedness.

Usage

calculateH(r1, r2, obsR1, obsR2)

Arguments

r1

Relatedness coefficient of the first group.

r2

Relatedness coefficient of the second group.

obsR1

Observed correlation between members of the first group.

obsR2

Observed correlation between members of the second group.

Details

This generalization of Falconer's formula provides a method to calculate heritability by using the observed correlations for two groups at any two levels of relatedness. This function solves for H using the formula:

H^2 = \frac{obsR1 - obsR2}{r1 - r2}

where r1 and r2 are the relatedness coefficients for the first and second group, respectively, and obsR1 and obsR2 are the observed correlations.

Value

Heritability estimates ('heritability_estimates').
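
As a worked illustration of the formula in Details, the sketch below plugs in one set of made-up values: identical twins (r1 = 1) correlating 0.8 and fraternal twins (r2 = 0.5) correlating 0.5. The helper falconer_h2 is hypothetical and only restates the formula; it is not the package's calculateH.

```r
# Hypothetical helper restating the formula from Details.
falconer_h2 <- function(r1, r2, obsR1, obsR2) {
  (obsR1 - obsR2) / (r1 - r2)
}

# Illustrative (made-up) correlations for MZ and DZ twins.
h2 <- falconer_h2(r1 = 1, r2 = 0.5, obsR1 = 0.8, obsR2 = 0.5)
print(h2) # (0.8 - 0.5) / (1 - 0.5) = 0.6
```

With r1 = 1 and r2 = 0.5 the denominator is 0.5, so the expression reduces to the classic twin form 2 * (obsR1 - obsR2).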


Calculate Relatedness Coefficient

Description

This function calculates the relatedness coefficient between two individuals based on their shared ancestry, as described by Wright (1922).

Usage

calculateRelatedness(
  generations = 2,
  path = NULL,
  full = TRUE,
  maternal = FALSE,
  empirical = FALSE,
  segregating = TRUE,
  total_a = 6800 * 1e+06,
  total_m = 16500,
  weight_a = 1,
  weight_m = 1,
  denom_m = FALSE,
  ...
)

related_coef(...)

Arguments

generations

Number of generations back of common ancestors the pair share.

path

Traditional method to count common ancestry, which is twice the number of generations removed from common ancestors. If not provided, it is calculated as 2*generations.

full

Logical. Indicates if the kin share both parents at the common ancestor's generation. Default is TRUE.

maternal

Logical. Indicates if the maternal lineage should be considered in the calculation.

empirical

Logical. Adjusts the coefficient based on empirical data, using the total number of nucleotides and other parameters.

segregating

Logical. Adjusts for segregating genes.

total_a

Numeric. Represents the total size of the autosomal genome in terms of nucleotides, used in empirical adjustment. Default is 6800*1000000.

total_m

Numeric. Represents the total size of the mitochondrial genome in terms of nucleotides, used in empirical adjustment. Default is 16500.

weight_a

Numeric. Represents the weight of phenotypic influence from additive genetic variance, used in empirical adjustment.

weight_m

Numeric. Represents the weight of phenotypic influence from mitochondrial effects, used in empirical adjustment.

denom_m

Logical. Indicates if 'total_m' and 'weight_m' should be included in the denominator of the empirical adjustment calculation.

...

Further named arguments that may be passed to another function.

Details

The relatedness coefficient between two people (b & c) is defined in relation to their common ancestors:

r_{bc} = \sum \left(\frac{1}{2}\right)^{n+n'+1} (1+f_a)

Value

Relatedness Coefficient ('coef'): A measure of the genetic relationship between two individuals.

Examples

## Not run: 
# For full siblings, the relatedness coefficient is expected to be 0.5:
calculateRelatedness(generations = 1, full = TRUE)
# For half siblings, the relatedness coefficient is expected to be 0.25:
calculateRelatedness(generations = 1, full = FALSE)

## End(Not run)
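
The two expected values in the example above can be reproduced with a few lines of base R. This is a sketch under stated assumptions, not the package's implementation: it uses path = 2 * generations as described for the path argument, takes (1/2)^path as the contribution of one common ancestor, and doubles it when both parents are shared (full = TRUE). The helper name relatedness_sketch is hypothetical, and the empirical and mitochondrial adjustments are ignored.

```r
# Hypothetical helper; ignores the empirical, maternal, and segregating options.
relatedness_sketch <- function(generations, full = TRUE) {
  path <- 2 * generations     # path length, as described for the path argument
  coef <- 0.5^path            # contribution of a single common ancestor
  if (full) coef <- 2 * coef  # two shared ancestors when both parents are shared
  coef
}

full_sibs <- relatedness_sketch(generations = 1, full = TRUE)   # 0.5
half_sibs <- relatedness_sketch(generations = 1, full = FALSE)  # 0.25
cousins <- relatedness_sketch(generations = 2, full = TRUE)     # 0.125
```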

Function to calculate summary statistics for all numeric variables. This function calculates summary statistics for all numeric variables in a data.table. It is intended to be used internally by the summarize_pedigree function.

Description

Function to calculate summary statistics for all numeric variables. This function calculates summary statistics for all numeric variables in a data.table. It is intended to be used internally by the summarize_pedigree function.

Usage

calculateSummaryDT(data, group_var, skip_var, five_num_summary = FALSE)

Arguments

data

A data.table containing the pedigree data.

group_var

A character string specifying the column name of the grouping variable.

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If 'TRUE', includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

Value

A data.table containing the summary statistics for all numeric variables.


Validates and Optionally Repairs Unique IDs in a Pedigree Dataframe

Description

This function takes a pedigree object and performs two main tasks: 1. Checks for the uniqueness of individual IDs. 2. Optionally repairs non-unique IDs based on a specified logic.

Usage

checkIDs(ped, verbose = FALSE, repair = FALSE)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

repair

A logical flag indicating whether to attempt repairs on non-unique IDs.

Value

Depending on the value of 'repair', either a list containing validation results or a repaired dataframe is returned.

Examples

## Not run: 
ped <- data.frame(ID = c(1, 2, 2, 3), dadID = c(NA, 1, 1, 2), momID = c(NA, NA, 2, 2))
checkIDs(ped, verbose = TRUE, repair = FALSE)

## End(Not run)
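
The uniqueness check itself can be pictured with base R alone. The sketch below reuses the pedigree from the example and flags IDs that occur more than once; the variable names are illustrative, and the structure of the list that checkIDs actually returns is not shown here.

```r
# Same toy pedigree as in the example above; ID 2 appears twice.
ped <- data.frame(ID = c(1, 2, 2, 3), dadID = c(NA, 1, 1, 2), momID = c(NA, NA, 2, 2))

dup_ids <- unique(ped$ID[duplicated(ped$ID)]) # IDs that occur more than once
all_unique <- length(dup_ids) == 0            # TRUE only if no ID is repeated

print(dup_ids)    # 2
print(all_unique) # FALSE
```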

Check for duplicated individual IDs

Description

This function checks for duplicated individual IDs in a pedigree.

Usage

checkIDuniqueness(ped, verbose = FALSE)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

Value

A list containing the results of the check


Validates and Optionally Repairs Parent IDs in a Pedigree Dataframe

Description

This function takes a pedigree object and performs two main tasks: 1. Checks for the validity of parent IDs, specifically looking for instances where only one parent ID is missing. 2. Optionally repairs the missing parent IDs based on a specified logic.

Usage

checkParentIDs(
  ped,
  verbose = FALSE,
  repair = FALSE,
  repairsex = repair,
  addphantoms = repair,
  parentswithoutrow = repair,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID"
)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

repair

A logical flag indicating whether to attempt repairs on missing parent IDs.

repairsex

A logical flag indicating whether to attempt repairs on sex of the parents

addphantoms

A logical flag indicating whether to add phantom parents for missing parent IDs.

parentswithoutrow

A logical flag indicating whether to add parents without a row in the pedigree.

famID

Character. Column name for family IDs.

personID

Character. Column name for individual IDs.

momID

Character. Column name for maternal IDs.

dadID

Character. Column name for paternal IDs.

Value

Depending on the value of 'repair', either a list containing validation results or a repaired dataframe is returned.

Examples

## Not run: 
ped <- data.frame(ID = 1:4, dadID = c(NA, 1, 1, 2), momID = c(NA, NA, 2, 2))
checkParentIDs(ped, verbose = TRUE, repair = FALSE)

## End(Not run)
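
To make the first task concrete, the following base R sketch finds rows in the example pedigree where exactly one parent ID is missing, and also lists parents that are referenced without having a row of their own. It is an illustration with hypothetical variable names, not the logic that checkParentIDs uses internally.

```r
# Same toy pedigree as in the example above.
ped <- data.frame(ID = 1:4, dadID = c(NA, 1, 1, 2), momID = c(NA, NA, 2, 2))

# Exactly one of the two parent IDs is missing.
one_missing <- xor(is.na(ped$dadID), is.na(ped$momID))
ids_one_missing <- ped$ID[one_missing]

# Parents that are referenced but have no row in the pedigree.
parent_ids <- unique(c(ped$dadID, ped$momID))
parent_ids <- parent_ids[!is.na(parent_ids)]
parents_without_row <- setdiff(parent_ids, ped$ID)

print(ids_one_missing)     # 2 (father known, mother missing)
print(parents_without_row) # empty: every referenced parent has a row
```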

Check Parental Role Sex Consistency

Description

Validates sex coding consistency for a given parental role (momID or dadID).

Usage

checkParentSex(ped, parent_col, sex_col = "sex", verbose = FALSE)

Arguments

ped

Pedigree dataframe.

parent_col

The column name for parent IDs ("momID" or "dadID").

sex_col

The column name for sex coding. Default is "sex".

verbose

Logical, whether to print messages.

Value

A list containing role, unique sex codes, modal sex, inconsistent parents, and linked children.
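
The returned elements (unique sex codes, modal sex, inconsistent parents, linked children) can be illustrated in base R. The sketch below is a hedged approximation with a made-up pedigree and hypothetical variable names; the exact element names and tie-breaking rules of checkParentSex are not reproduced.

```r
# Made-up pedigree: individual 5 is coded "F" but is listed as a father.
ped <- data.frame(
  ID    = 1:7,
  sex   = c("M", "F", "M", "F", "F", "M", "F"),
  dadID = c(NA, NA, 1, 1, 3, 5, 3),
  momID = c(NA, NA, 2, 2, 4, 4, 4)
)
parent_col <- "dadID"

parent_ids <- unique(ped[[parent_col]][!is.na(ped[[parent_col]])]) # 1, 3, 5
parent_sex <- ped$sex[match(parent_ids, ped$ID)]                   # "M", "M", "F"

unique_codes <- unique(parent_sex)
modal_sex <- names(which.max(table(parent_sex)))        # most common code
inconsistent <- parent_ids[parent_sex != modal_sex]     # parents deviating from it
linked_children <- ped$ID[ped[[parent_col]] %in% inconsistent]
```

Here the modal code among fathers is "M", individual 5 is flagged as inconsistent, and individual 6 is the linked child.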


Validate Pedigree Network Structure

Description

Checks for structural issues in pedigree networks, including:
- Individuals with more than two parents.
- Presence of cyclic parent-child relationships.

Usage

checkPedigreeNetwork(
  ped,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  verbose = FALSE
)

Arguments

ped

Dataframe representing the pedigree.

personID

Character. Column name for individual IDs.

momID

Character. Column name for maternal IDs.

dadID

Character. Column name for paternal IDs.

verbose

Logical. If TRUE, print informative messages.

Value

List containing detailed validation results.

Examples

## Not run: 
results <- checkPedigreeNetwork(ped,
  personID = "ID",
  momID = "momID", dadID = "dadID", verbose = TRUE
)

## End(Not run)
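
A cyclic parent-child relationship means that some individual appears among their own ancestors. The base R sketch below shows one simple way to detect this by walking up the ancestors of each individual. The function has_cycle is hypothetical and written for clarity rather than speed; it is not how checkPedigreeNetwork is implemented.

```r
# Hypothetical helper: TRUE if any individual is their own ancestor.
has_cycle <- function(ped) {
  parents_of <- function(id) {
    row <- ped[ped$ID == id, , drop = FALSE]
    p <- c(row$momID, row$dadID)
    p[!is.na(p)]
  }
  for (start in ped$ID) {
    frontier <- parents_of(start) # ancestors found in the latest step
    seen <- c()                   # ancestors already visited
    while (length(frontier) > 0) {
      if (start %in% frontier) return(TRUE)
      seen <- union(seen, frontier)
      nxt <- unique(unlist(lapply(frontier, parents_of)))
      frontier <- setdiff(nxt, seen)
    }
  }
  FALSE
}

acyclic <- data.frame(ID = 1:3, momID = c(NA, NA, 1), dadID = c(NA, NA, 2))
cyclic <- data.frame(ID = 1:2, momID = c(2, 1), dadID = c(NA, NA)) # 1 and 2 are each other's mother

res_acyclic <- has_cycle(acyclic) # FALSE
res_cyclic <- has_cycle(cyclic)   # TRUE
```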

Validates and Optionally Repairs Sex Coding in a Pedigree Dataframe

Description

This function checks and optionally modifies the coding of the biological 'sex' variable in a pedigree dataset. It serves two primary purposes: 1. Recodes the 'sex' variable based on specified codes for males and females, if provided. 2. Identifies and optionally repairs inconsistencies in sex coding that could break the algorithm for constructing genetic pedigrees.

Usage

checkSex(
  ped,
  code_male = NULL,
  code_female = NULL,
  verbose = FALSE,
  repair = FALSE,
  momID = "momID",
  dadID = "dadID"
)

Arguments

ped

A dataframe representing the pedigree data with a 'sex' column.

code_male

The current code used to represent males in the 'sex' column.

code_female

The current code used to represent females in the 'sex' column. If both are NULL, no recoding is performed.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

repair

A logical flag indicating whether to attempt repairs on the sex coding.

momID

The column name for maternal IDs. Default is "momID".

dadID

The column name for paternal IDs. Default is "dadID".

Details

The validation process identifies:
- The unique sex codes present in the dataset.
- Whether individuals listed as fathers or mothers have inconsistent sex codes.
- Instances where an individual's recorded sex does not align with their parental role.

If 'repair = TRUE', the function standardizes sex coding by:
- Assigning individuals listed as fathers the most common male code in the dataset.
- Assigning individuals listed as mothers the most common female code.

This function uses the terms 'male' and 'female' in a biological context, referring to chromosomal and other biologically-based characteristics necessary for constructing genetic pedigrees. The biological aspect of sex used in genetic analysis (genotype) is distinct from the broader, richer concept of gender identity (phenotype).

We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities. The developers of this package express unequivocal support for folx in the transgender and LGBTQ+ communities.

Value

Depending on the value of 'repair', either a list containing validation results or a repaired dataframe is returned.

Examples

## Not run: 
ped <- data.frame(ID = c(1, 2, 3), sex = c("M", "F", "M"))
checkSex(ped, code_male = "M", verbose = TRUE, repair = FALSE)

## End(Not run)

Check for within-row duplicates (self-parents, same mom/dad)

Description

This function checks for within-row duplicates in a pedigree.

Usage

checkWithinRowDuplicates(ped, verbose = FALSE)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

Value

A list containing the results of the check
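
The two within-row problems named in the title (self-parents and identical mother and father) can be sketched in base R as follows. The pedigree is made up and the variable names are illustrative; the structure of the list returned by checkWithinRowDuplicates is not shown.

```r
# Made-up pedigree: person 2 is listed as their own father,
# and person 3 has the same individual as mother and father.
ped <- data.frame(
  ID    = c(1, 2, 3, 4),
  dadID = c(NA, 2, 1, 1),
  momID = c(NA, 5, 1, 2)
)

# which() drops the NA comparisons produced by missing parent IDs.
self_parent_ids <- ped$ID[which(ped$ID == ped$dadID | ped$ID == ped$momID)]
same_parent_ids <- ped$ID[which(ped$dadID == ped$momID)]

print(self_parent_ids) # 2
print(same_parent_ids) # 3
```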


Collapse Names

Description

This function combines the 'name_given' and 'name_given_pieces' columns in a data frame.

Usage

collapseNames(verbose, df_temp)

Arguments

verbose

A logical value indicating whether to print messages.

df_temp

A data frame containing the columns to be combined.

Value

A data frame with the combined columns.


Description

This function processes one or more sparse relationship components (additive, mitochondrial, and common nuclear) and converts them into kinship link pairs. The resulting related pairs are either returned as a data frame or written to disk in CSV format.

Usage

com2links(
  rel_pairs_file = "dataRelatedPairs.csv",
  ad_ped_matrix = NULL,
  mit_ped_matrix = mt_ped_matrix,
  mt_ped_matrix = NULL,
  cn_ped_matrix = NULL,
  write_buffer_size = 1000,
  update_rate = 1000,
  gc = TRUE,
  writetodisk = TRUE,
  verbose = FALSE,
  legacy = FALSE,
  outcome_name = "data",
  drop_upper_triangular = TRUE,
  include_all_links_1ped = FALSE,
  ...
)

Arguments

rel_pairs_file

File path to write related pairs to (CSV format).

ad_ped_matrix

Matrix of additive genetic relatedness coefficients.

mit_ped_matrix

Matrix of mitochondrial relatedness coefficients. Alias: mt_ped_matrix.

mt_ped_matrix

Matrix of mitochondrial relatedness coefficients.

cn_ped_matrix

Matrix of common nuclear relatedness coefficients.

write_buffer_size

Number of related pairs to write to disk at a time.

update_rate

Numeric. Frequency (in iterations) at which progress messages are printed.

gc

Logical. If TRUE, performs garbage collection via gc to free memory.

writetodisk

Logical. If TRUE, writes the related pairs to disk; if FALSE, returns a data frame.

verbose

Logical. If TRUE, prints progress messages.

legacy

Logical. If TRUE, uses the legacy branch of the function.

outcome_name

Character string representing the outcome name (used in file naming).

drop_upper_triangular

Logical. If TRUE, drops the upper triangular portion of the matrix.

include_all_links_1ped

Logical. If TRUE, includes all links in the output. (Default is true when only one ped is provided)

...

Additional arguments to be passed to com2links

Value

A data frame of related pairs if writetodisk is FALSE; otherwise, writes the results to disk.
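
Conceptually, the conversion turns each non-zero cell of a relatedness matrix into one row of a pair table. The sketch below shows the idea for a single small dense matrix in base R, keeping only the lower triangle. The column names ID1, ID2, and addRel are assumptions chosen for this example, and the sketch does not cover sparse matrices, multiple components, or buffered writing to disk.

```r
# Small dense additive relatedness matrix with made-up IDs.
ids <- c("101", "102", "103")
ad <- matrix(
  c(1,   0.5, 0,
    0.5, 1,   0.25,
    0,   0.25, 1),
  nrow = 3, dimnames = list(ids, ids)
)

keep <- lower.tri(ad) & ad != 0    # below the diagonal and non-zero
idx <- which(keep, arr.ind = TRUE) # row/column positions of kept cells

pairs <- data.frame(
  ID1 = rownames(ad)[idx[, "row"]],
  ID2 = colnames(ad)[idx[, "col"]],
  addRel = ad[keep]
)
print(pairs)
```

Dropping the upper triangle keeps each pair once, since the matrix is symmetric.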


Combine Columns

Description

This function combines two columns, handling conflicts and merging non-conflicting data.

Usage

combine_columns(col1, col2)

Arguments

col1

The first column to combine.

col2

The second column to combine.

Value

A list with the combined column and a flag indicating if the second column should be retained.


comp2vech Turn a variance component relatedness matrix into its half-vectorization

Description

comp2vech Turn a variance component relatedness matrix into its half-vectorization

Usage

comp2vech(x, include.zeros = FALSE)

Arguments

x

Relatedness component matrix (can be a matrix, list, or object that inherits from 'Matrix').

include.zeros

logical. Whether to include all-zero rows. Default is FALSE.

Details

This function is a wrapper around the vech function, extending it to allow for blockwise matrices and specific classes. It facilitates the conversion of a variance component relatedness matrix into a half-vectorized form.

Value

The half-vectorization of the relatedness component matrix.

Examples

comp2vech(list(matrix(c(1, .5, .5, 1), 2, 2), matrix(1, 2, 2)))
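
For readers unfamiliar with half-vectorization, the base R sketch below stacks the lower triangle (including the diagonal) of each block, column by column. The helper vech_sketch is hypothetical, and concatenating the per-block results is an assumption about how a list input is handled; it is meant to convey the idea rather than to reproduce comp2vech exactly.

```r
# Hypothetical helper: lower triangle including the diagonal, column-major order.
vech_sketch <- function(m) m[lower.tri(m, diag = TRUE)]

blocks <- list(matrix(c(1, .5, .5, 1), 2, 2), matrix(1, 2, 2))

# Assumption: a list of blocks is handled by concatenating each block's vech.
stacked <- unlist(lapply(blocks, vech_sketch))
print(stacked) # 1.0 0.5 1.0 1.0 1.0 1.0
```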


Compute Parent Adjacency Matrix with Multiple Approaches

Description

Compute Parent Adjacency Matrix with Multiple Approaches

Usage

computeParentAdjacency(
  ped,
  component,
  adjacency_method = "direct",
  saveable,
  resume,
  save_path,
  verbose = FALSE,
  lastComputed = 0,
  checkpoint_files,
  update_rate,
  parList,
  lens,
  save_rate_parlist,
  adjBeta_method = NULL,
  config,
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

adjacency_method

character. The method to use for computing the adjacency matrix. Options are "loop", "indexed", "direct", or "beta"

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_path

character. The path to save the checkpoint files

verbose

logical. If TRUE, print progress through stages of algorithm

lastComputed

the last computed index

checkpoint_files

a list of checkpoint files

update_rate

the rate at which to update the progress

parList

a list of parent-child relationships

lens

a vector of the lengths of the parent-child relationships

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

adjBeta_method

numeric. The method to use for building the adjacency matrix when using the "beta" build

config

a configuration list that passes parameters to the function

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.


Make Long Tree

Description

Make Long Tree

Usage

convertGrid2LongTree(tree_df, cols_to_pivot)

Arguments

tree_df

A data frame containing the tree structure.

cols_to_pivot

A character vector of column names to pivot.

Value

A long data frame containing the tree structure.


Count GEDCOM Pattern Rows

Description

Counts the number of lines in a file (passed as a data frame with column "X1") that match various GEDCOM patterns.

Usage

countPatternRows(file)

Arguments

file

A data frame with a column X1 containing GEDCOM lines.

Value

A list with counts of specific GEDCOM tag occurrences.
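
As an illustration of the input and output shapes, the sketch below counts matching lines in a small data frame with a column X1. The three patterns are hypothetical stand-ins chosen for this example; the set of GEDCOM tags that countPatternRows actually counts is not listed in this section.

```r
# A few made-up GEDCOM lines in the expected input shape.
file <- data.frame(
  X1 = c(
    "0 @I1@ INDI", "1 NAME John /Doe/", "1 SEX M",
    "0 @I2@ INDI", "1 NAME Jane /Doe/"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical pattern set for illustration only.
patterns <- c(indi = "@ INDI", name = " NAME ", sex = " SEX ")

# One count per pattern, returned as a named list.
counts <- lapply(patterns, function(p) sum(grepl(p, file$X1, fixed = TRUE)))
str(counts)
```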


Create Data Frame for Generation

Description

This function creates a data frame for a specific generation within the simulated pedigree. It initializes the data frame with default values for family ID, individual ID, generation number, paternal ID, maternal ID, spouse ID, and sex. All individuals are initially set with NA for paternal, maternal, spouse IDs, and sex, awaiting further assignment.

Usage

createGenDataFrame(sizeGens, genIndex, idGen)

Arguments

sizeGens

A numeric vector containing the sizes of each generation within the pedigree.

genIndex

An integer representing the current generation index for which the data frame is being created.

idGen

A numeric vector containing the ID numbers to be assigned to individuals in the current generation.

Value

A data frame representing the initial structure for the individuals in the specified generation before any relationships (parental, spousal) are defined. The columns include family ID ('fam'), individual ID ('id'), generation number ('gen'), father's ID ('pat'), mother's ID ('mat'), spouse's ID ('spID'), and sex ('sex'), with NA values for paternal, maternal, and spouse IDs, and sex.

Examples

sizeGens <- c(3, 5, 4) # Example sizes for 3 generations
genIndex <- 2 # Creating data frame for the 2nd generation
idGen <- 101:105 # Example IDs for the 2nd generation
df_Ngen <- createGenDataFrame(sizeGens, genIndex, idGen)
print(df_Ngen)

Deduplicate pairs of IDs in a data frame

Description

Deduplicate pairs of IDs in a data frame

Usage

deduplicatePairs(df)

Arguments

df

A data frame with columns from_id and to_id

Value

A data frame with unique pairs of IDs
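
One common way to deduplicate unordered pairs is to build an order-insensitive key from the smaller and larger ID of each row. The sketch below does this in base R with made-up data; it assumes that (1, 2) and (2, 1) count as the same pair and is not taken from the implementation of deduplicatePairs.

```r
# Rows 1, 2, and 4 all describe the same unordered pair {1, 2}.
df <- data.frame(from_id = c(1, 2, 3, 1), to_id = c(2, 1, 4, 2))

lo <- pmin(df$from_id, df$to_id) # smaller ID of each pair
hi <- pmax(df$from_id, df$to_id) # larger ID of each pair

# Keep the first occurrence of each (lo, hi) key.
deduped <- df[!duplicated(data.frame(lo, hi)), ]
print(deduped)
```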


Determine Sex of Offspring

Description

This internal function assigns sexes to the offspring in a generation based on the specified sex ratio.

Usage

determineSex(idGen, sexR, code_male = "M", code_female = "F")

Arguments

idGen

Vector of IDs for the generation.

sexR

Numeric value indicating the sex ratio (proportion of males).

code_male

The value to use for males. Default is "M"

code_female

The value to use for females. Default is "F"

Value

Vector of sexes ("M" for male, "F" for female) for the offspring.
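
A minimal sketch of the idea, assuming the number of males is the rounded product of sexR and the generation size and that the order is then shuffled. The package may round or randomize differently, so treat this only as an illustration of how sexR maps to a vector of codes.

```r
set.seed(1) # for a reproducible shuffle

idGen <- 101:110
sexR <- 0.4
code_male <- "M"
code_female <- "F"

n_male <- round(sexR * length(idGen)) # assumed rounding rule
sex <- sample(c(
  rep(code_male, n_male),
  rep(code_female, length(idGen) - n_male)
))

table(sex)
```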


Description

dropLink A function to drop a person from his/her parents in the simulated pedigree data.frame. The person can be dropped by specifying his/her ID, or chosen at random from a specified generation. The function can separate one pedigree into two pedigrees; separating a pedigree into smaller pieces requires running the function multiple times. This is a supplementary function for simulatePedigree.

Usage

dropLink(
  ped,
  ID_drop = NA_integer_,
  gen_drop = 2,
  sex_drop = NA_character_,
  n_drop = 1
)

Arguments

ped

a pedigree simulated by the simulatePedigree function, or a data frame in the same format

ID_drop

the ID of the person to be dropped from his/her parents.

gen_drop

the generation in which the randomly dropped person is. Will work if 'ID_drop' is not specified.

sex_drop

the biological sex of the randomly dropped person.

n_drop

the number of times the mutation happens.

Value

a pedigree with the dropped person's 'dadID' and 'momID' set to NA.
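
The effect described under Value can be pictured with a toy pedigree in base R. The sketch below handles only the case where the ID is given explicitly and uses the 'dadID' and 'momID' column names mentioned above; random selection by generation or sex is not covered.

```r
# Toy pedigree: persons 3 and 4 are children of 1 and 2.
ped <- data.frame(ID = 1:4, dadID = c(NA, NA, 1, 1), momID = c(NA, NA, 2, 2))

ID_drop <- 3
ped[ped$ID == ID_drop, c("dadID", "momID")] <- NA # cut the link to both parents

print(ped)
```

After the assignment, person 3 has no recorded parents while person 4 is unchanged.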


Error Function

Description

Error Function

Usage

efunc(error)

Arguments

error

error output

Value

Replaces error message (error) with NA
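
A handler of this kind is typically passed to tryCatch so that a failing computation yields NA instead of stopping. The sketch below uses a stand-in function to_na with the same one-argument shape; whether efunc is used in exactly this way inside the package is an assumption.

```r
# Stand-in with the same shape as efunc: takes the error, returns NA.
to_na <- function(error) NA

failed <- tryCatch(stop("something failed"), error = to_na) # NA
worked <- tryCatch(sqrt(4), error = to_na)                  # 2
```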


Match Members

Description

Match Members

Usage

extractMemberTable(text)

Arguments

text

A character string containing the text of a family tree in wiki format.

Value

A data frame containing information about the members of the family tree.


Extract Information from Line

Description

This function extracts information from a line based on a specified type.

Usage

extract_info(line, type)

Arguments

line

A character string representing a line from a GEDCOM file.

type

A character string representing the type of information to extract.

Value

A character string with the extracted information.


Function to find the biggest families in a pedigree. This function finds the biggest families in a pedigree. It is intended to be used internally by the summarize_pedigree function.

Description

Function to find the biggest families in a pedigree. This function finds the biggest families in a pedigree. It is intended to be used internally by the summarize_pedigree function.

Usage

findBiggest(foo_summary_dt, n_biggest, n_foo)

Arguments

foo_summary_dt

A data.table containing the summary statistics.

n_biggest

Integer. Number of largest lineages to return (sorted by count).

n_foo

An integer specifying the number of individuals in the summary.

Value

a data.table containing the biggest families in the pedigree.


Function to find the originating member for each line

Description

This function finds the originating member for each line in a pedigree. It is supposed to be used internally by the summarize_pedigree function.

Usage

findFounder(data, group_var, sort_var)

Arguments

data

A data.table containing the pedigree data.

sort_var

A character string specifying the column name to sort by.

Value

A data.table containing the originating member for each line.


This function finds the oldest families in a pedigree. It is supposed to be used internally by the summarize_pedigree function.

Description

This function finds the oldest families in a pedigree. It is supposed to be used internally by the summarize_pedigree function.

Usage

findOldest(foo_summary_dt, byr, n_oldest, n_foo)

Arguments

foo_summary_dt

A data.table containing the summary statistics.

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

n_oldest

Integer. Number of oldest lineages to return (sorted by birth year).

n_foo

An integer specifying the number of individuals in the summary.

Value

a data.table containing the oldest families in the pedigree.


fitComponentModel Fit the estimated variance components of a model to covariance data

Description

fitComponentModel Fit the estimated variance components of a model to covariance data

Usage

fitComponentModel(covmat, ...)

Arguments

covmat

The covariance matrix of the raw data, which may be blockwise.

...

Comma-separated relatedness component matrices representing the variance components of the model.

Details

This function fits the estimated variance components of a model to given covariance data. The rank of the component matrices is checked to ensure that the variance components are all identified. Warnings are issued if there are inconsistencies.

Value

A regression (linear model fitted with lm). The coefficients of the regression represent the estimated variance components.

Examples

## Not run: 
# install.packages("OpenMx")
data(twinData, package = "OpenMx")
selVars <- c("ht1", "ht2")
mzData <- subset(twinData, zyg %in% c(1), c(selVars, "zyg"))
dzData <- subset(twinData, zyg %in% c(3), c(selVars, "zyg"))

fitComponentModel(
  covmat = list(cov(mzData[, selVars], use = "pair"), cov(dzData[, selVars], use = "pair")),
  A = list(matrix(1, nrow = 2, ncol = 2), matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)),
  C = list(matrix(1, nrow = 2, ncol = 2), matrix(1, nrow = 2, ncol = 2)),
  E = list(diag(1, nrow = 2), diag(1, nrow = 2))
)

## End(Not run)
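
The regression idea described in Details can be sketched without OpenMx. The example below builds MZ and DZ covariance matrices from known variance components (A = 0.5, C = 0.3, E = 0.2, made-up values), half-vectorizes everything, and recovers the components with lm. The helpers vech and stack_vech are defined here for illustration; this is a sketch of the approach, not the body of fitComponentModel.

```r
# Lower triangle including the diagonal, stacked column by column.
vech <- function(m) m[lower.tri(m, diag = TRUE)]
stack_vech <- function(blocks) unlist(lapply(blocks, vech))

# Component matrices for MZ (first block) and DZ (second block) twins.
A <- list(matrix(1, 2, 2), matrix(c(1, 0.5, 0.5, 1), 2, 2))
C <- list(matrix(1, 2, 2), matrix(1, 2, 2))
E <- list(diag(1, 2), diag(1, 2))

# Made-up true variance components used to generate exact covariances.
true_vc <- c(A = 0.5, C = 0.3, E = 0.2)
covmat <- lapply(1:2, function(g) {
  true_vc[["A"]] * A[[g]] + true_vc[["C"]] * C[[g]] + true_vc[["E"]] * E[[g]]
})

# Regress the stacked covariances on the stacked components (no intercept).
dat <- data.frame(
  y = stack_vech(covmat),
  A = stack_vech(A), C = stack_vech(C), E = stack_vech(E)
)
fit <- lm(y ~ 0 + A + C + E, data = dat)
est <- coef(fit)
print(est)
```

Because the covariances were generated without noise, the coefficients reproduce the chosen components; with real data they would be estimates.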


Build adjacency list (4-way neighbors)

Description

Build adjacency list (4-way neighbors)

Usage

getGridNeighbors(cell, active_keys)

Arguments

cell

A data frame with columns Row and Column

Value

A character vector of neighboring cell keys


Extract Summary Text

Description

Extract Summary Text

Usage

getWikiTreeSummary(text)

Arguments

text

A character string containing the text of a family tree in wiki format.

Value

A character string containing the summary text.


Simulated pedigree with two extended families and an age-related hazard

Description

A dataset simulated to have an age-related hazard. There are two extended families that are sampled from the same population.

Usage

data(hazard)

Format

A data frame with 43 rows and 14 variables

Details

The variables are as follows:


identifyComponentModel Determine if a variance components model is identified

Description

identifyComponentModel Determine if a variance components model is identified

Usage

identifyComponentModel(..., verbose = TRUE)

Arguments

...

Comma-separated relatedness component matrices representing the variance components of the model.

verbose

logical. If FALSE, suppresses messages about identification; TRUE by default.

Details

This function checks the identification status of a given variance components model by examining the rank of the concatenated matrices of the components. If any components are not identified, their names are returned in the output.

Value

A list of length 2 containing:

Examples


identifyComponentModel(A = list(matrix(1, 2, 2)), C = list(matrix(1, 2, 2)), E = diag(1, 2))
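
The rank check described in Details can be illustrated in base R: half-vectorize each component, bind the results as columns, and compare the rank of that matrix with the number of components. The helpers below are hypothetical and only sketch the idea; they do not report which components are unidentified, as the package function does.

```r
vech <- function(m) m[lower.tri(m, diag = TRUE)]

# One column per component, built from the stacked half-vectorized blocks.
design <- function(...) {
  sapply(list(...), function(blocks) unlist(lapply(blocks, vech)))
}
is_identified <- function(X) qr(X)$rank == ncol(X)

# MZ twins only: the A and C columns are identical, so the model is not identified.
X_mz <- design(
  A = list(matrix(1, 2, 2)),
  C = list(matrix(1, 2, 2)),
  E = list(diag(1, 2))
)

# Adding DZ twins separates A from C.
X_both <- design(
  A = list(matrix(1, 2, 2), matrix(c(1, 0.5, 0.5, 1), 2, 2)),
  C = list(matrix(1, 2, 2), matrix(1, 2, 2)),
  E = list(diag(1, 2), diag(1, 2))
)

id_mz <- is_identified(X_mz)     # FALSE
id_both <- is_identified(X_both) # TRUE
```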


Artificial pedigree data on eight families with inbreeding

Description

A dataset created purely from imagination that includes several types of inbreeding. Different kinds of inbreeding occur in each extended family.

Usage

data(inbreeding)

Format

A data frame (and ped object) with 134 rows and 7 variables

Details

The types of inbreeding are as follows:

Although not all of the above structures are technically inbreeding, they aim to test pedigree diagramming and path tracing algorithms. This dataset is not intended to represent any real individuals or families.

The variables are as follows:


Infer Relatedness Coefficient

Description

This function infers the relatedness coefficient between two groups from their observed correlation, given the proportions of variance attributable to additive genetic and shared environmental influences.

Usage

inferRelatedness(obsR, aceA = 0.9, aceC = 0, sharedC = 0)

relatedness(...)

Arguments

obsR

Numeric. Observed correlation between the two groups. Must be between -1 and 1.

aceA

Numeric. Proportion of variance attributable to additive genetic variance. Must be between 0 and 1. Default is 0.9.

aceC

Numeric. Proportion of variance attributable to shared environmental variance. Must be between 0 and 1. Default is 0.

sharedC

Numeric. Proportion of shared environment shared between the two individuals. Must be between 0 (no shared environment) and 1 (completely shared environment). Default is 0.

...

Further named arguments that may be passed to another function.

Details

The function uses the ACE (Additive genetic, Common environmental, and Unique environmental) model to infer the relatedness between two individuals or groups. By considering the observed correlation ('obsR'), the proportion of variance attributable to additive genetic variance ('aceA'), and the proportion of shared environmental variance ('aceC'), it calculates the relatedness coefficient.

Value

Numeric. The calculated relatedness coefficient ('est_r').

Examples

## Not run: 
# Infer the relatedness coefficient:
inferRelatedness(obsR = 0.5, aceA = 0.9, aceC = 0, sharedC = 0)

## End(Not run)
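
One way to read the ACE description in Details is that the observed correlation is modeled as obsR = r * aceA + sharedC * aceC, which can be solved for r. The sketch below implements that rearrangement under this assumption; the helper infer_r_sketch is hypothetical and omits the input range checks described in Arguments.

```r
# Assumed model: obsR = r * aceA + sharedC * aceC, solved for r.
infer_r_sketch <- function(obsR, aceA = 0.9, aceC = 0, sharedC = 0) {
  (obsR - sharedC * aceC) / aceA
}

# Same inputs as the example above: 0.5 / 0.9, roughly 0.556.
est_r <- infer_r_sketch(obsR = 0.5, aceA = 0.9, aceC = 0, sharedC = 0)

# With a fully shared environment: (0.65 - 0.4) / 0.5 = 0.5.
est_r_shared <- infer_r_sketch(obsR = 0.65, aceA = 0.5, aceC = 0.4, sharedC = 1)
```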

Initialize checkpoint files

Description

Initialize checkpoint files

Usage

initializeCheckpoint(
  config = list(verbose = FALSE, saveable = FALSE, resume = FALSE, save_path =
    "checkpoint/")
)

Initialize an Empty Individual Record

Description

Creates a named list with all GEDCOM fields set to NA.

Usage

initializeRecord(all_var_names)

Arguments

all_var_names

A character vector of variable names.

Value

A named list representing an empty individual record.


initialize_empty_df

Description

This function initializes an empty data frame with specified column names.

Usage

initialize_empty_df(relNames)

Arguments

relNames

A vector of column names to be included in the data frame.

Value

An empty data frame with specified column names.


evenInsert A function to insert m elements evenly into a length n vector.

Description

evenInsert A function to insert m elements evenly into a length n vector.

Usage

insertEven(m, n, verbose = FALSE)

evenInsert(m, n, verbose = FALSE)

Arguments

m

A numeric vector of length less than or equal to n. The elements to be inserted.

n

A numeric vector. The vector into which the elements of m will be inserted.

verbose

logical. If TRUE, prints additional information. Default is FALSE.

Details

The function takes two vectors, m and n, and inserts the elements of m evenly into n. If the length of m is greater than the length of n, the vectors are swapped, and the insertion proceeds. The resulting vector is a combination of m and n, with the elements of m evenly distributed within n.
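One way to realize this kind of even insertion is sketched below in base R. The helper name even_insert_sketch and the choice of insertion positions (each element of m placed just before the element of n at position ceiling(i * length(n) / length(m))) are assumptions made for illustration; the package function may place elements differently.

```r
# Hypothetical sketch: spread the elements of m evenly through n.
even_insert_sketch <- function(m, n) {
  # Swap so that m is always the shorter vector, as described above.
  if (length(m) > length(n)) {
    tmp <- m
    m <- n
    n <- tmp
  }
  # Sort key for each element: elements of n keep their own index,
  # elements of m get evenly spaced target indices.
  key_m <- ceiling(seq_along(m) * length(n) / length(m))
  key_n <- seq_along(n)
  combined <- c(m, n)
  # order() is stable, so on ties the element of m comes first.
  combined[order(c(key_m, key_n))]
}

even_insert_sketch(m = c(100, 200), n = 1:6)
```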

Value

Returns a numeric vector with the elements of m evenly inserted into n.

See Also

SimPed for the main function that uses this supporting function.


Determine isChild status; isChild is the 'S' matrix from RAM

Description

Determine isChild status; isChild is the 'S' matrix from RAM

Usage

isChild(isChild_method, ped)

Arguments

isChild_method

method to determine isChild status

ped

pedigree data frame

Value

isChild 'S' matrix


Load or compute a checkpoint

Description

Load or compute a checkpoint

Usage

loadOrComputeCheckpoint(
  file,
  compute_fn,
  config,
  message_resume = NULL,
  message_compute = NULL
)

Arguments

file

The file path to load the checkpoint from.

compute_fn

The function to compute the checkpoint if it doesn't exist.

config

A list containing configuration parameters such as 'resume', 'verbose', and 'saveable'.

message_resume

Optional message to display when resuming from a checkpoint.

message_compute

Optional message to display when computing the checkpoint.

Value

The loaded or computed checkpoint.
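The load-or-compute pattern described here can be sketched in base R as follows. The sketch assumes checkpoints are stored as RDS files and that 'resume' and 'saveable' gate loading and saving respectively; the helper name load_or_compute_sketch is hypothetical and the package's own logic may differ.

```r
# Hypothetical sketch of a load-or-compute checkpoint helper.
load_or_compute_sketch <- function(file, compute_fn, config) {
  # Reuse an existing checkpoint only when resuming is requested.
  if (isTRUE(config$resume) && file.exists(file)) {
    if (isTRUE(config$verbose)) message("Resuming from ", file)
    return(readRDS(file))
  }
  if (isTRUE(config$verbose)) message("Computing ", file)
  result <- compute_fn()
  # Persist the result only when saving is enabled.
  if (isTRUE(config$saveable)) {
    dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
    saveRDS(result, file)
  }
  result
}

# Usage: the second call loads the stored value instead of recomputing.
ckpt_file <- file.path(tempdir(), "checkpoint_sketch", "step1.rds")
unlink(ckpt_file)
cfg <- list(resume = TRUE, verbose = FALSE, saveable = TRUE)
first <- load_or_compute_sketch(ckpt_file, function() 42, cfg)
second <- load_or_compute_sketch(ckpt_file, function() 99, cfg)
```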


makeInbreeding: A function to create inbred mates in the simulated pedigree data.frame. Inbred mates can be created by specifying their IDs or the generation in which the inbred mates should be created. When specifying the generation, the type of inbreeding (between siblings or between first cousins) must also be specified. This is a supplementary function for simulatePedigree.

Description

makeInbreeding: A function to create inbred mates in the simulated pedigree data.frame. Inbred mates can be created by specifying their IDs or the generation in which the inbred mates should be created. When specifying the generation, the type of inbreeding (between siblings or between first cousins) must also be specified. This is a supplementary function for simulatePedigree.

Usage

makeInbreeding(
  ped,
  ID_mate1 = NA_integer_,
  ID_mate2 = NA_integer_,
  verbose = FALSE,
  gen_inbred = 2,
  type_inbred = "sib"
)

Arguments

ped

A data.frame in the same format as the output of simulatePedigree.

ID_mate1

A vector of IDs of the first mate. If not provided, the function will randomly select two individuals from the second generation.

ID_mate2

A vector of IDs of the second mate.

verbose

logical. If TRUE, print progress through stages of algorithm

gen_inbred

A vector of generations in which the inbred mates should be created.

type_inbred

A character vector indicating the type of inbreeding. "sib" for sibling inbreeding and "cousin" for cousin inbreeding.

Details

This function creates inbred mates in the simulated pedigree data.frame. This function's purpose is to evaluate the effect of inbreeding on model fitting and parameter estimation. In case it needs to be said, we do not condone inbreeding in real life. But we recognize that it is a common practice in some fields to create inbred strains for research purposes.

Value

Returns a data.frame with some inbred mates.


makeTwins: A function to impute twins in the simulated pedigree data.frame. Twins can be imputed by specifying their IDs or by specifying the generation in which the twins should be imputed. This is a supplementary function for simulatePedigree.

Description

makeTwins: A function to impute twins in the simulated pedigree data.frame. Twins can be imputed by specifying their IDs or by specifying the generation in which the twins should be imputed. This is a supplementary function for simulatePedigree.

Usage

makeTwins(
  ped,
  ID_twin1 = NA_integer_,
  ID_twin2 = NA_integer_,
  gen_twin = 2,
  verbose = FALSE,
  zygosity = "MZ"
)

Arguments

ped

A data.frame in the same format as the output of simulatePedigree.

ID_twin1

A vector of IDs of the first twin.

ID_twin2

A vector of IDs of the second twin.

gen_twin

A vector of generations in which the twins should be imputed.

verbose

logical. If TRUE, print progress through stages of algorithm

zygosity

A character string indicating the zygosity of the twins. Default is "MZ" for monozygotic twins.

Value

Returns a data.frame with MZ twins information added as a new column.


Assign momID and dadID based on family mapping

Description

This function assigns mother and father IDs to individuals in the data frame based on the mapping of family IDs to parent IDs.

Usage

mapFAMC2parents(df_temp, family_to_parents)

Arguments

df_temp

A data frame containing individual information.

family_to_parents

A list mapping family IDs to parent IDs.

Value

A data frame with added momID and dadID columns.


Create a Mapping from Family IDs to Parent IDs

Description

This function scans the data frame and creates a mapping of family IDs to the corresponding parent IDs.

Usage

mapFAMS2parents(df_temp)

Arguments

df_temp

A data frame produced by readGedcom().

Value

A list mapping family IDs to parent information.
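The two mapping steps (family to parents, then child to parents) can be sketched in base R as below. The sketch assumes each individual has at most one FAMS and one FAMC entry and that a parent's role is decided by the 'sex' column; the toy data and all object names are hypothetical, and the package functions may handle multiple families and other cases differently.

```r
# Toy data in the spirit of a parsed GEDCOM file (hypothetical values).
df_temp <- data.frame(
  id   = c("I1", "I2", "I3"),
  sex  = c("M", "F", "F"),
  FAMS = c("F1", "F1", NA),   # family in which the person is a spouse
  FAMC = c(NA, NA, "F1"),     # family in which the person is a child
  stringsAsFactors = FALSE
)

# Step 1: map each family ID to its father and mother.
family_to_parents <- list()
for (k in which(!is.na(df_temp$FAMS))) {
  fam <- df_temp$FAMS[k]
  entry <- family_to_parents[[fam]]
  if (is.null(entry)) {
    entry <- list(father = NA_character_, mother = NA_character_)
  }
  if (identical(df_temp$sex[k], "M")) entry$father <- df_temp$id[k]
  if (identical(df_temp$sex[k], "F")) entry$mother <- df_temp$id[k]
  family_to_parents[[fam]] <- entry
}

# Step 2: assign momID and dadID to children through their FAMC entry.
df_temp$momID <- NA_character_
df_temp$dadID <- NA_character_
for (k in which(!is.na(df_temp$FAMC))) {
  parents <- family_to_parents[[df_temp$FAMC[k]]]
  if (!is.null(parents)) {
    df_temp$momID[k] <- parents$mother
    df_temp$dadID[k] <- parents$father
  }
}
```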


Mark and Assign children

Description

This subfunction marks individuals in a generation as potential sons, daughters, or parents based on their relationships and assigns unique couple IDs. It processes the assignment of roles and relationships within and between generations in a pedigree simulation.

Usage

markPotentialChildren(
  df_Ngen,
  i,
  Ngen,
  sizeGens,
  CoupleF,
  code_male = "M",
  code_female = "F"
)

Arguments

df_Ngen

A data frame for the current generation being processed. It must include columns for individual IDs ('id'), spouse IDs ('spID'), sex ('sex'), and any previously assigned roles ('ifparent', 'ifson', 'ifdau').

i

Integer, the index of the current generation being processed.

Ngen

Integer, the total number of generations in the simulation.

sizeGens

Numeric vector, containing the size (number of individuals) of each generation.

CoupleF

Integer. Likely the number of couples in the current generation (not confirmed in the original documentation).

code_male

The value to use for males. Default is "M"

code_female

The value to use for females. Default is "F"

Value

Modifies 'df_Ngen' in place by updating or adding columns related to individual roles ('ifparent', 'ifson', 'ifdau') and couple IDs ('coupleId'). The updated data frame is also returned for integration into the larger pedigree data frame ('df_Fam').


nullToNA

Description

nullToNA

Usage

null2NA(x)

nullToNA(x)

Arguments

x

vector of any length

Value

The input with NULL values replaced by NA.
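A minimal base-R sketch of this kind of replacement is shown below. It assumes that "null" means a NULL or zero-length value; the helper name null_to_na_sketch is hypothetical and is not the package function.

```r
# Hypothetical helper: return NA for NULL or zero-length input.
null_to_na_sketch <- function(x) {
  if (length(x) == 0) NA else x
}

# Typical use: clean a list in which some fields were never filled in.
fields <- list(name = "Ada", birth_date = NULL)
cleaned <- lapply(fields, null_to_na_sketch)
```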


Parse a GEDCOM Individual Block

Description

Processes a block of GEDCOM lines corresponding to a single individual.

Usage

parseIndividualBlock(block, pattern_rows, all_var_names, verbose = FALSE)

Arguments

block

A character vector containing the GEDCOM lines for one individual.

pattern_rows

A list with counts of lines matching specific GEDCOM tags.

all_var_names

A character vector of variable names.

verbose

Logical indicating whether to print progress messages.

Value

A named list representing the parsed record for the individual, or NULL if no ID is found.


Parse a Full Name Line

Description

Extracts full name information from a GEDCOM "NAME" line and updates the record accordingly.

Usage

parseNameLine(line, record)

Arguments

line

A character string containing the name line.

record

A named list representing the individual's record.

Value

The updated record with parsed name information.


Infer relationships from a tree template

Description

Infer relationships from a tree template

Usage

parseTreeRelationships(tree_long, tree_paths = NULL)

Arguments

tree_long

A data frame containing the tree structure in long format.

tree_paths

Optional. traceTreePaths output. If NULL, it will be calculated.

Value

A data frame containing the relationships between family members.


Take a pedigree and turn it into an additive genetic relatedness matrix

Description

Take a pedigree and turn it into an additive genetic relatedness matrix

Usage

ped2add(
  ped,
  max_gen = 25,
  sparse = TRUE,
  verbose = FALSE,
  gc = FALSE,
  flatten_diag = FALSE,
  standardize_colnames = TRUE,
  transpose_method = "tcrossprod",
  adjacency_method = "direct",
  saveable = FALSE,
  resume = FALSE,
  save_rate = 5,
  save_rate_gen = save_rate,
  save_rate_parlist = 1e+05 * save_rate,
  save_path = "checkpoint/",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

max_gen

the maximum number of generations to compute (e.g., only up to 4th degree relatives). The default is 25. However it can be set to infinity. 'Inf' uses as many generations as there are in the data.

sparse

logical. If TRUE, use and return sparse matrices from Matrix package

verbose

logical. If TRUE, print progress through stages of algorithm

gc

logical. If TRUE, do frequent garbage collection via gc to save memory

flatten_diag

logical. If TRUE, overwrite the diagonal of the final relatedness matrix with ones

standardize_colnames

logical. If TRUE, standardize the column names of the pedigree dataset

transpose_method

character. The method to use for computing the transpose. Options are "tcrossprod", "crossprod", or "star"

adjacency_method

character. The method to use for computing the adjacency matrix. Options are "loop", "indexed", "direct", or "beta"

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_rate

numeric. The rate at which to save the intermediate results

save_rate_gen

numeric. The rate at which to save the intermediate results by generation. If NULL, defaults to save_rate

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

save_path

character. The path to save the checkpoint files

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.
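For orientation, the general path-tracing idea behind an additive relatedness matrix can be sketched with dense base-R matrices. The sketch below uses a RAM-style formulation (a parent-to-child path matrix with entries of 0.5, summed over generations, and a diagonal residual term of 1 for founders and 0.5 otherwise). It is an illustrative assumption, not the package's implementation, which works with sparse matrices and checkpointing; all object names are hypothetical.

```r
# Toy pedigree: two founders (1, 2) and their two children (3, 4).
ped <- data.frame(ID = 1:4,
                  momID = c(NA, NA, 1, 1),
                  dadID = c(NA, NA, 2, 2))
n <- nrow(ped)

# Path matrix: A[child, parent] = 0.5 for each known parent.
A <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (p in c(ped$momID[i], ped$dadID[i])) {
    if (!is.na(p)) A[i, match(p, ped$ID)] <- 0.5
  }
}

# Sum paths of every length: I + A + A^2 + ..., up to max_gen generations.
max_gen <- 25
path_sum <- diag(n)
term <- diag(n)
for (g in seq_len(max_gen)) {
  term <- term %*% A
  if (all(term == 0)) break
  path_sum <- path_sum + term
}

# Residual variance: 1 for founders, 0.5 when parents are known.
s <- ifelse(is.na(ped$momID) & is.na(ped$dadID), 1, 0.5)

# Relatedness = path_sum %*% diag(s) %*% t(path_sum), via tcrossprod.
add_rel <- tcrossprod(path_sum %*% diag(sqrt(s), nrow = n))
round(add_rel, 3)
```

In this toy pedigree, parent-child and full-sibling pairs both come out at 0.5 and the two unrelated founders at 0.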


Take a pedigree and turn it into an extended environmental relatedness matrix

Description

Take a pedigree and turn it into an extended environmental relatedness matrix

Usage

ped2ce(ped, ...)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.


Take a pedigree and turn it into a common nuclear environmental matrix

Description

Take a pedigree and turn it into a common nuclear environmental matrix

Usage

ped2cn(
  ped,
  max_gen = 25,
  sparse = TRUE,
  verbose = FALSE,
  gc = FALSE,
  flatten_diag = FALSE,
  standardize_colnames = TRUE,
  transpose_method = "tcrossprod",
  saveable = FALSE,
  resume = FALSE,
  save_rate = 5,
  adjacency_method = "direct",
  save_rate_gen = save_rate,
  save_rate_parlist = 1000 * save_rate,
  save_path = "checkpoint/",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

max_gen

the maximum number of generations to compute (e.g., only up to 4th degree relatives). The default is 25. However it can be set to infinity. 'Inf' uses as many generations as there are in the data.

sparse

logical. If TRUE, use and return sparse matrices from Matrix package

verbose

logical. If TRUE, print progress through stages of algorithm

gc

logical. If TRUE, do frequent garbage collection via gc to save memory

flatten_diag

logical. If TRUE, overwrite the diagonal of the final relatedness matrix with ones

standardize_colnames

logical. If TRUE, standardize the column names of the pedigree dataset

transpose_method

character. The method to use for computing the transpose. Options are "tcrossprod", "crossprod", or "star"

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_rate

numeric. The rate at which to save the intermediate results

adjacency_method

character. The method to use for computing the adjacency matrix. Options are "loop", "indexed", "direct", or "beta"

save_rate_gen

numeric. The rate at which to save the intermediate results by generation. If NULL, defaults to save_rate

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

save_path

character. The path to save the checkpoint files

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.
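As a rough illustration of what a common nuclear environment matrix encodes, the base-R sketch below marks two people as sharing a nuclear environment when both of their recorded parents are identical. This definition and the treatment of the diagonal are assumptions made for the example; the package function may define founders and partial parentage differently.

```r
# Toy pedigree: 3 and 4 are full siblings; 5 is an unrelated founder.
ped <- data.frame(ID = 1:5,
                  momID = c(NA, NA, 1, 1, NA),
                  dadID = c(NA, NA, 2, 2, NA))

# Pairwise comparisons; any comparison involving NA yields NA.
same_mom <- outer(ped$momID, ped$momID, "==")
same_dad <- outer(ped$dadID, ped$dadID, "==")

cn <- same_mom & same_dad
cn[is.na(cn)] <- FALSE   # unknown parents are treated as not shared
cn <- cn * 1             # logical to numeric 0/1
diag(cn) <- 1            # everyone shares an environment with themselves
cn
```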


Take a pedigree and turn it into a relatedness matrix

Description

Take a pedigree and turn it into a relatedness matrix

Usage

ped2com(
  ped,
  component,
  max_gen = 25,
  sparse = TRUE,
  verbose = FALSE,
  gc = FALSE,
  flatten_diag = FALSE,
  standardize_colnames = TRUE,
  transpose_method = "tcrossprod",
  adjacency_method = "direct",
  isChild_method = "classic",
  saveable = FALSE,
  resume = FALSE,
  save_rate = 5,
  save_rate_gen = save_rate,
  save_rate_parlist = 1e+05 * save_rate,
  update_rate = 100,
  save_path = "checkpoint/",
  adjBeta_method = NULL,
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

component

character. Which component of the pedigree to return. See Details.

max_gen

the maximum number of generations to compute (e.g., only up to 4th degree relatives). The default is 25. However it can be set to infinity. 'Inf' uses as many generations as there are in the data.

sparse

logical. If TRUE, use and return sparse matrices from Matrix package

verbose

logical. If TRUE, print progress through stages of algorithm

gc

logical. If TRUE, do frequent garbage collection via gc to save memory

flatten_diag

logical. If TRUE, overwrite the diagonal of the final relatedness matrix with ones

standardize_colnames

logical. If TRUE, standardize the column names of the pedigree dataset

transpose_method

character. The method to use for computing the transpose. Options are "tcrossprod", "crossprod", or "star"

adjacency_method

character. The method to use for computing the adjacency matrix. Options are "loop", "indexed", "direct", or "beta"

isChild_method

character. The method to use for computing the isChild matrix. Options are "classic" or "partialparent"

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_rate

numeric. The rate at which to save the intermediate results

save_rate_gen

numeric. The rate at which to save the intermediate results by generation. If NULL, defaults to save_rate

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

update_rate

numeric. The rate at which to print progress

save_path

character. The path to save the checkpoint files

adjBeta_method

numeric. The method to use for building the adjacency matrix when adjacency_method is "beta"

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.


Segment Pedigree into Extended Families

Description

This function adds an extended family ID variable to a pedigree by segmenting that dataset into independent extended families using the weakly connected components algorithm.

Usage

ped2fam(
  ped,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  famID = "famID",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

famID

character. Name of the column to be created in ped for the family ID variable

...

additional arguments to be passed to ped2com

Details

The general idea of this function is to use person ID, mother ID, and father ID to create an extended family ID such that everyone with the same family ID is in the same (perhaps very extended) pedigree. That is, a pair of people with the same family ID have at least one traceable relation of any length to one another.

This function works by turning the pedigree into a mathematical graph using the igraph package. Once in graph form, the function uses weakly connected components to search for all possible relationship paths that could connect anyone in the data to anyone else in the data.
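To make the connected-components idea concrete without depending on igraph, the sketch below labels families with a simple minimum-label propagation over parent-child edges in base R. This is a swapped-in technique chosen for illustration, not the igraph-based approach the function uses, and it assumes numeric IDs; all object names are hypothetical.

```r
# Toy pedigree with two separate families (1-3 and 4-6) and a singleton (7).
ped <- data.frame(ID = 1:7,
                  momID = c(NA, NA, 1, NA, NA, 4, NA),
                  dadID = c(NA, NA, 2, NA, NA, 5, NA))

# Undirected parent-child edges, keeping only parents present in the data.
edges <- rbind(cbind(ped$ID, ped$momID), cbind(ped$ID, ped$dadID))
edges <- edges[!is.na(edges[, 2]) & edges[, 2] %in% ped$ID, , drop = FALSE]

# Start with one label per person; repeatedly give both ends of every
# edge the smaller of their two labels until nothing changes.
fam <- ped$ID
repeat {
  changed <- FALSE
  for (k in seq_len(nrow(edges))) {
    a <- match(edges[k, 1], ped$ID)
    b <- match(edges[k, 2], ped$ID)
    lowest <- min(fam[a], fam[b])
    if (fam[a] != lowest || fam[b] != lowest) {
      fam[a] <- lowest
      fam[b] <- lowest
      changed <- TRUE
    }
  }
  if (!changed) break
}
ped$famID <- fam
ped
```

Because edge direction is ignored, the result corresponds to weakly connected components: everyone reachable through any chain of parent-child links ends up with the same label.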

Value

A pedigree dataset with one additional column for the newly created extended family ID


Turn a pedigree into a graph

Description

Turn a pedigree into a graph

Usage

ped2graph(
  ped,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  directed = TRUE,
  adjacent = c("parents", "mothers", "fathers"),
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

directed

Logical scalar. Default is TRUE. Indicates whether or not to create a directed graph.

adjacent

Character. Relationship that defines adjacency in the graph: parents, mothers, or fathers

...

additional arguments to be passed to ped2com

Details

The general idea of this function is to represent a pedigree as a graph using the igraph package.

Once in graph form, several common pedigree tasks become much simpler.

The adjacent argument allows for different kinds of graph structures. When using parents for adjacency, the graph shows all parent-child relationships. When using mothers for adjacency, the graph only shows mother-child relationships. Similarly, when using fathers for adjacency, only father-child relationships appear in the graph. Construct extended families from the parents graph, maternal lines from the mothers graph, and paternal lines from the fathers graph.
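The effect of the adjacent argument can be illustrated by building the corresponding edge lists in base R. The helper ped_edges_sketch is hypothetical and stops at the edge list; it does not construct an igraph object as the function itself does.

```r
# Hypothetical helper: parent-to-child edges for each adjacency choice.
ped_edges_sketch <- function(ped,
                             adjacent = c("parents", "mothers", "fathers")) {
  adjacent <- match.arg(adjacent)
  mom <- data.frame(from = ped$momID, to = ped$ID)
  dad <- data.frame(from = ped$dadID, to = ped$ID)
  edges <- switch(adjacent,
                  parents = rbind(mom, dad),
                  mothers = mom,
                  fathers = dad)
  # Drop rows where the parent is unknown.
  edges[!is.na(edges$from), , drop = FALSE]
}

ped <- data.frame(ID = 1:4,
                  momID = c(NA, NA, 1, 1),
                  dadID = c(NA, NA, 2, 2))
ped_edges_sketch(ped, "parents")  # both mother-child and father-child edges
ped_edges_sketch(ped, "mothers")  # mother-child edges only
```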

Value

A graph


Add a maternal line ID variable to a pedigree

Description

Add a maternal line ID variable to a pedigree

Usage

ped2maternal(
  ped,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

...

additional arguments to be passed to ped2com

Details

Under various scenarios it is useful to know which people in a pedigree belong to the same maternal lines. This function first turns a pedigree into a graph where adjacency is defined by mother-child relationships. Subsequently, the weakly connected components algorithm finds all the separate maternal lines and gives them an ID variable.

See Also

ped2fam() for creating extended family IDs, and ped2paternal() for creating paternal line IDs


Take a pedigree and turn it into a mitochondrial relatedness matrix

Description

Take a pedigree and turn it into a mitochondrial relatedness matrix

Usage

ped2mit(
  ped,
  max_gen = 25,
  sparse = TRUE,
  verbose = FALSE,
  gc = FALSE,
  flatten_diag = FALSE,
  standardize_colnames = TRUE,
  transpose_method = "tcrossprod",
  adjacency_method = "direct",
  saveable = FALSE,
  resume = FALSE,
  save_rate = 5,
  save_rate_gen = save_rate,
  save_rate_parlist = 1e+05 * save_rate,
  save_path = "checkpoint/",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

max_gen

the maximum number of generations to compute (e.g., only up to 4th degree relatives). The default is 25. However it can be set to infinity. 'Inf' uses as many generations as there are in the data.

sparse

logical. If TRUE, use and return sparse matrices from Matrix package

verbose

logical. If TRUE, print progress through stages of algorithm

gc

logical. If TRUE, do frequent garbage collection via gc to save memory

flatten_diag

logical. If TRUE, overwrite the diagonal of the final relatedness matrix with ones

standardize_colnames

logical. If TRUE, standardize the column names of the pedigree dataset

transpose_method

character. The method to use for computing the transpose. Options are "tcrossprod", "crossprod", or "star"

adjacency_method

character. The method to use for computing the adjacency matrix. Options are "loop", "indexed", "direct", or "beta"

saveable

logical. If TRUE, save the intermediate results to disk

resume

logical. If TRUE, resume from a checkpoint

save_rate

numeric. The rate at which to save the intermediate results

save_rate_gen

numeric. The rate at which to save the intermediate results by generation. If NULL, defaults to save_rate

save_rate_parlist

numeric. The rate at which to save the intermediate results by parent list. If NULL, defaults to save_rate*1000

save_path

character. The path to save the checkpoint files

...

additional arguments to be passed to ped2com

Details

The algorithms and methodologies used in this function are further discussed and exemplified in the vignette titled "examplePedigreeFunctions". For more advanced scenarios and detailed explanations, consult this vignette.
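As a conceptual illustration, mitochondrial relatedness can be thought of as sharing an unbroken maternal line. The base-R sketch below traces each person up the momID column to a maternal founder and marks pairs with the same founder. This is an assumption-laden sketch for intuition only, not the matrix algorithm the package uses; all names are hypothetical.

```r
# Toy pedigree: 5 is the child of 3 (mother) and 6 (father).
ped <- data.frame(ID = 1:6,
                  momID = c(NA, NA, 1, 1, 3, NA),
                  dadID = c(NA, NA, 2, 2, 6, NA))

# Follow the maternal line upward until no mother is recorded.
maternal_root <- function(id) {
  repeat {
    mom <- ped$momID[match(id, ped$ID)]
    if (is.na(mom)) return(as.numeric(id))
    id <- mom
  }
}
roots <- vapply(ped$ID, maternal_root, numeric(1))

# 1 when two people share a maternal founder, 0 otherwise.
mit <- outer(roots, roots, "==") * 1
mit
```

In the toy data, person 5 shares a maternal line with persons 1, 3, and 4, but not with their father (6) or maternal grandfather (2).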


Add a paternal line ID variable to a pedigree

Description

Add a paternal line ID variable to a pedigree

Usage

ped2paternal(
  ped,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  patID = "patID",
  ...
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

patID

Character. Paternal line ID variable to be created and added to the pedigree

...

additional arguments to be passed to ped2com

Details

Under various scenarios it is useful to know which people in a pedigree belong to the same paternal lines. This function first turns a pedigree into a graph where adjacency is defined by father-child relationships. Subsequently, the weakly connected components algorithm finds all the separate paternal lines and gives them an ID variable.

See Also

ped2fam() for creating extended family IDs, and ped2maternal() for creating maternal line IDs


Assign Parent

Description

Assign Parent

Usage

populateParents(df, child, parent)

Arguments

df

A data frame containing the relationships.

child

The ID of the child.

parent

The ID of the parent.

Value

A data frame with updated parent information.


Post-process GEDCOM Data Frame

Description

This function optionally adds parent information, combines duplicate columns, and removes empty columns from the GEDCOM data frame.

Usage

postProcessGedcom(
  df_temp,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  add_parents = TRUE,
  skinny = TRUE,
  verbose = FALSE
)

Arguments

df_temp

A data frame produced by readGedcom().

remove_empty_cols

Logical indicating whether to remove columns that are entirely missing.

combine_cols

Logical indicating whether to combine columns with duplicate values.

add_parents

Logical indicating whether to add parent information.

skinny

Logical indicating whether to slim down the data frame.

verbose

Logical indicating whether to print progress messages.

Value

The post-processed data frame.


Fictional pedigree data on a wizarding family

Description

A dataset created for educational and illustrative use, containing a fictional pedigree modeled after characters from the Harry Potter series. This data is structured for use in software demonstrations involving pedigree diagrams, inheritance structures, and kinship modeling. This dataset is not intended to represent any real individuals or families. It includes no narrative content or protected expression from the original works and is provided solely for educational purposes. This dataset is not endorsed by or affiliated with the creators or copyright holders of the Harry Potter series.

Usage

data(potter)

Format

A data frame (and ped object) with 36 rows and 10 variables

Details

The variables are as follows:

momIDs and dadIDs in the 100s refer to people who are not in the dataset.


Function to prepare the pedigree for summarization. This function prepares the pedigree for summarization by ensuring that the necessary IDs are present and that the pedigree is built correctly.

Description

Function to prepare the pedigree for summarization. This function prepares the pedigree for summarization by ensuring that the necessary IDs are present and that the pedigree is built correctly.

Usage

prepSummarizePedigrees(
  ped,
  type,
  verbose = FALSE,
  famID,
  personID,
  momID,
  dadID,
  matID,
  patID
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

type

Character vector. Specifies which summaries to compute. Options: '"fathers"', '"mothers"', '"families"'. Default includes all three.

verbose

Logical, if TRUE, print progress messages.

famID

character. Name of the column to be created in ped for the family ID variable

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

patID

Character. Paternal line ID variable to be created and added to the pedigree


Process Event Lines (Birth or Death)

Description

Extracts event details (e.g., date, place, cause, latitude, longitude) from a block of GEDCOM lines. For "birth": expect DATE on line i+1, PLAC on i+2, LATI on i+4, LONG on i+5. For "death": expect DATE on line i+1, PLAC on i+2, CAUS on i+3, LATI on i+4, LONG on i+5.

Usage

processEventLine(event, block, i, record, pattern_rows)

Arguments

event

A character string indicating the event type ("birth" or "death").

block

A character vector of GEDCOM lines.

i

The current line index where the event tag is found.

record

A named list representing the individual's record.

pattern_rows

A list with counts of GEDCOM tag occurrences.

Value

The updated record with parsed event information.
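The offset-based lookup described above can be sketched in base R for the DATE and PLAC lines of a birth event. The sample GEDCOM lines, the helper extract_tag_value, and the record field names are hypothetical, and the sketch ignores the LATI, LONG, and CAUS offsets that the function also handles.

```r
# A small, made-up GEDCOM block for one individual.
block <- c("0 @I1@ INDI",
           "1 BIRT",
           "2 DATE 1 JAN 1900",
           "2 PLAC London, England",
           "1 SEX F")

# Hypothetical helper: return the value after a tag, or NA if the line
# does not carry that tag.
extract_tag_value <- function(line, tag) {
  pattern <- paste0("^[0-9]+[[:space:]]+", tag, "[[:space:]]+")
  if (grepl(pattern, line)) sub(pattern, "", line) else NA_character_
}

# Index of the event tag; DATE is expected at i + 1 and PLAC at i + 2.
i <- which(grepl("^1 BIRT", block))
record <- list(
  birth_date  = extract_tag_value(block[i + 1], "DATE"),
  birth_place = extract_tag_value(block[i + 2], "PLAC")
)
str(record)
```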


Process Parents Information from GEDCOM Data

Description

Adds parent IDs to the individuals based on family relationship data.

Usage

processParents(df_temp, datasource)

Arguments

df_temp

A data frame produced by readGedcom().

datasource

Character string indicating the data source ("gedcom" or "wiki").

Value

The updated data frame with parent IDs added.


Convert Sparse Relationship Matrices to Kinship Links for one Matrix

Description

Convert Sparse Relationship Matrices to Kinship Links for one Matrix

Usage

process_one(
  matrix,
  rel_name,
  ids,
  nc,
  rel_pairs_file,
  writetodisk,
  write_buffer_size,
  drop_upper_triangular,
  update_rate,
  verbose,
  gc,
  include_all_links = TRUE,
  ...
)

Arguments

rel_pairs_file

File path to write related pairs to (CSV format).

writetodisk

Logical. If TRUE, writes the related pairs to disk; if FALSE, returns a data frame.

write_buffer_size

Number of related pairs to write to disk at a time.

drop_upper_triangular

Logical. If TRUE, drops the upper triangular portion of the matrix.

update_rate

Numeric. Frequency (in iterations) at which progress messages are printed.

verbose

Logical. If TRUE, prints progress messages.

gc

Logical. If TRUE, performs garbage collection via gc to free memory.

include_all_links

Logical. If TRUE, all links are included in the output.

...

Additional arguments to be passed to com2links


Process a GEDCOM Tag

Description

Extracts and assigns a value to a specified field in 'vars' if the pattern is present. Returns both the updated variable list and a flag indicating whether the tag was matched.

Usage

process_tag(
  tag,
  field_name,
  pattern_rows,
  line,
  vars,
  extractor = NULL,
  mode = "replace"
)

Arguments

tag

The GEDCOM tag (e.g., "SEX", "CAST", etc.).

field_name

The name of the variable to assign to in 'vars'.

pattern_rows

Output from 'countPatternRows()'.

line

The GEDCOM line to parse.

vars

The current list of variables to update.

Value

A list with updated 'vars' and a 'matched' flag.


Read a GEDCOM File

Description

This function reads a GEDCOM file and parses it into a structured data frame of individuals.

Usage

readGedcom(
  file_path,
  verbose = FALSE,
  add_parents = TRUE,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  skinny = FALSE,
  update_rate = 1000,
  post_process = TRUE,
  ...
)

readGed(
  file_path,
  verbose = FALSE,
  add_parents = TRUE,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  skinny = FALSE,
  update_rate = 1000,
  post_process = TRUE,
  ...
)

readgedcom(
  file_path,
  verbose = FALSE,
  add_parents = TRUE,
  remove_empty_cols = TRUE,
  combine_cols = TRUE,
  skinny = FALSE,
  update_rate = 1000,
  post_process = TRUE,
  ...
)

Arguments

file_path

The path to the GEDCOM file.

verbose

A logical value indicating whether to print messages.

add_parents

A logical value indicating whether to add parents to the data frame.

remove_empty_cols

A logical value indicating whether to remove columns with all missing values.

combine_cols

A logical value indicating whether to combine columns with duplicate values.

skinny

A logical value indicating whether to return a skinny data frame.

update_rate

numeric. The rate at which to print progress

post_process

A logical value indicating whether to post-process the data frame.

...

Additional arguments to be passed to the function.

Value

A data frame containing information about individuals, with the following potential columns:

- 'id': ID of the individual
- 'momID': ID of the individual's mother
- 'dadID': ID of the individual's father
- 'sex': Sex of the individual
- 'name': Full name of the individual
- 'name_given': First name of the individual
- 'name_surn': Last name of the individual
- 'name_marriedsurn': Married name of the individual
- 'name_nick': Nickname of the individual
- 'name_npfx': Name prefix
- 'name_nsfx': Name suffix
- 'birth_date': Birth date of the individual
- 'birth_lat': Latitude of the birthplace
- 'birth_long': Longitude of the birthplace
- 'birth_place': Birthplace of the individual
- 'death_caus': Cause of death
- 'death_date': Death date of the individual
- 'death_lat': Latitude of the place of death
- 'death_long': Longitude of the place of death
- 'death_place': Place of death of the individual
- 'attribute_caste': Caste of the individual
- 'attribute_children': Number of children of the individual
- 'attribute_description': Description of the individual
- 'attribute_education': Education of the individual
- 'attribute_idnumber': Identification number of the individual
- 'attribute_marriages': Number of marriages of the individual
- 'attribute_nationality': Nationality of the individual
- 'attribute_occupation': Occupation of the individual
- 'attribute_property': Property owned by the individual
- 'attribute_religion': Religion of the individual
- 'attribute_residence': Residence of the individual
- 'attribute_ssn': Social security number of the individual
- 'attribute_title': Title of the individual
- 'FAMC': ID(s) of the family where the individual is a child
- 'FAMS': ID(s) of the family where the individual is a spouse


Read Wiki Family Tree

Description

Read Wiki Family Tree

Usage

readWikifamilytree(text = NULL, verbose = FALSE, file_path = NULL, ...)

Arguments

text

A character string containing the text of a family tree in wiki format.

verbose

A logical value indicating whether to print messages.

file_path

The path to the file containing the family tree.

...

Additional arguments (not used).

Value

A list containing the summary, members, structure, and relationships of the family tree.


Recodes Sex Variable in a Pedigree Dataframe

Description

This function is primarily used internally, for example by plotting functions. It sets the 'repair' flag to TRUE automatically and forwards any additional parameters to 'checkSex'.

Usage

recodeSex(
  ped,
  verbose = FALSE,
  code_male = NULL,
  code_na = NULL,
  code_female = NULL,
  recode_male = "M",
  recode_female = "F",
  recode_na = NA_character_
)

Arguments

ped

A dataframe representing the pedigree data with a 'sex' column.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

code_male

The current code used to represent males in the 'sex' column.

code_na

The current value used for missing values.

code_female

The current code used to represent females in the 'sex' column. If both 'code_male' and 'code_female' are NULL, no recoding is performed.

recode_male

The value to use for males. Default is "M"

recode_female

The value to use for females. Default is "F"

recode_na

The value to use for missing values. Default is NA_character_

Details

The validation process identifies:
- The unique sex codes present in the dataset.
- Whether individuals listed as fathers or mothers have inconsistent sex codes.
- Instances where an individual's recorded sex does not align with their parental role.

If 'repair = TRUE', the function standardizes sex coding by:
- Assigning individuals listed as fathers the most common male code in the dataset.
- Assigning individuals listed as mothers the most common female code.

This function uses the terms 'male' and 'female' in a biological context, referring to chromosomal and other biologically-based characteristics necessary for constructing genetic pedigrees. The biological aspect of sex used in genetic analysis (genotype) is distinct from the broader, richer concept of gender identity (phenotype).

We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities. The developers of this package express unequivocal support for folx in the transgender and LGBTQ+ communities.

Value

A modified version of the input data.frame 'ped', containing an additional or modified 'sex_recode' column where the 'sex' values are recoded according to 'code_male' and 'code_female'. NA values in the 'sex' column are preserved.
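The recoding described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name 'recode_sex_sketch' is hypothetical, and it assumes that both 'code_male' and 'code_female' are supplied and that any other value maps to 'recode_na'.

```r
# Hypothetical helper (not part of BGmisc): map the current codes onto the
# recoded values, leaving everything else (including NA) as recode_na.
recode_sex_sketch <- function(sex, code_male, code_female,
                              recode_male = "M", recode_female = "F",
                              recode_na = NA_character_) {
  out <- rep(recode_na, length(sex))
  out[!is.na(sex) & sex == code_male] <- recode_male
  out[!is.na(sex) & sex == code_female] <- recode_female
  out
}

# Toy pedigree with numeric sex codes: 1 = male, 0 = female
ped <- data.frame(ID = 1:4, sex = c(1, 0, 1, NA))
ped$sex_recode <- recode_sex_sketch(ped$sex, code_male = 1, code_female = 0)
print(ped)
```

The original 'sex' column is left untouched here, which mirrors the documented behavior of adding a 'sex_recode' column rather than overwriting the input.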


Repair Missing IDs

Description

This function repairs missing IDs in a pedigree.

Usage

repairIDs(ped, verbose = FALSE)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

Value

A corrected pedigree


Repair Parent IDs

Description

This function repairs parent IDs in a pedigree.

Usage

repairParentIDs(
  ped,
  verbose = FALSE,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID"
)

Arguments

ped

A dataframe representing the pedigree data with columns 'ID', 'dadID', and 'momID'.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

famID

Character. Column name for family IDs.

personID

Character. Column name for individual IDs.

momID

Character. Column name for maternal IDs.

dadID

Character. Column name for paternal IDs.

Value

A corrected pedigree


Repairs Sex Coding in a Pedigree Dataframe

Description

This function serves as a wrapper around 'checkSex' to specifically handle the repair of the sex coding in a pedigree dataframe.

Usage

repairSex(ped, verbose = FALSE, code_male = NULL, code_female = NULL)

Arguments

ped

A dataframe representing the pedigree data with a 'sex' column.

verbose

A logical flag indicating whether to print progress and validation messages to the console.

code_male

The current code used to represent males in the 'sex' column.

code_female

The current code used to represent females in the 'sex' column. If both 'code_male' and 'code_female' are NULL, no recoding is performed.

Details

The validation process identifies:
- The unique sex codes present in the dataset.
- Whether individuals listed as fathers or mothers have inconsistent sex codes.
- Instances where an individual's recorded sex does not align with their parental role.

If 'repair = TRUE', the function standardizes sex coding by:
- Assigning individuals listed as fathers the most common male code in the dataset.
- Assigning individuals listed as mothers the most common female code.

This function uses the terms 'male' and 'female' in a biological context, referring to chromosomal and other biologically-based characteristics necessary for constructing genetic pedigrees. The biological aspect of sex used in genetic analysis (genotype) is distinct from the broader, richer concept of gender identity (phenotype).

We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities. The developers of this package express unequivocal support for folx in the transgender and LGBTQ+ communities.

Value

A modified version of the input data.frame 'ped', containing an additional or modified 'sex_recode' column where the 'sex' values are recoded according to 'code_male' and 'code_female'. NA values in the 'sex' column are preserved.

See Also

checkSex

Examples

## Not run: 
ped <- data.frame(ID = c(1, 2, 3), sex = c("M", "F", "M"))
repairSex(ped, code_male = "M", verbose = TRUE)

## End(Not run)
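
The validation and repair logic described in Details can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by 'repairSex' or 'checkSex': the toy pedigree and all variable names are made up, and only the father side is shown.

```r
# Toy pedigree (hypothetical data): individual 5 is listed as a father
# but is coded "F".
ped <- data.frame(
  ID    = 1:9,
  dadID = c(NA, NA, 1, 1, NA, NA, 5, 4, NA),
  momID = c(NA, NA, 2, 2, NA, NA, 6, 9, NA),
  sex   = c("M", "F", "F", "M", "F", "F", "M", "F", "F")
)

# Unique sex codes present in the dataset
sex_codes <- unique(ped$sex)

# Sex codes of everyone who appears as a father, and the most common one
is_father <- ped$ID %in% ped$dadID
male_code <- names(which.max(table(ped$sex[is_father])))

# Individuals whose recorded sex does not align with their role as father
mismatched_fathers <- ped$ID[is_father & ped$sex != male_code]

# Repair: assign fathers the most common male code
repaired <- ped
repaired$sex[is_father] <- male_code
```

The same steps apply to mothers by swapping 'dadID' for 'momID'.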

Resample Elements of a Vector

Description

This function performs resampling of the elements in a vector 'x'. It randomly shuffles the elements of 'x' and returns a vector of the resampled elements. If 'x' is empty, it returns 'NA_integer_'.

Usage

resample(x, ...)

Arguments

x

A vector containing the elements to be resampled. If 'x' is empty, the function will return 'NA_integer_'.

...

Additional arguments passed to 'sample.int', such as 'size' for the number of items to sample and 'replace' indicating whether sampling should be with replacement.

Value

A vector of resampled elements from 'x'. If 'x' is empty, returns 'NA_integer_'. The length and type of the returned vector depend on the input vector 'x' and the additional arguments provided via '...'.
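
A minimal sketch of this kind of safe resampling, assuming it indexes 'x' with 'sample.int' as the argument description suggests. The name 'resample_sketch' is hypothetical and the body is not taken from the package source.

```r
# Hypothetical re-implementation for illustration only.
resample_sketch <- function(x, ...) {
  # Empty input: return NA_integer_ as documented
  if (length(x) == 0) {
    return(NA_integer_)
  }
  # Index by position so a length-one numeric x is not expanded to 1:x
  x[sample.int(length(x), ...)]
}

resample_sketch(c(3, 7, 9))            # a permutation of 3, 7, 9
resample_sketch(10)                    # always 10
resample_sketch(1:5, size = 2)         # two elements, without replacement
resample_sketch(integer(0))            # NA_integer_
```

Indexing by position avoids the well-known behavior of 'sample(x)' when 'x' is a single number, where it samples from '1:x' instead of returning 'x'.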


rmvn

Description

Generates multivariate normal data from a covariance matrix.

Usage

rmvn(n, sigma)

Arguments

n

Sample Size

sigma

Covariance matrix

Value

Multivariate normal data generated from the covariance matrix 'sigma', with sample size 'n'.
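
One common way to generate such data is to multiply standard normal draws by a matrix square root of 'sigma'. The sketch below uses a Cholesky factor and assumes a zero mean vector and one row per draw; the name 'rmvn_sketch' is hypothetical, and the package may use a different decomposition.

```r
# Hypothetical illustration, not the package source.
rmvn_sketch <- function(n, sigma) {
  p <- ncol(sigma)
  # n x p matrix of independent standard normal draws
  z <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
  # chol() returns upper-triangular R with t(R) %*% R == sigma,
  # so the rows of z %*% R have covariance sigma
  z %*% chol(sigma)
}

set.seed(1)
sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2)
draws <- rmvn_sketch(5000, sigma)
round(stats::cov(draws), 2)
```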


Royal pedigree data from 1992

Description

A dataset created by Denis Reid from the Royal Families of Europe.

Usage

data(royal92)

Format

A data frame with 3110 observations

Details

The variables are as follows: id, momID, dadID, name, sex, birth_date, death_date, attribute_title


Simulate Pedigrees

Description

This function simulates "balanced" pedigrees based on a group of parameters: 1) kpc - Kids per couple; 2) Ngen - Number of generations; 3) sexR - Proportion of males in offspring; 4) marR - Mating rate.

Usage

simulatePedigree(
  kpc = 3,
  Ngen = 4,
  sexR = 0.5,
  marR = 2/3,
  rd_kpc = FALSE,
  balancedSex = TRUE,
  balancedMar = TRUE,
  verbose = FALSE,
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  spouseID = "spouseID",
  code_male = "M",
  code_female = "F"
)

SimPed(...)

Arguments

kpc

Number of kids per couple. An integer >= 2 that determines how many kids each fertilized mated couple will have in the pedigree. Default value is 3. Returns an error when kpc equals 1.

Ngen

Number of generations. An integer >= 2 that determines how many generations the simulated pedigree will have. The first generation is always a fertilized couple. The last generation has no mated individuals.

sexR

Sex ratio of offspring. A numeric value ranging from 0 to 1 that determines the proportion of males in all offspring in this pedigree. For instance, 0.4 means 40 percent of the offspring will be male.

marR

Mating rate. A numeric value ranging from 0 to 1 which determines the proportion of mated (fertilized) couples in the pedigree within each generation. For instance, marR = 0.5 suggests 50 percent of the offspring in a specific generation will be mated and have their offspring.

rd_kpc

logical. If TRUE, the number of kids per mate will be randomly generated from a Poisson distribution with mean kpc. If FALSE, the number of kids per mate will be fixed at kpc.

balancedSex

Not fully developed yet. Always TRUE in the current version.

balancedMar

Not fully developed yet. Always TRUE in the current version.

verbose

logical. If TRUE, prints progress messages through the stages of the algorithm.

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

spouseID

The name of the column that will contain the spouse ID in the output data frame. Default is "spouseID".

code_male

The value to use for males. Default is "M"

code_female

The value to use for females. Default is "F"

...

Additional arguments to be passed to other functions.
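
The difference between fixed and random family sizes ('rd_kpc') can be sketched in base R. The variable names below are illustrative only, and the final call simply restates the defaults from Usage; it runs only if BGmisc is installed.

```r
set.seed(42)
kpc <- 3          # mean (or fixed) number of kids per couple
n_couples <- 5    # hypothetical number of mated couples in one generation

# rd_kpc = FALSE: every couple has exactly kpc kids
kids_fixed <- rep(kpc, n_couples)

# rd_kpc = TRUE: family sizes drawn from a Poisson distribution with mean kpc
kids_random <- stats::rpois(n_couples, lambda = kpc)

# Calling the documented function with its default arguments
if (requireNamespace("BGmisc", quietly = TRUE)) {
  ped <- BGmisc::simulatePedigree(kpc = 3, Ngen = 4, sexR = 0.5, marR = 2 / 3)
  str(ped)
}
```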

Value

A data.frame with each row representing a simulated individual. The columns are as follows:


sliceFamilies

Description

Slices up families by additive relatedness, creating CSV files grouped by degree of relatedness. Operates on a potentially large file by reading in chunks and binning links by additive relatedness.

Usage

sliceFamilies(
  outcome_name = "AD_demo",
  biggest = TRUE,
  bin_width = 0.1,
  degreerelatedness = 12,
  chunk_size = 2e+07,
  max_lines = 1e+13,
  addRel_ceiling = 1.5,
  input_file = NULL,
  folder_prefix = "data",
  progress_csv = "progress.csv",
  progress_status = "progress.txt",
  data_directory = NULL,
  verbose = FALSE,
  error_handling = FALSE,
  file_column_names = c("ID1", "ID2", "addRel", "mitRel", "cnuRel")
)

Arguments

outcome_name

Name of the outcome variable (used for naming input/output files)

biggest

Logical; whether to process the "biggest" family dataset (TRUE) or all-but-biggest (FALSE)

bin_width

Width of additive relatedness bins (default is 0.10)

degreerelatedness

Maximum degree of relatedness to consider (default 12)

chunk_size

Number of lines to read in each chunk (default 2e7)

max_lines

Max number of lines to process from input file (default 1e13)

addRel_ceiling

Numeric. Maximum relatedness value to bin to. Default is 1.5

input_file

Path to the input CSV file. If NULL, defaults to a specific file based on 'biggest' flag.

folder_prefix

Prefix for the output folder (default "data")

progress_csv

Path to a CSV file for tracking progress (default "progress.csv")

progress_status

Path to a text file for logging progress status (default "progress.txt")

data_directory

Directory where output files will be saved. If NULL, it is constructed based on 'outcome_name' and 'folder_prefix'.

verbose

Logical; whether to print progress messages (default FALSE)

error_handling

Logical. Should more aggressive error handling be attempted? Default is FALSE

file_column_names

Names of the columns in the input file (default c("ID1", "ID2", "addRel", "mitRel", "cnuRel"))

Value

NULL. Writes CSV files to disk and updates progress logs.
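
The binning step can be illustrated with a small base R sketch. It assumes equal-width bins starting at 0 and running up to 'addRel_ceiling'; the toy 'links' data are made up, and the package's file naming and chunked reading are not reproduced here.

```r
# Toy relatedness links (hypothetical values)
links <- data.frame(
  ID1    = c(1, 1, 2, 3, 4),
  ID2    = c(2, 3, 3, 4, 5),
  addRel = c(0.55, 0.25, 0.125, 0.0625, 1.02)
)

bin_width <- 0.1
addRel_ceiling <- 1.5

# Bin edges: 0.0, 0.1, ..., 1.5
breaks <- seq(0, addRel_ceiling, by = bin_width)

# Index of the bin each link falls into (1 = [0, 0.1), 2 = [0.1, 0.2), ...)
links$bin <- findInterval(links$addRel, breaks)

# One data frame per bin; each could then be written out with write.csv()
by_bin <- split(links, links$bin)
names(by_bin)
```

For a large input file, the same binning could be applied to each chunk in turn and the results appended to the per-bin files.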


Split GEDCOM Lines into Individual Blocks

Description

This function partitions the GEDCOM file (as a vector of lines) into a list of blocks, where each block corresponds to a single individual starting with an "@ INDI" line.

Usage

splitIndividuals(lines, verbose = FALSE)

Arguments

lines

A character vector of lines from the GEDCOM file.

verbose

Logical indicating whether to output progress messages.

Value

A list of character vectors, each representing one individual.
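
A minimal sketch of this partitioning, assuming each individual record starts on a line containing "@ INDI" and runs until the next such line. The helper name 'split_indi_sketch' is hypothetical, and this naive version keeps any trailing non-individual records in the last block.

```r
# Hypothetical illustration, not the package source.
split_indi_sketch <- function(lines) {
  # TRUE on lines that open an individual record, e.g. "0 @I1@ INDI"
  is_start <- grepl("@ INDI", lines, fixed = TRUE)
  # Running count of record starts gives each line a block number
  block_id <- cumsum(is_start)
  # Drop header lines that come before the first individual
  keep <- block_id > 0
  unname(split(lines[keep], block_id[keep]))
}

gedcom_lines <- c(
  "0 HEAD",
  "0 @I1@ INDI",
  "1 NAME Ann /Lee/",
  "1 SEX F",
  "0 @I2@ INDI",
  "1 NAME Bob /Lee/"
)
blocks <- split_indi_sketch(gedcom_lines)
length(blocks)
```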


Standardize Column Names in a Dataframe (Internal)

Description

This internal function standardizes the column names of a given dataframe. It utilizes regular expressions and the 'tolower()' function to match column names against a list of predefined standard names. The approach is case-insensitive and allows for flexible matching of column names.

Usage

standardizeColnames(df, verbose = FALSE)

Arguments

df

A dataframe whose column names need to be standardized.

verbose

A logical indicating whether to print progress messages.

Value

A dataframe with standardized column names.
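
The case-insensitive matching approach can be sketched as follows. The patterns and the helper name 'standardize_colnames_sketch' are assumptions made for illustration; the package's actual list of standard names and expressions may differ.

```r
# Hypothetical illustration, not the package source.
standardize_colnames_sketch <- function(df) {
  # Standard name -> regular expression matched against lower-cased names
  patterns <- c(
    ID    = "^(id|indiv(idual)?_?id)$",
    momID = "^(mom|mother)_?id$",
    dadID = "^(dad|father)_?id$",
    famID = "^fam(ily)?_?id$"
  )
  lowered <- tolower(names(df))
  for (std in names(patterns)) {
    hit <- grepl(patterns[[std]], lowered)
    names(df)[hit] <- std
  }
  df
}

df <- data.frame(ID = 1, MomID = 2, FATHER_ID = 3, age = 4)
names(standardize_colnames_sketch(df))
```

Columns that match none of the patterns, such as 'age' here, keep their original names.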


Summarize the families in a pedigree

Description

Summarize the families in a pedigree

Usage

summarizeFamilies(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  founder_sort_var = NULL,
  include_founder = FALSE,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

summariseFamilies(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  founder_sort_var = NULL,
  include_founder = FALSE,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

famID

character. Name of the column to be created in ped for the family ID variable

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

patID

Character. Paternal line ID variable to be created and added to the pedigree

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

founder_sort_var

Character. Column used to determine the founder of each lineage. Defaults to 'byr' (if available) or 'personID' otherwise.

include_founder

Logical. If 'TRUE', includes the founder (originating member) of each lineage in the output.

n_biggest

Integer. Number of largest lineages to return (sorted by count).

n_oldest

Integer. Number of oldest lineages to return (sorted by birth year).

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If 'TRUE', includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

verbose

Logical, if TRUE, print progress messages.

network_checks

Logical. If 'TRUE', performs network checks on the pedigree data.

See Also

summarizePedigrees()


Function to summarize the originating members for each line

Description

This function summarizes the originating members for each line in a pedigree. It is intended to be used internally by the summarizePedigrees function.

Usage

summarizeFounder(ped_dt, group_var, sort_var, foo_summary_dt, verbose)

Arguments

sort_var

A character string specifying the column name to sort by.

verbose

Logical, if TRUE, print progress messages.


Summarize the maternal lines in a pedigree

Description

Summarize the maternal lines in a pedigree

Usage

summarizeMatrilines(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  include_founder = FALSE,
  founder_sort_var = NULL,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

summariseMatrilines(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  include_founder = FALSE,
  founder_sort_var = NULL,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

famID

character. Name of the column to be created in ped for the family ID variable

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

patID

Character. Paternal line ID variable to be created and added to the pedigree

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

include_founder

Logical. If 'TRUE', includes the founder (originating member) of each lineage in the output.

founder_sort_var

Character. Column used to determine the founder of each lineage. Defaults to 'byr' (if available) or 'personID' otherwise.

n_biggest

Integer. Number of largest lineages to return (sorted by count).

n_oldest

Integer. Number of oldest lineages to return (sorted by birth year).

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If 'TRUE', includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

verbose

Logical, if TRUE, print progress messages.

network_checks

Logical. If 'TRUE', performs network checks on the pedigree data.

See Also

summarizePedigrees()


Summarize the paternal lines in a pedigree

Description

Summarize the paternal lines in a pedigree

Usage

summarizePatrilines(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  founder_sort_var = NULL,
  include_founder = FALSE,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

summarisePatrilines(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  byr = NULL,
  founder_sort_var = NULL,
  include_founder = FALSE,
  n_biggest = 5,
  n_oldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE,
  network_checks = FALSE
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

famID

character. Name of the column to be created in ped for the family ID variable

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

patID

Character. Paternal line ID variable to be created and added to the pedigree

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

founder_sort_var

Character. Column used to determine the founder of each lineage. Defaults to 'byr' (if available) or 'personID' otherwise.

include_founder

Logical. If 'TRUE', includes the founder (originating member) of each lineage in the output.

n_biggest

Integer. Number of largest lineages to return (sorted by count).

n_oldest

Integer. Number of oldest lineages to return (sorted by birth year).

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If 'TRUE', includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

verbose

Logical, if TRUE, print progress messages.

network_checks

Logical. If 'TRUE', performs network checks on the pedigree data.

See Also

summarizePedigrees()


Summarize Pedigree Data

Description

This function summarizes pedigree data, by computing key summary statistics for all numeric variables and identifying the originating member (founder) for each family, maternal, and paternal lineage.

Usage

summarizePedigrees(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  type = c("fathers", "mothers", "families"),
  byr = NULL,
  include_founder = FALSE,
  founder_sort_var = NULL,
  n_keep = 5,
  n_biggest = n_keep,
  n_oldest = n_keep,
  skip_var = NULL,
  five_num_summary = FALSE,
  network_checks = FALSE,
  verbose = FALSE
)

summarisePedigrees(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  type = c("fathers", "mothers", "families"),
  byr = NULL,
  include_founder = FALSE,
  founder_sort_var = NULL,
  n_keep = 5,
  n_biggest = n_keep,
  n_oldest = n_keep,
  skip_var = NULL,
  five_num_summary = FALSE,
  network_checks = FALSE,
  verbose = FALSE
)

Arguments

ped

a pedigree dataset. Needs ID, momID, and dadID columns

famID

character. Name of the column to be created in ped for the family ID variable

personID

character. Name of the column in ped for the person ID variable

momID

character. Name of the column in ped for the mother ID variable

dadID

character. Name of the column in ped for the father ID variable

matID

Character. Maternal line ID variable to be created and added to the pedigree

patID

Character. Paternal line ID variable to be created and added to the pedigree

type

Character vector. Specifies which summaries to compute. Options: '"fathers"', '"mothers"', '"families"'. Default includes all three.

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

include_founder

Logical. If 'TRUE', includes the founder (originating member) of each lineage in the output.

founder_sort_var

Character. Column used to determine the founder of each lineage. Defaults to 'byr' (if available) or 'personID' otherwise.

n_keep

Integer. Number of lineages to keep in the output for each type of summary.

n_biggest

Integer. Number of largest lineages to return (sorted by count).

n_oldest

Integer. Number of oldest lineages to return (sorted by birth year).

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If 'TRUE', includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

network_checks

Logical. If 'TRUE', performs network checks on the pedigree data.

verbose

Logical, if TRUE, print progress messages.

Details

The function calculates standard descriptive statistics, including the count of individuals in each lineage, means, medians, minimum and maximum values, and standard deviations. Additionally, if 'five_num_summary = TRUE', the function includes the first and third quartiles (Q1, Q3) to provide a more detailed distributional summary. Users can also specify variables to exclude from the analysis via 'skip_var'.

Beyond summary statistics, the function identifies the founding member of each lineage based on the specified sorting variable ('founder_sort_var'), defaulting to birth year ('byr') when available or 'personID' otherwise. Users can retrieve the largest and oldest lineages by setting 'n_biggest' and 'n_oldest', respectively.

Value

A data.frame (or list) containing summary statistics for family, maternal, and paternal lines, as well as the oldest and biggest lines (the number of each is set by 'n_oldest' and 'n_biggest', 5 by default).
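
The per-lineage statistics and founder selection described in Details can be sketched in base R. The toy data and the helper 'summarize_var_sketch' are hypothetical, and the output layout is not meant to match what the package returns.

```r
# Toy pedigree with two families (hypothetical data)
ped <- data.frame(
  ID    = 1:6,
  famID = c(1, 1, 1, 2, 2, 2),
  byr   = c(1900, 1925, 1950, 1910, 1940, 1941)
)

# Descriptive statistics for one numeric variable within one lineage
summarize_var_sketch <- function(x, five_num_summary = FALSE) {
  out <- c(count = length(x), mean = mean(x), median = stats::median(x),
           min = min(x), max = max(x), sd = stats::sd(x))
  if (five_num_summary) {
    out <- c(out,
             Q1 = unname(stats::quantile(x, 0.25)),
             Q3 = unname(stats::quantile(x, 0.75)))
  }
  out
}

# One row of statistics per family
by_family <- t(sapply(split(ped$byr, ped$famID), summarize_var_sketch,
                      five_num_summary = TRUE))

# Founder of each family: first row after sorting by the sort variable
founders <- do.call(rbind, lapply(split(ped, ped$famID), function(d) {
  d[order(d$byr), ][1, ]
}))
```

Here 'byr' plays the role of 'founder_sort_var', so the earliest-born member of each family is treated as its founder.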


Trace paths between individuals in a family tree grid

Description

Trace paths between individuals in a family tree grid

Usage

traceTreePaths(tree_long, deduplicate = TRUE)

Arguments

tree_long

A data.frame with columns: Row, Column, Value, id

deduplicate

Logical, if TRUE, will remove duplicate paths

Value

A data.frame with columns: from_id, to_id, direction, path_length, intermediates


modified tryCatch function

Description

modified tryCatch function

Usage

tryNA(x)

try_na(x)

Arguments

x

vector of any length

Value

Fuses the nullToNA function with efunc
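
A hedged sketch of what such a wrapper might look like, assuming that errors and NULL or empty results are both turned into NA. The name 'try_na_sketch' is hypothetical, and the exact behavior of 'nullToNA' and 'efunc' is not shown in this manual.

```r
# Hypothetical illustration, not the package source.
try_na_sketch <- function(expr) {
  # Evaluate the expression; on error, fall back to NA
  out <- tryCatch(expr, error = function(e) NA)
  # Treat NULL or zero-length results as NA as well
  if (is.null(out) || length(out) == 0) NA else out
}

try_na_sketch(stop("boom"))   # NA instead of an error
try_na_sketch(NULL)           # NA
try_na_sketch(5)              # 5
```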


validate_and_convert_matrix

Description

This function validates and converts a matrix to a specific format.

Usage

validate_and_convert_matrix(
  mat,
  name,
  ensure_symmetric = FALSE,
  force_binary = FALSE
)

Arguments

mat

The matrix to be validated and converted.

name

The name of the matrix for error messages.

ensure_symmetric

Logical indicating whether to ensure the matrix is symmetric.

force_binary

Logical indicating whether to force the matrix to be binary.

Value

The validated and converted matrix.


vech: Create the half-vectorization of a matrix

Description

Creates the half-vectorization of a matrix.

Usage

vech(x)

Arguments

x

a matrix, the half-vectorization of which is desired

Details

This function returns the vectorized form of the lower triangle of a matrix, including the diagonal. The upper triangle is ignored with no checking that the provided matrix is symmetric.

Value

A vector containing the lower triangle of the matrix, including the diagonal.

Examples


vech(matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2))
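
The half-vectorization described in Details can be reproduced in base R by indexing the lower triangle. The name 'vech_sketch' is hypothetical and is shown only to make the definition concrete; it is not the package source.

```r
# Hypothetical illustration: stack the lower triangle (with the diagonal)
# column by column.
vech_sketch <- function(x) {
  x[lower.tri(x, diag = TRUE)]
}

m <- matrix(1:9, nrow = 3, ncol = 3)
vech_sketch(m)
# 1 2 3 5 6 9

vech_sketch(matrix(c(1, 0.5, 0.5, 1), nrow = 2, ncol = 2))
# 1.0 0.5 1.0
```

Because only the lower triangle is read, a non-symmetric input such as 'm' is accepted and its upper triangle is silently ignored.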