High-dimensional Mediation Analysis

hima is a wrapper function designed to perform various HIMA methods for estimating and testing high-dimensional mediation effects. hima can automatically select the appropriate HIMA method based on the outcome and mediator data type.

Usage

hima(
  formula,
  data.pheno,
  data.M,
  mediator.type = c("gaussian", "negbin", "compositional"),
  penalty = c("DBlasso", "MCP", "SCAD", "lasso"),
  quantile = FALSE,
  efficient = FALSE,
  scale = TRUE,
  sigcut = 0.05,
  contrast = NULL,
  subset = NULL,
  verbose = FALSE,
  ...
)

Arguments

formula: an object of class formula representing the overall effect model to be fitted, specified as outcome ~ exposure + covariates. The "exposure" variable (the variable of interest) must be listed first on the right-hand side of the formula.
data.pheno: a data frame containing the exposure, outcome, and covariates specified in the formula. Variable names in data.pheno must match those in the formula. When scale = TRUE, the exposure and covariates will be scaled (the outcome retains its original scale).
data.M: a data.frame or matrix of high-dimensional mediators, with rows representing samples and columns representing mediator variables. When scale = TRUE, data.M will be scaled.
mediator.type: a character string indicating the data type of the high-dimensional mediators (data.M). Options are: 'gaussian' (default): for continuous mediators. 'negbin': for count data mediators modeled using the negative binomial distribution (e.g., RNA-seq data). 'compositional': for compositional data mediators (e.g., microbiome data).
penalty: a character string specifying the penalty method to apply in the model. Options are: 'DBlasso': De-biased LASSO (default). 'MCP': Minimax Concave Penalty. 'SCAD': Smoothly Clipped Absolute Deviation. 'lasso': Least Absolute Shrinkage and Selection Operator. Note: Survival HIMA and microbiome HIMA can only be performed with 'DBlasso'. Quantile HIMA and efficient HIMA cannot use 'DBlasso'; they always apply 'MCP'.
quantile: logical. Indicates whether to use quantile HIMA (hima_quantile). Default is FALSE. Applicable only for classic HIMA with a continuous outcome and mediator.type = 'gaussian'. If TRUE, specify the desired quantile(s) using the tau parameter; otherwise, the default tau = 0.5 (i.e., median) is used.
efficient: logical. Indicates whether to use efficient HIMA (hima_efficient). Default is FALSE. Applicable only for classic HIMA with a continuous outcome and mediator.type = 'gaussian'.
scale: logical. Determines whether the function scales the data (exposure, mediators, and covariates). Default is TRUE. Note: For simulation studies, set scale = FALSE to avoid estimate compression (i.e., shrinkage of estimates toward zero due to scaling).
sigcut: numeric. The significance cutoff for selecting mediators. Default is 0.05.
contrast: a named list of contrasts to be applied to factor variables in the covariates (cannot be the variable of interest).
subset: an optional vector specifying a subset of observations to use in the analysis.
verbose: logical. Determines whether the function displays progress messages. Default is FALSE.
...: additional arguments passed on to the underlying HIMA method (e.g., tau for quantile HIMA), or reserved for future use.
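The input conventions described above (exposure listed first on the right-hand side, formula variables present in data.pheno, and one row per sample in both data.pheno and data.M) can be checked before calling hima. The following is a minimal base-R sketch under those assumptions; check_hima_inputs and the toy data are hypothetical and are not part of the HIMA package.

```r
# Hypothetical helper (NOT part of HIMA): sanity-check inputs before calling hima().
check_hima_inputs <- function(formula, data.pheno, data.M) {
  rhs <- attr(terms(formula), "term.labels") # right-hand-side terms, in order
  outcome_vars <- all.vars(formula[[2]])     # variable(s) on the left-hand side

  # Every variable named in the formula must be a column of data.pheno.
  missing_vars <- setdiff(all.vars(formula), colnames(data.pheno))
  if (length(missing_vars) > 0) {
    stop("Not found in data.pheno: ", paste(missing_vars, collapse = ", "))
  }

  # Rows are samples in both inputs, so the row counts must agree.
  if (nrow(data.pheno) != nrow(data.M)) {
    stop("data.pheno and data.M must have the same number of rows.")
  }

  # By convention, the first right-hand-side term is the exposure.
  list(outcome = outcome_vars, exposure = rhs[1], covariates = rhs[-1])
}

# Toy data for illustration only (20 samples, 5 mediators).
set.seed(1)
pheno <- data.frame(
  Outcome = rnorm(20),
  Treatment = rbinom(20, 1, 0.5),
  Sex = rbinom(20, 1, 0.5),
  Age = rnorm(20, mean = 50, sd = 10)
)
M <- matrix(rnorm(20 * 5), nrow = 20, dimnames = list(NULL, paste0("M", 1:5)))

roles <- check_hima_inputs(Outcome ~ Treatment + Sex + Age, pheno, M)
print(roles)
```

The helper only reads the formula and the dimensions of the inputs; it does not scale or modify the data, so it can be run safely before any of the HIMA methods.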

Value

A data.frame containing mediation testing results for the selected mediators, with the following columns:

ID: Mediator ID/name.
alpha: Coefficient estimates of exposure (X) -> mediators (M) (adjusted for covariates).
beta: Coefficient estimates of mediators (M) -> outcome (Y) (adjusted for covariates and exposure).
alpha*beta: The estimated indirect (mediation) effect of exposure on outcome through each mediator.
Relative Importance: The proportion of each mediator's mediation effect relative to the sum of the absolute mediation effects of all significant mediators.
p-value: The joint p-value assessing the significance of each mediator's indirect effect, calculated based on the corresponding statistical approach.
tau: The quantile level of the outcome (applicable only when using the quantile mediation model).
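To make the alpha, beta, alpha*beta, and Relative Importance columns concrete, the sketch below computes them in a low-dimensional toy setting with ordinary least squares. This is an illustration only: it uses simulated data, plain lm fits instead of the penalized HIMA procedures, no significance filtering, and it assumes that Relative Importance is taken on absolute indirect effects and expressed as a proportion. The values reported by hima may be scaled or filtered differently.

```r
# Simulated toy data (illustration only; not a HIMA dataset).
set.seed(42)
n <- 200
X <- rnorm(n) # exposure
C <- rnorm(n) # a single covariate
M <- cbind(
  M1 = 0.8 * X + rnorm(n),  # mediator affected by X
  M2 = -0.5 * X + rnorm(n), # mediator affected by X
  M3 = rnorm(n)             # noise variable, unrelated to X
)
Y <- 0.6 * M[, "M1"] + 0.4 * M[, "M2"] + 0.3 * X + 0.2 * C + rnorm(n)

# alpha: effect of X on each mediator, adjusted for the covariate.
alpha <- apply(M, 2, function(m) coef(lm(m ~ X + C))[["X"]])

# beta: effect of each mediator on Y, adjusted for exposure and covariate.
dat <- data.frame(Y = Y, X = X, C = C, M)
fit_y <- lm(Y ~ M1 + M2 + M3 + X + C, data = dat)
beta <- coef(fit_y)[colnames(M)]

# Assemble a table shaped like the documented output columns.
res <- data.frame(
  ID = colnames(M),
  alpha = unname(alpha),
  beta = unname(beta),
  check.names = FALSE
)
res[["alpha*beta"]] <- res$alpha * res$beta

# Assumed definition: share of each absolute indirect effect in the total.
abs_effect <- abs(res[["alpha*beta"]])
res[["Relative Importance"]] <- abs_effect / sum(abs_effect)

print(res)
```

Under this definition the Relative Importance values are non-negative and sum to 1 across the listed mediators.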

References

1. Zhang H, Zheng Y, Zhang Z, Gao T, Joyce B, Yoon G, Zhang W, Schwartz J, Just A, Colicino E, Vokonas P, Zhao L, Lv J, Baccarelli A, Hou L, Liu L. Estimating and Testing High-dimensional Mediation Effects in Epigenetic Studies. Bioinformatics. 2016. DOI: 10.1093/bioinformatics/btw351. PMID: 27357171; PMCID: PMC5048064

2. Zhang H, Zheng Y, Hou L, Zheng C, Liu L. Mediation Analysis for Survival Data with High-Dimensional Mediators. Bioinformatics. 2021. DOI: 10.1093/bioinformatics/btab564. PMID: 34343267; PMCID: PMC8570823

3. Zhang H, Chen J, Feng Y, Wang C, Li H, Liu L. Mediation Effect Selection in High-dimensional and Compositional Microbiome data. Stat Med. 2021. DOI: 10.1002/sim.8808. PMID: 33205470; PMCID: PMC7855955

4. Zhang H, Chen J, Li Z, Liu L. Testing for Mediation Effect with Application to Human Microbiome Data. Stat Biosci. 2021. DOI: 10.1007/s12561-019-09253-3. PMID: 34093887; PMCID: PMC8177450

5. Perera C, Zhang H, Zheng Y, Hou L, Qu A, Zheng C, Xie K, Liu L. HIMA2: High-dimensional Mediation Analysis and Its Application in Epigenome-wide DNA Methylation Data. BMC Bioinformatics. 2022. DOI: 10.1186/s12859-022-04748-1. PMID: 35879655; PMCID: PMC9310002

6. Zhang H, Hong X, Zheng Y, Hou L, Zheng C, Wang X, Liu L. High-Dimensional Quantile Mediation Analysis with Application to a Birth Cohort Study of Mother–Newborn Pairs. Bioinformatics. 2024. DOI: 10.1093/bioinformatics/btae055. PMID: 38290773; PMCID: PMC10873903

7. Bai X, Zheng Y, Hou L, Zheng C, Liu L, Zhang H. An Efficient Testing Procedure for High-dimensional Mediators with FDR Control. Stat Biosci. 2024. DOI: 10.1007/s12561-024-09447-4.

Examples

if (FALSE) { # \dontrun{
# Note: In the following examples, M1, M2, and M3 are true mediators.

# Example 1 (continuous outcome - linear HIMA):
head(ContinuousOutcome$PhenoData)

e1 <- hima(Outcome ~ Treatment + Sex + Age,
  data.pheno = ContinuousOutcome$PhenoData,
  data.M = ContinuousOutcome$Mediator,
  mediator.type = "gaussian",
  penalty = "MCP", # Can be "DBlasso" for hima_dblasso
  scale = FALSE # Disabled only for simulation data
)
summary(e1)

# Efficient HIMA (only applicable when the mediators and the outcome are
# both continuous and normally distributed):
e1e <- hima(Outcome ~ Treatment + Sex + Age,
  data.pheno = ContinuousOutcome$PhenoData,
  data.M = ContinuousOutcome$Mediator,
  mediator.type = "gaussian",
  efficient = TRUE,
  penalty = "MCP", # Efficient HIMA does not support DBlasso
  scale = FALSE # Disabled only for simulation data
)
summary(e1e)

# Example 2 (binary outcome - logistic HIMA):
head(BinaryOutcome$PhenoData)

e2 <- hima(Disease ~ Treatment + Sex + Age,
  data.pheno = BinaryOutcome$PhenoData,
  data.M = BinaryOutcome$Mediator,
  mediator.type = "gaussian",
  penalty = "MCP",
  scale = FALSE # Disabled only for simulation data
)
summary(e2)

# Example 3 (time-to-event outcome - survival HIMA):
head(SurvivalData$PhenoData)

e3 <- hima(Surv(Time, Status) ~ Treatment + Sex + Age,
  data.pheno = SurvivalData$PhenoData,
  data.M = SurvivalData$Mediator,
  mediator.type = "gaussian",
  penalty = "DBlasso",
  scale = FALSE # Disabled only for simulation data
)
summary(e3)

# Example 4 (compositional data as mediator, e.g., microbiome):
head(MicrobiomeData$PhenoData)

e4 <- hima(Outcome ~ Treatment + Sex + Age,
  data.pheno = MicrobiomeData$PhenoData,
  data.M = MicrobiomeData$Mediator,
  mediator.type = "compositional",
  penalty = "DBlasso"
) # Scaling is always enabled internally for hima_microbiome
summary(e4)

# Example 5 (quantile mediation analysis - quantile HIMA):
head(QuantileData$PhenoData)

# Note: specify the quantile level(s) with tau; if omitted, tau = 0.5 (the median) is used.
e5 <- hima(Outcome ~ Treatment + Sex + Age,
  data.pheno = QuantileData$PhenoData,
  data.M = QuantileData$Mediator,
  mediator.type = "gaussian",
  quantile = TRUE,
  penalty = "MCP", # Quantile HIMA does not support DBlasso
  scale = FALSE, # Disabled only for simulation data
  tau = c(0.3, 0.5, 0.7) # Specify multiple quantile levels
)
summary(e5)
} # }