hima is a wrapper function designed to perform various HIMA methods for estimating and testing high-dimensional mediation effects.
hima can automatically select the appropriate HIMA method based on the outcome and mediator data type.
Usage
hima(
formula,
data.pheno,
data.M,
mediator.type = c("gaussian", "negbin", "compositional"),
penalty = c("DBlasso", "MCP", "SCAD", "lasso"),
quantile = FALSE,
efficient = FALSE,
longitudinal = FALSE,
id.var = NULL,
scale = TRUE,
sigcut = 0.05,
contrast = NULL,
subset = NULL,
verbose = FALSE,
parallel = FALSE,
ncore = 1,
...
)Arguments
- formula
an object of class
formularepresenting the overall effect model to be fitted, specified asoutcome ~ exposure + covariates. The "exposure" variable (the variable of interest) must be listed first on the right-hand side of the formula. For survival outcomes specified usingSurv(), the exposure should be the first variable after the~.- data.pheno
a data frame containing the exposure, outcome, and covariates specified in the formula. Variable names in
data.phenomust match those in the formula. Whenscale = TRUE, the exposure and covariates will be scaled (the outcome retains its original scale).- data.M
a
data.frameormatrixof high-dimensional mediators, with rows representing samples and columns representing mediator variables. Whenscale = TRUE,data.Mwill be scaled.- mediator.type
a character string indicating the data type of the high-dimensional mediators (
data.M). Options are:'gaussian'(default): for continuous mediators.'negbin': for count data mediators modeled using the negative binomial distribution (e.g., RNA-seq data).'compositional': for compositional data mediators (e.g., microbiome data).- penalty
a character string specifying the penalty method to apply in the model. Options are:
'DBlasso': De-biased LASSO (default).'MCP': Minimax Concave Penalty.'SCAD': Smoothly Clipped Absolute Deviation.'lasso': Least Absolute Shrinkage and Selection Operator. Note: Survival HIMA and microbiome HIMA can only be performed with'DBlasso'. Quantile HIMA and efficient HIMA cannot use'DBlasso'; they always apply'MCP'.- quantile
logical. Indicates whether to use quantile HIMA (
hima_quantile). Default isFALSE. Applicable only for classic HIMA with a continuous outcome andmediator.type = 'gaussian'. IfTRUE, specify the desired quantile(s) using thetauparameter; otherwise, the defaulttau = 0.5(i.e., median) is used.- efficient
logical. Indicates whether to use efficient HIMA (
hima_efficient). Default isFALSE. Applicable only for classic HIMA with a continuous outcome andmediator.type = 'gaussian'.- longitudinal
logical. Indicates whether to run the longitudinal survival mediation model
hima_survival_long(requires aSurv(tstart, tstop, status)outcome). Default =FALSE.- id.var
Character string specifying the column name in
data.phenothat stores subject identifiers. Required whenlongitudinal = TRUE.- scale
logical. Determines whether the function scales the data (exposure, mediators, and covariates). Default is
TRUE. Note: For simulation studies, setscale = FALSEto avoid estimate compression (i.e., shrinkage of estimates toward zero due to scaling).- sigcut
numeric. The significance cutoff for selecting mediators. Default is
0.05.- contrast
a named list of contrasts to be applied to factor variables in the covariates (cannot be the variable of interest).
- subset
an optional vector specifying a subset of observations to use in the analysis.
- verbose
logical. Determines whether the function displays progress messages. Default is
FALSE.- parallel
logical. Enable parallel computing feature? Default =
FALSE.- ncore
number of cores to run parallel computing Valid when
parallel = TRUE.- ...
reserved passing parameter (or for future use).
Value
A data.frame containing mediation testing results of selected mediators.
- ID:
Mediator ID/name.
- alpha:
Coefficient estimates of exposure (X) –> mediators (M) (adjusted for covariates).
- beta:
Coefficient estimates of mediators (M) –> outcome (Y) (adjusted for covariates and exposure).
- alpha*beta:
The estimated indirect (mediation) effect of exposure on outcome through each mediator.
- rimp:
Relative importance- the proportion of each mediator's mediation effect relative to the sum of the absolute mediation effects of all significant mediators.
- p-value:
The joint p-value assessing the significance of each mediator's indirect effect, calculated based on the corresponding statistical approach.
- tau:
The quantile level of the outcome (applicable only when using the quantile mediation model).
References
1. Zhang H, Zheng Y, Hou L, Liu L, HIMA: An R Package for High-Dimensional Mediation Analysis. Journal of Data Science. 2025. DOI: 10.6339/25-JDS1192
2. Zhang H, Zheng Y, Zhang Z, Gao T, Joyce B, Yoon G, Zhang W, Schwartz J, Just A, Colicino E, Vokonas P, Zhao L, Lv J, Baccarelli A, Hou L, Liu L. Estimating and Testing High-dimensional Mediation Effects in Epigenetic Studies. Bioinformatics. 2016. DOI: 10.1093/bioinformatics/btw351. PMID: 27357171; PMCID: PMC5048064
3. Zhang H, Zheng Y, Hou L, Zheng C, Liu L. Mediation Analysis for Survival Data with High-Dimensional Mediators. Bioinformatics. 2021. DOI: 10.1093/bioinformatics/btab564. PMID: 34343267; PMCID: PMC8570823
4. Zhang H, Chen J, Feng Y, Wang C, Li H, Liu L. Mediation Effect Selection in High-dimensional and Compositional Microbiome data. Stat Med. 2021. DOI: 10.1002/sim.8808. PMID: 33205470; PMCID: PMC7855955
5. Zhang H, Chen J, Li Z, Liu L. Testing for Mediation Effect with Application to Human Microbiome Data. Stat Biosci. 2021. DOI: 10.1007/s12561-019-09253-3. PMID: 34093887; PMCID: PMC8177450
6. Perera C, Zhang H, Zheng Y, Hou L, Qu A, Zheng C, Xie K, Liu L. HIMA2: High-dimensional Mediation Analysis and Its Application in Epigenome-wide DNA Methylation Data. BMC Bioinformatics. 2022. DOI: 10.1186/s12859-022-04748-1. PMID: 35879655; PMCID: PMC9310002
7. Zhang H, Hong X, Zheng Y, Hou L, Zheng C, Wang X, Liu L. High-Dimensional Quantile Mediation Analysis with Application to a Birth Cohort Study of Mother–Newborn Pairs. Bioinformatics. 2024. DOI: 10.1093/bioinformatics/btae055. PMID: 38290773; PMCID: PMC10873903
8. Bai X, Zheng Y, Hou L, Zheng C, Liu L, Zhang H. An Efficient Testing Procedure for High-dimensional Mediators with FDR Control. Statistics in Biosciences. 2024. DOI: 10.1007/s12561-024-09447-4.
9. Liu L, Zhang H, Zheng Y, Gao T, Zheng C, Zhang K, Hou L, Liu L. High-dimensional mediation analysis for longitudinal mediators and survival outcomes. Briefings in Bioinformatics. 2025. DOI: 10.1093/bib/bbaf206. PMID: 40350699 PMCID: PMC12066418
Examples
if (FALSE) { # \dontrun{
# Note: In the following examples, M1, M2, and M3 are true mediators.
# Example 1 (continuous outcome - linear HIMA):
data(ContinuousOutcome)
pheno_data <- ContinuousOutcome$PhenoData
mediator_data <- ContinuousOutcome$Mediator
e1 <- hima(Outcome ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
penalty = "MCP", # Can be "DBlasso" for hima_dblasso
scale = FALSE, # Disabled only for simulation data
verbose = TRUE
)
summary(e1)
# Efficient HIMA (only applicable to mediators and outcomes that are
# both continuous and normally distributed.)
e1e <- hima(Outcome ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
efficient = TRUE,
penalty = "MCP", # Efficient HIMA does not support DBlasso
scale = FALSE, # Disabled only for simulation data
verbose = TRUE
)
summary(e1e)
# Example 2 (binary outcome - logistic HIMA):
data(BinaryOutcome)
pheno_data <- BinaryOutcome$PhenoData
mediator_data <- BinaryOutcome$Mediator
e2 <- hima(Disease ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
penalty = "MCP",
scale = FALSE, # Disabled only for simulation data
verbose = TRUE
)
summary(e2)
# Example 3 (time-to-event outcome - survival HIMA):
data(SurvivalData)
pheno_data <- SurvivalData$PhenoData
mediator_data <- SurvivalData$Mediator
e3 <- hima(Surv(Time, Status) ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
penalty = "DBlasso",
scale = FALSE, # Disabled only for simulation data
verbose = TRUE
) # Parallel computing feature is recommended
summary(e3)
# Longitudinal mediator + survival HIMA:
data(SurvivalLongData)
pheno_data <- SurvivalLongData$PhenoData
mediator_data <- SurvivalLongData$Mediator
e3long <- hima(Surv(Tstart, Tstop, Status) ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
penalty = "lasso",
longitudinal = TRUE,
id.var = "ID",
scale = FALSE, # Disabled only for simulation data
verbose = TRUE
) # Parallel computing feature is recommended
summary(e3long)
# Example 4 (compositional data as mediator, e.g., microbiome):
data(MicrobiomeData)
pheno_data <- MicrobiomeData$PhenoData
mediator_data <- MicrobiomeData$Mediator
e4 <- hima(Outcome ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "compositional",
penalty = "DBlasso",
verbose = TRUE
) # Scaling is always enabled internally for hima_microbiome
summary(e4)
#' # Example 5 (quantile mediation analysis - quantile HIMA):
data(QuantileData)
pheno_data <- QuantileData$PhenoData
mediator_data <- QuantileData$Mediator
# Note that the function will prompt input for quantile level.
e5 <- hima(Outcome ~ Treatment + Sex + Age,
data.pheno = pheno_data,
data.M = mediator_data,
mediator.type = "gaussian",
quantile = TRUE,
penalty = "MCP", # Quantile HIMA does not support DBlasso
scale = FALSE, # Disabled only for simulation data
tau = c(0.3, 0.5, 0.7),
verbose = TRUE
) # Specify multiple quantile level
summary(e5)
} # }