Matching-Adjusted Indirect Comparison (MAIC)

Author

Xiaoge Zhang

Published

June 5, 2026

Bridging the Gap When Patient-Level Data is Only Partially Available

1 Executive Summary: Bridging Data Asymmetry in HTA

In real-world Health Technology Assessment (HTA) submissions to bodies like NICE, we often face a critical data asymmetry when trying to compare a new drug with its competitors:

  1. Our Drug (Drug A): We have conducted a randomized controlled trial (Drug A vs. Standard of Care) and possess the full Individual Patient Data (IPD).
  2. Competitor Drug (Drug B): The competitor has also run a trial against the same Standard of Care, but we only have access to the published Aggregate Data (AgD) from their literature.

While the evidence network is technically connected via the common comparator (Standard of Care), a standard Network Meta-Analysis (NMA) assumes that the trials are similar in their distribution of effect modifiers, so that relative effects carry over from one trial population to the other. In reality, trial populations almost always differ (e.g., Trial A enrolled younger, healthier patients than Trial B). When those differences involve effect modifiers, combining the relative effects without adjusting for them yields biased estimates.

If we had IPD for both trials, we would perform a gold-standard Full IPD NMA. However, constrained by the competitor’s aggregate data, we must use a workaround.

Matching-Adjusted Indirect Comparison (MAIC) is a statistical workaround designed for exactly this “asymmetric data” scenario. The form described here, which keeps the common comparator, is Anchored MAIC. It exploits the flexibility of our IPD to adjust for population differences: MAIC assigns a statistical weight to every individual patient in Trial A so that the weighted baseline characteristics exactly match the published aggregate characteristics of Trial B. Once the populations are aligned, we compare the relative treatment effects of each drug against the common comparator.
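For a continuous outcome compared on the mean-difference scale, the anchored comparison can be sketched as a Bucher-style contrast in which only Trial A’s side is reweighted. The notation is assumed here for illustration. \(C\) denotes the shared Standard of Care arm, and the superscript \((B)\) marks effects that apply to Trial B’s population. The weighted means run over the A and C arms of Trial A, and \(\hat{\Delta}_{BC}^{(B)}\) is taken directly from the publication.

```latex
\hat{\Delta}_{AB}^{(B)} = \hat{\Delta}_{AC}^{(B)} - \hat{\Delta}_{BC}^{(B)},
\qquad
\hat{\Delta}_{AC}^{(B)} =
  \frac{\sum_{i \in A} w_i y_i}{\sum_{i \in A} w_i}
  - \frac{\sum_{i \in C} w_i y_i}{\sum_{i \in C} w_i}
```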

In essence, MAIC reweights our IPD so that it can be compared at the level of the lowest common denominator (aggregate data). This enables a fair, covariate-adjusted comparison that the data limitations would otherwise rule out.

2 The Minimal Mock Case: Unanchored MAIC

2.1 The Scenario

We are evaluating a new oncology drug with a continuous outcome (e.g., reduction in tumor size).

  • Drug A (Our Drug): We have row-by-row IPD for \(N=200\) patients from a single-arm Phase II trial.
  • Drug B (Competitor): We only have a published paper showing aggregate baseline characteristics and an average outcome.

2.2 The Problem (Baseline Imbalance)

From the published paper for Drug B, we know the mean age of their patients was exactly 65.0 years. In our trial for Drug A, the mean age is 55.0 years. Because age is a known prognostic factor (older patients generally have worse outcomes), simply comparing the average outcome of A vs. B would unfairly favor Drug A, as our population is systematically younger and healthier.

2.3 The Goal

We need to calculate a weight \(w_i\) for each patient \(i\) in Trial A so that the weighted mean age of Trial A exactly equals 65.0, allowing for a fair “apples-to-apples” comparison.

3 Methodological Foundation: The Method of Moments

Unlike Bayesian NMA, the standard MAIC weight calculation is purely frequentist. The weights are modeled as an exponential function of the baseline covariates to ensure they remain strictly positive:

\[w_i = \exp(\alpha_0 + x_i^T \alpha_1)\]

Where:

  • \(w_i\) is the final weight assigned to patient \(i\) in Trial A.
  • \(x_i\) represents the centered covariates for patient \(i\) in Trial A (centered around the published aggregate mean of Trial B).
  • \(x_i^T\) is the transpose of the covariate vector \(x_i\), so \(x_i^T \alpha_1\) is the dot product of the covariates and the coefficients. For a single covariate, this reduces to the scalar product \(x_i \cdot \alpha_1\).
  • \(\alpha_1\) is the vector of coefficients we need to estimate via optimization.
  • \(\alpha_0\) is a scaling constant (not an optimization parameter). It is calculated after finding the optimal \(\alpha_1\) to ensure that the sum of the weights equals the original sample size \(N_A\) (i.e., \(\sum w_i = N_A\)). Mathematically, \(\exp(\alpha_0) = N_A / \sum \exp(x_i^T \alpha_1)\).

We find the optimal \(\alpha_1\) by minimizing an objective function (using the Method of Moments). Specifically, we minimize the sum of the unnormalized weights:

\[Q(\alpha_1) = \sum_{i=1}^{N_A} \exp(x_i^T \alpha_1)\]

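Because \(Q(\alpha_1)\) is convex, its minimizer is the point at which its gradient is zero. That first-order condition is exactly the requirement that the weighted mean of the centered covariates in Trial A equals zero, so the weighted covariate means match Trial B. A finite minimizer exists only if Trial B’s published means lie strictly inside the range (more precisely, the convex hull) of the Trial A covariate values. Otherwise, no set of positive weights can reproduce them.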
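Writing out the gradient makes the balancing property explicit. In the derivation below, \(z_i\) (the uncentered covariate vector of patient \(i\)) and \(\bar{z}_B\) (Trial B’s published means) are symbols introduced here for illustration, with \(x_i = z_i - \bar{z}_B\). Multiplying the first-order condition by the positive constant \(\exp(\alpha_0)\) turns it into a statement about the final weights \(w_i\). The Hessian on the right shows why \(Q\) is convex.

```latex
\frac{\partial Q}{\partial \alpha_1}
  = \sum_{i=1}^{N_A} x_i \exp(x_i^T \alpha_1) = 0
\;\Longrightarrow\;
\frac{\sum_{i=1}^{N_A} w_i x_i}{\sum_{i=1}^{N_A} w_i} = 0
\;\Longleftrightarrow\;
\frac{\sum_{i=1}^{N_A} w_i z_i}{\sum_{i=1}^{N_A} w_i} = \bar{z}_B,
\qquad
\frac{\partial^2 Q}{\partial \alpha_1 \, \partial \alpha_1^T}
  = \sum_{i=1}^{N_A} x_i x_i^T \exp(x_i^T \alpha_1) \succeq 0
```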

3.1 Entropy Balancing vs. Inverse Probability Weighting (IPW)

While MAIC is often categorized broadly as a propensity score method, its mathematical engine is distinct from traditional Inverse Probability Weighting (IPW):

  • IPW (Propensity Score Weighting):
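    • Mechanism: Estimates the probability of a patient belonging to Trial A vs. Trial B, the propensity score (PS), usually via logistic regression. It then derives weights from the PS, e.g., 1/PS. When reweighting Trial A to resemble Trial B, the inverse odds (1 − PS)/PS are used, with PS the probability of belonging to Trial A.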
    • Limitations: Requires Individual Patient Data (IPD) for both trials. Furthermore, it does not guarantee exact moment matching in small samples; it only achieves balance asymptotically.
  • Entropy Balancing (The MAIC Approach):
    • Mechanism: Directly searches for weights that satisfy the moment conditions (e.g., sample mean of A = published mean of B) while keeping the weights as close to uniform as possible (maximizing the entropy of the weights).
    • Advantages: Only requires IPD for the index trial and Aggregate Data (AgD) for the target trial. Provided a solution exists, it guarantees exact alignment of the specified moments by design.
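The difference in balancing behavior can be seen in a small simulation. This is a minimal sketch under loudly stated assumptions. IPD for Trial B is simulated purely so that IPW can be fitted at all; in the real MAIC setting it is unavailable. All numbers are hypothetical. IPW here uses the odds of Trial B membership from a logistic regression as weights. MAIC solves the same moment condition as in the section above using only Trial A’s IPD and the target mean.

```r
# HYPOTHETICAL illustration: IPD for BOTH trials is simulated only so that
# IPW can be fitted; in the real MAIC setting Trial B's IPD is unavailable.
set.seed(42)
n_A <- 200
n_B <- 150
age_A <- rnorm(n_A, mean = 55, sd = 8)
age_B <- rnorm(n_B, mean = 65, sd = 8)
target <- mean(age_B)  # the summary a publication would report

# IPW: logistic model for membership in Trial B, fitted on the pooled IPD
pooled <- data.frame(age = c(age_A, age_B),
                     in_B = c(rep(0, n_A), rep(1, n_B)))
fit <- glm(in_B ~ age, family = binomial, data = pooled)
p_B <- predict(fit, newdata = data.frame(age = age_A), type = "response")
# Odds of Trial B membership, i.e. (1 - PS)/PS with PS = P(Trial A)
w_ipw <- p_B / (1 - p_B)
ipw_mean <- weighted.mean(age_A, w_ipw)

# MAIC (entropy balancing): needs only Trial A's IPD plus the target mean
x <- age_A - target
alpha1 <- optim(0,
                fn = function(a) sum(exp(x * a)),
                gr = function(a) sum(x * exp(x * a)),
                method = "BFGS", control = list(reltol = 1e-12))$par
w_maic <- exp(x * alpha1)
maic_mean <- weighted.mean(age_A, w_maic)

cat("Target mean age (Trial B):", target, "\n")
cat("IPW-weighted mean age:    ", ipw_mean, "\n")
cat("MAIC-weighted mean age:   ", maic_mean, "\n")
```

The MAIC-weighted mean reproduces the target up to optimizer tolerance by construction. The IPW-weighted mean is only approximately equal to it in a finite sample.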

3.2 Effective Sample Size (ESS) Diagnosis

After calculating the MAIC weights, the most critical diagnostic metric to evaluate is the Effective Sample Size (ESS):

\[ESS = \frac{(\sum w_i)^2}{\sum w_i^2}\]

In HTA audits, the ESS is scrutinized under the following principles:

  1. The Threshold of Validity: No formal cut-off is mandated. However, a severe drop in ESS (e.g., from 200 patients down to an ESS of 20, or below roughly 15% of the original sample size) indicates that the populations of Trial A and Trial B overlap very little.
  2. Weight Distribution: If the ESS is small, the matching relies heavily on just a few outlier patients in Trial A who happen to look like the average patient in Trial B. These patients will receive massive weights, making the final comparison highly unstable and unrepresentative.
  3. Reporting Requirements: Manufacturers are expected to report both the original sample size and the ESS, alongside histograms of the weight distribution showing that no single patient dominates the analysis.
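The ESS formula and the “few dominant patients” problem can be made concrete with a tiny helper. This is a minimal sketch: the function name ess() and the two weight vectors are hypothetical and serve only to show how a single extreme weight collapses the ESS, and that rescaling the weights leaves it unchanged.

```r
# Kish effective sample size: (sum w)^2 / sum(w^2)
ess <- function(w) sum(w)^2 / sum(w^2)

# Hypothetical weight vectors for N = 200 patients
w_uniform  <- rep(1, 200)            # no reweighting at all
w_dominant <- c(rep(1, 199), 100)    # one patient carries a massive weight

ess(w_uniform)          # 200: equal weights keep the full sample size
ess(w_dominant)         # about 8.77: a single extreme weight collapses the ESS
ess(3 * w_dominant)     # unchanged: the ESS does not depend on the weights' scale

# Relative ESS (fraction of the original N)
ess(w_dominant) / length(w_dominant)
```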

4 Custom R Implementation: Calculating Weights from Scratch

To make the underlying mathematics transparent, we implement the objective function manually and use R’s optim() to solve for the weights, rather than relying on a black-box package.

set.seed(123)

# ---------------------------------------------------------
# 1. Simulate Mock Data
# ---------------------------------------------------------
N_A <- 200
# Drug A (IPD): Mean age = 55, SD = 8
age_A <- rnorm(N_A, mean = 55, sd = 8)
# True outcome for Drug A (True effect = 10, minus 0.5 per year of age)
y_A <- rnorm(N_A, mean = 10 - 0.5 * (age_A - 55), sd = 2)
df_A <- data.frame(id = 1:N_A, age = age_A, y = y_A)

# Drug B (AgD published data)
mean_age_B <- 65.0
# True outcome for B if they were age 65 would be, say, 3.0
mean_y_B_published <- 3.0

# Naive comparison (Unadjusted)
mean_y_A_naive <- mean(df_A$y)

# ---------------------------------------------------------
# 2. Custom MAIC Weight Calculation (Method of Moments)
# ---------------------------------------------------------
# Step A: Center the IPD covariates around the AgD targets
# This makes the math easier: we want the weighted mean of the centered covariate to be 0.
df_A$age_centered <- df_A$age - mean_age_B

# Step B: Define the Objective Function to minimize
# The objective is sum(exp(X * alpha1))
obj_fn <- function(alpha1, X) {
    sum(exp(X %*% alpha1))
}

# Step C: Define the Gradient Function (for faster/stable optimization)
# The gradient is sum(X * exp(X * alpha1))
grad_fn <- function(alpha1, X) {
    colSums(X * as.vector(exp(X %*% alpha1)))
}

# Step D: Optimize to find alpha1
X_matrix <- as.matrix(df_A$age_centered)
opt_res <- optim(
    par = 0, # Initial guess
    fn = obj_fn,
    gr = grad_fn,
    X = X_matrix,
    method = "BFGS"
)
alpha1 <- opt_res$par

# Step E: Calculate individual weights
unnormalized_w <- exp(X_matrix %*% alpha1)
# Normalize so sum(w) = N_A (optional, but standard for interpretability)
w <- unnormalized_w / sum(unnormalized_w) * N_A
df_A$weight <- as.vector(w)

# ---------------------------------------------------------
# 3. Validation & Outcome Comparison
# ---------------------------------------------------------
# Check covariate balance
weighted_mean_age_A <- sum(df_A$age * df_A$weight) / sum(df_A$weight)

# Calculate Effective Sample Size (ESS)
ESS <- (sum(df_A$weight)^2) / sum(df_A$weight^2)

# Calculate MAIC-adjusted outcome for Drug A
mean_y_A_maic <- sum(df_A$y * df_A$weight) / sum(df_A$weight)
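The objective and gradient above already accept a covariate matrix, so matching several published summaries only requires adding columns. Below is a minimal sketch using a hypothetical helper maic_weights() and simulated data. The binary covariate male and its published proportion of 0.60 are invented for illustration. Binary covariates are centered on the published proportion. Columns are rescaled only to help the optimizer, and a convergence check guards against silently using a failed optimization.

```r
# HYPOTHETICAL helper: MAIC weights for any number of centered covariates
maic_weights <- function(X) {
    # Rescale columns to comparable magnitudes; this changes alpha1 but not the weights
    X <- scale(as.matrix(X), center = FALSE)
    obj  <- function(a) sum(exp(X %*% a))
    grad <- function(a) colSums(X * as.vector(exp(X %*% a)))
    opt <- optim(rep(0, ncol(X)), obj, grad, method = "BFGS",
                 control = list(reltol = 1e-12))
    if (opt$convergence != 0) stop("optim() did not converge")
    w <- as.vector(exp(X %*% opt$par))
    w / sum(w) * nrow(X)  # rescale so the weights sum to N (the alpha_0 step)
}

# Simulated Trial A IPD with two covariates
set.seed(2024)
n <- 200
ipd <- data.frame(age  = rnorm(n, mean = 55, sd = 8),
                  male = rbinom(n, size = 1, prob = 0.45))

# HYPOTHETICAL published targets for Trial B
target_age  <- 65.0
target_male <- 0.60

# Center each covariate on its target (binary: center on the proportion)
X <- cbind(age  = ipd$age  - target_age,
           male = ipd$male - target_male)
w <- maic_weights(X)

# Both weighted summaries should reproduce the targets; the ESS shows the cost
round(c(mean_age  = weighted.mean(ipd$age, w),
        prop_male = weighted.mean(ipd$male, w),
        ess       = sum(w)^2 / sum(w^2)), 3)
```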

4.1 Results & Validation

After running the custom optimization, we can verify that the weights successfully matched the baseline characteristics:

| Metric | Drug A (Naive IPD) | Drug A (MAIC Weighted) | Drug B (AgD Target) |
|---|---|---|---|
| Mean Age (years) | 54.9 | 65.0 | 65.0 |
| Effective Sample Size (ESS) | 200 | 41 | N/A |
| Mean Outcome \(y\) | 10.12 | 5.37 | 3.00 |

Notice that the naive comparison (10.12 vs. 3.00) is heavily biased because Trial A patients are much younger. After MAIC weighting, the weighted mean age of Trial A is exactly 65.0, and the adjusted outcome drops to 5.37. This assumes age is the only prognostic factor or effect modifier that differs between the trials, which is the key assumption of an unanchored MAIC. Under that assumption, the result is a much fairer “apples-to-apples” comparison against Drug B, albeit at the cost of precision (the ESS dropped from 200 to 41).

5 Industry Standard Alternative: maicplus Package

While calculating the weights via optim() demonstrates the mathematical rigor behind the method, industry practitioners typically use dedicated R packages like maicplus (developed by the Roche team and available on CRAN) to streamline the process, handle multiple covariates, and automatically compute standard errors.

Here is how the above analysis would be executed using the maicplus package (syntax example):

library(maicplus)

# Add ARM column (required by maicplus by default)
df_A$ARM <- "A"

# Use the manually centered variable 'age_centered' created earlier
maic_res <- estimate_weights(
    data = df_A,
    centered_colnames = "age_centered"
)

# View Effective Sample Size (ESS) calculated by the package
cat("ESS calculated by maicplus: ", maic_res$ess, "\n")
ESS calculated by maicplus:  40.984 
# Calculate MAIC-adjusted outcome using the package's scaled weights
mean_y_A_maicplus <- sum(maic_res$data$y * maic_res$data$scaled_weights) / sum(maic_res$data$scaled_weights)
cat("Adjusted mean outcome: ", mean_y_A_maicplus, "\n")
Adjusted mean outcome:  5.370661 

6 Outcome Analysis: The Final Comparison

Now that we have the matched weights and the adjusted outcome for Drug A, we can complete the indirect comparison.

6.1 Point Estimate of the Treatment Effect

In our unanchored scenario, the comparative treatment effect (\(\Delta\)) is simply the difference between the MAIC-adjusted mean outcome of Drug A and the published mean outcome of Drug B.

# Point Estimate
treatment_effect <- mean_y_A_maic - mean_y_B_published
cat("Point Estimate of Treatment Effect (A vs B): ", treatment_effect, "\n")
Point Estimate of Treatment Effect (A vs B):  2.370661 

6.2 Quantifying Uncertainty via Bootstrapping

Standard formulas, such as those behind a t-test, cannot be used for the standard error of this treatment effect. They treat the weights as fixed constants, whereas the weights are themselves random variables estimated from the data.

In real-world HTA submissions, practitioners commonly use bootstrapping to calculate the 95% confidence interval (CI). The IPD is repeatedly resampled with replacement, the MAIC weights are re-estimated in each resample, and the resulting distribution of the treatment effect is summarized.

Here is a minimal demonstration of the bootstrapping workflow (using 100 iterations for speed, though 2000+ is standard in practice):

n_boot <- 100
boot_effects <- numeric(n_boot)

for (b in 1:n_boot) {
    # Step A: Resample IPD with replacement
    boot_idx <- sample(1:N_A, size = N_A, replace = TRUE)
    df_A_boot <- df_A[boot_idx, ]

    # Step B: Recalculate weights (using our custom optim approach)
    X_boot <- as.matrix(df_A_boot$age_centered)
    opt_boot <- optim(par = 0, fn = obj_fn, gr = grad_fn, X = X_boot, method = "BFGS")

    w_boot <- exp(X_boot %*% opt_boot$par)
    w_boot <- w_boot / sum(w_boot) * N_A

    # Step C: Calculate weighted outcome for A and the Delta
    mean_y_A_boot <- sum(df_A_boot$y * w_boot) / sum(w_boot)
    boot_effects[b] <- mean_y_A_boot - mean_y_B_published
}

# Calculate 95% CI from percentiles
ci_lower <- quantile(boot_effects, 0.025)
ci_upper <- quantile(boot_effects, 0.975)
se_boot <- sd(boot_effects)
cat("Bootstrapped SE: ", se_boot, "\n")
Bootstrapped SE:  0.2711255 
cat("Bootstrapped 95% CI: [", ci_lower, ", ", ci_upper, "]\n")
Bootstrapped 95% CI: [ 1.822798 ,  2.895264 ]
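The bootstrap above resamples only Trial A, so Drug B’s published mean enters every iteration as a fixed number. Because the two trials are independent, one simple way to account for Drug B’s sampling error is to add the two variances and form a normal-approximation (Wald) interval. This is a different interval type from the percentile interval above. Below is a minimal sketch in which se_B is a hypothetical value. In practice, it would come from the publication, e.g., the reported SD divided by the square root of Trial B’s sample size.

```r
# Point estimate and Trial-A-only bootstrap SE, copied from the output above
delta_hat <- 2.370661
se_A_boot <- 0.2711255

# HYPOTHETICAL standard error of Drug B's published mean outcome
se_B <- 0.30

# Independent trials: variances add
se_total <- sqrt(se_A_boot^2 + se_B^2)
ci_wald  <- delta_hat + c(-1, 1) * qnorm(0.975) * se_total

cat("Combined SE:", round(se_total, 4), "\n")
cat("Wald 95% CI: [", round(ci_wald[1], 3), ",", round(ci_wald[2], 3), "]\n")
```

An alternative with the same intent is to draw Drug B’s mean from a normal distribution with standard deviation se_B inside each bootstrap iteration and keep the percentile interval. Either way, the published standard error of Drug B’s outcome is needed.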

6.3 Final Results

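With the point estimate and the bootstrap percentile interval, we can now present the complete MAIC indirect comparison results as they would appear in an HTA dossier. Note that this bootstrap resamples only Trial A and treats Drug B’s published mean as fixed. The interval therefore does not reflect the sampling uncertainty in Drug B’s outcome.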

| Metric | Estimate | 95% CI Lower | 95% CI Upper |
|---|---|---|---|
| Naive Difference | 7.12 | N/A | N/A |
| MAIC Adjusted Difference | 2.37 | 1.82 | 2.90 |

7 Sensitivity Analysis in MAIC

HTA agencies (such as NICE) rarely accept a single MAIC result. The method relies on the untestable assumption that all relevant effect modifiers have been accounted for, and in an unanchored comparison all prognostic factors as well. Rigorous sensitivity analyses are therefore essential:

  1. Covariate Selection Scenarios: Modellers must present results using different sets of matching covariates:
    • Base Case: Matching only on covariates agreed upon in the Statistical Analysis Plan (SAP) or clinical consensus.
    • Scenario A: Adding borderline prognostic factors.
    • Scenario B: Dropping covariates with high missingness.
  2. Matching Higher Moments: Standard MAIC matches the means (first moment). A sensitivity analysis can additionally match the variances (second moment) by including squared covariate terms in the matching algorithm. These terms are centered on the values implied by Trial B’s published means and standard deviations, so that the distribution shapes are also comparable.
  3. The Bias-Variance Trade-off Curve: As more covariates are added to the matching algorithm, bias may decrease if the added covariates are genuine prognostic factors or effect modifiers. However, the weights become more extreme and the ESS tends to plummet (variance inflates). Presenting a curve of Treatment Effect vs. ESS helps reviewers judge the robustness of the matching strategy.
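The higher-moment scenario, and the ESS figure that feeds the trade-off curve, can be sketched in a small simulation. This is a minimal sketch with several assumptions. The IPD is simulated, the published SD of 8.5 years for Trial B is hypothetical, and maic_weights() is a hypothetical helper. The second-moment column is centered on sd_B^2 around the published mean. Once the mean is matched, this is equivalent to matching the variance.

```r
# HYPOTHETICAL helper: MAIC weights for any number of centered covariates
maic_weights <- function(X) {
    X <- scale(as.matrix(X), center = FALSE)  # improves conditioning; weights unchanged
    opt <- optim(rep(0, ncol(X)),
                 fn = function(a) sum(exp(X %*% a)),
                 gr = function(a) colSums(X * as.vector(exp(X %*% a))),
                 method = "BFGS", control = list(reltol = 1e-12))
    if (opt$convergence != 0) stop("optim() did not converge")
    w <- as.vector(exp(X %*% opt$par))
    w / sum(w) * nrow(X)
}
ess <- function(w) sum(w)^2 / sum(w^2)
wsd <- function(x, w) sqrt(weighted.mean((x - weighted.mean(x, w))^2, w))

set.seed(7)
age    <- rnorm(200, mean = 55, sd = 8)  # simulated Trial A IPD
mean_B <- 65.0                           # published mean age (Trial B)
sd_B   <- 8.5                            # HYPOTHETICAL published SD of age

# Base case: match the mean only
w1 <- maic_weights(cbind(age - mean_B))
# Scenario: also match the variance, i.e. E[(age - mean_B)^2] = sd_B^2
w2 <- maic_weights(cbind(age - mean_B, (age - mean_B)^2 - sd_B^2))

round(rbind(
    mean_only    = c(mean = weighted.mean(age, w1), sd = wsd(age, w1), ess = ess(w1)),
    mean_and_var = c(mean = weighted.mean(age, w2), sd = wsd(age, w2), ess = ess(w2))
), 2)
```

Recording the ESS for each covariate scenario, together with the adjusted treatment effect when real outcome data are available, supplies the points for the Treatment Effect vs. ESS curve.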