Network Meta-Regression (NMR) & Random Effects

Author

Xiaoge Zhang

Published

June 5, 2026

Handling Heterogeneity and Ecological Bias with Aggregate Data (AD)

1 Executive Summary: The Heterogeneity Problem

In the previous guide, we used a Fixed Effects (FE) model, which assumes that the true effect of a drug is exactly identical across all studies in the universe. In reality, patient populations, dosing protocols, and baseline severity vary.

To handle this, we upgrade our model in two ways:

  1. Random Effects (RE): We allow the true effect to vary around a mean, capturing unexplainable heterogeneity (\(\tau^2\)).
  2. Network Meta-Regression (NMR): We attempt to explain some of that heterogeneity by introducing Covariates (Effect Modifiers).

In this guide, we use standard study-level Aggregate Data (AD).

2 Theoretical Framework: RE & NMR

2.1 The Random Effects Assumption

Instead of \(\delta_{jk} = d_{bk}\) (Fixed Effect), we now assume:

\[\delta_{jk} \sim \text{Normal}(d_{bk}, \tau^2)\]

Where \(\tau^2\) is the between-study variance. If \(\tau = 0\), the RE model collapses back into an FE model.

2.2 Network Meta-Regression (The Linear Predictor)

We expand the logit link function to include covariates as Effect Modifiers (interaction terms). Let \(I_{jk}\) be an indicator variable that equals \(0\) for the study’s baseline control arm, and \(1\) for the active treatment arm:

\[\text{logit}(p_{jk}) = \mu_j + \delta_{jk} + \left[ \beta_1 (X_{1j} - \bar{X}_1) + \beta_2 (X_{2j} - \bar{X}_2) \right] \times I_{jk}\]

Because \(I_{jk} = 0\) for the control arm and \(1\) for the treatment arm, the covariate term does not cancel out when calculating the relative effect (Log-Odds Ratio) between the active drug and the control:

\[\text{logit}(p_{1,trt}) - \text{logit}(p_{1,b}) = \delta_{1,trt} + \beta_1 (X_{11} - \bar{X}_1) + \beta_2 (X_{21} - \bar{X}_2)\]

2.3 Identification of Effect Modifiers (HTA Audit Point)

In HTA audits, you cannot arbitrarily include covariates in a regression model. The identification of valid effect modifiers typically follows three core principles:

  1. Clinical and Biological Plausibility: The covariate must be justified by clinical experts or previous literature as a potential factor influencing treatment effect (e.g., baseline disease severity, prior treatment history).
  2. Imbalance across Studies: If a covariate is balanced across all studies in the network (e.g., mean age is 55-57 years in all trials), it cannot explain heterogeneity or inconsistency and does not need to be included in the regression. It is the imbalance across studies that warrants adjustment.
  3. Statistical Support: Adding the covariate should ideally lead to a meaningful reduction in the model’s DIC or between-study heterogeneity (\(\tau\)).

3 The Mock Case: Severe Plaque Psoriasis

3.1 The Clinical Scenario

  • Disease: Severe Plaque Psoriasis.
  • Treatments:
    • A: Methotrexate (Standard of Care / Control)
    • B: TNF-\(\alpha\) inhibitor (e.g., Adalimumab mock)
    • C: IL-17 inhibitor (e.g., Secukinumab mock)

3.2 The Aggregate Covariates

We identified two potential Effect Modifiers:

  1. PASI_Base (Continuous): The average baseline PASI score (Psoriasis Area and Severity Index). Centered around 20.
  2. Prior_Bio (Continuous Proportion): The proportion of patients in the trial who previously failed a biologic therapy (0.0 to 1.0).

3.3 The Mock Data

To estimate a Random Effect, we need at least one comparison with multiple studies (S1 and S2). We introduce 4 studies:

Study Arm 1 (Ctrl) Arm 2 (Active) \(n_1\) \(r_1\) \(n_2\) \(r_2\) PASI_Base (\(X_1\)) Prior_Bio (\(X_2\))
S1 A B 100 25 100 50 22 0.20
S2 A B 100 30 100 45 18 0.10
S3 A C 100 20 100 60 24 0.40
S4 B C 100 40 100 55 20 0.25

Network Mean Covariates: \(\bar{X}_1\) = 21, \(\bar{X}_2 =\) 0.2375.

4 MCMC Forensic Audit: JAGS & Stan Implementation

4.1 BUGS/JAGS Implementation

Notice how we define the tau (standard deviation) and convert it to prec (precision) for the normal distribution in BUGS.

model {
  for (j in 1:NS) {
    # Baseline for arm 1
    w[j,1] <- 0
    theta[j,1] <- 0
    mu[j] ~ dnorm(0, 0.0001) # Prior for study baseline
    
    for (k in 1:na[j]) {
      r[j,k] ~ dbin(p[j,k], n[j,k])
     
      
1      logit(p[j,k]) <- mu[j] + theta[j,k] +
2                       beta1 * (cov1[j] - mean_cov1) * (t[j,k] != t[j,1]) +
                       beta2 * (cov2[j] - mean_cov2) * (t[j,k] != t[j,1])
    }
    
    for (k in 2:na[j]) {      
3      theta[j,k] ~ dnorm(md[j,k], prec)
      md[j,k] <- d[t[j,k]] - d[t[j,1]]
    }
  }
  
  # Priors
  d[1] <- 0 # Reference treatment (A)
  for (k in 2:NT) { 
    d[k] ~ dnorm(0, 0.0001) 
  }
  
  # Heterogeneity Prior (Uniform)
  tau ~ dunif(0, 5)  # tau (standard deviation)
  prec <- 1 / tau^2
  
4  beta1 ~ dnorm(0, 0.0001)
  beta2 ~ dnorm(0, 0.0001)
}
1
Logit with Common Beta Regression
2
(t[j,k] != t[j,1]) is a dynamic 0/1 indicator: 1 for active drug, 0 for control. If k=1, which indicates the control group, (t[j,1] != t[j,1]) returns false (0). This avoids having to define a separate 0/1 matrix in the R data list.
3
Random Effects: theta (\(\delta_{jk}\) in model) is drawn from Normal with mean md, precision prec
4
Regression Coefficient Priors

To run this model in R using rjags:

View the code
library(rjags)

jags_data <- list(
    NS = 4,
    NT = 3,
    r = matrix(c(25, 50, 30, 45, 20, 60, 40, 55), nrow = 4, byrow = TRUE),
    n = matrix(c(100, 100, 100, 100, 100, 100, 100, 100), nrow = 4, byrow = TRUE),
    t = matrix(c(1, 2, 1, 2, 1, 3, 2, 3), nrow = 4, byrow = TRUE),
    na = rep(2, 4),
    cov1 = c(22, 18, 24, 20),
    cov2 = c(0.20, 0.10, 0.40, 0.25),
    mean_cov1 = 21,
    mean_cov2 = 0.2375
)

jags_model <- jags.model(
    file = "RE_NMR_model.txt",
    data = jags_data,
    n.chains = 3,
    quiet = TRUE
)
samples <- coda.samples(
    jags_model,
    variable.names = c("d", "beta1", "beta2", "tau"),
    n.iter = 10000,
    progress.bar = "none"
)
summary(samples)

Iterations = 1001:11000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
beta1  0.2617  1.419 0.008195        0.38283
beta2 -9.1127 45.495 0.262664       17.89867
d[1]   0.0000  0.000 0.000000        0.00000
d[2]   0.6012  2.480 0.014319        0.33051
d[3]   2.0360  2.717 0.015685        0.24324
tau    2.1816  1.444 0.008339        0.09669

2. Quantiles for each variable:

           2.5%      25%      50%    75%  97.5%
beta1   -2.0077  -0.6683 0.068121  0.920  3.703
beta2 -126.9550 -21.4861 0.009174 20.404 54.639
d[1]     0.0000   0.0000 0.000000  0.000  0.000
d[2]    -5.7075  -0.3014 0.919227  1.888  5.044
d[3]    -2.8987   0.6857 1.715151  2.972  8.994
tau      0.0614   0.9397 1.990541  3.381  4.843

4.2 Stan Implementation (Non-Centered Parameterization)

Stan struggles with centered Random Effects when data is sparse (creating “funnel” geometries in the posterior). We use Non-Centered Parameterization to ensure fast, divergence-free sampling.

data {
1  int<lower=1> NS;
  int<lower=1> NT;
  int r[NS, 2];    
  int n[NS, 2];    
  int t[NS, 2];    // Treatment index for each arm
  real cov1[NS];
  real cov2[NS];
}

transformed data {
  real mean_cov1 = mean(cov1);
  real mean_cov2 = mean(cov2);
}

parameters {
  real mu[NS];          
  real d[NT];           // Global treatment effects
2  real<lower=0> tau;
  real z[NS, 2];        // Standard normal noise for non-centered RE
  real beta1;
  real beta2;
}

transformed parameters {
  real theta[NS, 2];
  for (j in 1:NS) {
    theta[j, 1] = 0;
    for (k in 2:2) {      
3      theta[j, k] = (d[t[j, k]] - d[t[j, 1]]) + tau * z[j, k];
    }
  }
}

model {
  // Priors  
  // for (k in 2:NT) {
  //   d[2] ~ normal(0, 100);
  //   d[3] ~ normal(0, 100);}
  mu ~ normal(0, 100);
  d[1] ~ normal(0, 0.001); // Fix reference to 0
  d[2] ~ normal(0, 100);
  d[3] ~ normal(0, 100);
  tau ~ uniform(0, 5);
  beta1 ~ normal(0, 100);
  beta2 ~ normal(0, 100);
  
  for (j in 1:NS) {
    for (k in 2:2) {
      z[j, k] ~ std_normal(); // Draw noise
    }
  }

  // Likelihood
  for (j in 1:NS) {
    for (k in 1:2) {
4      real is_active = (t[j, k] != t[j, 1]) ? 1.0 : 0.0;
      real linpred = mu[j] + theta[j, k] + 
                     beta1 * (cov1[j] - mean_cov1) * is_active + 
                     beta2 * (cov2[j] - mean_cov2) * is_active;
      r[j, k] ~ binomial_logit(n[j, k], linpred);
    }
  }
}
1
<lower=1> is a constraint (data check): must be >= 1
2
<lower=0> constraint stops sampler from guessing invalid negative SD
3
Non-centered parameterization: mean + sd * standard_normal
4
Ternary operator: condition ? value_if_true : value_if_false. Returns 1.0 for active arms and 0.0 for control arms. Shorthand for if-else.

To run this model in R using rstan:

View the code
library(rstan)
stan_data <- list(
    NS = 4,
    NT = 3,
    r = matrix(c(25, 50, 30, 45, 20, 60, 40, 55), nrow = 4, byrow = TRUE),
    n = matrix(c(100, 100, 100, 100, 100, 100, 100, 100), nrow = 4, byrow = TRUE),
    t = matrix(c(1, 2, 1, 2, 1, 3, 2, 3), nrow = 4, byrow = TRUE),
    cov1 = c(22, 18, 24, 20),
    cov2 = c(0.20, 0.10, 0.40, 0.25)
)
fit <- stan(
    file = "RE_NMR_model.stan",
    data = stan_data,
    chains = 3,
    iter = 2000,
    warmup = 1000,
    refresh = 0
)
print(fit, pars = c("d", "beta1", "beta2", "tau"))
Inference for Stan model: anon_model.
3 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

       mean se_mean    sd    2.5%    25%   50%   75% 97.5% n_eff Rhat
d[1]   0.00    0.00  0.00    0.00   0.00  0.00  0.00  0.00  1829 1.00
d[2]   0.85    0.22  2.51   -4.94  -0.35  0.84  1.99  6.45   126 1.04
d[3]   1.66    0.13  2.28   -3.57   0.54  1.65  2.82  6.47   312 1.00
beta1  0.27    0.15  1.81   -3.21  -0.67  0.15  1.02  4.70   141 1.03
beta2 -4.22    3.59 43.45 -111.03 -24.28 -1.60 20.43 78.06   146 1.03
tau    2.35    0.09  1.44    0.08   1.10  2.27  3.57  4.85   239 1.02

Samples were drawn using NUTS(diag_e) at Tue May 12 20:11:27 2026.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

4.3 Experiment: Doubling the Number of Studies (N=8)

To test the hypothesis that increasing the number of studies will reduce the discrepancy between JAGS (Gibbs + Centered) and Stan (NUTS + Non-Centered), we simulate a larger dataset by duplicating our 4 studies. This keeps the data patterns identical but doubles the effective sample size at the study level from \(N=4\) to \(N=8\).

4.3.1 Running Expanded JAGS

View the code
jags_data_x2 <- list(
    NS = 8,
    NT = 3,
    r = matrix(c(
        25, 50, 30, 45, 20, 60, 40, 55,
        25, 50, 30, 45, 20, 60, 40, 55
    ), nrow = 8, byrow = TRUE),
    n = matrix(rep(100, 16), nrow = 8, byrow = TRUE),
    t = matrix(c(
        1, 2, 1, 2, 1, 3, 2, 3,
        1, 2, 1, 2, 1, 3, 2, 3
    ), nrow = 8, byrow = TRUE),
    na = rep(2, 8),
    cov1 = c(22, 18, 24, 20, 22, 18, 24, 20),
    cov2 = c(0.20, 0.10, 0.40, 0.25, 0.20, 0.10, 0.40, 0.25),
    mean_cov1 = 21,
    mean_cov2 = 0.2375
)

jags_model_x2 <- jags.model(
    file = "RE_NMR_model.txt",
    data = jags_data_x2,
    n.chains = 3,
    quiet = TRUE
)
samples_x2 <- coda.samples(
    jags_model_x2,
    variable.names = c("d", "beta1", "beta2", "tau"),
    n.iter = 10000,
    progress.bar = "none"
)
summary(samples_x2)

Iterations = 1001:11000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
beta1  0.1564 0.2364 0.001365        0.02213
beta2 -1.7485 5.9557 0.034385        0.60473
d[1]   0.0000 0.0000 0.000000        0.00000
d[2]   0.8783 0.3016 0.001741        0.02345
d[3]   1.6444 0.2940 0.001698        0.01532
tau    0.2043 0.2246 0.001297        0.01598

2. Quantiles for each variable:

            2.5%      25%     50%    75%   97.5%
beta1  -0.297205  0.01144  0.1583 0.2937  0.6264
beta2 -13.790012 -5.31775 -1.8778 1.8480 10.3974
d[1]    0.000000  0.00000  0.0000 0.0000  0.0000
d[2]    0.287208  0.70372  0.8805 1.0504  1.4778
d[3]    1.055541  1.47174  1.6470 1.8153  2.2137
tau     0.004264  0.06429  0.1400 0.2574  0.8704

4.3.2 Running Expanded Stan

View the code
stan_data_x2 <- list(
    NS = 8,
    NT = 3,
    r = matrix(c(
        25, 50, 30, 45, 20, 60, 40, 55,
        25, 50, 30, 45, 20, 60, 40, 55
    ), nrow = 8, byrow = TRUE),
    n = matrix(rep(100, 16), nrow = 8, byrow = TRUE),
    t = matrix(c(
        1, 2, 1, 2, 1, 3, 2, 3,
        1, 2, 1, 2, 1, 3, 2, 3
    ), nrow = 8, byrow = TRUE),
    cov1 = c(22, 18, 24, 20, 22, 18, 24, 20),
    cov2 = c(0.20, 0.10, 0.40, 0.25, 0.20, 0.10, 0.40, 0.25)
)

fit_x2 <- stan(
    file = "RE_NMR_model.stan",
    data = stan_data_x2,
    chains = 3,
    iter = 2000,
    warmup = 1000,
    refresh = 0
)
print(fit_x2, pars = c("d", "beta1", "beta2", "tau"))
Inference for Stan model: anon_model.
3 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

       mean se_mean   sd   2.5%   25%   50%  75% 97.5% n_eff Rhat
d[1]   0.00    0.00 0.00   0.00  0.00  0.00 0.00  0.00  3202    1
d[2]   0.86    0.01 0.30   0.27  0.68  0.86 1.04  1.43  2836    1
d[3]   1.65    0.01 0.29   1.07  1.48  1.66 1.83  2.22  2801    1
beta1  0.17    0.00 0.23  -0.27  0.02  0.17 0.31  0.62  2664    1
beta2 -2.18    0.11 5.91 -13.81 -5.76 -2.20 1.65  9.24  2725    1
tau    0.19    0.00 0.19   0.01  0.06  0.14 0.25  0.74  2203    1

Samples were drawn using NUTS(diag_e) at Tue May 12 20:11:48 2026.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Notice if the means of d[2], d[3], and tau are now closer between the two engines compared to the 4-study case.

4.3.3 Comparison Table

NoteComparison Table
Parameter (Median) N=4 (JAGS) N=4 (Stan) N=8 (JAGS) N=8 (Stan)
d[2] 0.92 0.84 0.88 0.86
d[3] 1.72 1.65 1.65 1.66
tau 1.99 2.27 0.14 0.14

4.4 Drug Ranking and SUCRA Calculation Logic

To answer the strategic question “Which drug is the best?”, we calculate the rank probabilities and the SUCRA (Surface Under the Cumulative Ranking curve).

In the MCMC sampling of Bayesian NMA, each iteration outputs a specific set of values for all parameters. For our constructed comparison matrix: - SoC (A) is hardcoded to the baseline value of \(0\) in the R code. - Drug 2 (B) outputs a sampled value in a given iteration (e.g., \(0.84\)). - Drug 3 (C) outputs a sampled value in the same iteration (e.g., \(1.67\)).

We compare these three values horizontally in each iteration. Since a larger value represents better efficacy (larger Log-Odds means higher response rate), the drug with the largest value in that iteration receives Rank 1. After summarizing thousands of iterations, the proportion of times a drug achieves the first place is its Rank 1 probability.

SUCRA goes a step further, measuring the average probability of a drug “beating” all other drugs. Its calculation formula is the sum of the first \(T-1\) cumulative ranking probabilities divided by \(T-1\) (where \(T\) is the total number of drugs). A SUCRA closer to \(1\) indicates better efficacy.

Below, we use the posterior samples from the \(N=8\) Stan model for the calculation.

View the code
# Construct full matrix (SoC fixed at 0, plus samples of d[2] and d[3])
effects_mat <- cbind(
    `SoC (A)` = 0,
    `Drug 2 (B)` = stan_n8_mat[, "d[2]"],
    `Drug 3 (C)` = stan_n8_mat[, "d[3]"]
)

# Calculate ranks row by row (each iteration). Larger values rank higher (1 is best)
# R's rank defaults to ascending order, so we rank the negative values
ranks_mat <- t(apply(-effects_mat, 1, rank))

# Calculate the probability of each drug at each rank
n_iters <- nrow(ranks_mat)
n_tx <- ncol(ranks_mat)
rank_probs <- matrix(0, nrow = n_tx, ncol = n_tx)
rownames(rank_probs) <- colnames(effects_mat)
colnames(rank_probs) <- paste0("Rank_", 1:n_tx)

for (i in 1:n_tx) {
    for (j in 1:n_tx) {
        rank_probs[i, j] <- sum(ranks_mat[, i] == j) / n_iters
    }
}

# Calculate SUCRA
sucra <- rep(0, n_tx)
names(sucra) <- colnames(effects_mat)
for (i in 1:n_tx) {
    cum_probs <- cumsum(rank_probs[i, ])
    # SUCRA formula: sum of first T-1 cumulative probabilities / (T-1)
    sucra[i] <- sum(cum_probs[1:(n_tx - 1)]) / (n_tx - 1)
}

# Combine and display
ranking_summary <- cbind(rank_probs, SUCRA = sucra)
knitr::kable(round(ranking_summary, 3), caption = "Rank Probabilities and SUCRA")
Rank Probabilities and SUCRA
Rank_1 Rank_2 Rank_3 SUCRA
SoC (A) 0.000 0.006 0.994 0.003
Drug 2 (B) 0.039 0.955 0.006 0.516
Drug 3 (C) 0.961 0.039 0.000 0.981

4.5 Consistency Check

A core assumption of NMA is consistency: direct and indirect evidence should agree within any closed loop. Our network forms a closed loop among treatments A, B, and C:

  • A vs B: Studies 1 & 2
  • A vs C: Study 3
  • B vs C: Study 4

4.5.1 Loop Inconsistency Model

To test consistency, we fit an Inconsistency Model by adding an inconsistency parameter \(\omega\) (omega) to the B vs C comparison in Study 4. In a consistent network, \(\omega\) should be close to 0.

data {
  int<lower=1> NS; 
  int<lower=1> NT;
  int r[NS, 2];    
  int n[NS, 2];    
  int t[NS, 2];    
  real cov1[NS];
  real cov2[NS];
}

transformed data {
  real mean_cov1 = mean(cov1);
  real mean_cov2 = mean(cov2);
}

parameters {
  real mu[NS];          
  real d[NT];           
  real<lower=0> tau;    
  real z[NS, 2];        
  real beta1;
  real beta2;
  real omega;           // <1>
}

transformed parameters {
  real theta[NS, 2];
  for (j in 1:NS) {
    theta[j, 1] = 0;
    for (k in 2:2) {
      real inc = (j == 4) ? omega : 0.0; // <2>
      theta[j, k] = (d[t[j, k]] - d[t[j, 1]]) + inc + tau * z[j, k];
    }
  }
}

model {
  // Priors
  mu ~ normal(0, 100);
  for (k in 2:NT) {
    d[k] ~ normal(0, 100);
  }
  d[1] ~ normal(0, 0.001); 
  
  tau ~ uniform(0, 5);
  beta1 ~ normal(0, 100);
  beta2 ~ normal(0, 100);
  omega ~ normal(0, 100); 
  
  for (j in 1:NS) {
    for (k in 2:2) {
      z[j, k] ~ std_normal(); 
    }
  }

  // Likelihood
  for (j in 1:NS) {
    for (k in 1:2) {
      real is_active = (t[j, k] != t[j, 1]) ? 1.0 : 0.0;
      real linpred = mu[j] + theta[j, k] + 
                     beta1 * (cov1[j] - mean_cov1) * is_active + 
                     beta2 * (cov2[j] - mean_cov2) * is_active;
      r[j, k] ~ binomial_logit(n[j, k], linpred);
    }
  }
}
  1. omega is the inconsistency parameter for the B vs C loop.
  2. If j == 4 (Study 4 which compares B vs C), we add omega to the relative effect.
View the code
# Run the inconsistency model
fit_inc <- stan(
    file = "RE_NMR_inconsistency_model.stan",
    data = stan_data, # Use the same 4-study data
    chains = 3,
    iter = 2000,
    warmup = 1000,
    refresh = 0
)

# Extract omega
print(fit_inc, pars = c("omega", "d", "tau"))
Inference for Stan model: anon_model.
3 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

      mean se_mean    sd   2.5%   25%  50%   75% 97.5% n_eff Rhat
omega 2.05    1.05 11.76 -20.23 -5.92 1.61 10.13 26.03   126 1.01
d[1]  0.00    0.00  0.00   0.00  0.00 0.00  0.00  0.00  1645 1.00
d[2]  1.83    0.51  5.98  -9.38 -2.29 1.72  5.97 13.29   138 1.01
d[3]  0.37    0.69  8.69 -16.94 -5.38 0.66  6.16 16.62   161 1.01
tau   2.53    0.09  1.42   0.15  1.29 2.56  3.75  4.86   229 1.03

Samples were drawn using NUTS(diag_e) at Tue May 12 20:12:09 2026.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

If the 95% Credible Interval of omega includes 0, we fail to reject the consistency assumption, meaning the direct and indirect evidence are statistically consistent.

4.5.2 Node Splitting

Node splitting is another method to assess inconsistency in a specific comparison by separating the direct and indirect evidence.

Mathematical Logic: For a specific comparison between treatment \(k\) and \(i\):

  1. Direct Evidence: We estimate the treatment effect \(d_{ki}^{\text{dir}}\) using only the trials directly comparing \(k\) and \(i\).
  2. Indirect Evidence: We estimate the treatment effect \(d_{ki}^{\text{ind}}\) using the rest of the network (excluding the direct trials for \(k\) vs \(i\)), leveraging the consistency constraints (e.g., \(d_{ki}^{\text{ind}} = d_{1i} - d_{1k}\)).
  3. Inconsistency Factor: We calculate the difference \(\omega_{ki} = d_{ki}^{\text{dir}} - d_{ki}^{\text{ind}}\).
  4. Hypothesis Test: We test the null hypothesis \(H_0: \omega_{ki} = 0\). If the 95% credible interval of \(\omega_{ki}\) excludes 0, or the Bayesian \(p\)-value is small (e.g., \(< 0.05\)), we conclude there is significant inconsistency for this node.

This process is repeated for all nodes (comparisons) in the network where both direct and indirect evidence exist.

4.5.3 Unrelated Mean Effects (UME) Model

The UME model is a global method to detect inconsistency by relaxing the consistency assumption across the entire network.

Logic & Implementation:

  • In a Consistency Model, we assume \(d_{BC} = d_{AC} - d_{AB}\). We only estimate basic parameters (relative to a common reference).
  • In a UME Model (also known as the independent effects model), we treat ALL comparisons as independent. We do not enforce the transitivity relation. For a network with \(T\) treatments, instead of estimating \(T-1\) basic parameters, we estimate all possible pairwise comparisons directly from the data.
  • Implementation: In Stan/JAGS, we remove the constraint delta[3, 2] = d_AC - d_AB and instead estimate delta[3, 2] as an independent parameter (e.g., d_BC).
  • Audit Focus: We compare the fit of the Consistency model and the UME model using DIC. If the UME model has a substantially lower DIC (e.g., \(> 5\) points), it indicates that the consistency assumption is violated in the network.

4.5.4 Inconsistency Management Decision Tree

When inconsistency is detected (via Node Splitting or UME), HTA guidelines (like NICE DSU Technical Support Document 4) recommend the following decision tree:

  1. Step 1: Check for Data Errors
    • Verify data extraction (arm-level events and sample sizes).
    • Ensure consistent definition of outcomes across trials.
  2. Step 2: Assess Clinical and Methodological Heterogeneity (Effect Modifiers)
    • Are there differences in patient baseline characteristics (e.g., age, disease severity)?
    • Are there differences in study design (e.g., blinding, follow-up duration)?
    • If effect modifiers are distributed unevenly across comparisons, Network Meta-Regression (NMR) should be used to adjust for them (as demonstrated in Section 2).
  3. Step 3: Sensitivity Analysis or Subgroup Analysis
    • Can the network be split into more homogeneous subgroups?
    • Can outlier studies (e.g., small studies with extreme effects) be removed in a sensitivity analysis?
  4. Step 4: Use a More Flexible Model
    • If heterogeneity cannot be fully explained by covariates, a Random Effects model should be used to account for the unexplained variance.
  5. Step 5: Abandon NMA (Last Resort)
    • If inconsistency is severe and cannot be resolved or explained, a full network synthesis may be misleading. Recur to presenting separate pairwise meta-analyses or a qualitative synthesis.

5 Results & Interpretation: Covariate Adjustment

After extracting the posteriors, we evaluate if \(\beta_1\) and \(\beta_2\) are significant (e.g., if their 95% CrI crosses 0).

  • If \(\beta_1 > 0\): Higher baseline PASI means the active drugs perform even better relative to SoC.
  • If \(\beta_2 < 0\): More patients with prior biologic failure means the active drugs’ relative advantage shrinks (a harder-to-treat population).

The final League Table will now represent the comparative efficacy at the average covariate levels of the network.

6 Strategic Q&A: The Limits of AD NMR

No. While NMR attempts to adjust the intercept based on the study average, it suffers from the Ecological Fallacy. Adjusting based on a trial’s average 60% rate is a crude proxy. It assumes every patient has a “60% probability” of being a prior user, completely ignoring the patient-level joint distribution (e.g., maybe all the prior biologic users are also the oldest patients). To truly adjust for this without bias, you need to use MAIC to reweight your IPD to match their baseline characteristics, or use ML-NMR to integrate over the full patient parameter space.

This is a classic comparison in HTA audits regarding evidence synthesis:

  1. Aggregate-level (Study level) Regression:
    • Data: Uses the mean value of covariates reported in the publication (e.g., the average age of patients in trial A is 55 years).
    • Limitation: Subject to the Ecological Fallacy. We can only explore between-study heterogeneity. It assumes the relationship at the study level holds at the patient level, which is often not true.
  2. IPD-level (Patient level) Regression:
    • Data: Uses the individual characteristics of each patient.
    • Advantage: Can accurately estimate how individual characteristics affect treatment response, completely avoiding the Ecological Fallacy. It can simultaneously explore both within-study and between-study interactions.
  3. The Reality in HTA:
    • Usually, a manufacturer has IPD for their own trials but only Aggregate Data (AD) extracted from publications for competitor trials. This data asymmetry prevents running a full IPD NMA and leads to the need for population adjustment methods like MAIC or ML-NMR (which will be covered in files 06 and 08).

Guide Authored by: Antigravity (Advanced Agentic Coding)