Survival NMA & Other Data Types

Author

Xiaoge Zhang

Published

June 5, 2026

Extending NMA to Continuous, Count, and Time-to-Event (Survival) Data

1 Bridging the Gap: Continuous and Count Data

In our previous guides, we dealt exclusively with Binomial Data (responders out of total patients) using a Binomial likelihood and a Logit link. Before diving into Survival NMA, let’s briefly map out the likelihoods and link functions for other standard data types. The core Bayesian MCMC engine remains exactly the same; we only swap out the statistical distribution mapping.

1.1 Continuous Data: Change from Baseline (CfB) vs. Final Values

In clinical trials for conditions like hypertension, continuous outcomes are typically reported in two ways:

Change from Baseline (CfB): The absolute difference between post-treatment and pre-treatment values (e.g., \(-15\) mmHg).
Final Values: The absolute value measured at the end of the study (e.g., \(120\) mmHg).

In NMA, both are modeled using a Normal Likelihood with an Identity Link. However, the choice between them impacts the interpretation and requires specific handling of variance.

Contrast-Based NMA for Aggregate Data (AD)

When extracting data from publications, we usually obtain the sample mean (\(y_{jk}\)) and its standard error (\(se_{jk}\)) for each arm \(k\) in study \(j\).

Likelihood: \(y_{jk} \sim \text{Normal}(\theta_{jk}, se_{jk}^2)\)
Linear Predictor: \(\theta_{jk} = \mu_j + \delta_{jk}\) (where \(\delta_{j1} = 0\))

The effect measure synthesized across the network is the Mean Difference (MD).

The “Change from Baseline” Caveat

If some studies report CfB and others report Final Values, they can often be combined in the same NMA because the relative effect (difference between arms) should theoretically be the same.

However, the variance (SD) of CfB is usually smaller than the SD of Final Values because it removes between-patient baseline variability. If the SD for CfB is missing, it cannot be simply imputed from the Final Value SD without accounting for the baseline-post correlation (\(\rho\)).

1.2 Rate / Count Data (e.g., Number of Asthma Exacerbations)

In trials for chronic diseases (like asthma or COPD), we often count the number of events (exacerbations) over a period of time. This is modeled using a Poisson Likelihood with a Log Link.

Likelihood: \(r_{jk} \sim \text{Poisson}(\lambda_{jk} \times E_{jk})\)
Link Function: \(\log(\lambda_{jk}) = \mu_j + \delta_{jk}\)

(Where \(r_{jk}\) is the observed number of events, \(E_{jk}\) is the total exposure time (e.g., person-years) in arm \(k\) of study \(j\), and \(\lambda_{jk}\) is the underlying event rate).

The effect measure synthesized here is the Rate Ratio (RR) or Incidence Rate Ratio (IRR).

The Overdispersion Problem (Poisson vs. Negative Binomial)

A strict assumption of the Poisson distribution is that the mean equals the variance. In real clinical trials, patient heterogeneity often causes the variance to exceed the mean, a phenomenon known as Overdispersion.

The Risk: If overdispersion is present but ignored in a Poisson model, the standard errors will be underestimated, leading to artificially narrow credible intervals and false claims of statistical significance.
The Solution in NMA:
1. Upgrade to a Negative Binomial likelihood, which adds a dispersion parameter to allow the variance to exceed the mean.
2. Or, add a random effect term \(\epsilon_{jk}\) to the log-linear predictor to account for the extra-Poisson variation.

2 Executive Summary: The Survival Challenge in HTA

Time-to-Event Data (e.g., Overall Survival (OS) or Progression-Free Survival (PFS)) is the lifeblood of Oncology Health Technology Assessment (HTA).

It poses a unique challenge: Censoring. We don’t know the exact time of death for patients who are still alive at the end of the trial. Furthermore, HTA agencies (like NICE in the UK) require modeling lifetimes (e.g., over a 30-year time horizon), forcing us to extrapolate survival curves far beyond the observed trial data.

There are two primary ways to handle survival data in NMA, depending on the granularity of the available data and the validity of the proportional hazards assumption:

Contrast-Based Approach (Synthesizing Hazard Ratios):
- What it means: This approach focuses on the relative treatment effect (the contrast) between trial arms, typically reported as a Hazard Ratio (HR). You extract the reported HR and its 95% Confidence Interval from each study.
- The Logic: You treat the log-transformed Hazard Ratio (\(\log(HR)\)) and its standard error as a continuous outcome. The NMA then synthesizes these relative effects across the network using a standard Normal likelihood, just like you would with Mean Differences in continuous data.
- When to use: When you only have access to published aggregate data (no KM curves) and the Proportional Hazards assumption holds.
Arm-Based Approach (Parametric Curve Fitting):
- What it means: This approach bypasses the reported HRs and models the absolute survival time in each treatment arm directly.
- The Logic: You use algorithms (like Guyot’s method) to digitize published Kaplan-Meier curves and reconstruct pseudo-Individual Patient Data (IPD). You then fit full parametric survival distributions (e.g., Weibull, Gompertz, or Fractional Polynomials) to the data. The treatment effect is modeled as a shift in the parameters of these distributions.
- When to use: When proportional hazards are violated (e.g., crossing KM curves in immunotherapy) and you need full survival curves to extrapolate over a long time horizon for cost-effectiveness models.

3 The Minimal Mock Case: Advanced Melanoma

To ground both approaches in a realistic clinical context, we define a shared simulation scenario representative of modern oncology health technology assessments.

Disease: Advanced Melanoma (metastatic or unresectable).
Treatments:
1. Dacarbazine (Chemotherapy, the standard of care anchor).
2. Ipilimumab (CTLA-4 inhibitor, representing early immunotherapy).
3. Nivolumab (PD-1 inhibitor, representing advanced immunotherapy).
The Evidence Network: 4 randomized controlled trials (RCTs) forming a connected network:
- Study 1: Trt 1 vs. Trt 2
- Study 2: Trt 1 vs. Trt 2
- Study 3: Trt 1 vs. Trt 3
- Study 4: Trt 2 vs. Trt 3
In Approach 1, we assume we only extracted the reported Hazard Ratios from these trials.
In Approach 2, we assume we used algorithms like Guyot’s method to reconstruct the individual patient data (IPD) from the published Kaplan-Meier curves.

3.1 Approach 1: Contrast-Based Survival NMA (Hazard Ratios)

When extracting data from publications, we often only get the reported Hazard Ratio (HR) and its 95% Confidence Interval.

Mathematically, a reported \(\log(HR)\) and its standard error (\(se\)) can be treated as Continuous Data.

Likelihood: Normal
Link Function: Identity

Let \(y_{jk}\) be the observed \(\log(HR)\) of arm \(k\) relative to the baseline arm \(1\) in study \(j\).

\[y_{jk} \sim \text{Normal}(\delta_{jk}, se_{jk}^2)\]

The Stan Model

We define the model for contrast-based NMA. It is a simple Normal likelihood model.

STAN Contrast_HR_NMA.stan

data {
  int<lower=1> NS;
  real y[NS];
  real<lower=0> se[NS];
  int t1[NS];
  int t2[NS];
}
parameters {
  real d[3]; // 3 treatments
}
model {
  d[1] ~ normal(0, 0.001); // Reference
  d[2] ~ normal(0, 5);
  d[3] ~ normal(0, 5);
  
  for (i in 1:NS) {
    y[i] ~ normal(d[t2[i]] - d[t1[i]], se[i]);
  }
}

Simulation and Execution in R

Here we provide a full working example. We assume all studies are 2-arm trials for simplicity. The observed data is the \(\log(HR)\) and its standard error.

Click to view R Simulation and Stan Execution

View the code

library(rstan)

# Simulate Contrast-Based Data (Log Hazard Ratios)
set.seed(42)
# We assume 4 studies comparing 3 treatments
# True effects relative to trt 1: d[2] = -0.3, d[3] = -0.55
contrast_data <- list(
    NS = 4,
    y = c(-0.25, -0.35, -0.55, -0.25), # observed log(HR)
    se = c(0.1, 0.12, 0.15, 0.1), # standard errors
    t1 = c(1, 1, 1, 2), # treatment 1 (reference)
    t2 = c(2, 2, 3, 3) # treatment 2
)

# Run Stan
fit_contrast <- stan(
    file = "Contrast_HR_NMA.stan",
    data = contrast_data,
    chains = 3,
    iter = 2000,
    warmup = 1000,
    refresh = 0
)

# Print results
print(fit_contrast, pars = "d")

Inference for Stan model: anon_model.
3 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

      mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
d[1]  0.00       0 0.00  0.00  0.00  0.00  0.00  0.00  3566 1.00
d[2] -0.30       0 0.07 -0.43 -0.35 -0.30 -0.25 -0.16  1014 1.00
d[3] -0.55       0 0.10 -0.73 -0.61 -0.55 -0.48 -0.35   988 1.01

Samples were drawn using NUTS(diag_e) at Mon May 11 20:21:54 2026.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Based on the Contrast-Based NMA (Approach 1), the estimated Hazard Ratios (Median) relative to Treatment 1 are:

Treatment 2 vs 1: 0.74
Treatment 3 vs 1: 0.58

3.2 Approach 2: Arm-Based Parametric Survival NMA (Weibull)

To demonstrate a highly complete implementation, we will build a Weibull Parametric Survival NMA using simulated Individual Patient Data (IPD). This approach allows us to model the full survival curve, which is the gold standard for NICE submissions when proportional hazards are questioned.

The Stan Model

We define the log-likelihood for the Weibull distribution manually to handle right-censoring.

STAN Weibull_Survival_NMA.stan

data {
  int<lower=1> N;       // Total number of patients
  int<lower=1> NS;      // Number of studies
  int<lower=1> NT;      // Number of treatments
  real time[N];         // Time to event or censoring
  int event[N];        // 1 if event, 0 if censored
  int study[N];        // Study index
  int treatment[N];    // Treatment index
}

parameters {
  real mu[NS];          // Study baselines for log-scale
  real d[NT];           // Relative treatment effects
  real<lower=0> shape; // Common shape parameter
}

transformed parameters {
  real log_scale[N];
  for (i in 1:N) {
    log_scale[i] = mu[study[i]] + d[treatment[i]];
  }
}

model {
  // Priors
  mu ~ normal(0, 5);
  d[1] ~ normal(0, 0.001); // Fix reference to 0
  d[2] ~ normal(0, 5);
  d[3] ~ normal(0, 5);
  shape ~ exponential(1);

  // Likelihood for Weibull Survival
  for (i in 1:N) {
1    real lambda = exp(log_scale[i]);
    if (event[i] == 1) {
      // Event: PDF = hazard * survival
2      target += log_scale[i] + log(shape) + (shape - 1) * log(time[i]) - lambda * pow(time[i], shape);
    } else {
      // Censored: Survival
3      target += - lambda * pow(time[i], shape);
    }
  }
}

1: Lambda calculation: We exponentiate the log_scale to get the scale parameter \(\lambda\) for the Weibull hazard.
2: Log-PDF for Events: When an event occurs, the likelihood contribution is the Probability Density Function (PDF), which is \(f(t) = h(t)S(t)\). In log-scale, this is \(\ln(h(t)) + \ln(S(t))\). The first part is the log-hazard, and the second part (- lambda * pow(time[i], shape)) is the log-survival.
3: Log-Survival for Censored: When data is censored, we only know the patient survived up to time \(t\). So the likelihood contribution is just the Survival Function \(S(t)\). In log-scale, this is \(\ln(S(t)) = -\lambda t^\gamma\).

Simulation and Execution in R

We simulate the IPD and run the Stan model.

Click to view R Simulation and Stan Execution

View the code

library(rstan)
set.seed(42)

# Simulate IPD for 4 studies using a vectorized approach (Idiomatic R)
n_per_arm <- 50

# 1. Define the study designs in a clean data frame
study_designs <- data.frame(
    study = c(1, 2, 3, 4),
    t1 = c(1, 1, 1, 2),
    t2 = c(2, 2, 3, 3),
    b1 = c(0.1, 0.12, 0.1, 0.05), # Hazard scale (lambda) for Arm 1
    b2 = c(0.05, 0.06, 0.02, 0.02) # Hazard scale (lambda) for Arm 2
)

# 2. Expand to a patient-level data frame
# Arm 1 for all studies
arm1_df <- data.frame(
    study = rep(study_designs$study, each = n_per_arm),
    treatment = rep(study_designs$t1, each = n_per_arm),
    lambda = rep(study_designs$b1, each = n_per_arm)
)
# Arm 2 for all studies
arm2_df <- data.frame(
    study = rep(study_designs$study, each = n_per_arm),
    treatment = rep(study_designs$t2, each = n_per_arm),
    lambda = rep(study_designs$b2, each = n_per_arm)
)
# Combine and sort by study to maintain grouping
ipd_df <- rbind(arm1_df, arm2_df)
1ipd_df <- ipd_df[order(ipd_df$study), ]

# 3. Generate Weibull survival times
2ipd_df$time <- rweibull(
    n = nrow(ipd_df),
    shape = 1.2,
    scale = (1 / ipd_df$lambda)^(1 / 1.2)
)

# 4. Add administrative censoring at 24 months
3ipd_df$event <- as.integer(ipd_df$time < 24)
ipd_df$time[ipd_df$time > 24] <- 24

# Extract vectors for Stan data list
times <- ipd_df$time
events <- ipd_df$event
study_vec <- ipd_df$study
treatment_vec <- ipd_df$treatment

stan_data <- list(
    N = length(times),
    NS = 4,
    NT = 3,
    time = times,
    event = events,
    study = study_vec,
    treatment = treatment_vec
)

# Run Stan
fit_surv <- stan(
    file = "Weibull_Survival_NMA.stan",
    data = stan_data,
    chains = 3,
    iter = 2000,
    warmup = 1000,
    refresh = 0
)

print(fit_surv, pars = c("d", "shape"))

1: Sorting to maintain grouping: We combine the two arms and sort by study ID. This ensures the data is ordered as Study 1 Arm 1, Study 1 Arm 2, Study 2 Arm 1, etc., matching the structure of standard IPD datasets.
2: Vectorized data generation: R’s rweibull is vectorized over its parameters. By passing the entire lambda vector, we generate all 400 survival times in a single call, avoiding slow for-loops.
3: Administrative Censoring: Patients who do not experience the event before 24 months are censored (event = 0), and their observed time is set to the maximum follow-up of 24 months.

Inference for Stan model: anon_model.
3 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
d[1]   0.00       0 0.00  0.00  0.00  0.00  0.00  0.00  2807    1
d[2]  -0.95       0 0.14 -1.21 -1.04 -0.95 -0.85 -0.67  1585    1
d[3]  -1.82       0 0.19 -2.22 -1.95 -1.82 -1.69 -1.45  1813    1
shape  1.15       0 0.05  1.05  1.11  1.15  1.18  1.25  1248    1

Samples were drawn using NUTS(diag_e) at Mon May 11 19:30:45 2026.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Based on the Arm-Based Parametric NMA (Approach 2), the estimated Hazard Ratios (Median) relative to Treatment 1 are:

Treatment 2 vs 1: 0.39
Treatment 3 vs 1: 0.16

4 Advanced Survival NMA: Non-Proportional Hazards (FP vs. Splines)

When the Proportional Hazards (PH) assumption is violated (e.g., crossing Kaplan-Meier curves frequently seen in immunotherapy), standard models like Weibull or Contrast-Based HR synthesis are no longer appropriate. HTA guidelines (such as NICE DSU Technical Support Document 18) recommend moving to more flexible models like Fractional Polynomials (FP) or Restricted Cubic Splines.

4.1 Fractional Polynomials (FP)

Fractional Polynomials model the hazard function by adding powers of time.

Mathematical Logic: Instead of assuming a constant log-hazard ratio, the treatment effect is modeled as a function of time: \(\ln(h_k(t)) = \mu_j + \beta_0 + \beta_1 t^p + \beta_2 t^q\). The powers \(p\) and \(q\) are chosen from a predefined set (e.g., \(-2, -1, -0.5, 0, 0.5, 1, 2, 3\)).
Selection Criteria:
- Clinical Plausibility of Extrapolation: This is the most critical criterion in HTA. Fractional polynomials can sometimes produce implausible shapes (e.g., hazard rates dropping to zero or shooting to infinity) in the extrapolated period.
- Statistical Fit: Use DIC or WAIC to compare different combinations of powers.

4.2 Restricted Cubic Splines

Splines offer even greater flexibility by dividing the time axis into segments separated by “knots”.

Mathematical Logic: Within each segment, a cubic polynomial is fitted. Constraints are applied at the knots to ensure the overall curve is smooth (continuous up to the second derivative). “Restricted” means the curve is constrained to be linear beyond the boundary knots, which helps stabilize extrapolation.
Selection Criteria:
- Number and Location of Knots: More knots allow for more complex shapes but increase the risk of overfitting. NICE DSU 18 suggests using 1 to 3 knots based on visual inspection and fit statistics.
- Risk of Overfitting: If the spline fits the within-trial data too perfectly, the extrapolation can become highly unstable and sensitive to small changes in knot placement.

Summary of Choice:

Use Fractional Polynomials if the hazard shape is relatively simple and monotonic, and you want to align with standard NICE DSU templates.
Use Splines if the hazard has complex turning points or delayed treatment effects, but exercise extreme caution during extrapolation.

5 Connecting to Economic Models (HTA Focus)

Why do we go through the pain of curve fitting? Because Cost-Effectiveness Models (CEMs) require the extrapolation of survival time over a lifetime horizon to calculate mean time spent in each health state.

5.1 Partitioned Survival Models (PSM)

In a 3-state PSM (Progression-Free, Progressed Disease, Dead), the health states are driven directly by two survival curves:

PFS Curve: Defines the proportion of patients in the “Progression-Free” state.
OS Curve: Defines the proportion of patients alive.
Progressed Disease State: Calculated as the area between the OS and PFS curves.

By running a Parametric Survival NMA, we extract the posterior distributions of the curve parameters (e.g., Weibull scale and shape, or FP coefficients). We calculate the survival probability \(S(t)\) at each cycle (e.g., monthly) for a lifetime horizon (e.g., 40 years).

5.2 Markov State-Transition Models

If using a Markov Model instead of a PSM, the survival parameters must be transformed into discrete Transition Probabilities for each cycle.

For a specific cycle from time \(t_1\) to \(t_2\), the probability of transitioning from “Alive” to “Dead” is derived from the survival function:

\(P(\text{event between } t_1 \text{ and } t_2) = 1 - \exp(-(H(t_2) - H(t_1)))\)

Where \(H(t)\) is the cumulative hazard at time \(t\). For a Weibull model, \(H(t) = \lambda t^\gamma\).

5.3 Probabilistic Sensitivity Analysis (PSA) and Uncertainty Propagation

A critical failure in many HTA submissions is failing to propagate the uncertainty from the NMA into the economic model.

Incorrect: Using only the median or mean estimates of the parameters in the economic model.
Correct: Exporting the full matrix of MCMC posterior samples (e.g., 1000 iterations of all parameters) from Stan/JAGS. In the economic model’s PSA, for each iteration of the PSA loop, you use the corresponding row of parameters from the NMA posterior. This ensures that the correlation between parameters (e.g., between scale and shape) and the full parameter uncertainty are perfectly preserved in the cost-effectiveness results.

Guide Authored by: Antigravity (Advanced Agentic Coding)