Fitting Standard and Flexible Parametric Models for Economic Evaluation
Author
Xiaoge Zhang
Published
May 20, 2026
1 Executive Summary
Survival extrapolation is one of the most critical skills for a HEOR statistician. Clinical trials usually have limited follow-up, but cost-effectiveness models (like Markov or Partitioned Survival Models) require survival probabilities projected over a lifetime horizon (e.g., 20 or 30 years).
In this chapter, we will:
Generate a mock survival dataset with non-proportional hazards (mimicking immuno-oncology data).
Fit 6 standard parametric models (Exponential, Weibull, Gompertz, Log-normal, Log-logistic, and Generalized Gamma).
Fit a Flexible Parametric Model (FPM) using restricted cubic splines.
Compare models using statistical criteria (AIC/BIC) and discuss the golden rule of HTA: Clinical Plausibility > Statistical Fit.
2 Mock Data Generation
We simulate a trial with 400 patients, where the true survival distribution follows a Log-normal shape (hazard increases first, then decreases). This non-monotonic hazard is typical in many disease areas and often cannot be captured well by simple models like the Exponential or Weibull.
Code
library(flexsurv)library(survival)library(ggplot2)library(dplyr)set.seed(2026)n <-400# Generate Log-normal survival timest_ctrl <-rlnorm(n /2, meanlog =1.5, sdlog =0.8)t_trt <-rlnorm(n /2, meanlog =2.0, sdlog =0.6)df <-data.frame(time =c(t_ctrl, t_trt),status =1, # All experience the event initiallytrt =rep(c("Control", "Treatment"), each = n /2))# Add administrative censoring at t = 15 (end of trial follow-up)df$status[df$time >15] <-0df$time[df$time >15] <-15# Create survival objectsurv_obj <-Surv(df$time, df$status)
Let’s look at the Kaplan-Meier curve of our simulated data:
We will use the flexsurv package to fit the models. flexsurv is the industry standard because it allows for easy extrapolation and handles complex distributions.
3.1 1. Standard Parametric Models
We fit the 6 standard distributions recommended by NICE DSU TSD 14.
When standard models fail to capture complex hazard shapes, we use restricted cubic splines on the log cumulative hazard scale. Here we use 1 internal knot (k = 1).
Code
m_spline <-flexsurvspline(surv_obj ~ trt, data = df, k =1)
4 Model Comparison (AIC/BIC)
We extract the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to evaluate statistical fit. Lower values indicate better fit.
As expected, the Log-normal model (which matches the true distribution we simulated) and the FPM (Spline) show the best statistical fit.
5 Survival Extrapolation (Extending to 30 Years)
Now we use the fitted models to predict survival up to 30 years, doubling the trial’s follow-up period. This is where the models will diverge significantly.
Code
# Generate a sequence of time points up to 30 yearst_extrap <-seq(0, 30, by =0.5)# Extract survival predictions for the Treatment groupget_surv <-function(model, times) { sum_obj <-summary(model, t = times, newdata =data.frame(trt ="Treatment"))return(sum_obj[[1]]$est)}plot(km_fit[2],col ="black", lwd =2, conf.int =FALSE,xlim =c(0, 30), ylim =c(0, 1),xlab ="Time (Years)", ylab ="Survival Probability",main ="Survival Extrapolation to 30 Years (Treatment Group)")lines(t_extrap, get_surv(m_exp, t_extrap), col ="orange", lty =2)lines(t_extrap, get_surv(m_weibull, t_extrap), col ="blue", lty =2)lines(t_extrap, get_surv(m_lnorm, t_extrap), col ="green", lty =1, lwd =2) # True modellines(t_extrap, get_surv(m_spline, t_extrap), col ="red", lty =1)legend("topright",legend =c("KM (Trial Data)", "Exponential", "Weibull", "Log-normal (True)", "FPM (Spline)"),col =c("black", "orange", "blue", "green", "red"),lty =c(1, 2, 2, 1, 1), lwd =c(2, 1, 1, 2, 1))
6 Audit Point: Clinical Plausibility > Statistical Fit
When submitting survival extrapolations to HTA bodies like NICE, the model with the lowest AIC/BIC is not automatically accepted. Reviewers look for clinical plausibility:
The Tail Behavior: Models like Log-normal and Log-logistic have heavy tails. They may predict that a significant proportion of patients are “cured” or live for a very long time. If clinicians state that this disease is universally fatal by year 10, a Log-normal model with 20% survival at year 20 will be rejected, regardless of its superior AIC score.
Extrapolation Risk: FPMs (Splines) are excellent at fitting the trial data (interpolation), but they can behave wildly outside the data range (extrapolation). Always inspect the hazard plots of spline models to ensure they do not exhibit non-biological oscillations in the long term.
In practice, a structured assessment involving: - Visual inspection of fit. - Statistical criteria (AIC/BIC). - Clinical expert opinion on long-term survival. …is mandatory for any credible HTA submission.