Survival Analysis in R

Survival analysis, also known as time-to-event analysis, is used to analyze the expected duration of time until a specific event of interest occurs. Common applications include analyzing time-to-failure for machinery, time until purchase in marketing, or time-to-death in medical studies.

In this tutorial, we'll use R to perform basic survival analysis using the survival package.

1. Installation and Loading the Package:

install.packages("survival")
library(survival)

2. The Survival Object:

The Surv() function creates a survival object, which represents the data in a format suitable for survival analysis.

# Sample data
time <- c(2, 3, 5, 8, 12)  # time till event or censoring
event <- c(1, 1, 0, 1, 0)  # 1 if event occurred, 0 if censored

# Create a survival object
s_obj <- Surv(time, event)
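
To see what the survival object holds, it can help to print it and count the events and censored observations. The following is a minimal sketch that rebuilds the sample data above; the names n_events and n_censored are introduced here for illustration only.

```r
library(survival)

# Same sample data as above
time  <- c(2, 3, 5, 8, 12)   # time till event or censoring
event <- c(1, 1, 0, 1, 0)    # 1 if event occurred, 0 if censored
s_obj <- Surv(time, event)

# Censored times are printed with a trailing "+"
print(s_obj)

# A right-censored Surv object is stored as a two-column matrix (time, status)
n_events   <- sum(s_obj[, "status"] == 1)
n_censored <- sum(s_obj[, "status"] == 0)
cat("events:", n_events, " censored:", n_censored, "\n")
```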

3. Kaplan-Meier Estimation:

The survfit() function fits survival curves using the Kaplan-Meier method.

fit <- survfit(s_obj ~ 1)
print(fit)

# Plotting the survival curve
plot(fit)
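
To make the Kaplan-Meier method less of a black box, the estimate can be reproduced by hand for the sample data: at each event time, the survival probability is multiplied by the fraction of at-risk subjects who did not experience the event. The sketch below uses base R only; the names event_times, at_risk, deaths, and km_manual are illustrative and not part of the survival package.

```r
# Same sample data as above
time  <- c(2, 3, 5, 8, 12)
event <- c(1, 1, 0, 1, 0)

# Distinct times at which an event (not a censoring) occurred
event_times <- sort(unique(time[event == 1]))

# Number still under observation just before each event time
at_risk <- sapply(event_times, function(t) sum(time >= t))

# Number of events at each event time
deaths <- sapply(event_times, function(t) sum(time == t & event == 1))

# Product-limit estimate: cumulative product of (1 - deaths / at_risk)
km_manual <- cumprod(1 - deaths / at_risk)

data.frame(time = event_times, at_risk, deaths, survival = km_manual)
# survival is 0.8 at t = 2, 0.6 at t = 3, and 0.3 at t = 8
```

The censored observation at time 5 does not lower the curve, but it does reduce the number at risk at time 8, which is why that step is larger.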

4. Cox Proportional Hazards Model:

For analyzing the relationship between survival time and one or more predictor variables, we use the Cox proportional hazards model.

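Let's use the lung dataset from the survival package. In this dataset, status is coded 1 = censored and 2 = dead (a coding that Surv() accepts), and sex is coded 1 = male and 2 = female.

# Load dataset (lung is stored together with the cancer data;
# in recent versions of survival, data(lung) may warn that the data set is not found)
data(cancer, package="survival")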

# Fit the model
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data=lung)

# Summary of the model
summary(cox_model)
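
The coefficients of a Cox model are on the log-hazard scale, so they are usually exponentiated and read as hazard ratios. The following sketch refits the same model and pulls out the hazard ratios with 95% confidence intervals; the names hr and hr_ci are illustrative.

```r
library(survival)

# Same model as above (lung is available once survival is loaded)
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Hazard ratios: exp(coefficient)
hr <- exp(coef(cox_model))

# 95% confidence intervals on the hazard-ratio scale
hr_ci <- exp(confint(cox_model))

round(cbind(HR = hr, hr_ci), 3)
```

A hazard ratio above 1 indicates a higher hazard as the covariate increases, and a ratio below 1 indicates a lower hazard.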

5. Checking Proportional Hazards Assumption:

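We can visually inspect whether the proportional hazards assumption holds by plotting the scaled Schoenfeld residuals against time. The cox.zph() function computes these residuals, with one column per covariate.

install.packages("ggplot2")
library(ggplot2)

# Compute scaled Schoenfeld residuals (zph$y) and the matching event times (zph$time)
zph <- cox.zph(cox_model)

# Plot the residuals for the first covariate (age)
ggplot(data.frame(time=zph$time, resid=zph$y[, 1]), aes(x=time, y=resid)) +
  geom_point() +
  geom_smooth(se=FALSE)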

A roughly horizontal smoothed line suggests the assumption holds for that covariate, while a clear trend over time suggests it might be violated.
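
As a complement to the visual check, cox.zph() in the survival package also reports a test for each covariate and a global test. The sketch below refits the model so that it is self-contained; the name p_values is illustrative.

```r
library(survival)

# Same model as above
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Test of the proportional hazards assumption
zph <- cox.zph(cox_model)
print(zph)   # one row per covariate plus a GLOBAL row

# p-values only; small values point to a possible violation
p_values <- zph$table[, "p"]
p_values
```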

6. Predicting with the Cox Model:

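You can make predictions using the Cox model. Note that type="expected" requires the follow-up time and status in the new data, so for a new covariate profile we use the relative risk and the predicted survival curve instead:

# Predict for new data
newdata <- data.frame(age=65, sex=1, ph.ecog=1)
predict(cox_model, newdata, type="risk")  # hazard relative to the sample average
# Predicted survival probabilities at 180 and 365 days for this profile
summary(survfit(cox_model, newdata=newdata), times=c(180, 365))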

7. Other Considerations:

  • Right Censoring: In survival analysis, it's common for the event of interest not to occur for all subjects during the study period. Such subjects are "censored".

  • Multiple Events: Some subjects may experience the event more than once. There are extensions of basic survival analysis to handle such "repeated events".

  • Time-varying Covariates: If covariates change over time, they're "time-varying". Handling such covariates requires restructuring the data so that each row covers an interval over which the covariates are constant.
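
One common way to represent time-varying covariates (and repeated events) is the counting-process form of Surv(), which takes a start time, a stop time, and an event indicator per row. The sketch below only builds such an object; the data frame tv_data and its columns are made up for illustration and are far too small to fit a model.

```r
library(survival)

# Hypothetical data: subject 1 switches treatment at time 5 and has the event at 12;
# subject 2 stays untreated and has the event at 8
tv_data <- data.frame(
  id      = c(1, 1, 2),
  tstart  = c(0, 5, 0),
  tstop   = c(5, 12, 8),
  event   = c(0, 1, 1),
  treated = c(0, 1, 0)
)

# Three-argument form: (start, stop, event)
tv_surv <- with(tv_data, Surv(tstart, tstop, event))
print(tv_surv)
attr(tv_surv, "type")   # "counting"

# With a realistic data set, a model could then be fitted along the lines of:
# coxph(Surv(tstart, tstop, event) ~ treated, data = tv_data)
```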

Conclusion:

Survival analysis is a powerful technique for analyzing time-to-event data. With R and the survival package, you have the tools to explore, model, and predict such data effectively.

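Quick reference. In the examples below, my_data, clinical_data, time_data, and survival_data are placeholders for your own data frames.

  1. Introduction to Survival Analysis with R: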

    • Survival analysis is a statistical approach for analyzing time-to-event data, often used in medical or reliability studies.
    # Example: Creating a survival object
    library(survival)
    survival_object <- Surv(time = c(5, 10, 15), event = c(1, 1, 0))
    
  2. Kaplan-Meier Survival Curves in R:

    • Kaplan-Meier curves estimate survival probabilities over time.
    # Example: Kaplan-Meier curve
    km_curve <- survfit(survival_object ~ 1)
    plot(km_curve, main = "Kaplan-Meier Survival Curve")
    
  3. Cox Proportional Hazards Model in R:

    • Cox PH model assesses the impact of covariates on survival.
    # Example: Cox Proportional Hazards Model
    cox_model <- coxph(Surv(time, event) ~ age + treatment, data = my_data)
    
  4. R Survival Analysis for Clinical Data:

    • Apply survival analysis to clinical data, considering time and event variables.
    # Example: Analyzing clinical survival data
    survival_object <- Surv(time = clinical_data$follow_up_time, event = clinical_data$outcome)
    
  5. Time-to-Event Analysis in R:

    • Analyze time-to-event data using survival analysis techniques.
    # Example: Time-to-event analysis
    km_curve <- survfit(Surv(time, event) ~ 1, data = time_data)
    
  6. Survival Analysis Plotting in R:

    • Plot survival curves and other relevant visualizations.
    # Example: Plotting survival curves
    plot(km_curve, main = "Survival Analysis", xlab = "Time", ylab = "Survival Probability")
    
  7. Log-Rank Test in R:

    • Assess differences in survival curves using the log-rank test.
    # Example: Log-rank test
    logrank_test <- survdiff(Surv(time, event) ~ group, data = survival_data)
    
  8. Comparing Survival Curves in R:

    • Compare multiple survival curves visually and statistically.
    # Example: Comparing survival curves
    multi_km_curve <- survfit(Surv(time, event) ~ group, data = survival_data)
    
  9. Handling Censored Data in Survival Analysis with R:

    • Mark censored observations through the event argument of Surv() (for example, 0 = censored and 1 = event).
    # Example: Handling censored data
    survival_object <- Surv(time = my_data$time, event = my_data$status)
    
  10. Stratified Survival Analysis in R:

    • Perform survival analysis within strata or subgroups.
    # Example: Stratified survival analysis
    stratified_km_curve <- survfit(Surv(time, event) ~ strata(group), data = survival_data)
    
  11. Parametric Survival Models in R:

    • Fit parametric survival models like exponential or Weibull distribution.
    # Example: Fitting a Weibull survival model
    weibull_model <- survreg(Surv(time, event) ~ covariate, data = survival_data, dist = "weibull")
    
  12. Multivariate Survival Analysis in R:

    • Assess the impact of multiple covariates on survival.
    # Example: Multivariate survival analysis
    coxph_model <- coxph(Surv(time, event) ~ covariate1 + covariate2, data = survival_data)
    
  13. Survival Analysis with Competing Risks in R:

    • Analyze survival data in the presence of competing risks.
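    # Example: Competing risks survival analysis
    # (fstatus distinguishes the event types; 0 marks censoring by default)
    library(cmprsk)
    competing_risks_model <- cuminc(ftime = survival_data$time, fstatus = survival_data$status, group = survival_data$group)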
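
Several of the examples above rely on placeholder data frames. As a runnable sketch, the same steps (grouped Kaplan-Meier curves, log-rank test, and a Weibull model) can be tried on the lung dataset that ships with the survival package, using sex as the grouping variable. The choice of covariates and the name p_value are assumptions made for illustration.

```r
library(survival)

# lung: status is coded 1 = censored, 2 = dead; sex is coded 1 = male, 2 = female

# Kaplan-Meier curves by group
km_by_sex <- survfit(Surv(time, status) ~ sex, data = lung)
plot(km_by_sex, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival Probability")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)

# Log-rank test for a difference between the curves
logrank <- survdiff(Surv(time, status) ~ sex, data = lung)
print(logrank)

# p-value from the chi-square statistic (degrees of freedom = groups - 1)
p_value <- 1 - pchisq(logrank$chisq, df = length(logrank$n) - 1)
p_value

# Parametric alternative: Weibull regression
weibull_model <- survreg(Surv(time, status) ~ age + sex,
                         data = lung, dist = "weibull")
summary(weibull_model)
```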