
I would like to know how to calculate survival probabilities in PySpark with the AFTSurvivalRegression method. I have seen this example on the web:

from pyspark.sql import SparkSession
from pyspark.ml.regression import AFTSurvivalRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])
quantileProbabilities = [0.3, 0.6]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")

model = aft.fit(training)

# Print the coefficients, intercept and scale parameter for AFT survival regression
print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))
print("Scale: " + str(model.scale))
model.transform(training).show(truncate=False)
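To make sense of the `quantiles` column the transform produces, here is a sketch of what those numbers are, assuming Spark's Weibull accelerated-failure-time parameterization: for each probability p in `quantileProbabilities`, the column holds the predicted time by which a fraction p of subjects with those features is expected to have failed. The fitted values below are hypothetical stand-ins for `model.coefficients`, `model.intercept` and `model.scale`:

```python
import math

# Weibull AFT quantile function (sketch):
#   t_p = exp(intercept + coefficients . x) * (-log(1 - p)) ** scale
def predicted_quantile(p, features, coefficients, intercept, scale):
    eta = intercept + sum(c * f for c, f in zip(coefficients, features))
    return math.exp(eta) * (-math.log(1.0 - p)) ** scale

# Hypothetical fitted values standing in for the trained model's parameters:
coefficients = [-0.496, 0.198]
intercept = 2.638
scale = 1.547
features = [1.560, -0.605]

# Times corresponding to quantileProbabilities = [0.3, 0.6]:
t30 = predicted_quantile(0.3, features, coefficients, intercept, scale)
t60 = predicted_quantile(0.6, features, coefficients, intercept, scale)
```

The 0.6-quantile should always come out later than the 0.3-quantile, since more subjects have failed by it.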

But with this I can only predict survival times. I can also get quantile predictions, but I do not know exactly how they work. My question is: how can I get the probability that a person will survive past a specific time?
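One way to get that probability, sketched here under the assumption that Spark's AFTSurvivalRegression fits a Weibull accelerated-failure-time model (log(T) = intercept + coefficients · x + scale · ε), is to evaluate the Weibull survival function directly from the fitted parameters. The numeric values below are hypothetical; in practice you would plug in `model.coefficients`, `model.intercept` and `model.scale` from the trained model:

```python
import math

def survival_probability(t, features, coefficients, intercept, scale):
    """P(T > t) for one subject under the Weibull AFT model:
    S(t | x) = exp(-(t / exp(intercept + coefficients . x)) ** (1 / scale))."""
    eta = intercept + sum(c * f for c, f in zip(coefficients, features))
    return math.exp(-((t / math.exp(eta)) ** (1.0 / scale)))

# Hypothetical fitted values; read the real ones from the trained model.
coefficients = [-0.496, 0.198]
intercept = 2.638
scale = 1.547
features = [1.560, -0.605]

# Probability this subject survives past time t = 2.0:
p = survival_probability(2.0, features, coefficients, intercept, scale)
```

As a sanity check, evaluating this survival function at the model's 0.3-quantile time should give 0.7, since the quantile function is its inverse.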

Robert Sun
  • I'm not a pythonisto, but I answered this for an R-questioner a couple of years ago: I imagine you have the ability to make a vector of evenly spaced quantiles and then get a predicted time for each. If so, then you should be able to follow this R code as an outline: https://stackoverflow.com/questions/27408862/how-to-predict-survival-probabilities-in-r – IRTFM Jul 25 '20 at 00:00
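The comment's approach can be sketched in Python without assuming anything about the model's internal parameterization: predict times for an evenly spaced grid of quantile probabilities, then read the survival probability at an arbitrary time t off the resulting (time, 1 − p) curve by interpolation. Here the grid times are generated from hypothetical Weibull parameters only so the snippet is self-contained; normally they would come from `model.transform` with a dense `quantileProbabilities` list:

```python
import bisect
import math

# Evenly spaced probability grid p = 0.01 .. 0.99, with predicted times
# for one subject (hypothetical eta and scale stand in for model output):
probs = [i / 100.0 for i in range(1, 100)]
eta, scale = 1.744, 1.547
times = [math.exp(eta) * (-math.log(1.0 - p)) ** scale for p in probs]

def survival_from_grid(t, times, probs):
    """Approximate P(T > t) by linear interpolation on the quantile grid."""
    i = bisect.bisect_left(times, t)
    if i == 0:
        return 1.0                      # before the earliest grid time
    if i == len(times):
        return 1.0 - probs[-1]          # beyond the latest grid time
    frac = (t - times[i - 1]) / (times[i] - times[i - 1])
    p = probs[i - 1] + frac * (probs[i] - probs[i - 1])
    return 1.0 - p

s = survival_from_grid(2.0, times, probs)
```

Because the grid times here happen to come from a Weibull quantile function, the interpolated value should track the closed-form survival function closely; with times taken from `model.transform` the same interpolation applies unchanged.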

0 Answers