STAT 486, Winter 2021
The final exam is due on April 23 by 11pm.
Please submit to OnQ, Final Exam, Stat 486.
You must work independently for this final exam.
If you need to cite existing work (books, articles, on-line resources), please indicate the sources in a list of references.
Part I. Problems (20%)
Define your notation if necessary and give sufficient details in solving the problems.
- Failure time can sometimes be described by the gamma distribution, which is defined on the positive real line and has density function
where α > 0 and λ > 0 are parameters. The generalized gamma distribution introduces an additional parameter θ allowing additional flexibility in selecting a hazard function. The generalized gamma distribution has density function
This distribution reduces to the exponential when θ = α = 1, to the Weibull when α = 1, and to the gamma when θ = 1. Develop methods for model checking (for exponential, Weibull and gamma models) using the generalized gamma distribution.
(Additional Remark: It can be shown that the generalized gamma distribution approaches the log normal distribution when α → ∞ (though the proof is non-trivial). So it can even be used to check the fit of log normal model!)
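For reference, one common parametrization that is consistent with the special cases stated above is sketched below. The exact form is an assumption and may differ from the one used in the course notes. Because the three simpler models are nested in the generalized gamma, one possible check is a likelihood ratio comparison of each reduced model against the full model.

```latex
% Assumed parametrization: gamma density with shape \alpha and rate \lambda
f(t) = \frac{\lambda^{\alpha}\, t^{\alpha-1}\, e^{-\lambda t}}{\Gamma(\alpha)}, \qquad t > 0.

% Assumed parametrization: generalized gamma density with extra parameter \theta
f(t) = \frac{\theta\, \lambda^{\alpha\theta}\, t^{\alpha\theta-1}\,
             \exp\{-(\lambda t)^{\theta}\}}{\Gamma(\alpha)}, \qquad t > 0.

% Nested hypotheses and a likelihood ratio statistic (\ell = maximized log-likelihood)
H_0^{\text{gamma}}: \theta = 1, \qquad
H_0^{\text{Weibull}}: \alpha = 1, \qquad
H_0^{\text{exp}}: \theta = \alpha = 1,

\Lambda = 2\,\bigl[\ell(\hat\alpha, \hat\lambda, \hat\theta) - \ell_{H_0}\bigr]
\;\overset{\text{approx.}}{\sim}\; \chi^2_{k},
\qquad k = \text{number of parameters fixed under } H_0 .
```

Under this form, k = 1 for the gamma and Weibull checks and k = 2 for the exponential check. The chi-square approximation relies on the usual large-sample regularity conditions.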
- One of the goals of recent research on antiretroviral therapy for treating HIV-infected patients is to compare the new triple-drug combinations to the existing two-drug regimens. The triple-drug regimens are expected to maximize antiviral activity, maintain long-term efficacy and reduce drug resistance. A randomized study was conducted to compare a two-drug regimen (AZT + zalcitabine (ddC)) versus a triple-drug regimen (AZT + ddC + saquinavir). The data below give the time T from administration of treatment (in days) until the CD4 count (a type of white blood cell measure indicating progression of HIV disease) reached a prespecified level for the two groups. The asterisks (*) denote censoring times.
Two-drug group: 85 32 38* 45 4* 84 49 180* 87 75 102 39 12 11 80 35 6
Triple-drug group: 22 2 48 85 160 238 56* 94* 51* 12 171 80 180 4 90 180* 3
We analyze the data by fitting a log normal model for the time T, that is,
Y = log T = β0 + β1 z + σW,
where z = 0 if a patient is given the two-drug treatment and z = 1 if given the triple-drug treatment, and W ∼ Normal (0, 1). The following results are given in the R output.
survreg(formula = Surv(time, status) ~ as.factor(grp), data = hivdata, dist = "lognormal")

                 Value Std. Error      z        p
(Intercept)      3.915      0.354 11.068 1.79e-28
as.factor(grp)2  0.115      0.498  0.231 8.17e-01
Log(scale)       0.330      0.138  2.389 1.69e-02

Log Normal distribution n= 34

                 (Intercept) as.factor(grp)2  Log(scale)
(Intercept)      0.125085046    -0.124199736 0.003209618
as.factor(grp)2 -0.124199736     0.247612853 0.002051849
Log(scale)       0.003209618     0.002051849 0.019075012
Notice that in R, the model is fitted with parameters β0, β1 and φ = logσ.
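As a rough sketch of how a fit of this kind could be set up, the code below enters the data listed above and calls survreg with the same formula. The column names time, status and grp come from the survreg call. The codings status = 1 for an observed event, status = 0 for a censored time, and grp = 1 (two-drug) or 2 (triple-drug) are assumptions inferred from the coefficient label as.factor(grp)2.

```r
library(survival)

# Two-drug group: times and event indicators (0 marks the starred, censored times)
two_time   <- c(85, 32, 38, 45, 4, 84, 49, 180, 87, 75, 102, 39, 12, 11, 80, 35, 6)
two_status <- c( 1,  1,  0,  1, 0,  1,  1,   0,  1,  1,   1,  1,  1,  1,  1,  1, 1)

# Triple-drug group
tri_time   <- c(22, 2, 48, 85, 160, 238, 56, 94, 51, 12, 171, 80, 180, 4, 90, 180, 3)
tri_status <- c( 1, 1,  1,  1,   1,   1,  0,  0,  0,  1,   1,  1,   1, 1,  1,   0, 1)

# Assumed coding: grp = 1 for two-drug, grp = 2 for triple-drug
hivdata <- data.frame(
  time   = c(two_time, tri_time),
  status = c(two_status, tri_status),
  grp    = rep(c(1, 2), times = c(length(two_time), length(tri_time)))
)

# Log normal accelerated failure time model, as in the call shown above
fit <- survreg(Surv(time, status) ~ as.factor(grp), data = hivdata,
               dist = "lognormal")

summary(fit)  # coefficient table, including Log(scale)
vcov(fit)     # covariance matrix of (beta0, beta1, log sigma)
```

The last two calls print the coefficient table and the covariance matrix in the parametrization (β0, β1, φ) described above.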
- Estimate S_T(60 | z = 0) and S_T(60 | z = 1), the probabilities that the CD4 counts reach the prespecified level in at least 60 days for patients receiving the two-drug and triple-drug treatments respectively.
- Build 95% confidence intervals for S_T(60 | z = 0) and S_T(60 | z = 1) respectively. Can you conclude that the two types of treatments result in different probabilities of T ≥ 60?
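One way to approach calculations of this kind is sketched below. Under the log normal model the survival function is S(t | z) = 1 − Φ((log t − β0 − β1 z)/σ), and a delta-method standard error can be obtained from the covariance matrix of (β0, β1, φ). The helper name surv_ci is hypothetical, the inputs are the rounded values from the output above, and the plain Wald interval on the probability scale is only one of several reasonable choices (an interval built on a transformed scale is another).

```r
# Estimates and covariance matrix copied from the R output above;
# parameter order is (beta0, beta1, phi) with phi = log(sigma).
est <- c(beta0 = 3.915, beta1 = 0.115, phi = 0.330)
V <- matrix(c( 0.125085046, -0.124199736, 0.003209618,
              -0.124199736,  0.247612853, 0.002051849,
               0.003209618,  0.002051849, 0.019075012),
            nrow = 3, byrow = TRUE)

# Hypothetical helper: point estimate and Wald interval for S(t | z)
surv_ci <- function(t, z, est, V, level = 0.95) {
  sigma <- exp(est[["phi"]])
  u <- (log(t) - est[["beta0"]] - est[["beta1"]] * z) / sigma
  S <- 1 - pnorm(u)
  # Gradient of S with respect to (beta0, beta1, phi);
  # note that du/dphi = -u, so dS/dphi = u * dnorm(u)
  grad <- dnorm(u) * c(1 / sigma, z / sigma, u)
  se <- sqrt(drop(t(grad) %*% V %*% grad))
  q <- qnorm(1 - (1 - level) / 2)
  c(estimate = S, se = se, lower = S - q * se, upper = S + q * se)
}

surv_ci(60, z = 0, est, V)  # two-drug group
surv_ci(60, z = 1, est, V)  # triple-drug group
```

Because the inputs are rounded, results may differ slightly from those computed directly from a fitted model object.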
Part II. Analysis of Ovarian Cancer Data (80%)
The data “ovarian.txt” posted are obtained from a randomized Phase III clinical trial with 352 eligible ovarian cancer patients who have relapsed after surgery. After the first recurrence (relapse), the patients were randomized and received either the standard treatment (cyclo and cisplatin) or the new treatment (cyclo and carboplatin) under investigation. The main purpose of the clinical trial is to study whether the investigational treatment extends the post-progression survival time compared to the standard treatment. The post-progression survival time (in days) is the response variable, which is the time from first recurrence to death.
Some other variables are also collected on patients. These include variables collected before administering the treatment (the pretreatment prognosticators), which may affect the patients’ survival prognosis, and variables collected at the first recurrence. A detailed description of the variables in the data set is given on the last two pages. It is also important to study how these variables affect the post-progression survival time and build an appropriate model (or models) to describe their association. Analyze the data and write a report for your study and analysis.
Suggestions and Requirements:
In your analysis, please give priority to semi-parametric and non-parametric methods such as Kaplan-Meier estimates, (weighted) log-rank tests and the Cox regression model. Support your analysis with appropriate graphical exploration, model checking and residual analysis.
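A minimal sketch of how a first pass with these methods might look in R is given below. The helper first_pass is hypothetical, and the covariates in the Cox model (treat and age) are chosen only for illustration. The column names rec_dth, event, treat and age follow the variable description; the assumptions that “ovarian.txt” has a header row and uses "." for missing values are guesses about the file layout.

```r
library(survival)

# Hypothetical helper: exploratory fits on a data frame with columns
# rec_dth, event, treat and age (names taken from the variable description).
first_pass <- function(dat) {
  # Kaplan-Meier curves by treatment arm
  km <- survfit(Surv(rec_dth, event) ~ treat, data = dat)
  # Log-rank test (rho = 0) and a weighted version from the G-rho family (rho = 1)
  logrank  <- survdiff(Surv(rec_dth, event) ~ treat, data = dat, rho = 0)
  weighted <- survdiff(Surv(rec_dth, event) ~ treat, data = dat, rho = 1)
  # Cox model with illustrative covariates only
  cox <- coxph(Surv(rec_dth, event) ~ treat + age, data = dat)
  list(
    km         = km,
    logrank    = logrank,
    weighted   = weighted,
    cox        = cox,
    ph_check   = cox.zph(cox),                         # proportional hazards check
    martingale = residuals(cox, type = "martingale"),  # functional form check
    deviance   = residuals(cox, type = "deviance")     # outlier check
  )
}

# Assumed file layout: header row, "." used for missing values
if (file.exists("ovarian.txt")) {
  ovarian <- read.table("ovarian.txt", header = TRUE, na.strings = ".")
  res <- first_pass(ovarian)
  plot(res$km, lty = 1:2, xlab = "Days since first recurrence",
       ylab = "Survival probability")
  print(res$logrank)
  print(summary(res$cox))
  print(res$ph_check)
}
```

Categorical prognosticators such as perform, grade or staging would normally enter a model through factor(), and the choice of covariates should come from the exploration itself rather than from this sketch.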
Good scientific writing and clear explanation are highly valued. You can describe the problems under study, the data and variables; describe your initial exploration and the models you consider; include selected results and figures for model fit, model assessment and comparison; and interpret your final model (or models) and explain what knowledge is obtained from your analysis in the application context. You are encouraged to do some self-study on ovarian cancer to understand the unfamiliar terminology in the data description file.
An example of a report for a course project is posted below.
Write your report as an article that explains your thoughts; it should not look like patches of analysis output. Your report should NOT include code or raw output from R or SAS. Include tables in the report to summarize the analysis and results if needed. Please attach the code and output at the end of your report as a record and proof of your independent work.
Please aim for a clear and concise report. The suggested length is no more than 5 pages of text. Tables and/or figures should be inserted in the report (but do not count for length).
Total marks: 80;
40 marks on statistical analysis; 40 marks on report writing.
* Description of variables in the data set
** The pretreatment prognosticators:
** age–Age at allocation;
** in_res_d–Residual disease after surgery for this malignancy:
** 1-<2 cm
** 2-2 to 10 cm
** 3->10 cm
** 4-not measurable
** 5-not stated
** staging–Pre-treatment clinical stage:
** 2-stage IIB
** 3-stage III
** 4-stage IV
** .-unknown;
** nclass–Clinical classification of disease:
** 0-no evidence of disease
** 1-measurable disease present
** 2-non-measurable disease present
** in_ht–Initial height;
** in_wt–Initial weight;
** in_surf–Initial surface area;
** perform–Performance status:
** 0-able to carry out normal activity
** 1-ambulatory, able to carry out light work
** 2-ambulatory, capable of self-care only
** 3-capable of limited self care
** 4-completely disabled
** nt_srce–tumour source:
** 0-primary tumour
** 1-metastatic tumour only submitted
** 3-secondary involvement of ovary to be ruled out
** grade–tumour grade:
** 0-borderline malignancy
** 1-grade 1 of 3
** 2-grade 2 of 3
** 3-grade 3 of 3
** Variables at first recurrence:
** surg_pr–Time from the gynecological surgery to first recurrence;
** treat–treatment the patient has received:
** 0-standard (cyclo and cisplatin)
** 1-investigational(cyclo and carboplatin);
** Survival time variables:
** rec_dth: time from recurrence to death
** event: =1 patient died; =0 patient alive