The assessment is open book. Its content closely mirrors the exam questions that students completed in person in previous years. As such, you should be able to complete it within the two hours normally assigned for such exams, though you are of course free to spend more time on it if you wish.


The assessment must be submitted via the Turnitin link called “SOCS0017 Assessment, Assessment submission”, which now appears on your SOCS0017 Moodle page. Please ensure that you use your candidate number as the Turnitin submission title.







Question 1: 40 marks

Question 2: 20 marks

Question 3: 20 marks

Question 4: 20 marks


You should reference the work as you would in an exam (author, year), but with a short bibliography included at the end of the work (the bibliography should not exceed one page). The bibliography does not count towards the word count. Word limits are provided per question below (3000 words overall). Your work will be subject to the usual word limit policies.


Similarly, the usual lateness penalties apply. However, please be assured that we understand that you may be facing extraordinary and unexpected disruption at the moment, possibly including illness or caring responsibilities. For those of you who need an extension, we will be sympathetic and will do everything we can to help you. Extenuating circumstances applications must be submitted using the extenuating circumstances form. Full details of how to apply for extenuating circumstances are available here.


If you have any questions about the content of your assessment, you may contact your module convenor.



Question 1. Compulsory Question (40 marks) MAXIMUM WORD LENGTH 1200 WORDS


Researchers are interested in measuring the impact of breastfeeding on cognitive outcomes at age 7. Three studies address this question. The outcome in each is the score on a well-known reading test, which has a mean of 100 and a standard deviation of 15.

Two of the studies (Studies 1 and 2) use data from a national birth cohort and look at the impact of breastfeeding (breastfed for at least 90 days) on reading score at age 7.

Study 1: Uses a regression approach and controls for maternal age, the father’s and mother’s socio-economic status, and parental education.

Study 2: Uses the same data as Study 1 but takes an instrumental variable (IV) approach in which the instrument for breastfeeding is admission to hospital at the weekend. The authors argue that women who are admitted to hospital at the weekend are less likely to breastfeed because less support is available in hospitals, and that weekend admission can therefore be used as an instrument to estimate the causal impact of breastfeeding. In the first-stage regression, weekend admission has a coefficient of -0.035 (significant at conventional levels, with a standard error of 0.015) on the probability of breastfeeding for at least 90 days.

Study 3: Is a randomised controlled trial of pre-term babies whose mothers cannot breastfeed. Babies are randomly assigned to a treatment group (fed donated breast milk) or a control group (given formula).


Results: Impact of breastfeeding on reading scores at age 7

                 Study 1          Study 2          Study 3
Breastfeeding    0.855 (0.205)    6.945 (3.013)    3.115 (1.500)
Sample size      6,105            6,105            589

Note: Standard errors are in brackets
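As a reminder of how a coefficient table of this kind is usually read, the following is a minimal sketch that turns each estimate and standard error into a t-ratio, an approximate 95% confidence interval and an effect in standard-deviation units. The 1.96 critical value (a normal approximation) and the function name summarise are assumptions made for illustration; they are not part of the studies.

```python
# Estimates and standard errors copied from the results table above.
results = {
    "Study 1": (0.855, 0.205),
    "Study 2": (6.945, 3.013),
    "Study 3": (3.115, 1.500),
}

Z_95 = 1.96   # assumption: normal approximation for a 95% interval
TEST_SD = 15  # standard deviation of the reading test, as stated in the question


def summarise(estimate, std_error):
    """Return the t-ratio, an approximate 95% CI and the effect in SD units."""
    t_ratio = estimate / std_error
    ci = (estimate - Z_95 * std_error, estimate + Z_95 * std_error)
    effect_in_sd = estimate / TEST_SD
    return t_ratio, ci, effect_in_sd


summaries = {name: summarise(*values) for name, values in results.items()}

for name, (t_ratio, ci, effect_in_sd) in summaries.items():
    print(f"{name}: t = {t_ratio:.2f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"effect = {effect_in_sd:.3f} SD")
```

The arithmetic says nothing about whether an estimate is causal; that depends on each study's design and assumptions.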


  1. What do you think are the strengths and weaknesses of Study 1? What does the study find? (10 marks)
  2. What do you think are the strengths and weaknesses of Study 2? Do you think the instrument is credible? What does the study find? (10 marks)
  3. What do you think are the strengths and weaknesses of Study 3? What does the study find? (10 marks)
  4. Which study do you think provides the best evidence of the effect of breastfeeding on reading at age 7, and why? (5 marks)
  5. Can you think of another quasi-experimental method you could apply to the data used in Studies 1 and 2 to estimate the impact of breastfeeding on reading scores? What are the advantages and disadvantages of your proposed method compared to the methods used in Studies 1 and 2? (5 marks)







Question 2 (20 marks) MAXIMUM WORD LENGTH 600 WORDS


In 2007, Ohio designed an information pack to inform potential in-state college students about the costs and benefits of college. The policy was intended to improve college enrolment rates.


To evaluate the policy, researchers first compared the college enrolment rates of students in Ohio before and after the policy. They found that enrolment went up by 7 percentage points in Ohio between 2006 and 2007.


  1. Why might this before-and-after estimate of the impact of the policy not be robust? (3 marks)


To estimate the causal effect of the policy, the researchers collected enrolment information from all colleges in Ohio and all colleges in the states neighbouring Ohio for a number of years. The researchers provide you with the following table of their findings.

Year    Ohio (% enrolled)    Neighbouring states (% enrolled)
2004    54                   59
2005    58                   61
2006    60                   63
2007    67                   65
2008    70                   67
2009    72                   69
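The mechanics of a two-group, two-period difference-in-differences comparison on a table like this can be sketched as follows. Using 2006 as the "before" year and 2007 as the "after" year is an assumption made for illustration (other windows or multi-year averages are equally defensible), and the helper name did is hypothetical.

```python
# Enrolment rates (%) copied from the table above.
ohio = {2004: 54, 2005: 58, 2006: 60, 2007: 67, 2008: 70, 2009: 72}
neighbours = {2004: 59, 2005: 61, 2006: 63, 2007: 65, 2008: 67, 2009: 69}

# Assumption: the simplest 2x2 comparison, one year either side of the policy.
PRE_YEAR, POST_YEAR = 2006, 2007


def did(treated, control, pre, post):
    """Change in the treated group minus change in the control group."""
    return (treated[post] - treated[pre]) - (control[post] - control[pre])


did_estimate = did(ohio, neighbours, PRE_YEAR, POST_YEAR)

# Gap between Ohio and its neighbours in each year, and how that gap
# moved between consecutive pre-policy years.
gap = {year: ohio[year] - neighbours[year] for year in ohio}
pre_gap_changes = [gap[year] - gap[year - 1] for year in (2005, 2006)]

print("DiD estimate (percentage points):", did_estimate)
print("Pre-policy changes in the gap:", pre_gap_changes)
```

The pre-policy changes in the gap are printed because the estimate is only as credible as the assumption that the two groups would have moved in parallel without the policy.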



  1. What impact did the information pack have on student enrolment rates? Show your working and state your assumptions. (7 marks)


You show your findings to a colleague, and she is worried that your analysis may not be valid because it violates the “common trends” assumption.

  1. What does your colleague mean by this? (3 marks)
  2. Is the common trends assumption supported by the data? If not, is there anything you can do to adjust your estimate of the impact of the programme? (4 marks)
  3. Another colleague tells you that in 2007 Ohio university tuition fees decreased by 20% from the previous year. Should you be concerned about this, given your analysis, and how does it change your conclusions? (3 marks)




Question 3 (20 marks) MAXIMUM WORD LENGTH 600 WORDS


In 2012, 500 patients with poorly controlled type 2 diabetes were offered personal trainers and dieticians for 12 months to help them improve their fitness and diet, with the aim of seeing the impact on HBA1c levels (a measure of long-term blood glucose levels). Those with HBA1c readings between 10.0 and 10.2 were offered the treatment and were compared with those with slightly better control, who had readings between 9.7 and 9.9. The normal range for HBA1c is 3.8 to 6.5, and somebody is considered to have type 2 diabetes with HBA1c readings above 6.5.

Researchers are interested in understanding whether this program reduced HBA1c readings for participants in the program and decide to use a regression discontinuity design (RDD).

HBA1c reading before program    Received treatment?    HBA1c reading after program
9.7%                            no                     9.7%
9.8%                            no                     9.8%
9.9%                            no                     9.9%
10.0%                           yes                    8.0%
10.1%                           yes                    8.2%
10.2%                           yes                    8.4%
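One way to see what a regression discontinuity comparison does with a table like this is sketched below: fit a straight line on each side of the threshold and compare the two predictions at the cut-off. Taking 10.0 as the cut-off, using a linear fit, and the helper names fit_line and predict are all assumptions for illustration; a simple comparison of the two rows nearest the threshold is shown alongside.

```python
# (reading before, reading after) pairs copied from the table above.
untreated = [(9.7, 9.7), (9.8, 9.8), (9.9, 9.9)]
treated = [(10.0, 8.0), (10.1, 8.2), (10.2, 8.4)]

CUTOFF = 10.0  # assumption: eligibility starts at a reading of 10.0


def fit_line(points):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(line, x):
    intercept, slope = line
    return intercept + slope * x


# Comparison of the two rows closest to the threshold.
nearest_rows_gap = treated[0][1] - untreated[-1][1]

# Jump between the two fitted lines, both evaluated at the cut-off.
jump_at_cutoff = predict(fit_line(treated), CUTOFF) - predict(fit_line(untreated), CUTOFF)

print(f"Gap between nearest rows: {nearest_rows_gap:.2f}")
print(f"Jump in fitted lines at the cut-off: {jump_at_cutoff:.2f}")
```

Either number is only a local estimate for patients near the threshold, and both rest on the assumption that nothing else changes at the cut-off.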


  1. Explain the intuition and assumptions behind using this research design to answer this question. (4 marks)
  2. Using information from the table, what is the impact of the program on HBA1c? Explain your answer and the assumptions you have made in arriving at your estimate(s). (7 marks)
  3. What if you are told that 10% of those who were eligible for the program did not take part in it at all? How would that affect your estimate (if at all) and how would you interpret your estimate? (3 marks)
  4. For each of the following situations, state whether the circumstances described would invalidate the chosen research design (yes or no). In each case, explain your reasoning.
    1. Some patients with HBA1c readings of 9.9 were allowed on the program. (2 marks)
    2. Patients were unaware of the HBA1c cut-off score for the program. (2 marks)
    3. Patients who just failed to get on the program were offered free use of the hospital gym. (2 marks)





Question 4 (20 marks) MAXIMUM WORD LENGTH 600 WORDS

You are interested in the impact of a programme in which children are given intensive reading lessons at school for 5 hours a week when they are 11. The programme commenced in 2002. The regression below compares those who took tests in the two years before the programme was introduced (PolicyOn=0) with those who took tests in the three years after it was introduced (PolicyOn=1). The programme took place in all schools in certain school districts, chosen at random (treatment=1), and outcomes are compared with those in control areas that were not chosen for the programme (treatment=0). You decide to do a difference-in-differences analysis, and the results of your difference-in-differences regression are given in the following table. The outcome variable is the percentile reading score at age 11.

Variable              Estimate    Standard Error
Constant              49.2        0.70
Treatment             -0.5        0.45
PolicyOn              1.5         0.44
Treatment*PolicyOn    2.4         0.49
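To show how the coefficients of a regression with a Treatment*PolicyOn interaction map onto the four group means, here is a minimal sketch. It assumes the regression contains only the four terms shown in the table, with no further covariates; the dictionary keys and the function name predicted_mean are hypothetical.

```python
# Coefficients and standard errors copied from the table above.
coef = {"constant": 49.2, "treatment": -0.5, "policy_on": 1.5, "interaction": 2.4}
std_err = {"constant": 0.70, "treatment": 0.45, "policy_on": 0.44, "interaction": 0.49}


def predicted_mean(treatment, policy_on):
    """Predicted percentile score for one of the four treatment/period cells."""
    return (coef["constant"]
            + coef["treatment"] * treatment
            + coef["policy_on"] * policy_on
            + coef["interaction"] * treatment * policy_on)


# Keys are (treatment, policy_on) pairs.
cells = {(t, p): predicted_mean(t, p) for t in (0, 1) for p in (0, 1)}

# Difference-in-differences rebuilt from the four cell means; by
# construction it equals the interaction coefficient.
did_from_cells = (cells[(1, 1)] - cells[(1, 0)]) - (cells[(0, 1)] - cells[(0, 0)])

# Ratio of each estimate to its standard error.
t_ratios = {name: coef[name] / std_err[name] for name in coef}

for (t, p), mean in sorted(cells.items()):
    print(f"treatment={t}, PolicyOn={p}: {mean:.1f}")
print(f"DiD from cell means: {did_from_cells:.1f}")
print({name: round(value, 2) for name, value in t_ratios.items()})
```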
  1. Explain what each of the estimates in the table tells us, and state the overall estimated impact of the programme. How confident are you in the approach taken, and what steps could you take to check and potentially improve the robustness of the findings? (10 marks)
  2. You wish to perform a simple cost-benefit analysis to establish whether the intervention is worthwhile in terms of early career outcomes. Follow-up data show that, in each year from age 23 to age 27, individuals who undertook the reading programme earned £3,000 more than those who did not. The cost of the programme at age 11 was £2,000 per student. Was this a worthwhile investment for the government (assume a government discount rate of 3.5%)? Do you think there is anything else that should be considered in the cost-benefit analysis? (10 marks)
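The discounting step in a cost-benefit calculation of this kind can be sketched as below. Several choices here are assumptions rather than part of the question: benefits are discounted back to age 11, the £3,000 gain is received once a year at each age from 23 to 27 inclusive, the whole earnings gain is counted as the benefit, and the function name present_value is hypothetical.

```python
DISCOUNT_RATE = 0.035      # government discount rate given in the question
COST_PER_STUDENT = 2000    # paid at age 11
ANNUAL_GAIN = 3000         # extra earnings per year
PROGRAMME_AGE = 11
GAIN_AGES = range(23, 28)  # assumption: ages 23 to 27 inclusive, five payments


def present_value(amount, years_ahead, rate=DISCOUNT_RATE):
    """Value at the programme age of an amount received years_ahead later."""
    return amount / (1 + rate) ** years_ahead


pv_benefits = sum(
    present_value(ANNUAL_GAIN, age - PROGRAMME_AGE) for age in GAIN_AGES
)
net_present_value = pv_benefits - COST_PER_STUDENT

print(f"Present value of earnings gains: £{pv_benefits:,.0f}")
print(f"Net present value per student: £{net_present_value:,.0f}")
```

Changing any of the assumptions above (for example, counting only the tax paid on the extra earnings as the government's benefit) changes the result, so the sketch illustrates the method rather than a single correct figure.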