Problem Set 8: The Effect of Cognitive Behavioral Therapy on Crime and Violence

This problem set is due on November 29, 2023 at 11:59pm.

You can find instructions for obtaining and submitting problem sets here.

You can find the GitHub Classroom link to download the template repository on the Ed Board

Background

Can noncognitive skills and preferences like patience and identity be changed in adults? And can improving these skills lower crime and violence? A recent paper attempted to answer these questions with a randomized field experiment in Liberia. This exercise is based on:

Blattman, Christopher, Julian C. Jamison, and Margaret Sheridan. 2017. “Reducing Crime and Violence: Experimental Evidence from Cognitive Behavioral Therapy in Liberia.” American Economic Review, 107(4): 1165-1206.

In this experiment, the researchers randomly assigned two different treatments to all respondents. Half of respondents were randomly assigned to eight weeks of cognitive behavioral therapy designed to reduce self-destructive beliefs or behaviors and promote positive ones. The sessions focused on encouraging goal-setting and self-control with simple behaviors and then built up to dealing with more realistic situations while learning to control emotions. The goal was to promote noncognitive skills that might reduce tendencies toward criminal activity or violence. Note that not everyone assigned to treatment actually attended all of the meetings. The second treatment was a randomly assigned cash prize of $200. The outcomes of interest are index measures of anti-social behavior like theft, carrying weapons, or selling drugs. An index combines all of these different variables into one continuous measure. The researchers also have baseline measures of some of these variables.

A total of 999 respondents were randomized in the study. The data file for this study is data/cbt.csv and contains the following variables:

Name	Description
`partid`	participant ID number
`cashassigned`	participant assigned to cash treatment (1) or not (0)
`tpassigned`	participant assigned to CBT treatment (1) or not (0)
`attend_80`	did the participant attend 80% CBT meetings (1) or not (0)
`steals_basline`	has the participant committed theft in the last 2 weeks before treatment
`homeless_basline`	is the participant sleeping on the streets before treatment
`year_born`	year of birth of participant
`fam_asb_st`	index score of anti-social behaviors 2 weeks posttreatment
`carryweapon`	does participant carry a weapon, 2 weeks posttreatment
`fam_asb_lt`	index score of anti-social behaviors 12 months posttreatment

Question 1 (9 points)

Experiments are designed so that treatment assignment is unrelated to any background characteristics of the participants. But treatment and control groups will differ a bit by random chance even if they are the same on average. Let’s see if the randomization created groups that are fairly balanced by doing a balance test.

Load the tidyverse and infer packages and use read_csv to load the data into R and name it cbt. Create two new variables in the cbt tibble:

cbt_assigned that is "CBT" if tpassigned is 1 and "No CBT" otherwise.
unhoused_baseline that is "Unhoused" if homeless_baseline is 1 and "Housed" otherwise.

Use the specify/calculate workflow to calculate the difference in proportions in the baseline measure of sleeping on the streets (unhoused_baseline) between the CBT treatment group and the non-CBT group (cbt_assigned). Save this value as base_diff (it should be a 1x1 tibble).

Next, use the infer framework to conduct a two-sided permutation hypothesis test of whether the CBT treatment group has the same proportion of unhoused_baseline as the non-CBT group. Calculate the two-sided p-value using get_p_value() and save it as base_p (it should be a 1x1 tibble).

Report the observed difference and the p-value in the write-up. With an $\alpha$ of 0.05, can you reject the null hypothesis that the proportion sleeping on the streets is the same in the two groups? Given that decision, which type of error (type I vs type II) would be worried for this test?

NOTE: For this problem, set the seed to 02138 at the top of the chunk and use reps = 1000 when generating the permutations.

Rubric: 1pt for successfully compiling Rmd (autograder); 1pt for loading the data and creating the new variables (autograder); 2pts for the base_diff (autograder); 3pts for the base_p (autograder); 1pt for reporting the difference/p-value and the rejection or not decision (PDF); 1pts for correct error type (PDF).

Question 2 (8 Points)

A colleague suggests that you should use actual attendance at the CBT meetings rather than the random assignment since there shouldn’t be any treatment effect for those that don’t attend the meetings. To investigate this, you decide to do a balance test on the attendance variable that measures if a person actually attended 80% of the CBT meetings (attend_80). Create a new variable:

cbt_attended that is "Attended CBT" if attend_80 is 1 and "Not Attended" otherwise.

Calculate the difference in proportions in baseline sleeping on the streets (unhoused_baseline) for those that actually attended 80% of meetings versus those that did not (cbt_attended) and save this difference as base_diff_attend. Calculate the two-sided p-value for another permutation hypothesis for the null hypothesis that the true proportion of baseline sleeping on the street is the same across values of attend_80. Save this p-value as base_p_attend.

Report the observed difference and the p-value in the write-up. Is any imbalance that you find on this variable statistically significant (that is, can you reject the null hypothesis) if $\alpha$ is 0.05? Which type of error are we worried about here? What parameter of the hypothesis test can control this type of error?

NOTE: For this problem, set the seed to 02138 at the top of the chunk and use reps = 1000 when generating the permutations.

Rubric: 2pts for the base_diff_attend (autograder); 3pts for the base_p_attend (autograder); 1pt for reporting the difference/p-value and the rejection or not decision (PDF); 1pt for correct error type (PDF); 1pt for correctly identifying what parameter controls this error (PDF).

Question 3 (3 points)

Explain how the tests in questions 1 and 2 should inform whether it is better to use the CBT assignment or the actual attendance to the CBT meetings as the treatment variable when estimating causal effects. Justify your answer by discussing the benefits and disadvantages of using either treatment variable.

Rubric: 2pts for correctly identifying which variable should be used (PDF); 1pt for justification (PDF).

Question 4 (7 points)

Calculate the estimated ATE of the CBT treatment assignment on the short-term index measure of anti-social behavior (fam_asb_st) and save the estimate as ate. For this, you can either use the specify/calculate approach or the group_by/summarize approach, but in either case, your resulting tibble should contain the ATE somewhere.

Use the infer worklfow to generate 1000 simulations from the null distribution of a permutation hypothesis test of the null hypothesis that there is no treatment effect. Save this distribution as ate_null_dist. Plot this distribution of the difference in means under the null hypothesis along with the observed estimate. Explain, in substantive terms of this setting, what this distribution represents.

Calculate the two-sided p-value and save it as ate_p. Is the estimated effect statistically significant when alpha is 0.05?

NOTE: For this problem, set the seed to 02138 at the top of the chunk and use reps = 1000 when generating the permutations.

Rubric: 2pts for ate (autograder); 2pts for the ate_null_dist (autograder); 1pt for ate_p (autograder); 1pt for plot of the null distribution (PDF); 1pt for explanation of null distribution and reporting if null is rejected (PDF).

Question 5 (8 points)

Calculate the estimated ATE of the CBT treatment assignment on the long-term index measure of anti-social behavior (fam_asb_lt) and save the estimate as ate_lt. For this, you can either use the specify/calculate approach or the group_by/summarize approach, but in either case, your resulting tibble should contain the ATE somewhere.

Generate 1000 simulations from the null distribution of a permutation hypothesis test of the null hypothesis that there is no treatment effect. Save this distribution as ate_lt_null_dist. Plot this distribution of the difference in means under the null hypothesis along with the observed estimate and the shaded region for the p-value using visualize() and shade_p_value().

Calculate the two-sided p-value and save it as ate_lt_p. Is the estimated effect statistically significant when alpha is 0.05?

In the write up, report the ATEs calculated in questions 4 and 5, and describe what they mean for the persistence of the effect of CBT over time?

NOTE: For this problem, set the seed to 02138 at the top of the chunk and use reps = 1000 when generating the permutations.

Rubric: 2pts for ate_lt (autograder); 2pts for the ate_lt_null_dist (autograder); 1pt for ate_lt_p (autograder); 1pt for plot of the null distribution (PDF); 2pts for reporting if null is rejected and describing the persistence over time (PDF).