Lecture 2 – Association and Causality
DSC 10, Fall 2022
Announcements
Complete the Beginning of Quarter Survey.
Lab 1 is released and is due Saturday at 11:59PM to Gradescope.
Don't worry if it looks foreign. It's guided and Wednesday's class will help.
Post on EdStem or come to office hours for help!
If you have trouble accessing Gradescope, look at this thread on EdStem.
The first discussion section is tonight. Earn some extra credit and prepare for exams starting Week 1!
Make sure to complete the readings alongside each lecture.
Agenda
Association and causation.
Case study: London in 1854.
Confounding factors and randomized controlled experiments.
Association and causation
A link
The following headline, in Everyday Health, is about a review published in July 2020 in the European Journal of Preventive Cardiology.
Some terminology:
Individuals, study subjects, participants, units.
336,289 American, Swedish, and Australian adults in several studies.
Treatment.
Chocolate consumption.
Outcome.
Coronary artery disease, which causes heart attacks.
The first question
Is there any relation between chocolate consumption and heart disease?
Association is another term for "any relation" or "link".
Some data
Researchers examined [...] a total of 336,289 participants [...] which found that eating any kind of chocolate more than once per week was linked with an 8 percent reduced risk of coronary artery disease.
The second question
Does chocolate consumption lead to a reduction in heart disease?
This is called causation or a "causal" relation.
More headlines
Other headlines about the same research article:


Concept Check – Answer at cc.dsc10.com
What can you say about the relationship between chocolate consumption and a reduction in heart disease?
A. The data shows that there is an association and this is a causal link. Eating chocolate reduces the risk of heart disease.
B. The data shows evidence of an association but not causation.
C. The data doesn't necessarily show an association, as there could be another explanation for these results not considered here.
Case study: London in 1854
Miasmas, miasmatism, miasmatists
Miasma is a term for bad smells given off by waste and rotting matter.
At one point, miasmas were thought to be the main source of disease. Those who believed that miasmas caused disease were called miasmatists.
Suggested remedies for disease:
"Fly to clene air".
"A pocket full o' posies".
"Fire off barrels of gunpowder".
Staunch believers in miasmatism:
Florence Nightingale, founder of modern nursing.
Edwin Chadwick, Commissioner of the General Board of Health.
John Snow, 1813-1858
Not this Jon Snow...
Map of SoHo, London
Each bar represents a death by cholera. What do you notice?
Broad Street Pump
Now the site of a pub.
Establishing causation
S&V: dirty water.
Lambeth: clean water.

Comparison
Treatment group: does receive the treatment.
Control group: does not receive the treatment.
Concept Check – Answer at cc.dsc10.com
Which houses were part of the treatment group?
A. All houses in the region of overlap.
B. Houses served by S&V (dirty water) in the region of overlap.
C. Houses served by Lambeth (clean water) in the region of overlap.
Snow's "Grand Experiment"
"... there is no difference whatever in the houses or the people receiving the supply of the two Water Companies, or in any of the physical conditions with which they are surrounded ..."
In other words, the two groups were similar except for the treatment.
Concept Check – Answer at cc.dsc10.com
Snow collected this data:
Does dirty water cause cholera?
A. Yes, I think so.
B. No, I don't think so.
C. Maybe, I can't tell.
Key to establishing causality
If the treatment and control groups are similar apart from the treatment, then the differences between the outcomes in the two groups can be ascribed to the treatment.
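To illustrate comparing outcomes between two groups, the sketch below computes cholera death rates per 10,000 houses. The counts are the figures usually quoted from Snow's published table for S&V and Lambeth. They are not given in these notes, so treat them as an assumption added here. The function name `rate_per_10000` is made up for the example.

```python
def rate_per_10000(deaths, houses):
    """Deaths per 10,000 houses, so groups of different sizes are comparable."""
    return deaths / houses * 10_000

# Assumed figures (commonly quoted from Snow's table; not stated in these notes).
groups = {
    "S&V (dirty water)": {"houses": 40_046, "deaths": 1_263},
    "Lambeth (clean water)": {"houses": 26_107, "deaths": 98},
}

# Convert raw counts to rates before comparing the two groups.
rates = {
    name: rate_per_10000(g["deaths"], g["houses"])
    for name, g in groups.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per 10,000 houses")

# How many times higher the S&V rate is than the Lambeth rate.
ratio = rates["S&V (dirty water)"] / rates["Lambeth (clean water)"]
print(f"ratio of rates: {ratio:.1f}")
```

Under these assumed figures, the rate for S&V houses is more than eight times the rate for Lambeth houses. The gap can be ascribed to the water only because the two groups were otherwise similar.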
Confounding factors
Trouble
If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality.
Such differences are often present in observational studies.
In an observational study, participants self-select or naturally fall into groups. Not controlled and not random!
Are the outcomes different because of the treatment or because of other systematic differences? Hard to tell!
These other differences are called confounding factors (confounding means confusing).
Example: previously, it was widely accepted that coffee caused lung cancer. Why?
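One way to see how a confounding factor can manufacture an association is a small simulation. The sketch below is hypothetical: it assumes smoking is the confounder in the coffee example (smokers are more likely both to drink coffee and to develop lung cancer), and every probability in it is invented for illustration. In this simulated world, coffee has no effect on cancer at all.

```python
import random

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Invented probabilities; none of these come from real data.
P_SMOKER = 0.3
P_COFFEE = {True: 0.8, False: 0.4}    # P(drinks coffee), keyed by smoker status
P_CANCER = {True: 0.15, False: 0.01}  # P(lung cancer), keyed by smoker status

people = []
for _ in range(100_000):
    smoker = rng.random() < P_SMOKER
    coffee = rng.random() < P_COFFEE[smoker]
    cancer = rng.random() < P_CANCER[smoker]  # depends on smoking only
    people.append((smoker, coffee, cancer))

def cancer_rate(rows):
    """Fraction of the given (smoker, coffee, cancer) rows with cancer."""
    return sum(cancer for _, _, cancer in rows) / len(rows)

def coffee_gap(rows):
    """Cancer rate of coffee drinkers minus that of non-drinkers."""
    drinkers = [r for r in rows if r[1]]
    abstainers = [r for r in rows if not r[1]]
    return cancer_rate(drinkers) - cancer_rate(abstainers)

overall_gap = coffee_gap(people)
smoker_gap = coffee_gap([r for r in people if r[0]])
nonsmoker_gap = coffee_gap([r for r in people if not r[0]])

print(f"all people:  {overall_gap:+.3f}")
print(f"smokers:     {smoker_gap:+.3f}")
print(f"non-smokers: {nonsmoker_gap:+.3f}")
```

With these made-up probabilities, coffee drinkers as a whole show a clearly higher cancer rate, but the gap nearly disappears once smokers and non-smokers are compared separately. The association is real; the causal story is not.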
Randomize!
If you assign individuals to the treatment and control groups at random, then the two groups are likely to be similar apart from the treatment.
You can account, mathematically, for variability in the assignment.
Such an experiment is known as a randomized controlled experiment (or "randomized controlled trial" or RCT).
Question: suppose you have a population of 400 individuals. How would you randomly divide them into treatment and control groups of equal size?
One answer: write down each person's name (or unique identifier) on a ticket. Shuffle the 400 tickets and draw 200 of them. These individuals are in the treatment group; the rest are in the control group.
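The ticket procedure above can be sketched in Python. This is a minimal illustration rather than part of the original lecture: the identifiers `person_0` through `person_399`, the function name `random_split`, and the seed are all made up for the example.

```python
import random

def random_split(individuals, seed=None):
    """Randomly divide individuals into two equal-size groups.

    Mirrors the ticket procedure: shuffle all the tickets, then
    draw the first half as the treatment group.
    """
    rng = random.Random(seed)    # the seed only makes the example reproducible
    tickets = list(individuals)  # copy so the caller's list is untouched
    rng.shuffle(tickets)         # every ordering is equally likely
    half = len(tickets) // 2
    return tickets[:half], tickets[half:]

# Hypothetical identifiers for the 400 individuals.
population = [f"person_{i}" for i in range(400)]
treatment, control = random_split(population, seed=10)

print(len(treatment), len(control))  # 200 200
```

Every individual lands in exactly one group, and a chance process, not the researcher or the participant, decides which one.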
Careful...
Regardless of what the dictionary says...
In probability theory, random ≠ haphazard!
Concept Check – Answer at cc.dsc10.com
Which of these questions would we not be able to answer by setting up a randomized controlled trial?
A. Does daily meditation reduce anxiety?
B. Does playing video games increase aggressive behavior?
C. Does smoking cigarettes cause weight loss?
D. Does early exposure to classical music increase a person's IQ?
Ethical and practical limitations of establishing causality
Summary: cause and effect
Comparison
Group by some treatment and measure some outcome.
Simplest setting: a treatment group and a control group.
If the outcome differs between these two groups, that's evidence of an association (or relation).
E.g., the chocolate eaters have lower rates of heart disease.
If, in addition, the two groups are similar in all ways but the treatment, differences in the outcome can be ascribed to the treatment. This is causation.
E.g., two groups of London residents are similar in all ways besides the water they drink. If one group develops cholera more than the other, it's because of the water.
Confounding
If the treatment and control groups have systematic differences other than the treatment itself, then it's hard to identify a causal link.
Such systematic differences are called confounding factors.
Confounding factors are often present in observational studies.
Observational study: the researcher does not choose which subjects receive the treatment.
Controlled experiment: the researcher designs a procedure for selecting the treatment and control groups. Usually this procedure involves randomization.
Randomize!
When subjects are split up randomly, it's unlikely that there will be systematic differences between the groups.
And it's possible to account for the chance of a difference.
Therefore, randomized controlled experiments are the most reliable way to establish causal relations.
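A quick simulation can illustrate why random assignment tends to balance the groups. The sketch is hypothetical: it invents a population of 400 in which 30% have some trait the researcher never measured, repeats the random split many times, and records how far apart the two groups' trait rates end up. The function name and all numbers are assumptions made for the example.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: True marks an unmeasured trait (120 of 400 have it).
population = [True] * 120 + [False] * 280

def trait_gap_after_random_split(people):
    """Randomly split people in half; return treatment minus control trait rate."""
    shuffled = people[:]  # copy so the original order is preserved
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    return sum(treatment) / len(treatment) - sum(control) / len(control)

# Repeat the random assignment many times to see how the gap varies by chance.
gaps = [trait_gap_after_random_split(population) for _ in range(2_000)]

average_gap = sum(gaps) / len(gaps)
share_small = sum(abs(g) <= 0.10 for g in gaps) / len(gaps)

print(f"average gap across splits: {average_gap:+.4f}")
print(f"share of splits with gap within 10 points: {share_small:.1%}")
```

The gaps center on zero and large imbalances are rare. The spread of the gaps across repeated splits is one way to picture what it means to account for the chance of a difference.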
Next time
On Wednesday, we'll switch gears and start programming in Python.
Further reading: The Medical Detective: John Snow, Cholera and the Mystery of the Broad Street Pump
Field trip
