GitHub Repository: dsc-courses/dsc10-2022-fa
Path: blob/main/lectures/lec02/lec02.ipynb
Kernel: Python 3 (ipykernel)

Lecture 2 – Association and Causality

DSC 10, Fall 2022

Announcements

  • Check out the Syllabus on the course website, dsc10.com.

  • Complete the Beginning of Quarter Survey.

  • Lab 1 is released and is due Saturday at 11:59PM on Gradescope.

    • Don't worry if it looks foreign. It's guided and Wednesday's class will help.

    • Post on EdStem or come to office hours for help!

    • If you have trouble accessing Gradescope, look at this thread on EdStem.

  • The first discussion section is tonight. Earn some extra credit and prepare for exams starting Week 1! 💯

  • Make sure to complete the readings alongside each lecture.

Agenda

  • Association and causation.

  • Case study: London in 1854.

  • Confounding factors and randomized controlled experiments.

Association and causation

The following headline, in Everyday Health, is about a review published in July 2020 in the European Journal of Preventive Cardiology.

Some terminology:

  • Individuals, study subjects, participants, units.

    • 336,289 American, Swedish, and Australian adults in several studies.

  • Treatment.

    • Chocolate consumption 🍫.

  • Outcome.

    • Coronary artery disease, which causes heart attacks ❤️.

The first question

Is there any relation between chocolate consumption 🍫 and heart disease ❤️?

Association is another term for "any relation" or "link" 🔗.

Some data

Researchers examined [...] a total of 336,289 participants [...] which found that eating any kind of chocolate more than once per week was linked with an 8 percent reduced risk of coronary artery disease.
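To make the "8 percent reduced risk" concrete, here is a minimal sketch of one common way such a figure is computed, a relative risk reduction. The quote does not give the underlying rates or say exactly how the review arrived at its number, so the two risks below and the function name `relative_risk_reduction` are hypothetical, chosen only so the arithmetic lands on 8 percent.

```python
def relative_risk_reduction(risk_treated, risk_untreated):
    """Fraction by which risk is lower in the treated group than in the untreated group."""
    return 1 - risk_treated / risk_untreated

# Hypothetical risks (NOT from the review), picked only to illustrate the arithmetic.
risk_frequent = 0.046  # assumed proportion with heart disease among frequent chocolate eaters
risk_rare = 0.050      # assumed proportion among people who rarely eat chocolate

rrr = relative_risk_reduction(risk_frequent, risk_rare)
print(f"Relative risk reduction: {rrr:.0%}")
```

A number like this only compares two groups. On its own, it describes an association.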

The second question

Does chocolate consumption 🍫 lead to a reduction in heart disease ❤️?

This is called causation or a "causal" relation.

More headlines

Other headlines about the same research article:

Concept Check ✅ – Answer at cc.dsc10.com

What can you say about the relationship between chocolate consumption 🍫 and a reduction in heart disease ❤️?

A. The data shows that there is an association and this is a causal link. Eating chocolate reduces the risk of heart disease.

B. The data shows evidence of an association but not causation.

C. The data doesn't necessarily show an association, as there could be another explanation for these results not considered here.

Case study: London in 1854

Miasmas, miasmatism, miasmatists

  • Miasma is a term for bad smells 👃 given off by waste and rotting matter.

  • At one point, miasmas were thought to be the main source of disease. Those who believed that miasmas caused disease were called miasmatists.

  • Suggested remedies for disease:

    • “Fly to clene air”. ✈️

    • “A pocket full o’posies”. 💐

    • “Fire off barrels of gunpowder”. 🎆

  • Staunch believers in miasmatism:

    • Florence Nightingale, founder of modern nursing. 👩‍⚕️

    • Edwin Chadwick, Commissioner of the General Board of Health.

John Snow, 1813-1858 ❄️

Not this Jon Snow...

Map of SoHo, London

Each bar represents a death by cholera. What do you notice?

from IPython.display import HTML
HTML('images/snow_map.html')

Broad Street Pump

Now the site of a pub 🍻.

Establishing causation

  • S&V (the Southwark and Vauxhall water company): dirty water.

  • Lambeth: clean water.

Comparison ⚖️

  • Treatment group: does receive the treatment.

  • Control group: does not receive the treatment.

Concept Check ✅ – Answer at cc.dsc10.com

Which houses 🏠 were part of the treatment group?

A. All houses in the region of overlap.

B. Houses served by S&V (dirty water) in the region of overlap.

C. Houses served by Lambeth (clean water) in the region of overlap.

Snow's "Grand Experiment"

“… there is no difference whatever in the houses or the people receiving the supply of the two Water Companies, or in any of the physical conditions with which they are surrounded …”

In other words, the two groups were similar except for the treatment.

Concept Check ✅ – Answer at cc.dsc10.com

Snow collected this data:

Does dirty water cause cholera?

A. Yes ✔️, I think so.

B. No ❌, I don't think so.

C. Maybe ❔, I can't tell.

Key to establishing causality 🗝️

If the treatment and control groups are similar apart from the treatment, then the differences between the outcomes in the two groups can be ascribed to the treatment.

Confounding factors

Trouble ⚠️

If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality.

  • Such differences are often present in observational studies. 👀

  • In an observational study, participants self-select or naturally fall into groups. Not controlled and not random!

  • Are the outcomes different because of the treatment or because of other systematic differences? 😕 Hard to tell!

  • These other differences are called confounding factors (confounding means confusing).

  • Example: previously, it was widely accepted that coffee ☕ caused lung cancer. Why?
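One commonly cited explanation for the coffee example is smoking: coffee drinkers were more likely to smoke, and smoking causes lung cancer. The sketch below simulates that situation to show how a confounding factor can create an association with no causal link behind it. Every probability in it is made up for illustration, and the function names `simulate_confounding` and `cancer_rate` are hypothetical; in this simulated world, coffee has no effect on cancer at all.

```python
import random

def simulate_confounding(n=100_000, seed=10):
    """Simulate people whose smoking affects both coffee drinking and cancer.

    Coffee never affects cancer here. All probabilities are invented.
    Returns a dict mapping (smoker, coffee) -> [number of people, number of cancer cases].
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumed confounding: smokers are more likely to drink coffee.
        coffee = rng.random() < (0.8 if smoker else 0.4)
        # Cancer risk depends only on smoking, not on coffee.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        cell = counts.setdefault((smoker, coffee), [0, 0])
        cell[0] += 1
        cell[1] += cancer
    return counts

def cancer_rate(counts, coffee, smoker=None):
    """Cancer rate for one coffee status, optionally restricted to one smoking status."""
    people = cases = 0
    for (s, c), (n_people, n_cases) in counts.items():
        if c == coffee and (smoker is None or s == smoker):
            people += n_people
            cases += n_cases
    return cases / people

counts = simulate_confounding()
print("Everyone:         coffee", cancer_rate(counts, True),
      "vs. no coffee", cancer_rate(counts, False))
print("Non-smokers only: coffee", cancer_rate(counts, True, smoker=False),
      "vs. no coffee", cancer_rate(counts, False, smoker=False))
```

Across everyone, coffee drinkers show a noticeably higher cancer rate. Once the comparison is restricted to non-smokers, the gap nearly disappears, because the difference came from smoking rather than coffee.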

Randomize! 🎲

  • If you assign individuals to the treatment and control groups at random, then the two groups are likely to be similar apart from the treatment.

  • You can account – mathematically – for variability in the assignment.

  • Such an experiment is known as a randomized controlled experiment (or "randomized controlled trial" or RCT).

  • Question: suppose you have a population of 400 individuals. How would you randomly divide them into treatment and control groups of equal size?

  • One answer: write down each person's name (or unique identifier) on a ticket. Shuffle the 400 tickets and draw 200 of them. These individuals are in the treatment group; the rest are in the control group.
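The ticket procedure above can be sketched in a few lines of Python. This is one possible implementation using the standard library's `random` module; the function name `random_assignment`, the `person_i` identifiers, and the seed value are placeholders, not part of the lecture.

```python
import random

def random_assignment(individuals, seed=None):
    """Shuffle the individuals (the "tickets") and split them into two equal-sized groups."""
    rng = random.Random(seed)   # a seed makes the shuffle reproducible
    tickets = list(individuals)
    rng.shuffle(tickets)
    half = len(tickets) // 2
    # The first half drawn is the treatment group; the rest are the control group.
    return tickets[:half], tickets[half:]

# Placeholder identifiers standing in for 400 real names.
population = [f"person_{i}" for i in range(400)]
treatment, control = random_assignment(population, seed=2022)
print(len(treatment), len(control))  # 200 200
```

Every individual lands in exactly one group, and which group is decided by the shuffle alone, not by anything about the individual.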

Careful...

Regardless of what the dictionary says...

In probability theory, random ≠ haphazard!

Concept Check ✅ – Answer at cc.dsc10.com

Which of these questions would we not be able to answer by setting up a randomized controlled trial?

A. Does daily meditation 😌 reduce anxiety?

B. Does playing video games 🎮 increase aggressive behavior?

C. Does smoking cigarettes 🚬 cause weight loss?

D. Does early exposure to classical music 🎻 increase a person’s IQ?

Ethical and practical limitations of establishing causality

Summary: cause and effect

Comparison ⚖️

  • Group by some treatment and measure some outcome.

  • Simplest setting: a treatment group and a control group.

  • If the outcome differs between these two groups, that's evidence of an association (or relation).

    • E.g., the chocolate eaters have lower rates of heart disease.

  • If, in addition, the two groups are similar in all ways but the treatment, differences in the outcome can be ascribed to the treatment. This is causation.

    • E.g., two groups of London residents are similar in all ways besides the water they drink. If one group develops cholera more than the other, it's because of the water.

Confounding 😕

  • If the treatment and control groups have systematic differences other than the treatment itself, then it's hard to identify a causal link.

  • Such systematic differences are called confounding factors.

  • Confounding factors are often present in observational studies.

    • Observational study: the researcher does not choose which subjects receive the treatment.

    • Controlled experiment: the researcher designs a procedure for selecting the treatment and control groups. Usually this procedure involves randomization.

Randomize! 🎲

  • When subjects are split up randomly, it's unlikely that there will be systematic differences between the groups.

  • And it's possible to account for the chance of a difference.

  • Therefore, randomized controlled experiments are the most reliable way to establish causal relations.
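To see why random splits rarely produce systematic differences, and what "accounting for the chance of a difference" can look like, here is a rough simulation. It assumes a hypothetical population of 400 in which 120 individuals share some pre-existing trait, repeats the random split many times, and records how far apart the two groups end up on that trait. The numbers, the 10-point threshold, and the function name `trait_difference` are all illustrative assumptions.

```python
import random

def trait_difference(traits, rng):
    """Randomly split the population in half and return the difference in the
    trait's proportion between the groups (treatment minus control)."""
    shuffled = traits[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    return sum(treatment) / len(treatment) - sum(control) / len(control)

rng = random.Random(10)
traits = [1] * 120 + [0] * 280  # hypothetical: 1 means the individual has the trait
differences = [trait_difference(traits, rng) for _ in range(2_000)]

average = sum(differences) / len(differences)
large = sum(abs(d) > 0.10 for d in differences) / len(differences)
print(f"Average difference across splits: {average:.4f}")
print(f"Share of splits with a gap above 10 percentage points: {large:.3f}")
```

The differences cluster around zero, and large gaps are rare. Because the split is random, how often a gap of a given size arises by chance can be estimated in exactly this way.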

Next time

On Wednesday, we'll switch gears and start programming 💻 in Python 🐍.


Further reading 📖: The Medical Detective: John Snow, Cholera and the Mystery of the Broad Street Pump

Field trip ✈️