Chapter 5 Statistical Reasoning

5A Fundamentals of Statistics

5B  Should You Believe a Statistical Study?

5E  Correlation and Causality

5A Fundamentals of Statistics

The term statistics can represent either

1)   the science (theory and methodology) of collecting, organizing, and interpreting data or
2)   the actual descriptive or summary data.

Statistical Study
 

Population:  The set about which the study attempts to draw conclusions.

Sample: A subset of the population that is actually studied.

The study gathers raw data from the sample.

Sample statistics are calculated from the raw data.  (An example of a sample statistic is the mean of the sample.)

Information about the population (population parameters) is inferred (induced) from the sample statistics.  For example, from the sample mean we infer a range within which the population mean should fall.
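As a minimal sketch of the step from raw data to sample statistics, the Python snippet below uses the standard `statistics` module.  The sleep values are made up for illustration and do not come from any real study.

```python
import statistics

# Hypothetical raw data: hours of sleep reported by a sample of 8 students.
sleep_hours = [6, 7, 7, 8, 5, 7, 6, 9]

sample_mean = statistics.mean(sleep_hours)      # average of the sample
sample_median = statistics.median(sleep_hours)  # middle value of the sorted data
sample_mode = statistics.mode(sleep_hours)      # most frequent value

print(sample_mean, sample_median, sample_mode)
```

These three numbers describe only the sample; inferring the corresponding population parameters is a separate step.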

Steps in a statistical study 

1.  Determine the goal and the population.
2.  Choose an appropriately sized sample from the population and collect raw data.
3.  Analyze the data and calculate sample statistics (mean, median, mode, etc.).
4.  Make inferences about the population (population parameters) based on sample results.
5.  Evaluate the likelihood (probability) that inferred population parameters reflect the true population parameters and draw conclusions.

Example:  Suppose you wanted to investigate sleeping patterns of students at SJC.  How would the steps above be applied?


 

Impractical to study entire population.
Must select a truly representative sample.
Types of Sampling:

1.  Random sampling is generally the best -- each member of the population has an equal chance of being selected.  For example, we can get a random sample by numbering the population, generating random numbers, and selecting the members of the population whose numbers match.

2.  Systematic sampling:  Use a simple rule, such as choosing every 10th member of the population.

3.  Convenience sampling:  Select a sample that is easy to obtain.  Example:  Poll this class to determine snack preferences of SJC students.

4.  Stratified sampling:  Identify subgroups within a population and then draw random samples within each subgroup to generate the total sample.
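The sampling methods above can be sketched in Python roughly as follows.  The function names, the population of ID numbers, and the two strata are all hypothetical, and the stratified version assumes the same number of members is drawn from every subgroup, which is only one possible design.  Convenience sampling is omitted because it involves no procedure beyond taking whoever is easiest to reach.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Random sampling: every member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, start=0):
    """Systematic sampling: take every `step`-th member, beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified sampling: draw a random sample within each subgroup, then combine."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: student ID numbers 1 through 100.
population = list(range(1, 101))
# Hypothetical subgroups (strata) within that population.
strata = {"first_year": population[:50], "upper_class": population[50:]}

random_sample = simple_random_sample(population, 10, seed=1)
every_tenth = systematic_sample(population, 10)
stratified = stratified_sample(strata, 5, seed=1)
```

Passing a `seed` makes a run repeatable, which is convenient for checking work; a real study would not choose the seed to get a preferred sample.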

Watching out for bias:  Bias is any problem in the design or implementation of a statistical study that tends to favor certain results, so that the inferred values do not reflect the true population parameters.  It is often difficult to get a representative sample.  Biased sample:  One that is not representative of the population.  Example:  To study the SJC student population, we sample one math class.  Why is this not a representative sample?

Two types of statistical studies:  Observations and Experiments
 

Observational Studies:  Observe and measure characteristics of the sample, but do not try to influence its members.  Every member is treated identically.

Controlled Experiments:  Researchers create two or more groups for the sample.  One group receives experimental treatment; the other group (control group) does not.
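One common way to form the groups, assumed here rather than stated in these notes, is to assign participants at random.  The sketch below splits a hypothetical list of participant labels into a treatment group and a control group.

```python
import random

def assign_groups(participants, seed=None):
    """Shuffle the participants and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant labels P1 through P20.
participants = [f"P{i}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=42)
```

Random assignment means neither the researchers nor the participants decide who lands in which group.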

Placebo Effect and Blinding

Experiments on people raise potential problems.  Placebo effect:  Patients improve because they think they are receiving a useful treatment.
Remedy:  Give the control group a placebo, which contains no active ingredient.
Single-blind experiment:  Participants don't know which group they belong to (but the experimenters do).
Double-blind experiment:  Neither the researchers conducting the experiment nor the participants know which group participants belong to -- the researchers then will not be biased.

Case-control study:  Participants fall into groups through their own behavior rather than by assignment.  Those who engage in the behavior are the cases;  those who do not are the controls.  This may be the only ethical choice in some situations (example:  the effect of alcohol on pregnancy).  A case-control study is an example of an observational study.

Confidence Interval and Margin of Error

Margin of error:  For a population parameter estimated by a survey or other statistical study, the margin of error defines a range around the estimate within which there is a 95% chance that the true population parameter lies.  This range is the confidence interval.

Example:  If a poll predicts that 43% of the population supports universal health insurance with a margin of error of +/- 4%, it means that there is a 95% chance that between 39% and 47% of the population supports universal health insurance.
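The arithmetic in the poll example can be written out as a short Python sketch.  The function name is hypothetical; the numbers are the ones from the example above.

```python
def confidence_interval(estimate, margin_of_error):
    """Return the (low, high) range: the estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error

# Poll result of 43% with a margin of error of 4 percentage points.
low, high = confidence_interval(43, 4)
print(f"{low}% to {high}%")  # 39% to 47%
```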

5B  Should You Believe a Statistical Study?

Text gives 8 guidelines:

1.  Identify the goal, type and population.

(Must understand this before you can evaluate quality of study.)
Is the type of study appropriate to the goal?  Is the population well-defined and appropriate for the goal?

2.  Consider the source of the experiment.

Who funded and conducted the study?  Could the researchers or funders benefit from particular results, creating a potential for bias?  Example:  Tobacco Research Institute.

3. Look for bias in the sample.

Two common types:
Selection bias -- researchers' method of selection has natural bias.
Participation bias -- in voluntary studies, people who feel strongly about the issue may be more likely to volunteer.


4. Look for difficulties in defining quantities of interest.

Is that which is to be measured well-defined in the study?
Example:  Level of unemployment -- is unemployed appropriately defined?
Example:  How do you measure amount of exercise -- by time, by vigorousness?
Example 7 -- Illegal Drug Supply.

5.  Look for confounding variables:  Confounding variables are variables that are not intended to be part of the study but confuse the study's results.

Example 8:  Radon and Lung Cancer
SJC sleep patterns:  Students who get more sleep have higher GPAs.  Conclusion (?)  Sleep benefits GPA?


6.  If a survey is involved, consider its setting and wording.

Telephone survey?
Mail survey?

Availability Errors:  The setting places thoughts in participants' minds, which influence their responses to the survey.

Wording of questions in survey: Example 9:  Traffic vs. Air Pollution survey.

7.  Check for consistency between results and interpretation.

Example 11:  School Board in Boulder, Colorado.

8.  Consider the conclusions -- do they seem meaningful?

Example 12:  Weight loss of 1/2 pound more on Fast Diet Supplement  -- insignificant?


 
 
5E  Correlation and Causality

Establishing Causality: Does smoking cause lung cancer?  Does studying cause good grades?


Correlation:  A relationship between two variables.

No correlation (coefficient of correlation is close to zero):  No apparent relationship between the variables.

Positive:  Both variables tend to increase or decrease together.

Negative:  Variables tend to change in opposite directions -- as one increases, the other tends to decrease.

Strength may vary -- values for the coefficient of correlation close to +1 or -1 indicate a strong correlation, while values near 0.5 or -0.5 indicate a moderate correlation.
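To make the coefficient of correlation concrete, here is a minimal Python sketch.  It assumes the coefficient meant here is the usual Pearson correlation coefficient, and all three data lists are invented for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (co-variation of x and y).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 80, 88]   # rises with hours studied
hours_of_tv = [5, 4, 3, 2, 1]       # falls as hours studied rises

r_pos = correlation_coefficient(hours_studied, exam_score)   # close to +1
r_neg = correlation_coefficient(hours_studied, hours_of_tv)  # close to -1
```

A value near +1 or -1 only measures how closely the two variables move together; it says nothing by itself about whether one causes the other.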

Possible explanations for a correlation:
1.  Coincidence.  Example:  A rise in the stock market correlates with the Super Bowl winner being the team from the city that comes later in the alphabet.
2.  Common underlying cause.  Example:  Infant deaths and longevity (figure 2.13).
3.  One variable is the cause of the other variable.  Most important -- difficult to establish.

Possible types of cause for an effect:
1.  Cause is necessary if the effect cannot happen in its absence.
2.  Cause is sufficient if the effect always happens when the cause occurs.
3.  Cause is probabilistic if it is neither necessary nor sufficient, but increases the probability (likelihood) of the effect.
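The three definitions above can be checked against a set of observations, as in the rough Python sketch below.  The function name and the observations are hypothetical, and a finite set of observations can only suggest a classification, not prove it.

```python
def classify_cause(observations):
    """Classify a suspected cause from (cause_present, effect_present) pairs."""
    with_cause = [effect for cause, effect in observations if cause]
    without_cause = [effect for cause, effect in observations if not cause]
    if not with_cause or not without_cause:
        raise ValueError("need observations both with and without the cause")

    # Necessary: the effect never happens when the cause is absent.
    necessary = not any(without_cause)
    # Sufficient: the effect always happens when the cause is present.
    sufficient = all(with_cause)
    # Probabilistic: neither of the above, but the effect is more frequent with the cause.
    rate_with = sum(with_cause) / len(with_cause)
    rate_without = sum(without_cause) / len(without_cause)
    probabilistic = (not necessary) and (not sufficient) and rate_with > rate_without

    return {"necessary": necessary, "sufficient": sufficient, "probabilistic": probabilistic}

# Hypothetical observations: (cause present?, effect present?)
observations = [
    (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]
result = classify_cause(observations)
```

In this made-up data the effect sometimes occurs without the cause and sometimes fails to occur with it, so the cause is classified as probabilistic.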
 

Some Guidelines for Establishing Causality:

John Stuart Mill's Guidelines:  Given an effect, what is the cause?
1.  Method of agreement:  Look to see which possible factors are present whenever the effect is present.
2.  Method of difference:  Look for factors whose absence eliminates the effect.  When the effect is absent, what factors are also absent?
3.  Method of concomitant variation:  Look for a quantitative relationship between the suspected causal factor and the effect.  Does increasing/decreasing the amount of the causal factor increase/decrease the effect?
4.  Method of residues:  Subtract the other known factors in order to test what is left as a possible cause.

Try to develop a physical model for the causality.  Explain using science how the factor causes the effect.

Three Legal Levels of Confidence in Causality
Possible cause:  Correlation discovered but no causality established yet.
Probable cause:  Good reason to suspect that the correlation is causal.  (Required for search warrants.)
Cause beyond reasonable doubt:  Model is so strong that it seems unreasonable to doubt the connection.

Case Study 1: Air bags and child safety.
Case Study 2:  Global Warming
Case Study 3:  The Ozone Hole and CFCs