The term statistics can represent either
1) the science (theory and methodology) of collecting, organizing, and interpreting data, or
2) the actual descriptive or summary data.
Statistical Study
Population: the set that the study attempts to draw conclusions about.
Sample: A subset of the population
that is actually studied.
The study gathers raw data from the sample.
Sample statistics are calculated from the raw data. (An example of a sample statistic is the mean of the sample.)
Information about the population (population parameters) is inferred from the sample statistics. For example, from the sample mean we infer a range within which the population mean is likely to fall.
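The relationship between a sample statistic and a population parameter can be sketched with a simulation. This minimal Python sketch assumes a hypothetical population of 5,000 simulated nightly sleep times (not real data): it draws a simple random sample of 100, computes the sample mean, and compares it with the population mean, which a real study could not observe directly.

```python
import random
import statistics

# Hypothetical population: nightly sleep hours for 5,000 students.
# These values are simulated for illustration; they are not real SJC data.
rng = random.Random(1)
population = [rng.gauss(7.0, 1.2) for _ in range(5000)]

# Population parameter: in a real study this value is unknown.
population_mean = statistics.mean(population)

# Simple random sample of 100 students, and the sample statistic computed from it.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

Rerunning with a different seed gives a slightly different sample mean each time; that sample-to-sample variation is what a margin of error describes.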
Steps in a statistical study
1. Determine the goal of the study and the population to be studied.
2. Choose an appropriately sized sample from the population and collect raw data.
3. Analyze the data and calculate sample statistics (mean, median, mode, etc.).
4. Make inferences about the population (population parameters) based on the sample results.
5. Evaluate the likelihood (probability) that the inferred population parameters reflect the true population parameters, and draw conclusions.
Example: Suppose you wanted to
investigate sleeping patterns of students at SJC. How would the steps
above be applied?
It is impractical to study the entire population, so we must select a truly representative sample.
Types of Sampling:
1. Random sampling is generally the best: each member of the population has an equal chance of being selected. For example, we can get a random sample by numbering the members of the population, generating random numbers, and selecting the members whose numbers match.
2. Systematic sampling: Use a systematic rule, such as choosing every 10th member of the population.
3. Convenience sampling: Select a sample that is easy to obtain, for example, polling this class to determine the snack preferences of SJC students.
4. Stratified sampling: Identify subgroups within the population, then draw a random sample within each subgroup; together these form the total sample.
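The four sampling methods can be sketched as small Python functions. This is an illustrative sketch assuming a hypothetical population of 100 numbered students split into two made-up strata; the random starting point in `systematic_sample` is a common choice but is an assumption, not something the notes specify.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random starting point among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def convenience_sample(population, n):
    """Take whoever is easiest to reach; here, simply the first n members (prone to bias)."""
    return population[:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample within each subgroup and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(42)
# Hypothetical population of 100 numbered students, split into two class-year strata.
students = list(range(1, 101))
strata = {"first-year": students[:50], "upper-class": students[50:]}

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 10, rng))
print(convenience_sample(students, 5))
print(stratified_sample(strata, 3, rng))
```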
Watching out for bias:
Bias is any problem in the design or implementation of a statistical study that tends to favor certain results (i.e., the study does not correctly infer the true population parameters). It is often difficult to get a representative sample.
Biased sample: one that is not representative of the population.
Example: The population is all SJC students, but the sample is a single math class. Why is this not a representative sample?
Two types of statistical studies: observational studies and experiments
Observational Studies: Researchers observe and measure characteristics of the sample but do not try to influence the members. Every member is treated identically.
Controlled Experiments: Researchers divide the sample into two or more groups. One group (the treatment group) receives the experimental treatment; the other group (the control group) does not.
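The notes do not say how the groups are formed; random assignment is a standard way to make the groups comparable. A minimal Python sketch, using hypothetical participant IDs and a made-up helper name `assign_groups`:

```python
import random

def assign_groups(participants, rng):
    """Randomly split participants into a treatment group and a control group of (nearly) equal size."""
    shuffled = participants[:]  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

rng = random.Random(7)
participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participant IDs
groups = assign_groups(participants, rng)
print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Because chance alone decides who gets the treatment, differences between the groups (other than the treatment) should tend to even out.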
Placebo Effect and Blinding
The placebo effect is a potential problem in experiments involving people: patients improve simply because they think they are receiving a useful treatment. To account for it, give the control group a placebo, which contains no active ingredient.
Single-blind experiment: participants don't know which group they belong to (but the experimenters do).
Double-blind experiment: neither the researchers conducting the experiment nor the participants know which group each participant belongs to, so the researchers will not bias the results.
Case-control study: participants effectively choose their own group through their behavior. Those who engage in the behavior under study are the cases; those who do not are the controls. This may be the only ethical option in some situations (e.g., studying the effect of alcohol on pregnancy). A case-control study is a type of observational study.
Confidence Interval and Margin of Error
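The margin of error for a population parameter estimated by a survey or other statistical study defines a range around the sample estimate within which we can be 95% confident the true population parameter lies. This range, from (estimate - margin of error) to (estimate + margin of error), is the 95% confidence interval.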
Example: If a poll finds that 43% of the population supports universal health insurance, with a margin of error of +/- 4%, then the 95% confidence interval is 39% to 47%: we can be 95% confident that between 39% and 47% of the population supports universal health insurance.
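The notes do not give a formula for the margin of error. For a proportion estimated from a simple random sample, a widely used approximation is margin of error = 1.96 * sqrt(p(1 - p)/n), where p is the sample proportion and n the sample size. The sketch below applies it to the poll example, assuming a hypothetical sample of n = 600 people (the poll's actual sample size is not given).

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, z=1.96):
    """Return (low, high): the estimate minus and plus the margin of error."""
    e = margin_of_error(p_hat, n, z)
    return p_hat - e, p_hat + e

# Poll from the example: 43% support; n = 600 is a hypothetical sample size.
low, high = confidence_interval(0.43, 600)
print(f"margin of error: {margin_of_error(0.43, 600):.3f}")  # about 0.040
print(f"95% confidence interval: {low:.2f} to {high:.2f}")  # about 0.39 to 0.47
```

With n = 600 the margin comes out to about 4 percentage points, matching the example. A rougher rule of thumb, margin of error = 1/sqrt(n), gives a similar value (1/sqrt(600) is about 0.041).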
The text gives eight guidelines for evaluating a statistical study:
1. Identify the goal, type, and population of the study.
(You must understand these before you can evaluate the quality of the study.)
Is the type of study appropriate to the goal? Is the population well defined and appropriate for the goal?
2. Consider the source of the study.
Who funded it, and who conducted it? Is there potential for bias because the researchers or funders stand to benefit from certain results? Example: the Tobacco Research Institute.
3. Look for bias in the sample.
Two common types:
Selection bias: the researchers' method of selecting the sample is inherently biased.
Participation bias: in voluntary studies, people who feel strongly about the issue are more likely to volunteer.
4. Look for difficulties in defining the quantities of interest.
Is the quantity being measured well defined in the study?
Example: Level of unemployment. Is "unemployed" appropriately defined?
Example: How do you measure the amount of exercise: by time, or by vigorousness?
Example 7: Illegal Drug Supply.
5. Look for confounding variables: variables that are not intended to be part of the study but can confuse its results.
Example 8: Radon and Lung Cancer
SJC sleep patterns: Students who get more sleep have higher GPAs. Can we conclude that sleep benefits GPA?
6. If a survey is involved, consider its setting and wording.
Was it a telephone survey? A mail survey?
Availability errors: the setting places thoughts in participants' minds that influence their responses to the survey.
Wording of survey questions: Example 9: Traffic vs. Air Pollution survey.
7. Check for consistency between results and interpretation.
Example 11: School Board in Boulder, Colorado.
8. Consider the conclusions: do they seem meaningful?
Example 12: Weight loss of 1/2 pound more on the Fast Diet Supplement. Is such a small difference meaningful?
Establishing Causality:
Does smoking cause lung cancer? Does studying cause good grades?
Correlation: a relationship between two variables.
No correlation (coefficient of correlation close to zero): no apparent relationship between the variables.
Positive correlation: both variables tend to increase or decrease together.
Negative correlation: the variables tend to change in opposite directions.
Strength may vary: values of the coefficient of correlation close to +1 or -1 indicate a strong correlation, while values near 0.5 or -0.5 indicate a moderate correlation.
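The coefficient of correlation described above is usually the Pearson correlation coefficient r (the notes do not name it, so this is an assumption). A minimal Python sketch computing it from scratch, with small made-up data sets showing positive, negative, and no correlation:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 2))   # 0.77 (positive, fairly strong)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
print(round(pearson_r(x, [4, 2, 3, 2, 4]), 2))   # 0.0 (no linear correlation)
```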
Possible explanations for a correlation:
1. Coincidence. Example: a rise in the stock market correlates with the Super Bowl winner's city coming later in the alphabet than the loser's.
2. Common underlying cause. Example: infant deaths and longevity (Figure 2.13).
3. One variable is the cause of the other. This is the most important explanation, but it is difficult to establish.
Possible types of cause and effect:
1. A cause is necessary if the effect cannot happen in its absence.
2. A cause is sufficient if the effect always happens when the cause is present.
3. A cause is probabilistic if it is neither necessary nor sufficient but increases the probability (likelihood) of the effect.
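The three definitions can be made concrete as checks on a list of observations. This is a hypothetical sketch: each observation is a `(cause_present, effect_present)` pair, and the function name `classify_cause` is made up for illustration. A finite set of observations can only suggest, never prove, that a cause is necessary or sufficient.

```python
def classify_cause(observations):
    """Classify a candidate cause from a list of (cause_present, effect_present) pairs."""
    # Necessary: every observation with the effect also has the cause
    # (vacuously true if the effect never occurs in the data).
    necessary = all(cause for cause, effect in observations if effect)
    # Sufficient: every observation with the cause also has the effect.
    sufficient = all(effect for cause, effect in observations if cause)
    if necessary and sufficient:
        return "necessary and sufficient"
    if necessary:
        return "necessary"
    if sufficient:
        return "sufficient"
    # Neither: probabilistic if the effect is more frequent when the cause is present.
    with_cause = [effect for cause, effect in observations if cause]
    without_cause = [effect for cause, effect in observations if not cause]
    if with_cause and without_cause:
        if sum(with_cause) / len(with_cause) > sum(without_cause) / len(without_cause):
            return "probabilistic"
    return "no evidence of a causal link"

# Hypothetical observations: (cause present?, effect present?)
print(classify_cause([(True, True), (True, False), (False, False)]))  # necessary
print(classify_cause([(True, True), (False, True), (False, False)]))  # sufficient
print(classify_cause([(True, True), (True, False), (True, True),
                      (False, False), (False, True), (False, False), (False, False)]))  # probabilistic
```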
Some Guidelines for Establishing Causality:
John Stuart Mill's Guidelines: Given an effect, what is the cause?
1. Method of agreement: Look for the factors that are present whenever the effect is present.
2. Method of difference: Look for factors whose absence eliminates the effect. When the effect is absent, which factors are also absent?
3. Method of concomitant variation: Look for a quantitative relationship between the suspected causal factor and the effect. Does increasing or decreasing the amount of the causal factor increase or decrease the effect?
4. Method of residues: Subtract the effects of other known factors in order to test whether what is left can be attributed to the possible cause.
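The first two of Mill's methods can be sketched as set operations. In this hypothetical food-poisoning example (invented data), each case records which dishes a guest ate and whether the guest got sick. `method_of_difference` here is a simplified version that starts from the method-of-agreement candidates and removes any factor that also appears when the effect is absent.

```python
def method_of_agreement(cases):
    """Factors present in every case where the effect occurs."""
    effect_cases = [factors for factors, effect in cases if effect]
    if not effect_cases:
        return set()
    common = set(effect_cases[0])
    for factors in effect_cases[1:]:
        common &= factors
    return common

def method_of_difference(cases):
    """Agreement candidates that are absent from every case where the effect does not occur."""
    candidates = method_of_agreement(cases)
    for factors, effect in cases:
        if not effect:
            candidates -= factors
    return candidates

# Hypothetical data: (dishes eaten, got sick?)
cases = [
    ({"salad", "chicken", "pie"}, True),
    ({"salad", "chicken", "soup"}, True),
    ({"salad", "chicken"}, True),
    ({"salad", "pie"}, False),
    ({"soup"}, False),
]

print(sorted(method_of_agreement(cases)))   # ['chicken', 'salad']
print(sorted(method_of_difference(cases)))  # ['chicken']
```

Agreement alone leaves both salad and chicken as suspects; the healthy guest who ate salad rules salad out. The method of concomitant variation, by contrast, amounts to looking for a correlation between the amount of a factor and the size of the effect.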
Try to develop a physical model for the causality: explain, using science, how the factor causes the effect.
Three Legal Levels of Confidence in Causality
Possible cause: a correlation has been discovered, but no causality has been established yet.
Probable cause: there is good reason to suspect that the correlation is causal. (Required for search warrants.)
Cause beyond reasonable doubt: the physical model is so strong that it seems unreasonable to doubt the connection.
Case Study 1: Air bags and child safety.
Case Study 2: Global Warming
Case Study 3: The Ozone Hole and CFCs