Interpret scatter graphs and correlation
Key concepts
What you'll likely be quizzed about
Definition of scatter graphs and bivariate data
A scatter graph (scatterplot) plots bivariate data: two linked numerical variables recorded for the same items or individuals. Each plotted point uses one coordinate from each variable, typically written as (x, y). Bivariate data analysis focuses on the relationship between the two variables. Clear axis labels and consistent scales are essential for accurate reading and comparison.
Axes, variables and plotting points
The horizontal axis (x-axis) typically shows the independent or explanatory variable; the vertical axis (y-axis) shows the dependent or response variable. Each pair of measurements becomes one point on the grid. Accurate plotting requires correct units and equal intervals. Mislabelled axes or uneven scales can distort the apparent relationship between variables.
Direction and form of correlation
Direction describes whether y tends to increase or decrease as x increases. A correlation is positive when the points slope upwards from left to right, and negative when they slope downwards from left to right. There is no correlation when the points show no clear trend. Form describes whether the relationship approximates a straight line (linear) or a curve (non-linear). A linear form can be summarised with a straight line; a non-linear form requires a different model or description.
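As a rough numerical check on direction, the sketch below sums the products of each point's deviations from the two means: a positive total suggests an upward trend, a negative total a downward one. The function names, the tolerance and all of the data values are illustrative assumptions, not part of these notes. The last data set follows a curve, which shows why form must be judged separately: the check reports no linear trend even though the variables are clearly related.

```python
def co_deviation_sum(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def direction(xs, ys, tolerance=1e-9):
    """Classify the direction of a linear trend from the sign of the sum."""
    s_xy = co_deviation_sum(xs, ys)
    if s_xy > tolerance:
        return "positive"
    if s_xy < -tolerance:
        return "negative"
    return "no linear trend"


# Made-up example data (assumptions for illustration only).
hours_revised = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 64, 70]      # rises as x rises
tyre_tread_mm = [8, 7, 5, 4, 2]        # falls as x rises

# A curved (non-linear) pattern: y = x squared on a symmetric range.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]

print(direction(hours_revised, test_score))
print(direction(hours_revised, tyre_tread_mm))
print(direction(curve_x, curve_y))
```

A result of "no linear trend" therefore does not rule out a relationship; it only says that a straight line is a poor summary.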
Strength of correlation and visual assessment
Strength describes how closely the points cluster around an imagined line or curve. Strong correlation shows points close to a clear line; weak correlation shows widely scattered points. A moderate correlation falls between these extremes. Outliers can reduce or exaggerate the perceived strength and so mislead assessment. Clustering in subgroups can likewise hide or exaggerate a relationship; careful inspection is necessary before drawing conclusions.
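Visual judgements of strength can be backed up with Pearson's correlation coefficient r, which ranges from -1 to 1 and measures how closely the points follow a straight line. The sketch below is a minimal version using only the standard library. The cut-offs of 0.7 and 0.3 used for the labels are an assumed convention for illustration (different courses use different boundaries), and the data are made up. The second data set changes a single value to show how much one outlier can alter the result.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def describe_strength(r, strong=0.7, weak=0.3):
    """Label strength from |r|; the thresholds are assumptions, not a standard."""
    size = abs(r)
    if size >= strong:
        return "strong"
    if size >= weak:
        return "moderate"
    return "weak"


# Made-up data lying exactly on a straight line.
xs = [1, 2, 3, 4, 5]
ys_clean = [2, 4, 6, 8, 10]

# The same data with the last y value replaced by an outlier.
ys_outlier = [2, 4, 6, 8, 0]

r_clean = pearson_r(xs, ys_clean)
r_outlier = pearson_r(xs, ys_outlier)

print(r_clean, describe_strength(r_clean))
print(r_outlier, describe_strength(r_outlier))
```

Because a single unusual point can move r this far, the coefficient is best read alongside the scatter graph rather than instead of it.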
Line of best fit and estimation
A line of best fit (trend line) summarises a linear relationship with a straight line that follows the overall trend, with the points spread roughly evenly on either side of it. The line enables interpolation: estimating y for x values inside the data range. Extrapolation uses the line to estimate outside the observed range and becomes less reliable the further the estimate lies from the data. The method of fitting (by eye or by calculation) affects accuracy; the least-squares method gives a reproducible line that minimises the sum of the squared vertical distances from the points to the line, but it requires calculation.
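The least-squares calculation can be sketched in a few lines. The example below fits y = intercept + slope * x and labels each estimate as interpolation or extrapolation according to whether x lies inside the observed range. The function names and the data are hypothetical, chosen only to illustrate the idea.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
    return slope, intercept


def estimate(x, xs, slope, intercept):
    """Estimate y at x and say whether x is inside the observed range."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return intercept + slope * x, kind


# Made-up measurements (assumption for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
inside = estimate(3.5, xs, slope, intercept)    # within 1..5
outside = estimate(8, xs, slope, intercept)     # beyond the data

print(slope, intercept)
print(inside)
print(outside)
```

The code returns a number for x = 8 just as readily as for x = 3.5; the "extrapolation" label is a reminder that nothing in the data confirms the trend continues that far.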
Correlation versus causation and limitations
Correlation indicates association but not cause. Two variables can correlate because one causes the other, because the causation runs the other way, because both are caused by a third (lurking) variable, or because of coincidence. Conclusions about causation require controlled experiments, temporal evidence, or additional justification. Observational scatter graphs alone cannot establish causal links; recognition of confounding factors and outliers reduces misinterpretation.
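The lurking-variable case can be illustrated with constructed numbers. In the sketch below, ice cream sales and sunburn cases are both generated from temperature plus a small disturbance, so by construction neither causes the other, yet they still correlate strongly. Every value, name and formula here is invented for illustration and is not real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Lurking variable: daily temperature in degrees Celsius (invented values).
temperature = [15, 18, 21, 24, 27, 30]

# Small fixed disturbances so the points do not lie on a perfect line.
sales_noise = [3, -2, 1, -3, 2, -1]
burn_noise = [-1, 2, 0, 1, -2, 1]

# Both variables depend on temperature only, never on each other.
ice_cream_sales = [2 * t + e for t, e in zip(temperature, sales_noise)]
sunburn_cases = [t - 10 + d for t, d in zip(temperature, burn_noise)]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r, 2))
```

A scatter graph of sales against sunburn cases would show a clear upward trend, but banning ice cream would not be expected to prevent sunburn; the association comes entirely from the shared dependence on temperature.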
Key notes
Important points to keep in mind