Estimating lines of best fit and predictions
Key concepts
What you'll likely be quizzed about
Scatter diagram and trend
A scatter diagram plots paired numerical data as individual points, with one variable on the horizontal axis and the other on the vertical axis. The clustering, slope and spread of the points reveal the direction and strength of any relationship between the two variables. A clear upward pattern indicates a positive trend and a clear downward pattern indicates a negative trend; either one motivates drawing a line of best fit to summarise the relationship.
Estimated line of best fit
An estimated line of best fit is a straight line drawn so that it follows the overall direction of the points on a scatter diagram. The line balances points above and below it, so that it reflects the general trend rather than individual variation. Simple visual estimation places the line by eye; algebraic methods (such as least squares) produce a calculated line of best fit but are not required for basic prediction tasks.
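For readers curious about the calculated alternative, the following is a minimal sketch of a least-squares line in plain Python. The data (hours of revision and test scores) and the function name least_squares_line are made up for illustration and are not part of these notes.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical data: hours of revision (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

m, c = least_squares_line(hours, scores)
print(f"y = {m:.2f}x + {c:.2f}")  # prints: y = 4.10x + 47.70
```

A line drawn by eye through the same points would be expected to land close to this one, which is why visual estimation is usually good enough for basic prediction questions.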
Drawing technique and checks
Place the ruler so that roughly equal numbers of points lie on either side of the line and so that extreme points do not overly steer the line. Choose the line that minimises the apparent vertical distances (residuals) between points and the line. Check the line by using it to estimate several known points; consistent close estimates indicate a reasonable fit while systematic over- or under-estimates indicate the line placement needs adjustment.
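The checks described above can be sketched numerically. This is an illustrative example only: the points, the candidate lines and the helper name check_line are assumptions made for the sketch. It counts points above and below a proposed line and reports the mean residual, which sits near zero for a balanced line and is clearly positive or negative when the line systematically under- or over-estimates.

```python
def check_line(points, gradient, intercept):
    """Summarise how the line y = gradient*x + intercept sits among the points."""
    # Residual = actual y minus the y value the line gives for the same x
    residuals = [y - (gradient * x + intercept) for x, y in points]
    above = sum(1 for r in residuals if r > 0)   # points above the line
    below = sum(1 for r in residuals if r < 0)   # points below the line
    mean_residual = sum(residuals) / len(residuals)
    return above, below, mean_residual


# Hypothetical data points (x, y)
points = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 68)]

# A well-placed line: points balanced either side, mean residual of zero
print(check_line(points, gradient=4, intercept=48))  # (1, 1, 0.0)

# A line drawn too low: every point lies above it (systematic under-estimate)
print(check_line(points, gradient=4, intercept=45))  # (5, 0, 3.0)
```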
Making predictions using the line
Use the line of best fit to estimate the value of the dependent variable for a given value of the independent variable: find the input on the horizontal (x) axis, move vertically to the line, then read across to the vertical (y) axis. In this way the line converts an x input into a y estimate. Predictions are most reliable when the input lies near the middle of the observed data range, because the trend there is supported by actual data points.
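Reading a value off the line is equivalent to substituting x into the line's equation y = mx + c. The sketch below assumes a line with gradient 4 and intercept 48; these values and the function name predict are hypothetical and chosen only to illustrate the idea.

```python
def predict(x, gradient, intercept):
    """Estimate y for a given x using the line y = gradient*x + intercept."""
    return gradient * x + intercept


# Assumed line of best fit: y = 4x + 48
estimate = predict(3.5, gradient=4, intercept=48)
print(estimate)  # 62.0
```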
Interpolation versus extrapolation
Interpolation uses the line to predict values for inputs that fall within the range of observed data. Interpolation remains relatively reliable because it relies on the established trend in the sample. Extrapolation uses the line to predict values outside the observed range. Extrapolation carries higher risk because the relationship may change beyond the sampled data and the line may not represent unobserved behaviour.
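One way to make the distinction concrete is to label each prediction according to whether its input falls inside the observed x range. The following is a rough sketch under assumed values: the data, the line y = 4x + 48 and the function name predict_with_range_check are all hypothetical.

```python
def predict_with_range_check(x, observed_xs, gradient, intercept):
    """Return (estimate, kind), where kind says whether x is inside the data range."""
    low, high = min(observed_xs), max(observed_xs)
    kind = "interpolation" if low <= x <= high else "extrapolation"
    return gradient * x + intercept, kind


# Hypothetical observed x values, with an assumed line y = 4x + 48
observed_xs = [1, 2, 3, 4, 5]

print(predict_with_range_check(3.5, observed_xs, 4, 48))  # (62.0, 'interpolation')
print(predict_with_range_check(10, observed_xs, 4, 48))   # (88, 'extrapolation')
```

The second estimate is still produced by the same equation, but the label is a reminder that no data supports the trend at x = 10, so the value should be treated with caution.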
Limitations and dangers
Correlation on a scatter plot does not prove causation; a line of best fit shows association but does not identify cause. Outliers can distort visual lines and produce misleading predictions. Extrapolated predictions can be seriously wrong if underlying conditions change. Seasonal effects, saturation, thresholds and different influencing factors often invalidate trends outside the observed range.
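The effect of an outlier can be illustrated with a calculated line standing in for a line drawn by eye. In this sketch the data and the single outlier (6, 20) are invented for illustration, and fit_line is a hypothetical helper using the standard least-squares formula. Adding one extreme point is enough to turn a rising trend into a falling one.

```python
def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


# Hypothetical data with a clear upward trend
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

clean_gradient, _ = fit_line(hours, scores)
# Same data plus one made-up outlier at (6, 20)
skewed_gradient, _ = fit_line(hours + [6], scores + [20])

print(clean_gradient > 0, skewed_gradient < 0)  # True True
```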
Key notes
Important points to keep in mind