Correlation & Regression
• A correlation and regression analysis involves investigating the relationship between two (or more) quantitative variables of interest.
• The goal of such an investigation is typically to estimate (predict) the value of one variable based on the observed value of the other variable (or variables).
Quantitative Variables
• Dependent Variable (Y)
  • the variable being predicted
  • called the response variable
• Independent Variable (X)
  • the variable used to explain or predict Y
  • called the explanatory or predictor variable
Correlation & Regression
• Correlation addresses the questions:
  “Is there a relationship between X and Y?”
  “If so, how strong is it?”
• Regression addresses the question:
  “What is the relationship between X and Y?”
Simple Linear Relationship
• A linear (straight line) relationship between Y and a single X.
• The form of the equation is:
  Y = b0 + b1 X,
  where b0 is the y-intercept and b1 is the slope.
• A scatter-plot of X versus Y is useful for spotting linear relationships, and obvious departures from linearity.
Correlation
• A correlation exists between two variables when they are related in some way.
• Linear Correlation Coefficient (r)
  • measures the strength of the linear relationship between X and Y
Properties of r
• -1 ≤ r ≤ 1
• r = 1 for a perfect positive linear relationship
• r = -1 for a perfect negative linear relationship
• r = 0 if there is no linear relationship
Sample Correlation Coefficient
• A statistic that is useful for estimating the linear correlation coefficient:

  r = [n Σxy − (Σx)(Σy)] / [√(n Σx² − (Σx)²) · √(n Σy² − (Σy)²)]
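As an illustration, the computational formula above can be coded directly. This is a sketch with made-up data; the function name `sample_correlation` is not from the slides:

```python
import math

def sample_correlation(x, y):
    """Sample linear correlation coefficient r, using the
    computational formula n*Sxy - Sx*Sy over the product of
    the two square-root terms."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data with a perfect positive linear relationship (y = 2x)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(sample_correlation(x, y))  # 1.0
```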
Coefficient of Determination
• The coefficient of determination is the proportion of variability in Y that can be explained by its linear relationship to X.
• Computed by squaring the sample correlation coefficient (r²).
Hypothesis Testing of the Linear Correlation Coefficient
• Appropriate hypotheses:
  H0: ρ = 0 (no linear relationship)
  H1: ρ ≠ 0 (linear relationship)
Testing ρ
• Test Statistic:

  t = r / √[(1 − r²) / (n − 2)],   df = n − 2

• Rejection Region (3 cases of H1):
  1. Two-tailed: For H1: ρ ≠ 0, reject H0 if |t| ≥ tα/2
  2. Left-tailed: For H1: ρ < 0, reject H0 if t ≤ −tα
  3. Right-tailed: For H1: ρ > 0, reject H0 if t ≥ tα
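A sketch of the test-statistic computation, with an illustrative r and n (both invented) and a two-tailed critical value read from a t-table:

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# Hypothetical sample: r = 0.8 computed from n = 12 pairs
t, df = correlation_t_stat(0.8, 12)

# Two-tailed test at alpha = 0.05: critical value t_{0.025, 10} ~ 2.228
# (looked up in a standard t-table)
reject_h0 = abs(t) >= 2.228
print(t, df, reject_h0)
```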
Simple Linear Regression
• The Least Squares Regression line is our "best" line for explaining the relationship between Y and X.
• It minimizes the squared error (distance between the observed values and the values predicted by the line).
• The predicted value of Y for any value of X can be found by plugging that value in for X in the least squares regression line.
Simple Linear Regression Line
• The equation is:

  ŷ = b0 + b1 x

  where

  b1 = [n Σxy − (Σx)(Σy)] / [n Σx² − (Σx)²]

  and

  b0 = ȳ − b1 x̄
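The two formulas above can be coded directly. A sketch with invented data points lying exactly on the line y = 1 + 2x:

```python
def least_squares_line(x, y):
    """Slope b1 and intercept b0 of the least squares line,
    from the computational formulas above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y-bar - b1 * x-bar
    return b0, b1

# Made-up data on the line y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = least_squares_line(x, y)

# Predict y at x = 5 by plugging into the fitted line
y_hat = b0 + b1 * 5
print(b0, b1, y_hat)  # 1.0 2.0 11.0
```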
Proper Use of Correlation & Regression
• Correlation does not imply causation.
• Simple linear regression is appropriate only if the data cluster about a line.
• Do not extrapolate.
• Do not apply the model to other populations.
• For multiple regression, the size of the parameters does not indicate importance.
Effect of Extreme Values
• Extreme values can have a very large effect on correlation and regression analysis.
• Influential outliers can substantially affect model fit.