Chapter 2 Lesson 06 - Exercises
2.1 Exercise 01
Let's load the dataset:
library(readxl)
# Each scenario is stored on its own sheet of the workbook
dadosCen01 = read_excel("./Dados/Data_HousePrice_Area.xlsx", sheet = 1)
dadosCen02 = read_excel("./Dados/Data_HousePrice_Area.xlsx", sheet = 2)
dadosCen01
## # A tibble: 10 × 2
## `Square Feet` `House Price`
## <dbl> <dbl>
## 1 1400 245
## 2 1600 312
## 3 1700 279
## 4 1875 308
## 5 1100 199
## 6 1550 219
## 7 2350 405
## 8 2450 324
## 9 1425 319
## 10 1700 255
par(mfrow = c(1,2))
plot(dadosCen01$`House Price` ~ dadosCen01$`Square Feet`, main = "Scenario 1")
plot(dadosCen02$`House Price` ~ dadosCen02$`Square Feet`, main = "Scenario 2")
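Comparing the two plots, we can observe:
scenario 1 is more dispersed: the points are scattered widely around any straight line
scenario 2 is more cohesive: the points lie close to a straight line
in both scenarios, price tends to rise with area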
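To put a number on the visual impression, one option is the Pearson correlation between area and price. The sketch below retypes the ten scenario 1 rows printed above so that it runs without the Excel file; the names `area`, `price` and `r_cen01` are introduced here only for illustration. With both sheets loaded, the same `cor()` call on the `dadosCen02` columns would give the scenario 2 value for comparison.

```r
# Scenario 1 rows, copied from the tibble printed above
area  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

# Pearson correlation: the closer to 1, the more tightly the points
# follow a rising straight line
r_cen01 <- cor(area, price)
print(round(r_cen01, 3))
```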
Some descriptive statistics
House Price - Scenario 1
summary(dadosCen01$`House Price`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 199.0 247.5 293.5 286.5 317.2 405.0
House Price - Scenario 2
summary(dadosCen02$`House Price`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 199.0 247.5 293.5 286.5 317.2 405.0
Regression analysis for the two scenarios
modelCen01 = lm(dadosCen01$`House Price` ~ dadosCen01$`Square Feet`)
summary(modelCen01) # more complete version of the coefficients
##
## Call:
## lm(formula = dadosCen01$`House Price` ~ dadosCen01$`Square Feet`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.388 -27.388 -6.388 29.577 64.333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.24833 58.03348 1.693 0.1289
## dadosCen01$`Square Feet` 0.10977 0.03297 3.329 0.0104 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.33 on 8 degrees of freedom
## Multiple R-squared: 0.5808, Adjusted R-squared: 0.5284
## F-statistic: 11.08 on 1 and 8 DF, p-value: 0.01039
sumCen01 = summary(modelCen01)
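As a cross-check on the scenario 1 coefficients, the slope and intercept of a one-predictor regression can also be computed from the closed-form least-squares formulas. This is a minimal sketch that retypes the scenario 1 rows printed earlier; `area`, `price`, `slope`, `intercept` and `fit` are illustrative names, not objects from the original script.

```r
# Scenario 1 rows, copied from the tibble printed earlier
area  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

# Closed-form least squares for a single predictor
slope     <- cov(area, price) / var(area)
intercept <- mean(price) - slope * mean(area)

# The same fit through lm(), for comparison
fit <- lm(price ~ area)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

Both printouts should agree with the `Estimate` column of the scenario 1 summary (about 98.248 for the intercept and 0.10977 for the slope).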
modelCen02 = lm(dadosCen02$`House Price` ~ dadosCen02$`Square Feet`)
summary(modelCen02) # more complete version of the coefficients
##
## Call:
## lm(formula = dadosCen02$`House Price` ~ dadosCen02$`Square Feet`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.323 -16.654 2.458 15.838 19.336
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.64509 30.46626 -0.317 0.76
## dadosCen02$`Square Feet` 0.16822 0.01702 9.886 9.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.56 on 8 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.9149
## F-statistic: 97.73 on 1 and 8 DF, p-value: 9.246e-06
sumCen02 = summary(modelCen02)
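It can help to see where the `t value`, `Pr(>|t|)` and `F-statistic` entries come from. The sketch below starts only from the slope estimates and standard errors reported in the two summaries above, so the results differ from the printed ones in the last digits because the inputs are already rounded. The vector names are illustrative.

```r
# Slope estimates and standard errors as reported in the two summaries
est <- c(cen01 = 0.10977, cen02 = 0.16822)
se  <- c(cen01 = 0.03297, cen02 = 0.01702)
df  <- 8  # 10 observations minus 2 estimated coefficients

t_val <- est / se                      # "t value" column
p_val <- 2 * pt(-abs(t_val), df = df)  # two-sided "Pr(>|t|)"
f_val <- t_val^2                       # with a single predictor, F equals t squared

print(cbind(t_val, p_val, f_val))
```

The much smaller standard error in scenario 2 is what drives its larger t value and smaller p-value.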
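COEFFICIENT OF DETERMINATION
for scenario 1, \(R^2\) is 0.58, so the fitted line explains about 58% of the variation in house price
for scenario 2, \(R^2\) is 0.92, so the fitted line explains about 92% of the variation in house price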
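The \(R^2\) values can be read straight from the stored summary objects (for example `sumCen01$r.squared`) or rebuilt from the definition \(R^2 = 1 - SSE/SST\). The following sketch does both for scenario 1, retyping the rows printed earlier so it runs on its own; `area`, `price`, `fit`, `sse`, `sst`, `r2_manual` and `r2_summary` are illustrative names.

```r
# Scenario 1 rows, copied from the tibble printed earlier
area  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

fit <- lm(price ~ area)

sse <- sum(residuals(fit)^2)         # variation left unexplained by the line
sst <- sum((price - mean(price))^2)  # total variation of price around its mean

r2_manual  <- 1 - sse / sst
r2_summary <- summary(fit)$r.squared  # the value printed as "Multiple R-squared"

print(c(manual = r2_manual, summary = r2_summary))
```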
Let's analyze the residuals
par(mfrow = c(1,2))
plot(modelCen01$residuals ~ dadosCen01$`House Price`)
plot(modelCen02$residuals ~ dadosCen02$`House Price`)
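The plots above put the residuals against the observed price. Since a residual is the observed value minus the fitted value, residuals are positively correlated with the observed response by construction, so an upward trend there does not by itself signal a problem. A common alternative, sketched below for scenario 1 under the assumption that the rows printed earlier are the full sheet, is to plot residuals against fitted values, with which they are uncorrelated. The names `area`, `price`, `fit`, `res` and `fv` are illustrative.

```r
# Scenario 1 rows, copied from the tibble printed earlier
area  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

fit <- lm(price ~ area)
res <- residuals(fit)
fv  <- fitted(fit)

# Residuals against fitted values: a patternless band around zero
# is what a well-behaved linear fit would show
plot(fv, res, xlab = "Fitted House Price", ylab = "Residual",
     main = "Scenario 1")
abline(h = 0, lty = 2)

# Correlation with the observed response is built in;
# correlation with the fitted values is zero up to rounding
print(c(with_observed = cor(price, res), with_fitted = cor(fv, res)))
```

The same calls on `modelCen02` would give the matching picture for scenario 2.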