Chapter 2 Class 06 - Exercises

2.1 Exercise 01

Let's load the datasets (one spreadsheet sheet per scenario):

library(readxl)
# both scenarios live in the same workbook: sheet 1 = scenario 1, sheet 2 = scenario 2
dadosCen01 = read_excel("./Dados/Data_HousePrice_Area.xlsx", sheet = 1)
dadosCen02 = read_excel("./Dados/Data_HousePrice_Area.xlsx", sheet = 2)
dadosCen01
## # A tibble: 10 × 2
##    `Square Feet` `House Price`
##            <dbl>         <dbl>
##  1          1400           245
##  2          1600           312
##  3          1700           279
##  4          1875           308
##  5          1100           199
##  6          1550           219
##  7          2350           405
##  8          2450           324
##  9          1425           319
## 10          1700           255
# two scatter plots side by side (1 row, 2 columns)
par(mfrow = c(1,2))
plot(dadosCen01$`House Price` ~ dadosCen01$`Square Feet`, main = "scenario 1")
plot(dadosCen02$`House Price` ~ dadosCen02$`Square Feet`, main = "scenario 2")

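Comparing the two plots, we can observe:

  • scenario 1 is more dispersed: the points are widely scattered around the upward trend

  • scenario 2 is more cohesive: the points lie close to an increasing straight line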
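The visual impression of "more dispersed" versus "more cohesive" can be quantified with the Pearson correlation coefficient between Square Feet and House Price. A minimal sketch, assuming the scenario 1 values printed above are typed in by hand (the raw scenario 2 data are not listed in this chapter, so only scenario 1 is shown):

```r
# Scenario 1 data, copied by hand from the tibble printed above
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

# Pearson correlation: close to +1 means the points hug an increasing line,
# close to 0 means they are widely scattered
r <- cor(sqft, price)
print(round(r, 4))
```

In simple linear regression the square of this correlation equals the \(R^2\) that summary(lm(...)) reports for the same data.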

Some descriptive statistics

House Price - Scenario 1

summary(dadosCen01$`House Price`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   199.0   247.5   293.5   286.5   317.2   405.0

House Price - Scenario 2

summary(dadosCen02$`House Price`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   199.0   247.5   293.5   286.5   317.2   405.0
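summary() reports only location and quartiles, not spread. A short sketch that reproduces those numbers and adds two dispersion measures (standard deviation and interquartile range), using the scenario 1 prices typed in from the table above. R's default quantile() method (type 7) gives the same 1st Qu. and 3rd Qu. values that summary() prints:

```r
# Scenario 1 house prices, copied by hand from the tibble printed above
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

# Location: the same values summary() reports
mean(price)                             # 286.5
median(price)                           # 293.5 (average of 5th and 6th sorted values)
quantile(price, probs = c(0.25, 0.75))  # default type 7 interpolation

# Dispersion: not included in summary()
sd(price)    # sample standard deviation (denominator n - 1)
IQR(price)   # 3rd quartile minus 1st quartile
```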

Regression analysis for the two scenarios

modelCen01 = lm(dadosCen01$`House Price` ~ dadosCen01$`Square Feet`)
summary(modelCen01) # full output: coefficients, standard errors, t tests and R^2
## 
## Call:
## lm(formula = dadosCen01$`House Price` ~ dadosCen01$`Square Feet`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.388 -27.388  -6.388  29.577  64.333 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)  
## (Intercept)              98.24833   58.03348   1.693   0.1289  
## dadosCen01$`Square Feet`  0.10977    0.03297   3.329   0.0104 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 41.33 on 8 degrees of freedom
## Multiple R-squared:  0.5808, Adjusted R-squared:  0.5284 
## F-statistic: 11.08 on 1 and 8 DF,  p-value: 0.01039
sumCen01 = summary(modelCen01)
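The estimates in the coefficient table come from ordinary least squares. With a single predictor they have a closed form, \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). A sketch that recomputes them by hand for scenario 1 (data typed in from the table printed earlier) and checks them against lm(), here called with the data= argument instead of $ indexing:

```r
# Scenario 1 data, copied by hand from the tibble printed above
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

# Centered sum of squares of x and cross-products of x and y
sxx <- sum((sqft - mean(sqft))^2)
sxy <- sum((sqft - mean(sqft)) * (price - mean(price)))

b1 <- sxy / sxx                      # slope: price change per extra square foot
b0 <- mean(price) - b1 * mean(sqft)  # intercept: the line passes through (x-bar, y-bar)

# Same model fitted with lm()
df  <- data.frame(sqft, price)
fit <- lm(price ~ sqft, data = df)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE when both approaches agree
```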


modelCen02 = lm(dadosCen02$`House Price` ~ dadosCen02$`Square Feet`)
summary(modelCen02) # full output: coefficients, standard errors, t tests and R^2
## 
## Call:
## lm(formula = dadosCen02$`House Price` ~ dadosCen02$`Square Feet`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.323 -16.654   2.458  15.838  19.336 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -9.64509   30.46626  -0.317     0.76    
## dadosCen02$`Square Feet`  0.16822    0.01702   9.886 9.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.56 on 8 degrees of freedom
## Multiple R-squared:  0.9243, Adjusted R-squared:  0.9149 
## F-statistic: 97.73 on 1 and 8 DF,  p-value: 9.246e-06
sumCen02 = summary(modelCen02)
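Because both summaries are stored (sumCen01, sumCen02), the key numbers can be read from the summary.lm object instead of the printout. A sketch with a hypothetical helper `model_stats()` (not part of the original exercise), demonstrated on the scenario 1 data typed in from the table above:

```r
# Hypothetical helper: collect the main fit statistics of a simple regression
model_stats <- function(fit) {
  s <- summary(fit)
  c(
    slope     = unname(coef(fit)[2]),             # estimated slope
    slope_p   = s$coefficients[2, "Pr(>|t|)"],   # p-value of the slope t test
    r_squared = s$r.squared,
    adj_r_sq  = s$adj.r.squared,
    sigma     = s$sigma                          # residual standard error
  )
}

# Scenario 1 data, copied by hand from the tibble printed above
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)

fit1   <- lm(price ~ sqft, data = data.frame(sqft, price))
stats1 <- model_stats(fit1)
round(stats1, 4)
```

Applied to the chapter's models, cbind(model_stats(modelCen01), model_stats(modelCen02)) would place the two scenarios side by side; the helper indexes the slope by position, so the backtick-laden coefficient names do not matter.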

COEFFICIENT OF DETERMINATION

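For scenario 1, \(R^2\) is 0.58: the linear model explains about 58% of the variation in House Price.

For scenario 2, \(R^2\) is 0.92: about 92% of the variation is explained, so the linear fit is much tighter.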
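\(R^2\) is the fraction of the total variation that the fitted line accounts for, \(R^2 = 1 - SS_{res}/SS_{tot}\), and the adjusted version penalizes the number of predictors \(p\): \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}\). A sketch computing both from the residuals for scenario 1 (data typed in from the table printed earlier) and comparing them with summary():

```r
# Scenario 1 data, copied by hand from the tibble printed above
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)
fit   <- lm(price ~ sqft, data = data.frame(sqft, price))

ss_res <- sum(residuals(fit)^2)         # variation left unexplained by the line
ss_tot <- sum((price - mean(price))^2)  # total variation around the mean
r2     <- 1 - ss_res / ss_tot

n <- length(price)  # number of observations
p <- 1              # number of predictors (Square Feet only)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adj_r2 = adj_r2)
```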

Let's analyze the residuals:

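par(mfrow = c(1,2))
# residuals vs fitted values (residuals vs observed prices trend upward by construction)
plot(modelCen01$residuals ~ modelCen01$fitted.values, main = "scenario 1")
abline(h = 0, lty = 2)  # dashed reference line at zero
plot(modelCen02$residuals ~ modelCen02$fitted.values, main = "scenario 2")
abline(h = 0, lty = 2)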
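Residuals are usually plotted against the fitted values (or the predictor) rather than the observed response. With an intercept in the model, least-squares residuals sum to zero and are uncorrelated with the fitted values, but they are positively correlated with the observed response: in simple regression that correlation equals \(\sqrt{1 - R^2}\), so a plot against House Price shows an upward trend even when the model is adequate. A sketch checking these facts on scenario 1 (data typed in from the table printed earlier):

```r
# Scenario 1 data, copied by hand from the tibble printed above
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)
fit   <- lm(price ~ sqft, data = data.frame(sqft, price))
res   <- residuals(fit)

sum(res)                          # numerically zero (model has an intercept)
cor(res, fitted(fit))             # numerically zero by construction
cor(res, price)                   # clearly positive
sqrt(1 - summary(fit)$r.squared)  # matches the previous line
```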