class: title-slide

<br>
<br>
.pull-right[

# Multiple Linear Regression

## Dr. Mine Dogucu

]

---

#### Data `babies` in `openintro` package

```
## Rows: 1,236
## Columns: 8
## $ case      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ bwt       <int> 120, 113, 128, 123, 108, 136, 138, 132, 120, 143, 140, 144, …
## $ gestation <int> 284, 282, 279, NA, 282, 286, 244, 245, 289, 299, 351, 282, 2…
## $ parity    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ age       <int> 27, 33, 28, 36, 23, 25, 33, 23, 25, 30, 27, 32, 23, 36, 30, …
## $ height    <int> 62, 64, 64, 69, 67, 62, 62, 65, 62, 66, 68, 64, 63, 61, 63, …
## $ weight    <int> 100, 135, 115, 190, 125, 93, 178, 140, 125, 136, 120, 124, 1…
## $ smoke     <int> 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, …
```

---

class: middle

<div align = "center">

| y | Response | Birth weight | Numeric |
|---|-------------|-----------------|---------|
| `\(x_1\)` | Explanatory | Gestation | Numeric |
| `\(x_2\)` | Explanatory | Smoke | Categorical |

---

## Notation

`\(y_i = \beta_0 +\beta_1x_{1i} + \beta_2x_{2i} + \epsilon_i\)`

`\(\beta_0\)` is the intercept

`\(\beta_1\)` is the slope for gestation

`\(\beta_2\)` is the slope for smoke (coded 0/1)

`\(\epsilon_i\)` is the error term

`\(i = 1, 2, \ldots, n\)` indexes the observations

---

```r
model_gs <- lm(bwt ~ gestation + smoke, data = babies)
tidy(model_gs)
```

```
## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.932    8.15      -0.114 9.09e- 1
## 2 gestation      0.443    0.0290    15.3   3.16e-48
## 3 smoke         -8.09     0.953     -8.49  5.96e-17
```

--

Expected birth weight for a baby with 280 days of gestation whose mother smokes:

`\(\hat{\text{bwt}}_i = b_0 + b_1 \text{ gestation}_i + b_2 \text{ smoke}_i\)`

`\(\hat{\text{bwt}}_i = -0.932 + (0.443 \times 280) + (-8.09 \times 1)\)`

---

```r
babies %>%
  modelr::add_predictions(model_gs) %>%
  select(bwt, gestation, smoke, pred)
```

```
## # A tibble: 1,236 x 4
##      bwt gestation smoke  pred
##    <int>     <int> <int> <dbl>
##  1   120       284     0  125.
##  2   113       282     0  124.
##  3   128       279     1  115.
##  4   123        NA     0   NA
##  5   108       282     1  116.
##  6   136       286     0  126.
##  7   138       244     0  107.
##  8   132       245     0  108.
##  9   120       289     0  127.
## 10   143       299     1  123.
## # … with 1,226 more rows
```

---

```r
babies %>%
  modelr::add_predictions(model_gs) %>%
  modelr::add_residuals(model_gs) %>%
  select(bwt, gestation, smoke, pred, resid)
```

```
## # A tibble: 1,236 x 5
##      bwt gestation smoke  pred  resid
##    <int>     <int> <int> <dbl>  <dbl>
##  1   120       284     0  125.  -4.84
##  2   113       282     0  124. -11.0
##  3   128       279     1  115.  13.5
##  4   123        NA     0   NA   NA
##  5   108       282     1  116.  -7.87
##  6   136       286     0  126.  10.3
##  7   138       244     0  107.  30.9
##  8   132       245     0  108.  24.4
##  9   120       289     0  127.  -7.05
## 10   143       299     1  123.  19.6
## # … with 1,226 more rows
```
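
---

As a rough check of the arithmetic in the fitted equation, the sketch below plugs the rounded coefficients from the `tidy()` output into the model using only base R. The helper name `predict_bwt` is hypothetical and not part of the original slides. Because the coefficients are rounded, the results may differ slightly from the `pred` and `resid` columns.

```r
# Rounded coefficients copied from the tidy(model_gs) output
b0 <- -0.932  # intercept
b1 <- 0.443   # slope for gestation (days)
b2 <- -8.09   # slope for smoke (0/1 indicator)

# Hypothetical helper: fitted birth weight from the rounded coefficients
predict_bwt <- function(gestation, smoke) {
  b0 + b1 * gestation + b2 * smoke
}

# Expected birth weight at 280 days of gestation with a smoker mother
pred_smoker <- predict_bwt(gestation = 280, smoke = 1)

# Same gestation with a non-smoker mother; the gap equals b2
pred_nonsmoker <- predict_bwt(gestation = 280, smoke = 0)

# Residual = observed - predicted, for the first row (bwt = 120, gestation = 284, smoke = 0)
resid_first <- 120 - predict_bwt(gestation = 284, smoke = 0)

print(c(pred_smoker = pred_smoker,
        pred_nonsmoker = pred_nonsmoker,
        resid_first = resid_first))
```

With these rounded values the smoker prediction is about 115.0 and the first residual is about -4.88, close to the -4.84 reported by `add_residuals()`. The small gap is consistent with rounding in the displayed coefficients.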