9 Logistic Regression
Logistic regression is a modeling technique used when the outcome variable is binary, meaning it has two possible values (such as 0/1, yes/no, or success/failure). Instead of predicting a continuous value, logistic regression estimates the probability that an observation belongs to a particular class.
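In symbols, with predictors x_1 through x_k, the model links the probability p = P(Y = 1) to a linear combination of the predictors through the log-odds. This is the standard textbook form, not notation taken from this chapter:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

The left-hand side is the log-odds. Inverting it with the logistic function keeps the estimated probability p strictly between 0 and 1, whatever values the predictors take.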
9.1 Create Train and Test Sets
As before, we split the data so that we can evaluate the model on unseen data.
In this example, the outcome variable is:
- large_body = 1 → above-median body mass
- large_body = 0 → below-median body mass
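The split code itself is not shown in this section. Below is a minimal sketch of one way to build the binary outcome and split the data, assuming a penguins-style data frame with a numeric body_mass_g column. The helper name make_cls_split, the 80/20 proportion, the seed, and the simulated toy data are all illustrative assumptions, not the chapter's actual code.

```r
# Hypothetical helper: create the binary outcome and a train/test split.
# Assumes `df` has a numeric body_mass_g column (as in palmerpenguins::penguins).
make_cls_split <- function(df, prop_train = 0.8, seed = 123) {
  # Keep complete rows only, so the median and glm() see the same observations
  df <- df[complete.cases(df), ]
  # 1 = above the median body mass; ties at the median fall into 0 (an assumption)
  df$large_body <- as.integer(df$body_mass_g > median(df$body_mass_g))
  # Side effect: set.seed() changes the global random number state
  set.seed(seed)
  train_idx <- sample(seq_len(nrow(df)), size = floor(prop_train * nrow(df)))
  list(train = df[train_idx, ], test = df[-train_idx, ])
}

# Toy stand-in data (simulated, NOT the penguins data) so the sketch runs on its own
set.seed(1)
toy <- data.frame(
  bill_length_mm    = rnorm(100, 44, 5),
  bill_depth_mm     = rnorm(100, 17, 2),
  flipper_length_mm = rnorm(100, 200, 14),
  body_mass_g       = rnorm(100, 4200, 800)
)
split <- make_cls_split(toy)
train_cls <- split$train
test_cls  <- split$test
nrow(train_cls)  # 80
nrow(test_cls)   # 20
```

With the real data, make_cls_split(palmerpenguins::penguins) (assuming that package is installed) keeps the 333 complete rows and yields 266 training and 67 test rows, the same sizes as the output later in this chapter; the exact rows, and therefore the estimates, depend on the seed.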
9.2 Fit the Logistic Regression Model
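# Model the probability of large_body = 1 from three bill and flipper measurements
glm_model <- glm(
  large_body ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
  data = train_cls,
  family = "binomial"
)
- glm() fits generalized linear models; logistic regression is one special case
- family = "binomial" specifies logistic regression (the binomial family uses the logit link by default)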
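To check that family = "binomial" really means logistic regression, the sketch below fits the same model twice: once with the family given as a string and once with the logit link spelled out. The data are simulated (the true coefficients -0.5 and 1.2 are arbitrary choices), so nothing here comes from the penguin example.

```r
# Simulated data with a known logistic relationship (hypothetical values)
set.seed(42)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)         # true probabilities via the inverse logit
y <- rbinom(n, size = 1, prob = p)  # 0/1 outcome drawn from those probabilities

fit_string <- glm(y ~ x, family = "binomial")
fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))

all.equal(coef(fit_string), coef(fit_logit))  # TRUE: logit is the default link
fit_string$family$link                         # "logit"
```

The estimated coefficients should land near -0.5 and 1.2, but not exactly on them, because the outcome is random.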
9.3 Inspect the Model
summary(glm_model)
Call:
glm(formula = large_body ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
family = "binomial", data = train_cls)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -42.64099 7.12539 -5.984 2.17e-09 ***
bill_length_mm -0.06164 0.04502 -1.369 0.171
bill_depth_mm -0.01402 0.12162 -0.115 0.908
flipper_length_mm 0.22738 0.03350 6.787 1.15e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 367.79 on 265 degrees of freedom
Residual deviance: 163.89 on 262 degrees of freedom
AIC: 171.89
Number of Fisher Scoring iterations: 6
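- Coefficients are on the log-odds scale: each estimate is the change in the log-odds of large_body = 1 for a one-unit (1 mm) increase in that predictor, holding the other predictors constant
- Positive coefficients increase the predicted probability of class 1; here, flipper_length_mm (0.227, p < 0.001) is the only predictor with a statistically significant effect
- Negative coefficients decrease it; bill_length_mm and bill_depth_mm have small negative estimates, but neither is statistically significant (p = 0.171 and p = 0.908)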
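Log-odds are hard to read directly, so a common next step is to exponentiate the coefficients, which turns them into odds ratios. The sketch below does this arithmetic on the estimates printed in the summary; with the fitted object itself, exp(coef(glm_model)) gives the same values plus the intercept.

```r
# Estimates copied from the summary(glm_model) output
est <- c(bill_length_mm    = -0.06164,
         bill_depth_mm     = -0.01402,
         flipper_length_mm =  0.22738)

# exp() of a log-odds coefficient is the multiplicative change in the odds
odds_ratios <- exp(est)
round(odds_ratios, 3)  # 0.940 0.986 1.255
```

For flipper_length_mm, exp(0.22738) ≈ 1.255: each additional mm of flipper length multiplies the odds of large_body = 1 by roughly 1.26, holding bill length and depth fixed. Odds ratios below 1, as for the two bill measurements, correspond to negative coefficients.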
9.4 Predict Probabilities
prob_preds <- predict(glm_model, newdata = test_cls, type = "response")
- type = "response" returns predicted probabilities, between 0 and 1, rather than log-odds
- Each value is the estimated probability that the observation belongs to class 1 (large_body = 1)
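predict() for a glm can return values on two scales. The sketch below uses simulated data (the coefficients 0.3 and 0.8 and the new x values are arbitrary) to show that type = "response" is the inverse logit, plogis(), applied to the log-odds that the default type = "link" returns.

```r
# Simulated data (hypothetical, not the penguin data)
set.seed(7)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(0.3 + 0.8 * d$x))
fit <- glm(y ~ x, data = d, family = "binomial")

new_obs <- data.frame(x = c(-2, 0, 2))
eta  <- predict(fit, newdata = new_obs, type = "link")      # log-odds scale (the default)
prob <- predict(fit, newdata = new_obs, type = "response")  # probability scale

# "response" is the inverse logit of "link"
all.equal(unname(prob), plogis(unname(eta)))  # TRUE
range(prob)                                   # always inside (0, 1)
```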
9.5 Convert Probabilities to Classes
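class_preds <- ifelse(prob_preds >= 0.5, 1, 0)
- A threshold (here, 0.5) assigns class labels: observations with a predicted probability of at least 0.5 are labeled 1, all others 0
- This converts probabilities into class predictions that can be compared with the observed outcome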
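The threshold is a choice, not part of the fitted model. This sketch applies a small hypothetical helper, classify(), to made-up probabilities at three thresholds; as.integer(p >= threshold) is equivalent to the ifelse() call above but returns integers instead of doubles.

```r
# Hypothetical predicted probabilities (not from the penguin model)
probs <- c(0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95)

# Label as 1 when the probability reaches the threshold, otherwise 0
classify <- function(p, threshold = 0.5) as.integer(p >= threshold)

classify(probs)                   # 0 0 0 1 1 1 1
classify(probs, threshold = 0.3)  # 0 0 1 1 1 1 1
classify(probs, threshold = 0.7)  # 0 0 0 0 0 1 1
```

Lowering the threshold labels more observations as class 1, which catches more true positives at the cost of more false positives; raising it does the opposite.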
9.6 Evaluate the Model
mean(class_preds == test_cls$large_body)
[1] 0.8358209
- This computes accuracy, the proportion of correct predictions: about 83.6% of the test observations are classified correctly
table(actual = test_cls$large_body, predicted = class_preds)
      predicted
actual  0  1
     0 29  2
     1  9 27
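- This is a confusion matrix: rows are the actual classes and columns are the predicted classes
- The diagonal cells (29 and 27) are correct predictions; the off-diagonal cells are errors, with 2 false positives (actual 0, predicted 1) and 9 false negatives (actual 1, predicted 0)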
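Accuracy alone hides which kind of error the model makes. The sketch below types in the confusion matrix printed above and computes a few standard summaries from it; the metric names are the usual ones, but the code is illustrative rather than part of the original chapter.

```r
# Confusion matrix from the output above: rows = actual, columns = predicted
cm <- matrix(c(29, 9, 2, 27), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

tp <- cm["1", "1"]; tn <- cm["0", "0"]  # correct predictions
fp <- cm["0", "1"]; fn <- cm["1", "0"]  # errors

accuracy    <- (tp + tn) / sum(cm)  # 56 / 67
sensitivity <- tp / (tp + fn)       # 27 / 36: share of large-bodied penguins found
specificity <- tn / (tn + fp)       # 29 / 31: share of small-bodied penguins found
precision   <- tp / (tp + fp)       # 27 / 29: share of predicted 1s that are correct

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 3)
# accuracy 0.836, sensitivity 0.750, specificity 0.935, precision 0.931
```

The accuracy matches the 0.8358209 computed earlier. Sensitivity (0.75) is lower than specificity (about 0.935) because of the 9 false negatives versus 2 false positives: at the 0.5 threshold, this model misses large-bodied penguins more often than it mislabels small ones. With the chapter's objects, cm <- table(actual = test_cls$large_body, predicted = class_preds) can replace the hand-typed matrix, since a table is indexed by level names in the same way; wrapping class_preds in factor(class_preds, levels = c(0, 1)) keeps both columns even if one class is never predicted.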
9.7 Key Takeaways
- Logistic regression is used for binary outcomes
- The model predicts probabilities, not just classes
- A threshold is used to convert probabilities into predictions
- Evaluation should be done on a held-out test dataset that was not used to fit the model
As with linear regression, the goal is not just to fit the training data but to make reliable predictions on new observations.