As I’ve previously mentioned, I’m currently enrolled in Andrew Ng’s Machine Learning class on coursera.org (still highly recommended!). This post will cover when to use logistic regression, which is a nice technique for classification in the field of ML.

## What is logistic regression?

Logistic regression is a supervised learning technique for assessing the probability that an input vector is a member of a particular class. It uses the sigmoid function, which takes any real input, and outputs a value between 0 & 1.

To effectively use the sigmoid function for classification, the input vector`x`

must be mapped into `z`

, which is the sigmoid function input. Mapping an input vector into `z`

consists of performing a matrix multiplication with a coefficients matrix `A`

, which will be discussed later: `z = xA`

. Then, `sigmoid(z)`

indicates the probability that `x`

is a member of the class.
There’s a nice property here: `sigmoid(0) = 0.5`

, so `sigmoid(z) >= 0.5`

if and only if `z >= 0`

, and `sigmoid(z) < 0.5`

if and only if `z < 0`

. Therefore, if we’re using this function for classification (is `x`

a member of the class or not?), we can simply check whether `z = xA`

is less than or greater than 0. If `z < 0`

, the probability that `x`

is a member of the class is < 50% and it is not considered a member of the class. Otherwise, `z >= 0`

indicates the probability that `x`

is a member of the class is >= 50% and it is considered a member of the class.

Since this is a supervised learning technique, there must be a set of input vectors, and a corresponding set of output values. In this case, the input vector set can be represented as the matrix `X`

, with one row per input vector. The output values are represented by the matrix `Y`

, where `Y[i]`

corresponds to the expected output of the function when given inputs `X[i]`

. Values in `Y`

are always only 1 or 0, indicating that `X[i]`

is, or is not, a member of the class, respectively.

The coefficients matrix `A`

is found through training. Training is the task of finding the values in `A`

such that `W = sigmoid(XA)`

, the element-wise application of the sigmoid function to every element in the matrix `XA`

, is as close to `Y`

as possible. It is typically accomplished through an iterative method such as gradient descent.

## When to use it?

Logistic regression assumes that the relationship between the input values in `X`

and the dependent values in `Y`

have a discrete relationship – a subset of input values from `X`

from maps to value 1 (a member of the class), and the complementary inputs map to value 0 (not a member of the class). Applying inputs to a trained logistic regression model will produce the probability that the inputs belong to the class. So, if you are trying to classify your inputs into 2 groups, try using logistic regression to classify them.

## Good candidates for using logistic regression

- Predict whether a tumor is cancerous based on easily measured physical properties such as size, color, color consistency, and border irregularity
- Predict whether a soccer player will score a goal in a particular game
- Determine if an image contains a picture of a cat

Common attributes in these examples include:

- Prediction – a person would use logistic regression to classify an unknown future event based on known present values
- Single-class classification – The model can predict whether or not a tumor is cancerous, whether or not a player will score a goal, and whether or not an image contains a cat. In all cases, the model is reporting a binary, true/false value indicating whether the inputs are a member of a class. A logistic regression model consisting of a single sigmoid function can’t do things like determine how many goals the player will score, or determine what kind of animal is in the picture. We would need a different model to do those things.

## Poor candidates for using logistic regression

It’s helpful to contrast good candidates with poor candidates for logistic regression. Here are some situations where using logistic regression based on a single sigmoid computation would not provide good predictive value.

### Predicting the best treatment for a particular cancer

There may be many possible treatments for a particular cancer, such as surgery, chemotherapy, radiation therapy, or experimental options. The choice of which treatment would be based on the cancer and the patient, possibly including current health and genetic factors. Zero, one, or more options might be chosen. Logistic regression can only classify inputs into binary options – to choose among a set with more than 2 options, a more sophisticated classification model is needed, such as a one-vs-all model or a neural network.

### Predicting which color car a customer is likely to buy

Typically a car can be bought in 3 or more colors. Therefore, the output of a model that predicts the color choice must be able to indicate 3 or more options, which a single sigmoid computation cannot do. This problem would be a good candidate for a one-vs-all approach or a neural network.

### Predicting how many goals a player may score in a soccer game

An argument could be made that a linear regression model provides the best predictive value for this problem, since the number of goals a player may score is theoretically limited only by the rate at which the ball can be kicked into the goal from the midfield line, and returned to that spot. However, it’s extremely rare in practice for any individual player to score more than 4 goals in a game, so this problem could also be modeled well using a one-vs-all or neural network approach.

### Predicting how many likes an image of a cat will generate in a social media post

The number of likes on a social media starts at 0 and has effectively no upper limit. It’s not a classification problem, it’s a linear regression problem.

## Conclusion

So there you have it – use logistic regression when your inputs and outputs have a discrete relationship, and the output is an indication of class membership or exclusion.

## PS: Logistic regression is a foundation for more powerful models

I previously mentioned models such as one-vs-all and neural networks.

One-vs-all is a technique that uses multiple logistic regression models to enable multi-class classification. In short, model 1 indicates membership in class 1, model 2 indicates membership in class 2, and so on. Providing the same inputs to all the models tells you which classes the inputs belong to. When it works well, most inputs belong to only 1 class.

Neural networks are systems of logistic regression models. Each model is called a node. Nodes are organized into layers, with one layer consisting only of inputs, another layer consisting only of outputs, and between the input & output layers there may be multiple inner/hidden layers. Outputs from one layer are multiplied by coefficients found during training and provided as inputs to the next layer. When used for multi-class classification, node V in the output layer represents membership in class V. Neural networks can be used for other purposes as well, and they would not be possible without logistic regression!

I’d love to hear any other situations where logistic regression would be valuable! I’d also love to get any corrections or feedback on this topic – please leave a comment below if you can help improve this article.