# ROC Curves and ROC AUC

A ROC curve, also known as a receiver operating characteristic curve, is a plot summarizing how well a binary classification model performs for the positive class.

The False Positive Rate is shown on the x-axis, and the True Positive Rate is shown on the y-axis.

**ROC Curve: Plot of False Positive Rate (x) vs. True Positive Rate (y)**

The true positive rate is the number of true positive predictions divided by the sum of the true positives and false negatives (i.e. all examples in the positive class). The true positive rate is also known as sensitivity or recall.

True Positive Rate = True Positives / (True Positives + False Negatives)

The false positive rate is the total number of false positive predictions divided by the sum of the false positives and true negatives (i.e. all examples in the negative class).

False Positive Rate = False Positives / (False Positives + True Negatives)
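The two rates above can be computed directly from raw prediction counts. A minimal sketch in Python, using made-up counts for illustration:

```python
# Made-up prediction counts for illustration.
true_positives = 80
false_negatives = 20   # positive examples the model missed
false_positives = 10   # negative examples wrongly flagged as positive
true_negatives = 90    # negative examples correctly rejected

# TPR = TP / (TP + FN): fraction of positive examples correctly predicted.
tpr = true_positives / (true_positives + false_negatives)

# FPR = FP / (FP + TN): fraction of negative examples incorrectly predicted.
fpr = false_positives / (false_positives + true_negatives)

print(tpr)  # 0.8
print(fpr)  # 0.1
```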

The plot represents the fraction of accurate predictions for the positive class (y-axis) versus the fraction of inaccurate predictions for the negative class (x-axis).

We want the fraction of incorrect negative class predictions to be 0 (the left of the plot) and the fraction of correct positive class predictions to be 1 (the top of the plot).

This means that the top left of the plot (coordinate (0,1)) is the best possible point: a classifier that reaches it has perfect skill.

**Perfect Skill: A point in the top left of the plot**

A threshold is applied to the predicted probability to decide the cut-off point between the positive and negative classes; by default, for any classifier, it is set at 0.5, halfway between the two outcomes (0 and 1).
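A short sketch of applying the default 0.5 threshold to predicted probabilities (the probability values are made up for illustration):

```python
# Predicted probabilities for the positive class (made-up values).
probs = [0.3, 0.7, 0.55, 0.2]

# Applying the default 0.5 threshold: at or above 0.5 -> positive class.
preds = [1 if p >= 0.5 else 0 for p in probs]
print(preds)  # [0, 1, 1, 0]
```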

Since there is a trade-off between the True Positive Rate and the False Positive Rate, altering the classification threshold will shift the balance of predictions in favor of improving the True Positive Rate at the expense of the False Positive Rate or vice versa.

By analysing the true positive and false positive rates for various threshold levels, one can create a curve that runs from the bottom left to the top right and bows toward the top left. This curve is referred to as the ROC curve.
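The threshold sweep described above can be sketched directly. The labels and scores below are made up for illustration; in practice a library routine such as scikit-learn's `roc_curve` would typically be used instead:

```python
# Made-up true labels and predicted probabilities for ten examples.
y_true  = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def tpr_fpr(y_true, y_score, threshold):
    """Compute (TPR, FPR) for a given classification threshold."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

# Sweeping the threshold traces the curve: low thresholds land near (1,1),
# high thresholds near (0,0), and intermediate ones in between.
for threshold in (0.0, 0.25, 0.5, 0.75, 1.01):
    tpr, fpr = tpr_fpr(y_true, y_score, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```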

A classifier that has no bias between the positive and negative classes forms a diagonal line from a False Positive Rate of 0 and a True Positive Rate of 0 (coordinate (0,0), or predicting every example as negative) to a False Positive Rate of 1 and a True Positive Rate of 1 (coordinate (1,1), or predicting every example as positive). Models with points below this line have worse than no skill.

The curve offers a convenient diagnostic tool to examine the impact on the True Positive Rate and False Positive Rate by applying different threshold values to a single classifier. A threshold can be chosen to bias a classification model's predictive performance behavior.

Because it has no bias toward the majority or minority class, the ROC curve is a popular diagnostic tool for classifiers on both balanced and imbalanced binary prediction problems. This is quite appealing when dealing with imbalanced data, as ROC analysis does not favor models that perform well on the minority class at the expense of the majority class.

**AUC: Area Under the ROC Curve**

AUC (Area Under the ROC Curve) measures the entire two-dimensional area underneath the ROC curve. It provides an aggregate measure of performance across all possible classification thresholds.

The score ranges from 0.0 to 1.0, where 1.0 corresponds to a perfect classifier and 0.5 to a no-skill classifier. Direct comparisons between binary classifier models can be made using this single score.
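A minimal sketch of computing ROC AUC from its rank-based interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are made up for illustration; in practice a routine such as scikit-learn's `roc_auc_score` would typically be used:

```python
# Made-up true labels and predicted probabilities for ten examples.
y_true  = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def roc_auc(y_true, y_score):
    """ROC AUC via pairwise ranking: fraction of (positive, negative)
    pairs where the positive example scores higher (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc(y_true, y_score))  # 0.96
```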