F-Score (F1-Score)

A measure used to evaluate the performance of a classification model

The F-Score (F1-Score) is a measure used to evaluate the performance of a classification model, particularly when the dataset is imbalanced (i.e., one class is much more frequent than the other). It is the harmonic mean of precision and recall, providing a single metric that balances the two. It is especially useful when false positives and false negatives both matter and neither should dominate the evaluation.

The F1-Score is defined as the harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

  • Precision: The proportion of positive predictions that are actually positive, TP / (TP + FP), where TP is the number of true positives and FP the number of false positives (i.e., how trustworthy a positive prediction is).
  • Recall (Sensitivity or True Positive Rate): The proportion of actual positive samples that the model predicts as positive, TP / (TP + FN), where FN is the number of false negatives (i.e., the model's ability to capture all positive instances).
  • The harmonic mean is pulled toward the smaller of its two inputs, so it penalizes a large gap between precision and recall more than the arithmetic mean does. The F1-score is therefore high only when both precision and recall are high.
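The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `precision_recall_f1`, the example counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for this example. Here `tp`, `fp`, and `fn` are the counts of true positives, false positives, and false negatives.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts.

    Assumption for this sketch: a ratio with a zero denominator is
    reported as 0.0 rather than raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
arithmetic_mean = (precision + recall) / 2

print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"F1={f1:.3f} arithmetic mean={arithmetic_mean:.3f}")
```

With these counts, precision is 0.8 and recall is 0.5. The F1-score (about 0.615) falls below the arithmetic mean (0.65) because the harmonic mean is pulled toward the lower of the two values.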

Interpretation of F1-Score

  • F1-Score = 1: Indicates perfect precision and recall, meaning that all positive predictions are correct and all actual positives are captured by the model.
  • F1-Score = 0: Indicates that the model produces no true positives, so precision and recall are both zero (or undefined): every positive prediction is wrong, or the model makes no positive predictions at all.
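The two extremes can be checked on a toy example. The sketch below uses the equivalent count-based form F1 = 2·TP / (2·TP + FP + FN), which follows from substituting the precision and recall ratios into the harmonic mean. The helper name `f1_score_binary` and the label lists are hypothetical, and returning 0.0 when there are no positives at all is an assumed convention.

```python
def f1_score_binary(y_true, y_pred):
    """F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: no positives anywhere gives F1 = 0.0.
    return 2 * tp / denom if denom else 0.0


y_true = [1, 1, 0, 0, 1]

# Every prediction matches the true label.
perfect = f1_score_binary(y_true, [1, 1, 0, 0, 1])

# Every prediction is wrong, so there are no true positives.
no_hits = f1_score_binary(y_true, [0, 0, 1, 1, 0])

print(perfect, no_hits)
```

The first call yields an F1-score of 1.0 and the second yields 0.0, matching the two interpretations above.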

Use Cases of F1-Score

  1. Imbalanced Datasets: When the dataset has imbalanced classes (e.g., one class is significantly more frequent than the other), accuracy can be misleading, because a model can score well simply by predicting the majority class. The F1-score provides a more meaningful evaluation because it ignores true negatives and focuses on the positive class (usually the minority class), balancing precision and recall.
  2. Trade-off between False Positives and False Negatives: In certain applications, both false positives and false negatives have consequences, such as in spam detection or medical diagnosis. The F1-score helps ensure that neither precision nor recall is overly favored.
  3. Binary Classification: It is commonly used for binary classification problems, such as fraud detection, churn prediction, and binary medical diagnoses.
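To illustrate why accuracy can mislead on imbalanced data, the following sketch compares accuracy and F1 for two hypothetical models on a made-up dataset with 5 positives and 95 negatives. The helper names and all labels are assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score_binary(y_true, y_pred):
    """F1 for binary labels (1 = positive), using 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up imbalanced dataset: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A always predicts the majority (negative) class.
pred_a = [0] * 100

# Model B finds 4 of the 5 positives but raises 6 false alarms.
pred_b = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

acc_a, f1_a = accuracy(y_true, pred_a), f1_score_binary(y_true, pred_a)
acc_b, f1_b = accuracy(y_true, pred_b), f1_score_binary(y_true, pred_b)

print(f"Model A: accuracy={acc_a:.2f} F1={f1_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f} F1={f1_b:.2f}")
```

Model A reaches 95% accuracy without detecting a single positive, so its F1-score is 0. Model B has slightly lower accuracy (93%) but an F1-score of about 0.53, which better reflects that it actually finds most of the positive cases.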


Example Applications of F1-Score

  1. Medical Diagnostics: In healthcare, the F1-score is valuable when identifying patients with rare diseases. A high F1-score indicates that the model captures as many actual cases as possible (high recall) without flooding clinicians with false positives (high precision).
  2. Spam Detection: For spam filters, the F1-score is useful to balance the risk of marking important emails as spam (false positives) versus letting spam emails through (false negatives).
  3. Fraud Detection: In fraud detection, both precision and recall are critical. A high F1-score indicates that the system not only captures most fraudulent transactions but also flags few legitimate transactions as fraud.
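In applications like these, a classifier often outputs a score, and the decision threshold determines the balance between false positives and false negatives. The sketch below, which uses made-up scores and labels and a hypothetical helper `f1_at_threshold`, shows one way the F1-score could be used to compare candidate thresholds. It is an illustration under those assumptions, not a prescribed procedure.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up model scores (higher = more likely positive) and true labels.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
results = {t: f1_at_threshold(scores, labels, t) for t in thresholds}
best = max(results, key=results.get)

for t in thresholds:
    print(f"threshold={t:.1f} F1={results[t]:.3f}")
print("best threshold:", best)
```

On this toy data, a low threshold catches every positive but admits many false positives, while a high threshold does the opposite. The F1-score peaks in between, at the threshold of 0.3.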

