Understanding Low R-squared in Deep Learning
Evaluating a model is a crucial step after building it. Different evaluation metrics exist depending on the problem type: regression, classification, and so on. This article focuses on R-squared for regression problems, particularly in the context of deep learning, where low R-squared values are often encountered.
What is R-squared?
R-squared, also known as the coefficient of determination, is a statistical measure indicating the goodness of fit of a regression model. It represents the proportion of variance in the dependent variable that can be predicted from the independent variables. R-squared typically ranges from 0 to 1, although it can fall below 0 for models that fit worse than a constant baseline. An R-squared of 1 indicates a perfect fit, with no difference between predicted and actual values. Conversely, an R-squared of 0 indicates that the model explains none of the variability and captures no relationship between the dependent and independent variables.
How is R-squared Calculated?
R-squared quantifies how much of the variability in the dependent variable (Y) is explained by the independent variables (Xi) in a regression model. The calculation involves these steps:
Calculate the mean of the target variable (y): Denoted as y̅.
Calculate the Total Sum of Squares (SStot): This measures the total variability in the dependent variable. It is calculated by subtracting the mean (y̅) from each observation (yi), squaring the difference, and summing these squared differences across all values:
SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2
Estimate the model parameters: Use a suitable regression model like Linear Regression or SVM Regressor to estimate the model parameters.
Calculate the Sum of Squares due to Regression (SSR): This measures the variability explained by the regression model. It is calculated by subtracting the mean (y̅) from each predicted value (ŷi), squaring the difference, and summing these squared differences across all values:
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
Calculate the Sum of Squares of Residuals (SSres): This measures the variability left unexplained after prediction. It is calculated by subtracting each predicted value (ŷi) from the actual value (yi), squaring the difference, and summing these squared differences across all values:
SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Calculate R-squared: R-squared can be calculated using either of the following formulas:
R^2 = \frac{SSR}{SS_{tot}} \quad \text{or} \quad R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
R-squared is essentially a comparison of the residual sum of squares (SSres) with the total sum of squares (SStot). The total sum of squares is the sum of squared vertical distances between the data points and the horizontal line at the mean, while the residual sum of squares is the sum of squared vertical distances between the data points and the best-fitted line.
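For a least-squares fit with an intercept the two formulas above coincide, so the second form is the one usually implemented. The steps can be sketched in a few lines of Python (the numbers below are made up for illustration):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SSres / SStot, following the steps above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_bar = y_true.mean()                    # mean of the target variable
    ss_tot = np.sum((y_true - y_bar) ** 2)   # total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_true, y_pred))  # close to 1: the predictions track the data
```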
Interpreting R-squared for Goodness of Fit
The closer the R-squared value is to 1, the better the model fits the data. However, it's crucial to note that an R-squared value can be negative if the model performs worse than a simple average model.
Limitations of R-squared:
A key limitation of R-squared is that its value never decreases when new variables are added to the model, regardless of their significance. This can lead to the inclusion of non-significant attributes, resulting in a misleadingly high R-squared value and a potentially poor regression model. This occurs because SStot remains constant, and the regression model attempts to minimize SSres by finding correlations with the new attribute, even if spurious.
R-squared vs. Adjusted R-squared
Adjusted R-squared is a modified version of R-squared that accounts for the number of independent variables in the model. It addresses the limitation of R-squared by penalizing the inclusion of irrelevant variables. While R-squared always increases with the addition of independent variables, adjusted R-squared increases only if the new variable meaningfully contributes to the model's performance. The adjusted R-squared can be a better measure of predictive power than the R-squared because it penalizes additional parameters and reduces the overfitting of models to data.
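The usual adjustment formula is 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. A minimal sketch (the R² and sample counts here are invented):

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 penalizes R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# the same raw R^2 is penalized more heavily as predictors pile up
print(adjusted_r_squared(0.90, n_samples=50, n_features=2))   # ~0.896
print(adjusted_r_squared(0.90, n_samples=50, n_features=20))  # ~0.831
```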
R-squared in the Context of Regression Analysis
Regression analysis, a significant part of supervised machine learning, involves predicting a continuous target variable from a set of predictor variables. Unlike binary classification, where the target takes only two values, regression deals with a target that can take a continuum of values.
While regression analysis is widely used, there's no single, universally accepted metric for assessing its results. Common metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). However, these metrics, ranging from zero to +infinity, don't inherently provide information about the regression's performance relative to the ground truth distribution.
R-squared, alongside Symmetric Mean Absolute Percentage Error (SMAPE), addresses this by generating a high score only when the majority of ground truth elements are correctly predicted.
R-squared vs. Other Performance Metrics
Several performance metrics exist for evaluating regression models. These can be broadly categorized into:
- Metrics based on variance: R-squared
- Metrics based on distance to actual points: MAE, MSE, RMSE, MAPE, SMAPE
Mean Absolute Error (MAE): The average of the absolute errors between the predicted and actual values.
Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values. Because errors are squared, MSE weights large errors heavily, making it useful when outliers matter.
Root Mean Squared Error (RMSE): The square root of MSE, providing a more interpretable measure in the original unit of the target variable. The two quantities MSE and RMSE are monotonically related (through the square root).
Mean Absolute Percentage Error (MAPE): Focuses on the percentage error, with an intuitive interpretation in terms of relative error. Its use is recommended in tasks where sensitivity to relative variations matters more than sensitivity to absolute variations.
Symmetric Mean Absolute Percentage Error (SMAPE): Initially defined by Armstrong (1985), and refined in its current version by Flores (1986) and Makridakis (1993), SMAPE was proposed to amend the drawbacks of MAPE by providing a more balanced measure.
A key difference between these metrics lies in their output range. R-squared is upper-bounded by 1 (perfect fit), with 0 indicating a fit no better than a horizontal line representing the mean of the target values. Negative R-squared values indicate an even worse fit. SMAPE is also bounded, ranging from 0% (perfect fit) to 200%. Conversely, MAE, MSE, RMSE, and MAPE range from zero to positive infinity, making their values heavily dependent on the variable ranges and difficult to interpret in isolation.
R-squared and SMAPE offer the advantage of providing a clear indication of model performance regardless of the scale of the variables. For example, R2 = 0.8 and SMAPE = 0.1 indicate a very good regression model performance, regardless of the ranges of the ground truth values and their distributions. This is particularly useful when comparing the predictive performance of a regression on different datasets with different value scales.
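To make these ranges concrete, the sketch below computes each metric on the same invented predictions. Several SMAPE variants exist in the literature; the one implemented here is the 0–200% form used in this article.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

err = y_true - y_pred
mae = np.mean(np.abs(err))                           # same units as y
rmse = np.sqrt(np.mean(err ** 2))                    # same units as y
mape = 100 * np.mean(np.abs(err) / np.abs(y_true))   # percent
smape = 100 * np.mean(2 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred)))
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae}, RMSE={rmse}, MAPE={mape:.2f}%, SMAPE={smape:.2f}%, R2={r2}")
```

Every absolute error here is 10, so MAE and RMSE depend directly on the scale of y, while R² and SMAPE stay in their fixed, interpretable ranges.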
Interpreting R-squared Values
- R2 = 0: The fitted line (or hyperplane) is horizontal at the mean of the target, indicating that the model captures no relationship between the variables.
- R2 < 0: The model performs worse than a horizontal line, potentially due to constrained intercept or slope in linear regression, or a poorly chosen nonlinear model.
The behavior of R-squared is independent of the linearity of the regression model. A very low R-squared can occur even for a completely linear model, and conversely, a high R-squared can occur even when the model is noticeably non-linear.
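Both boundary cases are easy to check numerically (with invented data): predicting the mean everywhere yields exactly R² = 0, while predictions worse than the mean push R² below zero.

```python
import numpy as np

def r2(y, yhat):
    """R^2 = 1 - SSres / SStot."""
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# a "model" that always predicts the mean: R^2 is exactly 0
print(r2(y_true, np.full_like(y_true, y_true.mean())))  # 0.0

# predictions anti-correlated with the truth: R^2 goes negative
print(r2(y_true, np.array([5.0, 4.0, 3.0, 2.0, 1.0])))  # -3.0
```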
Common Misconceptions About R-squared
- "A high R-squared always means a good model." This is not necessarily true. R-squared can be high for an overfitted model or one built on spurious correlations.
- "R-squared measures predictive power." In reality, it only measures model fit on the training data and says nothing about unseen data.
- "We should always aim for a high R-squared." Well, not always. It depends on the domain, as well as on the data quality itself. For social sciences like psychology or history, the value of R-squared can be very low (0.1 or even less) and still meaningful.
- For simple linear regression, R² can be computed before even fitting the model (it equals the squared Pearson correlation between the two variables), which makes it questionable as a measure of prediction ability.
- You get the same R² value if you swap the input and output variables, even though the two fitted models are different.
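The symmetry point can be verified directly: for a simple least-squares line, R² equals the squared Pearson correlation, which does not care which variable plays the role of the target (the data here is invented):

```python
import numpy as np

def r2_of_line_fit(a, b):
    """R^2 of a least-squares line fitted to predict b from a."""
    slope, intercept = np.polyfit(a, b, 1)
    pred = slope * a + intercept
    return 1 - np.sum((b - pred) ** 2) / np.sum((b - b.mean()) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# same R^2 whichever variable is treated as the target
print(r2_of_line_fit(x, y))
print(r2_of_line_fit(y, x))
print(np.corrcoef(x, y)[0, 1] ** 2)  # identical to both
```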
Alternatives to R-squared
- Confidence Intervals: If the goal is to interpret the slope or intercept, use a confidence interval for them, not R².
- Standard Error (SE): If you want to judge the model's predictions, use either the standard error (SE) or the prediction interval, not R².
- Prediction Interval: A range within which a new prediction is expected to fall with a certain degree of confidence. The interval is not of constant width: it widens the further you move from the center of the data.
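For a simple least-squares line, the 95% prediction interval at a new point x₀ has half-width t · s · √(1 + 1/n + (x₀ − x̄)²/Sxx), which grows as x₀ moves away from the data center. A sketch with invented data (the t critical value for 3 degrees of freedom is hard-coded from a t-table):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

n = len(x)
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error
sxx = np.sum((x - x.mean()) ** 2)
t_crit = 3.182                             # 97.5% t quantile, df = n - 2 = 3

def pi_half_width(x0):
    """Half-width of the 95% prediction interval at x0."""
    return t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

# the interval widens away from the center of the data (x mean = 3)
print(pi_half_width(3.0))
print(pi_half_width(5.0))
```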
All alternatives described are more interpretable than R², and are in units you care about.
Practical Applications of R-squared
- Finance: Evaluating the performance of asset pricing models.
- Marketing: Measuring the effectiveness of advertising campaigns.
The advantage of using this metric lies in its ability to measure variability: it describes how much variation is explained by the model compared with a baseline that simply predicts the mean.
When R-squared is Not Ideal
R-squared is not ideal when it comes to certain machine learning models such as those involving non-linear regression or time series prediction.
Using R-squared with Arize
Arize is a platform for monitoring and investigating the R-squared metric of regression models. It tracks the timestamps of your predictions, letting you compare values across different dates, and lets you configure the evaluation window, alert notifications, and model baseline.
Illustrative Examples
Consider a simple linear regression model for predicting sales based on marketing spend. R-squared helps evaluate its performance.
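A minimal sketch of that scenario, with invented monthly figures:

```python
import numpy as np

# hypothetical monthly data: marketing spend and sales, both in $1000s
spend = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
sales = np.array([25.0, 33.0, 41.0, 48.0, 57.0, 64.0])

slope, intercept = np.polyfit(spend, sales, 1)
pred = slope * spend + intercept
r2 = 1 - np.sum((sales - pred) ** 2) / np.sum((sales - sales.mean()) ** 2)
print(f"R^2 = {r2:.3f}")  # near 1: spend explains almost all the variation
```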