Problem Characterization

Re-Calibration of a Credit Risk Assessment System Based on Biased Data

The most fundamental and most frequently encountered type of decision is the binary decision. It appears in any business activity where the outcome is either to "do that" or to "do something else".

In decision support systems, the typical approach to binary decision problems is to map the multivariate input space into a scalar score, so that a simple threshold on the score becomes the control parameter for producing decisions.
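As a minimal sketch of this score-and-threshold scheme (not the competition's actual model; the weights, feature vector and cutoff below are hypothetical), a linear function collapses the multivariate input into one scalar, and a single threshold turns that scalar into a binary decision:

```python
# Illustrative sketch of a score-and-threshold decision system.
# The coefficients and feature values are hypothetical.

def score(applicant, weights, bias=0.0):
    """Map a multivariate feature vector to a single scalar score."""
    return bias + sum(w * x for w, x in zip(weights, applicant))

def decide(applicant, weights, threshold):
    """The threshold is the system's single control parameter."""
    return "reject" if score(applicant, weights) >= threshold else "approve"

weights = [0.8, -0.3, 0.5]    # hypothetical model coefficients
applicant = [1.0, 2.0, 0.5]   # hypothetical applicant features

print(decide(applicant, weights, threshold=0.5))  # score = 0.45 -> "approve"
```

Moving the threshold trades type-I against type-II errors without retraining the scoring function itself.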

In principle, either outcome of a binary decision could be assessed as "successful" or "unsuccessful" by tracking type-I and type-II errors. In practice, however, only the "do that" outcome is monitored for decision assessment, for several reasons but mainly because of the cost of betting on decisions expected to be wrong.

As a consequence, only part of the "market" is monitored and labeled as a "successful" or "unsuccessful" decision. Moreover, this part is a heavily biased sample of the market for re-calibration/re-training purposes: instead of being randomly drawn, it has been extracted by a process focused on optimizing the decision objective.

This competition focuses on how to build a model for a binary decision support system from this type of biased sample, in a credit scoring application. Data are available only about the company's clients, not about the rejected applicants. These clients are a sample of the potential clients (the market) that is strongly biased, since a systematic selection procedure focused on the problem target (payment default) has been applied. That procedure was a multilayer perceptron trained on an already biased sample of clients, itself extracted from the market by a previous logistic-regression system built from a smaller set of input variables.

The labeled competition data set available for modeling comprises the company's clients captured during one year (2006), and the test data set comprises applicants captured during one year (2008). The important aspect to emphasize is that the competition test set contains randomly selected applicants whose applications were rejected by the credit scoring system; for the purpose of monitoring the decision support system's performance and collecting data for future re-calibration, these applicants nevertheless received the credit they had applied for.

The data have been collected from a time interval in the past not subjected to any drastic change in the economy; only gradual market changes have occurred within that time span.

This competition focuses on the credit scoring model's capacity to generalize from the partial, biased data sets available for modeling.

Participants will download a labeled data set from a one-year period for modeling; download an unlabeled data set from a period over one year later and submit the scores to the LeaderBoard; and download another unlabeled data set from a one-year period a year later (the Prediction data set) and submit their scores.

These data sets come from a private-label credit card operation of a Brazilian credit company and its partner shops, under stable inflation conditions (2006-2009).

The official competition performance metric is the area under the ROC curve (AUC); a Java routine for calculating it is available for download. Other model performance metrics will be used for comparative purposes and will be released as the competition progresses.
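The official scorer is the downloadable Java routine; purely for illustration, the AUC can also be computed via its probabilistic interpretation (the Mann-Whitney statistic): the probability that a randomly chosen positive outscores a randomly chosen negative, counting ties as one half. A small Python sketch:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive (label 1) outscores a randomly chosen
    negative (label 0), with ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]                # toy example, not competition data
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
print(auc(labels, scores))              # 5 of 6 pairs ranked correctly
```

This O(|pos| * |neg|) version is fine for illustration; production scorers use a sort-based O(n log n) formulation, but both give the same value.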

The delinquency rate of the modeling data set is 26%, where delinquency means a delay of 60 days or more in any payment of the bills contracted during the first year after the credit was granted. Since this data set contains only approved clients, the delinquency rate is expected to be higher in the Prediction data set, which also contains rejected clients. Clients labeled "good" are those who are not delinquent.
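The labeling rule above can be sketched directly; the representation of a client's payment record as a list of per-bill delays is an assumption for illustration, not the competition's data format:

```python
# Sketch of the stated labeling rule: a client is "bad" (target = 1)
# if any bill contracted in the first year after the credit was
# granted was paid 60 or more days late; otherwise "good" (0).
# The per-bill delay-list representation is hypothetical.

def label_client(days_late_per_bill):
    """days_late_per_bill: delay, in days, of each first-year bill."""
    return 1 if any(d >= 60 for d in days_late_per_bill) else 0

print(label_client([0, 12, 75, 0]))  # one bill 60+ days late -> 1 (bad)
print(label_client([0, 5, 30]))      # never 60+ days late    -> 0 (good)
```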

Both the LeaderBoard and the Prediction data sets were extracted from the same time interval and are affected by temporal degradation. Innovative ways of handling this issue can be found in the PAKDD 2009 Competition, whose focus was this type of degradation. The two data sets differ only in their composition: approved clients vs. approved + rejected clients.

The information about the clients consists of 52 explanatory variables of several types, affected by the typical imperfections of real-world problems such as noise, missing data and outliers. The 53rd variable (the last column of the modeling data set) is the problem target, with value 1 for bad clients and 0 for good clients. A list of the variables with their descriptions is downloadable along with the data sets. The general characteristics of the data sets are presented in the table below.

Data set             Modeling       LeaderBoard    Prediction
Number of patterns   50,000         20,000         20,000
Time interval        12 months      12 months      12 months
Target variable      Labeled        Unlabeled      Unlabeled
Target proportion    26% vs. 74%    Unrevealed     Unrevealed
Composition          Approved       Approved       Approved + Rejected
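The layout described above (52 explanatory variables followed by the target in column 53) suggests a straightforward parsing step. The sketch below is a hedged illustration: the comma delimiter and the tiny inline sample are assumptions, not the competition's actual file format, and real rows contain missing values and noise.

```python
import csv
import io

# Hypothetical two-row sample standing in for the modeling file:
# 52 feature columns followed by the target (1 = bad, 0 = good).
sample = io.StringIO(
    ",".join(str(i) for i in range(52)) + ",1\n" +
    ",".join(str(i) for i in range(52)) + ",0\n"
)

X, y = [], []
for row in csv.reader(sample):
    X.append(row[:52])      # 52 explanatory variables (kept as strings;
                            # real data needs missing-value handling)
    y.append(int(row[52]))  # target: 1 = bad client, 0 = good client

print(len(X[0]), y)  # -> 52 [1, 0]
```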

There are three main aspects influencing the performance on the unlabeled data sets:

    1 - The bias in the modeling examples' distribution.
    2 - The temporal degradation in the LeaderBoard and Prediction data sets (see the PAKDD 2009 Competition for ideas on handling this issue).
    3 - The generally poor quality of the data (noise, missing values, outliers).

Participants should not give up or feel dismayed at apparently low performance on the LeaderBoard data set. On top of the temporal degradation already mentioned, several variables concerning residence location and personal identification have been either encoded or removed to preserve clients' confidentiality. This time the ZIP code has been partly preserved, so knowledge about Brazilian regions will help. For motivation, competitors should look at the example of the PAKDD 2009 Competition: the winning team was only in 60th place on the LeaderBoard. Please do participate; it is exciting, fun and educational.

This site will remain open after the competition. The LeaderBoard will serve as an automatic benchmark on this problem for future impartial performance assessment, in a publicly accessible environment, with several performance metrics for binary decision problems. Eventually, the labels of the LeaderBoard (but not the Prediction) data set will be released.
