This credit risk assessment problem comes from the private label credit card operation of a major Brazilian retail chain.
The company has operated its private label card for over 8 years and has applied two different risk assessment methods, with the application acceptance rate varying from 50% to 75% over this period.
Each accepted application gives the applicant (now a client) access to credit for purchases at the retail chain, billed 10 to 40 days after the purchase, monthly, on a fixed day of the month. A client was labeled as bad (target variable=1) if, during the 11 months after the first bill, he or she had any payment default (a delay longer than 60 days). Otherwise, the client was labeled as good (target variable=0). Thus, after credit acceptance, a client would take some time to make a first purchase and receive a first bill. Eleven months later, with or without further bills, the client's set of bills for credit risk assessment was complete. A further 60 days were allowed for the last bill of the period to mature.
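The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the company's actual procedure: the `Bill` record and its fields `months_after_first_bill` and `days_late` are hypothetical, and the sketch assumes that the first bill itself belongs to the observation window.

```python
from dataclasses import dataclass
from typing import Iterable

DEFAULT_DELAY_DAYS = 60   # from the text: a delay longer than 60 days is a default
OBSERVATION_MONTHS = 11   # from the text: 11 months after the first bill


@dataclass(frozen=True)
class Bill:
    # Hypothetical record layout, introduced only for this illustration.
    months_after_first_bill: int  # 0 for the first bill
    days_late: int                # 0 if the bill was paid on time


def label_client(bills: Iterable[Bill]) -> int:
    """Return 1 (bad) if any bill in the window is more than 60 days late, else 0 (good)."""
    in_window = (
        b for b in bills
        if 0 <= b.months_after_first_bill <= OBSERVATION_MONTHS
    )
    return int(any(b.days_late > DEFAULT_DELAY_DAYS for b in in_window))
```

Under these assumptions, a delay of exactly 60 days does not count as a default, and a late bill issued after the 11-month window does not change the label.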
The competition focuses on the robustness of performance against degradation over time. The competitors' task is therefore to extract knowledge from the modeling data so as to achieve the best performance on the company's clients analyzed in a one-year period starting three years later (the prediction data set). Competitors should produce scores that rank the clients, with the highest scores assigned to those most likely to become delinquent.
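The score convention (a higher score means a riskier client) can be illustrated with a short Python sketch. The client identifiers, the function names `rank_by_risk` and `bad_rate_in_top`, and the "bad rate in the top fraction" check are assumptions made for this example; the check is a simple sanity test on labeled modeling data and is not stated here to be the competition's evaluation metric.

```python
from typing import Dict, List


def rank_by_risk(scores: Dict[str, float]) -> List[str]:
    """Order client ids from highest score (riskiest) to lowest."""
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def bad_rate_in_top(scores: Dict[str, float],
                    labels: Dict[str, int],
                    fraction: float) -> float:
    """Share of bad clients (label 1) among the top `fraction` of the ranking."""
    ranked = rank_by_risk(scores)
    k = max(1, int(len(ranked) * fraction))  # always inspect at least one client
    return sum(labels[cid] for cid in ranked[:k]) / k


# Toy data, invented for illustration only.
scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}
labels = {"a": 0, "b": 1, "c": 1, "d": 0}
ranking = rank_by_risk(scores)
top_half_bad_rate = bad_rate_in_top(scores, labels, 0.5)
```

If the scores order the clients well, the bad rate among the top-ranked clients should be clearly higher than the overall bad rate of the data set.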
Three data sets are available to participants: the modeling, leaderboard and prediction sets. Each data set consists of fully matured data captured over a whole year, and the three come from different time periods separated by additional years. Labels are available only for the modeling data set, which has roughly 20% bad clients. This class proportion may differ in the leaderboard and prediction data sets, depending on several factors, mainly the company's policy. The general characteristics of the data samples are presented in the table below.
|