PAKDD 2009 Data Mining Competition
Overview

The 13th Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD 2009) is pleased to host another data mining competition, co-organized by NeuroTech Ltd. and the Center for Informatics of the Federal University of Pernambuco (Brazil).

Competitions at scientific events are organized worldwide to stimulate the application of state-of-the-art approaches to real-world problems. In recent years, PAKDD has organized several data mining competitions, and this year it presents a problem on the well-known application of credit scoring. The main novelty is the LeaderBoard, which stimulates participation by assessing and ranking competitors' preliminary solutions on an unofficial data set.

The competition is open to both academia and industry. The only ineligible participants are staff and students from the Center for Informatics of the Federal University of Pernambuco and from NeuroTech Ltd.

Problem Summary

Credit Risk Assessment on a Private Label Credit Card Application

Offering credit to potential clients is a very important service for stimulating consumption in the market. Although credit scoring is among the oldest application domains for data mining (dating from before the name "data mining" even existed), it has some difficulties that are often overlooked by modelers, namely:

  • In general, only data about the company's accepted clients is available for modeling, with none about the rejected applicants. The accepted clients are a strongly biased sample of the potential clients (the market), because they were selected by a systematic procedure focused on the very target being modeled (payment default).

  • Also, the data has been collected over a time interval in the past in order to develop a model that will be applied in the future. Even without any drastic change in the economy, gradual market changes occur and reduce the performance of the model estimated on the modeling data set.

Another important aspect, often overlooked by scientists, is the series of impacts of re-calibrating a solution already in operation (retraining it and tuning the decision threshold) to correct the degradation and improve or, at least, preserve the existing performance. This task is particularly risky when credit is lent for long-term payment (such as mortgages). Furthermore, the score generated by such a model may be used within the company in several decision-making processes: for instance, by the marketing/sales department to trade off market expansion against increased risk, or as input to other decision systems such as debt collection scoring models.

This competition focuses on the credit scoring model's robustness against the performance degradation caused by gradual market changes over a few years of business operation.

Participants will download a labeled data set covering a one-year period for modeling and will submit scores for the leaderboard data set, drawn from one year later. The competition results will be assessed on the scores submitted for a prediction data set starting three years later. These data sets come from the private label credit card operation of a major Brazilian retail chain, collected under stable inflation conditions (2003-2008).

The official performance metric will be the area under the ROC curve (AUC). A Java routine for calculating it is available for download.
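
For readers unfamiliar with the metric, the area under the ROC curve can be computed from the ranks of the submitted scores alone. The following is a minimal Python sketch of that calculation, not the competition's official Java routine. The function name roc_auc, the convention that label 1 marks the positive class (assumed here to be payment default), and the convention that a higher score means a higher likelihood of that class are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (assumed: payment default); any other
            value is counted as negative.
    scores: higher values are assumed to mean "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # Sort example indices by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalized by the number of
    # (positive, negative) pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

Under these assumptions, a value of 1.0 means every positive example is scored above every negative one, and 0.5 corresponds to a ranking no better than chance. Because only the ordering of the scores matters, the result does not depend on how the scores are scaled.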
