Several files are available for download from the site. All are in standard ANSI format, except the variables list, which is in MS-Excel© format.
The three data sets are arranged in columns separated by "TAB". The first column is "ID_CLIENT", which should be used as the key/identifier. "ID_CLIENT" ranges from 1 to 50,000 in the modeling data set, from 50,001 to 70,000 in the leaderboard data set and from 70,001 to 90,000 in the prediction data set. The last column is "TARGET_LABEL", which is filled only in the modeling data set, with BAD=1 and GOOD=0. All numerical data use the dot "." as the decimal separator (not the comma ",").
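As an illustration of the layout described above, the following minimal Python sketch maps an "ID_CLIENT" value to its data set and reads TAB-separated rows. The function names and the sample rows are hypothetical, and the sketch assumes that an unfilled "TARGET_LABEL" appears as an empty last field, which the description does not state.

```python
import csv
import io

# ID_CLIENT ranges as stated in the data description.
ID_RANGES = {
    "modeling": (1, 50_000),
    "leaderboard": (50_001, 70_000),
    "prediction": (70_001, 90_000),
}


def dataset_for_id(id_client):
    """Return the name of the data set that an ID_CLIENT value belongs to."""
    for name, (low, high) in ID_RANGES.items():
        if low <= id_client <= high:
            return name
    raise ValueError(f"ID_CLIENT {id_client} is outside every documented range")


def read_rows(stream):
    """Yield (id_client, middle_fields, target_label) from TAB-separated text.

    Assumption: the data files carry no header row, and an unfilled
    TARGET_LABEL is an empty last field (returned here as None).
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        raw_target = fields[-1].strip()
        target = int(raw_target) if raw_target else None
        yield int(fields[0]), fields[1:-1], target


# Hypothetical sample rows, for illustration only.
sample = "1\t0.5\tF\t1\n50001\t1.25\tM\t\n"
rows = list(read_rows(io.StringIO(sample)))
```

Because the decimal separator is the dot, the numeric fields can be converted with float() directly, without any locale handling.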
The column labels of the data files (header) are in a separate variables list file. The variables list file has four columns, containing the order number, name, description and contents of each variable.
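Since the header is kept apart from the data, the labels have to be attached to the columns by hand. The sketch below shows one possible way to do this in Python. It assumes the variables list has first been exported from MS-Excel to a TAB-separated text file with one header row; that export step, the function names and the sample variable "AGE" are assumptions made for illustration, not part of the distributed files.

```python
import csv
import io


def column_names(variables_stream, has_header=True):
    """Return variable names sorted by their order number.

    Assumption: each row has four TAB-separated fields
    (order number, name, description, contents).
    """
    reader = csv.reader(variables_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    if has_header:
        rows = rows[1:]  # drop the exported header row
    ordered = sorted(rows, key=lambda row: int(row[0]))
    return [row[1].strip() for row in ordered]


def label_row(names, fields):
    """Pair one data row with the column names, checking the column count."""
    if len(names) != len(fields):
        raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
    return dict(zip(names, fields))


# Hypothetical variables list export, for illustration only.
variables_text = (
    "Order\tName\tDescription\tContents\n"
    "1\tID_CLIENT\tClient identifier\t1-90000\n"
    "3\tTARGET_LABEL\tTarget\tBAD=1, GOOD=0\n"
    "2\tAGE\tHypothetical variable\tyears\n"
)
names = column_names(io.StringIO(variables_text))
record = label_row(names, ["17", "42", "0"])
```

Sorting by the order number rather than relying on row order guards against a variables list whose rows were rearranged in the spreadsheet.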
The Leaderboard submission example is a file in the format required for leaderboard submissions. The same format is required for the prediction submissions.
The AUC_ROC Java code is available to help teams calculate the metric with the same algorithm used for the competition's performance assessment.
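For teams that want a quick sanity check before running the official code, the area under the ROC curve can be sketched in Python with the rank-sum (Mann-Whitney) formulation. This is not the organizers' Java implementation, and its handling of tied scores may differ from the official one. It assumes that BAD=1 is the positive class and that a higher score means a client is more likely to be BAD; the function name is hypothetical.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.

    labels: sequence of 0/1 values (1 = BAD, treated as the positive class).
    scores: sequence of numbers, higher meaning more likely BAD (assumption).
    Tied scores receive the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos  # assumes labels contain only 0 and 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined without both classes present")

    # Indices ordered by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = 0.0
    i = 0
    while i < len(order):
        # Find the end of the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            if labels[order[k]] == 1:
                rank_sum_pos += avg_rank
        i = j + 1

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four (BAD, GOOD) pairs are ordered correctly, so example_auc is 0.75. Official rankings should rely only on the value produced by the released Java code.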
Scripts for the other metrics will be released as they become available, even though those metrics are not going to be used for ranking.
The files can only be downloaded one at a time.