Several files can be downloaded from the site. All are in standard ANSI format, except the variables list, which is in MS-Excel© format.
The three data sets are arranged in columns separated by "TAB". The first column is "ID_CLIENT", which should be used as the key/identifier. "ID_CLIENT" ranges from 1 to 50,000 in the modeling data set, from 50,001 to 60,000 in the leaderboard data set and from 60,001 to 70,000 in the prediction data set. The last column is "TARGET_LABEL", which is filled only for the modeling data set, with BAD=1 and GOOD=0. All numerical data use the dot "." as the decimal separator (not the comma ",").
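As an illustration of the layout described above, here is a minimal Python sketch that reads TAB-separated rows and maps an "ID_CLIENT" to its data set. It assumes the data files contain no header row (the labels live in the variables list file); the function names and the two sample rows are hypothetical and not part of the competition material. For a real file, something like open(path, encoding="cp1252", newline="") may be appropriate, assuming "ANSI" means the Windows-1252 code page.

```python
import csv
import io

# ID_CLIENT ranges as stated in the data description.
ID_RANGES = {
    "modeling": (1, 50000),
    "leaderboard": (50001, 60000),
    "prediction": (60001, 70000),
}


def dataset_for_id(id_client):
    """Return the name of the data set an ID_CLIENT belongs to, or None."""
    for name, (low, high) in ID_RANGES.items():
        if low <= id_client <= high:
            return name
    return None


def read_rows(handle):
    """Yield (id_client, features, target) from TAB-separated text.

    Assumption: no header row; first column is ID_CLIENT, last is TARGET_LABEL.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        id_client = int(row[0])
        label = row[-1].strip()
        # TARGET_LABEL is empty outside the modeling data set.
        target = int(label) if label else None
        yield id_client, row[1:-1], target


# Hypothetical sample rows, for illustration only.
sample = "1\t32\t1250.50\t1\n50001\t45\t980.00\t\n"
rows = list(read_rows(io.StringIO(sample)))
```

Because the decimal separator is the dot, feature values such as "1250.50" can be passed directly to float() without locale handling.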
The column labels of the data files are in a separate variables list file. The variables list file has two columns, containing the variable names and their descriptions.
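One way to attach the labels to the data, sketched in Python: it assumes the variables list has first been exported from MS-Excel to a two-column TAB-separated text file without a header row, which is an assumption and not something the site provides. The helper names and the "AGE" variable are hypothetical.

```python
import csv
import io


def read_variable_names(handle):
    """Return the first column (variable names) of a two-column TAB-separated export."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[0].strip() for row in reader if row]


def label_row(names, row):
    """Pair one data row with the variable names; fail loudly on a length mismatch."""
    if len(names) != len(row):
        raise ValueError(f"expected {len(names)} columns, got {len(row)}")
    return dict(zip(names, row))


# Hypothetical export of the variables list (name, description).
variables = (
    "ID_CLIENT\tClient identifier\n"
    "AGE\tHypothetical example variable\n"
    "TARGET_LABEL\tBAD=1, GOOD=0\n"
)
names = read_variable_names(io.StringIO(variables))
record = label_row(names, ["1", "32", "1"])
```

Checking the column count against the number of variable names is a cheap way to notice a file that was split on the wrong separator.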
The leaderboard submission example is a file in the format required for leaderboard submissions.
The AUC_ROC Java code is provided so that teams can calculate the metric with the same algorithm used for the competition's performance assessment.
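For a quick sanity check without the Java code, the area under the ROC curve can be approximated with a short Python sketch. This is not the organizers' implementation: it uses the standard rank-sum (Mann-Whitney U) identity with average ranks for tied scores, and may differ from the official code in edge cases such as tie handling. The function name is hypothetical; official results should be checked against the provided Java code.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for BAD (treated as the positive class), 0 for GOOD.
    scores: higher means more likely BAD. Tied scores get average ranks.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one BAD and one GOOD example")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i:j].
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U statistic of the positives, normalized by the number of (BAD, GOOD) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


example = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four (BAD, GOOD) pairs are ordered correctly, so the result is 0.75.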
The files can only be downloaded one at a time.