NeuroTech

CIN
Downloads

Several files are available for download from the site. All are in standard ANSI text format, except the variables list, which is in MS-Excel© format.

The three data sets are arranged in columns separated by tabs ("TAB"). The first column is "ID_CLIENT", which should be used as the key/identifier. "ID_CLIENT" ranges from 1 to 50,000 in the modeling data set, from 50,001 to 70,000 in the leaderboard data set, and from 70,001 to 90,000 in the prediction data set. The last column is "TARGET_LABEL", which is filled only in the modeling data set, with BAD=1 and GOOD=0. All numerical data use the dot "." as the decimal separator (not the comma ",").
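The layout described above can be read with a few lines of Python. This is a minimal sketch, not code supplied by the organizers. It assumes the data files carry no header row and that an unlabeled "TARGET_LABEL" appears as an empty last field. The names `read_rows` and `data_set_for` are hypothetical, and the sample rows are made up.

```python
import csv
import io

# ID_CLIENT ranges stated in the text, keyed by data set name.
ID_RANGES = {
    "modeling": (1, 50_000),
    "leaderboard": (50_001, 70_000),
    "prediction": (70_001, 90_000),
}


def data_set_for(id_client):
    """Return the name of the data set whose ID range contains id_client."""
    for name, (low, high) in ID_RANGES.items():
        if low <= id_client <= high:
            return name
    raise ValueError(f"ID_CLIENT {id_client} is outside every documented range")


def read_rows(handle):
    """Yield (id_client, features, target) from a tab-separated stream.

    Assumption: no header row. The target is None when the last column
    is empty, which is how this sketch represents an unlabeled row.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        id_client = int(fields[0])
        last = fields[-1].strip()
        target = int(last) if last else None
        # Features are kept as strings; numeric ones use "." as decimal
        # separator, so float() parses them directly.
        yield id_client, fields[1:-1], target


# Made-up sample rows (not real competition data).
sample = "1\t35\t1200.50\t1\n50001\t42\t980.00\t\n"
rows = list(read_rows(io.StringIO(sample)))
```

For a real file, the `io.StringIO` object would be replaced by a file opened in text mode with `newline=""`, as the `csv` module recommends.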

The column labels (header) of the data files are provided in a separate variables list file. That file has four columns: the variable's order number, name, description and contents.
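Because the labels live in a separate file, each data row has to be paired with the variable names by position. The sketch below shows one way to do that, assuming the variables list has already been exported from Excel into four-field tuples. Only "ID_CLIENT" and "TARGET_LABEL" come from the text. `EXAMPLE_FEATURE`, the descriptions, and the helpers `column_names` and `label_row` are hypothetical.

```python
# Hypothetical entries in the four-column layout described above:
# (order number, name, description, contents).
variables = [
    (3, "TARGET_LABEL", "Target", "BAD=1, GOOD=0"),
    (1, "ID_CLIENT", "Client identifier", "1 to 90,000"),
    (2, "EXAMPLE_FEATURE", "Placeholder variable", "numeric"),
]


def column_names(variable_rows):
    """Return the variable names sorted by their order number."""
    ordered = sorted(variable_rows, key=lambda row: row[0])
    return [name for _, name, _, _ in ordered]


def label_row(fields, names):
    """Pair one data row with the column names, checking the width."""
    if len(fields) != len(names):
        raise ValueError(
            f"expected {len(names)} columns, found {len(fields)}"
        )
    return dict(zip(names, fields))


names = column_names(variables)
record = label_row(["1", "0.5", "0"], names)
```

The width check is there so that a mismatch between the variables list and a data file raises an error instead of silently shifting columns.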

The Leaderboard submission example is a file in the format required for leaderboard submissions. The same format is required for submitting the final predictions.

The AUC_ROC Java code is provided to help teams calculate the metric with the same algorithm used for the competition's performance assessment.
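For readers who want a quick check before running the official Java code, the area under the ROC curve can be computed from ranks. The sketch below is not the organizers' implementation and may differ from it in details such as tie handling, so the Java code remains the reference. It uses the rank-sum (Mann-Whitney) formulation with average ranks for tied scores, and assumes that BAD=1 is the positive class and that a higher score means "more likely BAD". The function name `auc_roc` is hypothetical.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.

    labels: iterable of 0/1 values (1 = positive class).
    scores: iterable of numbers, higher meaning "more likely positive".
    Tied scores receive the average of the ranks they span.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    pairs = sorted(zip(scores, labels))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the end of the group of equal scores starting at i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        average_rank = (i + 1 + j) / 2.0
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += average_rank * positives_in_group
        i = j

    # Subtract the smallest possible rank sum, then normalize by the
    # number of positive/negative pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


example = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

In the example, three of the four positive/negative pairs are ordered correctly, which gives 0.75.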

Scripts for the other metrics will be released as soon as they become available, even though those metrics will not be used for ranking.

The files can only be downloaded one at a time.



Standard Classification Task

File                            Number of patterns  Time interval  Target variable  Target proportion  Release  File size (Kb)
Modeling                        50,000              12 months      Labeled          26% vs. 74%        Mar 17   2,058
Leaderboard                     20,000              12 months      Unlabeled        Unrevealed         Mar 17   820
Prediction                      20,000              12 months      Unlabeled        Unrevealed         Apr 16   814
Area Under ROC (Java Code)      --                  --             --               --                 Mar 15   1
Leaderboard Submission Example  --                  --             --               --                 Mar 17   150
Variables List                  --                  --             --               --                 Mar 17   7
