Credit Card Default Data Set
Contains 20,000 individuals described by 23 attributes (e.g., gender, age). We have removed individuals with missing attributes and reduced the sample size from 30,000 to 20,000.
Label is Default Payment (1:yes; 0:no).
Sensitive feature is Education Degree. We have binarized the original value (1:graduate school; 2:university; 3:high school; 4:others) into (1:lower education) if it is <=3 and (0:higher education) otherwise (as done in "The Price of Fair PCA: One Extra Dimension").
creditcarddefault.csv is the data set; each row is an individual; the 24th column is the label; the 3rd column is the sensitive feature.
creditdefault_index.csv contains 50 random shuffles of individual indices; each row is a random shuffle.
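The column layout above can be turned into a small loading sketch. This is only an illustration under assumptions that the description does not confirm: that the file is comma-separated, purely numeric, and has no header row. The helper names read_numeric_csv and split_columns are hypothetical, and the two-row input is a made-up stand-in for the real file. Column positions are given 1-based, as in the text.

```python
import csv
import io


def read_numeric_csv(handle):
    """Read a header-free, comma-separated file of numbers into a list of rows.

    Assumption: no header row and numeric values only.
    """
    return [[float(v) for v in row] for row in csv.reader(handle) if row]


def split_columns(rows, label_col, sensitive_col):
    """Split rows into (features, labels, sensitive).

    label_col and sensitive_col are 1-based column positions, matching
    the way the columns are described in the text.
    """
    label_idx = label_col - 1
    sens_idx = sensitive_col - 1
    features, labels, sensitive = [], [], []
    for row in rows:
        labels.append(row[label_idx])
        sensitive.append(row[sens_idx])
        # Keep every column except the label and the sensitive feature.
        features.append(
            [v for j, v in enumerate(row) if j not in (label_idx, sens_idx)]
        )
    return features, labels, sensitive


# Made-up stand-in for the real file: label in column 5, sensitive feature in column 3.
demo = io.StringIO("10,20,1,30,0\n11,21,0,31,1\n")
rows = read_numeric_csv(demo)
features, labels, sensitive = split_columns(rows, label_col=5, sensitive_col=3)
print(features)
print(labels, sensitive)
```

For creditcarddefault.csv the call would be split_columns(rows, label_col=24, sensitive_col=3), with rows read from the opened file. Dropping the sensitive feature from the feature list is a design choice of this sketch; keep it in if the method being evaluated expects it.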
Communities and Crime Data Set
Contains 1,993 communities described by 101 attributes (e.g., population, household size).
Label is Crime Rate (1:high; 0:low).
Sensitive feature is Percentage of African American Residents. We have binarized the original value into (1:high) if it is >=50% and (0:low) otherwise.
crimecommunity.csv is the data set; each row is a community; the 101st column is the label; the 1st column is the sensitive feature.
crimecommunity_index.csv contains 50 random shuffles of community indices.
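One way to use a row of an index file is to reorder the data set with it and then cut the result into a training part and a test part. The sketch below is hedged on two points the description leaves open: whether the stored indices start at 0 or at 1 (both are accepted here), and what split ratio to use (train_fraction is a hypothetical parameter, not a value from the data set). The function names and the toy rows are illustrative only.

```python
def to_zero_based(shuffle, n):
    """Convert one shuffle row to 0-based positions.

    Accepts either a permutation of 0..n-1 or of 1..n, since the index
    base of the files is an assumption to verify.
    """
    idx = [int(v) for v in shuffle]
    if sorted(idx) == list(range(n)):
        return idx
    if sorted(idx) == list(range(1, n + 1)):
        return [i - 1 for i in idx]
    raise ValueError("shuffle row is not a permutation of the row indices")


def train_test_split_by_shuffle(rows, shuffle, train_fraction=0.7):
    """Reorder rows by one shuffle, then split into (train, test)."""
    order = to_zero_based(shuffle, len(rows))
    shuffled = [rows[i] for i in order]
    cut = int(round(train_fraction * len(rows)))
    return shuffled[:cut], shuffled[cut:]


# Toy example: five rows and one 1-based shuffle.
rows = ["a", "b", "c", "d", "e"]
shuffle = [3, 1, 5, 2, 4]
train, test = train_test_split_by_shuffle(rows, shuffle, train_fraction=0.6)
print(train, test)
```

Repeating this for each of the 50 rows gives 50 reproducible splits of the same data.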
COMPAS Data Set
Contains 16,000 defendants described by 16 attributes (e.g., sex, ethnicity).
Label is Risk of Recidivism (1:high; 0:low).
Sensitive feature is Race (1:black; 0:white).
compas.csv is the data set; each row is a defendant; the 16th column is the label; the 15th column is the sensitive feature.
compas_index.csv contains 50 random shuffles of defendant indices.
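Once the label and the binary sensitive feature have been extracted, a quick sanity check is to compare the share of positive labels in each group. The following is a minimal sketch; positive_rate_by_group is a hypothetical helper, and the six values are invented for illustration and are not taken from any of the data sets.

```python
def positive_rate_by_group(labels, sensitive):
    """Return {group value: fraction of rows in that group with label 1}."""
    counts = {}
    for y, s in zip(labels, sensitive):
        total, positives = counts.get(s, (0, 0))
        counts[s] = (total + 1, positives + (1 if y == 1 else 0))
    return {s: positives / total for s, (total, positives) in counts.items()}


# Invented toy values: three rows per group.
labels = [1, 0, 1, 1, 0, 0]
sensitive = [1, 1, 1, 0, 0, 0]
rates = positive_rate_by_group(labels, sensitive)
print(rates)
```

The same function applies to all three data sets, since each uses a 0/1 label and a 0/1 sensitive feature.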
This webpage is maintained by Austin Okray (firstname.lastname@example.org).