OkCupid Binary Predictors
Kim (2015), "OkCupid Data for Introductory Statistics and Data Science Courses", Journal of Statistics Education, Volume 23, Number 2.
Kuhn and Johnson (2020), Feature Engineering and Selection, Chapman and Hall/CRC. https://bookdown.org/max/FES/ and https://github.com/topepo/FES
data frames with 61 columns
The data, originally from Kim (2015), are split into training and test sets consistent with Kuhn and Johnson (2020). The predictors are binary (0/1) indicators: ethnicity indicators and keyword indicators derived from the text essay data.
data(okc_binary)
str(okc_binary_train)
#> tibble [38,809 × 61] (S3: tbl_df/tbl/data.frame)
#> $ software : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ engineer : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ startup : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ tech : num [1:38809] 0 0 0 0 1 0 0 0 0 1 ...
#> $ computers : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ engineering : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ computer : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ internet : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#> $ technology : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ science : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#> $ programming : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ technical : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ web : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ developer : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ im : num [1:38809] 1 0 1 0 1 1 0 1 1 0 ...
#> $ programmer : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ scientist : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ code : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ stephenson : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ geek : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ nerd : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ lol : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ biotech : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ matrix : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ coding : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ geeky : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ solving : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ problems : num [1:38809] 0 0 1 0 1 0 0 0 0 0 ...
#> $ data : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ fixing : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ teacher : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ student : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ silicon : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ law : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#> $ mechanical : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ electronic : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ pratchett : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ wikipedia : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ neal : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ mobile : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ math : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ lab : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ systems : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ electronics : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ futurama : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ alot : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ solve : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ websites : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ firefly : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#> $ valley : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ apps : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ lawyer : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ asian : num [1:38809] 1 0 0 0 0 0 0 0 0 0 ...
#> $ black : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ hispanic_latin : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#> $ indian : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ middle_eastern : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ native_american : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ other : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ pacific_islander: num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#> $ white : num [1:38809] 1 1 1 1 1 1 1 1 1 1 ...
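
Because every column is a 0/1 indicator, a column mean is the share of profiles with that keyword or ethnicity flag. The following is a minimal sketch using base R only. The `toy` data frame is a hypothetical stand-in with made-up values, used so the snippet runs without the package installed. Replacing `toy` with `okc_binary_train` should work the same way, assuming all of its columns are numeric as the `str()` output above suggests.

```r
# Hypothetical stand-in with the same 0/1 layout as okc_binary_train.
# The values are made up for illustration and are not taken from the data.
toy <- data.frame(
  tech  = c(0, 0, 1, 0, 1),
  geek  = c(0, 0, 0, 0, 1),
  white = c(1, 1, 1, 0, 1)
)

# Confirm that every column holds only 0/1 values before summarising.
is_binary <- vapply(toy, function(x) all(x %in% c(0, 1)), logical(1))
stopifnot(all(is_binary))

# Proportion of rows where each indicator equals 1, most common first.
prevalence <- sort(colMeans(toy), decreasing = TRUE)
print(prevalence)
```

Sorting by prevalence gives a quick view of which indicators are rare, which can matter when deciding whether to filter near-zero-variance predictors before modeling.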