OkCupid Binary Predictors

Source

Kim (2015), "OkCupid Data for Introductory Statistics and Data Science Courses", Journal of Statistics Education, Volume 23, Number 2.

Kuhn and Johnson (2020), Feature Engineering and Selection, Chapman and Hall/CRC. https://bookdown.org/max/FES/ and https://github.com/topepo/FES

Value

okc_binary_train, okc_binary_test

data frames with 61 columns

Details

The data, originally from Kim (2015), are split into a training set and a test set consistent with Kuhn and Johnson (2020). The predictors are binary (0/1) indicators for ethnicity and for a set of keywords derived from the essay text.

Examples

data(okc_binary)
str(okc_binary_train)
#> tibble [38,809 × 61] (S3: tbl_df/tbl/data.frame)
#>  $ software        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ startup         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tech            : num [1:38809] 0 0 0 0 1 0 0 0 0 1 ...
#>  $ computers       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineering     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ computer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ internet        : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ technology      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ science         : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ programming     : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ technical       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ web             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ developer       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ im              : num [1:38809] 1 0 1 0 1 1 0 1 1 0 ...
#>  $ programmer      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ scientist       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ code            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ stephenson      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geek            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ nerd            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ lol             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ biotech         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ matrix          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ coding          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geeky           : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ solving         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ problems        : num [1:38809] 0 0 1 0 1 0 0 0 0 0 ...
#>  $ data            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fixing          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ teacher         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ student         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ silicon         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ law             : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ mechanical      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronic      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ pratchett       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ wikipedia       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ neal            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ mobile          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ math            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lab             : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ systems         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronics     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ futurama        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ alot            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ solve           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ websites        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ firefly         : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ valley          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ apps            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lawyer          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ asian           : num [1:38809] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ black           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ hispanic_latin  : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ indian          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ middle_eastern  : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ native_american : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ other           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ pacific_islander: num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ white           : num [1:38809] 1 1 1 1 1 1 1 1 1 1 ...
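
# A further sketch, not part of the original example. It assumes the
# data(okc_binary) call above succeeded and that every column is numeric,
# as the str() output suggests. The names is_binary and indicator_rate
# are illustrative only.

# Check whether each column holds nothing but 0/1 values
is_binary <- vapply(
  okc_binary_train,
  function(x) all(x %in% c(0, 1)),
  logical(1)
)
all(is_binary)

# Share of training rows flagged for each indicator, most common first
indicator_rate <- sort(colMeans(okc_binary_train), decreasing = TRUE)
head(indicator_rate)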