OkCupid Binary Predictors

Source

Kim (2015), "OkCupid Data for Introductory Statistics and Data Science Courses", Journal of Statistics Education, Volume 23, Number 2. doi:10.1080/10691898.2015.11889737

Kuhn and Johnson (2020), Feature Engineering and Selection, Chapman and Hall/CRC. https://bookdown.org/max/FES/ and https://github.com/topepo/FES

Value

okc_binary_train, okc_binary_test

data frames with 61 columns

Details

The data, originally from Kim (2015), are split into a training set and a test set consistent with Kuhn and Johnson (2020). The 61 predictors are binary indicators: 9 ethnicity indicators and 52 keyword indicators derived from the text essay data.

Examples

data(okc_binary)
str(okc_binary_train)
#> tibble [38,809 × 61] (S3: tbl_df/tbl/data.frame)
#>  $ software        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ startup         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tech            : num [1:38809] 0 0 0 0 1 0 0 0 0 1 ...
#>  $ computers       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineering     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ computer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ internet        : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ technology      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ science         : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ programming     : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ technical       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ web             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ developer       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ im              : num [1:38809] 1 0 1 0 1 1 0 1 1 0 ...
#>  $ programmer      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ scientist       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ code            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ stephenson      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geek            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ nerd            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ lol             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ biotech         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ matrix          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ coding          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geeky           : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ solving         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ problems        : num [1:38809] 0 0 1 0 1 0 0 0 0 0 ...
#>  $ data            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fixing          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ teacher         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ student         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ silicon         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ law             : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ mechanical      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronic      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ pratchett       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ wikipedia       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ neal            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ mobile          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ math            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lab             : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ systems         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronics     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ futurama        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ alot            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ solve           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ websites        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ firefly         : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ valley          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ apps            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lawyer          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ asian           : num [1:38809] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ black           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ hispanic_latin  : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ indian          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ middle_eastern  : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ native_american : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ other           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ pacific_islander: num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ white           : num [1:38809] 1 1 1 1 1 1 1 1 1 1 ...
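
Because the predictors are indicator columns, a quick way to get a feel for them is to compute the share of rows in which each indicator equals 1. The following is a minimal sketch, not part of the package: `indicator_prevalence()` is a hypothetical helper, and `toy` is a made-up data frame standing in for `okc_binary_train` so that the snippet runs on its own.

```r
# Hypothetical helper (an assumption, not a package function):
# proportion of rows equal to 1 for every column that holds only 0/1 values.
indicator_prevalence <- function(df) {
  # Keep only columns whose values are all 0 or 1 (columns with NA are skipped)
  is_binary <- vapply(df, function(col) all(col %in% c(0, 1)), logical(1))
  # The mean of a 0/1 column is the share of rows where the indicator is set
  sort(colMeans(df[, is_binary, drop = FALSE]), decreasing = TRUE)
}

# Made-up toy data with the same layout as the real data (0/1 numeric columns)
toy <- data.frame(
  tech  = c(0, 1, 0, 1),
  im    = c(1, 1, 1, 0),
  white = c(1, 1, 1, 1)
)
indicator_prevalence(toy)

# With the data loaded as in the example above, the same call would be:
# head(indicator_prevalence(okc_binary_train))
```

Sorting in decreasing order puts the most common indicators first, which makes very sparse keyword columns easy to spot at the tail of the result.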