OkCupid Binary Predictors

Source

Kim (2015), "OkCupid Data for Introductory Statistics and Data Science Courses", Journal of Statistics Education, Volume 23, Number 2. http://www.amstat.org/publications/jse/contents_2015.html

Kuhn and Johnson (2020), Feature Engineering and Selection, Chapman and Hall/CRC. https://bookdown.org/max/FES/ and https://github.com/topepo/FES

Value

okc_binary_train, okc_binary_test

Data frames with 61 columns.

Details

The data, originally from Kim (2015), are split into a training set and a test set consistent with Kuhn and Johnson (2020). The predictors are binary (0/1) indicators: a set of keywords derived from the text essay data, plus ethnicity indicators.

Examples

data(okc_binary)
str(okc_binary_train)
#> tibble [38,809 × 61] (S3: tbl_df/tbl/data.frame)
#>  $ software        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ startup         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tech            : num [1:38809] 0 0 0 0 1 0 0 0 0 1 ...
#>  $ computers       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ engineering     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ computer        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ internet        : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ technology      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ science         : num [1:38809] 0 0 0 0 1 0 0 0 0 0 ...
#>  $ programming     : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ technical       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ web             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ developer       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ im              : num [1:38809] 1 0 1 0 1 1 0 1 1 0 ...
#>  $ programmer      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ scientist       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ code            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ stephenson      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geek            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ nerd            : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ lol             : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ biotech         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ matrix          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ coding          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ geeky           : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ solving         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ problems        : num [1:38809] 0 0 1 0 1 0 0 0 0 0 ...
#>  $ data            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ fixing          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ teacher         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ student         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ silicon         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ law             : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ mechanical      : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronic      : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ pratchett       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ wikipedia       : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ neal            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ mobile          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ math            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lab             : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ systems         : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ electronics     : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ futurama        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ alot            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ solve           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ websites        : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ firefly         : num [1:38809] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ valley          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ apps            : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ lawyer          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ asian           : num [1:38809] 1 0 0 0 0 0 0 0 0 0 ...
#>  $ black           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ hispanic_latin  : num [1:38809] 0 0 0 0 0 0 0 1 0 0 ...
#>  $ indian          : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ middle_eastern  : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ native_american : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ other           : num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ pacific_islander: num [1:38809] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ white           : num [1:38809] 1 1 1 1 1 1 1 1 1 1 ...
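
Because the keyword and ethnicity predictors described in Details are stored as numeric 0/1 columns, a few base R calls are enough for a first look at them. The sketch below is illustrative only: `toy` is a hypothetical stand-in data frame with made-up values and four of the column names, used so the code runs without loading the data.

```r
# Hypothetical stand-in with the same layout as okc_binary_train:
# numeric 0/1 columns, one per keyword or ethnicity. Values are made up.
toy <- data.frame(
  tech  = c(0, 0, 1, 0, 1),
  geek  = c(0, 0, 0, 0, 1),
  asian = c(1, 0, 0, 0, 0),
  white = c(1, 1, 1, 1, 1)
)

# Check that every column holds only 0s and 1s.
is_binary <- vapply(toy, function(x) all(x %in% c(0, 1)), logical(1))

# For a 0/1 indicator, the column mean is the share of rows flagged with it.
prevalence <- sort(colMeans(toy), decreasing = TRUE)

# Columns with a single distinct value carry no information for a model.
is_constant <- vapply(toy, function(x) length(unique(x)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]

print(is_binary)
print(prevalence)
print(constant_cols)
```

To run the same checks on the real data, replace `toy` with `okc_binary_train` after calling `data(okc_binary)`.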