Results for individual bakers across all Great British Bake Off (GBBO) series. The data have been lightly processed to make them suitable for a "bakeoff".
Format
A data frame with 71 rows representing individual bakers and 16 variables:
- winners
was the baker in the final episode?
- series
an integer denoting UK series (1-8)
- age
an integer denoting the baker's age in years at their first episode
- occupation
a character string giving occupation
- hometown
a character string giving hometown
- percent_star
the percentage of episodes in which the baker was named star baker
- percent_technical_wins
the percentage of episodes in which the baker won the technical challenge
- percent_technical_bottom3
percent of times a given baker was in the bottom 3 on the technical challenge
- percent_technical_top3
percent of times a given baker was in the top 3 (1st, 2nd, or 3rd) on the technical challenge
- technical_highest
an integer denoting the best technical rank earned by a given baker across all episodes in which they appeared (higher is better)
- technical_lowest
an integer denoting the worst technical rank earned by a given baker across all episodes in which they appeared (higher is better)
- technical_median
an integer denoting the median technical rank earned by a given baker across all episodes in which they appeared (higher is better)
- judge1
the name of one of the judges
- judge2
the name of the other judge
- viewers_7day
number of viewers in millions within a 7-day window from airdate
- viewers_28day
number of viewers in millions within a 28-day window from airdate
An object of class tbl_df (inherits from tbl, data.frame) with 71 rows and 16 columns.
An object of class grouped_df (inherits from tbl_df, tbl, data.frame) with 30 rows and 15 columns.
Source
This is a combination of two datasets in Alison Hill's bakeoff package.
Details
bakeoff_train is the training set for use on the homework.
bakeoff_test is the test set for use on the homework. It contains a held-out set of 30 bakers and omits the winners column.
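The train/test split described here can be sketched roughly as follows. This is only an illustration, not the homework's intended solution: it uses a handful of rows copied from the printed output in the Examples section (stored under the hypothetical names train_rows and test_rows) so that it runs without the package loaded, and the choice of a logistic regression on age alone is an assumption made for brevity. In practice you would pass bakeoff_train and bakeoff_test directly.

```r
# A few rows copied from the printed bakeoff_train output (rows 1, 2, 7, 8, 9, 10).
# train_rows and test_rows are hypothetical stand-ins for the full datasets.
train_rows <- data.frame(
  winners      = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  age          = c(30, 45, 31, 31, 40, 41),
  percent_star = c(0, 0, 0, 0.25, 0, 0.125)
)

# A few rows copied from the printed bakeoff_test output (rows 1, 2, 4).
# Note that there is no winners column here.
test_rows <- data.frame(
  age          = c(24, 37, 19),
  percent_star = c(0, 0, 0.4)
)

# The column present in the training set but withheld from the test set
outcome <- setdiff(names(train_rows), names(test_rows))

# Fit a simple logistic regression on the training rows
# (assumed model choice; any classifier could be used instead)
fit <- glm(winners ~ age, data = train_rows, family = binomial())

# Predicted probability of being a finalist for each held-out baker
pred <- predict(fit, newdata = test_rows, type = "response")
```

Because the test set has no winners column, predictions like pred cannot be scored locally; they can only be compared against the withheld outcomes by whoever holds them.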
Examples
bakeoff_train
#> # A tibble: 71 × 16
#> winners series age occupation hometown percent_star percent_technical_wins
#> <lgl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 FALSE 1 30 CEO of the… Northam… 0 0
#> 2 FALSE 1 45 Advertisin… Brackne… 0 0
#> 3 FALSE 1 25 Council Wo… London 0 0.333
#> 4 FALSE 1 51 Student Teignmo… 0 0
#> 5 FALSE 1 44 Student su… Sneinto… 0 0
#> 6 FALSE 1 48 IT program… Poynton… 0 0
#> 7 TRUE 1 31 Charity wo… Barton-… 0 0
#> 8 TRUE 2 31 Pastor Essex 0.25 0.25
#> 9 FALSE 2 40 Charity wo… West Ki… 0 0
#> 10 TRUE 2 41 Former sch… Woodfor… 0.125 0.375
#> # ℹ 61 more rows
#> # ℹ 9 more variables: percent_technical_bottom3 <dbl>,
#> # percent_technical_top3 <dbl>, technical_highest <dbl>,
#> # technical_lowest <dbl>, technical_median <dbl>, judge1 <chr>, judge2 <chr>,
#> # viewers_7day <dbl>, viewers_28day <dbl>
bakeoff_test
#> # A tibble: 30 × 15
#> series age occupation hometown percent_star percent_technical_wins
#> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 1 24 Assistant Credit C… Leicest… 0 0.333
#> 2 1 37 PA and administrat… Saltley… 0 0.333
#> 3 1 31 Student Swansea… 0 0
#> 4 2 19 Photographer Midhurs… 0.4 0.2
#> 5 2 31 IT programme manag… Radlett… 0 0.25
#> 6 2 63 Project engagement… Erith 0.143 0.143
#> 7 3 28 Dentist North L… 0 0.143
#> 8 3 36 Graphic designer Manches… 0 0
#> 9 3 23 Trainee anaestheti… Merseys… 0.1 0.1
#> 10 4 31 Intensive care con… Yeovil 0.1 0.1
#> # ℹ 20 more rows
#> # ℹ 9 more variables: percent_technical_bottom3 <dbl>,
#> # percent_technical_top3 <dbl>, technical_highest <dbl>,
#> # technical_lowest <dbl>, technical_median <dbl>, judge1 <chr>, judge2 <chr>,
#> # viewers_7day <dbl>, viewers_28day <dbl>