Introduction to ggplot

Dr. Mine Dogucu

1 / 47
  • This specific lecture will not necessarily show you beautiful plots.

  • In this lecture we will focus on how ggplot works.

  • In the next lecture we will improve the plots that we make.

4 / 47

ggplot is based on the grammar of graphics.

5 / 47

Data

glimpse(titanic)
## Rows: 891
## Columns: 6
## $ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR…
## $ pclass <chr> "Third", "First", "Third", "First", "Third", "Third", "First"…
## $ sex <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, s…
## $ age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55,…
## $ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625, 21…
## $ embarked <fct> Southampton, Cherbourg, Southampton, Southampton, Southampton…

The data frame has been cleaned for you.

6 / 47

Visualizing a Single Categorical Variable

7 / 47



If you could speak to R in English, how would you tell R to make this plot for you?

OR

If you had the data and had to draw this bar plot by hand, what would you do?

8 / 47



Possible ideas

  • Consider the data frame.
  • Count number of passengers in each pclass.
  • Put pclass on x-axis.
  • Put count on y-axis.
  • Draw the bars.

9 / 47



These ideas are all correct but some are not necessary in R.

  • Consider the data frame.
  • Count number of passengers in each pclass. (R does this by default.)
  • Put pclass on x-axis.
  • Put count on y-axis. (R does this by default.)
  • Draw the bars.

R will do some of these steps by default: geom_bar() counts the rows in each category and puts the count on the y-axis. Making a bar plot with another tool might look slightly different.
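
To make the "by default" part concrete, here is a minimal sketch that compares letting geom_bar() do the counting with doing the counting by hand. The data frame toy_titanic is a small hypothetical stand-in, not the lecture's titanic data, and geom_col() is used for the by-hand version only as an illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture's cleaned `titanic` data frame
toy_titanic <- data.frame(
  pclass = c("First", "First", "Second", "Third", "Third", "Third")
)

# Default: geom_bar() counts the rows per pclass and uses the count as height
p_auto <- ggplot(data = toy_titanic, aes(x = pclass)) +
  geom_bar()

# By hand: count first, then map the count to y and draw the bars as given
counts <- as.data.frame(table(pclass = toy_titanic$pclass))
p_manual <- ggplot(data = counts, aes(x = pclass, y = Freq)) +
  geom_col()

# layer_data() returns the values ggplot computed for a layer
auto_heights <- as.numeric(layer_data(p_auto)$count)
manual_heights <- as.numeric(layer_data(p_manual)$y)
```

Both plots should end up with the same bar heights; the first one simply skips the counting step.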

10 / 47

3 Steps of Making a Basic ggplot

1. Pick data

2. Map data onto aesthetics

3. Add the geometric layer

11 / 47

Step 1 - Pick Data

ggplot(data = titanic)

12 / 47

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = pclass))

13 / 47

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()

14 / 47

  • Create a ggplot using the titanic data frame.
  • Map the pclass to the x-axis.
  • Add a layer of a bar plot.

ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()
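
One way to see the three steps as separate pieces is to store each stage in an object and inspect it. This is only a sketch: toy_titanic and the names step1, step2 and step3 are hypothetical, and it assumes the plot object's layers and mapping can be read with `$`.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture's `titanic` data frame
toy_titanic <- data.frame(pclass = c("First", "Second", "Third", "Third"))

step1 <- ggplot(data = toy_titanic)      # Step 1: data only
step2 <- ggplot(data = toy_titanic,
                aes(x = pclass))         # Step 2: data + aesthetic mapping
step3 <- step2 + geom_bar()              # Step 3: add the geometric layer

# Number of layers at each stage; only step 3 has one
n_layers <- c(length(step1$layers),
              length(step2$layers),
              length(step3$layers))

# Which aesthetics were mapped in step 2
mapped <- names(step2$mapping)
```

Printing step1 or step2 gives an empty canvas, with axes in the second case; bars appear only once the layer is added.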
15 / 47

Visualizing a Single Numeric Variable

16 / 47
  • Create a ggplot using the titanic data frame.
  • Map the fare to the x-axis.
  • Add a layer of a histogram.

ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
17 / 47

Step 1 - Pick Data

ggplot(data = titanic)

18 / 47

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = fare))

19 / 47

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

20 / 47

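What is this message?

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

It is a message, not an error: no bin width was given, so geom_histogram() fell back to its default of 30 bins.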

21 / 47
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15)
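
As a rough illustration of what binwidth controls, the sketch below builds one histogram with binwidth and one with bins, then looks at the bins ggplot computed. The fares in toy_fares are made up for the example.

```r
library(ggplot2)

# Hypothetical fares; not the lecture's data
toy_fares <- data.frame(fare = c(7.25, 7.9, 8.05, 13, 26, 30.5, 53.1, 71.3))

# binwidth: every bin spans 15 units of fare
p_width <- ggplot(data = toy_fares, aes(x = fare)) +
  geom_histogram(binwidth = 15)

# bins: ask for a number of bins and let ggplot work out the width
p_bins <- ggplot(data = toy_fares, aes(x = fare)) +
  geom_histogram(bins = 5)

# layer_data() exposes the computed bins (xmin, xmax, count)
width_data <- layer_data(p_width)
bins_data <- layer_data(p_bins)
bin_widths <- width_data$xmax - width_data$xmin
```

Either argument replaces the default of 30 bins, so the message is no longer printed.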

22 / 47

🌈

Pick your favorite color(s) from the list at:

bit.ly/colors-r

25 / 47
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white")

26 / 47
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 fill = "darkred")

27 / 47
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white",
                 fill = "darkred")
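
In the histograms above, color and fill sit outside aes(), so they are fixed settings rather than mappings from data. The sketch below contrasts the two, using made-up fares in a hypothetical toy_fares data frame.

```r
library(ggplot2)

# Hypothetical fares; not the lecture's data
toy_fares <- data.frame(fare = c(7.25, 8.05, 13, 26, 53.1, 71.3))

# Setting: outside aes(), the color name is used as-is
p_set <- ggplot(data = toy_fares, aes(x = fare)) +
  geom_histogram(binwidth = 15, color = "white", fill = "darkred")

# Mapping: inside aes(), "darkred" is treated as a one-level variable,
# so ggplot picks a fill from its default palette and adds a legend
p_map <- ggplot(data = toy_fares, aes(x = fare, fill = "darkred")) +
  geom_histogram(binwidth = 15, color = "white")

set_fill <- unique(layer_data(p_set)$fill)
map_fill <- unique(layer_data(p_map)$fill)
```

Only the first version produces dark red bars. Mapping inside aes() is meant for variables in the data, as with fill = survived later in the lecture.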

28 / 47

Visualizing Two Categorical Variables

29 / 47

Stacked Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar()

30 / 47

Standardized Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "fill")

Note that the y-axis now shows proportions rather than counts, even though it is still labeled count. We will learn how to change the label later.
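
A quick way to check what position = "fill" does is to compare the top of each stacked bar with and without it. This is a sketch on a hypothetical toy_titanic data frame, not the lecture's data.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture's `titanic` data frame
toy_titanic <- data.frame(
  pclass   = c("First", "First", "First", "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

p_stack <- ggplot(data = toy_titanic, aes(x = pclass, fill = survived)) +
  geom_bar()
p_fill <- ggplot(data = toy_titanic, aes(x = pclass, fill = survived)) +
  geom_bar(position = "fill")

stack_data <- layer_data(p_stack)
fill_data <- layer_data(p_fill)

# Top of the stacked bar within each pclass
stack_tops <- as.numeric(tapply(stack_data$ymax, as.numeric(stack_data$x), max))
fill_tops <- as.numeric(tapply(fill_data$ymax, as.numeric(fill_data$x), max))
```

In the stacked plot the tops are the class sizes (3 and 4 here). With position = "fill" every bar is rescaled to reach 1, so the segments can be read as proportions.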

31 / 47

Dodged Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "dodge")

Note that the y-axis shows counts again: dodging places the bars side by side without rescaling them.
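
One way to check what the y-axis represents in a dodged bar plot is to look at the values ggplot computed for the layer. The sketch below does this on a hypothetical toy_titanic data frame.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture's `titanic` data frame
toy_titanic <- data.frame(
  pclass   = c("First", "First", "First", "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

p_dodge <- ggplot(data = toy_titanic, aes(x = pclass, fill = survived)) +
  geom_bar(position = "dodge")

dodge_data <- layer_data(p_dodge)

# Bar heights and the raw counts per pclass/survived combination
bar_heights <- dodge_data$ymax
bar_counts <- dodge_data$count

# Dodged bars share the space of one category, so each is narrower
bar_widths <- dodge_data$xmax - dodge_data$xmin
```

Here each bar's height equals the count for its pclass and survived combination; only the horizontal placement changes.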

32 / 47

New Data

Artwork by @allison_horst

33 / 47

New Data

glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
34 / 47


Visualizing a Single Numerical and a Single Categorical Variable

36 / 47
  • Create a ggplot using the penguins data frame.
  • Map the species to the x-axis and bill_length_mm to the y-axis.
  • Add a layer of a violin plot.

ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_violin()
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
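
The warning on this slide reports that two rows were dropped because bill_length_mm is missing. The sketch below shows two common ways of handling that, on a hypothetical toy_penguins data frame with two NA values placed in it on purpose. It is not the real penguins data.

```r
library(ggplot2)

# Hypothetical stand-in for `penguins`, with two missing bill lengths
toy_penguins <- data.frame(
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 4),
  bill_length_mm = c(39.1, 39.5, NA, 36.7,
                     46.5, 50.0, 51.3, 45.4,
                     46.1, NA, 48.7, 47.6)
)

# How many rows would be dropped?
n_missing <- sum(is.na(toy_penguins$bill_length_mm))

# Option 1: drop the incomplete rows yourself before plotting
complete_rows <- toy_penguins[!is.na(toy_penguins$bill_length_mm), ]
p_filtered <- ggplot(complete_rows,
                     aes(x = species,
                         y = bill_length_mm)) +
  geom_violin()

# Option 2: keep the data as-is and let the layer drop them silently
p_na_rm <- ggplot(toy_penguins,
                  aes(x = species,
                      y = bill_length_mm)) +
  geom_violin(na.rm = TRUE)
```

Both options draw the same violins. The difference is whether the rows are removed explicitly in the data or quietly by the layer.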
37 / 47

38 / 47

Visualizing Two Numerical Variables

39 / 47
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

40 / 47

Considering More Than Two Variables

41 / 47
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

42 / 47
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

43 / 47
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

45 / 47
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species,
           size = body_mass_g)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
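
To see how each extra aesthetic ends up as a property of the points, the sketch below builds the same kind of plot on a small hypothetical toy_penguins data frame and inspects the computed layer data. The values are made up for the example.

```r
library(ggplot2)

# Hypothetical stand-in for `penguins`; values are illustrative only
toy_penguins <- data.frame(
  species        = c("Adelie", "Adelie", "Chinstrap",
                     "Chinstrap", "Gentoo", "Gentoo"),
  bill_depth_mm  = c(18.7, 17.4, 17.9, 19.5, 13.2, 16.3),
  bill_length_mm = c(39.1, 39.5, 46.5, 50.0, 46.1, 50.0),
  body_mass_g    = c(3750, 3800, 3500, 3900, 4500, 5700)
)

p <- ggplot(toy_penguins,
            aes(x = bill_depth_mm,
                y = bill_length_mm,
                shape = species,
                color = species,
                size = body_mass_g)) +
  geom_point()

# One row per point, with a column for every mapped aesthetic
point_data <- layer_data(p)
n_colours <- length(unique(point_data$colour))
n_shapes <- length(unique(point_data$shape))
n_sizes <- length(unique(point_data$size))
```

Three species give three colors and three shapes, while the numeric body_mass_g gives each point its own size.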

46 / 47
