Time Series Plots Using ggplot2
Site: | Saylor Academy |
Course: | PRDV420: Introduction to R Programming |
Book: | Time Series Plots Using ggplot2 |
Description
ggplot2 can also visualize time series. This section introduces the relevant ggplot2 syntax.
Basic ggplot of time series
- Plot types: line plot with dates on the x-axis
- Demo data set: economics [ggplot2], a time series data set.
This article plots the variables psavert (personal savings rate) and uempmed (median duration of unemployment, in weeks) by date (x-axis); the first examples below use pop (total population, in thousands).
- Load required packages and set the default theme:
library(ggplot2)
theme_set(theme_minimal())
# Demo dataset
head(economics)

## # A tibble: 6 x 6
##   date         pce    pop psavert uempmed unemploy
## 1 1967-07-01   507 198712    12.5     4.5     2944
## 2 1967-08-01   510 198911    12.5     4.7     2945
## 3 1967-09-01   516 199113    11.7     4.6     2958
## 4 1967-10-01   513 199311    12.5     4.9     3143
## 5 1967-11-01   518 199498    12.5     4.7     3066
## 6 1967-12-01   526 199657    12.1     4.8     3018
- Create basic line plots
# Basic line plot
ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(color = "#00AFBB", size = 2)

# Plot a subset of the data
ss <- subset(economics, date > as.Date("2006-1-1"))
ggplot(data = ss, aes(x = date, y = pop)) +
  geom_line(color = "#FC4E07", size = 2)
- Control line size by the value of a continuous variable:
ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(aes(size = unemploy/pop), color = "#FC4E07")
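In ggplot2 3.4.0 and later, line thickness is set with the linewidth aesthetic, and using size for lines triggers a deprecation warning. The sketch below restates the two plots above with linewidth; it assumes ggplot2 >= 3.4.0 is installed, and the object names p_fixed and p_mapped are only illustrative.

```r
library(ggplot2)

# Assumption: ggplot2 >= 3.4.0, where `linewidth` controls line thickness.

# Fixed line width
p_fixed <- ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(color = "#00AFBB", linewidth = 2)

# Line width mapped to a continuous variable
p_mapped <- ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(aes(linewidth = unemploy / pop), color = "#FC4E07")

p_fixed
p_mapped
```

On older ggplot2 versions, keep size as in the original code.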
Source: STHDA, http://www.sthda.com/english/articles/32-r-graphics-essentials/128-plot-time-series-data-using-ggplot/
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Plot multiple time series data
Here, we'll plot the variables psavert and uempmed by date. You should first reshape the data using the tidyr package:
- Collapse the psavert and uempmed values into the same column (a new column). R function: gather() [tidyr]
- Create a grouping variable with the levels psavert and uempmed
library(tidyr)
library(dplyr)
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)
head(df, 3)

## # A tibble: 3 x 3
##   date       variable value
## 1 1967-07-01 psavert   12.5
## 2 1967-08-01 psavert   12.5
## 3 1967-09-01 psavert   11.7
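The same reshaping can also be written with pivot_longer(), which tidyr recommends over gather() in current releases. This is a minimal sketch assuming tidyr >= 1.0.0; the name df_long is only illustrative.

```r
library(ggplot2)  # provides the economics data set
library(tidyr)
library(dplyr)

# Collapse psavert and uempmed into one "value" column,
# with their names stored in a "variable" grouping column
df_long <- economics %>%
  select(date, psavert, uempmed) %>%
  pivot_longer(
    cols = c(psavert, uempmed),
    names_to = "variable",
    values_to = "value"
  )

head(df_long, 3)
```

The columns match the gather() result, but the row order differs: pivot_longer() keeps the rows for each date together, whereas gather() stacks all psavert rows before the uempmed rows. The plots are unaffected.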
# Multiple line plot
ggplot(df, aes(x = date, y = value)) +
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  theme_minimal()

# Area plot
ggplot(df, aes(x = date, y = value)) +
  geom_area(aes(color = variable, fill = variable),
            alpha = 0.5, position = position_dodge(0.8)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))
Set date axis limits
Key R function: scale_x_date()
# Base plot with date axis
p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB", size = 1)
p

# Set axis limits c(min, max); NA keeps the existing data maximum
min_date <- as.Date("2002-1-1")
max_date <- NA
p + scale_x_date(limits = c(min_date, max_date))
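Both ends of the axis can be fixed in the same way. The sketch below restricts the plot to a closed date window; the window 2002 to 2010 and the names date_range and p_zoom are chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")

# Illustrative window: both limits given as Date values
date_range <- as.Date(c("2002-01-01", "2010-12-31"))
p_zoom <- p + scale_x_date(limits = date_range)

p_zoom
```

Observations outside the limits are dropped from the plot, so ggplot2 reports removed rows in a warning when the plot is drawn.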
Format date axis labels
Key function: scale_x_date().
To format date axis labels, you can use different combinations of days, weeks, months and years:
- Weekday name: use %a and %A for abbreviated and full weekday name, respectively
- Month name: use %b and %B for abbreviated and full month name, respectively
- %d: day of the month as a decimal number
- %Y: year with century
- See more options in the documentation of the function ?strptime
# Format: month/year
p + scale_x_date(date_labels = "%b/%Y")
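The format codes can be tried directly with format() on a Date before using them in a plot. The sketch below does that and then combines date_labels with date_breaks to control tick spacing; the example date, the "5 years" spacing and the name p_labeled are illustrative assumptions.

```r
library(ggplot2)

# Try format codes on a single date (numeric codes do not depend on the locale)
d <- as.Date("2002-01-15")
format(d, "%d/%m/%Y")  # "15/01/2002"
format(d, "%Y-%m")     # "2002-01"

p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")

# One tick every 5 years, labeled with the 4-digit year
p_labeled <- p + scale_x_date(date_breaks = "5 years", date_labels = "%Y")
p_labeled
```

Name-based codes such as %b and %A produce text in the current locale, so their output can differ between machines.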
Add a smoothed trend line
Key function: stat_smooth()
p + stat_smooth(
  color = "#FC4E07", fill = "#FC4E07",
  method = "loess"
)
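How closely the loess curve follows the data is controlled by the span argument (the default is 0.75; smaller values give a wigglier curve). A minimal sketch, where the value 0.3 and the name p_smooth are only illustrative:

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")

# Smaller span = more local fit; se = FALSE hides the confidence band
p_smooth <- p + stat_smooth(
  method = "loess", formula = y ~ x,
  span = 0.3, se = FALSE,
  color = "#FC4E07"
)
p_smooth
```

Passing formula = y ~ x explicitly only silences the message ggplot2 prints about the formula it uses.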
ggplot2 extensions for ts objects
The ggfortify package is an extension to ggplot2 that makes it easy to plot time series objects (Horikoshi and Tang 2017).
It can handle the output of many time series packages, including:
zoo::zooreg(), xts::xts(), timeSeries::timeSeries(), tseries::irts(),
forecast::forecast() and the vars package.
Another interesting package is ggpmisc (Aphalo 2017), which provides two useful methods for time series objects:
- stat_peaks() finds at which x positions local y maxima are located, and
- stat_valleys() finds at which x positions local y minima are located.
Here, we'll show how to easily:
- Visualize a time series object, using the data set AirPassengers (monthly airline passenger numbers 1949-1960).
- Identify shifts in mean and/or variance in a time series using the changepoint package.
- Detect jumps in data using the strucchange package and the data set Nile (measurements of the annual flow of the river Nile at Aswan).
- Detect peaks and valleys using the ggpmisc package and the data set lynx (annual Canadian lynx trappings 1821–1934).
First, install required R packages:
install.packages(
  c("ggfortify", "changepoint", "strucchange", "ggpmisc")
)
Then use the autoplot.ts() function to visualize time series objects, as follows:
library(ggfortify)
library(magrittr) # for piping %>%

# Plot ts objects
autoplot(AirPassengers)

# Identify change points in mean and variance
AirPassengers %>%
  changepoint::cpt.meanvar() %>% # Identify change points
  autoplot()

# Detect jumps in the data
strucchange::breakpoints(Nile ~ 1) %>%
  autoplot()
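For comparison, a ts object can also be plotted with plain ggplot2 by first converting it to a data frame with a date column. The sketch below does this for the monthly AirPassengers series using base R only; the names ts_start, air_df and p_air are illustrative, and the date construction assumes a monthly series.

```r
library(ggplot2)

# Start of the series as c(year, month)
ts_start <- start(AirPassengers)

# Build one Date per observation (assumes monthly frequency)
air_df <- data.frame(
  date = seq(
    as.Date(sprintf("%d-%02d-01", ts_start[1], ts_start[2])),
    by = "month",
    length.out = length(AirPassengers)
  ),
  passengers = as.numeric(AirPassengers)
)

p_air <- ggplot(air_df, aes(x = date, y = passengers)) +
  geom_line(color = "#00AFBB")
p_air
```

This makes explicit the kind of conversion that autoplot() performs for you, and the resulting data frame works with scale_x_date() like any other date-indexed data.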
Detect peaks and valleys:
library(ggpmisc)
ggplot(lynx, as.numeric = FALSE) +
  geom_line() +
  stat_peaks(colour = "red") +
  stat_peaks(geom = "text", colour = "red",
             vjust = -0.5, x.label.fmt = "%Y") +
  stat_valleys(colour = "blue") +
  stat_valleys(geom = "text", colour = "blue", angle = 45,
               vjust = 1.5, hjust = 1, x.label.fmt = "%Y") +
  ylim(-500, 7300)
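To illustrate what "local maxima" and "local minima" mean here, the base R sketch below marks a point as a peak when it is strictly higher than both of its neighbours. It is a simplified stand-in, not the algorithm ggpmisc uses, and the helper names find_local_peaks and find_local_valleys are hypothetical.

```r
# Simplified illustration only; not the ggpmisc implementation.
# A peak is a point strictly greater than both immediate neighbours
# (flat plateaus are not detected by this rule).
find_local_peaks <- function(x) {
  x <- as.numeric(x)
  which(diff(sign(diff(x))) == -2) + 1
}

# Valleys are the peaks of the negated series
find_local_valleys <- function(x) {
  find_local_peaks(-as.numeric(x))
}

# Years of local maxima and minima in the lynx series
lynx_years <- as.numeric(time(lynx))
peak_years <- lynx_years[find_local_peaks(lynx)]
valley_years <- lynx_years[find_local_valleys(lynx)]

find_local_peaks(c(1, 3, 2, 5, 4))    # positions 2 and 4
find_local_valleys(c(1, 3, 2, 5, 4))  # position 3
```

Because this rule looks at immediate neighbours only, it may flag more points than stat_peaks() and stat_valleys() do with their default settings.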