This is the second article in a two-part series on handling date and time data in R. You can read the first article here.
When date and time data are imported into R, they are often stored as character strings, so we need to convert those strings to dates. We may also have several strings that we want to combine into a single date variable. Because time series data come in many formats, we first need to transform them into a consistent structure before carrying out further analysis.
In this article we focus on different ways to manipulate time series data. We will learn about:
- Converting strings to dates using various formatting options
- Extracting date and time components from date-time objects
- Performing arithmetic operations on date-time objects
Let’s get started!
Convert strings to date
If the string is already in ISO 8601 format (YYYY-MM-DD), we can convert it into a Date object using the as.Date() function.
date1 <- c("2015-07-01", "2015-08-01", "2015-09-01")
date2 <- as.Date(date1)
date2
[1] "2015-07-01" "2015-08-01" "2015-09-01"
class(date2)
[1] "Date"
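As a small illustration of what as.Date() accepts without a format argument: its default tryFormats argument is c("%Y-%m-%d", "%Y/%m/%d"), so slash-separated ISO dates also parse, and you can pass your own list of candidate formats instead. The dotted strings below are made-up inputs for this sketch.

```r
# ISO dates written with slashes also parse without a format argument,
# because "%Y/%m/%d" is the second default entry in tryFormats
slash_dates <- as.Date(c("2015/07/01", "2015/08/01"))
slash_dates

# Supply your own candidate formats (hypothetical dotted input); the first
# format that parses the first non-missing element is used for the whole vector
dotted_dates <- as.Date(c("01.07.2015", "01.08.2015"),
                        tryFormats = c("%d.%m.%Y", "%d/%m/%Y"))
dotted_dates
```

If none of the candidate formats matches, as.Date() stops with an error (unless optional = TRUE), which is a useful early warning that the input is not in the shape you expected.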
Note that as.Date() expects the ISO format YYYY-MM-DD by default; if your string is in a different format, you must supply the format argument. In the real world, date/time data may come in all sorts of unusual formats. Here is a sample:
- December 12, 2023
- 12th Dec, 2023
- Dec 12th, 23
- 12-Dec-2023
- 2023 December
- 12.12.23
Dates can be written in many formats; for a complete list of the formatting codes available in R, type ?strftime in your console.
When the data is not in the default ISO 8601 format, we need to explicitly specify the format in R. We do this using conversion specifications. A conversion specification is introduced by %, usually followed by a single letter.
Let's work through a few examples.
Let us say you are dealing with dates in the format 23/12/12. In this format, the year comes first, followed by the month and then the day, each separated by a slash (/). The year has only 2 digits, i.e. it does not include the century. Mapping each component of the date to its conversion specification gives:
- year without century (23): %y
- month as a number (12): %m
- day of the month (12): %d
Using the format argument, let's specify the date with this mapping.
as.Date("23/12/12", format = "%y/%m/%d")
[1] "2023-12-12"
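Because the format must describe the string exactly, small mistakes fail in different ways. The sketch below reuses the string from the example above: a wrong separator gives NA, while using %Y where %y is needed silently produces a valid but wrong date in the year 23 AD. (On input, %y maps 00-68 to 20xx and 69-99 to 19xx.)

```r
x <- "23/12/12"

# Correct specification: 2-digit year, month, day separated by "/"
ok <- as.Date(x, format = "%y/%m/%d")

# Separator mismatch: the format expects "-" but the string has "/",
# so parsing fails and as.Date() returns NA rather than an error
bad_sep <- as.Date(x, format = "%y-%m-%d")

# %Y (year with century) instead of %y: "23" is read as the year 23,
# giving a wrong date instead of NA
bad_year <- as.Date(x, format = "%Y/%m/%d")

ok
bad_sep
as.POSIXlt(bad_year)$year + 1900  # the calendar year that was parsed
```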
Another way a date can be written is 2019-Dec-12. We still have the year followed by the month and the day, but there are a few changes:
- the components are separated by a hyphen (-) instead of a slash (/)
- the year has 4 digits, i.e. it includes the century
- the month is given as an abbreviated name instead of digits.
Mapping the components to conversion specifications gives:
- year with century (2019): %Y
- abbreviated month name (Dec): %b
- day of the month (12): %d
Let us specify the format for the date using this mapping.
as.Date("2019-Dec-12", format = "%Y-%b-%d")
[1] "2019-12-12"
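The same mapping approach handles most of the sample formats listed earlier. This is a sketch that assumes an English (or C) locale, because %B and %b match month names in the current locale, and assumes "12.12.23" is written day first. Literal text such as the "th" suffix must appear in the format itself, so dates like "1st" or "2nd" would need different formats.

```r
# Assumes an English (or C) locale: %B and %b match locale-specific month names
d1 <- as.Date("December 12, 2023", format = "%B %d, %Y")  # full month name
d2 <- as.Date("12-Dec-2023", format = "%d-%b-%Y")         # abbreviated month

# Literal characters in the format (here the "th" suffix) must match exactly
d3 <- as.Date("12th Dec, 2023", format = "%dth %b, %Y")
d4 <- as.Date("Dec 12th, 23", format = "%b %dth, %y")

# Ambiguous all-numeric form: day first is an assumption here
d5 <- as.Date("12.12.23", format = "%d.%m.%y")

# "2023 December" has no day; one option is to append one before parsing
d6 <- as.Date(paste("2023 December", "01"), format = "%Y %B %d")

c(d1, d2, d3, d4, d5)
d6
```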
The examples above did not include time components. Let us add a time, i.e. 23/12/12 10:23:32. Since we are now dealing with a time as well as a date, we will use the as.POSIXct() function instead of as.Date(). In the format string, %H, %M and %S stand for hours (00-23), minutes and seconds.
as.POSIXct("23/12/12 10:23:32", format = "%y/%m/%d %H:%M:%S")
[1] "2023-12-12 10:23:32 IST"
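Timestamps are also often written on a 12-hour clock. A sketch with a hypothetical string: %I reads the hour (01-12) and %p the AM/PM marker, which is matched in the current locale, so an English (or C) locale is assumed. tz = "UTC" is set only so that the result does not depend on the machine's time zone.

```r
# Hypothetical month-first timestamp on a 12-hour clock;
# %p (AM/PM) is locale-dependent, so an English or C locale is assumed
evening <- as.POSIXct("12/12/2023 10:23:32 PM",
                      format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
evening

# The same time shown on a 24-hour clock
format(evening, "%H:%M:%S")
```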
By default, as.POSIXct() interprets the time in the current (local) time zone, which is why the output above shows IST. You can specify a different time zone with the tz argument.
as.POSIXct("23/12/12 10:23:32", format = "%y/%m/%d %H:%M:%S",
tz = "UTC")
[1] "2023-12-12 10:23:32 UTC"
UTC stands for Coordinated Universal Time. It is the reference standard from which time zones worldwide are defined as offsets.
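A POSIXct value stores a single instant, so changing the time zone changes only how it is displayed. A sketch using "Asia/Kolkata" as an example IANA zone name (OlsonNames() lists the names your system knows); the timestamp is the one from the example above.

```r
# A hypothetical instant recorded in UTC
t_utc <- as.POSIXct("2023-12-12 10:23:32", tz = "UTC")

# Display the same instant in another zone: only the clock time shown changes
format(t_utc, tz = "Asia/Kolkata", usetz = TRUE)

# The same clock time parsed in two zones gives two different instants
t_ist <- as.POSIXct("2023-12-12 10:23:32", tz = "Asia/Kolkata")
difftime(t_utc, t_ist, units = "hours")
```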
Extracting Date and Time Components
In this section we will look at various functions to extract the date and time components. We will learn to extract components such as:
- year
- month
- week
- day
- hour
- minute
To extract date and time components, use the format() function with the appropriate conversion specification. See ?strftime for more information on conversion specifications.
example1 <- as.Date("2019-Dec-12", format = "%Y-%b-%d")
example2 <- as.Date("23/12/12", format = "%y/%m/%d")
example3 <- as.POSIXct("23/12/23 10:23:32", format = "%y/%m/%d %H:%M:%S")
Extract year from example1
format(example1, "%Y")
[1] "2019"
Extract month from example2
format(example2, "%m")
[1] "12"
Extract day from example3
format(example3, "%d")
[1] "23"
Extract week from example2
format(example2, "%U")
[1] "50"
Extract hour from example3
format(example3, "%H")
[1] "10"
Extract minute from example3
format(example3, "%M")
[1] "23"
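format() always returns character strings, so convert the pieces before doing numeric work with them. A short sketch that recreates example1 from above (parsing "Dec" with %b assumes an English or C locale); weekdays() and months() are base R helpers whose output is also locale-dependent.

```r
example1 <- as.Date("2019-Dec-12", format = "%Y-%b-%d")

# format() returns character; convert when you need a number
year_num <- as.integer(format(example1, "%Y"))

# Several conversion specifications can be combined in one string
year_month <- format(example1, "%Y-%m")

# %j is the day of the year (001-366), %u the weekday (1 = Monday)
day_of_year <- format(example1, "%j")
weekday_num <- format(example1, "%u")

# Locale-dependent names of the weekday and month
weekdays(example1)
months(example1)
```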
As we said earlier, POSIXlt stores date/time components in a list, and these can be extracted individually. Let us look at the date/time components of a POSIXlt object using unclass().
example3 <- as.POSIXlt("23/12/23 10:23:32", format = "%y/%m/%d %H:%M:%S")
unclass(example3)
$sec
[1] 32
$min
[1] 23
$hour
[1] 10
$mday
[1] 23
$mon
[1] 11
$year
[1] 123
$wday
[1] 6
$yday
[1] 356
$isdst
[1] 0
$zone
[1] "IST"
$gmtoff
[1] NA
attr(,"tzone")
[1] "" "IST" "+0630"
attr(,"balanced")
[1] TRUE
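Use unlist() if you want the list returned as a vector. Note that unlist() coerces every component to a common type, here character, because $zone is a string.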
unlist(example3)
sec min hour mday mon year wday yday isdst zone gmtoff
"32" "23" "10" "23" "11" "123" "6" "356" "0" "IST" NA
To extract a specific component, use the $ operator.
# extract hour
example3$hour
[1] 10
# extract month
example3$mon
[1] 11
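Some of these components are offsets rather than calendar values: $year counts years since 1900, and $mon and $yday start at 0 (which is why December appears as 11 above). A minimal sketch of converting them back, using a hypothetical timestamp parsed in UTC so the result does not depend on the local time zone.

```r
# Hypothetical timestamp, parsed in UTC for machine-independent results
lt <- as.POSIXlt("2023-12-23 10:23:32", tz = "UTC")

# $year counts years since 1900; $mon is zero-based (0 = January)
calendar_year <- lt$year + 1900
calendar_month <- lt$mon + 1

# $yday is zero-based (0 = 1 January); $wday runs 0-6 with 0 = Sunday
day_of_year <- lt$yday + 1
weekday <- lt$wday

c(calendar_year, calendar_month, day_of_year, weekday)
```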
Date and Time Arithmetic
Both Date and POSIXct objects are represented by simple numeric values under the hood: a Date is the number of days since 1970-01-01, and a POSIXct value is the number of seconds since 1970-01-01 00:00:00 UTC. This makes calculations with dates and times straightforward: R performs the calculations on the underlying numbers and then converts the result back to human-readable date or time information.
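You can inspect these underlying numbers directly. A minimal sketch: unclass() or as.numeric() strips the class and exposes the count of days (Date) or seconds (POSIXct) since the 1970-01-01 epoch.

```r
# Date: days since 1970-01-01
unclass(as.Date("1970-01-02"))     # 1
as.numeric(as.Date("2023-12-12"))  # 19703

# POSIXct: seconds since 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))  # 60
```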
Increment date object:
as.Date("23/12/12", format = "%y/%m/%d") + 2
[1] "2023-12-14"
Decrement date object:
as.Date("23/12/12", format = "%y/%m/%d") - 4
[1] "2023-12-08"
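Repeated increments are easier with seq(), which has a method for Date objects. A sketch reusing the date from above: by = "week" steps 7 days at a time, while by = "month" advances the calendar month rather than adding a fixed number of days.

```r
start <- as.Date("23/12/12", format = "%y/%m/%d")

# Each element is 7 days after the previous one
weekly <- seq(start, by = "week", length.out = 3)

# Each element falls on the same day of the following month
monthly <- seq(start, by = "month", length.out = 3)

weekly
monthly
```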
Date difference:
# Note: the two strings need not use the same format; once parsed, both are Date objects.
day1 <- as.Date("23/12/12", format = "%y/%m/%d")
day2 <- as.Date("2023-Dec-08", format = "%Y-%b-%d")
day1 - day2
Time difference of 4 days
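Subtracting two dates gives a difftime object whose units R picks automatically. A sketch reusing day1 and day2 from above: difftime() lets you request specific units, and as.numeric() turns the result into a plain number.

```r
day1 <- as.Date("23/12/12", format = "%y/%m/%d")
day2 <- as.Date("2023-Dec-08", format = "%Y-%b-%d")

# Subtraction: a difftime with automatically chosen units (days here)
gap <- day1 - day2

# Explicit units
gap_hours <- difftime(day1, day2, units = "hours")
gap_weeks <- difftime(day1, day2, units = "weeks")

# Drop the difftime class when you need a plain number
as.numeric(gap)
```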
Calculations with POSIXct objects work in the same way as those with Date objects.
login_time <- as.POSIXct(c("2023-03-14 11:08:32", "2023-03-14 14:37:23"))
logout_time <- as.POSIXct(c("2023-03-14 11:12:51", "2023-03-14 15:31:23"))
time_online <- logout_time - login_time
time_online
Time differences in mins
[1] 4.316667 54.000000
The average time spent online is:
mean(time_online)
Time difference of 29.15833 mins
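One difference from Date arithmetic is worth keeping in mind: adding a number to a POSIXct object adds seconds, not days. A sketch with a login time like the ones above; tz = "UTC" is set because, in a time zone with daylight saving time, adding 24 * 60 * 60 seconds across a clock change does not land on the same clock time the next day.

```r
login <- as.POSIXct("2023-03-14 11:08:32", tz = "UTC")

# Adding a plain number to POSIXct adds SECONDS
one_minute_later <- login + 60
one_day_later <- login + 24 * 60 * 60

# For comparison, adding 1 to a Date adds one DAY
as.Date("2023-03-14") + 1

# Request explicit units instead of relying on the automatic choice
difftime(one_day_later, login, units = "mins")
```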
Summary
In this article we looked at different ways to manipulate time series data. We learned:
- How to convert a character string into a Date object.
- How to use conversion specifications to convert character strings with non-standard date formats.
- How to extract date and time components with format() and from the components of a POSIXlt object.
- How to perform arithmetic on date-time objects. Date arithmetic works in much the same way as number arithmetic.