Handling Date and Time data in R – Part 2

This is the second article in a two-part series on handling date and time data in R. You can read the first article here.

When date and time data are imported into R, they often default to character strings, so we need to convert these strings to dates. We may also have multiple strings that we want to merge to create a date variable. Time series data comes in various formats, so we first need to transform it into a unified structure before carrying out further analysis.

In this article we will focus on ways to manipulate date and time data. We will learn:

  • Converting strings to dates using various formatting options
  • Extracting date and time components from a date-time object
  • Performing arithmetic operations on date-time objects

Let’s get started!

If the string is already in ISO 8601 format, we can convert it into a date object using the as.Date() function.

date1 <- c("2015-07-01", "2015-08-01", "2015-09-01")
date2 <- as.Date(date1)
date2
[1] "2015-07-01" "2015-08-01" "2015-09-01"

class(date2)
[1] "Date"

Note that the default date format is YYYY-MM-DD; therefore, if your string is in a different format, you must supply the format argument. In the real world, date/time data may come in all kinds of formats. Below is a sample.

  • December 12, 2023
  • 12th Dec, 2023
  • Dec 12th, 23
  • 12-Dec-2023
  • 2023 December
  • 12.12.23

Dates can come in many formats; for a complete list of formatting codes in R, type ?strftime in your console.

When the data is not in the default ISO 8601 format, we need to explicitly specify the format in R. We do this using conversion specifications. A conversion specification is introduced by %, usually followed by a single letter.
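As a rough illustration of what a few common conversion specifications stand for, the sketch below formats one fixed timestamp with each of them. The names stamp, specs and parts are made up for this example, and the abbreviated month name assumes an English locale.

```r
# A fixed timestamp; tz = "UTC" keeps the result independent of the local zone
stamp <- as.POSIXct("2023-12-12 10:23:32", tz = "UTC")

# A handful of the specifications documented in ?strftime
specs <- c(year4 = "%Y", year2 = "%y", month_num = "%m", month_abbr = "%b",
           day = "%d", hour = "%H", minute = "%M", second = "%S")

# Format the same timestamp once per specification
parts <- vapply(specs, function(s) format(stamp, s), character(1))
parts
```

Each element of parts is a character string, named after the entry in specs that produced it.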

[Table: Conversion Specification]

Let’s work through a few examples.

Let us say you are dealing with dates in the format 23/12/12. In this format, the year comes first, followed by the month and the day, each separated by a slash (/). The year has only 2 digits, i.e. it does not include the century. Let us now map each component of the date to the conversion specification table shown above.

  • 23 (year without century): %y
  • 12 (month as a number): %m
  • 12 (day of the month): %d

Using the format argument, let’s specify the date using the above mapping.

as.Date("23/12/12", format = "%y/%m/%d")
[1] "2023-12-12"
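One caveat with %y is worth a quick sketch: because the century is missing, R has to guess it. According to ?strptime, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx. The variable names below are illustrative only.

```r
# Two-digit years on either side of the documented cut-off
two_digit <- c("68/12/12", "69/12/12")
parsed <- as.Date(two_digit, format = "%y/%m/%d")

# Show the century that R inferred for each string
format(parsed, "%Y")
# [1] "2068" "1969"
```

If the data may contain years on the wrong side of this cut-off, a four-digit year (%Y) in the source is the safer choice.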

Another way in which the date can be written is 2019-Dec-12. We still have the year followed by the month and the day, but there are a few changes here:

  • the components are separated by a hyphen (-) instead of a slash (/)
  • the year has 4 digits, i.e. it includes the century
  • the month is specified using an abbreviation instead of digits.

Let us map the components to the format table:

  • 2019 (year with century): %Y
  • Dec (abbreviated month name): %b
  • 12 (day of the month): %d

Let us specify the format for the date using the above mapping.

as.Date("2019-Dec-12", format = "%Y-%b-%d")
[1] "2019-12-12"
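The same approach extends to the sample formats listed earlier. The following is only a sketch: it assumes English month names in the current locale, reads 12.12.23 as day.month.year, and uses made-up variable names.

```r
# Sample strings from the list earlier in the article
d_full <- as.Date("December 12, 2023", format = "%B %d, %Y")  # full month name
d_abbr <- as.Date("12-Dec-2023", format = "%d-%b-%Y")         # abbreviated month name
d_dots <- as.Date("12.12.23", format = "%d.%m.%y")            # assuming day.month.year

# Ordinal suffixes (st, nd, rd, th) have no conversion specification,
# so they are stripped before parsing
clean <- sub("(\\d+)(st|nd|rd|th)", "\\1", "12th Dec, 2023")
d_ord <- as.Date(clean, format = "%d %b, %Y")

# A string without a day typically does not parse into a Date on its own;
# the first of the month is assumed here
d_month <- as.Date(paste("2023 December", "01"), format = "%Y %B %d")

c(d_full, d_abbr, d_dots, d_ord, d_month)

# A format that does not match the string gives NA rather than an error
is.na(as.Date("2019-Dec-12", format = "%Y-%m-%d"))
# [1] TRUE
```

Because a mismatched format fails silently with NA, it is worth checking the result with is.na() after converting a whole column.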

In the above examples, we have not dealt with time components. Let us include the time, i.e. 23/12/12 10:23:32. The date part maps as before, and the time part adds three specifications:

  • 10 (hour, 00 to 23): %H
  • 23 (minute): %M
  • 32 (second): %S

Since we are dealing with time, we will use the as.POSIXct() function instead of as.Date().

as.POSIXct("23/12/12 10:23:32", format = "%y/%m/%d %H:%M:%S")
[1] "2023-12-12 10:23:32 IST"

By default, as.POSIXct() uses the local time zone. You can change the time zone using the tz argument.

as.POSIXct("23/12/12 10:23:32", format = "%y/%m/%d %H:%M:%S",
tz = "UTC")
[1] "2023-12-12 10:23:32 UTC"

UTC stands for Coordinated Universal Time. It is a standard used to establish time zones worldwide.
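The tz argument decides how the string is interpreted when it is parsed. To display the same instant in a different zone afterwards, format() also accepts a tz argument. A minimal sketch, assuming the zone name "Asia/Kolkata" is available in the system's time zone database (OlsonNames() lists the valid names); utc_time is an illustrative name.

```r
# Parse the timestamp as UTC
utc_time <- as.POSIXct("23/12/12 10:23:32", format = "%y/%m/%d %H:%M:%S",
                       tz = "UTC")

# Display the same instant in Indian Standard Time (UTC + 5:30)
format(utc_time, tz = "Asia/Kolkata", usetz = TRUE)
# [1] "2023-12-12 15:53:32 IST"

# The stored instant itself is unchanged; only its printed form differs
format(utc_time, usetz = TRUE)
# [1] "2023-12-12 10:23:32 UTC"
```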

In this section we will look at various functions to extract the date and time components. We will learn to extract components such as:

  • year
  • month
  • week
  • day
  • hour
  • minute

To extract date and time components, use the format() function with the appropriate conversion specification. Check ?strftime for more information on conversion specifications.

example1 <- as.Date("2019-Dec-12", format = "%Y-%b-%d")
example2 <- as.Date("23/12/12", format = "%y/%m/%d")
example3 <- as.POSIXct("23/12/23 10:23:32", format = "%y/%m/%d %H:%M:%S")

Extract year from example1

format(example1, "%Y")
[1] "2019"

Extract month from example2

format(example2, "%m")
[1] "12"

Extract day from example3

format(example3, "%d")
[1] "23"

Extract week of the year from example2 (%U counts weeks starting on Sunday)

format(example2, "%U")
[1] "50"

Extract hour from example3

format(example3, "%H")
[1] "10"

Extract minute from example3

format(example3, "%M")
[1] "23"
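Note that format() returns character strings, as the quotes in the output above show. The sketch below shows one way to turn a component into a number, plus two further components; the names month_num and quarter_num are illustrative, and the weekday name assumes an English locale.

```r
example2 <- as.Date("23/12/12", format = "%y/%m/%d")

# format() always returns character strings
class(format(example2, "%m"))
# [1] "character"

# Convert to integer when the component is needed for arithmetic
month_num <- as.integer(format(example2, "%m"))
quarter_num <- (month_num - 1L) %/% 3L + 1L
quarter_num
# [1] 4

# Day of the year ("%j") and weekday name via weekdays()
format(example2, "%j")
# [1] "346"
weekdays(example2)
# [1] "Tuesday"
```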

As we said earlier, POSIXlt stores date/time components in a list and these can be extracted.
Let us look at the date/time components returned by POSIXlt using unclass().

example3 <- as.POSIXlt("23/12/23 10:23:32", format = "%y/%m/%d %H:%M:%S")
unclass(example3)

$sec
[1] 32
$min
[1] 23
$hour
[1] 10
$mday
[1] 23
$mon
[1] 11
$year
[1] 123
$wday
[1] 6
$yday
[1] 356
$isdst
[1] 0
$zone
[1] "IST"
$gmtoff
[1] NA
attr(,"tzone")
[1] "" "IST" "+0630"
attr(,"balanced")
[1] TRUE

Use unlist() if you want the list returned as a vector. Because the zone component is a string, every element is coerced to character.

unlist(example3)
sec min hour mday mon year wday yday isdst zone gmtoff
"32" "23" "10" "23" "11" "123" "6" "356" "0" "IST" NA

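To extract a specific component, we can use the $ operator. Note that mon counts months from 0 (so December is 11) and year counts years since 1900 (so 2023 is 123).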

# extract hour
example3$hour
[1] 10

# extract month
example3$mon
[1] 11
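Since several POSIXlt components use offsets, as documented in ?DateTimeClasses, a small adjustment is needed to get calendar values. A minimal sketch, with lt as an illustrative variable name:

```r
lt <- as.POSIXlt("23/12/23 10:23:32", format = "%y/%m/%d %H:%M:%S")

lt$mon + 1      # months are counted from 0, so December is 11 + 1
# [1] 12
lt$year + 1900  # years are counted from 1900
# [1] 2023
lt$yday + 1     # day of the year is counted from 0
# [1] 357
lt$wday         # weekday is counted from 0 (Sunday), so 6 is Saturday
# [1] 6
```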

Both Date and POSIXct R objects are represented by simple numerical values under the hood. This makes calculation with time and date objects very straightforward: R performs the calculations using the underlying numerical values, and then converts the result back to human-readable time information again.
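To make the underlying values visible, the sketch below strips the class from a Date and a POSIXct object. A Date is stored as the number of days since 1970-01-01 and a POSIXct as the number of seconds since 1970-01-01 00:00:00 UTC. The names d and moment are illustrative, and tz = "UTC" is chosen here so the arithmetic does not depend on the local zone.

```r
d <- as.Date("2023-12-12")
unclass(d)                      # days since 1970-01-01
# [1] 19703

moment <- as.POSIXct("2023-12-12 10:23:32", tz = "UTC")

# Seconds since the epoch = whole days * 86400 + seconds elapsed that day
as.numeric(moment) == as.numeric(d) * 86400 + 10 * 3600 + 23 * 60 + 32
# [1] TRUE

# Going back from a number to a Date needs the origin
as.Date(19703, origin = "1970-01-01")
# [1] "2023-12-12"
```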

Increment date object:

as.Date("23/12/12", format = "%y/%m/%d") + 2
[1] "2023-12-14"

Decrement date object:

as.Date("23/12/12", format = "%y/%m/%d") - 4
[1] "2023-12-08"

Date difference:

# Note: the two strings do not need to be in the same date format.
day1 <- as.Date("23/12/12", format = "%y/%m/%d")
day2 <- as.Date("2023-Dec-08", format = "%Y-%b-%d")

day1 - day2
Time difference of 4 days
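Two related points can be sketched briefly. The result of a subtraction is a difftime object, which as.numeric() turns into a plain number, and adding an integer always moves by days, so calendar steps such as months are usually taken with seq() instead. The variable name gap is illustrative.

```r
day1 <- as.Date("2023-12-12")
day2 <- as.Date("2023-12-08")

# The difference is a difftime object; as.numeric() drops the units label
gap <- day1 - day2
as.numeric(gap)
# [1] 4

# Step forward by calendar months rather than by a fixed number of days
seq(day1, by = "month", length.out = 3)
# [1] "2023-12-12" "2024-01-12" "2024-02-12"

# Comparison operators work on Date objects as well
day1 > day2
# [1] TRUE
```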

Calculations using POSIXct objects are completely analogous to those using Date objects.

login_time <- as.POSIXct(c("2023-03-14 11:08:32", "2023-03-14 14:37:23"))
logout_time <- as.POSIXct(c("2023-03-14 11:12:51", "2023-03-14 15:31:23"))
time_online <- logout_time - login_time
time_online

Time differences in mins
[1] 4.316667 54.000000

The average time spent online is:

mean(time_online)
Time difference of 29.15833 mins
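In the output above R chose minutes as the unit on its own. When a particular unit is needed, difftime() lets you ask for it explicitly. This is a sketch using the same timestamps; tz = "UTC" is an assumption added here so the result does not depend on the local zone, and online_secs is an illustrative name.

```r
login_time  <- as.POSIXct(c("2023-03-14 11:08:32", "2023-03-14 14:37:23"), tz = "UTC")
logout_time <- as.POSIXct(c("2023-03-14 11:12:51", "2023-03-14 15:31:23"), tz = "UTC")

# Ask for a specific unit instead of letting R choose one
online_secs <- difftime(logout_time, login_time, units = "secs")
as.numeric(online_secs)
# [1]  259 3240

# An existing difftime can also be re-expressed in another unit
time_online <- logout_time - login_time
units(time_online) <- "hours"
time_online
```

Fixing the unit is useful before storing the values as plain numbers, since the automatically chosen unit can change from one data set to the next.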

In this article we looked at several ways to manipulate date and time data. We learned:

  • How to convert a character string into a date object.
  • How to use conversion specifications to convert character strings with non-standard date formats.
  • How to extract date and time components from a date-time object.
  • How to perform arithmetic on date-time objects, which works in much the same way as arithmetic on numbers.
