The first step in any data analysis in R is to load the data, that is, to import a dataset into the R environment. There is a variety of data file formats you can import into R, including:
- SAS
- SPSS
- MATLAB
- XLSX
- XLS
- delimited text files (CSV, TSV, etc.)
- and many more
Among all the file types used to store data, perhaps the most widely used one is CSV. In this article, we will learn how to import a CSV file into the R environment. So let’s get started.
CSV File
In a typical CSV file, the first line is the header of columns, and each subsequent line represents a data record with columns separated by commas. Here is an example.
Name,Gender,Age,Major
Ken,Male,24,Finance
Ashley,Female,25,Statistics
Jennifer,Female,23,Computer Science
Importing data using built-in functions
The simplest built-in function for importing a CSV file is read.csv().
read.csv("C:\\...Your path...\\students_data.csv",stringsAsFactors = FALSE)
Note that you can use either double backslashes (\\) or single forward slashes (/) to separate directories in the file path. The whole path must be enclosed in quotation marks, and the file name must include its file extension.
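As a minimal, self-contained sketch of the read.csv() call, the following writes the sample records to a temporary file and reads them back. The temporary file and the names csv_path and students are illustrative assumptions, used only so the example runs without a real path.

```r
# Write the sample records to a temporary file so the example runs anywhere.
# csv_path and students are illustrative names, not from the article.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Gender,Age,Major",
  "Ken,Male,24,Finance",
  "Ashley,Female,25,Statistics",
  "Jennifer,Female,23,Computer Science"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as character vectors.
students <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types: Age is read as a number, the rest as text.
str(students)
```

Since R 4.0.0, stringsAsFactors already defaults to FALSE, so the argument mainly matters on older versions of R.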
Technically, CSV is a delimited data format that uses a comma (,) to separate columns and a new line to separate rows. More generally, any character can serve as the column separator. For such files, the more general read.table() function is used.
read.table("C:\\...Your path...\\students_data.csv",sep = ",",header=TRUE)
The sep argument specifies the delimiter used in the file. This is the field separator character: the values on each line of the file are separated by it. The argument header = TRUE tells R that the first line of the file contains the variable names.
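To illustrate a separator other than a comma, here is a small sketch that reads a tab-separated copy of the same records with read.table(). The file and the names tsv_path and students_tsv are assumptions made for the example.

```r
# A tab-separated copy of the sample data, written to a temporary file.
# tsv_path and students_tsv are illustrative names.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "Name\tGender\tAge\tMajor",
  "Ken\tMale\t24\tFinance",
  "Ashley\tFemale\t25\tStatistics",
  "Jennifer\tFemale\t23\tComputer Science"
), tsv_path)

# sep = "\t" sets the field separator to a tab;
# header = TRUE reads the first line as column names.
students_tsv <- read.table(tsv_path, sep = "\t", header = TRUE,
                           stringsAsFactors = FALSE)
print(students_tsv)
```

Because the separator is a tab, the space inside "Computer Science" stays part of a single value.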
Importing CSV file using readr package
The readr package is another good choice for importing tabular data in a fast and consistent manner. Install the readr package, load it with library(readr), and then you can use the read_* family of functions to import the data.
read_csv("C:\\...Your path...\\students_data.csv")
The read_csv() function has additional handy arguments you can use while importing the data. These include:
- n_max: The maximum number of rows to read
- skip: Number of lines to skip before reading the data
- na: strings to interpret as missing values
Here is an example:
read_csv("C:\\...Your path...\\students_data.csv",skip=1,n_max=2,na="empty",col_names = FALSE)
In this example we used:
- skip = 1, to skip the first line of the file (here, the header row)
- n_max = 2, to read at most 2 rows from the data set
- na = "empty", to treat the literal string "empty" as a missing value (NA)
- col_names = FALSE, to indicate that the first line read is data rather than column names; readr then names the columns X1, X2, and so on
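The effect of these arguments can be checked with a small sketch. It assumes the readr package is installed, and it uses a modified copy of the sample records in which one Age value is replaced by the placeholder text "empty"; that modification and the names csv_path and subset_rows are assumptions made for illustration.

```r
library(readr)  # assumes readr is installed: install.packages("readr")

# Sample records with one Age recorded as the placeholder text "empty"
# (an illustrative modification of the article's data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Gender,Age,Major",
  "Ken,Male,24,Finance",
  "Ashley,Female,empty,Statistics",
  "Jennifer,Female,23,Computer Science"
), csv_path)

# skip = 1 drops the header line, n_max = 2 reads two records,
# na = "empty" turns that placeholder into NA, and
# col_names = FALSE makes readr generate the names X1, X2, X3, X4.
subset_rows <- read_csv(csv_path, skip = 1, n_max = 2, na = "empty",
                        col_names = FALSE)
print(subset_rows)
```

Only the records for Ken and Ashley are read, and Ashley's age appears as NA.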
Sometimes a data set comes in an irregular format. The file content looks standard and tabular, but the number of spaces between columns varies across rows. In this case we can use the read_table() function, which reads whitespace-separated columns and tolerates this kind of irregular spacing.
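A brief sketch of this case follows. It assumes readr 2.0 or later, where read_table() accepts any amount of whitespace between columns (earlier versions offered this behavior as read_table2()). The sample file and the names txt_path and spaced are illustrative, and the values deliberately contain no spaces, since a space inside a value would be read as a column break.

```r
library(readr)  # assumes readr >= 2.0 is installed

# Columns separated by a varying number of spaces (illustrative data).
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Name      Gender   Age",
  "Ken   Male     24",
  "Ashley Female        25",
  "Jennifer     Female 23"
), txt_path)

# read_table() splits each line on runs of whitespace.
spaced <- read_table(txt_path)
print(spaced)
```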
The functions in readr are fast and consistent, and they support the features of the built-in functions while being easier to use.
Writing data to CSV file
A typical procedure in data analysis is importing data from a data source, transforming the data, applying appropriate tools and models, and finally creating some new data to be stored for decision making. The interface for writing data to a file is similar to that for reading data: we use the write.* functions to export a data frame to a file.
write.csv(your_data,"C:\\...Your path...\\write1.csv")
The write.csv() function allows us to modify the writing behavior. By default, the exported file contains some components we usually do not want: the row names are written as an extra first column, and string values are wrapped in quotation marks. We can run the following code to export the same data frame without them.
write.csv(your_data,"C:\\...Your path...\\write1.csv",quote = FALSE,row.names = FALSE)
Now the output is a simplified CSV file.
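To see the difference between the two calls, the following sketch writes a small data frame both ways and prints the resulting file contents. The data frame contents and the names default_path and clean_path are assumptions made for the example.

```r
# A small illustrative data frame (not the article's actual data).
your_data <- data.frame(
  Name = c("Ken", "Ashley"),
  Age = c(24, 25),
  stringsAsFactors = FALSE
)

default_path <- tempfile(fileext = ".csv")
clean_path <- tempfile(fileext = ".csv")

# Default behavior: row names and quotation marks are included.
write.csv(your_data, default_path)
# Simplified behavior: no row names, no quotation marks.
write.csv(your_data, clean_path, quote = FALSE, row.names = FALSE)

default_lines <- readLines(default_path)
clean_lines <- readLines(clean_path)

writeLines(default_lines)  # first line: "","Name","Age"
writeLines(clean_lines)    # first line: Name,Age
```

One caveat on this design choice: with quote = FALSE, a string value that itself contains a comma is written unquoted and will be split into two fields when the file is read back, so it is safest to drop the quotes only when the data is known to contain no commas.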
Summary
In this article, we looked at some important functions for importing and exporting CSV files. We learned:
- Importing a CSV file using the read.csv() function
- Importing a CSV file using the readr package
- Using various options while importing a CSV file with the readr package
- Writing a CSV file using the base write.csv() function