Summarizing your data, either numerically or graphically, is an important component of any data analysis. Fortunately, R has excellent graphics capabilities and can be used whether you want to produce plots for initial data exploration, model validation or highly complex publication quality figures.

**Base R Graphics**

The base R graphics system is the original plotting system that comes with every R installation. When creating plots with base R, we tend to use a high-level function (like `plot()`) to first create the plot and then one or more low-level functions (like `lines()` and `text()`) to add information to it.
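As a minimal sketch of this two-step pattern, the snippet below draws a plot with `plot()` and then annotates it with `lines()` and `text()`. The vector `squares` and the label position are illustrative choices made here, not part of the original example.

```r
# Illustrative data: the squares of 1 to 20
squares <- (1:20)^2

# High-level call: opens a new plot and draws the points
plot(squares)

# Low-level calls: add to the plot that already exists
lines(squares, col = "grey40")                 # connect the points
text(x = 5, y = 300, labels = "y = index^2")   # annotation at (5, 300)
```

Calling a low-level function before any high-level call has opened a plot results in an error, which is why the order matters.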

You can pass a wide variety of objects to the `plot()` function, and R will “magically” present something that makes sense for that particular object. To illustrate this point, we will plot two very simple objects: a vector and a data frame.

```
x <- c(1:20)^2
plot(x)
```

When a numeric vector is passed to the `plot()` function, it generates a scatter plot of that vector. The x-axis shows the index of each element in the vector and the y-axis shows the value of that element.

```
df <- data.frame("a"=x,"b"=1/x,"c"=log(x),"d"=sqrt(x))
plot(df)
```

When a data frame is passed to the `plot()` function, it generates a matrix of scatter plots in which each column is plotted against every other column. The main diagonal of the matrix shows the column names.
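For a data frame with more than two columns, `plot()` hands the work to `pairs()`, so calling `pairs()` directly should give the same kind of figure. The sketch below rebuilds `df` so that it runs on its own; the subset of columns in the second call is an arbitrary choice for illustration.

```r
x <- (1:20)^2
df <- data.frame(a = x, b = 1 / x, c = log(x), d = sqrt(x))

# Scatter plot matrix of all four columns
pairs(df)

# Restrict the matrix to a few columns of interest
pairs(df[, c("a", "c", "d")])
```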

The `plot()` function will generate:

- a scatter plot, if a numeric variable is supplied as input
- a bar plot, if one factor is supplied as input
- a box plot, if one factor and one numeric variable are supplied as input
- a matrix of scatter plots, if a data frame is supplied as input
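The factor cases in the list above can be tried with the built-in `mtcars` data set. This is a small sketch; converting `cyl` to a factor and the name `cyl_f` are choices made here for illustration.

```r
# Treat the number of cylinders as a categorical variable
cyl_f <- factor(mtcars$cyl)

# One factor: bar plot of the number of cars per level
plot(cyl_f)

# One factor and one numeric variable: box plot of mpg for each level
plot(cyl_f, mtcars$mpg)
```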

With this basic introduction to the base R graphics system, let’s delve into scatter plots.

**Scatter Plots**

The scatter plot is one of the most basic chart types. It shows points plotted on the Cartesian plane (x-y axes). Each point represents a pair of values of two variables: one variable is placed on the horizontal axis and the other on the vertical axis. Scatter plots are widely used to check the relationship between two variables.

A typical call for a scatter plot has the form:

`plot(x, y, main, xlab, ylab, xlim, ylim, pch)`

where:

- `x`: the data for the horizontal axis
- `y`: the data for the vertical axis
- `main`: the title of the graph
- `xlab`: the title of the x-axis
- `ylab`: the title of the y-axis
- `xlim`: the range of values on the x-axis
- `ylim`: the range of values on the y-axis
- `pch`: the display symbol
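To make the arguments listed above concrete, here is a sketch that sets all of them in one call on the built-in `mtcars` data. The helper names `x_range` and `y_range`, the choice to start the y-axis at zero, and the label text are assumptions made for this example.

```r
# Derive the axis limits from the data, with the y-axis starting at zero
x_range <- range(mtcars$mpg)
y_range <- c(0, max(mtcars$hp))

plot(x = mtcars$mpg, y = mtcars$hp,
     main = "Horsepower vs. miles per gallon",
     xlab = "Miles per gallon", ylab = "Horsepower",
     xlim = x_range, ylim = y_range,
     pch = 16)  # 16 is a filled circle
```

Both `xlim` and `ylim` expect a vector of length two giving the lower and upper limit of the axis.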

Let’s generate a scatter plot for the `mtcars` data set. We want to look at the relationship between engine horsepower (`hp`) and miles per gallon (`mpg`). To plot one numeric variable against another, we just need to pass both variables as arguments to the `plot()` function.

`plot(x=mtcars$mpg,y=mtcars$hp)`

The `hp` variable is placed on the y-axis and the `mpg` variable on the x-axis. The expressions passed to the function (`mtcars$mpg` and `mtcars$hp`) are used as the default axis labels, and the axis scales are also set automatically.

Looking at the scatter plot, you can quickly see the negative relationship between engine horsepower and miles per gallon. As horsepower increases, the distance travelled per gallon tends to decrease; that is, more powerful cars consume more fuel.
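The visual impression can be backed by a number. As a rough check, the sketch below computes the Pearson correlation with `cor()`; a negative value is consistent with the downward trend in the plot. The variable name `r` is illustrative.

```r
# Pearson correlation between miles per gallon and horsepower
r <- cor(mtcars$mpg, mtcars$hp)
print(r)  # negative for this data set
```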

You can also use formula notation with the `plot()` function. In the formula method, you specify the y-axis variable first, then `~`, and then the x-axis variable.

`plot(mtcars$hp ~ mtcars$mpg)`
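The formula method also accepts a `data` argument, which avoids repeating `mtcars$` and gives shorter default axis labels (`mpg` and `hp`). A sketch of the same plot written that way:

```r
# The column names in the formula are looked up in mtcars
plot(hp ~ mpg, data = mtcars)
```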

**Adding Layers to a Scatter Plot**

Once the basic scatter plot is ready, we can add different layers to it. These layers are used to add a title, colors and legends to the scatter plot.

We will add labels to the x-axis and y-axis and give the scatter plot a title.

```
# Label both axes and add a title
plot(x = mtcars$mpg, y = mtcars$hp,
     xlab = "Miles Per Gallon", ylab = "Engine Horsepower",
     main = "Miles per Gallon Vs Engine Horsepower")
```

With the `pch` argument (short for “plot character”), it is possible to change the symbol that is displayed on the scatter plot. Integer values 0 to 25 each specify a symbol, as shown in the figure below.
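The symbols can also be previewed directly in R. The following sketch draws symbols 0 to 25 in a single row and prints each `pch` number beneath its symbol; the layout values (`cex`, `ylim`, the label height) are arbitrary choices made here.

```r
# Draw symbols 0 to 25 in a row, each labelled with its pch number
codes <- 0:25
plot(x = codes, y = rep(1, length(codes)),
     pch = codes, cex = 1.5,
     xlab = "pch value", ylab = "", yaxt = "n",
     ylim = c(0.5, 1.5))
text(x = codes, y = 0.8, labels = codes, cex = 0.7)
```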

The color of the symbols can be changed via the `col` argument.

```
# pch = 2 draws open triangles; col sets the symbol color
plot(x = mtcars$mpg, y = mtcars$hp,
     xlab = "Miles Per Gallon", ylab = "Engine Horsepower",
     main = "Miles per Gallon Vs Engine Horsepower",
     pch = 2, col = "red")
```
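Color can also carry information, and a legend then explains the mapping. The sketch below colors each point by the car's number of cylinders and adds a legend with the low-level `legend()` function. The grouping by `cyl`, the color choices and the names `cyl_f` and `cyl_cols` are assumptions made for this illustration.

```r
# One color per cylinder count (the levels are 4, 6 and 8)
cyl_f <- factor(mtcars$cyl)
cyl_cols <- c("darkgreen", "orange", "red")

plot(x = mtcars$mpg, y = mtcars$hp,
     xlab = "Miles Per Gallon", ylab = "Engine Horsepower",
     main = "Miles per Gallon Vs Engine Horsepower",
     pch = 16, col = cyl_cols[as.integer(cyl_f)])

# Low-level call: add a legend that maps colors to cylinder counts
legend("topright", legend = levels(cyl_f), col = cyl_cols,
       pch = 16, title = "Cylinders")
```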

**Adding a Regression Line to a Scatter Plot**

A regression line is a straight line that describes how two numeric variables change with respect to each other. It can be used to predict the value of y for a given value of x. Adding a regression line to a scatter plot clearly shows the nature of the relationship between the two variables.

To draw a regression line, we need two functions:

- the `abline()` function, which draws a straight line through the scatter plot
- the `lm()` function, which stands for *linear model* and is used to fit a simple linear model

```
plot(x = mtcars$mpg, y = mtcars$hp,
     xlab = "Miles Per Gallon", ylab = "Engine Horsepower",
     main = "Miles per Gallon Vs Engine Horsepower",
     pch = 2, col = "red")
# Fit hp as a linear function of mpg and draw the fitted line
abline(lm(hp ~ mpg, data = mtcars), col = "blue")
```
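The fitted model can also be inspected and used for prediction. A minimal sketch, assuming the same `hp ~ mpg` model; the names `fit`, `new_car` and `pred`, and the value of 20 miles per gallon, are hypothetical.

```r
fit <- lm(hp ~ mpg, data = mtcars)

# Intercept and slope of the fitted line
print(coef(fit))

# Predicted horsepower for a hypothetical car that does 20 miles per gallon
new_car <- data.frame(mpg = 20)
pred <- predict(fit, newdata = new_car)
print(pred)
```

The slope is negative here, which matches the downward direction of the line drawn by `abline()`.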

**Summary**

The base R graphics system is the original plotting system that comes with every R installation. It is built around the generic `plot()` function, which generates a visualization that depends on the nature of the object provided to it.

A scatter plot uses points to represent values for two different numeric variables. Scatter plots are used to observe relationships between variables. We also looked at various options to customize the scatter plot.

Finally, we used a linear regression line to represent the relationship between the two variables.