Four distinct Bar Plot representations in R

Wind pattern in sand, desert near Jaisalmer, Rajasthan, India

Bar charts are used to compare numeric values across the categories of a categorical variable, for example the sales of different products in a financial year. Here sales is the numeric variable and product name is the categorical variable.

The height of each bar is proportional to the value it represents.

Bar charts are also used to show the frequency of each observed value of a variable. For example, a bar chart can show how many vehicles in the mtcars data set have 4, 6 and 8 cylinder engines.

Bar plots are flexible enough to be drawn in four different ways, each presenting the same information from a different perspective. In this article we will look at:

  • How to create a simple bar chart?
  • How to create a stacked bar chart?
  • How to create a proportionate bar chart?
  • How to create a grouped bar chart?
  • The difference between Bar Chart and Histogram

We will work with mtcars data set. Let’s get started.

How to draw a simple bar chart?

We will use the barplot() function to generate bar plots. Its basic form is:

barplot(H, xlab, ylab, main, names.arg, col)

H is a vector or matrix containing the numeric values to plot.
xlab is the label for the X-axis.
ylab is the label for the Y-axis.
main is the title of the chart.
names.arg is a vector of names, one per bar. Its length must match the number of bars, i.e. the length of H (or the number of columns when H is a matrix).
col gives the colors of the bars.
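The main argument is not used in the examples that follow. As a minimal sketch (the title text and the named vector are illustrative), giving H names lets barplot() pick up the bar labels on its own, because names.arg defaults to names(H) for a vector (and to the column names for a matrix):

```r
# Monthly revenue stored as a named vector; the names become the bar labels
Revenue <- c(jan = 17, feb = 12, mar = 28, apr = 30, may = 41, jun = 19)

# names.arg is omitted here: barplot() falls back to names(Revenue)
barplot(Revenue,
        main = "Monthly revenue, Jan to Jun",  # chart title
        xlab = "Month", ylab = "Revenue",
        col = "khaki")
```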

Let’s start with a simple bar chart. We will first create product sales data for six months, January to June, and then use the barplot() function to plot the month-on-month revenue of the product.

Month <- c("jan","feb","mar","apr","may","jun")
Revenue <- c(17,12,28,30,41,19)

barplot(Revenue, names.arg=Month,xlab="Month",ylab="Revenue",col = "khaki")

Simple Bar Chart
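barplot() also returns the x-coordinates of the bar midpoints (invisibly). A small sketch, assuming you want each bar annotated with its value, passes those midpoints to text(); the ylim value is an arbitrary choice that leaves room above the tallest bar:

```r
Month <- c("jan", "feb", "mar", "apr", "may", "jun")
Revenue <- c(17, 12, 28, 30, 41, 19)

# Keep the midpoints returned by barplot()
mids <- barplot(Revenue, names.arg = Month, xlab = "Month", ylab = "Revenue",
                col = "khaki", ylim = c(0, 50))

# Write each value just above its bar (pos = 3 means "above")
text(x = mids, y = Revenue, labels = Revenue, pos = 3)
```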

By default, barplot() draws vertical bars, with the numeric values on the Y-axis and the categories on the X-axis. Setting horiz = TRUE draws horizontal bars instead; note that the xlab and ylab labels then have to be swapped, as in the code below.

barplot(Revenue, names.arg=Month,xlab="Revenue",ylab="Month",
        col = "khaki",horiz = TRUE)

Horizontal Bar Chart
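With horiz = TRUE the first value is drawn at the bottom. A hedged sketch for sorting the bars so the largest sits on top: order() reorders the values and their labels together, and las = 1 (a standard graphics parameter) keeps the month labels horizontal:

```r
Month <- c("jan", "feb", "mar", "apr", "may", "jun")
Revenue <- c(17, 12, 28, 30, 41, 19)

# Indices that sort Revenue in increasing order
ord <- order(Revenue)

# Apply the same ordering to values and labels so they stay aligned
barplot(Revenue[ord], names.arg = Month[ord],
        xlab = "Revenue", ylab = "Month",
        col = "khaki", horiz = TRUE, las = 1)
```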

A bar chart can also show the frequency of each category in a data set. For example, to show how many vehicles in mtcars have 4, 6 and 8 cylinder engines, we first have to aggregate the data by the number of cylinders.

To do this we can prepare a contingency table, which aggregates the data and gives the count of observations in each category. The table() function generates such a table. Since we want the count of observations per cylinder category, we pass the cyl variable to table().

table(mtcars$cyl)

 4  6  8 
11  7 14 

The output shows that there are 11 vehicles with a 4-cylinder engine, 7 with a 6-cylinder engine and 14 with an 8-cylinder engine. Let’s use this information to plot the bar chart.
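table() stores each category value as a name, so a one-way table can go straight into barplot(); without names.arg the bars are simply labelled 4, 6 and 8. A minimal sketch that keeps the counts in a variable (cyl_counts is an illustrative name):

```r
# One-way frequency table of engine cylinders
cyl_counts <- table(mtcars$cyl)

names(cyl_counts)      # the category values: "4" "6" "8"
as.vector(cyl_counts)  # the counts: 11 7 14

# names.arg is not needed: the table's names label the bars
barplot(cyl_counts, xlab = "Engine Cylinders", ylab = "Number of cars",
        col = "khaki")
```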

barplot(table(mtcars$cyl), names.arg =c("Four Cylinder","Six Cylinder","Eight cylinder"),xlab="Engine Cylinders",ylab="Number of cars",
        col = "khaki",ylim = c(0,20))

Simple Bar Chart – mtcars dataset

Stacked bar chart

We can extend the basic bar chart to stacked bar chart.

In a stacked bar chart, each bar is broken down into sub-parts that together make up the whole. For example, if each bar represents the total revenue of a product, we can break that revenue down by geography: the north, south, east and west regions.

Each bar then shows the revenue from each region, and the revenue from all regions together is the total revenue of that product.

# sample() is random, so your values will differ (call set.seed() first for repeatable results)
# 4 regions x 3 products = 12 values
product_matrix <- matrix(sample(50:100, 12, replace = FALSE), nrow = 4, ncol = 3)

rownames(product_matrix) <- c("North", "East", "West", "South")
colnames(product_matrix) <- c("Product 1", "Product 2", "Product 3")
product_matrix

      Product 1 Product 2 Product 3
North        95        83        84
East         56        97        54
West         85        99        60
South        94        81        55

We can now draw a stacked bar chart of this product matrix. A stacked bar chart requires a matrix as input: barplot() draws one bar per column and stacks the rows within each bar.

barplot(product_matrix,names.arg=colnames(product_matrix),
        col=c("green","orange","blue","khaki"),ylim=c(0,450))

legend("topright",rownames(product_matrix),
       fill=c("green","orange","blue","khaki"))

Stacked Bar Chart

The legend() function adds a legend in the top-right corner of the plot. The legend labels (and their fill colors) must follow the row order of the matrix so that each color matches its stack. The ylim argument raises the Y-axis limit to leave room for the legend.
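As an alternative to calling legend() separately, barplot() has legend.text and args.legend arguments: with legend.text = TRUE the matrix row names become the legend labels and the fill colors follow col automatically. The sketch below also derives ylim from the tallest stack, colSums(), instead of hard-coding it; the fixed matrix simply reuses the sample values printed above, and the 1.3 head-room factor is an arbitrary choice:

```r
# The sample values printed above, entered column by column
product_matrix <- matrix(c(95, 56, 85, 94,
                           83, 97, 99, 81,
                           84, 54, 60, 55),
                         nrow = 4, ncol = 3,
                         dimnames = list(c("North", "East", "West", "South"),
                                         c("Product 1", "Product 2", "Product 3")))

# Height of each stacked bar = column total
totals <- colSums(product_matrix)

barplot(product_matrix,
        col = c("green", "orange", "blue", "khaki"),
        ylim = c(0, max(totals) * 1.3),   # leave space for the legend
        legend.text = TRUE,               # use rownames() as legend labels
        args.legend = list(x = "topright"))
```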

For the mtcars data set, we can generate a stacked bar chart by breaking each engine cylinder category down into automatic and manual transmission. For example, how many four-cylinder cars have a manual transmission? To answer this question, we prepare a contingency table for two variables, cyl and am (in mtcars, am is 0 for automatic and 1 for manual transmission).

engine_config <- table(mtcars$cyl,mtcars$am)
engine_config

    0  1
  4  3  8
  6  4  3
  8 12  2

Among the 4-cylinder vehicles, 3 have an automatic transmission and 8 have a manual transmission. We will convert the contingency table into a matrix, give it readable row and column names, and use it as the input to the barplot() function.

engine_config <- matrix(table(mtcars$cyl,mtcars$am),nrow=3,ncol=2)

rownames(engine_config) <- c("Four Cylinder","Six Cylinder","Eight cylinder")
colnames(engine_config) <- c("Auto transmission", "Manual Transmission")

barplot(engine_config,names.arg=colnames(engine_config),
        col=c("green","orange","blue"))

legend("topright",rownames(engine_config),fill=c("green","orange","blue"))

Stacked Bar Chart – mtcars dataset
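Converting with matrix() and then setting rownames()/colnames() is one way to get readable labels. A possible shortcut, sketched here as an alternative, is to label the codes with factor() before tabulating (am is 0 for automatic and 1 for manual). barplot() accepts the resulting two-way table directly, since it is already matrix-like; the variable names cyl, am and engine_table are illustrative:

```r
# Give the coded values readable labels before counting
cyl <- factor(mtcars$cyl, levels = c(4, 6, 8),
              labels = c("Four Cylinder", "Six Cylinder", "Eight Cylinder"))
am  <- factor(mtcars$am, levels = c(0, 1),
              labels = c("Auto transmission", "Manual Transmission"))

# Two-way contingency table: rows = cylinders, columns = transmission
engine_table <- table(cyl, am)

barplot(engine_table, col = c("green", "orange", "blue"),
        legend.text = TRUE, args.legend = list(x = "topright"))
```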

Proportionate bar plot

In a proportionate bar plot all bars have the same height (1, i.e. 100%), because each bar shows the proportions of its sub-categories rather than their counts. This makes it easy to compare the share of each sub-category across the categories.

We will apply the prop.table() function to the engine_config object to generate conditional proportions.

prop.table(engine_config,2)

               Auto transmission Manual Transmission
Four Cylinder          0.1578947           0.6153846
Six Cylinder           0.2105263           0.2307692
Eight cylinder         0.6315789           0.1538462

Each column now sums to 1 (100%): the proportions for each type of transmission add up to the whole. These are called conditional proportions, and the condition is set by the margin argument of prop.table(). margin = 2 computes column-wise proportions on engine_config; margin = 1 computes row-wise proportions.
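For comparison, a short sketch of margin = 1, which makes each row (cylinder category) sum to 1 and answers the reverse question: within each cylinder group, what share of cars is automatic or manual? The engine_config matrix is rebuilt so the snippet runs on its own:

```r
engine_config <- matrix(table(mtcars$cyl, mtcars$am), nrow = 3, ncol = 2)
rownames(engine_config) <- c("Four Cylinder", "Six Cylinder", "Eight cylinder")
colnames(engine_config) <- c("Auto transmission", "Manual Transmission")

# Row-wise proportions: share of each transmission within a cylinder group
row_props <- prop.table(engine_config, 1)
round(row_props, 3)

rowSums(row_props)  # each row sums to 1
```

For example, 12 of the 14 eight-cylinder cars (about 86%) are automatic.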

barplot(prop.table(engine_config,2),names.arg=colnames(engine_config),
        col=c("green","orange","blue"))
legend("topright",rownames(engine_config),fill=c("green","orange","blue"))

Proportionate Bar Chart

From the proportionate bar chart it is easy to see that most cars with an automatic transmission have eight-cylinder engines (12 of 19, about 63%).

Grouped Bar chart

A grouped bar chart can be created from the same data as a stacked bar chart. In a grouped bar chart the sub-categories within each bar are not stacked; instead, they are placed next to one another.

To get a grouped bar chart, we pass the additional argument beside = TRUE.

barplot(engine_config,names.arg=colnames(engine_config),
        col=c("green","orange","blue"),beside=TRUE)
legend("topright",rownames(engine_config),fill=c("green","orange","blue"))

Grouped Bar Chart
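With beside = TRUE, barplot() returns a matrix of midpoints with one column per group, which lines up with engine_config cell by cell. A sketch, assuming you want the counts printed on top of each bar (the ylim is an illustrative choice that leaves room for the labels):

```r
engine_config <- matrix(table(mtcars$cyl, mtcars$am), nrow = 3, ncol = 2)
rownames(engine_config) <- c("Four Cylinder", "Six Cylinder", "Eight cylinder")
colnames(engine_config) <- c("Auto transmission", "Manual Transmission")

# mids has the same shape as engine_config: 3 rows x 2 columns
mids <- barplot(engine_config, col = c("green", "orange", "blue"),
                beside = TRUE, ylim = c(0, 15))

# Print each count just above its bar
text(x = mids, y = engine_config, labels = engine_config, pos = 3)
legend("topright", rownames(engine_config), fill = c("green", "orange", "blue"))
```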

Difference between Bar chart and Histogram

A histogram looks like a bar chart, but it shows the frequency distribution of a continuous variable: the range of values is divided into intervals (bins), and the height of each bar is the number of observations in that bin.

A bar chart compares the frequency, or total count, of data in several categories, so it is used for categorical or discrete data.

Because a histogram shows the distribution of a continuous variable, there are no gaps between its bars.

A bar chart is used for discrete variables, so there is a gap between each bar.
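To see the difference side by side, a small sketch using two mtcars columns: mpg (miles per gallon) is continuous, so hist() bins it into intervals whose bars touch, while cyl takes only three values, so it is counted with table() and drawn with barplot(), which leaves gaps. par(mfrow = c(1, 2)) simply places the two plots next to each other:

```r
# Two plots side by side; remember the previous settings
old_par <- par(mfrow = c(1, 2))

# Histogram: continuous mpg split into bins, bars touch
h <- hist(mtcars$mpg, col = "khaki",
          main = "Histogram of mpg", xlab = "Miles per gallon")

# Bar chart: discrete cyl counted per category, bars separated by gaps
barplot(table(mtcars$cyl), col = "khaki",
        main = "Bar chart of cyl", xlab = "Engine cylinders",
        ylab = "Number of cars")

par(old_par)  # restore the previous layout

h$breaks  # the bin boundaries hist() chose
```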

Summary

Bar charts are used to compare numeric values across the categories of a categorical variable. They are also used to show the frequency of the observed values of a variable.

In this article we examined various ways of representing a bar chart. A bar chart can be shown as a:

  • Simple bar chart
  • Stacked bar chart
  • Proportionate bar chart
  • Grouped bar chart

These formats were demonstrated using the mtcars data set. Finally, we briefly discussed the difference between a bar chart and a histogram.
