At present, no sector is untouched by the effects of data analytics. From aviation to manufacturing, retail to pharmaceuticals, gaming to education, every sector leverages data analysis to make informed decisions to achieve its goals.

Whether you are part of marketing team that needs to generate visuals to highlight industry trends, or you are want to generate a report showing detail analysis of sales data, you will need an analytics program to help you develop your reports and effectively communicate your findings.

There are many analytics tools available in the market, however, Microsoft Excel is such a tool that continues to be staple for many professionals. Excel is better at 99% of workloads you will typically see in an office environment. However, when it comes to data analysis, all managers, accountants, and analysts have to accept a gospel truth: their analytical needs had outgrown Excel. As the complexity of your analysis grows, Excel’s limitations are becoming more pronounced.

In this article, I want to recommend the use of programming language such as R as an alternative to Excel for both analysing and visualising data. There is no doubt that the learning curve for R is much steeper than producing one or two charts in spreadsheet. However, there are real long-term advantages to learning a dedicated data analysis tools such as R.

In this article, we will look at some of the differences between R and Excel.

This is part 1 of ongoing series on why you should use R. In the upcoming parts we will compare and contrast Excel and R using data analysis workflows such as:

- The Excel Vs R comparison in Data preparation
- The Excel Vs R comparison in Data Visualization
- The Excel Vs R comparison in presentation

Let’s start with a history of R and Excel

**The History of R and Excel**

**The history of Excel**

In 1987, Microsoft introduced Excel v2.0 for Windows. In 1993, Microsoft released Excel v5.0 for Windows which included VBA (Visual Basic for Applications), also known as Macros. This opened up almost unlimited possibilities in automation of repetitive tasks for crunching numbers, process automation, and presenting data for businesses. Today, Microsoft Excel is the most familiar, flexible, and widely used business application in the world due to its capability to adapt to almost any business process.

**What is R and its brief history?**

R is a programming language which makes it easy and flexible to perform comprehensive statistical computing, data exploration, and visualization.

R is not a programming language like C or Java. It was not created by software engineers for software development. Instead, it was developed by statisticians as an interactive environment for data analysis.

R is a dialect of S language. S is a language that was developed by researchers at Bell Laboratories in the 1960s and 1970s. S was developed as an statistical analysis environment, however it was commercialized as S-plus in 1990s.

With the view of embracing open source software, R was created in the early 1990s by University of Auckland statisticians Ross Ihaka and Robert Gentleman. R is now open source and freely available. You can download R from CRAN server – the Comprehensive R Archive Network. CRAN is network of servers around the world on which R .exe files are hosted.

**Difference between R and Excel**

When choosing between R and Excel, it is important to understand how either softwares will get you the results you need. Here are the key difference between R and Excel.

**Learning the software**

One substantial benefit of using Excel is its initial learning curve is quite minimal, and most analysis can be done via point-and-click on the top panel. Once user opens excel file, it is not difficult to perform basic arithmetic tasks and make basic charts.

R is a programming language, so for those who are not programmers, the initial learning curve is steeper. It will take at least a few weeks to familiarize themselves with the interface and master the various functions. Simple things are much harder to do in R than Excel. But it is worth learning R, because complex things can be much easier.

Because R is a programming language, the complex tasks are broken down into small manageable steps making it far more easier to work with complex analysis.

**Replicating the analysis**

“The result is replicable, when the same analysis is performed on different data sets produces qualitatively same results.”

One of the issues with Excel is that there is no transparency of analytical processes. It is extremely difficult to determine what actually is going on in a spreadsheet. Since Excel’s user interface is point-and-click, you’ll need to rely on memory and repetition frequently.

Let’s take a hypothetical example. You are working on sales data, which is Excel file. You import the file in Excel, apply the filters to extract out the data for 3 months, and generated the summary values using Auto-filter function and record the result in new tab. You selected the columns and generated a nice time-series plot to show the sales trend.

That’s all fine!

After six months…

A stakeholder comes and asks you to reproduce entire analysis again, will you able to remember the steps?

You are asked to reproduce the work with new data. The new data is sales data which is piled up for the six months since you have prepared your first analysis. How much re-work is involved?

What happens if you share your excel file to your friend. She makes “few changes in the analysis” and sends it back to you. How will you know the changes?

Let’s use the same scenario with R programming language.

You want to analyse the same sales data. You import the excel file. You extracted out the data for three months using time-series data extraction function. You used `summarise()`

function, which is one of many in-built functions in R to generate the statistical summary. And finally you generated a nice line chart to show the sales trend, which gives you complete control over appearance and design of a plot.

All the steps you performed are documented in code, which means you can explain all those steps you performed while generating the analysis.

When the new data comes in, you don’t have to re-build the entire analysis. You will have to change only ONE line of code – the line where you selected the file to read in.

In Excel, you cannot import codes and scripts as you would with R, so you’ll have to “reinvent the wheel” to perform the same analysis across different data sets. This can be a trivial issue if you are doing basic statistics, but it may become time-consuming with more complicated analyses. To some extent you could write macro in VBA and automate your workflow, however as complexity increases, it becomes monumental task to write and to maintain VBA code.

Imagine writing a VBA script to import multiple Excel files, clean the data, merge them, and create compelling data visualizations. The VBA script could easily become hundreds of lines long. In contrast, using R, you could accomplish the same tasks with just a few lines of code.

**Scalability**

Scalability refers to Excel’s and R’s ability to handle ever-increasing volumes of data. In Excel, as amount of data grows, you will first notice performance issues.The spreadsheet becomes slow to load and then slow to calculate. When Excel file is loaded, the entire file is loaded into RAM. So every time your data changes, Excel has to reload the entire document into RAM, making it slow to process even the smallest changes.

R is also sometimes slow in performance, because it is primarily designed to work in single thread operation. Hence by default, it only uses one core for a given process.

This limitation can be overcome by using several add-on packages in R that support multi-threaded processing. One such package is `data.table`

, which is optimized for working with large data sets and offers significantly faster computation than Excel. R can be easily integrated with external databases such as SQL. So instead of loading all your data into R, consider connecting directly to a database and extracting the data you need.

**Price and Community**

Excel costs at least $70 a year as part of Office 365, which also includes Outlook, Word, PowerPoint, and other software. You can download free add-ins for Excel also, including options to improve visualization, among others. Since Excel is part of Microsoft, you also gain access to their community of forums and technical help professionals.

R, however, is free to download. As R is open-source and open to the public, it has gathered a vibrant community, with many forums, websites, and blogs devoted to sharing resources and tips. If there’s something you want to do in R, someone is out there to help.

**Statistical Analysis**

If you want to run simple statistics and arithmetic quickly, Excel might be a better choice, since its easy point-and-click way to run the statistical functions.

R programming has its roots in statistics, hence can perform rigorous statistical models on your data. If you are looking to do anything beyond statistical analysis, such as regression, clustering or time series analysis, R may be better bet.

**Visualisation**

In Excel, you can quickly highlight a group of cells and make a simple chart for PowerPoint. If you need a more comprehensive graph, however, R can produce incredibly attractive , detailed visuals to generate the data insights. In Excel, it would be nearly impossible to produce anything like correlogram – as shown below. In R, this is produced using one line of code.

**Careers**

Data analytics is most in demand skill across variety of industries. Countless jobs are looking for applicants with at least some Excel experience, but knowing R has higher earning potential and is more in-demand than Excel.

R is one of the most popular programming languages and is an industry-standard for data analytics. If you want to enter the field, there’s a good chance you’ll have a competitive advantage by knowing R. Entry-level jobs for those focusing on R also tend to make a high salary, frequently starting off earning more than $75,000.

There are around 2 million R users globally. Learning R programming will allow you to join a specialist developer community. By acquiring R programming skills, you can differentiate your Analytics skills and stand out from a crowd. As per The PYPL Popularity of Programming Language Index (index is created by analyzing how often language tutorials are searched on Google.)**R ranks in top 10 most searched programming languages on Google.**Source:https://pypl.github.io/PYPL.html

**Summary**

R and Excel are beneficial in different ways. Excel is easier to learn and go-to program for reporting and basic analysis tasks. R is designed to handle more complex analysis tasks, create compelling visualizations and perform advance statistical models such as regression and clustering. In the end, it really depends on your analysis requirement to choose the right kind of tool.