Most business managers use the terms Data Analytics and Business Analytics interchangeably. However, there is a subtle difference between these terms, and the subtleties are based on the end usage of the...
The first step in any kind of data analysis in R is to load the data, that is, to import a dataset into the R environment. There are a variety of data files you can import into R, which include: Among...
When you wanted to import a text file into R, you must have used the read.table() function. But did you know that you can pass more than 20 different arguments to read.table()? You must have...
When new data is presented to you for analysis, you would like to get first-hand information on the data set before diving deep into the analysis. This is similar to doing warm-up exercises...
When I speak to beginner data analysts, I hear a few misbeliefs about data mining tools. I am writing this article to debunk some data mining fallacies. Here is the list of six Data Mining...
In my last article we learned how to estimate the population mean by using the sample mean when the population standard deviation is known. In most instances, the population standard deviation will be unknown and...
What is point estimation and what is its fundamental drawback? The use of a single sample value such as X_bar (the sample mean) to estimate the population value is known as point estimation, because...
In this article we will have a high-level overview of the process of data science. We will look at the different stages of data science work. The process of solving a data science problem is summarized in...
Many statistical tests assume that the data is normally distributed. Hence, if the underlying data is not normal, we need to transform the data to make it nearly normal before we apply these tests. The...
The basic idea of inferential statistics is to use a statistic (mean, standard deviation, etc.) calculated on a sample in order to estimate a parameter of a population (mean, standard deviation, etc.)...
In data analysis, we may obtain greater insight by expressing a variable in a different form. For example, you could use a different scale to better visualize variables whose points lie close together. Many...
There is a range of techniques, called normality tests, that you can use to check whether your data sample deviates from a Gaussian distribution. In this tutorial, we will learn some techniques that can be...
A gentle introduction to Sampling terms and definitions – A pre-requisite to inferential statistics.
In this tutorial we will get comfortable with some of the commonly used terms from the field of sampling theory. Clarity on these terms is required to understand the inferential statistical...
In this tutorial, we will have a gentle introduction to the normal distribution with a real-world example. We will generate a normal distribution plot in R and learn some R functions to calculate the...
In this tutorial we will look at the characteristics of the Poisson distribution, build the Poisson distribution formula, and look at some R functions to calculate the probability of occurrence using Poisson...
In this tutorial, we will understand the assumptions of the binomial distribution, take a business example of the binomial distribution, build the binomial distribution formula, and use R to solve the problem...
In this tutorial we will look at two measures of the relationship between two numeric variables: the covariance and the coefficient of correlation...
The more commonly used dispersion measures in statistics are the variance and the standard deviation. These measures are summary statistics and hence do not tell much about the overall data. A five-number summary...
Descriptive statistics is used to describe the data. A first step in this process is to check the distribution of values of each numeric variable. In this tutorial, various tools for describing the...
In this post you will discover why statistics is important in general and for data science in particular, and the types of methods that are available...
R packages are extensions to the R programming language. R packages contain code, data, and documentation in a standardized collection format that can be installed by users of R, typically via a...
Machine learning, knowledge discovery from data and related areas experienced strong development in the 1990s. Both in academia and industry, the research on these topics was advancing quickly...
In R, a list is an ordered collection of objects, like a vector, but a list can combine objects of different types. List elements can contain any type of object that exists in R...
Think back to your first chemistry or biology lab course. As you entered the lab, what was the first thing you were taught? It was not chemistry or biology. For most of us, the first instruction in...
The factor data type is used to represent character data that takes only a small number of distinct values. Each distinct value is represented by an integer code, which is called...
A data frame represents data with a number of rows and columns. Unlike a matrix, a data frame can contain variables with different data types; data frames are therefore heterogeneous...
In R, a matrix is a vector with two additional attributes: the number of rows and the number of columns. Since vectors are the building blocks of matrices, matrices, like vectors, are also constrained to...
The first step in learning R programming is getting familiar with basic R objects and their structure. The fundamental data object in R is a vector. In this article we will define the R objects...
This is the second post in the “Getting Started with R Programming” series. In the previous post, we discussed the processes for getting the R programme from the CRAN website. We also...
In this blog we will install R (for Windows and Mac OS) and take a quick tour of the R environment...
R, as a programming language, has been evolving and developing over the last 20 years. Its goal is quite clear: to make it easy and flexible to perform comprehensive statistical computing, data...
It is possible to do a lot of data work using Excel, Tableau, or any other Business Intelligence tools that have graphical interfaces. These BI tools are known for taking any kind of data from almost...
Data mining is used to search for valuable information in the vast amounts of data collected over time. The information may be certain patterns or relationships that exist within the data. Businesses use...
Data science is the practice of using data to try to understand and solve real-world problems. If you’ve looked into the different areas of data science, you may be familiar with Drew Conway’s popular...
“Without data, you’re just another person with an opinion.” – W. Edwards Deming, noted statistician, professor, author, and lecturer. “The sexiest job of the 21st century.” Data scientist, a title...