Statistics and Probability
In my last article, we learned how to estimate the population mean from the sample mean when the population standard deviation is known. In most instances, the population standard deviation will be unknown and...
What is point estimation, and what is its fundamental drawback? The use of a single sample value such as X_bar (the sample mean) to estimate a population value is known as point estimation, because...
Many statistical tests assume that the data is normally distributed. Hence, if the underlying data is not normal, we need to transform the data to make it approximately normal before we apply these tests. The...
The basic idea of inferential statistics is to use a statistic (mean, standard deviation, etc.) calculated on a sample in order to estimate a parameter of a population (mean, standard deviation, etc.)...
In data analysis, we may gain greater insight by expressing a variable in a different form. For example, you could use a different scale to better visualize variables whose points lie close together. Many...
There is a range of techniques, called normality tests, that you can use to check whether your data sample deviates from a Gaussian distribution. In this tutorial, we will learn some techniques that can be...
A gentle introduction to sampling terms and definitions – a prerequisite to inferential statistics.
In this tutorial, we will get comfortable with some of the commonly used terms from the field of sampling theory. Clarity on these terms is required to understand the inferential statistical...
In this tutorial, we will have a gentle introduction to the normal distribution with a real-world example. We will generate a normal distribution plot in R and learn some R functions to calculate the...
In this tutorial, we will look at the characteristics of the Poisson distribution, build the Poisson distribution formula, and look at some R functions to calculate the probability of occurrence using Poisson...
In this tutorial, we will understand the assumptions of the binomial distribution, take a business example of the binomial distribution, build the binomial distribution formula, and use R to solve the problem...
In this tutorial, we will look at two measures of the relationship between two numeric variables: the covariance and the coefficient of correlation...
The more commonly used dispersion measures in statistics are the variance and the standard deviation. These measures are summary statistics and hence do not tell us much about the overall data. A five-number summary...
Descriptive statistics is used to describe the data. A first step in this process is to check the distribution of values of each numeric variable. In this tutorial, various tools for describing the...
In this post, you will discover why statistics is important in general and for data science in particular, and the types of methods that are available...