Introduction to the tidyr Package

Many real-world data sets contain missing values. There can be a number of reasons for this, such as:
 - The data values were not recorded, perhaps because of an equipment malfunction.
 - The data values are not available. For example, in a survey some questions are not mandatory, so some respondents leave them unanswered.
 - The data analyst decides to drop a few outlier values to reduce bias in the analysis.

Whatever the reason, as a data analyst it is your job to deal with these missing values. Fortunately, the tidyr package has several functions built for exactly this task.
In this video, we will look at functions for handling missing values, most of them from the tidyr package:
 - complete.cases() comes from base R (the stats package) rather than tidyr. It scans all the variables in a data frame and returns a logical vector that is TRUE for rows with no missing values, which can then be used to drop the incomplete rows.

 - drop_na() scans the columns you specify (or every column, if none is specified) and drops all the rows that have a missing value in them.

 - replace_na() replaces missing values with a value that you supply.

 - fill() imputes missing values in a column with the most recent previous or next non-missing value.
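
The four functions above can be sketched on a small example. This is only an illustration: the `readings` data frame and its columns are made up for the purpose and are not part of the flights data set used in the video.

```r
library(tidyr)

# Hypothetical toy data with gaps (an assumption, not the flights data).
readings <- data.frame(
  day    = 1:5,
  sensor = c("A", NA, NA, "B", NA),
  temp   = c(21.5, NA, 19.0, NA, 22.1),
  stringsAsFactors = FALSE
)

# complete.cases() (base R) returns TRUE for rows with no NA in any column;
# indexing with it keeps only the complete rows.
complete_rows <- readings[complete.cases(readings), ]

# drop_na() with no columns named checks every column.
no_na_any <- drop_na(readings)

# drop_na() with a column named checks only that column.
no_na_temp <- drop_na(readings, temp)

# replace_na() on a data frame takes a named list: one replacement per column.
replaced <- replace_na(readings, list(sensor = "unknown", temp = 0))

# fill() carries the previous non-missing value forward by default ("down");
# .direction = "up" uses the next non-missing value instead.
filled_down <- fill(readings, sensor)
filled_up   <- fill(readings, sensor, .direction = "up")
```

Note that filling upwards leaves the last row of `sensor` missing, because there is no later value to copy from. Whether dropping, replacing, or filling is appropriate depends on why the values are missing in the first place.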

We will work with the flights data set.

The R Markdown file is available in the GitHub repo:
https://github.com/siddharth-sahasrabudhe/tidyR-Package-videos/commit/7d0ff4ef5f225c0d9ade3be4be0448ba727d052e


tidyR Package| How to handle missing values?

Plotly Analytics – Giving Life to Data November 28, 2023 4:45 pm