A brief introduction to Linear Transformation

Indian Summer Painting. Image Source: The Metropolitan Museum of Art

In data analysis, we can often gain greater insight by expressing a variable in a different form. For example, you could use a different scale to better visualize a variable whose points lie close together.

Many statistical tests assume that the data are normally distributed. Hence, if the underlying data are not normal, we need to transform them to make them approximately normal before applying these tests.

Sometimes two variables in a data set are not directly comparable because they are recorded in different units. For example, a person's height can be measured in inches; the same height can, without loss of information, be measured in meters or feet instead. We must therefore transform such variables to a common scale before proceeding with the analysis.

A transformation is a mathematical expression that defines a one-to-one correspondence between two numeric systems. For example, a height in inches can be transformed into centimeters using the rule cm = 2.54*inches (since 1 inch = 2.54 cm). Here every point on the centimeter scale is matched to one and only one number on the inch scale.
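As a small sketch of this one-to-one correspondence, the two helper functions below convert inches to centimeters and back. The function names and the sample heights are assumptions made for illustration; only the factor of 2.54 comes from the text. Applying one function after the other should recover the original values.

```r
# Hypothetical helpers for illustration; 2.54 cm per inch is the rule from the text.
inch_to_cm <- function(inch) inch * 2.54
cm_to_inch <- function(cm) cm / 2.54

heights_inch <- c(68, 70, 73, 81)  # made-up example heights
heights_cm <- inch_to_cm(heights_inch)

# Converting back recovers the original values (up to floating-point rounding),
# so each centimeter value corresponds to exactly one inch value.
round_trip <- cm_to_inch(heights_cm)
isTRUE(all.equal(round_trip, heights_inch))
```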

There are two types of transformation: linear and nonlinear. A linear transformation preserves the shape of the original distribution, while a nonlinear transformation changes it.

In this tutorial we will discuss the linear transformation.

Linear transformation

Linear transformations are mathematical expressions that use only addition, subtraction, multiplication, or division by constants to set up the correspondence between two numeric systems. For example, suppose we want to compare the heights of players from the US and India. The original numbers would not be comparable because the two countries use different measurement systems (in the US, height is measured in inches or feet; in India, in centimeters or meters). To make the values compatible, we multiply the heights in inches by 2.54, since there are 2.54 centimeters per inch.

library(tidyverse)

The data frame player_height, built below, holds the heights in inches and in centimeters:

set.seed(123)
player_height <- data.frame(Players = sprintf("Player_%01d", 1:20),
                            Height_inch = sample(seq(from = 68, to = 83, by = 1)
                                                 , size = 20, replace = TRUE))

player_height <- player_height %>% 
  mutate(Height_cm = Height_inch*2.54)

player_height

Because the rule for converting inches into centimeters multiplies each value in the distribution by a positive constant, it is an example of a linear transformation. Notice that the order of the data points is retained under this transformation: Player_8 is taller than Player_7 regardless of whether height is measured in centimeters or in inches.
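One way to check that the ordering survives the conversion is to compare the ranks of the values before and after. The sketch below uses a few made-up heights rather than the player_height data frame, and also shows, as an aside, that a negative multiplier would reverse the order instead.

```r
# Made-up heights in inches, for illustration only.
height_inch <- c(70, 81, 73, 68, 77)
height_cm <- height_inch * 2.54

# rank() gives each value's position in sorted order;
# identical ranks mean the ordering of the players is unchanged.
same_order <- identical(rank(height_inch), rank(height_cm))
same_order

# With a negative multiplier the order is reversed: the tallest becomes the smallest.
n <- length(height_inch)
reversed_order <- isTRUE(all.equal(rank(-2.54 * height_inch), n + 1 - rank(height_inch)))
reversed_order
```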

A linear transformation may involve multiplication, addition, or a combination of the two. The general form of a linear transformation is given below:

\[X_{new} = K*X_{old}+C\].

where K and C are constants and K is not equal to 0.
$K$ represents both multiplication and division. For example, if $K=1/2$, multiplication by $K$ is equivalent to division by 2. Similarly, $C$ represents both addition and subtraction. For example, if $C=-4$, adding $C$ units is equivalent to subtracting 4 units.
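The general form can be written as a small R function. This is only a sketch: linear_transform is a hypothetical name, not a function from base R or the tidyverse, and the input values are made up.

```r
# Hypothetical helper implementing X_new = K * X_old + C.
linear_transform <- function(x, K, C = 0) {
  stopifnot(K != 0)  # K = 0 would collapse every value to C
  K * x + C
}

x <- c(2, 4, 6, 8)

halved  <- linear_transform(x, K = 1/2)        # same as x / 2
shifted <- linear_transform(x, K = 1, C = -4)  # same as x - 4
both    <- linear_transform(x, K = 2, C = 10)  # 2 * x + 10
```

Setting K = 1 gives a purely additive transformation, and leaving C at 0 gives a purely multiplicative one.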

K changes the scale of the original values:

K>1 or K<-1: stretching the horizontal axis
-1<K<1 (K not equal to 0): compressing the horizontal axis

C translates the location of original values:

C>0: the original values are shifted to the right (in the positive direction) along the horizontal axis.
C<0: the original values are shifted to the left (in the negative direction) along the horizontal axis.
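A quick way to see these two effects separately is to look at the width of the data (largest value minus smallest value) and at the position of the smallest value. The values and the width helper below are assumptions made for this sketch.

```r
x <- c(1, 3, 4, 7, 10)  # made-up values

# Distance between the smallest and largest value.
width <- function(v) diff(range(v))

w_original   <- width(x)        # 9
w_stretched  <- width(3 * x)    # K = 3 stretches the width by a factor of 3
w_compressed <- width(0.5 * x)  # K = 0.5 compresses it to half
w_shifted    <- width(x + 10)   # C = 10 leaves the width unchanged ...
shift        <- min(x + 10) - min(x)  # ... and moves every value 10 units to the right
```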

The example below illustrates a linear transformation with K=2 and C=10.

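set.seed(123)
# 100 draws (with replacement) from the integers 1 to 10
X <- sample(1:10,100,replace=TRUE)
# Show the two histograms side by side
par(mfrow = c(1, 2))

hist(X,xlim = c(0,40),breaks = c(0:40),main = "Original data")

# K = 2 doubles the spacing between values; C = 10 shifts them to the right
hist(2*X+10,xlim = c(0,40),breaks = c(0:40),main="Transformed data")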

Notice that the scale of the new values is doubled (each one-unit increase in $X$ gives a two-unit increase in $2X+10$) and that the values are shifted in the positive direction along the horizontal axis: the smallest possible value moves from 1 to 12.

Effect of linear transformation on the shape of distribution

A linear transformation also retains the relative distances between points in the distribution. That is, if one pair of points is, say, twice as far apart as another pair in the original data, it is still twice as far apart in the transformed data.

In the data frame player_height, Player_5's height (70 inches) is closer to Player_8's height (73 inches) than to Player_4's height (81 inches), and to the same extent whether measured in inches or in centimeters.

Because relative distances do not change under a linear transformation, neither does the general shape of the distribution.
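The claim about relative distance can be checked with the three heights quoted above. The sketch below stores them in a small named vector (rather than reading them from the data frame) and compares the ratio of the two distances in both units; rel_dist is a hypothetical helper written for this example.

```r
# Heights quoted in the text for three players, in inches.
height_inch <- c(Player_4 = 81, Player_5 = 70, Player_8 = 73)
height_cm <- height_inch * 2.54

# Ratio of the Player_5 to Player_8 distance over the Player_5 to Player_4 distance.
rel_dist <- function(h) {
  abs(h[["Player_5"]] - h[["Player_8"]]) / abs(h[["Player_5"]] - h[["Player_4"]])
}

ratio_inch <- rel_dist(height_inch)  # 3 / 11
ratio_cm   <- rel_dist(height_cm)    # the same ratio: the factor 2.54 cancels
```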

Let’s create a box-plot for heights in inches and centimeters.

boxplot(player_height$Height_cm , player_height$Height_inch ,
names=c("height cm","height inch"))

**Effect of linear transformation on shape of distribution**

Clearly, the two box plots have the same shape. Transforming the data linearly from inches to centimeters leaves the general shape of the distribution unchanged, although the central tendency and spread can change, as they have in this example.

Effect of linear transformation on summary statistics of a distribution

lapply(player_height[-1], function(x) c(std_dev = sd(x), avg = mean(x)))

$Height_inch
  std_dev       avg 
 4.671019 75.650000 

$Height_cm
  std_dev       avg 
 11.86439 192.15100

Notice the difference in central tendency and spread between the two box plots. The mean height in centimeters (192.15) equals the mean height in inches (75.65) multiplied by 2.54, and the standard deviation of the heights in centimeters (11.86) equals the standard deviation of the heights in inches (4.67) multiplied by 2.54. In other words, both the mean and the standard deviation are multiplied by 2.54, the value by which the heights themselves were multiplied.

While the new standard deviation is larger than the original standard deviation by a factor of 2.54, the new variance is larger than the original variance by a factor of 2.54^2, which is approximately 6.45.
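A short derivation shows why these factors appear. It assumes the usual sample mean and the sample variance with an n-1 denominator (the one R's sd() and var() use), and writes $X_i$ for the original values, so that each new value is $K*X_i$:

\[\bar{X}_{new} = \frac{1}{n}\sum_{i=1}^{n} K X_i = K\,\bar{X}_{old}\]

\[S^2_{new} = \frac{1}{n-1}\sum_{i=1}^{n}\left(K X_i - K\bar{X}_{old}\right)^2 = K^2 S^2_{old}, \qquad S_{new} = |K|\,S_{old}\]

With K = 2.54 this gives 2.54*75.65 = 192.151 for the mean and 2.54*4.671019, roughly 11.86439, for the standard deviation, matching the output above.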

The effects on the measures of central tendency, spread, and shape of a distribution when all values are multiplied by a nonzero constant K, that is, when

\[X_{new} = K*X_{old}\]

are summarized as follows.

For measures of central tendency:

Mode_new = K*Mode_old
Median_new = K*Median_old
Mean_new = K*Mean_old

Measures of spread (these can never be negative, so they scale with the absolute value of K):

IQR_new = |K|*IQR_old
SD_new = |K|*SD_old
Range_new = |K|*Range_old
Variance_new = K^2*Variance_old

Shape of a distribution depends on sign of K:

If K is positive: Skewness_new = Skewness_old
If K is negative: Skewness_new = (-1)*Skewness_old
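These rules can be verified numerically. The sketch below uses made-up, right-skewed values and a negative K so that the sign effects are visible: measures of spread scale with the absolute value of K, the variance with K squared, and the skewness changes sign. Base R has no skewness function, so a simple moment-based version is written out here as an assumption; the mode is left out because base R has no function for it either.

```r
x <- c(1, 2, 2, 3, 5, 9)  # made-up, right-skewed values
K <- -2                   # negative on purpose
y <- K * x

# Moment-based skewness, defined here for illustration.
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

checks <- c(
  mean     = isTRUE(all.equal(mean(y),   K * mean(x))),
  median   = isTRUE(all.equal(median(y), K * median(x))),
  sd       = isTRUE(all.equal(sd(y),     abs(K) * sd(x))),
  iqr      = isTRUE(all.equal(IQR(y),    abs(K) * IQR(x))),
  range    = isTRUE(all.equal(diff(range(y)), abs(K) * diff(range(x)))),
  variance = isTRUE(all.equal(var(y),    K^2 * var(x))),
  skewness = isTRUE(all.equal(skewness(y), -skewness(x)))
)
checks  # every entry should be TRUE
```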

The following box plots give an example of an additive transformation. We create side-by-side box plots of the original data, the original data + 10, and the original data - 10. The central tendency of each distribution is shown by the median line of its box plot. Here, the operation performed on every score is also performed on the median: the median of the original, untransformed data is 6; after adding 10 to every point, the median of the second box plot is 16; and the median of the third box plot from the left is -4, or 10 less than that of the untransformed variable. The spread is given by the IQR of the distribution, and we see that adding or subtracting a constant does not change it. Similarly, the shape of the distribution is unchanged by the transformation.

set.seed(123)
X <- sample(1:10,100,replace=TRUE)
boxplot(X,X+10,X-10,
        names=c("original data","plus 10","minus 10"))

The effect of an additive transformation on summary statistics is described below:

\[X_{new} = C + X_{old}\]

For measures of central tendency:

Mode_new = Mode_old + C
Median_new = Median_old + C
Mean_new = Mean_old + C

Measures of spread:

IQR_new = IQR_old
SD_new = SD_old
Range_new = Range_old
Variance_new = Variance_old

Shape of a distribution (unchanged for any value of C):

Skewness_new = Skewness_old
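The additive rules can be checked in the same spirit. The values and the constant below are made up for this sketch: adding C moves the mean and median by C and leaves every measure of spread where it was.

```r
x <- c(1, 2, 2, 3, 5, 9)  # made-up values
C <- 10
y <- x + C

mean_shift   <- mean(y) - mean(x)      # equals C
median_shift <- median(y) - median(x)  # equals C

# Spread is unchanged by adding a constant.
spread_unchanged <- c(
  sd    = isTRUE(all.equal(sd(y), sd(x))),
  iqr   = isTRUE(all.equal(IQR(y), IQR(x))),
  range = isTRUE(all.equal(diff(range(y)), diff(range(x)))),
  var   = isTRUE(all.equal(var(y), var(x)))
)
spread_unchanged
```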

Z-score: the most Common Linear transformation

The most widely used transformation for making data measured in different units comparable is the z-score transformation. A z-score measures the number of standard deviations a data value lies from the mean of its distribution. A z-score distribution has a mean of 0 and a standard deviation of 1. Any data distribution may be re-expressed as z-scores using the following equation:

\[Z = \frac{X-\bar{X}}{S}\]

X-bar is the mean of the distribution
X is an individual observation
S is the standard deviation of the distribution

The z-transformation is calculated as:

sapply(player_height[-1], function(x) (x-mean(x))/sd(x))

Thus, by transforming the players' heights in inches and centimeters into z-scores, we learn where each player's height sits in the distribution relative to its mean and standard deviation. Because conversion to z-scores is a linear transformation, the order of the players' heights in the original distribution is preserved in the z-distribution: if Player_6 is taller than Player_5 in the original distribution, the same holds in the z-distribution. The shape of the z-score distribution is the same as that of the original distribution, except for possible shrinking or stretching along the X-axis.

You can also observe that the z-scores for both variables are the same, since one is simply a linear transformation of the other with a positive multiplier.
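In terms of the general form, the z-score is a linear transformation with K = 1/S and C = -X-bar/S. The sketch below checks the stated properties on a few made-up heights (not the player_height data): the z-scores have mean 0 and standard deviation 1, and they come out the same in inches and in centimeters. z_score is a hypothetical helper; scale() is the base R function that performs the same centering and scaling by default.

```r
height_inch <- c(68, 70, 73, 77, 81)  # made-up heights
height_cm <- height_inch * 2.54

# Hypothetical helper: subtract the mean, divide by the standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

z_inch <- z_score(height_inch)
z_cm   <- z_score(height_cm)

mean(z_inch)             # 0, up to floating-point rounding
sd(z_inch)               # 1
all.equal(z_inch, z_cm)  # the unit of measurement no longer matters

# scale() returns a one-column matrix, so it is flattened for comparison.
z_scale <- as.numeric(scale(height_inch))
all.equal(z_inch, z_scale)
```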

Summary

In this article, we saw examples of linear transformations. We learned:

A linear transformation involves multiplying each data value by a constant, adding a constant to it, or both.
The effect of a linear transformation on the shape of the data.
The effect of a linear transformation on the summary statistics of the data.
The z-score, the most common type of linear transformation.
