# Studying The Iterative Principal Axis Transformation algorithm and its correctness according to X^2-test proposed by Rippe D.D. using R program

Bachelor Thesis 2009 28 Pages

## Excerpt

## Table of Contents

1 What is Factor Analysis?

2 Why we use Factor analysis?

3 History of Factor Analysis

4 Uses in psychology

5 Factor analyzing in marketing

6 Factor analyzing and Physical science

7 Mathematical definition

8 Representation of the Random Vector

9 Covariance of the Random Vector

10 The task of Factor analysis

11 Simulation

12 Algorithm

13 Testing

14 Results

15 Conclusion

16 Code

## 1 What is Factor Analysis?

Factor analysis is a statistical method for examining the relationships among a large number of variables in order to find a smaller number of factors that explain those relationships.

## 2 Why we use Factor analysis?

Factor analysis began in psychometrics, a field concerned with psychological measurement, including the measurement of knowledge, personality, and emotions. Later it came to be used mostly in the social sciences, product management, and marketing. Factor analysis is worth considering whenever we face a large amount of data and need to find similarities within it.

## 3 History of Factor Analysis

About a hundred years ago,^{1} Charles Spearman introduced his ideas about a general factor of intelligence. Since then, factor analysis has become a vital statistical method for scientific investigation and for finding the principles by which components interact. Many scientists have worked on factor analysis since its origin in psychometrics.^{2}

After Spearman, Raymond Cattell developed and expanded Spearman’s theory of intelligence, using his own tests together with factor analysis.

## 4 Uses in psychology

In psychology, factor analysis is mostly associated with intelligence research. The factors it identifies reveal underlying similarities and correlations among different patterns. In the study of human personality, for example, it would otherwise be very difficult to determine how many of the words or phrases a person uses reflect that person’s character.

## 5 Factor analyzing in marketing

In marketing, factor analysis is used to understand which factors influence the purchase of a product. It focuses on how the different variables of a product affect the customer’s decision to buy it.

## 6 Factor analyzing and Physical science

Factor analysis has been used for several years in physical sciences such as ecology and geochemistry. In water quality management, for example, several chemical variables determine the quality of the water, and factor analysis makes it possible to see how these variables are distributed.

The same applies in geochemistry, where, for example, different factors correspond to the distribution of different mineral associations.^{3}

## 7 Mathematical definition

Factor analysis tries to represent a random vector $X \in \mathbb{R}^d$ in the form $X = LY + Z$, in which $Y$ is an $f$-dimensional vector ($f < d$) of independent “factors” with variance 1 and $Z$ is a vector of independent “errors”, which are also independent of the factors.

The variances $D$ of $Z$ are called individual residual variances (“uniquenesses”), and the $(d \times f)$ matrix $L$ is called the “loading matrix”.

## 8 Representation of the Random Vector

The random vector can also be written out in matrix form. Here is a simple presentation:

$$
\begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix}
=
\begin{pmatrix} l_{11} & \cdots & l_{1f} \\ \vdots & \ddots & \vdots \\ l_{d1} & \cdots & l_{df} \end{pmatrix}
\begin{pmatrix} Y_1 \\ \vdots \\ Y_f \end{pmatrix}
+
\begin{pmatrix} Z_1 \\ \vdots \\ Z_d \end{pmatrix}
$$

As you can see, the loading matrix is a $(d \times f)$ matrix. Keep in mind that the loading matrix is constant, meaning that all of its elements are constants, in contrast to the vectors X, Y and Z, which are random.

## 9 Covariance of the Random Vector

First, some theoretical considerations. Two rules for calculating covariances are important in our case:

[Figure not included in this excerpt]
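The figure with the two rules is not reproduced in this excerpt. Presumably they are the two standard rules that the calculation of *Cov(X)* relies on. They are stated here as an assumption, for a constant matrix $A$ and independent random vectors $U$ and $V$ of the same dimension (symbols chosen for this illustration):

$$
\operatorname{Cov}(AU) = A\,\operatorname{Cov}(U)\,A^T,
\qquad
\operatorname{Cov}(U + V) = \operatorname{Cov}(U) + \operatorname{Cov}(V).
$$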

Using these rules, we can calculate the covariance of the random vector X:

$$
\operatorname{Cov}(X) = \operatorname{Cov}(LY + Z) = L\,\operatorname{Cov}(Y)\,L^T + \operatorname{Cov}(Z)
$$

The elements of the vector $Z$ are independent, so *Cov(Z)* is a diagonal matrix:

$$
\operatorname{Cov}(Z) = \begin{pmatrix} D_1 & & 0 \\ & \ddots & \\ 0 & & D_d \end{pmatrix}
$$

*Cov(Y)* is also a diagonal matrix because, as stated before, “Y is an f-dimensional vector *(f < d)* of independent factors with variance 1”, so it is the identity matrix:

$$
\operatorname{Cov}(Y) = I_f
$$

Together these lead to the representation $\Sigma = LL^T + D$, in which $\Sigma$ is the covariance matrix of *X* and *D* is the diagonal matrix whose entries are the residual variances.
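This representation can also be checked numerically. The following R sketch simulates a sample from the factor model and compares its sample covariance with $LL^T + D$. The values of the loading matrix `L`, the residual variances `D` and the sample size `n` are made up, and the normal distributions are an assumption of the sketch, not a claim from the text.

```r
# Sketch: compare the sample covariance of simulated data with L L^T + D.
# L, D, n and the normal distributions are illustrative assumptions.
set.seed(42)
d <- 3; f <- 1; n <- 100000
L <- matrix(c(0.8, 0.6, 0.4), nrow = d, ncol = f)  # loading matrix (d x f)
D <- c(0.36, 0.64, 0.84)                            # residual variances

Y <- matrix(rnorm(n * f), nrow = n)                     # factors, variance 1
Z <- matrix(rnorm(n * d), nrow = n) %*% diag(sqrt(D))   # independent errors
X <- Y %*% t(L) + Z                                     # one observation per row

Sigma_model <- L %*% t(L) + diag(D)   # covariance implied by the model
Sigma_hat   <- cov(X)                 # covariance estimated from the sample
max_diff <- max(abs(Sigma_hat - Sigma_model))
print(max_diff)                       # shrinks as n grows
```

With a sample this large the two matrices should agree closely. The remaining difference is sampling error.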

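It is also important to know what the covariance matrix *Cov(X)* looks like. In our case *Cov(X)* is a (*d × d*) matrix with the following elements:

$$
\operatorname{Cov}(X) =
\begin{pmatrix}
\operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_d) \\
\operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_d) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_d, X_1) & \operatorname{Cov}(X_d, X_2) & \cdots & \operatorname{Var}(X_d)
\end{pmatrix}
$$

As you can see, the elements on the main diagonal are the variances of the elements of $X$.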

In our simulation we use an estimate of the covariance matrix of the random vector, calculated from a random sample. Knowing how this estimate is calculated is useful for understanding the factor analysis process.

The formula for estimating the covariance of two real-valued random variables, such as two components $X_j$ and $X_k$ of $X$, is shown below.

[Figure not included in this excerpt]
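The formula itself is not reproduced in this excerpt. A common form of the estimate is given here as an assumption, for $n$ observations $x_{1j}, \dots, x_{nj}$ and $x_{1k}, \dots, x_{nk}$ of the two variables. The divisor $n - 1$ in particular is assumed; the original may have used $n$.

$$
\hat{\sigma}_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),
\qquad
\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}.
$$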

Keep in mind that the covariance calculated with this formula is an estimate, which is why we write $\hat{\Sigma}$ in our calculations: the hat marks an estimate of the covariance.

In our case, however, the covariance is calculated by computer using R. Details about calculating the covariance with R are explained later in this paper.
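As a minimal sketch of this step, assume the sample is stored as a matrix with one observation per row. The built-in function `cov` then returns the estimated covariance matrix. The data below are random placeholders, not data from the thesis, and the comparison with a hand-written estimate is only there to show what `cov` computes (it divides by $n - 1$).

```r
# Placeholder sample: 200 observations of a 3-dimensional vector
set.seed(2)
X <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)

# Estimated covariance matrix (d x d); cov() divides by n - 1
Sigma_hat <- cov(X)

# The same estimate by hand: centre each column, then cross-multiply
Xc <- sweep(X, 2, colMeans(X))
Sigma_manual <- t(Xc) %*% Xc / (nrow(X) - 1)

same <- isTRUE(all.equal(Sigma_hat, Sigma_manual))
print(same)
```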

## 10 The task of Factor analysis

The task of factor analysis is now to estimate the parameters $L$ and $D$ from a sample. A whole range of techniques is available for this purpose, including the Iterative Principal Axis Transformation. This algorithm consists of five steps:

1. A starting value for $D$ is assigned.

2. The $f$ largest eigenvalues $\lambda_1 \ge \dots \ge \lambda_f$ of $\hat{\Sigma} - D$ and the corresponding normalized right eigenvectors $v_1, \dots, v_f$ are determined (i.e. $\|v_i\| = 1$).

3. Let $L = (\sqrt{\lambda_1}\, v_1, \dots, \sqrt{\lambda_f}\, v_f)$.

4. The diagonal of $\hat{\Sigma} - LL^T$ is taken as the new estimate of $D$.

5. If the new estimate differs from the old one by less than a predetermined tolerance limit, the solution is found; otherwise the algorithm returns to step 2.

The Iterative Principal Axis Transformation is executed on a computer using the statistical program R.
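The five steps can be sketched in R as follows. This is an illustration under stated assumptions, not the thesis code: the function name `ipat`, the arguments `tol` and `max_iter`, the starting value $D = 0$, the maximum-absolute-change stopping rule and the clipping of negative eigenvalues are all choices made for this sketch.

```r
# Sketch of the iterative principal axis transformation.
# `ipat`, `tol`, `max_iter` and the starting value D = 0 are assumptions.
ipat <- function(Sigma_hat, f, tol = 1e-6, max_iter = 1000) {
  d <- nrow(Sigma_hat)
  D <- rep(0, d)                                  # step 1: starting value
  for (iter in seq_len(max_iter)) {
    # step 2: eigenvalues (decreasing) and unit eigenvectors of Sigma_hat - D
    e <- eigen(Sigma_hat - diag(D, nrow = d), symmetric = TRUE)
    lambda <- pmax(e$values[seq_len(f)], 0)       # clip negatives before sqrt
    V <- e$vectors[, seq_len(f), drop = FALSE]
    # step 3: scale each eigenvector by the root of its eigenvalue
    L <- V %*% diag(sqrt(lambda), nrow = f)
    # step 4: new estimate of the residual variances
    D_new <- diag(Sigma_hat - L %*% t(L))
    # step 5: stop once the estimate changes by less than tol
    converged <- max(abs(D_new - D)) < tol
    D <- D_new
    if (converged) break
  }
  list(L = L, D = D, iterations = iter, converged = converged)
}

# Usage on a covariance matrix with exact one-factor structure (made-up values)
L0 <- matrix(c(0.8, 0.6, 0.4), ncol = 1)
Sigma <- L0 %*% t(L0) + diag(c(0.36, 0.64, 0.84))
fit <- ipat(Sigma, f = 1)
print(fit$iterations)
print(round(fit$D, 3))
```

The returned `converged` flag shows whether the tolerance was reached before `max_iter`. Other starting values, for example ones based on the inverse of the covariance matrix, are also common in practice.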

Some points of this algorithm may need explanation. As an example, consider $\hat{\Sigma} - D$ as a square $(d \times d)$ matrix $A$.

$$
A v_i = \lambda_i v_i
$$

Above, $v_i$ denotes an eigenvector and $\lambda_i$ the corresponding eigenvalue of $A$. In our case the eigenvalues and eigenvectors are calculated in *R*: the command `e = eigen()` computes the decomposition, and `e$values` and `e$vectors` then return the eigenvalues and eigenvectors, respectively.
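A small example shows how `eigen` behaves in R. The matrix `A` below is made up for the illustration and is not taken from the thesis.

```r
# Made-up symmetric 2 x 2 matrix
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)
print(e$values)    # eigenvalues in decreasing order: 3 and 1
print(e$vectors)   # columns are the eigenvectors, each of length 1

# Check the defining relation A v = lambda v for the first pair
v1 <- e$vectors[, 1]
ok <- isTRUE(all.equal(as.vector(A %*% v1), e$values[1] * v1))
print(ok)
```

For a symmetric matrix, `eigen` returns the eigenvalues sorted in decreasing order, so the first `f` entries of `e$values` and the first `f` columns of `e$vectors` are the ones the algorithm needs.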

Our eigenvalues and eigenvectors may look like this:

[Figure not included in this excerpt]

Once the eigenvalues and eigenvectors are available, the computer can calculate $L$ according to the formula given in step 3, $L = (\sqrt{\lambda_1}\, v_1, \dots, \sqrt{\lambda_f}\, v_f)$.

In this case *L* looks like this:

[Figure not included in this excerpt]

After calculating *L* it is now possible to calculate $\hat{D}$ using the formula below:

$$
\hat{D} = \operatorname{diag}\left(\hat{\Sigma} - LL^T\right)
$$

Note that the *D* calculated with the formula above is not yet our optimum *D*, so we need to repeat the algorithm until the estimates of *D* converge, at which point we hope that the optimal *D* has been found. For this purpose we define a tolerance limit: the algorithm is repeated until two successive estimates differ by less than this limit, and the last *D* is taken as our optimum *D*.

The diagram below illustrates the process of finding an optimum *D*:

Figure 1: Number of iterations needed to find the optimum *D*

D. D. Rippe suggested a $\chi^2$ test^{4} to check whether the factor model describes the data adequately.

Implementing this test requires calculating the statistic *T*, defined by

[Figure not included in this excerpt]

To validate this, a simulation is performed using an implementation in the R programming language. The aim is to see whether the Iterative Principal Axis Transformation algorithm works: we run a test process and check whether its results are close to what was simulated.

The test process consists of two stages: the simulation and the algorithm.

**[...]**

^{1} Alexander S. Kaplunovsky, Why using factor analysis? (dedicated to the centenary of factor analysis), Research Center for Quantum Communication Engineering, http://www.magniel.com/fa/kaplunovsky.pdf

^{2} (Exploratory) Factor Analysis in Personality Psychology, 2005, http://webspace.ship.edu/tosato/factanls.htm

^{3} http://en.wikipedia.org/wiki/Factor_Analysis#cite_note-0

^{4} Rippe, D. D., Application of a large sampling criterion to some sampling problems in factor analysis, Psychometrika 18 (1953), 191-20