This issue applies to my own data, but for the sake of reproducibility, the same question arises with the factoextra vignette, or here, so I'll use that for simplicity.

To start, a simple PCA was generated (scale = TRUE) and the coordinates of the variables on the first 4 axes were extracted:

> head(var$coord) # coordinates of variables
                  Dim.1       Dim.2       Dim.3       Dim.4
Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950

This was also done for the "individuals." Here is the output:

> head(ind$coord) # coordinates of individuals
      Dim.1      Dim.2       Dim.3        Dim.4
1 -2.257141 -0.4784238  0.12727962  0.024087508
2 -2.074013  0.6718827  0.23382552  0.102662845
3 -2.356335  0.3407664 -0.04405390  0.028282305
4 -2.291707  0.5953999 -0.09098530 -0.065735340
5 -2.381863 -0.6446757 -0.01568565 -0.035802870
6 -2.068701 -1.4842053 -0.02687825  0.006586116

Since the PCA was generated with scale = TRUE, I'm confused as to why the individual coordinates are not scaled (to between -1 and 1?). For instance, individual 1 has a Dim.1 score of -2.257141, but I have no basis for comparing it with the variable coordinates, which range from -0.46 to 0.99. How can a score of -2.26 be interpreted against a scaled PCA range of -1 to 1?

Am I missing something? Thanks for your time!

Updated with all relevant code gaps filled:

> library(factoextra)
> data(iris)
> res.pca <- prcomp(iris[, -5], scale = TRUE)
> ind <- get_pca_ind(res.pca)
> print(ind)
> var <- get_pca_var(res.pca)
> print(var)
Eric
  • @Hack-R I wasn't sure if I should do that since this is either an issue of a bug, or my incorrect interpretation! Sorry! – Eric Jul 07 '18 at 01:59

2 Answers

I asked the author of factoextra this question. Here is his reply:

Scale = TRUE will normalize the variables to make them comparable. This is particularly recommended when variables are measured in different scales (e.g. kilograms, kilometers, centimeters, ...); see http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.

In this case, the correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations.

So, the coordinates of individuals are not expected to be between -1 and 1, even if scale = TRUE.
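
A minimal base-R sketch of that distinction, using the same iris PCA as in the question. It does not call factoextra; it assumes that the variable coordinates reported by get_pca_var() are the variable-to-component correlations described above, which is consistent with the numbers shown in the question.

```r
# Same PCA as in the question (scale. is the full name of the argument)
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Individuals: projections of the standardized data onto the components
scores <- res.pca$x

# Variables: correlation of each original variable with each component
var_cor <- cor(iris[, -5], scores)

# For a PCA on standardized variables, the same numbers are obtained by
# multiplying each column of loadings by that component's standard deviation
var_coord <- sweep(res.pca$rotation, 2, res.pca$sdev, `*`)
all.equal(var_cor, var_coord, check.attributes = FALSE)

range(var_cor)      # correlations, so always within [-1, 1]
range(scores[, 1])  # projections, not bounded by [-1, 1]
```

The two sets of coordinates are different kinds of quantity, so comparing their magnitudes directly is not meaningful.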

It’s only possible to interpret the relative position of individuals and variables by creating a biplot as described at: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.

A biplot isn't ideal for me, but I have tried rescale and it works. Also, I suppose I could take an individual and project it into the PCA to see where it falls.
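
Projecting an individual into an existing PCA could be sketched with predict(), which has a method for prcomp objects. The measurements in new_ind below are made up for illustration and are not from the original post.

```r
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Hypothetical new flower; column names must match the original variables
new_ind <- data.frame(
  Sepal.Length = 5.0,
  Sepal.Width  = 3.4,
  Petal.Length = 1.5,
  Petal.Width  = 0.2
)

# predict() standardizes with the stored center/scale, then rotates
new_scores <- predict(res.pca, newdata = new_ind)

# The same projection written out by hand
manual <- scale(new_ind, center = res.pca$center, scale = res.pca$scale) %*%
  res.pca$rotation
all.equal(manual, new_scores, check.attributes = FALSE)

new_scores
```

The resulting scores are on the same scale as ind$coord, so the new point can be compared directly with the existing individuals.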

Anyways, that's the end of that. Thanks for your help @Hack-r!

Eric

With prcomp(..., scale = TRUE), the scaling is applied to the input variables: each one is scaled to unit variance before the PCA is computed.
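
A short check of what that scaling does and does not do, as a sketch on the iris data from the question. Absolute values are compared because the sign of a principal component is arbitrary.

```r
X <- iris[, -5]

pca_scaled <- prcomp(X, scale. = TRUE)  # scaling done inside prcomp
pca_manual <- prcomp(scale(X))          # variables standardized beforehand

# Both routes give the same scores (up to sign)
all.equal(abs(pca_scaled$x), abs(pca_manual$x), check.attributes = FALSE)

# The inputs have unit standard deviation...
apply(scale(X), 2, sd)

# ...but the scores do not: each component's sd is the matching sdev entry
apply(pca_scaled$x, 2, sd)
pca_scaled$sdev
```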

It does nothing to range-standardize the individual coordinates, and neither does center = ..., which only subtracts each variable's mean. However, range standardization would be easy to do post hoc (or beforehand). Here's a related post:

Range standardization (0 to 1) in R
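
A post hoc version might look like the sketch below. rescale_range() is a hypothetical helper written for this example in base R, not a function from prcomp or factoextra.

```r
# Hypothetical helper: linearly map a numeric vector onto the interval `to`
rescale_range <- function(x, to = c(-1, 1)) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

scores <- prcomp(iris[, -5], scale. = TRUE)$x

# Rescale each component separately to [-1, 1]
scores_rescaled <- apply(scores, 2, rescale_range)

apply(scores_rescaled, 2, range)
```

Note that rescaling each component separately discards the information that the first component varies more than the later ones, so the rescaled values are only comparable within a component.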

Hack-R
  • Right, but I don't see why the individuals wouldn't be on the same scale. I don't see how I could interpret a PC-1 value of -2.25 when the scale is -1 to 1. However, I just tried the link you posted and gave rescale a shot - it seems like that might work out! Thanks! – Eric Jul 07 '18 at 03:27
  • @Eric It sounds like you may be confusing the scaling of the ***variance*** with the range of the values. Nothing that happened in the original code made any attempt to put the ***values*** on a -1 to 1 scale. You scaled the ***variance***. – Hack-R Jul 07 '18 at 16:31
  • Perhaps I'm confused about the application of this. For instance, in the original post, how can you make any conclusions about "individual 1?" (PC1: -2.257141 PC2: -0.4784238 PC3: 0.12727962 PC4: 0.024087508) Essentially, I'd like to see "where the individuals are" within the PCA that was generated. – Eric Jul 07 '18 at 18:22
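
One possible way to read an individual's position, sketched in base R: express each score in units of that component's standard deviation and look at where it falls among the other individuals. This is an illustrative approach, not something proposed in the thread.

```r
res.pca <- prcomp(iris[, -5], scale. = TRUE)
scores  <- res.pca$x

# Divide each column of scores by that component's standard deviation,
# giving "standard deviations from the centre" on every component
z <- sweep(scores, 2, res.pca$sdev, `/`)
z[1, ]

# Share of individuals whose PC1 score is at or below that of individual 1
# (which end of PC1 is "low" depends on the arbitrary sign of the component)
mean(scores[, 1] <= scores[1, 1])
```

Read this way, a PC1 score of about 2.26 in absolute value is roughly 1.3 standard deviations from the average individual on that component.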