Statistical Correlation
A correlational study determines whether two variables are correlated, that is, whether an increase or decrease in one variable corresponds to an increase or decrease in the other.
Types
Three types of correlation are distinguished:
- Positive correlation: Two variables are positively correlated when an increase in one is accompanied by an increase in the other, and a decrease in one by a decrease in the other. For example, the amount of money that a person possesses might correlate positively with the number of cars they own.
- Negative correlation: Two variables are negatively correlated when an increase in one is accompanied by a decrease in the other, and vice versa. For example, the level of education might correlate negatively with crime: countries with higher education levels tend to have lower crime. Note that this doesn't mean that a lack of education causes crime. It could be, for example, that both lack of education and crime have a common cause: poverty.
- No correlation: Two variables are uncorrelated when a change in one is not accompanied by any systematic change in the other. For example, among millionaires, happiness is found to be uncorrelated with money. This means that, within this group, having more money is not associated with being happier.
A correlation coefficient is usually computed in a correlational study. It varies between -1 and +1. A value close to +1 indicates a strong positive correlation, while a value close to -1 indicates a strong negative correlation. A value near zero indicates that the variables are uncorrelated.
Relationship Between Variables
Correlation can tell you something about the relationship between variables. It is used to understand:
- whether the relationship is positive or negative
- the strength of relationship.
Correlation is a powerful tool that provides these vital pieces of information.
In the case of family income and family expenditure, it is easy to see that they rise or fall together in the same direction. This is called positive correlation.
In the case of price and demand, the changes occur in opposite directions: an increase in one is accompanied by a decrease in the other. This is called negative correlation.
Coefficient of Correlation
Statistical correlation is measured by the coefficient of correlation (r). Its numerical value ranges from -1.0 to +1.0, and it indicates the strength of the relationship.
In general, r > 0 indicates a positive relationship, r < 0 indicates a negative relationship, and r = 0 indicates no linear relationship (which does not by itself mean that the variables are independent). Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation.
The closer the coefficient is to +1.0 or -1.0, the greater the strength of the relationship between the variables.
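As a rough illustration of how the sign and size of r behave, here is a minimal Python sketch. The function name pearson_r and all of the numbers are assumptions made up for this example. They are not data from any study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the sign of r.
income = [20, 30, 40, 50, 60]
expenditure = [18, 25, 33, 41, 52]   # rises together with income
price = [1, 2, 3, 4, 5]
demand = [10, 8, 6, 4, 2]            # falls as price rises, exactly linearly

r_pos = pearson_r(income, expenditure)
r_neg = pearson_r(price, demand)
print(round(r_pos, 3), round(r_neg, 3))
```

In this sketch the income and expenditure figures give an r close to +1, while the price and demand figures, which lie exactly on a falling straight line, give r = -1.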
As a rule of thumb, the following guidelines on strength of relationship are often useful (though many experts would somewhat disagree on the choice of boundaries).
Correlation is only appropriate for examining the relationship between meaningful quantifiable data (e.g. air pressure, temperature) rather than categorical data such as gender, favorite color etc.
Disadvantages
While 'r' (correlation coefficient) is a powerful tool, it has to be handled with care.
- The most commonly used correlation coefficients only measure linear relationships. It is therefore perfectly possible that, while there is a strong nonlinear relationship between the variables, r is close to 0 or even exactly 0. In such a case, a scatter diagram can roughly indicate whether or not a nonlinear relationship exists.
- One has to be careful in interpreting the value of 'r'. For example, one could compute 'r' between shoe size and intelligence, or between height and income. Whatever the value of 'r', such a correlation has no meaningful interpretation and is hence termed a chance or nonsense correlation.
- 'r' should not be used to say anything about a cause-and-effect relationship. Put differently, by examining the value of 'r', we could conclude that variables X and Y are related. However, the same value of 'r' does not tell us whether X influences Y or the other way round. Statistical correlation should not be the primary tool used to study causation, because of the problem with third variables.
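The first caveat above can be made concrete with a small Python sketch. The data are an assumption chosen for illustration: y is completely determined by x through y = x squared, yet the linear correlation coefficient comes out as zero because the x values are symmetric about zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_quadratic = pearson_r(x, y)
print(r_quadratic)
```

A scatter diagram of these points would show a clear U shape even though the coefficient suggests no relationship.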
Spearman Rank Correlation Coefficient
Spearman Rank Correlation Coefficient is a non-parametric measure of correlation, using ranks to calculate the correlation.
Whenever we are interested in knowing whether two variables are related to each other, we use a statistical technique known as correlation. If a change in one variable is accompanied by a change in the other, they are said to be correlated.
A well-known measure of correlation is the Pearson product-moment correlation coefficient, which can be calculated if the data are on an interval or ratio scale.
The Spearman Rank Correlation Coefficient is its analogue when the data are in terms of ranks. One can therefore also call it the correlation coefficient between the ranks. It is also known as "Spearman rho" or "Spearman r correlation", and it is sometimes denoted by rs.
Example
As an example, let us consider a musical (solo vocal) talent contest where 10 competitors are evaluated by two judges, A and B. Usually, judges award a numerical score to each contestant after their performance.
Because the two judges might award different numerical scores, what makes more sense is the correlation between the ranks of the contestants as judged by the two judges. The Spearman Rank Correlation Coefficient can indicate whether the judges agree with each other's views as far as the talent of the contestants is concerned, in other words, whether the judges rank the contestants consistently.
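The point about ranks versus raw scores can be sketched in Python. Everything here is an assumption for illustration: the scores are invented, the function names are hypothetical, and the shortcut formula rs = 1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1)) is used, which is valid only when there are no tied scores.

```python
def ranks_no_ties(scores):
    """Rank scores from highest (rank 1) to lowest; assumes all scores differ."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]

def spearman_no_ties(a, b):
    """Spearman coefficient via the rank-difference formula (no ties allowed)."""
    n = len(a)
    ranks_a = ranks_no_ties(a)
    ranks_b = ranks_no_ties(b)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores for five contestants.
judge_a = [9.0, 7.5, 8.0, 6.0, 5.5]
judge_b = [6.5, 5.0, 6.0, 4.0, 3.0]   # stricter scores, same ordering as A
judge_c = [3.0, 5.0, 4.0, 6.0, 6.5]   # exactly the reverse ordering of A

rs_same = spearman_no_ties(judge_a, judge_b)
rs_reverse = spearman_no_ties(judge_a, judge_c)
print(rs_same, rs_reverse)
```

Judges A and B never award the same score to a contestant, yet they order the contestants identically, so the coefficient is +1. Judge C orders them in exactly the opposite way, which gives -1.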
Interpretation of Numerical Values
The numerical value of the correlation coefficient, rs, ranges between -1 and +1. It indicates how closely the two sets of ranks agree.
In general,
- rs > 0 implies positive agreement among ranks
- rs < 0 implies negative agreement (or agreement in the reverse direction)
- rs = 0 implies no agreement
The closer rs is to +1, the better the agreement, while rs closer to -1 indicates strong agreement in the reverse direction.
Assigning Ranks
In order to compute Spearman Rank Correlation Coefficient, it is necessary that the data be ranked. There are a few issues here.
Suppose that the scores of the judges (out of 10) were as follows:
Ranks are assigned separately for the two judges, starting either from the highest or from the lowest score. Here, the highest score given by Judge A is 9.
If we begin from the highest score, we assign rank 1 to contestant 2, corresponding to the score of 9.
The second highest score is 8, but two competitors have been awarded the score of 8. In this case both competitors are assigned a common rank, which is the arithmetic mean of ranks 2 and 3, that is, 2.5. In this way, the scores of Judge A can be converted into ranks.
Similarly, ranks are assigned to the scores awarded by Judge B, and then the differences between the ranks for each contestant are used to evaluate rs. For the above example, ranks are as follows.
The Spearman Rank Correlation Coefficient tries to assess the relationship between ranks without making any assumptions about the nature of their relationship.
Hence it is a non-parametric measure, a feature which has contributed to its popularity and widespread use.
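The ranking rule for tied scores can be sketched in Python as follows. The scores are hypothetical and are not the contest table referred to in the text. They are only arranged so that contestant 2 has the top score of 9 and two contestants share a score of 8. The function names are assumptions, and rs is computed here as the ordinary correlation coefficient of the two rank lists, which coincides with the rank-difference formula when there are no ties.

```python
from math import sqrt

def average_ranks(scores):
    """Rank from highest score (rank 1) down; tied scores share the mean rank."""
    # Contestant indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next contestant has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions pos..end are 0-based, so they hold ranks pos+1..end+1.
        shared_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores (out of 10) for six contestants.
judge_a = [7, 9, 8, 8, 5, 6]
judge_b = [6, 8, 7, 9, 4, 5]

ranks_a = average_ranks(judge_a)
ranks_b = average_ranks(judge_b)
rs = pearson_r(ranks_a, ranks_b)
print(ranks_a, ranks_b, round(rs, 3))
```

In this sketch the two contestants tied on 8 both receive rank 2.5, the mean of ranks 2 and 3, and the resulting coefficient is close to +0.9, indicating that these two hypothetical judges largely agree.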
Pearson Product-Moment Correlation
In the study of relationships, two variables are said to be correlated if a change in one variable is accompanied by a change in the other, either in the same or in the reverse direction.
Pearson Product-Moment Correlation is one of the measures of correlation which quantifies the strength as well as the direction of such a relationship. It is usually denoted by the Greek letter ρ.
Conditions
This coefficient is used if two conditions are satisfied:
- the variables are in the interval or ratio scale of measurement
- a linear relationship between them is suspected
Positive and Negative Correlation
The coefficient (ρ) is computed as the ratio of the covariance between the variables to the product of their standard deviations. This formulation is advantageous.
First, it tells us the direction of the relationship. Once the coefficient is computed, ρ > 0 indicates a positive relationship, ρ < 0 indicates a negative relationship, and ρ = 0 indicates the absence of any linear relationship.
Second, it ensures (mathematically) that the numerical value of ρ ranges from -1.0 to +1.0. This enables us to get an idea of the strength of the relationship, or rather the strength of the linear relationship between the variables. The closer the coefficient is to +1.0 or -1.0, the greater the strength of the linear relationship.
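The definition as a ratio of covariance to the product of standard deviations can be written out directly. The following is a minimal Python sketch with hypothetical temperature and rainfall figures. The function names are assumptions, and the covariance divides by n (the population form); dividing by n - 1 instead would give the same ratio because the factor cancels.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def std_dev(xs):
    """Population standard deviation, the square root of the variance."""
    return sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """Ratio of the covariance to the product of the standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical daily figures, invented for illustration.
temperature = [20, 22, 25, 27, 30]
rainfall = [12, 9, 7, 4, 1]

rho = correlation(temperature, rainfall)
rho_self = correlation(temperature, temperature)
print(round(rho, 3), round(rho_self, 3))
```

Here the covariance is negative, so the coefficient is negative, and dividing by the standard deviations keeps it inside the range from -1.0 to +1.0. A variable correlated with itself gives +1.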
As a rule of thumb, the following guidelines are often useful (though many experts could somewhat disagree on the choice of boundaries).
Range of ρ
Properties of ρ
This measure of correlation has interesting properties, some of which are enunciated below:
- It is independent of the units of measurement. It is in fact unit free. For example, ρ between highest day temperature (in Centigrade) and rainfall per day (in mm) is not expressed either in terms of centigrade or mm.
- It is symmetric. This means that ρ between X and Y is exactly the same as ρ between Y and X.
- Pearson's correlation coefficient is independent of change in origin and scale. Thus ρ between temperature (in Centigrade) and rainfall (in mm) would numerically be equal to ρ between temperature (in Fahrenheit) and rainfall (in cm).
- If the variables are independent of each other, then one would obtain ρ = 0. However, the converse is not true. In other words, ρ = 0 does not imply that the variables are independent; it only indicates the absence of a linear relationship.
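The symmetry and unit-independence properties listed above can be checked numerically. This is a minimal Python sketch with hypothetical temperature and rainfall readings. The function name pearson and the data are assumptions for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings: temperature in Centigrade, rainfall in mm.
temp_c = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0]
rain_mm = [14.0, 11.0, 12.0, 6.0, 5.0, 2.0]

# Change of origin and scale: Centigrade to Fahrenheit, mm to cm.
temp_f = [c * 9 / 5 + 32 for c in temp_c]
rain_cm = [mm / 10 for mm in rain_mm]

rho_metric = pearson(temp_c, rain_mm)
rho_converted = pearson(temp_f, rain_cm)
rho_swapped = pearson(rain_mm, temp_c)   # symmetry: swap the two variables
print(rho_metric, rho_converted, rho_swapped)
```

All three values agree up to floating-point rounding. Note that the invariance holds for positive scale factors such as these unit conversions; multiplying one variable by a negative constant would flip the sign of ρ.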
Source: https://explorable.com/pearson-product-moment-correlation