UGC NET STATISTICS FOR ECONOMICS MATERIAL
MEASURES OF CENTRAL TENDENCY
1. Arithmetic Mean: It is the most common form of average and is used only in the case of
quantitative data. It can be calculated for both grouped and ungrouped data.
Properties of Arithmetic Mean:
1. The sum of deviations of all items from the mean is zero, that is, Σ(X − X̄) = 0
2. Sum of squared deviations from the mean is minimum.
3. If every item is replaced by the mean itself, the total of the series (and hence the mean) remains unchanged.
4. If each item is increased, decreased, multiplied or divided by a constant, the mean changes
in the same manner.
5. It is capable of further mathematical treatment like combined mean.
Combined mean: X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2)
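As a quick illustration of the combined-mean formula, here is a minimal Python sketch; the group sizes and means are assumed values chosen only for demonstration:

```python
def combined_mean(mean1, n1, mean2, n2):
    """Combined mean of two groups: (N1*X1 + N2*X2) / (N1 + N2)."""
    return (mean1 * n1 + mean2 * n2) / (n1 + n2)

# Group 1: 40 items with mean 20; Group 2: 60 items with mean 30 (assumed).
print(combined_mean(20, 40, 30, 60))  # (20*40 + 30*60) / 100 = 26.0
```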
Properties of Geometric Mean
1. The geometric mean is less than or equal to the arithmetic mean, that is, GM ≤ AM (equality holds only when all items are equal).
2. The product of all items will remain the same if each item is replaced by the geometric mean.
3. The geometric mean of the ratios of corresponding observations in two series is equal to the
ratio of their geometric means.
4. The geometric mean of the products of corresponding items in two series is equal to
the product of their geometric means.
Harmonic Mean: It is the reciprocal of the arithmetic mean of the reciprocals of all items in a
series. The harmonic mean is also applicable only in the case of quantitative data.
It is used less in economics because it gives the largest weight to the smallest items. It is used in
problems of time and speed and in finding average rates.
Relationship between Arithmetic Mean, Geometric Mean and Harmonic Mean
1. If all items are the same then AM = GM = HM; otherwise AM > GM > HM.
2. GM = √(AM × HM) (this relation holds exactly for two observations).
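Both relationships can be verified numerically. A short Python sketch using two assumed values:

```python
import math

x = [4.0, 16.0]  # assumed data

am = sum(x) / len(x)                         # arithmetic mean
gm = math.prod(x) ** (1 / len(x))            # geometric mean
hm = len(x) / sum(1 / v for v in x)          # harmonic mean

print(am, gm, hm)                            # 10.0 8.0 6.4, so AM > GM > HM
print(math.isclose(gm, math.sqrt(am * hm)))  # True for two observations
```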
POSITIONAL AVERAGES
1. Median (M): The median divides the series into two equal parts. The sum of absolute deviations
(ignoring signs) is minimum when taken from the median.
Quartiles: They divide the series into four equal parts.
Q1 (First Quartile) = size of the (N+1)/4 th item
Q3 (Third Quartile) = size of the 3(N+1)/4 th item
Second Quartile (Q2) = Median
Deciles: They divide the series into ten equal parts.
Percentiles: They divide the series into hundred equal parts.
Percentiles and deciles are widely used in educational and psychological statistics.
Mode (Z): It is the most frequently occurring item in the series. For individual and discrete series,
the mode is the most frequently occurring value, while for a continuous series we first identify the
modal class (the class with the highest frequency) and then use the following formula:
Z = L + [(f1 − f0) / (2f1 − f0 − f2)] × h
where L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the
preceding and succeeding classes, and h the class width.
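A minimal Python sketch of this grouped-data formula; the class boundaries and frequencies below are illustrative assumptions:

```python
def grouped_mode(L, f1, f0, f2, h):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h
    L: lower limit of the modal class, f1: its frequency,
    f0/f2: frequencies of the preceding/succeeding classes, h: class width."""
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Modal class 20-30 (highest frequency 25), preceded by 15, followed by 20.
print(grouped_mode(L=20, f1=25, f0=15, f2=20, h=10))  # about 26.67
```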
MEASURES OF DISPERSION
Dispersion is the degree or amount by which the items in a series differ from the central
value. It tells us how reliable and representative an average calculated from the series is.
Range: It is the difference between the largest and the smallest value of a series.
Range = L - S
Coefficient of Range = (L − S) / (L + S)
It must be noted that though the range is very easy to calculate, it is usually not used as a
measure of variability because of its inherent instability. The range is therefore considered
a very limited measure of variability.
Quartile Deviation: The interquartile range is the difference between the third quartile (Q3) and
the first quartile (Q1); the quartile deviation (semi-interquartile range) is half of this difference.
Quartile Deviation (Semi-Interquartile Range) = (Q3 − Q1)/2
Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)
Quartile deviation measures the variability of the middle 50 per cent of the values, that is, those
lying between Q1 and Q3.
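For illustration, a short Python sketch of these measures on assumed data:

```python
import numpy as np

data = np.array([12, 15, 18, 20, 22, 25, 28, 30, 35, 40])  # assumed values

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                      # interquartile range
qd = iqr / 2                       # quartile deviation (semi-interquartile range)
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation

print(q1, q3, qd, round(coeff_qd, 3))
```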
Mean or Average Deviation: It is the arithmetic mean of the absolute deviations of the items
taken from a measure of central tendency. It is usually computed from the median because the
sum of absolute deviations (ignoring signs) is minimum when taken from the median.
Coefficient of Mean Deviation Formula
From Mean: Mean Deviation from mean / Mean
From Median: Mean Deviation from median / Median
From Mode: Mean Deviation from mode / Mode
Standard Deviation/Root Mean Square Deviation: The term was introduced by Karl Pearson in 1893.
Coefficient of Variation (CV): It was given by Karl Pearson.
CV = (σ / Mean) × 100
The higher the CV, the more variable (less consistent) the series; a lower CV indicates a more uniform series.
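A minimal Python sketch comparing the consistency of two assumed series with the CV:

```python
import numpy as np

a = np.array([50, 52, 48, 51, 49])   # tightly clustered (assumed)
b = np.array([20, 80, 35, 65, 50])   # widely spread (assumed)

for name, s in [("A", a), ("B", b)]:
    cv = s.std() / s.mean() * 100    # CV = (sigma / mean) * 100, ddof=0
    print(name, round(cv, 2))        # the series with the higher CV is less consistent
```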
Variance: The concept of variance was given by R.A. Fisher in 1918. It is simply the square of
the standard deviation.
Graphical Method of Measuring Dispersion: The Lorenz Curve
The Lorenz Curve, given by Max O. Lorenz, is the graphical method of measuring dispersion.
The straight diagonal is considered to be the line of equality; the closer the curve lies to the line
of equality, the less the dispersion.
SKEWNESS
Skewness measures the direction in which the values are dispersed. It measures the degree of
symmetry or asymmetry of the series.
Symmetric Distribution: A series is said to be symmetric when it has the same shape on both
sides of the centre. A symmetric distribution with only one peak is known as a normal distribution.
Positively Skewed Distribution: A distribution is positively skewed when it has a long tail
extending to its right. Under such a distribution the mean is greater than the median, and the
mean is prone to large shifts when the sample drawn is small and contains extreme values.
Negatively Skewed Distribution: A series is negatively skewed when it has a long tail extending
to its left; here the mean is less than the median.
Tests of Skewness
1. If mean = median = mode, then the series is symmetrical.
2. Shape of the curve also determines whether the series is skewed or not.
3. If Q3 – Median = Median – Q1 then the series is symmetrical.
4. For a series to be symmetrical, the sum of positive deviations from the median should be equal
to the sum of negative deviations from the median.
5. In case of a symmetrical distribution, the frequencies are equally distributed at points of equal
deviations from the mode.
MOMENTS
Moments can be calculated in three ways:
1. Central Moments or Moments about the actual mean (μ)
μr = Σ(X − X̄)^r / N
μ1 = Σ(X − X̄) / N = 0
μ2 = Σ(X − X̄)² / N = Variance
2. Non-Central Moments or Raw Moments (about an assumed mean A) (μ′)
These are of least utility and are used for conversion purposes only.
μ′r = Σ(X − A)^r / N
3. Moments about the origin or zero (v)
vr = Σ(X − 0)^r / N = ΣX^r / N
v1 = ΣX / N = Mean
μ2 and μ3 measure skewness (β1 = μ3² / μ2³).
μ2 and μ4 measure kurtosis (β2 = μ4 / μ2²).
KURTOSIS
Kurtosis determines the shape of the curve, that is, the degree of flatness or peakedness of the curve.
It refers to how far the items in a distribution are concentrated in the centre.
Mesokurtic: A normal distribution is also known as mesokurtic. There is neither an excess nor a
deficiency of items in the centre of a mesokurtic curve.
Platykurtic: The items in a platykurtic curve are scattered around the shoulders rather than
concentrated at the centre or in the tails; the curve is relatively flat-topped.
Leptokurtic: The items are more concentrated in the centre in case of a leptokurtic curve. As a
result of this, such curve has a sharp peak.
Measure of Kurtosis
β2 = μ4 / μ2²
γ2 = β2 − 3
β2 = 3: Mesokurtic
β2 > 3: Leptokurtic
β2 < 3: Platykurtic
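A short Python sketch computing the central moments and β2 directly from the definitions above; the sample values are illustrative assumptions:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # assumed data

def central_moment(x, r):
    return np.mean((x - x.mean()) ** r)   # mu_r = sum((X - mean)^r) / N

mu2 = central_moment(x, 2)                # variance
mu4 = central_moment(x, 4)

beta2 = mu4 / mu2**2                      # measure of kurtosis
gamma2 = beta2 - 3                        # excess kurtosis (0 for mesokurtic)
print(round(beta2, 3), round(gamma2, 3))  # here beta2 < 3: platykurtic
```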
CORRELATION
Correlation measures the relationship between two or more variables; it determines both the
degree and the direction of that relationship.
Spurious correlation is an observed correlation between variables that have no genuine (causal) relationship; it arises by chance or through the influence of a third variable.
Types of Correlation
Depending upon direction of change
1. Positive correlation: Both the variables move in the same direction.
2. Negative correlation: Both variables change in the opposite directions.
Ratio of changes
1. Linear correlation: The ratio of change between the two variables remains constant.
2. Curvilinear (non-linear) correlation: The ratio of change between the variables is not constant.
Degree of relationship
1. Multiple correlation: Correlation between more than two variables.
2. Simple Correlation: Correlation between two variables.
3. Partial Correlation: Relationship is studied between two variables keeping other variables
constant.
Properties of Correlation (‘r’)
1. Its value lies between -1 and +1, that is, -1 ≤ r ≤ +1
2. r is independent of change in origin and change in scale.
3. r is the square root of the product of the two regression coefficients, that is, r = ±√(bxy × byx).
4. It is symmetric, that is, rxy = ryx
5. If the two variables are independent of each other, then r is equal to zero (the converse need
not hold: r = 0 does not guarantee independence).
Probable Error: The probable error is used to test the value and significance of the correlation
coefficient; it helps in testing the reliability of r. P.E. = 0.6745 × (1 − r²) / √N. Using it, we can
also find the limits (r ± P.E.) within which the population correlation is expected to lie.
Methods for Measuring Correlation
1. Graphical or Scatter Diagram Method
2. Karl Pearson’s Coefficient of Correlation: It is based on the idea of covariance.
3. Spearman’s rank correlation method: The method was developed in 1904. It is useful for
finding the correlation in the case of qualitative data. No assumption is made regarding the
distribution of the data. R = 1 − [6ΣD² / (N(N² − 1))], where D is the difference between paired ranks.
Properties: -
1. ΣD = Σ(R1 − R2) = 0
2. R is distribution free and non-parametric.
3. R=r when all values are different and no value is repeated.
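For illustration, a minimal Python sketch of Spearman's method on assumed rankings by two judges (no ties, so the hand formula and scipy agree):

```python
from scipy.stats import spearmanr

judge1 = [1, 2, 3, 4, 5, 6]   # assumed ranks
judge2 = [2, 1, 4, 3, 6, 5]

rho, p_value = spearmanr(judge1, judge2)
print(round(rho, 3), round(p_value, 3))

# Hand formula for untied ranks: R = 1 - 6*sum(D^2) / (N*(N^2 - 1))
d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))
n = len(judge1)
print(1 - 6 * d2 / (n * (n**2 - 1)))   # about 0.829, matching rho
```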
QUESTIONS FOR CLARIFICATION
1. Which one of the following statistical measures is not affected by extremely large or small
values?
A. Median B. Harmonic mean
C. Standard deviation D. Coefficient of variation
2. Which of the following is not a characteristic of a good average?
A. It should be easy and simple to understand
B. It should have a sampling stability
C. It should not be rigidly defined
D. It should be based on all the observations
3. In any set of numbers, the geometric mean exists only when all numbers are
A. Negative B. Positive, zero or negative
C. Positive D. Zero
4. Which of the following statements is/are true about the arithmetic mean?
1. It possesses a large number of characteristics of a good average
2. It is unduly affected by the presence of extreme values
3. In an extremely asymmetrical distribution it is not a good measure of central tendency
Codes:
A. 1 only B. 1,2 and 3
C. 2 and 3 only D. 3 only
5. Assertion (A): Skewness measures regression.
Reason (R): Kurtosis measures flatness at the top of the frequency curve.
A. Both A and R are true and R is the correct explanation of A
B. Both A and R are true but R is not the correct explanation of A
C. A is false but R is true
D. A is true but R is false
6. Consider the following statements
1. Quartile deviation is more instructive than the range as it discards the dispersion of the extreme items
2.Coefficient of quartile deviation cannot be used to compare the degree of variation in different
distributions
3.There are 10 deciles for a series
Codes:
A.1,2 and 3 B.2 only
C.3 only D.1 only
Answers:-
1) A 2) D 3) C 4) B 5) C 6) D
REGRESSION
Regression differs from correlation because the values of the dependent variable are calculated
on the basis of given values of the explanatory variables.
Standard Error of Regression Estimate: It measures the dispersion of the observed values around
the regression line, which represents the average relationship. It measures the reliability of the
regression estimate.
Coefficient of Determination: It is measured using the following formula:
Coefficient of Determination (r²) = Explained Variation / Total Variation
The correlation coefficient (r) will be equal to r² only when r = 0 or 1. Also, r tells the direction
of correlation while r² does not provide any such information.
Coefficient of Non-Determination (k²) = 1 − r², or
k² = Unexplained Variation / Total Variation
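A minimal Python sketch of r² and k² on assumed data:

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1, 2, 3, 4, 5], dtype=float)   # assumed explanatory variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])     # assumed dependent variable

r, _ = pearsonr(x, y)
r2 = r**2                 # share of variation in y explained by x
k2 = 1 - r2               # unexplained share
print(round(r, 4), round(r2, 4), round(k2, 4))
```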
TYPES OF DATA
1. Primary Data: Methods include direct personal investigation, indirect/telephonic oral
investigation, information through local sources or correspondents, schedules filled in by the
respondents, and schedules filled in by enumerators.
2. Secondary Data: It is second-hand information and is less costly to obtain.
Census
Census covers each item of the population and is considered to be very authentic and
reliable. However, it is time consuming and costly.
Sampling
Law of Statistical Regularity: A reasonably large sample drawn at random from the population
will, on average, possess the characteristics of the population.
Inertia of Large Numbers: The larger the sample size, the more reliable the results.
Sampling Methods
1. Probability/Random/Chance Sampling: All units of the population have the same chance of
being selected in the sample.
Simple Unrestricted Probability Sampling: Under this method, the sample is drawn randomly,
that is, each item has equal probability of getting selected in the sample.
Stratified Sampling: Under stratified sampling, the population is divided into different strata.
The strata are defined in such a way that the population within each stratum is homogeneous.
A sample is then drawn from each stratum, and together these form the complete sample.
Systematic Sampling: Under this, the population is arranged in an order. A sample is so
constituted that every nth term is a part of the sample. For example, I choose every 7th individual
to be a part of my sample.
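A minimal Python sketch of this procedure; the population values and interval are assumed:

```python
import random

population = list(range(1, 101))   # units numbered 1..100 (assumed)
k = 7                              # sampling interval ("every 7th individual")

start = random.randint(0, k - 1)   # random start within the first interval
sample = population[start::k]      # every k-th unit thereafter
print(sample)
```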
Multi-Stage/Cluster sampling: Under this method, the sample is selected in stages, starting from
an elementary stage.
2. Non-Probability Sampling: It is non-random. All items of the population don’t have the same
chance of being selected.
Types:
Judgement Sampling: It is also known as ‘Sampling by Opinion’. Under this, the person,
according to his own judgement, decides on who is to be included in the sample.
Quota Sampling: Under this method, the total sample size is fixed in advance, and each
investigator is given a quota, that is, a specific number of respondents to interview.
Convenience Sampling: Under this, the sample is formed according to the convenience of
the investigator.
Sampling Errors
Error is the difference between the sample statistic and the population parameter.
Types of errors are:
1. Sampling Error: It can occur because of any procedure involved in sampling.
Biased: Due to human element
Unbiased: Due to problem in procedure of selection.
With increase in sample size, the sampling errors (especially the unbiased errors) decrease.
2. Non-Sampling Error: It occurs after the selection of a sample. For example, faulty printing,
coding, tabulation of data, etc.
PROBABILITY DISTRIBUTION
Observed Frequency Distribution: Based on observations
Theoretical Probability Distributions:
1. Binomial
2. Normal
3. Poisson
Binomial Distribution: It was given by James Bernoulli and is thus also known as the Bernoulli
distribution. It is a discrete probability distribution in which each trial has only two outcomes:
a success or a failure.
Graphically, the binomial distribution is symmetrical when p = ½. If p < ½ the distribution is
positively skewed, and if p > ½ it is negatively skewed.
The skewness of the binomial distribution becomes less pronounced as n increases. If we increase
p for a fixed n, the distribution shifts to the right. The mode, in this case, is the number of
successes with the highest probability.
The binomial distribution tends to the normal distribution as n becomes large (the normal is its limiting form).
Mean = n*p
Variance = npq
Binomial distribution is usually used in business and social sciences as well as for quality
control.
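A short Python sketch of these moments using scipy's binomial distribution; n and p are assumed values:

```python
from scipy.stats import binom

n, p = 10, 0.3        # assumed number of trials and success probability
q = 1 - p

print(binom.mean(n, p), n * p)       # mean = n*p = 3.0
print(binom.var(n, p), n * p * q)    # variance = n*p*q = 2.1
print(binom.pmf(3, n, p))            # P(exactly 3 successes)
```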
Poisson Distribution: It was given by Siméon Denis Poisson in 1837. It is used when the number
of trials is very large but the probability of success is very small, with the probability of failure
tending towards one. It is used for finding the number of defects, number of accidents, number of
casualties, etc.
Mean = np
Variance= np (Mean = Variance)
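A minimal Poisson sketch in Python; the average rate m is an assumed value (e.g., defects per batch):

```python
from scipy.stats import poisson

m = 2.5                                   # assumed average rate

print(poisson.mean(m), poisson.var(m))    # mean = variance = m
print(poisson.pmf(0, m))                  # P(no defect in a batch)
print(1 - poisson.cdf(4, m))              # P(more than 4 defects)
```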
Normal Distribution: It was first derived by Abraham De Moivre in 1733 and later developed by Carl Friedrich Gauss and Laplace.
Properties:
1. The distribution is symmetrical, hence it resembles a bell-shaped curve.
2. The normal curve is symmetrical and is unimodal.
3. The two tails of the curve do not touch the axis, that is, they continue to extend indefinitely.
4. The total area under the normal curve is 1.
5. Mean = Median = Mode.
6. Standard Normal Distribution is the random variable which has a normal distribution with
mean equal to zero and standard deviation equal to one.
The Poisson distribution tends to the normal distribution when m is large (the normal is its limiting form).
For the standard normal distribution: Mean = 0, Standard Deviation = 1, Variance = 1.
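A short Python check of these standard normal values using scipy's frozen distribution:

```python
from scipy.stats import norm

z = norm(loc=0, scale=1)                # standard normal: mean 0, SD 1

print(z.mean(), z.std(), z.var())       # 0.0 1.0 1.0
print(z.cdf(1.96) - z.cdf(-1.96))       # ~0.95 of the area lies within +/-1.96
print(z.cdf(0))                         # 0.5: the curve is symmetric about 0
```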
INDEX NUMBERS
First Price Index was given in Italy in 1754. It was used to compare price changes in the time
period 1750 to 1760.
1. Simple Index Numbers: It is an Unweighted Index Number.
Simple Aggregative Method = (ΣP1 / ΣP0) × 100
Price Relatives: First, the price relatives are calculated using the formula:
Price Relative = (P1 / P0) × 100
Then a simple average of the price relatives is taken, that is,
(ΣPrice Relatives) / n
Laspeyres' Index (L): P01 = (ΣP1Q0 / ΣP0Q0) × 100
Under this, the base year quantities are taken as weights. The index has an upward bias as it
overestimates the price changes.
Paasche's Index (P): P01 = (ΣP1Q1 / ΣP0Q1) × 100
Under this, the current year quantities are taken as weights. It underestimates the price changes
and as a result has a downward bias.
Dorbish-Bowley Index: (L + P) / 2
It uses both the current and base year quantities as weights. It is the arithmetic mean of the
Laspeyres and Paasche indices.
Fisher's Ideal Index: √(L × P)
It is the geometric mean of the Laspeyres and Paasche indices. It is considered an ideal index
because (see the sketch after this list):
1. Both current year and base year quantities are taken.
2. It cancels out the upward and downward biases.
3. It satisfies the time reversal and factor reversal tests.
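A minimal Python sketch computing the three indices on assumed price/quantity data, with a check of Fisher's time reversal property:

```python
import numpy as np

p0 = np.array([10.0, 8.0, 5.0])    # base year prices (assumed)
q0 = np.array([30.0, 15.0, 20.0])  # base year quantities (assumed)
p1 = np.array([12.0, 10.0, 6.0])   # current year prices (assumed)
q1 = np.array([25.0, 18.0, 22.0])  # current year quantities (assumed)

L = (p1 @ q0) / (p0 @ q0) * 100    # base year quantities as weights
P = (p1 @ q1) / (p0 @ q1) * 100    # current year quantities as weights
F = np.sqrt(L * P)                 # Fisher: geometric mean of L and P

print(round(L, 2), round(P, 2), round(F, 2))

# Time reversal check for Fisher (working without the *100 factor): P01 * P10 = 1
F01 = np.sqrt((p1 @ q0) / (p0 @ q0) * (p1 @ q1) / (p0 @ q1))
F10 = np.sqrt((p0 @ q1) / (p1 @ q1) * (p0 @ q0) / (p1 @ q0))
print(round(F01 * F10, 6))         # 1.0
```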
Marshall-Edgeworth Index: P01 = [ΣP1(Q0 + Q1) / ΣP0(Q0 + Q1)] × 100
It uses the sum of the base year and current year quantities as weights.
Kelly's Index: P01 = (ΣP1q / ΣP0q) × 100, where q is a fixed quantity.
It is also known as the fixed weight aggregative method.
Tests of Adequacy of Index Numbers
Unit Test
Formula for index number construction should be independent of the units in which prices and
quantities are quoted.
Apart from simple unweighted aggregative index, this test is satisfied by all other indices.
Time Reversal Test: P01 × P10 = 1
It was given by Fisher. With the base reversed, the two indices should be reciprocals of each
other, that is, interchanging the time periods should not lead to inconsistent results. It is satisfied
by: 1. Fisher's Index 2. Marshall-Edgeworth Index
3. Simple geometric mean of price relatives 4. Aggregates with fixed weights
5. Weighted geometric mean
Factor Reversal Test: P01 × Q01 = V01 (given by Fisher).
On interchanging the prices and quantities, the results should be consistent, that is, the product
of the price index and the quantity index should equal the value index.
This test is satisfied only by Fisher's Index.
Circular Test: P01 × P12 × P20 = 1
It is an extension of the time reversal test and also tests the shiftability of the base. The index
should be able to adjust index values from period to period without referring back to the original
base. This test is satisfied by: 1. Simple geometric mean of price relatives
2. Weighted aggregative with fixed weights.
Splicing of Index Numbers: Joining an old index series with a new series so that they form one
continuous series.
Deflating of Index Numbers: Adjusting a money value series for price changes (e.g., converting
money wages into real wages).
TIME SERIES
Time series analysis studies how the value of a variable changes over a period of time. It is used
for forecasting future values.
Components of Time Series are:
1. Secular Trend (T): Persisting movement of any variable over a period of time.
2. Seasonal Variations (S): Repetitive movements in every season.
3. Cyclical Variation (C): Business Cycles
4. Irregular Variations (I): Completely random
STATISTICAL INFERENCE
Statistical Inference is that branch of statistics where we use probabilities to deal with
uncertainty in decision making. It involves:
1. Hypothesis Testing
2. Estimation
Hypothesis: It is a general statement made about any relationship. In other words, it is a tentative
assumption made about any relationship/population parameter.
Null Hypothesis (H0): The null hypothesis is stated for the purpose of testing or verifying its
validity. It assumes that there is no difference between the population parameter and the sample
statistic, and that any observed difference is due to chance.
Alternate Hypothesis (H1): It includes any other admissible hypothesis, other than null
hypothesis. Alternate hypothesis is accepted when the null hypothesis is rejected.
Level of Significance: It is the probability of committing Type I error.
Power of a Test: It indicates how well the test performs and depends mainly on the Type II error
(β). The power of a test is measured by 1 − β.
Two Tailed Test: In this, the critical region lies on both sides. It does not tell us whether the value
is less than or greater than the desired value. The rejection region under this test is taken on both
the sides of the distribution. For example,
H0: β = 100
H1: β≠ 100
One Tailed Test: Under this, H1 can either be greater than or less than the desired value. The
rejection region, under this test, is taken only on one side of the distribution.
H0: β = 100
H1: β > 100
Any value calculated from sample data is known as a sample statistic, and a value calculated
from the population is known as a population parameter.
Standard error of sampling distribution tells us the standard deviation of a sampling distribution.
Standard error of the sample mean = σ / √n
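A one-line Python sketch of this formula with assumed values:

```python
import numpy as np

sigma = 15.0    # assumed population standard deviation
n = 100         # assumed sample size

se = sigma / np.sqrt(n)
print(se)       # 1.5: the SD of the sampling distribution of the mean
```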
Steps of Testing Hypothesis
1. Set your hypothesis.
2. Choose a level of significance
3. Choose a test
4. Make calculations or computations using the test.
5. Decision making stage.
Estimators
There are two kinds of estimators
1. Point estimators
2. Interval estimators: They give confidence intervals, that is, a lower and an upper limit within
which the parameter is expected to lie.
Properties of estimators
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency: The estimator should use the maximum information in the sample about the population parameter.
Testing of Attributes
To test attributes, we calculate the standard error and the difference between the observed
and the expected values. If the difference is greater than the permissible multiple of the standard
error at a particular level of significance, we reject the null hypothesis. If the difference is within
that limit, we accept the hypothesis.
T-Test: It was given by William Sealy Gosset, who published it in 1908 under the pen name 'Student'; hence it is also known as Student's t-test.
Assumptions
a. Number of observations is less than 30.
b. Random sampling distribution is approximately normal.
c. It is used when standard deviation of population is not known.
If the calculated value is greater than the table value, the difference is significant and we reject
the null hypothesis. If the calculated value is less than the table value, we accept the null
hypothesis, implying that there is no significant difference.
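For illustration, a one-sample t-test sketch in Python on small assumed data (n < 30, population SD unknown); comparing the p-value with α is equivalent to comparing the calculated t with the table value:

```python
from scipy.stats import ttest_1samp

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # assumed data
mu0 = 12.0                       # hypothesized population mean (H0)

t_stat, p_value = ttest_1samp(sample, mu0)
print(round(t_stat, 3), round(p_value, 3))

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the difference is significant")
else:
    print("Accept H0: no significant difference")
```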
Properties of t-distribution
1. The distribution is lower at the mean and flatter in the tails; the t-distribution has a greater
area under its tails than the normal distribution.
2. It is symmetrical, but its variance is greater than one (it is more spread out than the standard
normal). Increasing the degrees of freedom moves the t-distribution towards the normal distribution.
Z-test: The test was given by Fisher and is used when the population standard deviation is known.
The test is used when we need to identify whether two samples come from the same population
or not.
Assumptions:
1. Sample size is large, that is, n > 30.
2. Population variance is known.
3. Population is normally distributed.
Applications of Z-Test
Z-test is used to compare the sample mean to a hypothesized mean of the population in case
of large samples.
It is also used to test the difference between the mean of two samples, assuming that they
have been drawn from the same population.
It is also used to test the difference between two sample means when the samples are drawn
from two different populations.
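A minimal Python sketch of the two-sample case with known population variances; all numbers are assumed for illustration:

```python
import math
from scipy.stats import norm

# Two large samples with known population variances (assumed values).
mean1, var1, n1 = 52.0, 36.0, 100
mean2, var2, n2 = 50.0, 49.0, 120

z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
p_value = 2 * (1 - norm.cdf(abs(z)))    # two-tailed test

print(round(z, 3), round(p_value, 4))   # reject H0 at 5% if |z| > 1.96
```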
Uses of Z-Test are:
1. To test the significance of 'r'
2. To test the significance of the difference between two independent correlation coefficients
derived from different samples.
F-test: The test was given by R.A. Fisher in the 1920s and is closely related to ANOVA. It is also
known as the Variance Ratio Test.
Assumptions:
1. Normality: Normal Distribution
2. Homogeneity: Variance in each group should be equal for all groups.
3. Independence of error: Variance of each value around its mean should be independent.
Use: To find out whether two independent estimates of the population variance differ
significantly, or whether two samples may be regarded as drawn from normal populations
having the same variance.
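A minimal variance-ratio sketch in Python on two small assumed samples, using scipy's F distribution for the p-value:

```python
import numpy as np
from scipy.stats import f

s1 = np.array([23.0, 25.0, 28.0, 30.0, 32.0])   # assumed sample 1
s2 = np.array([24.0, 24.5, 25.0, 25.5, 26.0])   # assumed sample 2

v1 = s1.var(ddof=1)                     # unbiased sample variances
v2 = s2.var(ddof=1)
F = max(v1, v2) / min(v1, v2)           # larger variance in the numerator

df1 = len(s1) - 1 if v1 >= v2 else len(s2) - 1
df2 = len(s2) - 1 if v1 >= v2 else len(s1) - 1
p_value = 2 * (1 - f.cdf(F, df1, df2))  # two-tailed

print(round(F, 3), round(p_value, 4))
```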
Chi-Square Test
It is a non-parametric test and does not make any assumptions about the population from which
the samples are drawn. It was first used by Karl Pearson in 1900.
χ² = Σ(O − E)² / E
where O is the observed frequency and E is the expected frequency.
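For illustration, a goodness-of-fit sketch in Python; the frequencies are assumed (e.g., 60 throws of a die):

```python
from scipy.stats import chisquare

observed = [8, 12, 9, 11, 6, 14]           # assumed observed frequencies
expected = [10, 10, 10, 10, 10, 10]        # expected under a fair die

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 3), round(p_value, 3))   # chi2 = sum((O - E)^2 / E)
```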
Application of Chi-Square tests:
1. It is used to test the discrepancies between the observed frequencies and the expected
frequencies.
2. It is also used to test the goodness of fit.
3. It is used to determine the association between two or more attributes.
Features of the test:
1. It is a test of independence.
2. It is a test of goodness of fit.
3. It is a test of homogeneity, that is, whether two or more samples are drawn from the same
population or from different populations.
4. Chi-Square distribution is skewed to the right and the skewness can be reduced by increasing
the degrees of freedom.
5. Value of χ2 is always positive and upper limit is infinity.
6. Yates' correction (developed in 1934) is applied to reduce the inflated difference between the
observed and theoretical frequencies.
QUESTIONS FOR CLARIFICATION
1. Consider the following statements
The coefficient of correlation
1.Is not affected by a change of origin and scale
2.lies between -1 and +1
3.Is a relative measure of linear association between two or more variables
Codes
A. 1,2 and 3 B. 1 and 3
C. 2 and 3 D. 1 only
2. Which of the following is referred to as lack of peakedness?
A. Skewness B. Kurtosis
C. Moments D. Mode
3. Which of the following statements is true?
1.When the margin of error is small, the confidence level is high
2.When the margin of error is small, the confidence level is low
3.A confidence interval is a type of point estimate
4.A population mean is an example of a point estimate
Codes:
A. 1 only B. 2 only
C. 4 only D. None of the above
4. In case of high income inequality, the income distribution is
A. A symmetric distribution
B. U shaped distribution
C. Inverted J-Shaped distribution
D. None of the above
5. The basic construction of a price index number involves which of the following steps?
1.Selection of a base year and price of a group of commodities in that year
2.Prices of a group of commodities for the given year that are to be compared
3. Changes in the price of the given year are shown as percentage variation from the base year
4. The index number of the price of a given year is denoted as 100
Codes:
A. 1 ,2 and 3 B. 2,3 and 4
C. 1,3 and 4 D. 1,2 and 4
Answers:-
1) B 2) A 3) D 4) C 5) A