UGC NET STATISTICS FOR ECONOMICS MATERIAL
MEASURES OF CENTRAL TENDENCY
1. Arithmetic Mean: It is the most common form of average and is used only in the case of
quantitative data. It can be calculated for both grouped and ungrouped data.
Properties of Arithmetic Mean:
1. The sum of deviations of all items from the mean is zero, that is, Σ(X − X̄) = 0
2. Sum of squared deviations from the mean is minimum.
3. If every item is replaced by the mean itself, the total of the series (and hence the mean) remains unchanged.
4. If each item is increased, decreased, multiplied or divided by a constant, the mean changes
in the same manner.
5. It is capable of further mathematical treatment like combined mean.
Combined mean: X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2)
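As a quick illustration of the combined-mean formula, here is a minimal Python sketch; the group sizes and means are assumed values chosen only for demonstration:

```python
def combined_mean(mean1, n1, mean2, n2):
    """Combined mean of two groups: (N1*X1 + N2*X2) / (N1 + N2)."""
    return (mean1 * n1 + mean2 * n2) / (n1 + n2)

# Group 1: 40 items with mean 20; Group 2: 60 items with mean 30 (assumed).
print(combined_mean(20, 40, 30, 60))  # (20*40 + 30*60) / 100 = 26.0
```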
Properties of Geometric Mean
1. The geometric mean is less than or equal to the arithmetic mean, that is, GM ≤ AM (equality holds only when all items are equal).
2. The product of all items will remain the same if each item is replaced by the geometric mean.
3. The geometric mean of the ratios of corresponding observations in two series is equal to the
ratio of their geometric means.
4. The geometric mean of the products of corresponding items in two series is equal to
the product of their geometric means.
Harmonic Mean: It is the reciprocal of the arithmetic mean of the reciprocals of all items in a
series. The harmonic mean is also applicable only in the case of quantitative data.
It is used less in economics because it gives the largest weight to the smallest items. It is used in
problems of time and speed and in finding average rates.
Relationship between Arithmetic Mean, Geometric Mean and Harmonic Mean
1. If all items are the same then AM = GM = HM; otherwise AM > GM > HM.
2. GM = √(AM × HM) (this relation holds exactly for two observations).
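Both relationships can be verified numerically. A short Python sketch using two assumed values:

```python
import math

x = [4.0, 16.0]  # assumed data

am = sum(x) / len(x)                         # arithmetic mean
gm = math.prod(x) ** (1 / len(x))            # geometric mean
hm = len(x) / sum(1 / v for v in x)          # harmonic mean

print(am, gm, hm)                            # 10.0 8.0 6.4, so AM > GM > HM
print(math.isclose(gm, math.sqrt(am * hm)))  # True for two observations
```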
POSITIONAL AVERAGES
1. Median (M): The median divides the series into two equal parts. The sum of absolute deviations
(ignoring signs) is minimum when taken from the median.
Quartiles: They divide the series into four equal parts.
Q1 (First Quartile) = size of the (N+1)/4 th item
Q3 (Third Quartile) = size of the 3(N+1)/4 th item
Second Quartile (Q2) = Median
Deciles: They divide the series into ten equal parts.
Percentiles: They divide the series into hundred equal parts.
Percentiles and deciles are widely used in educational and psychological statistics.
Mode (Z): It is the most frequently occurring item in the series. For individual and discrete series,
the mode is the most frequently occurring value, while for a continuous series we first identify the
modal class (the class with the highest frequency) and then use the following formula:
Z = L + [(f1 − f0) / (2f1 − f0 − f2)] × h
where L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the
preceding and succeeding classes, and h the class width.
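A minimal Python sketch of this grouped-data formula; the class boundaries and frequencies below are illustrative assumptions:

```python
def grouped_mode(L, f1, f0, f2, h):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h
    L: lower limit of the modal class, f1: its frequency,
    f0/f2: frequencies of the preceding/succeeding classes, h: class width."""
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Modal class 20-30 (highest frequency 25), preceded by 15, followed by 20.
print(grouped_mode(L=20, f1=25, f0=15, f2=20, h=10))  # about 26.67
```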
MEASURES OF DISPERSION
Dispersion is the degree or amount by which the items in a series differ from the central
value. It tells us how reliable and representative an average calculated from the series is.
Range: It is the difference between the largest and the smallest value of a series.
Range = L - S
Coefficient of Range = (L − S) / (L + S)
It must be noted that though the range is very easy to calculate, it is usually not used as a
measure of variability because of its inherent instability. The range is therefore considered
a very limited measure of variability.
Quartile Deviation: The interquartile range is the difference between the third quartile (Q3) and
the first quartile (Q1); the quartile deviation (semi-interquartile range) is half of this difference.
Quartile Deviation (Semi-Interquartile Range) = (Q3 − Q1)/2
Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)
Quartile deviation measures the variability of the middle 50 per cent of the values, that is, those
lying between Q1 and Q3.
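For illustration, a short Python sketch of these measures on assumed data:

```python
import numpy as np

data = np.array([12, 15, 18, 20, 22, 25, 28, 30, 35, 40])  # assumed values

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                      # interquartile range
qd = iqr / 2                       # quartile deviation (semi-interquartile range)
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation

print(q1, q3, qd, round(coeff_qd, 3))
```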
Mean or Average Deviation: It is the arithmetic mean of the absolute deviations of the items
taken from a measure of central tendency. It is usually computed from the median because the
sum of absolute deviations (ignoring signs) is minimum when taken from the median.
Coefficient of Mean Deviation Formula
From Mean: Mean Deviation from mean / Mean
From Median: Mean Deviation from median / Median
From Mode: Mean Deviation from mode / Mode
Standard Deviation/Root Mean Square Deviation: The term was introduced by Karl Pearson in 1893.
Coefficient of Variation (CV): It was given by Karl Pearson.
CV = (σ / Mean) × 100
The higher the CV, the more variable (less consistent) the series; a lower CV indicates a more uniform series.
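A minimal Python sketch comparing the consistency of two assumed series with the CV:

```python
import numpy as np

a = np.array([50, 52, 48, 51, 49])   # tightly clustered (assumed)
b = np.array([20, 80, 35, 65, 50])   # widely spread (assumed)

for name, s in [("A", a), ("B", b)]:
    cv = s.std() / s.mean() * 100    # CV = (sigma / mean) * 100, ddof=0
    print(name, round(cv, 2))        # the series with the higher CV is less consistent
```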
Variance: The concept of variance was given by R.A. Fisher in 1918. It is simply the square of
the standard deviation.
Graphical Method of Measuring Dispersion: The Lorenz Curve
The Lorenz Curve, given by Max O. Lorenz, is the graphical method of measuring dispersion.
The straight diagonal is considered to be the line of equality; the closer the curve lies to the line
of equality, the less the dispersion.
SKEWNESS
Skewness measures the direction in which the values are dispersed. It measures the degree of
symmetry or asymmetry of the series.
Symmetric Distribution: A series is said to be symmetric when it has the same shape on both
sides of the centre. A symmetric distribution with only one peak is known as a normal distribution.
Positively Skewed Distribution: A distribution is positively skewed when it has a long tail
extending to its right. Under such a distribution the mean is greater than the median, and the
mean is prone to large shifts when the sample drawn is small and contains extreme values.
Negatively Skewed Distribution: A series is negatively skewed when it has a long tail extending
to its left; here the mean is less than the median.
Tests of Skewness
1. If mean = median = mode, then the series is symmetrical.
2. Shape of the curve also determines whether the series is skewed or not.
3. If Q3 – Median = Median – Q1 then the series is symmetrical.
4. For a series to be symmetrical, the sum of positive deviations from the median should be equal
to the sum of negative deviations from the median.
5. In case of a symmetrical distribution, the frequencies are equally distributed at points of equal
deviations from the mode.
MOMENTS
Moments can be calculated in three ways:
1. Central Moments or Moments about the actual mean (μ)
μr = Σ(X − X̄)^r / N
μ1 = Σ(X − X̄) / N = 0
μ2 = Σ(X − X̄)² / N = Variance
2. Non-Central Moments or Raw Moments (about an assumed mean A) (μ′)
These are of least utility and are used for conversion purposes only.
μ′r = Σ(X − A)^r / N
3. Moments about the origin or zero (v)
vr = Σ(X − 0)^r / N = ΣX^r / N
v1 = ΣX / N = Mean
μ2 and μ3 measure skewness (β1 = μ3² / μ2³).
μ2 and μ4 measure kurtosis (β2 = μ4 / μ2²).
KURTOSIS
Kurtosis determines the shape of the curve, that is, the degree of flatness or peakedness of the curve.
It refers to how far the items in a distribution are concentrated in the centre.
Mesokurtic: A normal distribution is also known as mesokurtic. There is neither an excess nor a
deficiency of items in the centre of a mesokurtic curve.
Platykurtic: The items in a platykurtic curve are scattered around the shoulders rather than
concentrated at the centre or in the tails; the curve is relatively flat-topped.
Leptokurtic: The items are more concentrated in the centre in case of a leptokurtic curve. As a
result of this, such curve has a sharp peak.
Measure of Kurtosis
β2 = μ4 / μ2²
γ2 = β2 − 3
β2 = 3: Mesokurtic
β2 > 3: Leptokurtic
β2 < 3: Platykurtic
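A short Python sketch computing the central moments and β2 directly from the definitions above; the sample values are illustrative assumptions:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # assumed data

def central_moment(x, r):
    return np.mean((x - x.mean()) ** r)   # mu_r = sum((X - mean)^r) / N

mu2 = central_moment(x, 2)                # variance
mu4 = central_moment(x, 4)

beta2 = mu4 / mu2**2                      # measure of kurtosis
gamma2 = beta2 - 3                        # excess kurtosis (0 for mesokurtic)
print(round(beta2, 3), round(gamma2, 3))  # here beta2 < 3: platykurtic
```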
CORRELATION
Correlation measures the relationship between two or more variables; it determines both the
degree and the direction of that relationship.
Spurious correlation is an observed correlation between variables that have no genuine (causal) relationship; it arises by chance or through the influence of a third variable.
Types of Correlation
Depending upon direction of change
1. Positive correlation: Both the variables move in the same direction.
2. Negative correlation: Both variables change in the opposite directions.
Ratio of changes
1. Linear correlation: The ratio of change between the two variables remains constant.
2. Curvilinear (non-linear) correlation: The ratio of change between the variables is not constant.
Degree of relationship
1. Multiple correlation: Correlation between more than two variables.
2. Simple Correlation: Correlation between two variables.
3. Partial Correlation: Relationship is studied between two variables keeping other variables
constant.
Properties of Correlation (‘r’)
1. Its value lies between -1 and +1, that is, -1 ≤ r ≤ +1
2. r is independent of change in origin and change in scale.
3. r is the square root of the product of the two regression coefficients, that is, r = ±√(bxy × byx).
4. It is symmetric, that is, rxy = ryx
5. If the two variables are independent of each other, then r is equal to zero (the converse need
not hold: r = 0 does not guarantee independence).
Probable Error: The probable error is used to test the value and significance of the correlation
coefficient; it helps in testing the reliability of r. P.E. = 0.6745 × (1 − r²) / √N. Using it, we can
also find the limits (r ± P.E.) within which the population correlation is expected to lie.
Methods for Measuring Correlation
1. Graphical or Scatter Diagram Method
2. Karl Pearson’s Coefficient of Correlation: It is based on the idea of covariance.
3. Spearman’s rank correlation method: The method was developed in 1904. It is useful for
finding the correlation in the case of qualitative data. No assumption is made regarding the
distribution of the data. R = 1 − [6ΣD² / (N(N² − 1))], where D is the difference between paired ranks.
Properties: -
1. ΣD = Σ(R1 − R2) = 0
2. R is distribution free and non-parametric.
3. R=r when all values are different and no value is repeated.
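For illustration, a minimal Python sketch of Spearman's method on assumed rankings by two judges (no ties, so the hand formula and scipy agree):

```python
from scipy.stats import spearmanr

judge1 = [1, 2, 3, 4, 5, 6]   # assumed ranks
judge2 = [2, 1, 4, 3, 6, 5]

rho, p_value = spearmanr(judge1, judge2)
print(round(rho, 3), round(p_value, 3))

# Hand formula for untied ranks: R = 1 - 6*sum(D^2) / (N*(N^2 - 1))
d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))
n = len(judge1)
print(1 - 6 * d2 / (n * (n**2 - 1)))   # about 0.829, matching rho
```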
QUESTIONS FOR CLARIFICATION
1. Which one of the following statistical measures is not affected by extremely large or small
values?
A. Median B. Harmonic mean
C. Standard deviation D. Coefficient of variation
2. Which of the following is not a characteristic of a good average?
A. It should be easy and simple to understand
B. It should have a sampling stability
C. It should not be rigidly defined
D. It should be based on all the observations
3. In any set of numbers, the geometric mean exists only when all numbers are
A. Negative B. Positive, zero or negative
C. Positive D. Zero
4. Which of the following statements is/are true about the arithmetic mean?
1. It possesses a large number of characteristics of a good average
2. It is unduly affected by the presence of extreme values
3. In an extremely asymmetrical distribution it is not a good measure of central tendency
Codes:
A. 1 only B. 1,2 and 3
C. 2 and 3 only D. 3 only
5. Assertion (A): Skewness measures regression.
Reason (R): Kurtosis measures flatness at the top of the frequency curve.
A. Both A and R are true and R is the correct explanation of A
B. Both A and R are true but R is not the correct explanation of A
C. A is false but R is true
D. A is true but R is false
6. Consider the following statements
1. Quartile deviation is more instructive than the range as it discards the dispersion of the extreme items
2.Coefficient of quartile deviation cannot be used to compare the degree of variation in different
distributions
3.There are 10 deciles for a series
Codes:
A.1,2 and 3 B.2 only
C.3 only D.1 only
Answers:-
1) A 2) D 3) C 4) B 5) C 6) D
REGRESSION
Regression differs from correlation because the values of the dependent variable are calculated
on the basis of given values of the explanatory variables.
Standard Error of Regression Estimate: It measures the dispersion of the observed values around
the regression line, which represents the average relationship. It measures the reliability of the
regression estimate.
Coefficient of Determination: It is measured using the following formula:
Coefficient of Determination (r²) = Explained Variation / Total Variation
The correlation coefficient (r) will be equal to r² only when r = 0 or 1. Also, r tells the direction
of correlation while r² does not provide any such information.
Coefficient of Non-Determination (k²) = 1 − r², or
k² = Unexplained Variation / Total Variation
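A minimal Python sketch of r² and k² on assumed data:

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1, 2, 3, 4, 5], dtype=float)   # assumed explanatory variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])     # assumed dependent variable

r, _ = pearsonr(x, y)
r2 = r**2                 # share of variation in y explained by x
k2 = 1 - r2               # unexplained share
print(round(r, 4), round(r2, 4), round(k2, 4))
```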
TYPES OF DATA
1. Primary Data: Methods include direct personal investigation, indirect/telephonic oral
investigation, information through local sources or correspondents, schedules filled in by the
respondents, and schedules filled in by enumerators.
2. Secondary Data: It is second-hand information and is less costly to obtain.
Census
Census covers each item of the population and is considered to be very authentic and
reliable. However, it is time consuming and costly.
Sampling
Law of Statistical Regularity: A reasonably large sample drawn at random from the population
will, on average, possess the characteristics of the population.
Inertia of Large Numbers: The larger the sample size, the more reliable the results.
Sampling Methods
1. Probability/Random/Chance Sampling: All units of the population have the same chance of
being selected in the sample.
Simple Unrestricted Probability Sampling: Under this method, the sample is drawn randomly,
that is, each item has equal probability of getting selected in the sample.
Stratified Sampling: Under stratified sampling, the population is divided into different strata.
The strata are defined in such a way that the population within each stratum is homogeneous.
A sample is then drawn from each stratum, and together these form the complete sample.
Systematic Sampling: Under this, the population is arranged in an order. A sample is so
constituted that every nth term is a part of the sample. For example, I choose every 7th individual
to be a part of my sample.
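A minimal Python sketch of this procedure; the population values and interval are assumed:

```python
import random

population = list(range(1, 101))   # units numbered 1..100 (assumed)
k = 7                              # sampling interval ("every 7th individual")

start = random.randint(0, k - 1)   # random start within the first interval
sample = population[start::k]      # every k-th unit thereafter
print(sample)
```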
Multi-Stage/Cluster sampling: Under this method, the sample is selected in stages, starting from
an elementary stage.
2. Non-Probability Sampling: It is non-random. All items of the population don’t have the same
chance of being selected.
Types:
Judgement Sampling: It is also known as ‘Sampling by Opinion’. Under this, the person,
according to his own judgement, decides on who is to be included in the sample.
Quota Sampling: Under this method, the total sample size is fixed in advance, and each
investigator is given a quota, that is, a specific number of respondents to interview.
Convenience Sampling: Under this, the sample is formed according to the convenience of
the investigator.
Sampling Errors
Error is the difference between the sample statistic and the population parameter.
Types of errors are:
1. Sampling Error: It can occur because of any procedure involved in sampling.
Biased: Due to human element
Unbiased: Due to problem in procedure of selection.
With increase in sample size, the sampling errors (especially the unbiased errors) decrease.
2. Non-Sampling Error: It occurs after the selection of a sample. For example, faulty printing,
coding, tabulation of data, etc.
PROBABILITY DISTRIBUTION
Observed Frequency Distribution: Based on observations
Theoretical Probability Distributions:
1. Binomial
2. Normal
3. Poisson
Binomial Distribution: It was given by James Bernoulli and is thus also known as the Bernoulli
distribution. It is a discrete probability distribution in which each trial has only two outcomes:
a success or a failure.
Graphically, the binomial distribution is symmetrical when p = ½. If p < ½ the distribution is
positively skewed, and if p > ½ it is negatively skewed.
The skewness of the binomial distribution becomes less pronounced as n increases. If we increase
p for a fixed n, the distribution shifts to the right. The mode, in this case, is the number of
successes with the highest probability.
The binomial distribution tends to the normal distribution as n becomes large (the normal is its limiting form).
Mean = n*p
Variance = npq
Binomial distribution is usually used in business and social sciences as well as for quality
control.
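A short Python sketch of these moments using scipy's binomial distribution; n and p are assumed values:

```python
from scipy.stats import binom

n, p = 10, 0.3        # assumed number of trials and success probability
q = 1 - p

print(binom.mean(n, p), n * p)       # mean = n*p = 3.0
print(binom.var(n, p), n * p * q)    # variance = n*p*q = 2.1
print(binom.pmf(3, n, p))            # P(exactly 3 successes)
```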
Poisson Distribution: It was given by Siméon Denis Poisson in 1837. It is used when the number
of trials is very large but the probability of success is very small, with the probability of failure
tending towards one. It is used for finding the number of defects, number of accidents, number of
casualties, etc.
Mean = np
Variance= np (Mean = Variance)
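A minimal Poisson sketch in Python; the average rate m is an assumed value (e.g., defects per batch):

```python
from scipy.stats import poisson

m = 2.5                                   # assumed average rate

print(poisson.mean(m), poisson.var(m))    # mean = variance = m
print(poisson.pmf(0, m))                  # P(no defect in a batch)
print(1 - poisson.cdf(4, m))              # P(more than 4 defects)
```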
Normal Distribution: It was first derived by Abraham De Moivre in 1733 and later developed by Carl Friedrich Gauss and Laplace.
Properties:
1. The distribution is symmetrical, hence it resembles a bell-shaped curve.
2. The normal curve is symmetrical and is unimodal.
3. The two tails of the curve do not touch the axis, that is, they continue to extend indefinitely.
4. The total area under the normal curve is 1.
5. Mean = Median = Mode.
6. Standard Normal Distribution is the random variable which has a normal distribution with
mean equal to zero and standard deviation equal to one.
The Poisson distribution tends to the normal distribution when m is large (the normal is its limiting form).
For the standard normal distribution: Mean = 0, Standard Deviation = 1, Variance = 1.
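A short Python check of these standard normal values using scipy's frozen distribution:

```python
from scipy.stats import norm

z = norm(loc=0, scale=1)                # standard normal: mean 0, SD 1

print(z.mean(), z.std(), z.var())       # 0.0 1.0 1.0
print(z.cdf(1.96) - z.cdf(-1.96))       # ~0.95 of the area lies within +/-1.96
print(z.cdf(0))                         # 0.5: the curve is symmetric about 0
```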
INDEX NUMBERS
First Price Index was given in Italy in 1754. It was used to compare price changes in the time
period 1750 to 1760.
1. Simple Index Numbers: It is an Unweighted Index Number.
Simple Aggregative Method = (ΣP1 / ΣP0) × 100
Price Relatives: First, the price relatives are calculated using the formula:
Price Relative = (P1 / P0) × 100
Then a simple average of the price relatives is taken, that is,
(ΣPrice Relatives) / n
Laspeyres' Index (L): P01 = (ΣP1Q0 / ΣP0Q0) × 100
Under this, the base year quantities are taken as weights. The index has an upward bias as it
overestimates the price changes.
Paasche's Index (P): P01 = (ΣP1Q1 / ΣP0Q1) × 100
Under this, the current year quantities are taken as weights. It underestimates the price changes
and as a result has a downward bias.
Dorbish-Bowley Index: (L + P) / 2
It uses both the current and base year quantities as weights. It is the arithmetic mean of the
Laspeyres and Paasche indices.
Fisher's Ideal Index: √(L × P)
It is the geometric mean of the Laspeyres and Paasche indices. It is considered an ideal index
because (see the sketch after this list):
1. Both current year and base year quantities are taken.
2. It cancels out the upward and downward biases.
3. It satisfies the time reversal and factor reversal tests.
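A minimal Python sketch computing the three indices on assumed price/quantity data, with a check of Fisher's time reversal property:

```python
import numpy as np

p0 = np.array([10.0, 8.0, 5.0])    # base year prices (assumed)
q0 = np.array([30.0, 15.0, 20.0])  # base year quantities (assumed)
p1 = np.array([12.0, 10.0, 6.0])   # current year prices (assumed)
q1 = np.array([25.0, 18.0, 22.0])  # current year quantities (assumed)

L = (p1 @ q0) / (p0 @ q0) * 100    # base year quantities as weights
P = (p1 @ q1) / (p0 @ q1) * 100    # current year quantities as weights
F = np.sqrt(L * P)                 # Fisher: geometric mean of L and P

print(round(L, 2), round(P, 2), round(F, 2))

# Time reversal check for Fisher (working without the *100 factor): P01 * P10 = 1
F01 = np.sqrt((p1 @ q0) / (p0 @ q0) * (p1 @ q1) / (p0 @ q1))
F10 = np.sqrt((p0 @ q1) / (p1 @ q1) * (p0 @ q0) / (p1 @ q0))
print(round(F01 * F10, 6))         # 1.0
```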
Marshall-Edgeworth Index: P01 = [ΣP1(Q0 + Q1) / ΣP0(Q0 + Q1)] × 100
It uses the sum of the base year and current year quantities as weights.
Kelly's Index: P01 = (ΣP1q / ΣP0q) × 100, where q is a fixed quantity.
It is also known as the fixed weight aggregative method.
Tests of Adequacy of Index Numbers
Unit Test
Formula for index number construction should be independent of the units in which prices and
quantities are quoted.
Apart from simple unweighted aggregative index, this test is satisfied by all other indices.
Time Reversal Test: P01 × P10 = 1
It was given by Fisher. With the base reversed, the two indices should be reciprocals of each
other, that is, interchanging the time periods should not lead to inconsistent results. It is satisfied
by: 1. Fisher's Index 2. Marshall-Edgeworth Index
3. Simple geometric mean of price relatives 4. Aggregates with fixed weights
5. Weighted geometric mean
Factor Reversal Test: P01 × Q01 = V01 (given by Fisher).
On interchanging the prices and quantities, the results should be consistent, that is, the product
of the price index and the quantity index should equal the value index.
This test is satisfied only by Fisher's Index.
Circular Test: P01 × P12 × P20 = 1
It is an extension of the time reversal test and also tests the shiftability of the base. The index
should be able to adjust index values from period to period without referring back to the original
base. This test is satisfied by: 1. Simple geometric mean of price relatives
2. Weighted aggregative with fixed weights.
Splicing of Index Numbers: Joining an old index series with a new series so that they form one
continuous series.
Deflating of Index Numbers: Adjusting a money value series for price changes (e.g., converting
money wages into real wages).
TIME SERIES
Time series analysis studies how the value of a variable changes over a period of time. It is used
for forecasting future values.
Components of Time Series are:
1. Secular Trend (T): Persisting movement of any variable over a period of time.
2. Seasonal Variations (S): Repetitive movements in every season.
3. Cyclical Variation (C): Business Cycles
4. Irregular Variations (I): Completely random
STATISTICAL INFERENCE
Statistical Inference is that branch of statistics where we use probabilities to deal with
uncertainty in decision making. It involves:
1. Hypothesis Testing
2. Estimation
Hypothesis: It is a general statement made about any relationship. In other words, it is a tentative
assumption made about any relationship/population parameter.
Null Hypothesis (H0): The null hypothesis is stated for the purpose of testing or verifying its
validity. It assumes that there is no difference between the population parameter and the sample
statistic, and that any observed difference is due to chance.
Alternate Hypothesis (H1): It includes any other admissible hypothesis, other than null
hypothesis. Alternate hypothesis is accepted when the null hypothesis is rejected.
Level of Significance: It is the probability of committing Type I error.
Power of a Test: It indicates how well the test performs and depends mainly on the Type II error
(β). The power of a test is measured by 1 − β.
Two Tailed Test: In this, the critical region lies on both sides. It does not tell us whether the value
is less than or greater than the desired value. The rejection region under this test is taken on both
the sides of the distribution. For example,
H0: β = 100
H1: β≠ 100
One Tailed Test: Under this, H1 can either be greater than or less than the desired value. The
rejection region, under this test, is taken only on one side of the distribution.
H0: β = 100
H1: β > 100
Any value calculated from sample data is known as a sample statistic, and a value calculated
from the population is known as a population parameter.
Standard error of sampling distribution tells us the standard deviation of a sampling distribution.
Standard error of the sample mean = σ / √n
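A one-line Python sketch of this formula with assumed values:

```python
import numpy as np

sigma = 15.0    # assumed population standard deviation
n = 100         # assumed sample size

se = sigma / np.sqrt(n)
print(se)       # 1.5: the SD of the sampling distribution of the mean
```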
Steps of Testing Hypothesis
1. Set your hypothesis.
2. Choose a level of significance
3. Choose a test
4. Make calculations or computations using the test.
5. Decision making stage.
Estimators
There are two kinds of estimators
1. Point estimators
2. Interval estimators: They give confidence intervals, that is, a lower and an upper limit within
which the parameter is expected to lie.
Properties of estimators
1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency: The estimator should use the maximum information in the sample about the population parameter.
Testing of Attributes
To test attributes, we calculate the standard error and the difference between the observed
and the expected values. If the difference is greater than the permissible multiple of the standard
error at a particular level of significance, we reject the null hypothesis. If the difference is within
that limit, we accept the hypothesis.
T-Test: It was given by William Sealy Gosset, who published it in 1908 under the pen name 'Student'; hence it is also known as Student's t-test.
Assumptions
a. Number of observations is less than 30.
b. Random sampling distribution is approximately normal.
c. It is used when standard deviation of population is not known.
If the calculated value is greater than the table value, the difference is significant and we reject
the null hypothesis. If the calculated value is less than the table value, we accept the null
hypothesis, implying that there is no significant difference.
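For illustration, a one-sample t-test sketch in Python on small assumed data (n < 30, population SD unknown); comparing the p-value with α is equivalent to comparing the calculated t with the table value:

```python
from scipy.stats import ttest_1samp

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # assumed data
mu0 = 12.0                       # hypothesized population mean (H0)

t_stat, p_value = ttest_1samp(sample, mu0)
print(round(t_stat, 3), round(p_value, 3))

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the difference is significant")
else:
    print("Accept H0: no significant difference")
```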
Properties of t-distribution
1. The distribution is lower at the mean and flatter in the tails; the t-distribution has a greater
area under its tails than the normal distribution.
2. It is symmetrical, but its variance is greater than one (it is more spread out than the standard
normal). Increasing the degrees of freedom moves the t-distribution towards the normal distribution.
Z-test: The test was given by Fisher and is used when the population standard deviation is known.
The test is used when we need to identify whether two samples come from the same population
or not.
Assumptions:
1. Sample size is large, that is, n > 30.
2. Population variance is known.
3. Population is normally distributed.
Applications of Z-Test
Z-test is used to compare the sample mean to a hypothesized mean of the population in case
of large samples.
It is also used to test the difference between the mean of two samples, assuming that they
have been drawn from the same population.
It is also used to test the difference between two sample means when the samples are drawn
from two different populations.
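A minimal Python sketch of the two-sample case with known population variances; all numbers are assumed for illustration:

```python
import math
from scipy.stats import norm

# Two large samples with known population variances (assumed values).
mean1, var1, n1 = 52.0, 36.0, 100
mean2, var2, n2 = 50.0, 49.0, 120

z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
p_value = 2 * (1 - norm.cdf(abs(z)))    # two-tailed test

print(round(z, 3), round(p_value, 4))   # reject H0 at 5% if |z| > 1.96
```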
Uses of Z-Test are:
1. To test the significance of 'r'
2. To test the significance of the difference between two independent correlation coefficients
derived from different samples.
F-test: The test was given by R.A. Fisher in the 1920s and is closely related to ANOVA. It is also
known as the Variance Ratio Test.
Assumptions:
1. Normality: Normal Distribution
2. Homogeneity: Variance in each group should be equal for all groups.
3. Independence of error: Variance of each value around its mean should be independent.
Use: To find out whether two independent estimates of the population variance differ
significantly, or whether two samples may be regarded as drawn from normal populations
having the same variance.
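A minimal variance-ratio sketch in Python on two small assumed samples, using scipy's F distribution for the p-value:

```python
import numpy as np
from scipy.stats import f

s1 = np.array([23.0, 25.0, 28.0, 30.0, 32.0])   # assumed sample 1
s2 = np.array([24.0, 24.5, 25.0, 25.5, 26.0])   # assumed sample 2

v1 = s1.var(ddof=1)                     # unbiased sample variances
v2 = s2.var(ddof=1)
F = max(v1, v2) / min(v1, v2)           # larger variance in the numerator

df1 = len(s1) - 1 if v1 >= v2 else len(s2) - 1
df2 = len(s2) - 1 if v1 >= v2 else len(s1) - 1
p_value = 2 * (1 - f.cdf(F, df1, df2))  # two-tailed

print(round(F, 3), round(p_value, 4))
```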
Chi-Square Test
It is a non-parametric test and does not make any assumptions about the population from which
the samples are drawn. It was first used by Karl Pearson in 1900.
χ² = Σ(O − E)² / E
where O is the observed frequency and E is the expected frequency.
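For illustration, a goodness-of-fit sketch in Python; the frequencies are assumed (e.g., 60 throws of a die):

```python
from scipy.stats import chisquare

observed = [8, 12, 9, 11, 6, 14]           # assumed observed frequencies
expected = [10, 10, 10, 10, 10, 10]        # expected under a fair die

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 3), round(p_value, 3))   # chi2 = sum((O - E)^2 / E)
```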
Application of Chi-Square tests:
1. It is used to test the discrepancies between the observed frequencies and the expected
frequencies.
2. It is also used to test the goodness of fit.
3. It is used to determine the association between two or more attributes.
Features of the test:
1. It is a test of independence.
2. It is a test of goodness of fit.
3. It is a test of homogeneity, that is, whether two or more samples are drawn from the same
population or from different populations.
4. Chi-Square distribution is skewed to the right and the skewness can be reduced by increasing
the degrees of freedom.
5. Value of χ2 is always positive and upper limit is infinity.
6. Yates' correction (developed in 1934) is applied to reduce the inflated difference between the
observed and theoretical frequencies.
QUESTIONS FOR CLARIFICATION
1. Consider the following statements
The coefficient of correlation
1.Is not affected by a change of origin and scale
2.lies between -1 and +1
3.Is a relative measure of linear association between two or more variables
Codes
A. 1,2 and 3 B. 1 and 3
C. 2 and 3 D. 1 only
2. Which of the following is referred to as lack of peakedness?
A. Skewness B. Kurtosis
C. Moments D. Mode
3. Which of the following statements is true?
1.When the margin of error is small, the confidence level is high
2.When the margin of error is small, the confidence level is low
3.A confidence interval is a type of point estimate
4.A population mean is an example of a point estimate
Codes:
A. 1 only B. 2 only
C. 4 only D. None of the above
4. In case of high income inequality, the income distribution is
A. A symmetric distribution
B. U shaped distribution
C. Inverted J-Shaped distribution
D. None of the above
5. The basic construction of a price index number involves which of the following steps?
1.Selection of a base year and price of a group of commodities in that year
2.Prices of a group of commodities for the given year that are to be compared
3. Changes in the price of the given year are shown as percentage variation from the base year
4. The index number of the price of a given year is denoted as 100
Codes:
A. 1 ,2 and 3 B. 2,3 and 4
C. 1,3 and 4 D. 1,2 and 4
Answers:-
1) B 2) A 3) D 4) C 5) A