UGC NET ECONOMETRICS MATERIAL
INTRODUCTION TO ECONOMETRICS 
The term “Econometrics” was given by Ragnar Frisch. The study is a combination of 
mathematics and statistics applied together to understand the economic relationships 
between the variables taken into consideration. 
Theoretical Econometrics: It is concerned with the development of appropriate methods for measuring the economic relationships specified in econometric models.
Applied Econometrics: It uses the tools of theoretical econometrics to study a special field of economics.
Methodology used in Econometrics 
Following are the steps which are followed to identify any economic relationships in 
econometrics: 
1. Statement of the theory/hypothesis
2. Specification of the mathematical model
3. Formation of the econometric model
4. Obtaining the data
5. Estimation of the model, that is, identifying the numerical values of the parameters
6. Hypothesis testing
7. Forecasting or prediction
TYPES OF DATA 
Data is classified into following types:
Time series data: refers to the series of data for one variable over a period of time. For example, to see the trend of inflation levels in the economy, the inflation rates of the past few years are collected and analyzed.
Cross-sectional data: refers to the data collected for one or more variables at a single point of time. For example, to analyze the standard of living of people for the year 2013-14, one may collect data related to income levels, education levels, health status, etc. for the same year, that is, 2013-14.
Pooled cross-section and time series data: combines cross-sectional observations taken at different points of time. For example, variables observed in 2005 combined with the same variables observed in 2010.
Panel data: is multidimensional data for more than one variable, in which the same cross-sectional units are observed over a period of time. In simple words, it is the combination of both time series data and cross-sectional data.
REGRESSION 
The term regression was introduced by Francis Galton. Regression analysis studies the dependence of one variable on one or more independent variables and tries to estimate it.
Dependent variable is also known as the Explained variable, Regressand, Endogenous variable, 
controlled variable, Predictand variable, Response. 
Independent variable is also known as the Explanatory variable, Regressor, Predictor, Stimulus, 
Exogenous variable, control variable. 
To explain the concept of dependent and independent variables, consider
E(Y | Xi)
The above expression is known as the Conditional Expected Value or Conditional Mean. It gives the expected (average) value of Y given the value of X; hence Y is the dependent variable and X is the independent variable.
Further elaborating it,
E(Y | Xi) = β1 + β2Xi
 This is the Population Regression Function (PRF), also known as the Conditional Expectation Function.
 The Population Regression Line is the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable.
An individual observation deviates from its conditional mean by a stochastic term μi, which gives the stochastic specification of the PRF:
yi – E(Y | Xi) = μi
yi = E(Y | Xi) + μi
This can be expressed as: yi = β1 + β2Xi + μi
However, for samples, we have the Sample Regression Function (SRF): yi = β̂1 + β̂2Xi + μ̂i
Where: β̂1 and β̂2 are the sample estimators of the population parameters β1 and β2
μi is the random component, which is unexplained, unsystematic and cannot be determined; the residual μ̂i is its sample counterpart.
β1 + β2Xi is the explained or systematic part which can be determined.
Relevance of Stochastic term (μi) 
1. To incorporate the randomness in human behavior. 
2. Principle of parsimony, that is, the model is kept as simple as possible by including only the key explanatory variables; minor influences are left to the stochastic term.
3. Vagueness of the theory 
4. Unavailability of the data 
5. Poor proxy variables 
6. Interdependence of variables 
OLS METHOD 
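OLS was introduced by Carl Friedrich Gauss.
According to the method, Σ(yi − ŷi)² should be minimum.
We have to choose the Sample Regression Function in such a manner that it is as close as possible to the actual values of y. The choice among the lines has to be made on the basis of the least squares criterion. This implies that the smaller the deviations from the line, the better the fit of the line.
Minimising Σ(yi − β̂1 − β̂2Xi)² with respect to β̂1 and β̂2 gives:
β̂1 = ȳ − β̂2X̄
β̂2 = Σ(Xi − X̄)(yi − ȳ) / Σ(Xi − X̄)²
Where: X̄ and ȳ are the sample means. Writing the deviations from the means as xi and yi, the slope is often abbreviated as Σxiyi / Σxi².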
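The OLS formulas above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `ols_simple` and the data are assumptions made for the example, not part of the original notes.

```python
def ols_simple(x, y):
    """Return (b1, b2) for the fitted line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-products of deviations over sum of squared x deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical data lying exactly on y = 2 + 3x
x = [1, 2, 3, 4]
y = [5, 8, 11, 14]
b1, b2 = ols_simple(x, y)
print(b1, b2)
```

Because the illustrative data contain no error term, the estimates recover the intercept 2 and the slope 3.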
Assumptions of OLS/Simple Linear Regression Model 
Following are the assumptions of OLS Model: 
1. Model should be linear in the parameters.
2. X values are fixed in repeated sampling.
3. Mean value or expected value of the random term is zero.
E(ui | Xi) = 0
4. No auto-correlation between the random terms.
Cov(ui, uj) = 0, for i ≠ j
5. Homoscedasticity or constant variance of the error term.
Var(ui | Xi) = E[ui – E(ui | Xi)]² = σ²
The variance of the error term about its mean is constant for all values of X. If the assumption is violated, then this implies the presence of heteroscedasticity.
6. No multi-collinearity, that is, explanatory variables should not be related to each other. In other 
words, one variable cannot be expressed in terms of other variables. 
7. Specification of the model should be correct. 
8. Error term is independent of the explanatory variables. 
Cov(ui, Xi) = 0
9. Number of observations in the sample should be greater than the number of parameters 
estimated. 
10. Variance of the explanatory variable should be a positive number.
Properties of Estimates 
1. Small Sample or Finite Sample Properties: 
Unbiasedness: An estimator is unbiased if its expected value is equal to the true value of the parameter. Therefore, the property can be represented as E(β̂) = β
Linearity: The estimator is a linear function of the sample observations of the dependent variable.
Minimum Variance: Among all linear unbiased estimators, the OLS estimator has the minimum variance.
Efficiency: When the properties of unbiasedness and minimum variance both hold true, the estimator is said to be efficient.
Sufficiency property: An estimator is sufficient if it utilizes all the information a sample contains about the true parameter. For example, the sample mean is a sufficient estimator.
Minimum 'Mean Square Error' (MSE): An estimator is a minimum MSE estimator if it has the smallest mean square error, i.e., MSE = E(β̂ − β)² = Var(β̂) + [Bias(β̂)]²
An estimator that is linear, unbiased and of minimum variance is called BLUE, that is, the Best Linear Unbiased Estimator.
2. Large Sample/Asymptotic properties: 
When the sample size increases indefinitely, these properties should be satisfied:
Asymptotic unbiasedness: This means that the asymptotic mean of the estimator should be equal to the true value of the parameter. An estimator that is biased in finite samples may still be asymptotically unbiased.
Consistency: An estimator is consistent if, as the sample size tends to infinity, it satisfies two (sufficient) conditions:
• It is asymptotically unbiased
• Its variance tends to zero
Asymptotic efficiency: An estimator b̂ is asymptotically efficient if:
1. b̂ is consistent
2. It has the smallest asymptotic variance as compared to any other consistent estimator.
In short, as the sample size increases, the variance of a consistent estimator shrinks and the estimator converges to the true value of the parameter.
Gauss Markov Theorem
If all the assumptions of the classical linear regression model are fulfilled, then the OLS estimates that we obtain are linear, unbiased and have minimum variance among all linear unbiased estimators; in other words, they are best linear unbiased estimators. These properties are also called BLUE properties.
Multiple Linear Regression Model 
Multiple Linear Regression Model is used when there are more than one explanatory variables in 
the model. For example, 
y = β1 + β2X1i + β3X2i + ui 
Where: β2 and β3 are the partial slope coefficients 
The assumptions in case of Multiple Linear Regression Models are the same as those of the simple OLS model.
However, under both Simple and Multiple Linear Regression Models, if any of the assumptions are violated, then one of the following three problems can arise:
1. Multi-collinearity
2. Heteroscedasticity
3. Auto-correlation
MULTI-COLLINEARITY 
The term multi-collinearity was given by Ragnar Frisch. When explanatory variables are related to 
each other, it implies existence of multi-collinearity in the model. 
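1. Perfect Multi-collinearity
It is the case when there is an exact linear relationship among the explanatory variables, that is, there exist constants β1, β2, ..., βn, not all equal to zero, such that:
β1X1i + β2X2i + .............. + βnXni = 0
In this case, one explanatory variable can be expressed exactly as a linear combination of the others.
2. Imperfect Multi-collinearity
It is the case when one explanatory variable cannot be perfectly converted into another, because the relationship holds only up to a stochastic error term vi:
β1X1i + β2X2i + .............. + βnXni + vi = 0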
Causes of Multi-collinearity
1. Problems in the specification of the model or nature of specification of the model.
2. Data collection methods may impose a limiting range on the values
 taken by the regressors in a population.
3. Constraint on the model may also result in the problem of multi-collinearity.
4. An over-determined model also exhibits the problem of multi-collinearity. If the number of explanatory variables is greater than the number of observations, we will not be able to get unique solutions.
5. Lagged values of the variables included in the model also lead to multi-collinearity. In other 
words, multi-collinearity is more commonly found in time series data.
Consequences of Multi-collinearity
1. Even though the OLS estimates are BLUE, they have large variances and covariances.
2. Confidence intervals are widened, leading to the acceptance of the null hypothesis (H0) more frequently.
3. The t-ratios of one or more coefficients tend to be statistically insignificant.
4. R² can be very high even though the t-ratios are insignificant.
5. OLS estimators and their standard errors can be very sensitive to small changes in the data.
Detection of Multi-collinearity
Following are the signs which could help locate multi-collinearity in our models: 
1. High R² and few significant t-ratios.
2. High pair-wise correlations among the regressors will also indicate multi-collinearity. This is only a sufficient condition and not a necessary one, which means that multi-collinearity can exist even if the pair-wise correlation coefficients are low.
3. Frisch's Confluence Analysis: It regresses the dependent variable on each of the explanatory variables separately, thus obtaining all the possible simple regressions, and examines the results on the basis of:
• A priori knowledge
• Statistical criteria
4. Auxiliary Regression: Under this we regress each explanatory variable on the remaining explanatory variables and compare the resulting R² with that of the main regression. Steps followed are:
Step 1: Regress y on all the Xs and calculate R².
Step 2: Exclude one X and regress it on the other variables (Xs).
Step 3: Calculate the R² of this auxiliary regression, RA².
Step 4: Apply Klein's Rule of thumb: if RA² > R², then multi-collinearity may be a troublesome problem.
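Klein's rule can be illustrated for the special case of exactly two regressors, where the auxiliary R² is simply the squared correlation between X2 and X3 and the overall R² can be written in terms of simple correlations. The sketch below is an illustration under that assumption; the function names and the data are hypothetical, and the formula breaks down (division by zero) under perfect collinearity.

```python
from math import sqrt


def corr(a, b):
    """Simple (Pearson) correlation coefficient between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def klein_check(y, x2, x3):
    """Return (overall R^2, auxiliary R^2, flag) for a two-regressor model."""
    r_y2, r_y3, r_23 = corr(y, x2), corr(y, x3), corr(x2, x3)
    # Auxiliary regression of X2 on X3: its R^2 is the squared correlation
    r2_aux = r_23 ** 2
    # Overall R^2 of y on X2 and X3, expressed through simple correlations
    r2_overall = (r_y2 ** 2 + r_y3 ** 2 - 2 * r_y2 * r_y3 * r_23) / (1 - r_23 ** 2)
    return r2_overall, r2_aux, r2_aux > r2_overall


# Hypothetical data: uncorrelated regressors and y = 1 + 2*x2 + 3*x3 exactly
x2 = [-1, 1, -1, 1]
x3 = [-1, -1, 1, 1]
y = [-4, 0, 2, 6]
r2_overall, r2_aux, flagged = klein_check(y, x2, x3)
print(r2_overall, r2_aux, flagged)
```

Here the regressors are uncorrelated, so the auxiliary R² is zero and the rule does not flag multi-collinearity.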
Remedies for correcting multi-collinearity
1. Using a priori information
2. Dropping a variable. 
3. Transformation of a variable: 
Given a set of data, if there is a problem of multi-collinearity, it can also be solved using the 
first difference form. 
4. Additional or new data: 
 Since multi-collinearity is a sample phenomenon, it can be reduced by taking another sample 
or by increasing the sample size. 
5. Reducing polynomial regressions:
 Having fewer polynomial terms in the regression equation also takes care of multi-collinearity.
6. Combining cross-section and time series data:
 In the face of multi-collinearity, one method of reducing it is pooling of time-series and cross-section data. This is done by estimating a regression coefficient from cross-section data and then incorporating it in the original regression equation.
QUESTIONS FOR CLARIFICATION
1. To test the stationarity of the series in time series analysis, the following test will be used
A. Unit Root test
B. Random Walk test
C. Cochrane-Orcutt iterative procedure test
D. Durbin-Watson statistic test
2. By solving which of these simultaneous equations we obtain the least squares estimators
A. Non normal equations
B. Normal equations
C. Linear equations
D. Non linear equations
3. Multiple linear regression models
A. Are linear in parameter and linear in variables
B. Are linear in parameter and may not be linear in variables
C. May not be linear in parameter but are linear in variables
D. May not be linear in parameter and variables
4. The assumption of no multi-collinearity means that
A. There should be no correlation among the regressors
B. There should be no linear relationship among the regressors
C. There should be linear relationship among the regressors
D. There should be no relationship among the regressors.
5. In a multiple regression with three independent variables, the regression coefficients are to be tested. Which test would be used?
A. Z test B. F test
C. Chi Square test D. t test
6. Multiple coefficient of determination measures the 
A. Goodness of fit of multiple regression model
B. Homoscedasticity of multiple regression model
C. Heteroscedasticity of multiple regression model
D. Multicollinearity of multiple regression model
7. Which of the following is true concerning standard regression model
A. Y has a probability distribution
B. X has a probability distribution
C. The disturbance term is assumed to be correlated with x
D. For an adequate model the residual will be zero for all sample data points
8. If OLS is applied separately to each equation that is part of a simultaneous equation system the 
resulting estimates will be
A. Unbiased consistent
B. Biased consistent
C. Biased and inconsistent
D. Unbiased and inconsistent 
Answers:-
1) A 2) B 3) B 4) B 5) D 6) A 7) A 8) C
HETEROSCEDASTICITY 
 The term heteroscedasticity means that the variance of the error terms is not constant, that is, different error terms have different variances.
σi² = f(Xi)
So, heteroscedasticity is the problem of fluctuating variances of the error terms. Moreover, the variance of the error term depends on the value of the explanatory variables. It is the rule in cross-section data and very rare in time series data.
Causes of Heteroscedasticity 
1. All the models pertaining to learning skills, or error-learning models, exhibit heteroscedasticity.
2. Economic variables such as income and wealth also show heteroscedasticity because as income or wealth increases, so does the discretion over how to use it.
3. Some economic variables exhibit skewness in their distribution.
4. Data collection techniques improve with time. As a result, constant variance is not found for those economic variables where data collecting techniques are changing very fast.
5. Specification errors in the models also lead to heteroscedasticity.
6. Incorrect data transformation methods also lead to heteroscedasticity.
7. Outliers in the data may result in the problem of heteroscedasticity.
Consequences of heteroscedasticity 
1. The estimators are still linear, unbiased and consistent.
2. However, the estimators are no longer BLUE because they are no longer efficient; therefore, they are not the best estimators. The estimators do not have minimum variance.
Tests of Heteroscedasticity 
Graphical Method:
Under this method, the squared residuals (ûi²) are plotted against the X-variable to observe whether there is any systematic pattern. If the graph shows some pattern, it would imply that heteroscedasticity is present in the model.
Park Test:
Under this test, we assume that the variance is a function of the explanatory variable (X), with the log-linear form:
ln ûi² = α + β1 ln Xi + vi
Step 1: Run OLS and calculate the residuals ûi.
Step 2: Run the regression of ln ûi² on ln Xi as given in the above equation.
Step 3: Apply the t-test; if β1 turns out to be significantly different from zero, then heteroscedasticity is present.
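The second stage of the Park test can be sketched as follows, assuming the residuals from the first-stage OLS are already available. The helper names (`ols_with_t`, `park_test`) and the numbers are illustrative assumptions; the sketch requires X > 0 and non-zero residuals so that the logarithms exist.

```python
from math import log, sqrt


def ols_with_t(x, y):
    """Simple OLS of y on x; returns (intercept, slope, t-ratio of slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope with n - 2 degrees of freedom
    se_slope = sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def park_test(x, resid):
    """Regress ln(resid^2) on ln(x); the slope plays the role of beta1."""
    ln_x = [log(xi) for xi in x]
    ln_u2 = [log(u * u) for u in resid]
    return ols_with_t(ln_x, ln_u2)


# Hypothetical residuals whose spread grows with X
x = [1, 2, 4, 8]
resid = [1.0, -2.2, 3.8, -8.4]
_, park_slope, park_t = park_test(x, resid)
print(park_slope, park_t)
```

The t-ratio returned for the slope is then compared with the table value for n − 2 degrees of freedom.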
Glejser Test:
The absolute value of the residual, |ûi|, is regressed on the independent variable using different functional forms, for example:
|ûi| = β1 + β2Xi + vi
 With this test, we judge the statistical significance of β1 and β2 by any standard test. If β2 is estimated to be statistically different from zero, then heteroscedasticity is present.
 There can be two possibilities:
 If β1 = 0 and β2 ≠ 0: Pure heteroscedasticity
 If β1 ≠ 0 and β2 ≠ 0: Mixed heteroscedasticity
Spearman's Rank Correlation Test:
Following steps are followed to conduct this test:
Step 1: Fit the regression to the data and obtain the residuals ûi.
Step 2: Ignoring the sign of ûi, that is, taking the absolute values, rank both |ûi| and Xi in either ascending or descending order.
Step 3: Compute Spearman's rank correlation coefficient, rs = 1 − 6Σdi² / [n(n² − 1)], where di is the difference between the two ranks of the ith observation.
Step 4: Assuming that the population rank correlation coefficient is zero, apply the t-test:
t = rs√(n − 2) / √(1 − rs²)
with n − 2 degrees of freedom.
If the computed t-value is greater than the table value, heteroscedasticity is present.
If the computed t-value is less than the table value, there is no evidence of a relationship between Xi and |ûi|, that is, no heteroscedasticity.
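The rank correlation steps can be sketched in plain Python. The residuals and X values below are made up for illustration, and the ranking helper assumes there are no tied values (a simplification, not part of the original notes).

```python
from math import sqrt


def ranks(values):
    """Rank 1 = smallest value; assumes no ties (illustrative simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_t(resid, x):
    """Return (rs, t) for the rank correlation between |resid| and x."""
    n = len(x)
    abs_u = [abs(u) for u in resid]
    # Sum of squared rank differences
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(abs_u), ranks(x)))
    rs = 1 - 6 * d2 / (n * (n * n - 1))
    t = rs * sqrt(n - 2) / sqrt(1 - rs * rs)
    return rs, t


# Hypothetical residuals and explanatory variable
resid = [-0.5, 1.2, 0.3, -2.0, 1.0]
x = [1, 2, 3, 4, 5]
rs, t = spearman_t(resid, x)
print(rs, t)
```

For these numbers the squared rank differences sum to 14, so rs = 0.3 and t is roughly 0.54, well below usual critical values for 3 degrees of freedom.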
Goldfeld-Quandt test:
This test is applicable to large samples, with sample size greater than or equal to 30.
The following assumptions are taken into consideration while applying this test:
1. σi² = σ²Xi², that is, the error variance is positively related to the explanatory variable.
2. The ui are not auto-correlated.
3. The error term follows a normal distribution.
4. The number of observations is at least twice the number of parameters.
We follow the steps below while applying this test:
Step 1: Rank the observations in ascending order of the explanatory variable, assuming a positive relationship between the variance and X.
Step 2: Omit the c central observations and divide the remaining (n − c) observations into two groups of (n − c)/2 each.
Step 3: Regress the two groups separately and obtain the residual sum of squares (RSS = Σûi²) for each: RSS1 for the small-X group and RSS2 for the large-X group.
Step 4: Use the F-test, F = [RSS2/dof] ÷ [RSS1/dof], where dof = (n − c − 2k)/2 for each group and k is the number of parameters estimated.
If F > table value, then heteroscedasticity is present, and vice versa.
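The Goldfeld-Quandt steps can be sketched for a two-variable model (k = 2). The data are hypothetical and far smaller than the sample size the test calls for; they only serve to show the mechanics. The function names are assumptions made for the example.

```python
def rss_simple(x, y):
    """Residual sum of squares from a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c, k=2):
    """Return (F statistic, degrees of freedom of each group)."""
    pairs = sorted(zip(x, y))            # rank observations by X
    m = (len(pairs) - c) // 2            # size of each group after dropping c
    low, high = pairs[:m], pairs[-m:]
    rss1 = rss_simple(*zip(*low))        # small-X group
    rss2 = rss_simple(*zip(*high))       # large-X group
    dof = m - k                          # equals (n - c - 2k) / 2
    return (rss2 / dof) / (rss1 / dof), dof


# Hypothetical data: the scatter around the line is wider for large X
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.0, 2.1, 2.9, 4.0, 5.0, 6.0, 7.0, 9.0, 8.0, 11.0]
f_stat, dof = goldfeld_quandt(x, y, c=2)
print(f_stat, dof)
```

With these numbers RSS1 = 0.018 and RSS2 = 2.7, so F = 150 with 2 degrees of freedom in each group.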
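White's General Heteroscedasticity Test:
The main merit of this test is that it does not rely on assumptions such as normality.
Steps to perform this test are:
Step 1: Estimate the model by OLS and calculate the residuals ûi.
Step 2: Regress ûi² on the explanatory variables, their squares and their cross products:
ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi
Step 3: Calculate R², that is, the goodness of fit of this auxiliary regression.
Step 4: Use the χ² test: under the null hypothesis of homoscedasticity, n·R² follows a χ² distribution with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the constant).
 If no cross products are included, that is, (X2iX3i), then it is a pure test of heteroscedasticity. If cross products are included, then it is a test of both heteroscedasticity and specification bias.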
Remedial Measures
The problem of heteroscedasticity can be corrected when variances are known or unknown.
Case 1: When variances are known:
Generalised Least Squares (GLS) or Weighted Least Squares (WLS) methods are used when we 
know the variances. GLS is a procedure of transforming the original variables of the model in 
such a way that the transformed variables satisfy the assumptions of the classical model and 
then apply OLS to them.
 Case 2: When Variance is not known 
In such a case, we transform the model in such a way so as to obtain a functional form in which 
transformed disturbance term has a constant variance.
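As one illustration of such a transformation, assume (as in the Goldfeld-Quandt test) that σi² = σ²Xi². Dividing the model yi = β1 + β2Xi + ui through by Xi gives yi/Xi = β1(1/Xi) + β2 + ui/Xi, whose disturbance has constant variance σ². The sketch below applies OLS to the transformed variables; the function names and data are hypothetical.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def wls_variance_proportional_to_x_squared(x, y):
    """Estimate (b1, b2) of y = b1 + b2*x assuming Var(u_i) = sigma^2 * x_i^2."""
    y_star = [yi / xi for xi, yi in zip(x, y)]   # y_i / X_i
    x_star = [1 / xi for xi in x]                # 1 / X_i
    # In the transformed model the intercept is b2 and the slope is b1
    b2, b1 = ols_simple(x_star, y_star)
    return b1, b2


# Hypothetical data lying exactly on y = 2 + 3x
x = [1, 2, 4, 5]
y = [5, 8, 14, 17]
b1, b2 = wls_variance_proportional_to_x_squared(x, y)
print(b1, b2)
```

Note that the roles of intercept and slope are swapped in the transformed regression, which is why the function unpacks them in reverse order.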
AUTO-CORRELATION 
 By the term auto-correlation we mean that the error terms are related to each other over time or over space. If the value of the error term in one period is correlated with its value in another period, the error terms are said to be serially correlated or auto-correlated.
Spatial Auto-correlation: Correlation between error terms of cross-sectional units
Serial Correlation: Correlation between error terms over a period of time
 Auto-correlation is more of a time series phenomenon because an error term is likely to be related to successive error terms of the same series. It is generally not found in cross-section data; when it does occur there, it is known as spatial auto-correlation.
Positive auto-correlation is when successive error terms tend to move in the same direction.
Negative auto-correlation is when successive error terms tend to move in opposite directions.
Causes of auto-correlation 
1. More prevalent in time series data. 
2. Specification bias: Whenever we exclude an important variable or we use incorrect functional 
form, then there is a possibility of auto-correlation in the model. 
3. Cobweb Model: In case of cobweb models or whenever economic phenomenon reflects 
cobweb phenomenon, that is, supply reacts to the price with a lag of one time period, then we say 
that problem of auto-correlation arises in such data. 
4. Manipulation of data 
5. Lags: Whichever economic phenomenon is showing impact of the variables from the previous 
time period, problem of auto-correlation is likely to surface in such models. A regression model in 
which one of the explanatory variables is the lagged value of the dependent variable, it is known 
as an auto-regressive model and such models also exhibit auto correlation. 
6. Non-stationarity, that is, mean and variance are not constant. 
Impact of Auto-correlation on estimates 
1. Estimates are statistically unbiased. However, they are no longer best because they won’t have minimum variance.
2. With auto-correlated values of the error term, the variances of the OLS estimates are likely to be larger than those of other econometric methods. Therefore, the βs are still linear and unbiased, but no longer BLUE.
3. The estimates are not efficient, but they remain consistent.
4. The simple OLS formula is likely to underestimate the true variance of the estimators.
5. Confidence intervals are likely to be wider than those which are based on GLS procedure. 
Consequences of Auto-correlation 
1. t and F-tests are no longer valid and if applied will lead to incorrect results. 
2. R² is likely to be over-estimated.
 Detecting Auto-correlation 
Graphical Method 
Von-Neumann Ratio: It is the ratio of the variance of the first differences of a variable X to the variance of X. However, this method is only applicable for directly observed series and for variables that are random.
Runs Test: It is a non-parametric test.
 A run is an uninterrupted sequence of one symbol or attribute (here, the + or − sign of the residuals). The length of a run is the number of symbols or elements included in that run.
Too many runs may suggest negative serial correlation and too few runs may suggest positive serial correlation.
With N1 = number of positive residuals, N2 = number of negative residuals, N = N1 + N2 and R = number of runs:
Mean: E(R) = (2N1N2 / N) + 1
Variance: σR² = 2N1N2(2N1N2 − N) / [N²(N − 1)]
95% limits: E(R) − 1.96σR < R < E(R) + 1.96σR
 If R is within the limits, then there is no auto-correlation. However, if R is outside the limits, then auto-correlation is present.
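The runs test can be sketched as follows. The residual series is hypothetical and much shorter than one would use in practice (the normal approximation behind the 1.96 limits needs a reasonably large sample); the function name is an assumption for the example.

```python
from math import sqrt


def runs_test(resid):
    """Return (runs, mean, variance, within_limits) for the signs of resid."""
    signs = [u > 0 for u in resid]
    n1 = sum(signs)               # number of positive residuals
    n2 = len(signs) - n1          # number of negative residuals
    n = n1 + n2
    # A new run starts whenever the sign changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    half_width = 1.96 * sqrt(var)
    within = mean - half_width < runs < mean + half_width
    return runs, mean, var, within


# Hypothetical residuals: four positives followed by four negatives
resid = [0.5, 1.1, 0.3, 0.8, -0.4, -0.9, -0.2, -0.6]
runs, mean, var, within = runs_test(resid)
print(runs, mean, var, within)
```

Here there are only 2 runs against an expected 5, which falls below the lower limit and so points to positive auto-correlation.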
Durbin-Watson test (or d-statistic):
The test incorporates the following assumptions:
1. It is applicable to small samples.
2. It is based on the estimated residuals.
3. Regression model must include an intercept term.
4. The explanatory variables should be non-stochastic.
5. Disturbances (ut) are generated by the 1st order auto-regressive scheme: ut = ρut-1 + vt
6. Error terms are assumed to be normally distributed.
7. Regression model does not include lagged values of the dependent variable as one of the explanatory variables.
8. There are no missing observations in the data.
d is the ratio of the sum of squared differences in the successive residuals to the residual sum of squares (RSS):
d = Σ(ût − ût-1)² / Σût²
d ≈ 2(1 − ρ̂)
Where: ρ̂ = Σûtût-1 / Σût²
If ρ̂ = 0, this would imply d ≈ 2, there is no auto-correlation
If ρ̂ = 1, this would imply d ≈ 0, Positive auto-correlation
If ρ̂ = −1, this would imply d ≈ 4, Negative auto-correlation
Steps of DW Test:
Step 1: Calculate the residuals of the model.
Step 2: Calculate the d statistic.
Step 3: Calculate the limits of d.
Step 4: Analyse whether the value of d lies within the limits or not, and accordingly take the decision.
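The d statistic itself is straightforward to compute from a list of residuals. The sketch below uses two extreme, made-up residual series to show how d behaves; the function name is an assumption for the example.

```python
def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(u * u for u in resid)
    return num / den


# Hypothetical residuals that alternate in sign (negative auto-correlation)
d_alternating = durbin_watson([1, -1, 1, -1])
# Hypothetical residuals that never change (positive auto-correlation)
d_constant = durbin_watson([1, 1, 1, 1])
print(d_alternating, d_constant)
```

The alternating series gives d = 3 (towards 4) and the constant series gives d = 0; in practice d is compared with the tabulated lower and upper limits.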
Breusch-Godfrey Test (LM Test):
The test is also known as Durbin's M-test. It is used in the following cases:
a. Non-stochastic regressors.
b. Lagged values of the regressand among the regressors.
c. Higher order auto-regressive schemes.
d. Simple or higher order moving averages of the error terms.
Steps followed under this test are:
Step 1: Run the regression and calculate the residuals ût.
Step 2: Regress the residuals on the original Xt and the lagged residuals ût-1, ût-2, ..., ût-p.
Step 3: From this auxiliary regression, calculate the value of R².
Step 4: Apply the chi-square test, assuming that the auto-regressive scheme is of pth order: (n − p)R² ~ χ² with p degrees of freedom.
 If (n − p)R² is greater than the critical χ² value, then reject H0 (no auto-correlation) and conclude that auto-correlation is present.
Remedies for correcting Auto-correlation in the model: 
1. If the source of auto-correlation is omitted variables, then the remedy is to include those 
variables in the set of explanatory variables. 
2. If auto-correlation is because of misspecification of the mathematical form, then we need to 
change the form. 
If auto-correlation exists because of reasons other than the above mentioned two, then it is 
the case of pure auto-correlation. 
➢ If ϱ is known, then we apply GLS, transforming the original data so as to produce a 
model whose random variables satisfy the assumptions of Classical Least Squares and 
consequently the parameters can be optimally estimated with this method. 
➢ If ϱ is not known, we then try to estimate ϱ by using the 1st difference method (which assumes ϱ = 1), or we can estimate ϱ on the basis of the Durbin-Watson d statistic, using ϱ ≈ 1 − d/2.
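A minimal sketch of the second route, assuming the relation d ≈ 2(1 − ρ) from the Durbin-Watson section: the estimate of ρ is backed out from d and then used to quasi-difference a series (yt − ρyt-1), which is the kind of transformation GLS applies. The function names and numbers are illustrative assumptions.

```python
def rho_from_d(d):
    """Approximate first-order auto-correlation implied by the d statistic."""
    return 1 - d / 2


def quasi_difference(series, rho):
    """Transform a series to y_t - rho * y_(t-1); the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Hypothetical d statistic and series
rho_hat = rho_from_d(1.0)                 # d = 1 implies rho of about 0.5
y_star = quasi_difference([1, 2, 3], rho_hat)
print(rho_hat, y_star)
```

The same transformation would be applied to every variable in the model before running OLS on the transformed data.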
QUESTIONS FOR CLARIFICATION
1. Let the two regression lines be given as 3x = 10 + 5y and 4y = 5 + 15x. Then the correlation coefficient between x and y is
A. -0.40 B. 0.40 C. 0.89 D. 1.05
2. In the presence of heteroscedasticity, the best linear unbiased estimators are provided by the method of
A. Ordinary least squares B. Indirect least squares 
C. Weighted least squares D. Instrumental variables
3. To test the stationarity of the series in time series analysis, the following test will be used 
A. Unit root test B. Random walk test
C. Cochrane-Orcutt iterative procedure test D. Durbin-Watson statistic test
4. The term ‘Best’ in the best Linear unbiased estimators (Blue) implies
A. Unbiased variance of the estimators 
B. Minimum variance of the estimators
C. Average variance of the estimators
D. Maximum variance of the estimates 
5. Which of the following is true in the context of statistical tests of hypothesis for the two-variable linear regression model?
A. T²>F B. T<F C. T²=F D. T=F
6.The technique used to estimate the over-identified system of simultaneous equation is
A. Ordinary least squares
B. Maximum likelihood
C. Limited information maximum likelihood 
D. Two stage least squares
7.Coefficient of determination of a regression model
1.Explains the proportion of total variation in the values of the dependent variable
  2.It can be used to derive the estimate of the extent of variation in the value of Y that is explained 
by the random factors
3.Direction of interrelation between the dependent and independent variables
4.It explains the influence of the intercept on the dependent variable
Select the correct code from the list given below
A. 1234 B. 1243
C. 2431 D. 4132
Answers:-
1) B 2) C 3) A 4) B 5) C 6) D 7) A
SIMULTANEOUS EQUATION MODELS 
The simultaneous equation models deal with the two-way relationship between the variables.
For example:
Xi = β1 + β2Yi
Yi = α1 + α2Xi
 The above equations are known as structural or behavioural equations and β1, β2, α1 and α2 are known as structural coefficients.
Reduced form equations
 The reduced form equations are derived from the structural equations of the given model; in them, each endogenous variable is expressed solely in terms of the pre-determined (exogenous) variables and the stochastic terms.
We cannot apply OLS in a simultaneous equation model because it will lead to simultaneity bias:
1. βs will be biased and inconsistent estimators.
2. The error term and the explanatory variable become correlated.
3. βs will not be efficient estimators.
Under simultaneous equation models, the variables determined from inside the model, that is, the endogenous variables, are stochastic.
IDENTIFICATION PROBLEM 
Variables can be either exogenous or endogenous.
Structural Equations:
C = a + bY + u
Y = C + I
Substituting the second equation into the first:
C = a + b(C + I) + u
C − bC = a + bI + u
C = a/(1 − b) + [b/(1 − b)]I + u/(1 − b) … (Reduced form Equation)
Identification:
G = Total number of equations in the model (= number of endogenous variables)
K = Total number of variables in the model (endogenous and exogenous)
M = Number of endogenous and exogenous variables in a particular equation
Order Condition: (K − M) ≥ G − 1
It is a necessary condition but not sufficient.
K − M = Total number of variables excluded from a particular equation but included in other equations
G − 1 = Number of equations − 1
K − M = G − 1: Just Identified
K − M > G − 1: Over-identified
K − M < G − 1: Under-identified
 For a model to be exactly identified, all equations should be exactly identified.
Rank condition
An equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G − 1) from the coefficients of the variables excluded from that particular equation but contained in the other equations of the model.
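The order condition is a simple counting rule, sketched below. The function name is an assumption for the example; remember that the order condition is only necessary, so the rank condition still has to be checked separately.

```python
def order_condition(K, M, G):
    """Classify an equation by comparing K - M with G - 1."""
    excluded = K - M      # variables excluded from this equation
    if excluded == G - 1:
        return "just identified"
    if excluded > G - 1:
        return "over-identified"
    return "under-identified"


# Consumption function C = a + bY + u in the model with Y = C + I:
# G = 2 equations, K = 3 variables (C, Y, I), M = 2 variables in the equation
status = order_condition(K=3, M=2, G=2)
print(status)
```

For the consumption function one variable (I) is excluded and G − 1 = 1, so the equation is just identified by the order condition.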
Methods to Handle Simultaneous Equation Models 
Methods of estimation are as follows: 
1. Single Equation Method/ Limited Information Method: Each equation will be estimated 
individually, not taking into account the restrictions placed on the other equations.
2. Full information Method: Equations in the model are estimated simultaneously, taking into 
account all restrictions on such equations by the omission or absence of some variables. 
SINGLE EQUATION METHODS OF ESTIMATION 
Indirect Least Squares (ILS) Method: 
The method of ILS is used in case of Just/Exactly Identified Equations. It is the method of 
obtaining the estimates of the structural coefficients from OLS estimates of the reduced form 
coefficient. 
Assumptions in ILS are: 
1. Equation is Just/Exactly identified. 
2. There must be full information about all equations in the model. 
 3. Error term from reduced form equations should be independently, identically distributed. 
4. Equations must be linear. 
5. There should be no multicollinearity among the pre-determined variables of the reduced form equations.
Steps to be followed in applying ILS
Step 1: Obtain the reduced form equations.
Step 2: Apply OLS to the reduced form equations individually.
Step 3: Obtain the estimates of the original structural coefficients from the estimated reduced form coefficients.
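The ILS steps can be illustrated with the consumption model C = a + bY + u, Y = C + I from the identification section. Its reduced form is C = Π0 + Π1·I with Π0 = a/(1 − b) and Π1 = b/(1 − b), so b = Π1/(1 + Π1) and a = Π0(1 − b). The sketch below is an illustration with made-up, noise-free data; the symbols Π0 and Π1 and the function names are assumptions.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def ils_consumption(C, I):
    """Recover the structural (a, b) from the reduced form C = pi0 + pi1 * I."""
    pi0, pi1 = ols_simple(I, C)   # OLS on the reduced form
    b = pi1 / (1 + pi1)           # from pi1 = b / (1 - b)
    a = pi0 * (1 - b)             # from pi0 = a / (1 - b)
    return a, b


# Hypothetical data generated with a = 10, b = 0.8 (so pi0 = 50, pi1 = 4)
I = [1, 2, 3, 4]
C = [54, 58, 62, 66]
a_hat, b_hat = ils_consumption(C, I)
print(a_hat, b_hat)
```

Because the equation is just identified, each structural coefficient is recovered uniquely from the reduced form coefficients.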
Properties of ILS Coefficient 
 The ILS coefficients inherit all asymptotic properties like consistency and efficiency; but the 
small sample properties such as unbiasedness do not generally hold true. 
Two Stage Least Squares (2SLS) Method
The 2SLS method was introduced by Henri Theil and Robert Basmann and is mostly used for equations which are over-identified. Under this method, OLS is applied twice. The method is used to obtain a proxy or instrumental variable for an explanatory variable that is correlated with the error term.
2SLS purifies the stochastic explanatory variables of the influence of the stochastic disturbance or random term.
The steps performed under 2SLS are:
Step 1: Regress each endogenous explanatory variable y on all the pre-determined variables of the system (the reduced form) and obtain the fitted values ŷi.
Step 2: Replace the y in the original equations by ŷ.
Step 3: Apply OLS to the transformed equation.
 Point to note is that least squares is applied twice: first to the reduced form and then to the transformed structural equation.
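The two stages can be sketched for the simplest case of one endogenous regressor and one pre-determined variable, here the consumption function C = a + bY + u with investment I as the only exogenous variable. The data are hypothetical and noise-free, so the sketch shows the mechanics rather than the gain over OLS; the function names are assumptions.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def two_stage_least_squares(C, Y, I):
    """Estimate (a, b) of C = a + b*Y using I as the pre-determined variable."""
    # Stage 1: regress the endogenous regressor Y on I and keep the fitted values
    g0, g1 = ols_simple(I, Y)
    Y_hat = [g0 + g1 * i for i in I]
    # Stage 2: regress C on the fitted values instead of the actual Y
    return ols_simple(Y_hat, C)


# Hypothetical data generated with a = 10, b = 0.8 and Y = C + I
I = [1, 2, 3, 4]
C = [54, 58, 62, 66]
Y = [55, 60, 65, 70]
a_hat, b_hat = two_stage_least_squares(C, Y, I)
print(a_hat, b_hat)
```

With several endogenous regressors or pre-determined variables, each stage becomes a multiple regression, but the logic is the same.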
Features of 2SLS 
1. The method can be applied to an individual equation in the system without taking into account 
the other equations. 
2. This method is suitable for over-identified equations because it gives one estimate per 
parameter. 
3. This method can also be applied to exactly identified equations, but in that case ILS estimates will be equal to 2SLS estimates.
4. If R² values in the reduced form (first-stage) regressions are very high, then OLS and 2SLS estimates will be very close. If R² values in the first-stage regressions are low, then 2SLS estimates will be practically meaningless.
DYNAMIC ECONOMETRIC MODELS 
  The dynamic econometric models include both the lag and the time element in it. They are of two 
types: 
1. Auto-Regressive Models: These models include the lagged values of the 
dependent/endogenous variable. Example: 
Yt = β1Yt-1 + β2Yt-2 + ......... + βp Yt-p + ut
2. Distributed Lag Models: These models include the lagged values of the explanatory variables. 
If the length of the lag is defined, then it is known as ‘Finite Distributed Lag Models’. If we don’t 
know the length of the lag or it is infinite, then it is known as ‘Infinite Distributed Lag Models’. 
 Main reasons for including lags in a model are: 
1. Psychological Reasons
2. Technological reasons
3. Institutional reasons
1. Psychological Reasons: As a result of habit formation, consumers do not change their 
consumption habits instantly. There is adjustment to changes in income, fashions, etc. over a 
period of time. Therefore, to study the impact of any variable on dependent variable, we need to 
consider lag in time. 
2. Technological reasons: In production, the techniques of production cannot be changed instantly. So there is an expected time lag in adjusting output to any change in the demand for a product. 
3. Institutional reasons: Most of the time, especially in production, firms have contractual obligations with labourers as well as with the suppliers of raw material, which prevent them from adjusting immediately. 
Koyck Approach 
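• Assumes that the coefficients of the distributed lag model decline geometrically: βk = β0 λ^k, k = 0, 1, 2, ... 
• Assumes an infinite distributed lag model. 
• 0 < λ < 1, which implies that values from the distant past are given less weight than recent ones. 
• The sum of the βs is finite and equals β0/(1 - λ).
• The infinite lag model has infinitely many coefficients and λ enters it nonlinearly, so least squares cannot be applied to it directly. The model is therefore transformed. 
• We lag the model by one period, multiply it by λ and subtract it from the original model. 
• The result, Yt = α(1 - λ) + β0 Xt + λYt-1 + vt, where α is the intercept of the original model and vt = ut - λut-1, is an auto-regressive model. 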
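The geometric decline of the Koyck weights can be illustrated with a short sketch. The values β0 = 0.4 and λ = 0.5 and the function name koyck_weights are assumptions chosen only for illustration. The sketch shows the weights falling with the lag and their sum approaching β0/(1 - λ).

```python
def koyck_weights(beta0, lam, n_lags):
    """Return the first n_lags + 1 Koyck weights beta_k = beta0 * lam**k."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    return [beta0 * lam ** k for k in range(n_lags + 1)]

beta0, lam = 0.4, 0.5          # illustrative values, not taken from the text
weights = koyck_weights(beta0, lam, 20)
long_run = beta0 / (1 - lam)   # sum of the infinite geometric series

print(weights[:4])             # weights decline geometrically with the lag
print(sum(weights), long_run)  # the partial sum is close to the long-run multiplier
```

A value of λ closer to 1 makes the weights decline more slowly, so the distant past matters more.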
Problems under Koyck Approach: 
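1. The error terms of the transformed model are serially correlated, because the new error term is a moving average of the original error terms. 
2. Yt-1 appears as an explanatory variable, so it is stochastic and correlated with the error term. 
3. The Durbin-Watson d test cannot be used. Instead, we need to use the Durbin h test. 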
Rationalisation of Koyck Approach 
Adaptive Expectation: 
The concept was given by Cagan & Friedman. It is also known as the ‘Error learning hypothesis’. 
According to this approach, expectations are revised each period by a fraction λ of the gap between the current actual value and the previous expectation, so present decisions depend on the past:
Xt* - Xt-1* = λ(Xt – Xt-1*), where 0 < λ ≤ 1
If λ = 1, then Xt* = Xt. 
This implies that expectations are realised immediately and fully, and hence there is perfect 
adjustment. 
This model is popularly used but has been criticised for forming expectations only from the past values of the 
variable and ignoring other information available at present. Also, the error term of the 
resulting model creates serious problems in the estimation of the model. 
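A minimal sketch of the adaptive expectations rule above, rewritten as Xt* = Xt-1* + λ(Xt - Xt-1*). The data, the starting expectation of 8 and the function name are hypothetical. It shows instantaneous adjustment when λ = 1 and gradual adjustment when λ = 0.5.

```python
def adaptive_expectations(actual, lam, initial_expectation):
    """Update expectations by X*_t = X*_{t-1} + lam * (X_t - X*_{t-1})."""
    expected = []
    previous = initial_expectation
    for x in actual:
        current = previous + lam * (x - previous)
        expected.append(current)
        previous = current
    return expected

actual = [10.0, 12.0, 12.0, 12.0]   # hypothetical observed values
print(adaptive_expectations(actual, 1.0, 8.0))  # lam = 1: expectations equal actual values
print(adaptive_expectations(actual, 0.5, 8.0))  # lam = 0.5: half of each gap is closed per period
```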
Partial Stock Adjustment Model: The concept was given by Marc Nerlove. It is based on the Flexible Accelerator Model. The model describes how the actual capital stock is adjusted gradually towards the desired capital stock: Yt - Yt-1 = δ(Yt* - Yt-1), with 0 < δ ≤ 1, where Yt* is the desired level.
QUESTIONS FOR CLARIFICATION
1. Match the following
List I                          List II
1. Box-Jenkins method           1. Causality
2. Unit-root test               2. Forecasting
3. Durbin-Watson d statistic    3. Stationarity
4. Granger test                 4. Autocorrelation
Codes:  I    II   III  IV
A.      3    1    2    4
B.      2    3    4    1
C.      1    2    3    4
D.      4    1    2    3
2. Var(X+Y) = ? (X and Y independent)
A. Var(X)+Var(Y) B. Var(X)-Var(Y) C. Var(XY) D. E(X)-E(Y)
3. Which of the following hypotheses is tested by a regression function?
A. Inter-relation between two or more variables is significantly different from zero
B. The degree and direction of inter relations between two or more variables are non zero and 
goodness of fit of the regression function is satisfactory
C. Degree of influence exercised by systematic explanatory factors is greater /lesser /equal to the 
influence exercised by random factors
D. All of the above 
4. Which of the following statements is true concerning the standard regression model?
A. Y has a probability distribution
B. X has a probability distribution
C. The disturbance term is assumed to be correlated with X
D. For an adequate model, the residual will be zero for all sample data points.
5. Which of the following is not a plausible remedy for Multicollinearity
A. Use principal component analysis
B. Drop one of the collinear variables
C. Use a longer run of data
D. Take logarithm of each of the variables
6. If X follows the standard normal distribution, then X² follows
A. T-distribution B. Chi-square distribution
C. F-distribution D. The Poisson distribution
7. In a regression r² is the ratio between
A. explained and total variation B. Explained and unexplained variation
C. Unexplained and total variation D. None of the above
8. Koyck’s approach to econometric analysis deals with relationships involving
A. Lagged explanatory variables
B. Qualitative explanatory variables
C. Exponential explanatory variables
D. None of the above
Answers:-
1) B 2) A 3) B 4) A 5) D 6) B 7) A 8) A
TIME SERIES ECONOMETRICS 
The concept of time series was developed by Engle & Granger. A time series is an 
arrangement of the values of a variable in order of time. 
Univariate: Analysing one sequence of data over a period of time
Example: Yt = F(Yt-1 , Yt-2, ….., Yt-n) + ut
Multi-variate: Analysing a number of variables over time 
Example: Yt = β1 + β2 Yt-1 + β3 Xt + ut 
Stationary Time Series 
 A time series is considered to be stationary if its mean and variance do not change over time and the covariance between two periods depends only on the lag between them. 
Mathematically, a stationary time series is one for which: 
E(Yt) = μ, which is not a function of time 
Var(Yt) = E(Yt - μ)^2 = σ^2, which is not a function of time
Cov(Yt+k, Yt) = γk, which depends only on the lag k. More generally, for two series a and b the function γab(h) is known as the cross-covariance 
function if a is not equal to b, and as the autocovariance function if a = b. 
1. Strictly Stationary: It is an extreme form of stationarity. A series (Yt, Yt+1, …., Yt+n) is 
considered to be strictly stationary if the joint distribution of any set of ‘n’ observations is the same 
as that of another set of ‘n’ observations shifted by a time lag, say k: (Yt+k, 
Yt+k+1, ……., Yt+k+n). To explain it mathematically, 
Assume we have observations at two points of time, Xt1 and Xt2. 
We introduce a lag of, say, k in the series: Xt1+k and Xt2+k. 
In a strictly stationary series, even after introducing the lag, the joint distribution is unchanged: 
F(Xt1, Xt2) = F(Xt1+k, Xt2+k) 
Also in a strictly stationary series, the continuous variables tend to be identical to the 
discrete variables, that is, 
X (t) ≡Xt 
Where: X(t) is the continuous variable and Xt is the discrete variable.
One point to remember is that in a strictly stationary series all higher order moments (where they exist) are 
constant over time. 
2. Weak Stationary: A series Yt is considered to be weakly stationary if its mean and 
variance are independent of time and its covariance (autocovariance) is a function of the lag only, not of time. 
Only these first two moments are required to be constant; higher order moments may still change with time. 
Mathematically it can be expressed as: 
E(Yt) = μ, which is not a function of time 
Var(Yt) = E(Yt - μ)^2 = σ^2, which is not a function of time 
Cov(Yt+k, Yt) = E[(Yt - μ)(Yt+k - μ)] = γk, which is a function of lag but not time 
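In practice the autocovariance γk is estimated from a sample. The sketch below is one simple way to do this, assuming the common divisor n; the data and the function name are hypothetical. The lag-0 value is the sample variance, and dividing γk by γ0 gives the autocorrelation at lag k.

```python
def sample_autocovariance(series, lag):
    """Sample autocovariance at a given lag, using divisor n."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag)) / n

y = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]   # hypothetical data
gamma0 = sample_autocovariance(y, 0)   # lag 0 gives the sample variance
gamma1 = sample_autocovariance(y, 1)
gamma2 = sample_autocovariance(y, 2)
print(gamma0, gamma1, gamma2)
print(gamma2 / gamma0)                 # autocorrelation at lag 2
```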
Non-Stationary Series 
A non-stationary stochastic process is defined as a process whose mean or variance (or both) is 
not constant over time.
For example, if we have a stochastic process Yt. It will be a non-stationary stochastic process if; 
E(Yt) = f{time(t)} 
Variance (Yt) = f{time(t)} 
Process generating Time Series
1. Random or Stochastic Process:
 In such a process, the variable can take any value at a point of time; each observation is a random draw. The simplest example is a discrete process 
which consists of a series of independently, identically distributed (iid) variables, also known 
as white noise. 
We consider a process to be a white noise process if it has zero mean, constant variance and is 
serially uncorrelated. 
Mathematically it can be expressed as below:
Mean: E(Yt) = 0
Variance: E(Yt – 0)^2 = σ^2
Covariance: Cov(Yt, Yt+k) = 0 for k ≠ 0
2. Random Walk: It is a non-stationary process, expressed as below:
Yt = Yt-1 + ut
Where: ut is our white noise error term which is independently, identically and normally 
distributed with mean equal to zero and a constant variance, that is,
ut ~ iid N(0, σ^2)
 The current value of the variable is the sum of all past error terms: Yt = Y0 + u1 + u2 + ... + ut. 
Hence, for a fixed Y0, Var(Yt) = tσ^2, which grows with time. 
 With reference to our equation above, the value of Y in time period t is equal to its value in the 
previous time period, t-1, plus a random shock. If a drift or a trend is added to the 
model, it remains non-stationary.
Random walk with drift: Yt = δ + Yt-1 + ut
where: δ is the drift parameter
ut is the white noise error term
Random Walk with drift and trend:
Yt = δ + Yt-1 + βt + ut
Where: δ is the drift parameter, 
β is the trend coefficient 
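The random walk, with and without drift, can be sketched as a running sum of shocks. In this illustration the seed, the sample size of 200, σ = 1 and the drift of 0.1 are all assumed values, and random_walk is a hypothetical helper name.

```python
import random

def random_walk(shocks, start=0.0, drift=0.0):
    """Build Y_t = drift + Y_{t-1} + u_t from a list of shocks u_t."""
    path = []
    level = start
    for u in shocks:
        level = drift + level + u
        path.append(level)
    return path

random.seed(0)                                          # fixed seed for reproducibility
shocks = [random.gauss(0.0, 1.0) for _ in range(200)]   # white noise with sigma = 1 (assumed)
walk = random_walk(shocks)
drifting = random_walk(shocks, drift=0.1)
print(walk[-1], drifting[-1])   # the drifting path ends about 200 * 0.1 = 20 higher
```

Because every shock is carried forward permanently, the effect of a shock never dies out, which is why the variance keeps growing with t.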
3. Integrated Stochastic Process:
Auto-Regressive (AR) Process: 
Any series is said to be generated by Auto-Regressive process if it is defined as :
Yt = βYt-1 + ut
 The above mentioned model is the auto regressive model of order one, denoted as AR(1). 
This is so because we include one lag term of our endogenous variable Yt .
One important point to note here is that if β = 1, then the series becomes a random walk 
process or a unit root process as explained before.
 A series is generated by auto regressive process of order p, that is, AR(p), if;
Yt = β1Yt-1 + β2Yt-2 + .......... + βp Yt-p + ut
Moving Average (MA) Process: 
A series is said to be generated by moving average if it is defined as follows;
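Yt = ut + γ1ut-1 + γ2ut-2 + …. + γqut-q
The above mentioned model is denoted as MA(q), implying Moving Average of order q.
The difference between the Auto Regressive and Moving Average processes is that an AR process expresses Yt in terms of its own lagged values, whereas an MA process expresses Yt in terms of the current and lagged values of the white noise error term.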
Auto Regressive Moving Average (ARMA): 
 A series is said to be generated by an Auto Regressive Moving Average process if it 
is defined as follows:
Yt = β1Yt-1 + β2Yt-2 + …….. + βpYt-p + ut + γ1ut-1 + γ2ut-2 + …. + γqut-q
 The above mentioned series is a combination of both the Auto Regressive model of order p 
(AR(p)) and the Moving Average model of order q (MA(q)).
 The general ARMA model was first described by Peter Whittle in his thesis in 1951. 
 He used mathematical analysis (Laurent series and Fourier analysis) together with statistical 
inference to explain ARMA models. 
 It was further popularized by George E. P. Box and Gwilym Jenkins in their 1970 book, which also introduced the method named after them (the Box-Jenkins method) for choosing and estimating ARMA models.
The process of Auto Regressive Moving Average can be summarized as
AR(p):      Yt = β1Yt-1 + β2Yt-2 + …….. + βpYt-p + ut
MA(q):      Yt = ut + γ1ut-1 + γ2ut-2 + …. + γqut-q
ARMA(p,q):  Yt = β1Yt-1 + β2Yt-2 + …….. + βpYt-p + ut + γ1ut-1 + γ2ut-2 + …. + γqut-q
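The three recursions summarized above can be traced with a small sketch that feeds a single unit shock through each model. The coefficient value 0.5, the zero pre-sample values and the function name arma_series are assumptions made for illustration.

```python
def arma_series(ar, ma, shocks):
    """Generate Y_t = sum(ar[i] * Y_{t-1-i}) + u_t + sum(ma[j] * u_{t-1-j}).

    Pre-sample values of Y and u are assumed to be zero.
    """
    y = []
    for t, u in enumerate(shocks):
        value = u
        for i, beta in enumerate(ar):
            if t - 1 - i >= 0:
                value += beta * y[t - 1 - i]
        for j, gamma in enumerate(ma):
            if t - 1 - j >= 0:
                value += gamma * shocks[t - 1 - j]
        y.append(value)
    return y

shocks = [1.0, 0.0, 0.0, 0.0]             # a single unit shock at t = 0
print(arma_series([0.5], [], shocks))     # AR(1): the effect decays gradually
print(arma_series([], [0.5], shocks))     # MA(1): the effect vanishes after one lag
print(arma_series([0.5], [0.5], shocks))  # ARMA(1,1): both features combined
```

The AR response dies out geometrically, while the MA response is exactly zero beyond lag q; this contrast is what the Box-Jenkins identification step looks for.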
Auto Regressive Integrated Moving Average (ARIMA) Process: 
Suppose that Δ^d Yt (the series Yt differenced d times) is a stationary series that can be represented by an ARMA model of order 
(p, q); then we can say that Yt can be represented by an ARIMA process of order (p, d, q).
 The model is called integrated because the stationary ARMA model which is fitted to the 
difference data has to be integrated to provide a model for non-stationary data.
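The differencing step behind the "I" in ARIMA can be sketched as follows. The series is hypothetical and has a deterministic quadratic trend, so it only illustrates the mechanics of applying the difference operator d times; difference is an assumed helper name.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series

y = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]   # hypothetical series with a quadratic trend
print(difference(y, 1))   # [3.0, 5.0, 7.0, 9.0, 11.0], still trending
print(difference(y, 2))   # [2.0, 2.0, 2.0, 2.0], constant after two differences
```

Each round of differencing shortens the series by one observation, so d differences leave n - d usable values.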
FUNCTIONAL FORMS 
A linear functional form is of the type: yi = β1 + β2Xi + ui
However, there can be following types of functional forms as well: 
1. Log-Log/ Log-Linear/ Double Log Function: Such a functional form is represented as: 
Log yi = log β1 + β2 log Xi + ui
Here, β2 is the slope coefficient and remains constant. It measures the elasticity of yi with respect to Xi, 
that is, the percentage change in yi for a one percent change in Xi. 
 This model is also known as the Constant Elasticity Model. 
2. Log-Lin/Lin-Log Models: These are the semi-log models, in which only one of the variables is in logarithmic form: log yi = β1 + β2 Xi + ui (log-lin) and yi = β1 + β2 log Xi + ui (lin-log).
3. Linear Trend Models: In such models, the regression is done on a time trend and is expressed 
as follows: 
yi = β1 + β2 t + ui
4. Reciprocal Models: These models are nonlinear in the variable Xi but linear in the parameters, so OLS can be applied after replacing Xi by its reciprocal. The model is 
expressed as: 
Yi = β1 + β2 (1/Xi) + ui
As Xi approaches infinity, Yi approaches β1 asymptotically.
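For the log-log model in item 1, the slope of the regression of log y on log X is the elasticity. The sketch below checks this on artificial data generated exactly from y = 2·X^1.5 with no error term; the data, the constants 2 and 1.5, and the helper name are assumptions for illustration.

```python
import math

def ols_slope_intercept(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]          # exact relation y = 2 * x^1.5, no error term

intercept, slope = ols_slope_intercept([math.log(v) for v in x],
                                       [math.log(v) for v in y])
print(slope)                 # estimated elasticity, close to 1.5
print(math.exp(intercept))   # estimated beta1, close to 2
```

The same helper fits the reciprocal model if the regressor passed in is 1/Xi instead of log Xi.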
CO-INTEGRATION 
 The term co-integration was introduced by Granger (1981) and developed further by Engle and Granger (1987). Co-integration 
studies the short run and long run dynamics of the series and links the short run behavior 
with the long run behavior. 
If there is a long run (equilibrium) relationship between two given series, then they are 
said to be co-integrated. 
For example, suppose Yt ~ I(1) and Xt ~ I(1). 
 If the linear combination Yt – βXt is I(0), then the two series are co-integrated. 
 If the combination is still I(1), the series are not co-integrated. 
Error Correction Mechanism: By the Granger Representation Theorem, co-integrated series can be represented by an error correction model: 
Δyt = f(ΔXt, ut-1) 
where ut-1 is the lagged equilibrium error, which is used to correct the disequilibrium. 
If ut-1 ≠ 0, the model is out of equilibrium and y changes in the next period to move back towards equilibrium.
Error correction mechanism 
 This describes the short run dynamics. If Xt, Yt are co-integrated then there is a long term 
relationship between them. However, in the short run, there may be a possibility of 
disequilibrium. So the error term can be treated as an equilibrium error and this can be used 
to tie the short run behavior to the long run value. 
Granger Representation Theorem 
 The theorem states that if two variables are co-integrated, then the relationship between the 
two can be expressed as an error correction mechanism. 
Tests of Co-integration 
1. Dickey-Fuller and Augmented Dickey-Fuller Test: When applied to the residuals of the co-integrating regression, it is called the Engle-Granger (EG) test or the Augmented Engle-Granger (AEG) test. 
2. Durbin-Watson Test: It is also called the Co-integrating Regression Durbin-Watson (CRDW) test. 
 Under this we take the null hypothesis H0: d = 0, where d is the Durbin-Watson statistic of the co-integrating regression. 
 If d = 0, the residuals have a unit root, hence they are non-stationary and the series are not co-integrated. 
 If H0 is rejected, the residuals are stationary, which implies that the series are co-integrated. 
VECTOR AUTO REGRESSION (VAR) 
 The concept was given by Sims and is related to simultaneous equation models and Granger causality. 
Under this, we consider a number of simultaneous equations and treat all variables as 
endogenous (dependent). It is a multiple time series generalization of auto-regressive models. 
 We do not have to differentiate between endogenous and exogenous variables. Estimation 
is simple: OLS can be applied to each equation separately. We can 
forecast a number of variables at a time.
 It is criticised as a-theoretic, because this kind of approach uses less a-priori information than a structural econometric model.
 It is difficult to choose the appropriate number of lags to include in the model.
QUESTIONS FOR CLARIFICATION
1. The basic construction of a price index number involves which of the following steps?
1. Selection of a base year and the prices of a group of commodities in that year
2. Prices of a group of commodities for the given year that are to be compared
3. Changes in the price of the given year are shown as percentage variation from the base year
4. The index number of price of the given year is denoted as 100
Codes:
A. 1, 2 and 3 B. 2, 3 and 4
C. 1, 3 and 4 D. 1, 2 and 4
2. If the distribution is skewed to the left, then it is
A. Asymmetrically skewed
B. Symmetrical
C. Negatively skewed
D. Positively skewed
3. Consider the following measures
1. Correlation coefficient 2. Covariance
3. Coefficient of variation 4. Index number
Which of these are unit free?
A. 1 and 2
B. 2 and 3
C. 1 and 4
D. All of the above
4. In case of high income inequality ,the income distribution is
A. A symmetric distribution
B. U shaped distribution
C. Inverted J-Shaped distribution
D. None of the above
5. Consider the following statements
1. Quartile deviation is more instructive than the range as it discards the dispersion of extreme items
2. Coefficient of quartile deviation cannot be used to compare the degree of variation in different 
distributions
3. There are 10 deciles for a series
Codes:
A. 1,2 and 3 B. 2 only
C. 3 only D. 1 only
Answers:-
1) A 2) C 3) D 4) C 5) D
Measurement scales of variables 
1. Ratio scale - for a variable A taking two values a and b, both the ratio (a/b) and the difference (a - b) are meaningful 
quantities. 
2. Interval scale - on an interval scale the difference between, say, two time periods (2000 - 1995) is meaningful, 
but the ratio of two time periods is not.
3. Ordinal scale - variables which satisfy the property of natural ordering belong to this 
category, such as a grading system (A, B, C) or income class (upper, middle, lower).
4. Nominal scale - variables in this category do not satisfy any of the properties of the ratio scale. 
Gender (male, female) and marital status (married, unmarried, divorced, separated) belong to this 
category.
Coefficient of determination 
The overall goodness of fit of the regression model is measured by the coefficient of 
determination, r². It tells what proportion of the variation in the dependent variable, 
or regressand, is explained by the explanatory variable, or regressor.
The coefficient of determination is the square of the correlation (r) between predicted y scores 
and actual y scores. This r² lies between 0 and 1; the closer it is to 1, the better is the fit.
 With simple linear regression (one regressor), the coefficient of determination is also equal to the square of the 
correlation between x and y scores.
 An r² of 0 means that the dependent variable cannot be predicted linearly from the independent variable.
 An r² of 1 means the dependent variable can be predicted without error from the independent 
variable.
 An r² between 0 and 1 indicates the extent to which the dependent variable is predictable. An r² 
of 0.10 means that 10 percent of the variance in Y is predictable from X, an r² of 0.20 means 
that 20 percent is predictable, and so on.
 A concept related to the coefficient of determination is the coefficient of correlation, r. It is a 
measure of linear association between two variables and it lies between -1 and +1.
Coefficient of determination, r² = ESS/TSS = 1 - RSS/TSS
where TSS = ESS + RSS and
TSS = Total sum of squares
ESS = Explained sum of squares 
RSS = Residual sum of squares
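The decomposition TSS = ESS + RSS and the resulting r² can be checked numerically. In this sketch the four observations are hypothetical and the fitted values are those of the OLS line y = 0.5 + 0.8x for x = 1, 2, 3, 4; r_squared is an assumed helper name.

```python
def r_squared(y, y_hat):
    """Return (TSS, ESS, RSS, r2) for actual values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return tss, ess, rss, 1 - rss / tss

y = [1.0, 3.0, 2.0, 4.0]       # hypothetical observations
y_hat = [1.3, 2.1, 2.9, 3.7]   # OLS fitted values for x = 1, 2, 3, 4
tss, ess, rss, r2 = r_squared(y, y_hat)
print(tss, ess, rss, r2)       # TSS splits into ESS + RSS; r2 is about 0.64
```

The identity TSS = ESS + RSS holds only when the fitted values come from an OLS regression that includes an intercept.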
R² and Adjusted R²
R-squared or R² measures the degree to which your input variables explain the variation of 
your output / predicted variable. So, if R² is 0.8, it means 80% of the variation in the output 
variable is explained by the input variables. So, in simple terms, the higher the R², the more 
variation is explained by your input variables and hence the better is your model.
 However, the problem with R² is that it will either stay the same or increase with the addition of 
more variables, even if they do not have any relationship with the output variable. This is 
where “Adjusted R²” comes to help. Adjusted R² penalizes you for adding variables 
which do not improve your existing model: Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k), where n is the number of observations and k is the number of estimated parameters including the intercept.
 Hence, if you are building a linear regression on multiple variables, it is always suggested that 
you use Adjusted R² to judge the goodness of the model. Adjusted R² is never larger than R²; even with only one input variable 
it is slightly lower than R² unless R² = 1.
Typically, the more non-significant variables you add into the model, the wider the gap between R² and 
Adjusted R² becomes.
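The penalty for extra regressors can be seen in a small numerical sketch. It assumes the standard formula Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k), with k counting the intercept; the R² values, n = 30 and the function name are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k estimated parameters
    (k counts the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical case: adding a regressor raises R^2 only slightly.
before = adjusted_r_squared(0.800, n=30, k=3)
after = adjusted_r_squared(0.801, n=30, k=4)
print(before, after)   # the second value is lower, despite the higher R^2
```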
Simultaneous Equations Models
 A system describing the joint dependence of variables is called a system of simultaneous 
equations or simultaneous equations model.
 In contrast to single-equation models, in simultaneous-equation models more than one 
dependent, or endogenous, variable is involved, necessitating as many equations as the 
number of endogenous variables.
 A unique feature of simultaneous-equation models is that the endogenous variable (i.e., 
regressand) in one equation may appear as an explanatory variable (i.e., regressor) in another 
equation of the system.
 As a consequence, such an endogenous explanatory variable becomes stochastic and is 
usually correlated with the disturbance term of the equation in which it appears as an 
explanatory variable.
 In cases where the regressor is correlated with the disturbance term, applying OLS to estimate 
the parameters of a regression equation will give biased and inconsistent estimates.
Simultaneous Equation Bias 
The violation of the OLS assumption E(ui Xi) = 0 creates simultaneous equation bias. This creates the 
following problems:
i. The problem of identification of the parameters of individual relationships
ii. The problem of estimation
iii. The OLS estimates are biased and inconsistent
Methods to Estimate Parameters
1. Limited Information Maximum Likelihood (LIML): used when the equation is overidentified; the estimates are biased in small samples but consistent.
2. Full Information Maximum Likelihood (FIML): the estimates are biased in small samples.
3. Three Stage Least Squares (3SLS): used for an overidentified system; the estimates are biased but consistent, and more efficient than 2SLS.
Identification
Identification is the problem of obtaining unique values of the structural parameters from the reduced form 
coefficients.
 A model is said to be identified if it has a unique statistical form enabling unique estimates of 
the parameters from the sample.
If the model is not identified, then the parameters cannot be estimated.
In econometric theory there are two possible situations of identifiability:
1. Equation underidentified: An equation is underidentified if its statistical form is not unique.
2. Equation identified: If an equation has a unique statistical form we say it is identified. An identified equation may be exactly identified or overidentified.
Tests of Stationarity
The Unit Root Test:
Yt = ρYt-1 + ut ………..(1)
If ρ = 1, then equation (1) becomes a random walk model without drift, which is a non-stationary 
stochastic process.
Subtracting Yt-1 from both sides of (1):
ΔYt = (ρ - 1)Yt-1 + ut
ΔYt = δYt-1 + ut
where δ = ρ - 1
 When δ = 0, ρ = 1, that is, a unit root is present.
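The coefficient δ in the transformed equation can be estimated by OLS without an intercept. The sketch below does only that estimation step on a hypothetical series generated with ρ = 0.5 and no shocks; df_delta is an assumed helper name. Testing H0: δ = 0 additionally requires the Dickey-Fuller critical values, which are not reproduced here.

```python
def df_delta(series):
    """OLS estimate of delta in  dY_t = delta * Y_{t-1} + u_t  (no intercept)."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    return sum(l * d for l, d in zip(lagged, diffs)) / sum(l * l for l in lagged)

y = [8.0, 4.0, 2.0, 1.0, 0.5]   # generated with rho = 0.5 and no shocks
delta = df_delta(y)
print(delta)        # -0.5
print(delta + 1)    # implied rho = 0.5, so no unit root in this artificial series
```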
Methods of Forecasting
The most important application of time series analysis is forecasting.
Two methods of forecasting are popularly used:
1. Box-Jenkins Methodology (BJ Methodology): technically known as the 
ARIMA methodology
2. Vector Autoregression (VAR)
Parametric Test v/s Nonparametric Test
 A statistical test, in which specific assumptions are made about the population parameter is 
known as the parametric test.
 A statistical test used in the case of non-metric independent variables is called nonparametric 
test.
 In the parametric test, the test statistic is based on distribution. On the other hand, the test 
statistic is arbitrary in the case of the nonparametric test.
 In general, the measure of central tendency in the parametric test is mean, while in the case of 
the nonparametric test is median.
  In the parametric test, there is complete information about the population. Conversely, in the 
nonparametric test, there is no information about the population.
The parametric test is applicable to variables only, whereas the nonparametric test applies to 
both variables and attributes.
For measuring the degree of association between two quantitative variables, Pearson’s 
coefficient of correlation is used in the parametric test, while Spearman’s rank correlation is 
used in the nonparametric test.
Type I and type II errors
 No hypothesis test is 100% certain. Because the test is based on probabilities, there is always 
a chance of making an incorrect conclusion. When you do a hypothesis test, two types of 
errors are possible: type I and type II.
Type I error
When the null hypothesis is true and you reject it, you make a type I error. 
 The probability of making a type I error is α, which is the level of significance you set for your 
hypothesis test. 
Type II error
When the null hypothesis is false and you fail to reject it (that is, you accept it), you make a type II error. 
The probability of making a type II error is β. The power of the test, that is, the probability of rejecting a false null hypothesis, is 1 - β.
Level of Significance
This refers to the probability of rejecting the null hypothesis when it is true, that is, the probability of a type I error, denoted by α.
 Most hypothesis tests fix it at 5%, which means that a true null hypothesis is wrongly rejected in at most 5% of cases (95% confidence).
 Sometimes it is fixed at 1%, which corresponds to 99% confidence.
 If no level of significance is given, α = 0.05 is conventionally taken.
Critical region or rejection region-
 The values of the test statistic which lead to the rejection of the null hypothesis H0 form a region known as the 
rejection region or critical region.
 Those which lead to the acceptance of H0 form a region called the acceptance region.
One tailed test and two tailed test.
 A test in which the alternative hypothesis is expressed by the symbol (<) or (>) is called a one tailed test.
 A test of any statistical hypothesis where the alternative is written with the symbol (≠) is called a two tailed test.
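The link between the level of significance, the critical region and one tailed versus two tailed tests can be sketched for a test statistic that follows the standard normal distribution. The observed value z = 1.8 and the function name are hypothetical; the sketch relies on NormalDist from the Python standard library.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05, two_tailed=True):
    """Return True if H0 is rejected for an observed standard-normal statistic z."""
    if two_tailed:
        critical = NormalDist().inv_cdf(1 - alpha / 2)
        return abs(z) > critical
    critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed alternative (>)
    return z > critical

print(NormalDist().inv_cdf(0.975))   # two tailed 5% critical value, about 1.96
print(NormalDist().inv_cdf(0.95))    # one tailed 5% critical value, about 1.645
print(z_test_decision(1.8))                     # False: inside the acceptance region
print(z_test_decision(1.8, two_tailed=False))   # True: beyond the one tailed cut-off
```

The same observed statistic can therefore lead to different decisions depending on how the alternative hypothesis is stated.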
QUESTIONS FOR CLARIFICATION
1. Which of the following relationships between the three means is correct
A. H.M. > G.M. > A.M. B. A.M. = G.M. ≠ H.M.
C. A.M. > G.M. > H.M. D. G.M. > A.M. > H.M.
2. Variance is 
A. Fourth moment B. Third moment
C. Second moment D. First moment
3. Mean and variance of which of the following distribution is same
A. Bernoulli distribution
B. Binomial distribution
C. Poisson distribution
D. Normal distribution
4. The concept of standard deviation is due to
A. Karl Pearson B. Poisson
C. Samuelson D. Student
5. Root mean square Deviation is known as 
A. Mean deviation
B. Standard deviation
C. Correlation
D. All of the above
6. The efficiency of an estimator depends upon
A. Bias B. Least Variance
C. Sample size D. Small Mean
7. The values of the mean and standard deviation of the standard normal distribution are 
A. 0 and 0 B. 1 and 0
C. 1 and 1 D. 0 and 1
8. Under the Poisson distribution, the shape of the distribution curve is 
A. Positively skewed B. Negatively skewed
C. Symmetrical D. All of the above
Answers: 
1. C 2. C 3. C 4. A 5. B 6. B 7. D 8. A