
Analyzing the Relationship Between Education and Wage Rate via Simple Linear Regression

  1. Purpose

The purpose of this assignment is to analyze the relationship between education and wage rate by means of simple linear regression, that is, to estimate how much education affects the wage rate.

 

  2. Background

Education develops an individual's intellectual abilities. According to Psacharopoulos (1994), every person has many potential capabilities that can be enriched through education, and appropriate education guides individuals toward virtue. According to Psacharopoulos (1993), education plays a vital role in preparing individuals to enter the workforce and in equipping them with the abilities to take part in lifelong learning. Educational attainment is therefore expected to raise one's income. Policymakers emphasize investment in human capital because they believe it increases labor productivity. Hence, workers with comparatively more education are expected to be more productive and to earn a correspondingly higher wage.

 

  3. Method

In this assignment, several statistical methods are used to meet the research objectives. First, summary statistics are presented. Summary statistics condense the dataset and describe its values, including where the average lies and whether the data are skewed.
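As a rough illustration of the summary measures described above, the following sketch computes the mean, median, mode, maximum and minimum with Python's standard `statistics` module. The sample values and the helper name `summarize` are hypothetical and are not taken from the assignment's dataset.

```python
import statistics


def summarize(values):
    """Return the location and range measures reported in a summary table."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "max": max(values),
        "min": min(values),
    }


# Hypothetical years-of-education sample, for illustration only.
education_years = [12, 12, 13, 16, 12, 18, 14]
summary = summarize(education_years)
print(summary)

# A mean above the median is a simple hint of right skew.
print("mean > median:", summary["mean"] > summary["median"])
```

Comparing the mean with the median is only a quick check of skewness; a histogram gives a fuller picture.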

 

Linear regression is used to model the association between a scalar dependent variable and one or more independent variables. The case of a single independent (explanatory) variable is referred to as simple linear regression.
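A minimal sketch of how a simple linear regression line can be estimated by ordinary least squares is given below. The function name `fit_simple_ols` and the five data points are hypothetical and serve only to show the closed-form slope and intercept formulas; they are not the assignment's data.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical (education, wage) pairs, for illustration only.
education = [10, 12, 14, 16, 18]
wage = [12.0, 18.0, 20.0, 29.0, 31.0]

slope, intercept = fit_simple_ols(education, wage)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```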

 

Since a linear regression model is not always suitable for the data, the fit of the model is evaluated by computing residuals and examining residual plots.
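The residual calculation can be sketched as follows. The data points and the line coefficients are hypothetical (the line is the least-squares fit to these five illustrative points), and the helper name `residuals` is an assumption made for this example.

```python
def residuals(x, y, slope, intercept):
    """Observed value minus the value predicted by the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data and the least-squares line fitted to them (illustration only).
education = [10, 12, 14, 16, 18]
wage = [12.0, 18.0, 20.0, 29.0, 31.0]
slope, intercept = 2.45, -12.3

resid = residuals(education, wage, slope, intercept)
for xi, ei in zip(education, resid):
    print(f"education={xi:2d}  residual={ei:+.2f}")

# With an intercept in the model, least-squares residuals sum to (numerically) zero.
print(f"sum of residuals = {sum(resid):.6f}")
```

Plotting these residuals against the explanatory variable and looking for curvature or a changing spread is the usual next step.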

 

A quadratic regression model is also considered, since data sometimes fit a polynomial curve better than a straight line.

The distribution of ln (wage rate) is shown graphically by a histogram.

 

A log-linear regression is also considered, in which the natural logarithm of the wage rate is regressed on education.

 

  4. Results

This section presents the results obtained from the statistical procedures described above.

 


 

  • Data Characteristics

Table 1 summarizes the main characteristics of the dataset. Summary statistics fall into three broad groups: measures of location (also called central tendency), measures of spread, and charts and graphs. Measures of location indicate where the data are concentrated. The mean values of wage and education are 22.31 and 13.76 respectively, the median values are 19.39 and 13, and the mode values are 38.45 and 12. Measures of spread describe how dispersed the data are; here the wage rate ranges from 4.33 to 76.39 and education from 6 to 21 years. Tables 2 and 3 show the calculations on which the histograms for wage rate and education are based. The histograms themselves are shown in Appendix A.

 


 

Table 1: Summary Statistics

Variable    Mean      Median   Mode    Maximum Value   Minimum Value
Wage        22.3081   19.39    38.45   76.39           4.33
Education   13.76     13       12      21              6

 

Table 2: Calculations for Histogram – Wage rate

Bin    Frequency   Cumulative %   Bin    Frequency   Cumulative %
15     37          37.00%         30     43          43.00%
30     43          80.00%         15     37          80.00%
45     12          92.00%         45     12          92.00%
60     6           98.00%         60     6           98.00%
75     1           99.00%         75     1           99.00%
90     1           100.00%        90     1           100.00%
More   0           100.00%        More   0           100.00%
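The cumulative percentages in the first three columns of Table 2 can be reproduced from the bin frequencies with a running total, as in the sketch below. The function name `cumulative_percent` is an assumption made for this example; the frequencies are the ones listed in Table 2.

```python
from itertools import accumulate


def cumulative_percent(frequencies):
    """Running total of frequencies expressed as a percentage of all observations."""
    total = sum(frequencies)
    return [100.0 * running / total for running in accumulate(frequencies)]


# Wage-rate frequencies for bins 15, 30, 45, 60, 75, 90 and "More" (Table 2).
wage_frequencies = [37, 43, 12, 6, 1, 1, 0]

cumulative = cumulative_percent(wage_frequencies)
for freq, pct in zip(wage_frequencies, cumulative):
    print(f"frequency={freq:3d}  cumulative={pct:6.2f}%")
```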

 

Table 3: Calculations for Histogram – Education

Bin    Frequency   Cumulative %   Bin    Frequency   Cumulative %
6      2           2.00%          18     54          54.00%
12     40          42.00%         12     40          94.00%
18     54          96.00%         24     4           98.00%
24     4           100.00%        6      2           100.00%
More   0           100.00%        More   0           100.00%

 

  • Linear Regression Analysis

 

Linear regression is used to model the association between the dependent variable, wage rate (denoted by ‘y’), and the independent variable, education (denoted by ‘x’). Because only one independent variable is involved, this is a simple linear regression. The slope (m), y-intercept (b), and correlation are presented in Table 4. The fitted regression line for wage rate and education is plotted in Appendix A.

 

Table 4: Linear Regression Analysis

Slope (m)         2.123756384
y-intercept (b)   -6.914787841
Correlation       0.413051559
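To show how the values in Table 4 are used, the sketch below evaluates the fitted line y = b + m·x at two education levels and squares the correlation, which for a simple linear regression equals R Square. The function name `predicted_wage` and the chosen education levels (12 and 16 years) are illustrative assumptions; the coefficients are those reported in Table 4.

```python
# Coefficients reported in Table 4.
SLOPE = 2.123756384
INTERCEPT = -6.914787841
CORRELATION = 0.413051559


def predicted_wage(years_of_education):
    """Wage rate predicted by the fitted line y = b + m * x."""
    return INTERCEPT + SLOPE * years_of_education


for years in (12, 16):
    print(f"education={years}  predicted wage={predicted_wage(years):.2f}")

# With one explanatory variable, R Square is the squared correlation.
r_squared = CORRELATION ** 2
print(f"R Square = {r_squared:.4f}")
```

The negative intercept is the line's value at zero years of education, which lies outside the observed range of 6 to 21 years and so should not be read as a realistic wage.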

 

 

  • Calculation of Residuals

 

The regression statistics are presented in Table 5, and Table 6 gives the ANOVA decomposition, including the residual sum of squares, together with the coefficient estimates. Residuals show how far the actual data points are from the values predicted by the regression equation. The residuals are plotted against the independent variable, education, in Appendix A.

 

Table 5: Regression Statistics

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.41305156
R Square            0.17061159
Adjusted R Square   0.16214844
Standard Error      12.8344151
Observations        100

 

Table 6: ANOVA

ANOVA
             df   SS            MS         F          Significance F
Regression   1    3320.693589   3320.694   20.15936   1.94674E-05
Residual     98   16142.77655   164.7222
Total        99   19463.47014

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -6.9148        6.6339           -1.04234   0.299      -20.08      6.25        -20.08        6.25
X Variable 1   2.12376        0.473            4.489917   1.95E-05   1.1851      3.06        1.1851        3.06
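The entries of the ANOVA table are linked by simple arithmetic, which the sketch below checks using the reported sums of squares and degrees of freedom: each mean square is SS divided by df, F is the ratio of the two mean squares, and with a single regressor F equals the square of the slope's t statistic. The variable names are assumptions made for this example.

```python
# Values reported in Table 6.
ss_regression = 3320.693589
ss_residual = 16142.77655
df_regression = 1
df_residual = 98
t_slope = 4.489917

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic = explained mean square / residual mean square.
f_statistic = ms_regression / ms_residual

# Total variation splits into explained and residual parts.
ss_total = ss_regression + ss_residual

print(f"MS residual = {ms_residual:.4f}")
print(f"F           = {f_statistic:.5f}")
print(f"t squared   = {t_slope ** 2:.5f}")
print(f"SS total    = {ss_total:.5f}")
```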

 

  • Quadratic Regression Analysis

 

Data sometimes fit a polynomial curve better than a straight line. For this reason, a quadratic regression has been fitted to the dataset. The results are presented in Tables 7 and 8.

 

Table 7: Quadratic Regression Analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.413991
R Square            0.171388
Adjusted R Square   0.154303
Standard Error      12.89436
Observations        100

 

Table 8: ANOVA – Quadratic Regression Analysis

ANOVA
             df   SS         MS         F          Significance F
Regression   2    3335.808   1667.904   10.03163   0.00011
Residual     97   16127.66   166.2646
Total        99   19463.47

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -0.87706       21.10524         -0.04156   0.966938   -42.77      41.01099    -42.767       41.01099
X Variable 1   1.249474       2.938401         0.425222   0.671616   -4.58       7.081388    -4.58         7.081388
X Variable 2   0.030465       0.101042         0.301506   0.763674   -0.17       0.231004    -0.17         0.231004
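Adjusted R Square penalizes extra regressors, which is why it can fall when a term that adds little is included. The sketch below applies the standard adjustment formula to the R Square values reported in Tables 5 and 7; the function name `adjusted_r_squared` is an assumption made for this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R Square: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square values reported for the linear (Table 5) and quadratic (Table 7) fits.
linear_adj = adjusted_r_squared(0.17061159, n_obs=100, n_predictors=1)
quadratic_adj = adjusted_r_squared(0.171388, n_obs=100, n_predictors=2)

print(f"linear    adjusted R Square = {linear_adj:.6f}")
print(f"quadratic adjusted R Square = {quadratic_adj:.6f}")
print("quadratic term lowers adjusted R Square:", quadratic_adj < linear_adj)
```

The squared term raises R Square only from about 0.1706 to 0.1714, too little to offset the lost degree of freedom.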

 

  • Analysis of Histogram

 

A histogram of ln (wage rate) is constructed and presented in Appendix A. Table 9 contains the calculations.

 

Table 9: Calculations for Histogram – ln (wage rate)

Bin    Frequency   Cumulative %   Bin    Frequency   Cumulative %
1      0           0.00%          3      48          48.00%
2      6           6.00%          4      44          92.00%
3      48          54.00%         2      6           98.00%
4      44          98.00%         5      2           100.00%
5      2           100.00%        1      0           100.00%
More   0           100.00%        More   0           100.00%

 

  • Log Linear Regression Analysis

 

A log-linear regression is estimated; Tables 10 and 11 contain the values obtained.

 

Table 10: Log Linear Regression Analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.423907
R Square            0.179697
Adjusted R Square   0.171327
Standard Error      0.544913
Observations        100

 

Table 11: ANOVA – Log Linear Regression Analysis

ANOVA
             df   SS         MS         F          Significance F
Regression   1    6.374513   6.374513   21.46809   1.11E-05
Residual     98   29.09911   0.29693
Total        99   35.47362

               Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      1.65           0.282            5.852    6.45E-08   1.089       2.207       1.089         2.207
X Variable 1   0.093          0.02             4.633    1.11E-05   0.053       0.133       0.053         0.133
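Assuming the dependent variable in this regression is ln (wage rate), the slope can be read as an approximate proportional change in the wage rate per additional year of education. The sketch below compares the quick approximation 100·b with the exact figure 100·(exp(b) − 1), using the reported slope of 0.093; the variable names are assumptions made for this example.

```python
import math

# Slope on education reported in Table 11 (dependent variable assumed to be ln(wage)).
LOG_SLOPE = 0.093

# Quick reading: the slope times 100 is roughly the percentage change in wage.
approx_percent = 100 * LOG_SLOPE

# Exact percentage change implied by a one-unit increase in education.
exact_percent = 100 * (math.exp(LOG_SLOPE) - 1)

print(f"approximate change per year of education: {approx_percent:.2f}%")
print(f"exact change per year of education:       {exact_percent:.2f}%")
```

Under that assumption, one more year of education is associated with a wage rate that is higher by a little under 10%.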

 

  5. Discussion

 

The slope ‘m’ obtained is positive. A positive slope indicates a positive association between education and wage rate: as years of education increase, the wage rate also tends to increase. With m ≈ 2.12, each additional year of education is associated with an increase of about 2.12 in the predicted wage rate.

 

When the residuals are plotted against education, a clear pattern is observed. Because the plot shows a pattern, the association may be nonlinear and the model may need to be modified accordingly. The pattern may also indicate heteroscedasticity in the errors; in other words, the variance of the residuals may not be constant. A transformation of the dependent variable (for example, a logarithm or square root) may be needed to overcome this issue.

 

The quadratic regression gives an R Square of 0.171, so only about 17% of the variation in wage rate is explained by education and its square, which is a weak fit. The Adjusted R Square of about 15% is lower than that of the linear model (about 16%), and the p-values of both x variables (0.67 and 0.76) are far from 0, so neither the linear nor the quadratic coefficient is individually significant. The scatter diagram in Appendix A supports this: the quadratic trend line is not a better fit for the data than the linear trend line. Under the quadratic model, the marginal effect of another year of education on wage is the slope of the fitted curve, that is, the coefficient on education plus twice the coefficient on education squared times the years of education. This gives about 1.98 for a person with 12 years of education and about 2.10 for a person with 14 years. Under the log-linear model, the marginal effect is the slope coefficient times the predicted wage, which gives about 1.48 and 1.78 respectively.
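Taking the marginal effect to be the derivative of the predicted wage with respect to education (an assumption about the intended definition), it can be evaluated from the coefficients in Tables 8 and 11 as sketched below. The function names are hypothetical, and the log-linear prediction uses exp(a + b·x) without any retransformation correction, which is a simplifying assumption.

```python
import math

# Quadratic model coefficients (Table 8): wage = c + B1 * educ + B2 * educ^2.
B1_QUAD = 1.249474
B2_QUAD = 0.030465

# Log-linear model coefficients (Table 11): ln(wage) = A + B * educ.
A_LOG = 1.65
B_LOG = 0.093


def quadratic_marginal_effect(years):
    """Derivative of the quadratic fit: B1 + 2 * B2 * educ."""
    return B1_QUAD + 2 * B2_QUAD * years


def log_linear_marginal_effect(years):
    """Derivative of exp(A + B * educ): B times the predicted wage."""
    return B_LOG * math.exp(A_LOG + B_LOG * years)


for years in (12, 14):
    print(
        f"education={years}  "
        f"quadratic={quadratic_marginal_effect(years):.2f}  "
        f"log-linear={log_linear_marginal_effect(years):.2f}"
    )
```

Multiplying a coefficient by the years of education (for example 1.249474 × 12 ≈ 14.99) gives a different quantity and is not the slope of the fitted curve.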

 

The histograms of wage rate and ln (wage rate) have been compared. Both display data that are skewed to the right. A few large values pull the mean up but barely move the median, so the mean exceeds the median when data are skewed right; for the wage rate, the mean (22.31) is indeed above the median (19.39).

 

  6. Recommendations

                                                               

It is recommended that government strategies be set to increase enrollment in schooling. Such programs should be well managed and monitored so that the policy lives up to its expectations. In addition, legislative bodies should support investment in both education and other learning activities so that low-wage workers can readily make these investments to improve their economic well-being. Furthermore, it is recommended to increase the sample size so as to get a clearer picture. More advanced statistical and data mining approaches should also be considered to obtain more relevant and precise results.

 

 

References

 

  • Psacharopoulos, G. (1993). Returns to Investment in Education. The World Bank, Working Papers 1067.

 

  • Psacharopoulos, G. (1994). Returns to Investment in Education: A Global Update. World Development 22(9): 1325-1343.

 


Assignment Studio © Copyright 2012-2018
