Education and Wage Assignment help
Analyzing Relationship Between Education and Wage Rate Via Simple Linear Regression
Purpose
The purpose of this assignment is to analyze the relationship between education and wage rate by means of simple linear regression, that is, to estimate how strongly education affects the wage rate.
Background
Education develops an individual's intellectual abilities. According to Psacharopoulos (1994), every person has potential capabilities that can be enriched through education, and appropriate education guides individuals toward virtue. According to Psacharopoulos (1993), education plays a vital role in preparing individuals to enter the workforce and in equipping them with the abilities needed for lifelong learning. Educational attainment is therefore expected to raise a person's income. Policymakers emphasize investment in human capital because they believe it increases labor efficiency. Hence, workers with comparatively more education are expected to be more efficient and to earn a correspondingly higher wage.
Method
Several statistical methods are used in this assignment to meet the research objectives. First, summary statistics are presented. Summary statistics condense the dataset and describe its values, including where the average lies and whether the data are skewed.
For modeling an association between a scalar dependent variable and one or more independent variables, linear regression is used. The case of one independent or explanatory variable is referred to as a simple linear regression.
Since a linear regression model is not always suitable for the data, the fit of the model is evaluated by computing residuals and examining residual plots.
A quadratic regression model is also considered, since data sometimes fit a polynomial curve better than a straight line.
The distribution of the values of ln (wage rate) is shown graphically by a histogram.
A log-linear regression, in which ln (wage rate) is regressed on education, is also considered. In this model the slope approximates the proportional change in the wage rate associated with one additional year of education.
Results
In this section, results obtained after applying different statistical procedures are elucidated.
Data Characteristics
Table 1 summarizes the main characteristics of the dataset. Summary statistics fall into three main categories: measures of location (also referred to as central tendency), measures of spread, and charts and graphs. Measures of location indicate where the data are concentrated. The mean values of wage and education are 22.31 and 13.76 respectively, the median values are 19.39 and 13, and the mode values are 38.45 and 12. Measures of spread describe how dispersed the dataset is; here the wage rate ranges from 4.33 to 76.39 and education from 6 to 21 years. Tables 2 and 3 show the calculations on which the histograms for wage rate and education are based. The histograms themselves are shown in Appendix A.
Table 1: Summary Statistics
Variables | Mean | Median | Mode | Maximum Value | Minimum Value
Wage | 22.3081 | 19.39 | 38.45 | 76.39 | 4.33
Education | 13.76 | 13 | 12 | 21 | 6
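As a minimal sketch of how the measures in Table 1 can be computed, the following Python snippet uses the standard-library statistics module. The wage values are a small hypothetical sample chosen for illustration, not the assignment dataset, and the function name summarize is an assumption.

```python
import statistics

# Hypothetical wage observations (illustrative only, not the assignment data).
wages = [12.5, 19.0, 19.0, 24.75, 40.0]

def summarize(values):
    """Return the location and range measures of the kind reported in Table 1."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "maximum": max(values),
        "minimum": min(values),
    }

summary = summarize(wages)
print(summary)
```

In this hypothetical sample the single large value pulls the mean above the median, the same ordering that Table 1 reports for the wage rate.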
Table 2: Calculations for Histogram – Wage rate
Bin | Frequency | Cumulative % | Bin (sorted by frequency) | Frequency | Cumulative %
15 | 37 | 37.00% | 30 | 43 | 43.00%
30 | 43 | 80.00% | 15 | 37 | 80.00%
45 | 12 | 92.00% | 45 | 12 | 92.00%
60 | 6 | 98.00% | 60 | 6 | 98.00%
75 | 1 | 99.00% | 75 | 1 | 99.00%
90 | 1 | 100.00% | 90 | 1 | 100.00%
More | 0 | 100.00% | More | 0 | 100.00%
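The cumulative percentages in Table 2 can be reproduced from the frequencies alone. The sketch below is one way to do so in Python; it assumes that the second group of columns lists the same bins sorted by descending frequency, which is consistent with the values shown. The function name cumulative_percent is hypothetical.

```python
# Bins and frequencies copied from Table 2 (wage rate histogram).
bins = ["15", "30", "45", "60", "75", "90", "More"]
freqs = [37, 43, 12, 6, 1, 1, 0]

def cumulative_percent(frequencies):
    """Running share of observations, in percent, after each bin."""
    total = sum(frequencies)
    running, out = 0, []
    for f in frequencies:
        running += f
        out.append(100.0 * running / total)
    return out

# First group of columns: bins in ascending order.
ascending = cumulative_percent(freqs)

# Second group: the same bins sorted by descending frequency.
# sorted() is stable, so tied bins keep their original order.
order = sorted(range(len(freqs)), key=lambda i: -freqs[i])
sorted_bins = [bins[i] for i in order]
sorted_cum = cumulative_percent([freqs[i] for i in order])

print(ascending)
print(sorted_bins, sorted_cum)
```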
Table 3: Calculations for Histogram – Education
Bin | Frequency | Cumulative % | Bin (sorted by frequency) | Frequency | Cumulative %
6 | 2 | 2.00% | 18 | 54 | 54.00%
12 | 40 | 42.00% | 12 | 40 | 94.00%
18 | 54 | 96.00% | 24 | 4 | 98.00%
24 | 4 | 100.00% | 6 | 2 | 100.00%
More | 0 | 100.00% | More | 0 | 100.00%
Linear Regression Analysis
Linear regression is used to model the association between the dependent variable, wage rate, denoted by ‘y’, and the independent variable, education, denoted by ‘x’. Because only one independent variable is considered, this is a simple linear regression. The values of the slope (m), the y-intercept (b), and the correlation are presented in Table 4. The fitted regression line for wage rate and education is plotted in Appendix A.
Table 4: Linear Regression Analysis
SLOPE (m) | 2.123756384
y-intercept, b | -6.914787841
Correlation | 0.413051559
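The values in Table 4 define the fitted line y = m*x + b. The following Python sketch shows how such a line can be used for prediction and how the slope and intercept are obtained by ordinary least squares. The function names predict_wage and fit_line are hypothetical, and fit_line is a textbook formula rather than the exact spreadsheet procedure used for the assignment.

```python
# Coefficients as reported in Table 4.
SLOPE = 2.123756384
INTERCEPT = -6.914787841
CORRELATION = 0.413051559

def predict_wage(education_years):
    """Fitted wage rate from the simple linear regression line y = m*x + b."""
    return SLOPE * education_years + INTERCEPT

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

print(round(predict_wage(12), 2))    # fitted wage at 12 years of education
print(round(CORRELATION ** 2, 3))    # squared correlation
```

In a simple linear regression the squared correlation equals R Square, which here is about 0.171.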
Calculation of Residuals
Regression statistics are presented in Table 5, and the residual values are included in Table 6. A residual shows how far an actual data point is from the value predicted by the regression equation. The residuals are plotted against the independent variable, education, in Appendix A.
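A residual is the actual wage minus the wage predicted by the fitted line. The sketch below illustrates the calculation with the Table 4 coefficients; the education and wage observations are hypothetical and serve only to show the arithmetic.

```python
# Coefficients from Table 4; the observations below are hypothetical.
SLOPE = 2.123756384
INTERCEPT = -6.914787841

def residuals(education, wage):
    """Actual wage minus the wage predicted by the fitted line."""
    return [y - (SLOPE * x + INTERCEPT) for x, y in zip(education, wage)]

education = [10, 12, 16, 18]        # hypothetical years of education
wage = [11.0, 21.5, 24.0, 45.0]     # hypothetical wage rates

for x, r in zip(education, residuals(education, wage)):
    print(x, round(r, 2))
```

Plotting these residuals against education is what reveals whether their spread or level changes systematically with the explanatory variable.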
Table 7: Quadratic Regression Analysis
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.413991
R Square | 0.171388
Adjusted R Square | 0.154303
Standard Error | 12.89436
Observations | 100
Table 8: ANOVA – Quadratic Regression Analysis
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2 | 3335.808 | 1667.904 | 10.03163 | 0.00011
Residual | 97 | 16127.66 | 166.2646 | |
Total | 99 | 19463.47 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | -0.87706 | 21.10524 | -0.04156 | 0.966938 | -42.77 | 41.01099 | -42.767 | 41.01099
X Variable 1 | 1.249474 | 2.938401 | 0.425222 | 0.671616 | -4.58 | 7.081388 | -4.58 | 7.081388
X Variable 2 | 0.030465 | 0.101042 | 0.301506 | 0.763674 | -0.17 | 0.231004 | -0.17 | 0.231004
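The regression statistics in Table 7 follow directly from the sums of squares and degrees of freedom in Table 8. As a consistency check, the sketch below recomputes R Square, Adjusted R Square, the F statistic, and the standard error using the standard formulas; the variable names are hypothetical.

```python
# Values copied from Table 8 (quadratic regression ANOVA).
ss_regression = 3335.808
ss_residual = 16127.66
df_regression = 2
df_residual = 97

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1          # 100 observations

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
std_error = (ss_residual / df_residual) ** 0.5

print(round(r_square, 4), round(adj_r_square, 4),
      round(f_stat, 2), round(std_error, 2))
```

The recomputed values agree with Tables 7 and 8 up to rounding of the reported sums of squares.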
Analysis of Histogram
A histogram for ln (wage rate) is constructed and presented in Appendix A. Table 9 contains the calculations.
Table 9: Calculations for Histogram – ln (wage rate)
Bin | Frequency | Cumulative % | Bin (sorted by frequency) | Frequency | Cumulative %
1 | 0 | 0.00% | 3 | 48 | 48.00%
2 | 6 | 6.00% | 4 | 44 | 92.00%
3 | 48 | 54.00% | 2 | 6 | 98.00%
4 | 44 | 98.00% | 5 | 2 | 100.00%
5 | 2 | 100.00% | 1 | 0 | 100.00%
More | 0 | 100.00% | More | 0 | 100.00%
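To illustrate how a wage rate is assigned to a bin in Table 9, the sketch below takes the natural logarithm and looks up the first bin edge that is greater than or equal to it. It assumes the spreadsheet convention that each bin collects values up to and including its edge; the function name ln_bin is hypothetical.

```python
import math
from bisect import bisect_left

# Upper bin edges from Table 9; anything above the last edge is "More".
BIN_EDGES = [1, 2, 3, 4, 5]

def ln_bin(wage):
    """Label of the bin holding ln(wage); a bin covers values up to and including its edge."""
    idx = bisect_left(BIN_EDGES, math.log(wage))
    return str(BIN_EDGES[idx]) if idx < len(BIN_EDGES) else "More"

# Minimum and maximum wage rates from Table 1.
print(ln_bin(4.33), ln_bin(76.39))
```

Under this assumption the minimum wage rate falls in bin 2 and the maximum in bin 5, which is consistent with the empty bin 1 and empty "More" bin in Table 9.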
Log Linear Regression Analysis
A log-linear regression is estimated; Tables 10 and 11 contain the values obtained.
Table 10: Log Linear Regression Analysis
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.423907
R Square | 0.179697
Adjusted R Square | 0.171327
Standard Error | 0.544913
Observations | 100
Table 11: ANOVA – Log Linear Regression Analysis
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 6.374513 | 6.374513 | 21.46809 | 1.11E-05
Residual | 98 | 29.09911 | 0.29693 | |
Total | 99 | 35.47362 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 1.65 | 0.282 | 5.852 | 6.45E-08 | 1.089 | 2.207 | 1.089 | 2.207
X Variable 1 | 0.093 | 0.02 | 4.633 | 1.11E-05 | 0.053 | 0.133 | 0.053 | 0.133
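In a log-linear model the slope is read as a proportional effect. The sketch below, which assumes the fitted equation ln(wage) = 1.65 + 0.093 * education from the rounded coefficients in Table 11, converts the slope into a percentage change and back-transforms a fitted value to a wage level. The back-transformation ignores any retransformation adjustment, and the names used are hypothetical.

```python
import math

# Rounded coefficients from Table 11: ln(wage) = INTERCEPT + SLOPE * education.
INTERCEPT = 1.65
SLOPE = 0.093

def predicted_wage(education_years):
    """Back-transform the fitted ln(wage) to a wage level."""
    return math.exp(INTERCEPT + SLOPE * education_years)

# Proportional change in predicted wage for one more year of education.
pct_change = (math.exp(SLOPE) - 1) * 100
print(round(pct_change, 1))
```

With these rounded coefficients, one additional year of education is associated with a predicted wage that is roughly 9.7% higher, regardless of the starting level of education.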
Discussion
The slope ‘m’ obtained is positive (2.12 in Table 4). A positive slope indicates a positive association between education and wage rate: as years of education increase, the wage rate also increases, by about 2.12 per additional year on average.
When the residuals are plotted against education, a clear pattern is observed. Because the residual plot shows a pattern, the association may be nonlinear and the model may need to be altered accordingly. The pattern also suggests that there may be ‘heteroscedasticity’ in the errors, that is, the variance of the residuals may not be uniform. A transformation of the dependent variable (for example, a logarithm or square root) may be needed to overcome this issue.
In the quadratic regression, R Square equals 0.171, which indicates that about 17% of the variation in wage rate is explained by education and its square. The Adjusted R Square is about 15%. Although the overall regression is significant (Significance F = 0.00011), the p-values of the two education terms (0.67 and 0.76) are far from 0, so the quadratic coefficient is not significant. This is supported by the scatter diagram in Appendix A, which shows that the quadratic trend line is not a better fit for the data than the linear trend line. Taking the marginal effect as the derivative of the fitted equation, b1 + 2*b2*x with the coefficients in Table 8, another year of education raises the wage by about 1.98 for a person with 12 years of education and by about 2.10 for a person with 14 years of education. For the log-linear regression, the corresponding marginal effect, b1*exp(b0 + b1*x) with the rounded coefficients in Table 11, is about 1.48 and 1.78 respectively.
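One common reading of a marginal effect is the derivative of the fitted wage equation with respect to education. The sketch below evaluates that derivative for both models. It assumes that X Variable 1 in Table 8 is education and X Variable 2 is education squared, and that the log-linear model is ln(wage) = b0 + b1 * education with the rounded coefficients from Table 11; the function names are hypothetical.

```python
import math

# Quadratic model (Table 8), assuming X Variable 1 is education and
# X Variable 2 is education squared.
Q_B1, Q_B2 = 1.249474, 0.030465

# Log-linear model (Table 11, rounded): ln(wage) = L_B0 + L_B1 * education.
L_B0, L_B1 = 1.65, 0.093

def marginal_quadratic(x):
    """d(wage)/d(education) for wage = b0 + b1*x + b2*x**2."""
    return Q_B1 + 2 * Q_B2 * x

def marginal_loglinear(x):
    """d(wage)/d(education) for wage = exp(b0 + b1*x)."""
    return L_B1 * math.exp(L_B0 + L_B1 * x)

for years in (12, 14):
    print(years,
          round(marginal_quadratic(years), 2),
          round(marginal_loglinear(years), 2))
```

Because the log-linear coefficients are rounded to two or three decimals in Table 11, the second column of results should be treated as approximate.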
The histograms of wage rate and ln (wage rate) have been compared. Both display data that are skewed to the right. A few large values pull the mean up but barely move the median, so the mean is larger than the median when data are skewed right; for the wage rate, the mean (22.31) exceeds the median (19.39).
Recommendations
It is recommended that government strategies be set to increase school enrollment. Such programs should be well managed and monitored so that the policy lives up to expectations. In addition, legislative bodies should support investment in education and other learning activities so that low-wage workers can readily make these investments to improve their economic well-being. Furthermore, the sample size should be increased to obtain a clearer picture. More advanced statistical and data mining approaches should also be considered to obtain more relevant and precise results.
References
- Psacharopoulos, G. (1993). Returns to Investment in Education. The World Bank, Working Papers 1067.
- Psacharopoulos, G. (1994). Returns to Investment in Education: A Global Update. World Development, 22(9), 1325-1343.