# I. Introduction

In education and labor economics, the effect of education on a worker’s wage level (the internal rate of return on private education, hereinafter the return to education) has attracted considerable attention as an indicator of both the compensation workers receive for their human capital and the supply of and demand for highly educated labor. Many studies worldwide have empirically examined the return to education using individual or household survey data (Montenegro & Patrinos, 2014). Since 1980, the wage gap between groups with different levels of educational attainment has reportedly widened rapidly, especially in Western countries where technological innovation is advancing. To understand this trend, researchers have examined how the wage effects of secondary and tertiary education differ (Autor et al., 1998; Bound & Johnson, 1992). Psacharopoulos and Patrinos (2004) report that wage effects differ remarkably between secondary and tertiary education in both developed and developing economies. Wage inequality based on educational background is also considered a crucial social and economic problem in China (Ma, 2018).

In China, micro survey data have been developed in recent years, and a number of papers (see, for instance, Churchill & Mishra, 2018; Fleisher et al., 2005; Liu & Zhang, 2013) examine the return to education by estimating a Mincer-type wage function. There is now a wealth of empirical evidence on the returns to secondary and tertiary education in the form of coefficient estimates on educational attainment dummy variables. However, because datasets, estimation periods, and empirical methods differ significantly across studies, the size of the effects of secondary and tertiary education on wage levels remains unclear (Ma & Zhang, 2017). It is also unclear whether previous studies have found genuine empirical evidence. In this paper, we tackle these issues using the advanced techniques and guidelines of meta-analysis proposed by Stanley and Doucouliagos (2012) and Havránek et al. (2020).

Previous meta-analyses of the return to education in China were conducted by Churchill and Mishra (2018), Fleisher et al. (2005), Liu and Zhang (2013), and Ma and Iwasaki (2021). Fleisher et al. (2005), Liu and Zhang (2013), and Ma and Iwasaki (2021) performed their meta-analyses using estimation results for years of schooling; they therefore did not examine the wage effects of secondary and tertiary education individually. Meanwhile, Churchill and Mishra (2018) did not investigate the return to secondary education and, hence, could not compare it with the return to tertiary education. In addition, these previous meta-analyses are limited to the English literature and do not utilize the rich empirical results available in the Chinese literature. This paper conducts a broad meta-analysis of the returns to secondary and tertiary education using the empirical results reported in both the English and the Chinese literature.

By synthesizing 1,429 estimates extracted from 61 English and Chinese studies using advanced meta-analytic techniques, this paper contributes to the literature by revealing for the first time the nearly twofold difference in the rate of return between secondary and tertiary education in China. The tests for publication selection bias also confirm that the literature contains genuine empirical evidence. Therefore, significant disparities in educational returns are highly probable. These results could allow us to better understand the wage gap between groups with various educational backgrounds in China.

The remainder of this paper is organized as follows: Section II describes the data and methods of the meta-analysis. Section III reports the results. Section IV summarizes the major findings and concludes the paper.

# II. Data and methods

The returns to secondary and tertiary education obtained by the regression of a Mincer-type wage function have been reported not only in the English literature, but also in many Chinese works. Therefore, following Guo and He (2020), we used EconLit, Web of Science, and the websites of major academic publishers to search for English literature, and the China National Knowledge Infrastructure database to search for Chinese literature.[1] We thus found 33 English papers and 28 Chinese papers that provide estimation results on the returns to upper secondary education (i.e., senior high school and technical high school) and tertiary education (college, university, and graduate school), using lower secondary education or less (junior high school, elementary school, or below) as the reference (default) category. From these 61 studies, we extracted 1,429 estimates.

These 1,429 estimates are transformed into partial correlation coefficients (PCCs) using the following formula, which relies only on t-values and degrees of freedom and therefore accommodates the differences in the units of the wage variables adopted by the studies, with or without logarithmic transformation:

$$r_{k} = \frac{t_{k}}{\sqrt{t_{k}^{2} + {df}_{k}}} \qquad (1)$$

where $t_{k}$ and ${df}_{k}$ denote the t-value and degrees of freedom of the kth estimate, respectively. The standard error ($SE_{k}$) of $r_{k}$ is given by $\sqrt{\frac{\left( 1 - r_{k}^{2} \right)}{{df}_{k}}}$. As evaluation criteria for PCCs in labor economics research, we adopt 0.048, 0.112, and 0.234 as the lowest thresholds of small, medium, and large effects, respectively, as proposed by Doucouliagos (2011).
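The transformation can be illustrated with a minimal Python sketch. The function names and the sample t-value and degrees of freedom are hypothetical and serve only as an illustration; the thresholds are those quoted above, and the label "weak" for effects below the small threshold follows the wording of footnote 2.

```python
import math

# Lower thresholds for small, medium, and large PCCs quoted in the text
# (Doucouliagos, 2011, labor economics research).
SMALL, MEDIUM, LARGE = 0.048, 0.112, 0.234


def pcc(t_value: float, df: int) -> float:
    """Partial correlation coefficient r_k = t_k / sqrt(t_k^2 + df_k)."""
    return t_value / math.sqrt(t_value ** 2 + df)


def pcc_se(r: float, df: int) -> float:
    """Standard error of the PCC: sqrt((1 - r_k^2) / df_k)."""
    return math.sqrt((1.0 - r ** 2) / df)


def effect_label(r: float) -> str:
    """Classify the absolute PCC against the three lower thresholds."""
    size = abs(r)
    if size >= LARGE:
        return "large"
    if size >= MEDIUM:
        return "medium"
    if size >= SMALL:
        return "small"
    return "weak"


# Hypothetical estimate: t = 2.5 with 1,000 degrees of freedom.
r = pcc(2.5, 1000)
se = pcc_se(r, 1000)
print(round(r, 4), round(se, 4), effect_label(r))
```

A useful property of these two formulas is that the ratio of the PCC to its standard error reproduces the original t-value, so the transformation preserves the statistical significance of each estimate.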

The meta-analysis in this paper is conducted in two stages: 1) a meta-synthesis of the collected estimates and 2) testing for publication selection bias. To synthesize the collected estimates, in addition to traditional fixed effects and random effects models, we employ the unrestricted weighted least squares average (UWA) method and the UWA synthesis of estimation results with statistical power greater than 0.80, that is, the weighted average of the adequately powered (WAAP) synthesis, as proposed by Stanley and Doucouliagos (2017) and Stanley et al. (2017). The UWA method regards as the synthesized effect size a point estimate obtained from the regression that takes the t-value as the dependent variable and the inverse of the standard error as the independent variable. Specifically, we estimate the following equation, without the intercept term, and utilize the coefficient $\alpha$ as the synthesized value of the PCCs:

$$t_{k} = \alpha\left( \frac{1}{SE_{k}} \right) + \varepsilon_{k} \qquad (2)$$

where $\varepsilon_{k}$ is a residual term. In theory, the point estimate of $\alpha$ coincides with the estimate from a traditional fixed effects model. Its standard error, however, is more robust to the heterogeneity found in the literature. Furthermore, according to Stanley et al. (2017), publication selection bias influences the WAAP synthesis results less than it influences those of the random effects model, which means that the WAAP is more robust than the random effects model. Therefore, we adopt the synthesis results from the WAAP method as the most reliable values of the meta-synthesis.
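The two UWA-based estimators can be sketched in a few lines of Python. This is an illustrative sketch rather than the estimation code used for Table 2: the function names and toy data are hypothetical, and the adequate-power rule (standard error no larger than the absolute UWA divided by 2.8, which corresponds to 80% power at the 5% significance level) is an assumption based on the usual formulation of the WAAP.

```python
import math


def uwa(r, se):
    """No-intercept OLS of t_k = r_k / SE_k on 1 / SE_k.

    Returns (alpha, standard error of alpha). Needs at least two estimates.
    """
    t = [ri / si for ri, si in zip(r, se)]
    x = [1.0 / si for si in se]
    sxx = sum(xi * xi for xi in x)
    alpha = sum(xi * ti for xi, ti in zip(x, t)) / sxx
    rss = sum((ti - alpha * xi) ** 2 for xi, ti in zip(x, t))
    sigma2 = rss / (len(t) - 1)  # residual variance with one coefficient
    return alpha, math.sqrt(sigma2 / sxx)


def waap(r, se, factor=2.8):
    """UWA restricted to adequately powered estimates (assumed rule:
    SE_k <= |UWA of all estimates| / 2.8).

    Returns (alpha, standard error of alpha, number of powered estimates).
    """
    alpha_all, _ = uwa(r, se)
    cutoff = abs(alpha_all) / factor
    powered = [(ri, si) for ri, si in zip(r, se) if si <= cutoff]
    if len(powered) < 2:
        raise ValueError("fewer than two adequately powered estimates")
    alpha, se_alpha = uwa([p[0] for p in powered], [p[1] for p in powered])
    return alpha, se_alpha, len(powered)


# Toy PCCs and standard errors (hypothetical values, not from the dataset).
toy_r = [0.10, 0.12, 0.08, 0.30]
toy_se = [0.02, 0.03, 0.025, 0.15]
print(uwa(toy_r, toy_se))
print(waap(toy_r, toy_se))
```

In the toy data, the imprecise fourth estimate is dropped by the power filter, so the WAAP is computed from the three remaining estimates.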

To test for publication selection bias, in addition to a visual examination using a funnel plot, we conduct a funnel asymmetry test (FAT), a precision-effect test (PET), and a precision-effect estimate with standard error (PEESE), as proposed by Stanley and Doucouliagos (2012) and used widely in recent meta-studies. This FAT–PET–PEESE procedure was developed to test rigorously for publication selection bias and for the presence of genuine evidence.

The FAT can be performed by regressing the t-value of the kth estimate on the inverse of the standard error (1/SE) using the following equation and thereby testing the null hypothesis that the intercept term $\beta_{0}$ is equal to zero:

$$t_{k} = \beta_{0} + \beta_{1}\left( \frac{1}{SE_{k}} \right) + v_{k} \qquad (3)$$

where $v_k$ is the error term. When the intercept term $\beta_{0}$ is statistically significantly different from zero, we can conclude that the distribution of estimates is asymmetric. Even if there is publication selection bias, a genuine effect could exist in the available empirical evidence. Stanley and Doucouliagos (2012) proposed examining this possibility by testing the null hypothesis that the coefficient $\beta_{1}$ is equal to zero in Eq. (3). The rejection of the null hypothesis implies the presence of genuine empirical evidence. Because $\beta_{1}$ is the coefficient of precision, this test is called the PET.
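The FAT and PET both come from a single simple regression, which the following Python sketch illustrates. It uses conventional (non-robust) OLS standard errors for brevity, whereas Table 3 reports robust and cluster-robust variants; the function name and toy data are hypothetical.

```python
import math


def fat_pet(r, se):
    """OLS of t_k = r_k / SE_k on an intercept and 1 / SE_k.

    beta0 is the FAT coefficient (funnel asymmetry) and beta1 the PET
    coefficient (precision effect). Standard errors are conventional OLS
    ones, a simplification relative to robust estimators.
    """
    t = [ri / si for ri, si in zip(r, se)]
    x = [1.0 / si for si in se]
    n = len(t)
    x_bar = sum(x) / n
    t_bar = sum(t) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t)) / sxx
    beta0 = t_bar - beta1 * x_bar
    rss = sum((ti - beta0 - beta1 * xi) ** 2 for xi, ti in zip(x, t))
    sigma2 = rss / (n - 2)  # residual variance with two coefficients
    return {
        "beta0": beta0,
        "se_beta0": math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx)),
        "beta1": beta1,
        "se_beta1": math.sqrt(sigma2 / sxx),
    }


# Toy PCCs and standard errors (hypothetical values, not from the dataset).
toy_r = [0.09, 0.11, 0.12, 0.16, 0.21]
toy_se = [0.01, 0.02, 0.03, 0.05, 0.08]
result = fat_pet(toy_r, toy_se)
print({name: round(value, 4) for name, value in result.items()})
```

Dividing each coefficient by its standard error gives the t-ratio used to test the corresponding null hypothesis of a zero coefficient.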

Further, Stanley and Doucouliagos (2012) stated that an estimate of the publication selection bias–adjusted effect size can be obtained with the following equation, which has no intercept:

$$t_{k} = \gamma_{0}SE_{k} + \gamma_{1}\left( \frac{1}{SE_{k}} \right) + v_{k} \qquad (4)$$

If the null hypothesis of $\gamma_{1} = 0$ is rejected, then a non-zero true effect exists in the literature, and the coefficient $\gamma_{1}$ can be regarded as its estimate.

The method of estimating the genuine effect size using Eq. (4) is called the PEESE approach. To test the robustness of the regression coefficients obtained from the above FAT–PET–PEESE procedure, we estimate Eqs. (3) and (4) using not only the ordinary least squares estimator, but also four other models that address the heterogeneity in the literature.
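The PEESE regression has two regressors and no intercept, so its point estimates follow from a 2×2 system of normal equations. The Python sketch below illustrates this under the plain least squares estimator only; the function name and toy data are hypothetical, and standard errors are omitted for brevity.

```python
def peese(r, se):
    """No-intercept OLS of t_k = r_k / SE_k on SE_k and 1 / SE_k.

    Returns (gamma0, gamma1); gamma1 is the publication selection
    bias-adjusted effect size. Requires SEs that are not all identical.
    """
    t = [ri / si for ri, si in zip(r, se)]
    x = [1.0 / si for si in se]
    a = sum(si * si for si in se)             # sum of SE^2
    b = float(len(se))                        # sum of SE * (1/SE) = K
    c = sum(xi * xi for xi in x)              # sum of (1/SE)^2
    u = sum(si * ti for si, ti in zip(se, t))
    v = sum(xi * ti for xi, ti in zip(x, t))
    det = a * c - b * b
    if det <= 1e-12 * a * c:
        raise ValueError("standard errors must vary across estimates")
    gamma0 = (c * u - b * v) / det
    gamma1 = (a * v - b * u) / det
    return gamma0, gamma1


# Toy PCCs and standard errors (hypothetical values, not from the dataset).
toy_r = [0.09, 0.11, 0.12, 0.16, 0.21]
toy_se = [0.01, 0.02, 0.03, 0.05, 0.08]
print(peese(toy_r, toy_se))
```

Multiplying the regression equation through by the standard error shows why this works as a bias correction: the PCC is modeled as the true effect plus a term proportional to the squared standard error, so the estimated effect is what remains as the standard error approaches zero.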

# III. Results

Table 1 shows descriptive statistics of the PCCs of the collected estimates, and Figure 1 illustrates their kernel density estimations. As shown in Table 1, for both secondary and tertiary education, the mean and median of the collected estimates are positive, and the t-test strongly rejects the null hypothesis of a zero mean. In addition, Table 1 and Figure 1 show that the estimates of tertiary education are more skewed in the positive direction than are those of secondary education.[2] In the following sections, we examine whether the above observations are backed by a meta-analysis that considers the heterogeneity in the literature and publication selection bias.

Figure 1. Kernel density estimation of collected estimates

Notes: The vertical axis is the kernel density. The horizontal axis is the variable value. See Table 1 for the number of observations and descriptive statistics.

## A. Meta-synthesis of collected estimates

Table 2 reports the results of the meta-synthesis. The Cochran Q test of homogeneity and the $I^{2}$ and $H^{2}$ statistics reported in Column (b) strongly indicate heterogeneity in the literature for both secondary and tertiary education. Therefore, among the traditional synthesis estimates in Column (a), we adopt the synthesized values of the random effects model. Concerning the UWA synthesis results in Column (c), a considerable number of estimates have statistical power of 0.80 or above, so the WAAP synthesis values are preferred; as described in the previous section, we treat them as the most reliable reference values. The differences between the random effects and WAAP synthesis values are nevertheless not large.
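The two heterogeneity measures in Table 2 can be cross-checked against each other. The check below assumes the between-study-variance definitions common in meta-analysis software, where $\hat{\tau}^{2}$ is the estimated between-study variance and $s^{2}$ a typical within-study sampling variance; both symbols are introduced here for illustration and are not defined in the paper.

$$I^{2} = \frac{\hat{\tau}^{2}}{\hat{\tau}^{2} + s^{2}} \times 100\%, \qquad H^{2} = \frac{\hat{\tau}^{2} + s^{2}}{s^{2}} = \frac{1}{1 - I^{2}/100}$$

$$\text{Secondary: } \frac{1}{1 - 0.9657} \approx 29.15, \qquad \text{Tertiary: } \frac{1}{1 - 0.9802} \approx 50.51$$

Under this assumption, the implied values agree with the reported $H^{2}$ statistics of 29.16 and 50.47 up to the rounding of $I^{2}$ to two decimal places.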

The WAAP synthesis value for secondary education is 0.054, which is above the lower limit of the small range of 0.048, whereas that for tertiary education is 0.120, which is above the lower limit of the medium range of 0.112, according to the criteria of Doucouliagos (2011).[3] In other words, both secondary and tertiary education in China have a positive impact on the wage levels of graduates that is not only statistically significant, but also economically meaningful. There is also a large gap in the wage effect of education between these two groups. It is worth noting that the return to tertiary education is more than twice the return to secondary education in terms of the PCC.

## B. Testing for publication selection bias

For us to accept the above synthesis results as factual findings, the 61 selected studies must be free from publication selection bias, or, even if they are affected by publication selection, their empirical results must contain genuine evidence. The purpose of the funnel plot and the FAT–PET–PEESE procedure is to verify this point.

Funnel plots for secondary and tertiary education studies are displayed in Panels (a) and (b) of Figure 2, respectively. If we assume that the WAAP synthesis value, shown as a solid line in the figure, approximates the true effect size, the 717 estimates of secondary education are divided into left and right in the ratio 278:439 when 0.054 is used as the boundary. The null hypothesis that the number of right estimates equals the number of left estimates is therefore rejected by the goodness-of-fit test (z = 6.013, p = 0.000). Consistent with the visual impression from Panel (a) of Figure 2, this univariate test result implies the presence of publication selection bias, which indicates a preference for reporting larger wage effects of secondary education in China. On the other hand, the estimates of tertiary education are divided into left and right in the ratio 415:297 when 0.120 is used as the boundary. Accordingly, the null hypothesis is rejected again (z = -4.422, p = 0.000), suggesting publication selection bias in the study of tertiary education as well.
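The z statistics quoted for the left/right split can be reproduced with a short Python sketch, assuming they are normal approximations to a sign test of an even split (an assumption that matches the reported figures). The function name is hypothetical. For secondary education, the left count is taken as the total number of estimates in Table 1 (K = 717) minus the 439 estimates on the right.

```python
import math


def split_test(left: int, right: int):
    """Normal approximation to a sign test of H0: left count = right count.

    Returns (z, two-sided p-value). Under H0 the right count has mean n/2
    and variance n/4, so z = (right - left) / sqrt(n).
    """
    n = left + right
    z = (right - left) / math.sqrt(n)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


# Secondary education: K = 717 estimates, 439 of them to the right of 0.054.
z_sec, p_sec = split_test(717 - 439, 439)
# Tertiary education: 415 estimates to the left of 0.120 and 297 to the right.
z_ter, p_ter = split_test(415, 297)
print(round(z_sec, 3), round(z_ter, 3))
```

A positive z indicates an excess of estimates to the right of the WAAP value, and a negative z an excess to the left.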

Table 3 reports the results of the FAT–PET–PEESE procedure. With respect to secondary education, the FAT rejects the null hypothesis that the intercept ($\beta_{0}$) is zero in all five models, implying that the collected estimates lack funnel symmetry due to strong publication selection bias. However, the PET rejects the null hypothesis that the coefficient of the inverse of the standard error ($\beta_{1}$) is zero in all models, thus confirming credible evidence in the selected literature, despite publication selection bias. The results of the PEESE approach in Panel B show that, in all five models, the coefficients ($\gamma_{1}$) of 1/SE are statistically significant and, according to the estimation results, the true effect size of secondary education should be in a range of 0.0421–0.0527 in terms of the PCC.

With regard to tertiary education, the FAT results indicate that publication selection bias is unlikely in the selected studies, contrary to the results using the funnel plot. Furthermore, as in the case of the secondary education study, the PET rejects the null hypothesis in all five models, and the PEESE approach shows that the true effect size of tertiary education should be in a range of 0.1025–0.1198 in terms of the PCC.

Table 1. Descriptive statistics of the partial correlation coefficients, t-test, and Shapiro–Wilk normality test of collected estimates

| Study type | K | Mean | Median | S.D. | Max. | Min. | Kurtosis | Skewness | t-test | S-W test |
|---|---|---|---|---|---|---|---|---|---|---|
| Secondary education | 717 | 0.086 | 0.067 | 0.101 | 0.672 | -0.131 | 14.572 | 2.988 | 22.684*** | 12.100††† |
| Tertiary education | 712 | 0.121 | 0.101 | 0.105 | 0.700 | -0.130 | 10.914 | 2.108 | 30.795*** | 10.428††† |

Notes: K is the number of estimates, S.D. is the standard deviation, and S-W test is the Shapiro–Wilk normality test. *** denotes that the null hypothesis of a zero mean is rejected at the 1% level, and ††† denotes that the null hypothesis of a normal distribution is rejected at the 1% level.

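Table 2. Synthesis of estimates

Columns are grouped as (a) traditional synthesis (FE, RE), (b) heterogeneity test and measures (CQ, $I^{2}$, $H^{2}$), and (c) unrestricted weighted least squares average, UWA (UWA, APE, WAAP, Med SE, Med power).

| Study type | K | FE (Z-stat)$^{a}$ | RE (Z-stat)$^{a}$ | CQ (p-value)$^{b}$ | $I^{2}$ stat$^{c}$ | $H^{2}$ stat$^{d}$ | UWA (t-stat)$^{a,e}$ | APE$^{f}$ | WAAP (t-stat)$^{a}$ | Med SE | Med power |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Secondary education | 717 | 0.056*** (150.66) | 0.074*** (34.83) | 13385.650*** (0.00) | 96.57 | 29.16 | 0.056*** (35.69) | 388 | 0.054*** (27.16) | 0.019 | 0.835 |
| Tertiary education | 712 | 0.120*** (277.83) | 0.116*** (36.48) | 36426.250*** (0.00) | 98.02 | 50.47 | 0.120*** (38.82) | 640 | 0.120*** (37.09) | 0.019 | 1.000 |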

Notes: *** denotes statistical significance at the 1% level. K is the number of estimates, FE is fixed effects, RE is random effects, CQ is the Cochran Q test of homogeneity, APE is the number of adequately powered estimates, WAAP is the weighted average of the adequately powered estimates, Med is the median, and SE is the standard error.
a Null hypothesis: The synthesized effect size is zero.
b Null hypothesis: Effect sizes are homogeneous.
c Ranges between 0 and 100%, with larger values indicating greater heterogeneity
d Takes a value of one in the case of homogeneity
e Synthesis method advocated by Stanley and Doucouliagos (2017) and Stanley et al. (2017)
f Denotes number of estimates with statistical power of 0.80 or more, which is computed by referring to the UWA of all collected estimates

Figure 2. Funnel plot of partial correlation coefficients

Note: The solid line indicates the synthesized effect size by WAAP estimation reported in Table 2.

The above results confirm the presence of genuine evidence in the selected literature for both secondary and tertiary education and show almost no difference between the publication selection bias–adjusted effect sizes generated by the PEESE method and the WAAP synthesis values reported in Table 2. In other words, both the meta-synthesis and the tests for publication selection bias performed in this section consistently indicate a notable gap in the rate of return between secondary and tertiary education in China.

# IV. Conclusion

In this paper, we performed a meta-analysis using 1,429 empirical results reported in 61 English and Chinese studies to estimate the wage effects of secondary and tertiary education in China. The results show that the returns to both secondary and tertiary education are positive, and that the effect size in terms of the PCC is small for secondary education but medium for tertiary education. Moreover, the tests for publication selection bias reveal genuine empirical evidence in the collected estimates and show that the effect sizes generated by the PEESE method are more or less consistent with the meta-synthesis values. We find that the impact of tertiary education on wage levels is about twice that of secondary education.

Based on the findings of this study, we conjecture that the recent rapid industrial upgrading and technological innovation in China have led to remarkable economic growth that could increase the demand for highly educated human resources. However, the wage gap between those with secondary and tertiary education could also increase income inequality among Chinese workers. Future research should pay more attention to the relation between industrial and technological evolution and income inequality as mediated by the significant gap in returns to education in China.

Table 3. Meta-regression analysis of publication selection bias: FAT-PET-PEESE approach

Panel A: FAT-PET test

| Model | [1] | [2] | [3] | [4] | [5] |
|---|---|---|---|---|---|
| Estimator | U-WLS | CRU-WLS | MME RML | CRRE Panel GLS | CRFE Panel LSDV |
| **Secondary education** | | | | | |
| Intercept (FAT: H0: β0 = 0) | 2.029*** (0.232) | 2.029*** (0.613) | 2.112** (0.917) | 2.123** (0.931) | 2.284** (0.873) |
| 1/SE (PET: H0: β1 = 0) | 0.042*** (0.004) | 0.042*** (0.009) | 0.040*** (0.011) | 0.040*** (0.011) | 0.038*** (0.012) |
| K | 717 | 717 | 717 | 717 | 717 |
| R2 | 0.382 | 0.382 | - | 0.382 | 0.382 |
| **Tertiary education** | | | | | |
| Intercept (FAT: H0: β0 = 0) | -0.046 (0.642) | -0.046 (1.219) | 0.465 (3.147) | 1.282 (3.226) | 1.793 (3.810) |
| 1/SE (PET: H0: β1 = 0) | 0.120*** (0.012) | 1.219*** (0.024) | 0.098** (0.044) | 0.097** (0.045) | 0.093* (0.056) |
| K | 712 | 712 | 712 | 712 | 712 |
| R2 | 0.465 | 0.465 | - | 0.465 | 0.465 |

Panel B: PEESE approach

| Model | [6] | [7] | [8] | [9] | [10] |
|---|---|---|---|---|---|
| Estimator | U-WLS | CRU-WLS | MME-RML | RE-Panel ML | Population-averaged Panel GEE |
| **Secondary education** | | | | | |
| SE | 29.804*** (3.387) | 29.804*** (9.267) | -12.180 (8.131) | -12.180 (9.759) | 6.767 (6.350) |
| 1/SE (H0: γ1 = 0) | 0.053*** (0.003) | 0.053*** (0.009) | 0.042*** (0.011) | 0.042*** (0.003) | 0.048*** (0.010) |
| K | 717 | 717 | 717 | 717 | 717 |
| R2 | 0.659 | 0.659 | - | - | - |
| **Tertiary education** | | | | | |
| SE | 1.150 (9.164) | 1.150 (22.749) | 3.834 (15.427) | 3.834 (3.834) | 11.793 (14.730) |
| 1/SE (H0: γ1 = 0) | 0.120*** (0.008) | 0.120*** (0.018) | 0.102*** (0.035) | 0.102*** (0.006) | 0.108*** (0.023) |
| K | 712 | 712 | 712 | 712 | 712 |
| R2 | 0.679 | 0.679 | - | - | - |

Note: Figures in parentheses are the standard errors of the regression coefficients. Robust standard errors are estimated, except for Model [9]. U-WLS is unrestricted WLS, CRU-WLS is cluster-robust unrestricted WLS, MME RML is multi-level mixed effects RML, CRRE is cluster-robust random effects, CRFE is cluster-robust fixed effects, and RE is random effects. ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively.

1. The final literature search was conducted in March 2021.

2. The breakdown of estimates by sign and effect size is as follows: for secondary education, negative 44, positive and weak 205, positive and small 309, positive and medium 128, positive and large 31, and, for tertiary education, negative 29, positive and weak 136, positive and small 223, positive and medium 248, and positive and large 76.

3. The synthesized effect size of Churchill and Mishra (2018, p. 5911, Table 2) is 0.139 for tertiary education, using 205 estimates extracted from 26 English studies, which is somewhat larger than our result.