Ancestry effects in Brazil: economist approach with person names

I have written a lot about ancestry approaches, generally from a psychometric and genomic angle. However, economists can also play this game.

This paper estimates the impact of culture on the academic performance of Brazilian students in standardized tests. Based on data with student identification, we apply an algorithm of surname classification that assigns the student, based on the surnames of his/her parents, to one of the following ancestry groups: Iberian, Japanese, Italian, Germanic, Eastern European and Syrian-Lebanese. We show that students with non-Iberian European and Japanese ancestry obtain statistically and substantively higher scores on 3rd and 5th grade standard Math tests, even with a large set of individual, family and municipal controls. We also tested the hypothesis of persistence of local institutions, established during the era of mass immigration in Brazil in the 19th and 20th centuries, and we showed that the mechanisms of family transmission of culture remain robust for students with Japanese and Italian ancestry.

Our results suggest that there is an ancestry premium in educational performance. Using Iberian Brazilians as our reference group, we provide evidence that students of either non-Iberian European (Germanic, Italian, Eastern European) or Japanese ancestry achieve substantially higher scores in standardized Mathematics tests conducted in the 3rd and 5th grades. Not only does a gap seem to exist, but it widens as students move from 3rd to 5th grade. These results are robust to several specifications, including individual and family controls, as well as municipality and mobility controls. It is noteworthy that the results hold when using classroom fixed effects (i.e., controlling for teacher quality, classroom size and so forth).

I asked my friend who sent me this whether there was more of that sort. His reply is given below.

I’ve been following this Brazilian economist for some time now. He developed a way to extract ancestry from Brazilian surnames in order to study the benefits of immigration and all of that. His first study using this method (mirror) showed that Japanese and Europeans receive greater wages:

This paper presents a method for classifying the ancestry of Brazilian surnames based on historical sources. The information obtained forms the basis for applying fuzzy matching and machine learning classification algorithms to more than 46 million workers in five categories: Iberian, Italian, Japanese, German and East European. The vast majority (96.4%) of the single surnames were identified using a fuzzy matching and the rest using a method proposed by Cavnar and Trenkle (1994). A comparison of the results of the procedures with data on foreigners in the 1920 Census and with the geographic distribution of non-Iberian surnames underscores the accuracy of the procedure.
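
To make the two-step idea in that abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a tiny made-up reference list (`REFERENCE`), uses Python's standard `difflib` as a stand-in for whatever fuzzy matcher the paper used, and falls back to rank-ordered character n-gram profiles compared with an "out-of-place" distance, which is the general technique described by Cavnar and Trenkle (1994). The function names, the 0.8 cutoff and the profile size are all my own assumptions.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy reference list; the paper's historical sources are not reproduced here.
REFERENCE = {
    "silva": "Iberian", "souza": "Iberian", "oliveira": "Iberian",
    "rossi": "Italian", "bianchi": "Italian", "ferrari": "Italian",
    "tanaka": "Japanese", "yamamoto": "Japanese", "nakamura": "Japanese",
    "schmidt": "German", "schneider": "German", "weber": "German",
    "kowalski": "East European", "nowak": "East European", "petrov": "East European",
}


def char_ngrams(name, n_max=3):
    """Return all character 1- to n_max-grams of a name padded with '_'."""
    padded = f"_{name.lower()}_"
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


def rank_profile(names, top=100):
    """Map each of the `top` most frequent n-grams to its frequency rank."""
    counts = Counter(g for name in names for g in char_ngrams(name))
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(doc, cat):
    """Sum of rank differences; n-grams missing from `cat` get a fixed penalty."""
    penalty = len(cat)
    return sum(abs(rank - cat[g]) if g in cat else penalty
               for g, rank in doc.items())


# One n-gram profile per ancestry group, built from the toy reference list.
PROFILES = {
    label: rank_profile([n for n, lab in REFERENCE.items() if lab == label])
    for label in sorted(set(REFERENCE.values()))
}


def classify(surname, cutoff=0.8):
    """Try fuzzy matching first; fall back to the n-gram profile distance."""
    key = surname.lower()
    close = get_close_matches(key, list(REFERENCE), n=1, cutoff=cutoff)
    if close:
        return REFERENCE[close[0]], "fuzzy"
    doc = rank_profile([key])
    label = min(PROFILES, key=lambda lab: out_of_place(doc, PROFILES[lab]))
    return label, "ngram"
```

Calling `classify("Schmitt")` resolves through the fuzzy step because the spelling is close to a listed surname, while an unlisted name such as `classify("Hashimoto")` goes to the n-gram fallback. With a reference list this small the fallback label is not reliable; the point is only the control flow.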

His second study (mirror) using extracted ancestry is less about presenting the method and more about properly using it. He found that an increase in “cultural diversity” increases wages in a specific Brazilian state:

This paper estimates the long-term impact of immigration to Rio Grande do Sul/Brazil on contemporary wages. Based on a unique micro-data panel that includes the names of workers, we apply machine learning algorithms to classify surnames and infer each workers’ ancestry in order to calculate the inherited cultural diversity in the workforce by municipality. We address the endogeneity of cultural diversity by using three sets of instruments: distance to settlements created by the government for nonIberian immigrants between 1824-1918, share of street names with foreign surnames and share of foreigners in 1920. Our IV-estimations prove robust to human capital differences, institutions, geography, the spatial sorting of workers based on intrinsic abilities and the diffusion of knowledge through imports. The results clearly indicate that an increase in diversity – exclusively transmitted through the share of workers with non-Iberian ancestry – leads to a positive wage externality.
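
The abstract does not say how "cultural diversity" is computed per municipality. One common choice in this literature is a fractionalization index (one minus the sum of squared group shares), so the sketch below assumes that measure. The worker records and municipality names are invented for illustration.

```python
from collections import Counter


def fractionalization(ancestries):
    """1 - sum of squared group shares: 0 when everyone shares one ancestry."""
    counts = Counter(ancestries)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical (municipality, inferred ancestry) pairs, one per worker.
workers = [
    ("A", "Iberian"), ("A", "Iberian"), ("A", "Iberian"), ("A", "Iberian"),
    ("B", "Iberian"), ("B", "German"), ("B", "Italian"), ("B", "German"),
]

# Group inferred ancestries by municipality, then compute the index for each.
by_municipality = {}
for municipality, ancestry in workers:
    by_municipality.setdefault(municipality, []).append(ancestry)

diversity = {m: fractionalization(a) for m, a in by_municipality.items()}
```

Here municipality A gets an index of zero and B a positive value. In the paper this kind of municipality-level measure is then instrumented, which the sketch does not attempt.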

Yesterday he tweeted a draft of a new study (mirror), an attempt to measure the impact of immigration on income per capita in Brazil:

This paper estimates the effect of non-Iberian immigration to Brazil based on historical and contemporary microdata. The historical database encompasses over 1.7 million immigrant records; the contemporary has more than 165 million records. The estimation of immigrant numeracy suggests that Stolz, Baten and Botelho (2013) underestimated their skills and, therefore, their impact on Brazil. An algorithm classified the surnames of contemporary Brazilians according to their ancestral origins. Two counterfactual estimates are constructed in order to estimate the income per capita if there had never been any non-Iberian immigration. The first counterfactual is built upon the regression of income on the percentages of each ancestral group in municipalities. The second results from the regression of individual wages on the surname ancestry of workers. The coefficients of these regressions are used to estimate the income of a counterfactual Brazil with no descendants of immigrants. It was estimated that in the absence of non-Iberian immigrants today's income would be from 12.6% to 17% lower.
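
The logic of the first counterfactual can be sketched as follows. This is a simplified illustration under my own assumptions, not the paper's specification: a single aggregated non-Iberian share instead of one share per group, log income as the outcome, no controls, and synthetic data whose numbers are arbitrary and unrelated to the paper's estimates.

```python
import math
import random

random.seed(0)

# Synthetic municipalities: non-Iberian share and log income (arbitrary numbers).
shares = [random.uniform(0.0, 0.6) for _ in range(200)]
log_income = [7.0 + 0.5 * s + random.gauss(0, 0.05) for s in shares]

# Ordinary least squares with one regressor: slope = cov(s, y) / var(s).
n = len(shares)
mean_s = sum(shares) / n
mean_y = sum(log_income) / n
slope = (sum((s - mean_s) * (y - mean_y) for s, y in zip(shares, log_income))
         / sum((s - mean_s) ** 2 for s in shares))
intercept = mean_y - slope * mean_s

# Fitted income with observed shares versus shares counterfactually set to zero.
fitted_income = sum(math.exp(intercept + slope * s) for s in shares) / n
counterfactual_income = math.exp(intercept)

# Proportional income shortfall in the no-immigration counterfactual.
shortfall = 1 - counterfactual_income / fitted_income
```

The counterfactual simply switches off the ancestry term and compares predicted incomes, so it inherits every weakness of the underlying regression: if the coefficient is confounded, so is the shortfall.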

In this last one there’s a citation to the study I sent you earlier. It’s an interesting section, for genetics is always a step away, but it’s never mentioned. I suppose the authors do consider it, but know it would be a public relations disaster to write it down. Google-translated here:

Assuming that the estimates are correct, what are the channels that have made immigration impact on economic growth? The first candidate is, of course, human capital in its broadest sense. Lopes (2017) showed that 8-year-old children with non-Iberian ancestry obtained significantly better performance than white ones with Iberian surnames measured by standardized tests, even with controls for parents’ socioeconomic background and fixed effects by classroom. In substantive terms, in the case of those students with Japanese ancestors, the effect equals one more year of schooling in mathematics. It seems that there is an intergenerational transmission of human capital that is not adequately captured by the schooling proxies used in the regressions.

Ehrl and Monasterio (2017), in turn, analyzed the long-term impact of immigration on productivity through increased local skills diversity of workers in Rio Grande do Sul. The authors found quite robust effects. However, their approach is not able to fully explain the results found here. If diversity were the channel, local controls in individual regressions should capture the full effect of surname variables. The same logic indicates that explanations based on social capital or local institutions are also insufficient.

What other channels would make the estimates misleading? If higher wages are fully explained by firm-level discrimination in favor of non-Iberian surnames, the same econometric results would be obtained in the regressions at the individual level. However, in this case, it would be necessary to explain why municipalities with higher non-Iberian holdings are also associated with higher per capita incomes.

Also relevant:

I think person names are a very rich resource that has only barely been touched. It relates to all kinds of things.
