{"id":9377,"date":"2021-04-13T23:27:06","date_gmt":"2021-04-13T22:27:06","guid":{"rendered":"https:\/\/emilkirkegaard.dk\/en\/?p=9377"},"modified":"2021-04-27T03:24:36","modified_gmt":"2021-04-27T02:24:36","slug":"different-researchers-same-dataset-and-questions-what-happens","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2021\/04\/different-researchers-same-dataset-and-questions-what-happens\/","title":{"rendered":"Different researchers, same dataset and questions: what happens?"},"content":{"rendered":"<p>A new preprint is out on a massive many-teams study. In this design, a single dataset (or database) is given to a bunch of researchers, and they are asked to answer some questions (research hypotheses). They are free to choose how to deal with the data, transformations, outliers, models, and so on. These choices are called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Researcher_degrees_of_freedom\">researcher degrees of freedom<\/a> (RDF) by analogy with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Degrees_of_freedom_(statistics)\">statistical degrees of freedom<\/a>. Let&#8217;s look at the three studies so far (<a href=\"https:\/\/emilkirkegaard.dk\/en\/2020\/08\/against-trust-in-neuroscience\/\">I covered one in the prior post against neuroscience<\/a>):<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., &#8230; &amp; Nosek, B. A. (2018). <a href=\"https:\/\/journals.sagepub.com\/doi\/10.1177\/2515245917747646\">Many analysts, one data set: Making transparent how variations in analytic choices affect results<\/a>. 
<i>Advances in Methods and Practices in Psychological Science<\/i>, <i>1<\/i>(3), 337-356.<\/div>\n<\/li>\n<\/ul>\n<blockquote>\n<div class=\"abstractSection abstractInFull\">\n<p>Twenty-nine teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. Analytic approaches varied widely across the teams, and the estimated effect sizes ranged from 0.89 to 2.93 (<i>Mdn<\/i> = 1.31) in odds-ratio units. Twenty teams (69%) found a statistically significant positive effect, and 9 teams (31%) did not observe a significant relationship. Overall, the 29 different analyses used 21 unique combinations of covariates. Neither analysts\u2019 prior beliefs about the effect of interest nor their level of expertise readily explained the variation in the outcomes of the analyses. Peer ratings of the quality of the analyses also did not account for the variability. These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions. 
Crowdsourcing data analysis, a strategy in which numerous research teams are recruited to simultaneously investigate the same research question, makes transparent how defensible, yet subjective, analytic choices influence research results.<\/p>\n<\/div>\n<\/blockquote>\n<div class=\"abstractSection abstractInFull\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9380\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-11.png\" alt=\"\" width=\"1012\" height=\"675\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-11.png 1012w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-11-300x200.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-11-768x512.png 768w\" sizes=\"auto, (max-width: 1012px) 100vw, 1012px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-9381\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-03.png\" alt=\"\" width=\"946\" height=\"588\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-03.png 946w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-03-300x186.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-26-03-768x477.png 768w\" sizes=\"auto, (max-width: 946px) 100vw, 946px\" \/><\/p>\n<\/div>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., &#8230; &amp; Rieck, J. R. (2020). 
<a href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9\">Variability in the analysis of a single neuroimaging dataset by many teams.<\/a> <i>Nature<\/i>, <i>582<\/i>(7810), 84-88.<\/div>\n<\/li>\n<\/ul>\n<blockquote>\n<div id=\"Abs1-content\" class=\"c-article-section__content\">\n<p>Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging\u00a0by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses<sup><a id=\"ref-link-section-d17418e5435\" title=\"Botvinik-Nezer, R. et al. fMRI data of mixed gambles from the Neuroimaging Analysis Replication and Prediction Study. Sci. Data 6, 106 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9#ref-CR1\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\">1<\/a><\/sup>. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset<sup><a id=\"ref-link-section-d17418e5439\" title=\"Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. 
USA 112, 15343\u201315347 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9#ref-CR2\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">2<\/a>,<a id=\"ref-link-section-d17418e5439_1\" title=\"Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433\u20131436 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9#ref-CR3\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">3<\/a>,<a id=\"ref-link-section-d17418e5439_2\" title=\"Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2, 637\u2013644 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9#ref-CR4\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">4<\/a>,<a id=\"ref-link-section-d17418e5442\" title=\"Forsell, E. et al. Predicting replication outcomes in the Many Labs 2 study. J. Econ. Psychol. 75, 102117 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9#ref-CR5\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\">5<\/a><\/sup>. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. 
Potential approaches that could be used to mitigate issues related to analytical variability are discussed.<\/p>\n<\/div>\n<\/blockquote>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9379\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-24-12.png\" alt=\"\" width=\"441\" height=\"565\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-24-12.png 441w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-24-12-234x300.png 234w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/p>\n<div id=\"Abs1-content\" class=\"c-article-section__content\">\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Breznau, N., Rinke, E. M., Wuttke, A., Adem, M., Adriaans, J., Alvarez-Benjumea, A., &#8230; &amp; van der Linden, M. (2021). <a href=\"https:\/\/osf.io\/preprints\/metaarxiv\/cd5j9\/\">Observing Many Researchers using the Same Data and Hypothesis Reveals a Hidden Universe of Data Analysis<\/a>. MetaArxiv<\/div>\n<\/li>\n<\/ul>\n<blockquote>\n<p class=\"abstract \">Across scientific disciplines, recent studies unambiguously find that different researchers testing the same hypothesis using the same data come to widely differing results. Presumably this outcome variability derives from different research steps, but this has yet to be tested. In a controlled study we open the black box of research by observing 73 research teams as they independently conduct a same-data, same-hypothesis study. We find that major research steps explain at most 2.6% of total variance in effect sizes and 10% of the deviance in subjective conclusions. Expertise, prior beliefs and attitudes explain even less. 
Each generated model was unique, which points to a vast universe of research design variability normally hidden from view in the presentation, consumption, and perhaps even creation of scientific results.<\/p>\n<\/blockquote>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9378\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-22-29.png\" alt=\"\" width=\"888\" height=\"808\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-22-29.png 888w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-22-29-300x273.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2021-04-14-00-22-29-768x699.png 768w\" sizes=\"auto, (max-width: 888px) 100vw, 888px\" \/><\/p>\n<p>The conclusions are sobering:<\/p>\n<ul>\n<li>There is often large variation in results from analysis of the same dataset, even when researchers follow what they consider to be normal methods.<\/li>\n<li>Most datasets are only ever analyzed by a single team. This means the published result is a single draw from the distribution of possible results that RDF generate for that combination of dataset and research question.<\/li>\n<li>In fact, most datasets <em>can<\/em> only be analyzed by a single team, because the datasets are not properly shared. <a href=\"https:\/\/www.cell.com\/current-biology\/fulltext\/S0960-9822(13)01400-0\">Worse, most datasets are lost to history<\/a>, so they cannot be reanalyzed even by the original authors.<\/li>\n<li>Various differences between researchers will be reflected in their choices. This is where political bias comes in. One can often try more than one specification and preferentially report the ones that produce the results most favorable to a given interest. This bias is inevitable and can only be combated by pre-registering the exact methods. 
For many datasets this is not possible, since a given analysis involves too many choices to make them all before looking at the data.<\/li>\n<li><strong>A prudent reader of science takes this uncertainty into account. Conclusions are more reliable when they are produced from diverse datasets and by many people, especially diverse people.<\/strong><\/li>\n<\/ul>\n<hr \/>\n<h3>Edited to add: 2021-04-27<\/h3>\n<p>There is a new <a href=\"https:\/\/www.researchgate.net\/publication\/351061145_Guidance_for_Multi-Analyst_Studies\">preprint<\/a> with some guidelines for doing these studies. More useful to us, it contains a longer list of such studies, several of which I didn&#8217;t know about. Here they are; the Silberzahn et al. and Botvinik-Nezer et al. studies are the ones covered above:<\/p>\n<ul>\n<li><span dir=\"ltr\">Bastiaansen, J. A. <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S002239992030773X\"><span dir=\"ltr\">Time to get personal? The impact of researchers choices on the selection of treatment targets using the experience sampling methodology. <\/span><\/a><span dir=\"ltr\">J. <\/span><span dir=\"ltr\">Psychosom. Res. <\/span><span dir=\"ltr\">137, 110211 (2020).<\/span><\/li>\n<li><span dir=\"ltr\">Dongen, N. N. N. van <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/00031305.2019.1565553\"><span dir=\"ltr\">Multiple Perspectives on Inference for Two Simple <\/span><span dir=\"ltr\">Statistical Scenarios. <\/span><\/a><span dir=\"ltr\">Am. Stat. <\/span><span dir=\"ltr\">73, 328<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">339 (2019).<\/span><\/li>\n<li><span dir=\"ltr\">Salganik, M. J. <\/span><span dir=\"ltr\">et al. 
<\/span><span dir=\"ltr\"><a href=\"https:\/\/www.pnas.org\/content\/117\/15\/8398\">Measuring the predictability of life outcomes with a scientific mass collaboration<\/a>.<\/span> <span dir=\"ltr\">Proc. Natl. Acad. Sci. <\/span><span dir=\"ltr\">117, 8398<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">8403 (2020).<\/span><\/li>\n<li><span dir=\"ltr\">Silberzahn, R. <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/journals.sagepub.com\/doi\/10.1177\/2515245917747646\"><span dir=\"ltr\">Many Analysts, One Data Set: Making Transparent How <\/span><span dir=\"ltr\">Variations in Analytic Choices Affect Results. <\/span><\/a><span dir=\"ltr\">Adv. Methods Pract. Psychol. Sci. <\/span><span dir=\"ltr\">1, 337<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">356 (2018).<\/span><\/li>\n<li><span dir=\"ltr\">Botvinik-Nezer, R. <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/www.nature.com\/articles\/s41586-020-2314-9\"><span dir=\"ltr\">Variability in the analysis of a single neuroimaging dataset by many teams. <\/span><\/a><span dir=\"ltr\">Nature <\/span><span dir=\"ltr\">582, 84<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">88 (2020).<\/span><\/li>\n<li><span dir=\"ltr\">Dutilh, G. <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/link.springer.com\/article\/10.3758\/s13423-017-1417-2\"><span dir=\"ltr\">The Quality of Response Time <\/span><span dir=\"ltr\">Data Inference: A Blinded, <\/span><span dir=\"ltr\">Collaborative Assessment of the Validity of Cognitive Models. <\/span><\/a><span dir=\"ltr\">Psychon. Bull. Rev. <\/span><span dir=\"ltr\">26, <\/span><span dir=\"ltr\">1051<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">1069 (2019).<\/span><\/li>\n<li><span dir=\"ltr\">Fillard, P. 
<\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/21256221\/\"><span dir=\"ltr\">Quantitative evaluation of 10 tractography algorithms on a realistic <\/span><\/a><span dir=\"ltr\"><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/21256221\/\">diffusion MR phantom<\/a>. <\/span><span dir=\"ltr\">NeuroImage <\/span><span dir=\"ltr\">56, 220<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">234 (2011).<\/span><\/li>\n<li><span dir=\"ltr\">Starns, J. J. <\/span><span dir=\"ltr\">et al. <\/span><a href=\"https:\/\/journals.sagepub.com\/doi\/full\/10.1177\/2515245919869583\"><span dir=\"ltr\">Assessing theoretical conclusions with blinded inference to <\/span><span dir=\"ltr\">investigate a potential inference crisis. <\/span><\/a><span dir=\"ltr\">Adv. Methods Pract. Psychol. Sci. <\/span><span dir=\"ltr\">2, 335<\/span><span dir=\"ltr\">\u2013<\/span><span dir=\"ltr\">349 <\/span><span dir=\"ltr\">(2019).<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A new preprint is out on a massive many teams study. In this design, a single dataset (or database) is given to a bunch of researchers, and they are asked to answer some questions (research hypotheses). 
They are left to their own choices in how to deal with the data, transformations, outliers, models, and so [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":9378,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2049],"tags":[2933,2932],"class_list":["post-9377","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-metascience","tag-many-teams","tag-researcher-degrees-of-freedom","entry","has-media"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/9377","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=9377"}],"version-history":[{"count":3,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/9377\/revisions"}],"predecessor-version":[{"id":9412,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/9377\/revisions\/9412"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media\/9378"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=9377"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=9377"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=9377"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}