{"id":6255,"date":"2016-10-15T23:06:59","date_gmt":"2016-10-15T22:06:59","guid":{"rendered":"http:\/\/emilkirkegaard.dk\/en\/?p=6255"},"modified":"2018-06-26T18:27:00","modified_gmt":"2018-06-26T17:27:00","slug":"sql-server-for-population-frequencies-from-1000-genomes","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2016\/10\/sql-server-for-population-frequencies-from-1000-genomes\/","title":{"rendered":"SQL server for population frequencies from 1000 genomes"},"content":{"rendered":"<div class=\"container-fluid main-container\">\n<h4><strong>Note: 2018 June 26<\/strong><\/h4>\n<p>Server down right now, investigating.<\/p>\n<h4><strong>Note: August 16, 2017<\/strong><\/h4>\n<p>server IP changed to 67.207.92.10.<\/p>\n<\/div>\n<h4><strong>Original post<\/strong><\/h4>\n<div class=\"container-fluid main-container\">\n<p>We need dplyr for this:<\/p>\n<pre class=\"r\"><code>library(dplyr)<\/code><\/pre>\n<p>First, use the anon user to log into the SQL server (user = &#8220;anon&#8221;, pass = &#8220;&#8221;, ip = &#8220;67.207.92.10&#8221;, port = 3306):<\/p>\n<pre class=\"r\"><code>sql = src_mysql(\"population_freqs\", host = \"67.207.92.10\", user = \"anon\", port = 3306)<\/code><\/pre>\n<p>Select the 1000 genomes phase 3 table:<\/p>\n<pre class=\"r\"><code>sql_1kg = tbl(sql, \"1000genomes_phase3\")\r\n\r\n#look at the first 10 rows\r\nsql_1kg<\/code><\/pre>\n<pre><code>## Source:   query [?? x 35]\r\n## Database: mysql 10.0.27-MariaDB-0ubuntu0.16.04.1 [anon@67.207.92.10:\/population_freqs]\r\n## \r\n##      CHR       SNP    A1    A2    ACB    ASW    BEB    CDX    CEU    CHB\r\n##    &lt;int&gt;     &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;\r\n## 1      1 rs1000033     G     T 0.4062 0.3279 0.2151 0.3172 0.2071 0.3107\r\n## 2      1 rs1000050     C     T 0.6302 0.5246 0.2267 0.4247 0.1364 0.3932\r\n## 3      1 rs1000070     T     C 0.6146 0.5902 0.5116 0.4946 0.2828 0.3204\r\n## 4      1 rs1000073     G     A 0.4219 0.3689 0.2093 0.1989 0.6717 0.1456\r\n## 5      1 rs1000075     T     C 0.3750 0.4180 0.2733 0.3172 0.3586 0.2573\r\n## 6      1 rs1000085     C     G 0.0781 0.1393 0.0349 0.0645 0.2121 0.0437\r\n## 7      1 rs1000127     C     T 0.2500 0.2787 0.4012 0.4677 0.3081 0.5728\r\n## 8      1 rs1000184     C     G 0.0521 0.0574 0.2442 0.8333 0.2879 0.7767\r\n## 9      1 rs1000211     T     C 0.0260 0.0246 0.0000 0.0000 0.0000 0.0000\r\n## 10     1 rs1000212     A     G 0.0365 0.0246 0.0000 0.0000 0.0000 0.0000\r\n## # ... with more rows, and 25 more variables: CHS &lt;dbl&gt;, CLM &lt;dbl&gt;,\r\n## #   ESN &lt;dbl&gt;, FIN &lt;dbl&gt;, GBR &lt;dbl&gt;, GIH &lt;dbl&gt;, GWD &lt;dbl&gt;, IBS &lt;dbl&gt;,\r\n## #   ITU &lt;dbl&gt;, JPT &lt;dbl&gt;, KHV &lt;dbl&gt;, LWK &lt;dbl&gt;, MSL &lt;dbl&gt;, MXL &lt;dbl&gt;,\r\n## #   PEL &lt;dbl&gt;, PJL &lt;dbl&gt;, PUR &lt;dbl&gt;, STU &lt;dbl&gt;, TSI &lt;dbl&gt;, YRI &lt;dbl&gt;,\r\n## #   EAS &lt;dbl&gt;, EUR &lt;dbl&gt;, AFR &lt;dbl&gt;, AMR &lt;dbl&gt;, SAS &lt;dbl&gt;<\/code><\/pre>\n<p>The entire file is really large, about 3.6 GB in memory. You often only need a few (1-1000) SNPs, so let\u2019s try downloading only a few:<\/p>\n<pre class=\"r\"><code>#first 10 of the hits from the latest height GWAS\r\n#http:\/\/www.nature.com\/ng\/journal\/v46\/n11\/full\/ng.3097.html\r\nsome_snps = c(\"rs425277\", \"rs9434723\", \"rs10779751\", \"rs2284746\", \"rs12137162\", \r\n\"rs212524\", \"rs1014987\", \"rs2806561\", \"rs4601530\", \"rs926438\")\r\n\r\n#fetch from SQL server\r\n(sql_height_freqs = sql_1kg %&gt;% filter(SNP %in% some_snps) %&gt;% collect())<\/code><\/pre>\n<pre><code>## # A tibble: 10 x 35\r\n##      CHR        SNP    A1    A2    ACB    ASW    BEB    CDX    CEU    CHB\r\n##    &lt;int&gt;      &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;\r\n## 1      1  rs1014987     G     C 0.1458 0.1475 0.1802 0.5000 0.2525 0.4612\r\n## 2      1 rs10779751     A     G 0.6406 0.5984 0.1279 0.0591 0.2677 0.0631\r\n## 3      1 rs12137162     A     C 0.1198 0.1557 0.2384 0.1935 0.2677 0.2621\r\n## 4      1   rs212524     T     C 0.1823 0.1967 0.3605 0.2473 0.4293 0.1796\r\n## 5      1  rs2284746     G     C 0.1927 0.1475 0.4012 0.1882 0.5657 0.2573\r\n## 6      1  rs2806561     A     G 0.3073 0.3934 0.3488 0.5591 0.5909 0.5388\r\n## 7      1   rs425277     T     C 0.0990 0.0984 0.3023 0.1452 0.2576 0.1990\r\n## 8      1  rs4601530     T     C 0.4271 0.3852 0.4535 0.5538 0.2475 0.4515\r\n## 9      1   rs926438     T     C 0.8177 0.7541 0.2093 0.1774 0.5455 0.3010\r\n## 10     1  rs9434723     A     G 0.2240 0.2541 0.1744 0.0538 0.1414 0.0922\r\n## # ... with 25 more variables: CHS &lt;dbl&gt;, CLM &lt;dbl&gt;, ESN &lt;dbl&gt;, FIN &lt;dbl&gt;,\r\n## #   GBR &lt;dbl&gt;, GIH &lt;dbl&gt;, GWD &lt;dbl&gt;, IBS &lt;dbl&gt;, ITU &lt;dbl&gt;, JPT &lt;dbl&gt;,\r\n## #   KHV &lt;dbl&gt;, LWK &lt;dbl&gt;, MSL &lt;dbl&gt;, MXL &lt;dbl&gt;, PEL &lt;dbl&gt;, PJL &lt;dbl&gt;,\r\n## #   PUR &lt;dbl&gt;, STU &lt;dbl&gt;, TSI &lt;dbl&gt;, YRI &lt;dbl&gt;, EAS &lt;dbl&gt;, EUR &lt;dbl&gt;,\r\n## #   AFR &lt;dbl&gt;, AMR &lt;dbl&gt;, SAS &lt;dbl&gt;<\/code><\/pre>\n<p>All the atomic populations are there as well as the 5 super populations (\u2018macro races\u2019). The numbers for the super populations differ slightly from those that can be seen on ensembl because they used weighted means and I used unweighted means.<\/p>\n<\/div>\n<p><script>\/\/ <![CDATA[ \/\/ add bootstrap table styles to pandoc tables $(document).ready(function () { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); }); \/\/ ]]><\/script><\/p>\n<p><!-- dynamically load mathjax for compatibility with self-contained --><br \/>\n<script>\/\/ <![CDATA[ (function () { var script = document.createElement(\"script\"); script.type = \"text\/javascript\"; script.src = \"https:\/\/cdn.mathjax.org\/mathjax\/latest\/MathJax.js?config=TeX-AMS-MML_HTMLorMML\"; document.getElementsByTagName(\"head\")[0].appendChild(script); })(); \/\/ ]]><\/script><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Note: 2018 June 26 Server down right now, investigating. Note: August 16, 2017 server IP changed to 67.207.92.10. Original post We need dplyr for this: library(dplyr) First, use the anon user to log into the SQL server (user = &#8220;anon&#8221;, pass = &#8220;&#8221;, ip = &#8220;67.207.92.10&#8221;, port = 3306): sql = src_mysql(&#8220;population_freqs&#8221;, host = &#8220;67.207.92.10&#8221;, [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2418,2382],"tags":[2419,2210,2420,2190,2421],"class_list":["post-6255","post","type-post","status-publish","format-standard","hentry","category-population-genetics","category-r","tag-1000-genomes","tag-gwas","tag-population-frequencies","tag-snp","tag-sql","entry"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/6255","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=6255"}],"version-history":[{"count":5,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/6255\/revisions"}],"predecessor-version":[{"id":7278,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/6255\/revisions\/7278"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=6255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=6255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=6255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}