OCTOBER 29, 2013

AWK PRACTICE EXERCISES

Use awk to do the following:

(1) Extract from the snplist file (located in directory hpc13f52/example_files/datafiles) all SNP sites where at least 10 strains showed coverage at that site.

(2) Determine the mean nucleotide diversity across all SNP sites.

(3) How many sites have nucleotide diversity greater than 0.2?

(4) Dicty chromosomes can be indicated with a “DDB” ID or the chromosome number. In the SNP file, I used the DDB ID. Generate a new version of the file that replaces the DDB ID with the chromosome number (see Table 1 below). You don’t need to do them all; just verify that you can do it for one or two of the chromosomes.
    a) Using awk
    b) Using emacs

Table 1. Mapping of DDB number to chromosome number in Dictyostelium

    DDB number    Chromosome
    DDB0169550    M
    DDB0237465    R
    DDB0215018    2F
    DDB0215151    3F
    DDB0220052    BF
    DDB0232428    1
    DDB0232429    2
    DDB0232430    3
    DDB0232431    4
    DDB0232432    5
    DDB0232433    6

A “pileup” file for a genome-resequencing project has the following columns: (1) chromosome, (2) position, (3) nucleotide in the reference genome, (4) number of reads covering the site in the query genome, (5) nucleotide observed in each of the reads covering the site (“.” means same as the reference genome on the + strand, “,” means same as the reference genome on the - strand), (6) quality scores for each nucleotide covering the site.

Using QS73.pileup (in hpc13f52/example_files/datafiles):

(5) The tgrC1 gene is located on chromosome 3 from positions 3630672 to 3633406. Extract the portion of the pileup file that covers the tgrC1 gene.
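As a starting point for exercise (5), here is one possible sketch of a region filter, based on the pileup column layout described above (column 1 is the chromosome, column 2 is the position). The function name extract_region and the output file name are hypothetical. The handout does not say whether QS73.pileup labels chromosome 3 as "DDB0232430" (from Table 1) or as "3", so the label is passed as an argument; check the first column of the file before choosing.

```shell
# Sketch: print the pileup lines for one chromosome within a position range.
# Assumption: fields are whitespace-separated, with the chromosome label in
# column 1 and the position in column 2, as in the column description above.
extract_region() {
    # usage: extract_region FILE CHR START END
    awk -v chr="$2" -v start="$3" -v end="$4" \
        '$1 == chr && $2 >= start && $2 <= end' "$1"
}

# Possible invocation (the chromosome label is an assumption; it may be "3"):
#   extract_region QS73.pileup DDB0232430 3630672 3633406 > tgrC1.pileup
```

Passing the bounds with -v keeps the awk program free of hard-coded coordinates, so the same filter could be reused for other genes.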