OCTOBER 29, 2013
AWK PRACTICE EXERCISES
Use awk to do the following:
(1) Extract from the snplist file (located in directory hpc13f52/example_files/datafiles) all SNP
sites where at least 10 strains showed coverage at that site.
(2) Determine the mean nucleotide diversity across all SNP sites.
(3) How many sites have nucleotide diversity greater than 0.2?
(4) Dicty chromosomes can be indicated with a “DDB” ID or the chromosome number. In
snpfile, I used the DDB ID. Generate a new version of the file that replaces the DDB
number with the chromosome number (see Table 1 below). You don’t need to do them all; just
verify that you can do it for one or two of the chromosomes.
a) Using awk
b) Using emacs
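
Exercises (1) to (3) all rest on the same three awk patterns: filter rows on a column, accumulate a sum and divide in END, and count matching rows. The sketch below runs them on a made-up four-line file. The column layout (chromosome, position, number of strains with coverage, nucleotide diversity) and the variable names ccol and dcol are assumptions for illustration only, not the actual layout of snplist; check the real file and change the column numbers to match.

```shell
# Hypothetical demo data, NOT the real snplist layout:
# chromosome, position, strains with coverage, nucleotide diversity.
snp_demo=$(mktemp)
printf '%s\n' \
  'DDB0232428 1042 12 0.25' \
  'DDB0232428 2210 8 0.10' \
  'DDB0232429 517 10 0.40' \
  'DDB0232430 9033 15 0.05' > "$snp_demo"

# Pattern for (1): keep rows whose coverage column (ccol) is at least 10.
covered=$(awk -v ccol=3 '$ccol >= 10' "$snp_demo")

# Pattern for (2): sum the diversity column (dcol), divide by row count in END.
mean_div=$(awk -v dcol=4 '{ sum += $dcol }
  END { if (NR > 0) printf "%.2f\n", sum / NR }' "$snp_demo")

# Pattern for (3): count rows whose diversity exceeds 0.2.
# "n + 0" forces a numeric 0 when no row matched.
n_high=$(awk -v dcol=4 '$dcol > 0.2 { n++ } END { print n + 0 }' "$snp_demo")

printf '%s\n' "$covered"
echo "mean diversity: $mean_div"
echo "sites above 0.2: $n_high"
rm -f "$snp_demo"
```

If the real file starts with a header line, that line would have to be skipped (for example with NR > 1) and left out of the row count used for the mean.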
Table 1. Mapping of DDB number to chromosome in Dictyostelium

DDB number    Chromosome
DDB0169550    M
DDB0237465    R
DDB0215018    2F
DDB0215151    3F
DDB0220052    BF
DDB0232428    1
DDB0232429    2
DDB0232430    3
DDB0232431    4
DDB0232432    5
DDB0232433    6
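
One possible approach to (4a) is a lookup array built in BEGIN, using two of the mappings from Table 1. This is a minimal sketch on made-up data: it assumes the DDB ID is in the first column and that the file is tab-delimited, neither of which is stated above, so adjust the field number and separator for the real file.

```shell
# Hypothetical tab-delimited demo rows; only the DDB IDs come from Table 1.
ddb_demo=$(mktemp)
printf 'DDB0232428\t1042\tA\nDDB0232430\t9033\tG\nDDB0169550\t77\tT\n' > "$ddb_demo"

renamed=$(awk 'BEGIN {
    FS = OFS = "\t"            # keep tabs when awk rebuilds the line
    chr["DDB0232428"] = "1"    # two mappings taken from Table 1
    chr["DDB0232430"] = "3"
}
($1 in chr) { $1 = chr[$1] }   # replace only IDs present in the lookup
{ print }' "$ddb_demo")

printf '%s\n' "$renamed"
rm -f "$ddb_demo"
```

Rows whose ID is not in the array pass through unchanged, which makes it easy to verify one or two chromosomes before adding the rest. Assigning to $1 makes awk rejoin the fields with OFS, which is why OFS is set to match the input separator.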
A “pileup” file for a genome-resequencing project has the following columns: (1) chromosome,
(2) position, (3) nucleotide in the reference genome, (4) number of reads covering the site in
the query genome, (5) nucleotide observed in each of the reads covering the site (“.” means same
as the reference genome on the + strand, “,” means same as the reference genome on the - strand),
(6) quality scores for each nucleotide covering the site.
Using QS73.pileup (in hpc13f52/example_files/datafiles):
(5) The tgrC1 gene is located on chromosome 3 from positions 3630672 to 3633406. Extract
the portion of the pileup file that covers the tgrC1 gene.
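
A region extraction like (5) can be sketched as a test on columns 1 and 2 of the pileup layout described above. The demo rows below are made up, and the chromosome label passed in chrom is an assumption: the real file may label chromosome 3 with its DDB ID, with "3", or with something else, so check column 1 of QS73.pileup first. The variable names chrom, start, and stop are illustrative.

```shell
# Hypothetical six-column pileup rows around the tgrC1 boundaries.
pile_demo=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  DDB0232430 3630671 A 3 '.,.' 'III' \
  DDB0232430 3630672 C 2 '.,'  'II' \
  DDB0232430 3633406 G 1 '.'   'I' \
  DDB0232430 3633407 T 2 ',,'  'II' \
  DDB0232429 3630700 A 1 '.'   'I' > "$pile_demo"

# Keep rows on the chosen chromosome whose position lies in [start, stop].
tgrc1=$(awk -F'\t' -v chrom="DDB0232430" -v start=3630672 -v stop=3633406 \
  '$1 == chrom && $2 >= start && $2 <= stop' "$pile_demo")

printf '%s\n' "$tgrc1"
rm -f "$pile_demo"
```

Passing the label and coordinates with -v keeps the awk program itself unchanged when extracting a different gene.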