SNP calling by sequencing pooled samples

Miguel Perez-Enciso, Anna Esteve-Codina, Luca Ferretti, Emanuele Raineri, Bruno Nevado, Simon Heath

Research output: Contribution to journal › Article › Research › Peer-reviewed

54 Citations (Scopus)

Abstract

Performing high-throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances, some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by requiring a minimum number of reads per allele, such a threshold would have a strong and undesired effect in pools, because alleles at low frequency in the pool are unlikely to be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g., cancer tissues), but this is not true in pools: under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f, and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa: there can be more than one read (or, more likely, none) from a true singleton.
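The second and third points above can be illustrated numerically. The following is a minimal sketch, not the method of the article: it assumes an unfolded neutral frequency spectrum with P(k) ∝ 1/k over allele counts k, equal contribution of every chromosome to the pool, no sequencing error, and binomial sampling of reads. The function names, the pool size of 20 chromosomes, and the depth of 20 reads are hypothetical choices made only for illustration.

```python
from math import comb


def neutral_sfs_prior(n_chromosomes):
    """Prior over allele counts k = 1..n-1, with P(k) proportional to 1/k.

    Assumption: unfolded spectrum under the standard neutral model.
    """
    weights = [1.0 / k for k in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


def read_count_pmf(k, n_chromosomes, depth, reads):
    """Binomial probability of observing `reads` copies of an allele.

    Assumptions: the allele is carried by k of n chromosomes, every
    chromosome contributes equally to the pool, and there is no
    sequencing error.
    """
    p = k / n_chromosomes
    return comb(depth, reads) * p**reads * (1 - p) ** (depth - reads)


# Hypothetical pool: 10 diploid individuals (20 chromosomes), depth 20.
n, depth = 20, 20
prior = neutral_sfs_prior(n)

# Fate of a true singleton (k = 1) in the reads.
p_none = read_count_pmf(1, n, depth, 0)
p_once = read_count_pmf(1, n, depth, 1)
p_more = 1.0 - p_none - p_once

print(f"prior P(singleton)      = {prior[0]:.3f}")
print(f"P(0 reads | singleton)  = {p_none:.3f}")
print(f"P(1 read  | singleton)  = {p_once:.3f}")
print(f"P(>1 reads | singleton) = {p_more:.3f}")
```

Under these assumptions the singleton class carries the largest prior weight, and a true singleton is more likely to be missed entirely than to be read more than once, which is why a minimum-read threshold per allele is costly in pools.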
Original language: English
Journal: BMC Bioinformatics
Volume: 13
DOIs
Publication status: Published - 2012
