UTILIZATION OF THE CHANNEL CATFISH GENOME ASSEMBLY IN A SELECTIVE BREEDING PROGRAM.  

Geoffrey C. Waldbieser* and Brian G. Bosworth
USDA, Agricultural Research Service, Warmwater Aquaculture Research Unit, Stoneville, MS USA
Geoff.Waldbieser@ars.usda.gov

In cooperation with scientists from Auburn University, we have produced a genome sequence assembly to support genomic research in channel catfish. The assembly is based on Illumina sequencing data obtained from a homozygous, doubled haploid channel catfish. Contiguous sequence was assembled using the MaSuRCA pipeline, and contigs were scaffolded using paired Illumina sequences from libraries of 3 kb, 8 kb, and 34 kb insert size. The final phase of automated assembly used Illumina and PacBio sequences to fill intrascaffold gaps, after which gaps comprised only 1.4% of the assembly. Scaffolds were aligned to the high-density genetic map, and scaffolding errors were corrected manually. The resulting assembly contained 645 scaffolds (761 Mb) linked to chromosomes, with an average scaffold length of 1.2 Mb. An additional 9,408 small scaffolds (average length 2.7 kb) contained 22 Mb that could not be linked to chromosomes. The homozygous reference genome, PCR-Free Illumina libraries, long PacBio sequences, and scaffold correction guided by the genetic map all contributed to the high contiguity and accuracy of the assembly.

To identify sequence variation within our Delta Select breeding population, genomic DNA was isolated from 48 founder animals and sheared, and fragments of 500-600 bp in length were isolated with a BluePippin instrument. PCR-Free libraries were produced and sequenced on the Illumina NextSeq 500 platform to provide 25 to 40 million paired reads per animal. Sequences were aligned to the reference genome and processed through the Genome Analysis Toolkit (GATK) variant discovery pipeline to identify single nucleotide polymorphisms (SNPs) and structural variants, such as insertions and deletions, that are segregating in this population. Twelve million raw SNPs were filtered to remove loci showing strand bias or excessive depth of coverage, as well as loci within 50 bp of other SNPs or structural variants. This resulted in 1.6 million SNPs with a minor allele frequency of at least 0.05.
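The filtering criteria described above can be illustrated with a minimal Python sketch. It is not the GATK pipeline used in this work; it only mirrors the stated logic on in-memory records. The `Snp` record, the function names, and the `max_depth` and `max_strand_bias` thresholds are hypothetical, since the abstract does not report the cutoffs used. The sketch also assumes that "within 50 bp" means a distance of 50 bp or less and that proximity is evaluated against all raw SNPs rather than only those passing the other filters.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical per-locus record; field names are illustrative only."""
    chrom: str
    pos: int
    depth: int           # total read depth across animals
    strand_bias: float   # any strand-bias score where larger means more bias
    maf: float           # minor allele frequency in the sampled animals


def has_close_neighbor(pos, sorted_positions, window):
    """Return True if another variant lies within `window` bp of `pos`.

    `sorted_positions` includes the SNP's own position, so more than one
    entry inside the window means at least one neighbor is present.
    """
    i = bisect_left(sorted_positions, pos - window)
    entries = 0
    while i < len(sorted_positions) and sorted_positions[i] <= pos + window:
        entries += 1
        i += 1
    return entries > 1


def filter_snps(snps, sv_positions, max_depth, max_strand_bias,
                min_maf=0.05, window=50):
    """Apply the four filters named in the text to a list of raw SNPs.

    `sv_positions` is an iterable of (chrom, pos) pairs for structural
    variants. `max_depth` and `max_strand_bias` are assumed cutoffs.
    """
    # Pool raw SNP and structural-variant positions per chromosome.
    positions = defaultdict(list)
    for snp in snps:
        positions[snp.chrom].append(snp.pos)
    for chrom, pos in sv_positions:
        positions[chrom].append(pos)
    for plist in positions.values():
        plist.sort()

    kept = []
    for snp in snps:
        if snp.strand_bias > max_strand_bias:
            continue  # strand bias
        if snp.depth > max_depth:
            continue  # excessive depth of coverage
        if has_close_neighbor(snp.pos, positions[snp.chrom], window):
            continue  # within 50 bp of another SNP or structural variant
        if snp.maf < min_maf:
            continue  # rare allele in this population
        kept.append(snp)
    return kept
```

In practice these filters would be applied to VCF records produced by the variant caller; the in-memory records here only keep the sketch self-contained.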

Integrating the genome assembly with the genetic map increased the number of localized SNP markers and improved the accuracy of marker positions along the scaffolds. For example, chromosome 1 contained 1,016 SNP markers on the genetic map but 79,273 SNP markers on the chromosome 1 scaffolds, increasing the number of SNPs per centimorgan from 4 to 570 on this chromosome. Current research involves the independent validation of SNP markers through a genotyping array. A SNP array will then be designed to genotype phenotyped animals and resolve the genomic relationships among them, in order to improve the accuracy of estimated breeding values. We are also identifying larger structural variants in the Delta Select population, as well as smaller structural variants (such as indels and microsatellite loci) useful for targeted analysis.