Alexander Brandt, Patrick Tran Van, Christian Bluhm, Yoann Anselmetti, Zoé Dumas, Emeric Figuet, Clémentine M. François, Nicolas Galtier, Bastian Heimburger, Kamil S. Jaron, Marjorie Labédan, Mark Maraun, Darren J. Parker, Marc Robinson-Rechavi, Ina Schaefer, Paul Simion, Stefan Scheu, Tanja Schwander, and Jens Bast
Edited by Michael R. Strand, University of Georgia, Athens, GA, and approved July 28, 2021 (received for review January 25, 2021)
Reference Genome Assemblies and Contaminant Removal.
For genome sequencing, DNA extracted from single individuals was amplified in two independent reactions using the SYGNIS TruePrime WGA kit, and the two reactions were then pooled. Four libraries were generated for each reference genome: three paired-end libraries with average insert sizes of 180, 350, and 550 bp, and one mate-pair library with a 3,000-bp insert size. Libraries were prepared using the Illumina TruSeq DNA or Nextera Mate Pair Library Prep Kits, following the manufacturer's instructions, and sequenced at FASTERIS SA on the Illumina HiSeq 2500 system using v4 chemistry and 2 × 125-bp reads. This yielded a total of 451 × 10⁶ reads for O. nova (490-fold read coverage) and 387 × 10⁶ reads for O. subpectinata (420-fold read coverage) (for details, see SI Appendix, Table S2). Quality trimming and adapter clipping of paired-end reads were performed with Trimmomatic v0.36 (59) using the options ILLUMINACLIP:/all-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:3:20 MINLEN:100, after which 56% and 46% of read pairs survived (for details, see SI Appendix, Table S2). Mate-pair reads were processed with NxTrim v0.4.1 (60) using the options --separate --preserve-mp --minlength 40, followed by Trimmomatic v0.36 with the options ILLUMINACLIP:/all-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:60, to identify properly paired reads and remove low-quality bases and adapters. This retained 54% and 48% of mate pairs (for details, see SI Appendix, Table S2).
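The LEADING, TRAILING, SLIDINGWINDOW, and MINLEN steps can be illustrated with a simplified Python sketch. It is not Trimmomatic's code: it works on integer Phred scores rather than FASTQ quality strings, does not model adapter clipping (ILLUMINACLIP) at all, and cuts at the start of the first low-quality window, whereas Trimmomatic's own window handling may differ in details. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

# A read is modelled as (bases, integer Phred quality scores).
Read = Tuple[str, List[int]]


def leading(read: Read, q: int) -> Read:
    """LEADING-style step: drop bases from the 5' end while quality < q."""
    seq, quals = read
    i = 0
    while i < len(quals) and quals[i] < q:
        i += 1
    return seq[i:], quals[i:]


def trailing(read: Read, q: int) -> Read:
    """TRAILING-style step: drop bases from the 3' end while quality < q."""
    seq, quals = read
    j = len(quals)
    while j > 0 and quals[j - 1] < q:
        j -= 1
    return seq[:j], quals[:j]


def sliding_window(read: Read, window: int, q: int) -> Read:
    """SLIDINGWINDOW-style step: scan 5'->3' and cut at the start of the
    first window whose mean quality falls below q (simplified)."""
    seq, quals = read
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < q:
            return seq[:start], quals[:start]
    return read


def trim_paired_end_read(read: Read, min_len: int = 100) -> Optional[Read]:
    """Apply the paired-end settings LEADING:20 TRAILING:20
    SLIDINGWINDOW:3:20 MINLEN:100 (adapter clipping not modelled)."""
    read = leading(read, 20)
    read = trailing(read, 20)
    read = sliding_window(read, 3, 20)
    return read if len(read[0]) >= min_len else None


if __name__ == "__main__":
    example = ("ACGTACGTAC", [5, 30, 30, 30, 30, 30, 10, 10, 10, 30])
    print(trim_paired_end_read(example, min_len=4))  # ('CGTA', [30, 30, 30, 30])
    print(trim_paired_end_read(example))             # None: shorter than MINLEN:100
```

In the example, the leading Q5 base is removed first; the window step then cuts at the first window whose mean quality drops below 20, and the remaining four bases would be discarded under MINLEN:100.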
With the available read data, we tested a range of assembly strategies. The best assemblies were obtained from normalized, overlapped reads: whole-genome amplification overrepresents some genomic regions, and the resulting coverage bias is problematic for assembly. Overlapped read libraries were generated by merging the paired forward and reverse reads of the 180-bp libraries and additionally merging unpaired reads, followed by normalization with BBnorm v37.82 (61). These normalized overlapped read libraries were assembled into contigs with the multi-k-mer assembler SPAdes v3.10.1 (62), using the options -m 400 --careful -k 21,33,55,77,99,111,127. The resulting contigs were ordered into scaffolds with SSPACE v3.0 (63) using default parameters and the 350-, 550-, and 3,000-bp libraries. Gaps introduced during scaffolding were closed with GapCloser v1.12 (64) using the option -l 125. For details, see https://github.com/AsexGenomeEvol/HD_Oppiella: assembly and mites.
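Why normalization counters amplification bias can be made concrete with a minimal sketch of one-pass digital normalization: a read is kept only while the median count of its k-mers among previously kept reads is below a target depth, so reads from overrepresented regions are progressively discarded. This is a generic illustration of the idea, assuming exact k-mer counting and ignoring reverse complements; BBnorm's actual algorithm differs, and the function names and default parameters here are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a read (reverse complements ignored)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads: Iterable[str], k: int = 21, target: int = 20) -> List[str]:
    """One-pass digital normalization (generic illustration, not BBnorm).

    A read's depth is estimated as the median count of its k-mers among the
    reads kept so far; reads from regions already covered to `target` depth
    are discarded, which flattens amplification-driven coverage peaks.
    """
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no depth estimate, drop it
        estimated_depth = median(counts[km] for km in read_kmers)
        if estimated_depth < target:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # A heavily over-amplified read (50 copies) plus one read from elsewhere.
    reads = ["ACGTACGTAA"] * 50 + ["TTGACCATGG"]
    normalized = digital_normalize(reads, k=5, target=10)
    print(len(normalized))  # 11: ten copies of the repeated read + the unique read
```

The over-amplified read is capped at the target depth, while the low-coverage read is retained, which is the flattening effect the assembly strategy relies on.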
Scaffolds likely derived from contaminants (e.g., bacteria, fungi) were removed by first annotating and visualizing contamination with BlobTools v1.0 (65) and then applying custom filters. To this end, reads were mapped back to the scaffolds with bwa mem v0.7.15 (66), and per-scaffold coverage was calculated with BBTools v37.82 (61). For taxonomic annotation, scaffolds were searched against the NCBI nt database (version 2016-06) with ncbi-blast v2.7.1+ blastn using the options -outfmt "6 qseqid staxids bitscore evalue std sscinames sskingdoms stitle" -max_target_seqs 10 -max_hsps 1 -evalue 1e-25. Scaffolds without hits to metazoans were removed from the assemblies with a custom script (see https://github.com/AsexGenomeEvol/HD_Oppiella: contamination_filtration.py). Next, scaffolds were sorted by decreasing length, their headers were renamed, and scaffolds shorter than 500 bp were removed, yielding the final assemblies (v03). Assembly quality and completeness were assessed by calculating standard genome statistics and by checking the presence, fragmentation, and duplication of arthropod core genes using CEGMA v2.5 and BUSCO v3.0.2 (67, 68). For details, see SI Appendix, Table S1.
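The final clean-up step (dropping scaffolds shorter than 500 bp, sorting by decreasing length, and renaming headers) and one standard assembly statistic, N50, can be sketched in Python as follows. The `scaffold_<n>` naming scheme is a hypothetical placeholder; the text does not state which header format was used.

```python
from typing import Dict, List, Tuple


def finalize_scaffolds(
    scaffolds: Dict[str, str], prefix: str = "scaffold", min_len: int = 500
) -> List[Tuple[str, str]]:
    """Drop scaffolds shorter than min_len, sort by decreasing length, and
    rename them sequentially (hypothetical `<prefix>_<n>` header scheme)."""
    kept = [seq for seq in scaffolds.values() if len(seq) >= min_len]
    kept.sort(key=len, reverse=True)
    return [(f"{prefix}_{i}", seq) for i, seq in enumerate(kept, start=1)]


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half
    of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    raw = {"a": "A" * 300, "b": "C" * 1200, "c": "G" * 800, "d": "T" * 500}
    final = finalize_scaffolds(raw)
    print([(name, len(seq)) for name, seq in final])
    # [('scaffold_1', 1200), ('scaffold_2', 800), ('scaffold_3', 500)]
    print(n50([len(seq) for _, seq in final]))  # 800
```

Because scaffolds are sorted by length, all short scaffolds end up at the tail of the list, so removing them before or after renaming yields the same contiguous numbering.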
Pairwise Divergence between Sister Species.
Transcript sequences were reconstructed from the annotated reference genomes using GffRead (option -w) (78). Single-copy orthologs were identified using OrthoFinder