fastp: the FASTQ pre-processor

hoffmanlab 1,011 views 28 slides Feb 10, 2022

About This Presentation

Hoffman Lab Tech Talk


Slide Content

fastp: the FASTQ pre-processor
Coby Viner
Lab meeting: tech – Wednesday, June 30, 2021
Chen, Zhou, Chen, and Gu. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
https://github.com/OpenGene/fastp

FASTQ output

Head et al. Library construction for next-generation sequencing: Overviews and challenges. BioTechniques 56 (2014).

Adapters
Head et al. Library construction for next-generation sequencing: Overviews and challenges. BioTechniques 56 (2014).

Adapters
https://support.illumina.com/bulletins/2020/06/illumina-adapter-portfolio.html
https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html

Quality trimming
[Figure: per-base quality plot; x-axis: read position (bp), y-axis: quality score]
https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html

fastp
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp: auto. adapter trimming
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp: base correction
Looks for overlaps between the paired reads.
If mismatches are found within the overlap:
  Corrects only if the quality scores of the mismatched bases are imbalanced
  Corrects only if the total number of mismatches is below a threshold (default: 5), which reduces false corrections
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
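
The correction rule on this slide can be sketched in Python. This is a minimal illustration, not fastp's actual C++ implementation. It assumes the overlapping segments of read 1 and the reverse-complemented read 2 have already been located and aligned base-for-base. The function name `correct_overlap` and the quality cut-offs `high_q=30` and `low_q=15` are assumptions chosen for illustration; only the mismatch limit of 5 comes from the slide, and treating that limit as inclusive is also an assumption.

```python
def correct_overlap(seq1, qual1, seq2, qual2,
                    max_mismatches=5, high_q=30, low_q=15):
    """Correct mismatches between two aligned overlap segments.

    seq1 and seq2 are equal-length strings (seq2 already reverse-complemented).
    qual1 and qual2 are lists of integer Phred scores of the same length.
    Returns the (possibly corrected) pair of sequences.
    """
    if not (len(seq1) == len(seq2) == len(qual1) == len(qual2)):
        raise ValueError("overlap segments must have equal length")

    mismatches = [i for i in range(len(seq1)) if seq1[i] != seq2[i]]
    # Too many mismatches: the overlap itself is suspect, so correct nothing.
    if len(mismatches) > max_mismatches:
        return seq1, seq2

    s1, s2 = list(seq1), list(seq2)
    for i in mismatches:
        # Correct only when one base is confident and the other is not.
        if qual1[i] >= high_q and qual2[i] <= low_q:
            s2[i] = s1[i]
        elif qual2[i] >= high_q and qual1[i] <= low_q:
            s1[i] = s2[i]
    return "".join(s1), "".join(s2)
```

A mismatch where both bases have similar quality is left untouched, because there is no basis for deciding which read is right.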

fastp: sliding window QC trim.
The window can slide from either end of the read.
Evaluates the average quality score within the window:
  If below the threshold, the bases are discarded and the window moves forward
  If above the threshold, trimming ends
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
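
The sliding-window rule can be sketched as follows. This is a simplified illustration of the logic on the slide, not fastp's own code. The function name `trim_sliding_window`, the window size of 4, and the mean-quality threshold of 20 are assumptions made for the example, as is the choice to advance the window one base at a time.

```python
def trim_sliding_window(seq, quals, window=4, min_mean_q=20, from_front=True):
    """Trim low-quality bases from one end of a read.

    seq is the base string; quals is a list of integer Phred scores.
    The window advances one base at a time, dropping a base each time
    its mean quality is below min_mean_q, and stops at the first
    window that passes.
    """
    if not from_front:
        # Trimming from the 3' end is the same procedure on the reversed read.
        s, q = trim_sliding_window(seq[::-1], quals[::-1],
                                   window, min_mean_q, True)
        return s[::-1], q[::-1]

    start = 0
    while start + window <= len(quals):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q >= min_mean_q:
            break  # window passes: trimming ends
        start += 1  # window fails: discard one base and move forward
    # If no window ever passes, fewer than `window` bases remain.
    return seq[start:], quals[start:]
```

Because the decision is made on the window mean, a single low-quality base can survive at the edge of the kept region if its neighbours are good.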

fastp: polyG/polyX tail trimming
PolyG tails are common in 2-colour sequencing (NextSeq/NovaSeq), but not in 4-colour sequencing (HiSeq).
https://sequencing.qcfail.com/articles/illumina-2-colour-chemistry-can-overcall-high-confidence-g-bases/

fastp: polyG correction example
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp: polyX
Can be enabled to trim low-complexity A/T/G/C runs at the 3′ end of reads.
Enable via: -x / --trim_poly_x
Can also use: --poly_x_min_len
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
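
To make the polyG/polyX idea concrete, here is a minimal Python sketch of 3′ homopolymer tail trimming. It is deliberately simpler than what fastp does: it requires an exact run with no mismatches. The function names and the minimum tail length of 10 are assumptions for illustration only.

```python
def trim_poly_tail(seq, qual, base="G", min_len=10):
    """Remove a 3' run of `base` if it is at least `min_len` long.

    seq and qual are equal-length strings. Exact matching only:
    a single different base ends the run.
    """
    run = 0
    while run < len(seq) and seq[len(seq) - 1 - run] == base:
        run += 1
    if run >= min_len:
        cut = len(seq) - run
        return seq[:cut], qual[:cut]
    return seq, qual


def trim_poly_x(seq, qual, min_len=10):
    """Trim a 3' homopolymer tail of whichever base ends the read."""
    if not seq:
        return seq, qual
    return trim_poly_tail(seq, qual, base=seq[-1], min_len=min_len)
```

Short runs are left alone, since a handful of identical bases at the end of a read is often genuine sequence.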

Unique Molecular IDs (UMIs)
https://dnatech.genomecenter.ucdavis.edu/wp-content/uploads/2020/01/UMIs3.png

fastp: UMIs
-U / --umi
--umi_loc index1/index2/read1/read2/per_index/per_read
--umi_len
--umi_prefix (e.g., UMI_AATTCG)
--umi_skip
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
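
The effect of these options can be illustrated with a small Python sketch for the case where the UMI sits at the start of read 1. It is not fastp's implementation. The function name `extract_umi_from_read` is hypothetical, and joining the tag to the first part of the read name with ":" is an assumption about the output format; the `UMI_AATTCG` style of tag follows the example on the slide.

```python
def extract_umi_from_read(name, seq, qual, umi_len, prefix="", skip=0):
    """Move a leading UMI from the read into the read name.

    name: FASTQ header line (e.g. "@read1 1:N:0:ATCACG")
    umi_len: number of leading bases that form the UMI
    prefix: optional label joined to the UMI with an underscore
    skip: extra bases to drop after the UMI
    """
    if umi_len + skip > len(seq):
        raise ValueError("read is shorter than UMI plus skipped bases")

    umi = seq[:umi_len]
    tag = f"{prefix}_{umi}" if prefix else umi

    # Attach the tag to the first whitespace-delimited part of the name
    # (assumed separator: ":"), leaving any comment field untouched.
    first, sep, rest = name.partition(" ")
    new_name = f"{first}:{tag}{sep}{rest}"

    cut = umi_len + skip
    return new_name, seq[cut:], qual[cut:]
```

Keeping the UMI in the read name means it can travel with the read through alignment, where downstream tools can use it for deduplication.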

fastp: additional features
Output splitting by number of files or number of lines
Duplication evaluation
Overrepresented seq. analysis (-p)
  FastQC only tracks the first 1M reads
  fastp performs uniform sampling
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).
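
The difference between tracking only the first reads and sampling uniformly can be shown with a toy Python sketch. The function names and the sampling interval of 20 are assumptions for illustration and are not taken from either tool.

```python
from collections import Counter
from itertools import islice


def count_first_n(reads, n):
    """Count sequences among the first n reads only."""
    return Counter(islice(reads, n))


def count_uniform(reads, sample_every=20):
    """Count sequences in every `sample_every`-th read across the whole input."""
    return Counter(read for i, read in enumerate(reads)
                   if i % sample_every == 0)


# Toy input: the sequence composition changes halfway through the file.
reads = ["AAAA"] * 100 + ["CCCC"] * 100
print(count_first_n(reads, 100))   # only ever sees "AAAA"
print(count_uniform(reads, 20))    # sees both sequences equally often
```

If sequence content drifts over the course of a run, counts taken from the head of the file miss it, whereas uniform sampling reflects the whole file.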

fastp: duplication example
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp: overrep. seq. example
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp is very fast (C++; multi-threaded)
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp trims well
Chen et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34:i884–90 (2018).

fastp does not convert qualities
Supports phred64 scoring (converts to phred33), via -6 / --phred64
https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html
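
For reference, the phred64-to-phred33 conversion is a fixed shift of each quality character by 31 (the difference between the two ASCII offsets, 64 and 33). A minimal Python sketch, with a hypothetical function name:

```python
def phred64_to_phred33(qual):
    """Re-encode a phred64 quality string as phred33.

    Each character's ASCII code is lowered by 31 (64 - 33), so the
    underlying Phred score is unchanged.
    """
    for c in qual:
        if ord(c) < 64:
            raise ValueError(f"{c!r} is not a valid phred64 character")
    return "".join(chr(ord(c) - 31) for c in qual)


# "@" is Q0 and "h" is Q40 in phred64; they become "!" and "I" in phred33.
print(phred64_to_phred33("@h"))
```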

fastp: MultiQC integration
https://gannet.fish.washington.edu/Atumefaciens/20200414_cbai_RNAseq_fastp_trimming/multiqc_report.html
