(Advanced Bioinformatics)
NGS Data Analysis Using Ubuntu and Galaxy Flow Tool (FastQC, Trimmomatic, BWA, BBTools)
Added: Jul 17, 2024
ADVANCED BIOINFORMATICS
Edufabrica
Group 5 Activity 1
Mentor: S. Ajay Sammuel
PRESENTATION TOPIC: DATA ANALYSIS TOOLS IN BIOINFORMATICS: UBUNTU AND GALAXY
Topics to be covered
Ubuntu and Linux introduction and uses
FastQC in Ubuntu
Trimmomatic in Ubuntu
Galaxy flow tool intro and uses
FastQC in Galaxy
Trimmomatic in Galaxy
BWA tool in Galaxy
BB tool in Galaxy
IGV - BAM file analysis
IGV - Cis and trans interactions
Errors and how we resolved them
The following NGS sequence files were selected from the Sequence Read Archive (SRA)
at NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/):
1. SRR17305961 (Sequencing of human rhinovirus sp.)
2. SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx)
3. SRR28844856 (RNA-seq of human adolescent female stool)
4. SRR29383410 (ChIRP_CRISPRa_24h_rep1; Homo sapiens)
5. SRR29546162 (Genomic sequencing of SARS-CoV-2)
6. SRR9071773 (NGS raw data of NDV MT15)
Ubuntu and Linux Introduction and Uses:
Ubuntu is a Linux distribution derived from Debian and
composed mostly of free and open-source software.
Linux is an open-source operating system for
servers, computers, mainframes, mobile systems,
and embedded systems.
FastQC in Ubuntu:
Purpose:
- Quality control tool for high-throughput sequence data
Features:
- Provides a modular set of analyses
- Generates summary reports
Usage:
- Command-line tool
- Installation: `sudo apt-get install fastqc`
- Running: `fastqc <file>`
FastQC report of SRR17305961 (Sequencing of human rhinovirus sp.) with Ubuntu:
Both the per base sequence quality and the
per base N content modules passed.
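As a rough illustration of what these two modules measure, the sketch below computes the mean quality and the percentage of N calls at each read position. It assumes Phred+33 quality encoding; the function name `per_base_stats` and the two toy reads are hypothetical, and this is not how FastQC itself is implemented.

```python
def per_base_stats(reads, offset=33):
    """Return a list of (mean_quality, percent_N) tuples, one per read position.

    reads: list of (sequence, quality_string) tuples.
    offset: ASCII offset of the quality encoding (33 for Phred+33, assumed here).
    """
    max_len = max(len(seq) for seq, _ in reads)
    stats = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        quals = [ord(qual[pos]) - offset for _, qual in reads if pos < len(qual)]
        bases = [seq[pos] for seq, _ in reads if pos < len(seq)]
        mean_quality = sum(quals) / len(quals)
        percent_n = 100.0 * bases.count("N") / len(bases)
        stats.append((mean_quality, percent_n))
    return stats


# Two toy reads (hypothetical data): 'I' encodes quality 40, '#' encodes quality 2.
reads = [("ACGT", "IIII"), ("ACNT", "II#I")]
stats = per_base_stats(reads)
for position, (mean_quality, percent_n) in enumerate(stats, start=1):
    print(position, mean_quality, percent_n)
```

A position with low mean quality or a high share of N calls is the kind of pattern these FastQC modules are meant to flag.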
FastQC report of SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx) with Ubuntu:
This is a good-quality file: almost every FastQC
module passed, so the report shows no sign of
contamination or sequencing errors.
FastQC report of SRR9071773 (NGS raw data of NDV MT15) with Ubuntu:
Trimmomatic in Ubuntu:
Purpose:
- Trimming and filtering of Illumina sequence data
Features:
- Handles paired-end and single-end data
- Flexible adapter trimming
Usage:
- Command-line tool
- Installation: Download and extract from the official website
- Running: `java -jar trimmomatic-0.39.jar <parameters>`
Trimmomatic for SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx) with Ubuntu:
A breakdown of the command and its parameters:
- java -jar trimmomatic-0.39.jar: Runs Trimmomatic (adjust the version number if you are using a different version).
- SE: Indicates that the reads are single-end.
- -phred33: Specifies the quality encoding of the input FASTQ file. If your data uses Phred+64 quality encoding, replace this with -phred64.
- example.fastq: The input FASTQ file containing the raw reads.
- output_trimmed.fastq: The output FASTQ file where the trimmed reads are saved.
- ILLUMINACLIP:TruSeq3-SE.fa:2:30:10: Adapter trimming parameters.
  - TruSeq3-SE.fa: The file containing adapter sequences.
  - 2: Maximum number of seed mismatches allowed.
  - 30: Palindrome clip threshold.
  - 10: Simple clip threshold.
- LEADING:3: Removes low-quality bases from the start of the read (leading bases with a quality score below 3).
- TRAILING:3: Removes low-quality bases from the end of the read (trailing bases with a quality score below 3).
- SLIDINGWINDOW:4:15: Scans the read with a sliding window and cuts once the average quality within the window falls below the threshold (window size 4, quality threshold 15).
- MINLEN:36: Discards reads that are shorter than the specified length after trimming (minimum length 36).
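Putting the pieces of the breakdown together, the sketch below assembles the full single-end command as an argument list and prints it as one shell line. The helper name `build_trimmomatic_se_command` is hypothetical, the file names are the placeholders from the breakdown, and the adapter file is spelled TruSeq3-SE.fa as in the Trimmomatic distribution. The jar and adapter file are assumed to be in the working directory.

```python
import shlex


def build_trimmomatic_se_command(
    input_fastq,
    output_fastq,
    jar="trimmomatic-0.39.jar",   # assumed to be in the current directory
    adapters="TruSeq3-SE.fa",     # assumed to be in the current directory
):
    """Return the Trimmomatic single-end command as a list of arguments."""
    return [
        "java", "-jar", jar,
        "SE",                      # single-end mode
        "-phred33",                # quality encoding of the input
        input_fastq,
        output_fastq,
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3",
        "TRAILING:3",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ]


command = build_trimmomatic_se_command("example.fastq", "output_trimmed.fastq")
command_line = shlex.join(command)
print(command_line)
```

The printed line can be pasted into an Ubuntu terminal, or the list can be passed to `subprocess.run(command, check=True)` once Java and Trimmomatic are installed.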
OUTPUT TRIMMED FILE of SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx) with Ubuntu:
In the trimmed file, the sequence length distribution
module did not fully pass, which is common after trimming
because reads end up with different lengths. Adapter content
and overrepresented sequences passed after trimming.
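As a rough illustration of how quality trimming leaves reads with different lengths, here is a simplified sliding-window trim. It assumes Phred+33 encoding and cuts at the start of the first window whose mean quality falls below the threshold. Trimmomatic's own implementation may choose the cut point differently, and the function name and toy reads are hypothetical.

```python
def sliding_window_trim(seq, qual, window=4, threshold=15, offset=33):
    """Simplified sliding-window quality trim (a sketch, not Trimmomatic's exact logic).

    Scans from the 5' end and cuts the read at the start of the first window
    whose mean quality is below the threshold.
    """
    scores = [ord(char) - offset for char in qual]
    for start in range(0, len(scores) - window + 1):
        window_mean = sum(scores[start:start + window]) / window
        if window_mean < threshold:
            return seq[:start], qual[:start]
    return seq, qual  # no failing window: the read is kept whole


# Toy reads (hypothetical): 'I' encodes quality 40, '#' encodes quality 2.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),   # high quality throughout
    ("ACGTACGTAC", "IIIIII####"),   # low-quality tail
]
trimmed = [sliding_window_trim(seq, qual) for seq, qual in reads]
lengths = [len(seq) for seq, _ in trimmed]
print(lengths)
```

Both reads start at the same length, but only the second loses its low-quality tail, so the trimmed set no longer has a single read length.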
OUTPUT TRIMMED FILE of SRR17305961 (Sequencing of human rhinovirus sp.) with Ubuntu:
OUTPUT TRIMMED FILE of SRR9071773 (NGS raw data of NDV MT15) with Ubuntu:
Galaxy Flow Tool Intro and Uses:
Introduction:
- Web-based platform for bioinformatics
- Supports reproducible and transparent computational research
Uses:
- Accessible interface for bioinformatics tools
- Workflow creation and sharing
- Integration with various data sources
FastQC in Galaxy:
Purpose:
- Same as Ubuntu: quality control for sequence data
Usage:
- Accessible through Galaxy interface
- Import data and run FastQC module
- View and interpret results within Galaxy
FastQC analysis in Galaxy for SRR29383410 (ChIRP_CRISPRa_24h_rep1; Homo sapiens)
with Ubuntu:
Trimmomatic in Galaxy:
Purpose:
- Same as Ubuntu: trimming and filtering sequence data
Usage:
- Accessible through Galaxy interface
- Configure parameters and run Trimmomatic module
- Analyze trimmed data within Galaxy
Trimmomatic in Galaxy (trimmed file) for SRR29383410
(ChIRP_CRISPRa_24h_rep1; Homo sapiens):
Trimmomatic in Galaxy (trimmed output file) for SRR17305961 (human rhinovirus):
BWA (Burrows-Wheeler Aligner) Tool in Galaxy:
Purpose:
- Burrows-Wheeler Aligner for mapping sequences against a reference genome
Features:
- Supports large genomes
- Fast and accurate
Usage:
- Accessible through Galaxy interface
- Configure and run BWA module
- Analyze alignment results within Galaxy
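One simple way to check alignment results is to count how many reads mapped. The sketch below reads SAM-format text lines and uses the unmapped bit (0x4) of the FLAG field. It assumes the BAM output has first been converted to SAM text (for example with `samtools view -h`); the function name and the toy records are hypothetical.

```python
def count_mapped(sam_lines):
    """Count mapped and unmapped reads in SAM text lines.

    Header lines start with '@' and are skipped. The second tab-separated
    field is the FLAG; bit 0x4 set means the read is unmapped.
    """
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Toy SAM records (hypothetical data, not taken from the samples in this deck).
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = count_mapped(sam_lines)
print(mapped, unmapped)
```

A large unmapped count may point to a mismatched reference genome or to reads from another organism.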
BWA tool mapping in Galaxy for SRR28844856
(RNA-seq of human adolescent female stool):
BAM file from the BWA tool in Galaxy for SRR29383410 (ChIRP_CRISPRa_24h_rep1;
Homo sapiens):
BB Tool in Galaxy:
Purpose:
- Suite of bioinformatics tools for sequence data analysis
Features:
- Includes tools for quality control, trimming, alignment, and more
Usage:
- Accessible through Galaxy interface
- Select and run specific BB tools
- Analyze results within Galaxy
BB Tool analysis in Galaxy for SRR28844856
(RNA-seq of human adolescent female stool):
SRR29546162 (Genomic sequencing of SARS-CoV-2):
IGV (Integrative Genomics Viewer):
The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual
exploration of genomic data. It supports flexible integration of all the common types of genomic data and
metadata, whether investigator-generated or publicly available.
Features:
- Visualization of sequence alignments, annotations, and variants
- Supports multiple data types
Usage:
- Install on local machine
- Load and explore data files (BAM, VCF, etc.)
IGV - Cis and Trans Interactions:
Cis Interactions:
- Interactions within the same chromosome
Trans Interactions:
- Interactions between different chromosomes
Visualization:
- Use IGV to explore and visualize these interactions
- Analyze patterns and implications in genomic context
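The cis/trans distinction above can be sketched in a few lines: an interaction is cis when both ends fall on the same chromosome and trans otherwise. The function name and the toy chromosome pairs are hypothetical and not taken from the samples in this deck.

```python
from collections import Counter


def classify_interaction(chrom_a, chrom_b):
    """Label an interaction by whether both ends lie on the same chromosome."""
    return "cis" if chrom_a == chrom_b else "trans"


# Toy interaction pairs (hypothetical data): (chromosome of end 1, chromosome of end 2).
pairs = [
    ("chr1", "chr1"),
    ("chr1", "chr5"),
    ("chrX", "chrX"),
]
counts = Counter(classify_interaction(a, b) for a, b in pairs)
print(counts["cis"], counts["trans"])
```

The same check can be applied to the chromosome names of paired alignments before looking at the regions in IGV.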
IGV - Cis and trans interactions for SRR9071773 (NGS raw data of NDV MT15):
IGV - Visualization of sequence alignments for
SRR28844856 (RNA-seq of human adolescent female stool):
IGV - Cis and trans interactions for SRR28844856
(RNA-seq of human adolescent female stool):
Errors and Resolutions:
Common Errors:
- Installation issues
- Configuration problems
- Data import/export errors
Resolutions:
- Step-by-step troubleshooting guides
- Community forums and support
- Documentation and user manuals