Advanced Bioinformatics- NGS Data analysis

107AyusheeJain 74 views 35 slides Jul 17, 2024

About This Presentation

(Advanced Bioinformatics)
NGS Data analysis using Ubuntu and Galaxy Flow Tool (FastQC, Trimmomatic, BWA, BBtools)


Slide Content

ADVANCED
BIOINFORMATICS
Edufabrica
Group 5 Activity 1
Mentor: S. Ajay Sammuel
PRESENTATION TOPIC :- DATA ANALYSIS TOOLS IN BIOINFORMATICS : UBUNTU AND GALAXY

Topics to be covered
Ubuntu and Linux introduction and uses
FastQC in Ubuntu
Trimmomatic in Ubuntu
Galaxy flow tool intro and uses
FastQC in Galaxy
Trimmomatic in Galaxy
BWA tool in Galaxy
BB tool in Galaxy
IGV - BAM file analysis
IGV - CIS and Trans interactions
Errors and how we resolved them

The NGS sequence files selected from the SRA (Sequence Read Archive),
accessed through NCBI (National Center for Biotechnology
Information, https://www.ncbi.nlm.nih.gov/),
are as follows:
1. SRR17305961 (Sequencing of human rhinovirus sp.)
2. SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx)
3. SRR28844856 (RNA-seq of human adolescent female stool)
4. SRR29383410 (ChIRP_CRISPRa_24h_rep1; Homo sapiens)
5. SRR29546162 (Genomic sequencing of SARS-CoV-2)
6. SRR9071773 (NGS raw data of NDV MT15)
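
The slides do not say how the runs were downloaded. One common route, shown here only as a sketch, is the SRA Toolkit: `prefetch` fetches a run and `fasterq-dump` converts it to FASTQ. The output directory name `fastq` is an assumption, and the commands only run when the toolkit is installed.

```shell
# Sketch: fetch the six runs listed above with the SRA Toolkit (assumed installed).
accessions="SRR17305961 SRR29261844 SRR28844856 SRR29383410 SRR29546162 SRR9071773"
outdir="fastq"   # placeholder directory for the FASTQ files
mkdir -p "$outdir"

for acc in $accessions; do
    if command -v prefetch >/dev/null 2>&1 && command -v fasterq-dump >/dev/null 2>&1; then
        prefetch "$acc"                    # download the run
        fasterq-dump "$acc" -O "$outdir"   # convert it to FASTQ in $outdir
    else
        # Toolkit not installed: only show what would be run.
        echo "Would run: prefetch $acc && fasterq-dump $acc -O $outdir"
    fi
done
```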

Ubuntu and Linux Introduction And uses :
Ubuntu is a Linux distribution derived from Debian and
composed mostly of free and open-source software.
Linux is an open-source operating system for
servers, computers, mainframes, mobile systems,
and embedded systems.

FastQC in Ubuntu :
Purpose:
- Quality control tool for high throughput sequence data
Features:
- Provides a modular set of analyses
- Generates summary reports
Usage:
- Command-line tool
- Installation: `sudo apt-get install fastqc`
- Running: `fastqc <file>`
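
The usage above can be sketched as a small script. The file name `example.fastq` and the directory `fastqc_out` are placeholders, not files from this presentation. The `-o` option tells FastQC where to write its HTML report and zip archive, and that directory must already exist.

```shell
# Sketch: run FastQC on one FASTQ file (file and directory names are placeholders).
input="example.fastq"
outdir="fastqc_out"

# FastQC does not create the output directory itself.
mkdir -p "$outdir"

echo "Command: fastqc -o $outdir $input"

# Run only when FastQC is installed and the input file exists.
if command -v fastqc >/dev/null 2>&1 && [ -f "$input" ]; then
    fastqc -o "$outdir" "$input"
else
    echo "Skipping run: fastqc or $input not found."
fi
```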

FastQC report of SRR17305961 (Sequencing of human rhinovirus sp.) with Ubuntu:
Both the per base sequence quality and the
per base N content modules passed.

FastQC for SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx)

FastQC report of SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx) with Ubuntu:
This is a good-quality file: almost every module
passed, which suggests no obvious contamination or
sequencing errors.

FastQC report of SRR9071773 (NGS raw data of NDV MT15) with Ubuntu:

Trimmomatic in Ubuntu :
Purpose:
- Trimming and filtering of Illumina sequence data
Features:
- Handles paired-end and single-end data
- Flexible adapter trimming
Usage:
- Command-line tool
- Installation: Download and extract from the official website
- Running: `java -jar trimmomatic-0.39.jar <parameters>`

Trimmomatic for SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx)
with Ubuntu

A breakdown of the command and its parameters:
java -jar trimmomatic-0.39.jar: Executes Trimmomatic (make sure to adjust the version number if you are using
a different version).
SE: Indicates that the reads are single-end.
-phred33: Specifies the quality encoding format of the input FASTQ file. If your data uses the Phred+64 quality
encoding, replace this with -phred64.
example.fastq: The input FASTQ file containing the raw reads.
output_trimmed.fastq: The output FASTQ file where the trimmed reads will be saved.
ILLUMINACLIP:TruSeq3-SE.fa:2:30:10: Adapter trimming parameters.
- TruSeq3-SE.fa: The file containing adapter sequences.
- 2: Number of seed mismatches allowed.
- 30: Palindrome clip threshold.
- 10: Simple clip threshold.
LEADING:3: Removes low-quality bases from the start of the read (leading bases with a quality score below 3).
TRAILING:3: Removes low-quality bases from the end of the read (trailing bases with a quality score below 3).
SLIDINGWINDOW:4:15: Performs a sliding window trimming approach, cutting once the average quality within
the window falls below a threshold (window size 4, quality threshold 15).
MINLEN:36: Discards reads that are shorter than the specified length after trimming (minimum length 36).
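
Putting the pieces of this breakdown together gives the single-end command sketched below. The file names are the placeholders used in the breakdown, the jar and adapter file are assumed to sit in the current directory, and the adapter file is spelled `TruSeq3-SE.fa` as in the Trimmomatic distribution. The command only runs when Java, the jar, and the input file are present.

```shell
# Sketch: the single-end Trimmomatic call assembled from the parameters above.
jar="trimmomatic-0.39.jar"        # adjust to the installed version
input="example.fastq"             # placeholder input reads
output="output_trimmed.fastq"     # placeholder output file

# Trimming steps are applied in the order they are listed.
steps="ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"

echo "Command: java -jar $jar SE -phred33 $input $output $steps"

if command -v java >/dev/null 2>&1 && [ -f "$jar" ] && [ -f "$input" ]; then
    # $steps is left unquoted on purpose so each step becomes its own argument.
    java -jar "$jar" SE -phred33 "$input" "$output" $steps
else
    echo "Skipping run: java, $jar or $input not found."
fi
```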

OUTPUT TRIMMED FILE of SRR29261844 (mNGS of Mycobacterium lentiflavum: nasopharynx) with Ubuntu:
Here the sequence length distribution module did not pass
completely, while the adapter content and overrepresented
sequences modules passed after trimming. A length distribution
warning is expected once reads have been trimmed to different lengths.

OUTPUT TRIMMED FILE of SRR17305961 (Sequencing of human rhinovirus sp.) with Ubuntu:

OUTPUT TRIMMED FILE of SRR9071773 (NGS raw data of NDV MT15) with Ubuntu:

Galaxy flow tool intro and uses :
Introduction:
- Web-based platform for bioinformatics
- Supports reproducible and transparent computational research
Uses:
- Accessible interface for bioinformatics tools
- Workflow creation and sharing
- Integration with various data sources

FastQC in Galaxy:
Purpose:
- Same as in Ubuntu: quality control for sequence data
Usage:
- Accessible through Galaxy interface
- Import data and run FastQC module
- View and interpret results within Galaxy

FastQC analysis in Galaxy for SRR29383410 (ChIRP_CRISPRa_24h_rep1; Homo sapiens)
with Ubuntu:

Trimmomatic in Galaxy:
Purpose:
- Same as Ubuntu: trimming and filtering sequence data
Usage:
- Accessible through Galaxy interface
- Configure parameters and run Trimmomatic module
- Analyze trimmed data within Galaxy

Trimmomatic in Galaxy (trimmed file) for SRR29383410
(ChIRP_CRISPRa_24h_rep1; Homo sapiens):

Trimmomatic in Galaxy for human rhinovirus: output trimmed file

BWA (Burrows-Wheeler Aligner) Tool in Galaxy:
Purpose:
- Burrows-Wheeler Aligner for mapping sequences against a reference genome
Features:
- Supports large genomes
- Fast and accurate
Usage:
- Accessible through Galaxy interface
- Configure and run BWA module
- Analyze alignment results within Galaxy
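
For comparison, a command-line sketch of the same mapping step is given below. It assumes the BWA-MEM algorithm and the availability of `samtools`; the Galaxy run in these slides may have used a different BWA mode. All file names are placeholders. The BAM is sorted and indexed at the end because IGV needs a coordinate-sorted BAM with its index file.

```shell
# Sketch: command-line counterpart of the Galaxy BWA step (file names are placeholders).
ref="reference.fasta"
reads="output_trimmed.fastq"
bam="aligned.sorted.bam"

if command -v bwa >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
        && [ -f "$ref" ] && [ -f "$reads" ]; then
    # Build the BWA index once per reference genome.
    bwa index "$ref"
    # Align the reads, then sort the alignments by coordinate into a BAM file.
    bwa mem "$ref" "$reads" | samtools sort -o "$bam" -
    # Create the .bai index that IGV expects next to the BAM.
    samtools index "$bam"
else
    echo "Skipping run: bwa, samtools, $ref or $reads not found."
fi
```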

BWA tool mapping in Galaxy for SRR28844856
(RNA-seq of human adolescent female stool):

BAM file from the BWA tool in Galaxy for SRR29383410 (ChIRP_CRISPRa_24h_rep1;
Homo sapiens)

BB Tool in Galaxy:
Purpose:
- Suite of bioinformatics tools for sequence data analysis
Features:
- Includes tools for quality control, trimming, alignment, and more
Usage:
- Accessible through Galaxy interface
- Select and run specific BB tools
- Analyze results within Galaxy
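
The slides do not name the specific BB tool that was run. As an illustration only, the sketch below uses BBDuk, the trimming and filtering tool of the BBTools suite, with adapter trimming followed by quality trimming. File names are placeholders, and the command only runs when `bbduk.sh` is on the PATH.

```shell
# Sketch: adapter and quality trimming with BBDuk (chosen here only as an example).
reads="example.fastq"          # placeholder input reads
clean="example.clean.fastq"    # placeholder output file
adapters="adapters.fa"         # placeholder adapter FASTA

# ktrim=r trims adapters to the right of a k-mer match; qtrim=rl trims both ends
# below quality 10; minlength=36 discards reads that end up shorter than 36 bases.
opts="ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=10 minlength=36"

echo "Command: bbduk.sh in=$reads out=$clean ref=$adapters $opts"

if command -v bbduk.sh >/dev/null 2>&1 && [ -f "$reads" ] && [ -f "$adapters" ]; then
    # $opts is left unquoted on purpose so each option becomes its own argument.
    bbduk.sh in="$reads" out="$clean" ref="$adapters" $opts
else
    echo "Skipping run: bbduk.sh, $reads or $adapters not found."
fi
```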

BB Tool analysis in Galaxy for SRR28844856
(RNA-seq of human adolescent female stool):

SRR29546162 (Genomic sequencing of SARS-CoV-2):

IGV (Integrative Genomics Viewer):
The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual
exploration of genomic data. It supports flexible integration of all the common types of genomic data and
metadata, whether investigator-generated or publicly available.
Features:
- Visualization of sequence alignments, annotations, and variants
- Supports multiple data types
Usage:
- Install on local machine
- Load and explore data files (BAM, VCF, etc.)

IGV - CIS and Trans Interactions:
CIS Interactions:
- Interactions within the same chromosome
Trans Interactions:
- Interactions between different chromosomes
Visualization:
- Use IGV to explore and visualize these interactions
- Analyze patterns and implications in genomic context
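
The cis/trans distinction above can also be counted directly from alignment records. The sketch below assumes paired-end alignments in SAM text form, where column 3 holds the chromosome of the read and column 7 the chromosome of its mate (`=` when they match). The function name `count_cis_trans` and the sample records are made up for illustration.

```shell
# Sketch: classify paired alignments as cis or trans from SAM fields.
# Column 3 is the chromosome of the read (RNAME); column 7 is that of the mate
# (RNEXT), written as "=" when it matches column 3 and "*" when unknown.
count_cis_trans() {
    awk -F '\t' '
        /^@/ { next }                            # skip header lines
        $3 == "*" || $7 == "*" { next }          # skip unmapped read or mate
        $7 == "=" || $7 == $3 { cis++; next }    # mate on the same chromosome
        { trans++ }                              # mate on a different chromosome
        END { printf "cis=%d trans=%d\n", cis, trans }
    '
}

# Tiny made-up SAM records, for illustration only.
result=$(printf '%b\n' \
    '@HD\tVN:1.6' \
    'r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII' \
    'r2\t65\tchr1\t500\t60\t4M\tchr5\t900\t0\tACGT\tIIII' \
    'r3\t73\tchr2\t700\t60\t4M\t*\t0\t0\tACGT\tIIII' \
    | count_cis_trans)
echo "$result"
```

On a real file, something like `samtools view -f 64 aligned.sorted.bam | count_cis_trans` would count each pair once by keeping only the first read of every pair; the BAM name here is a placeholder.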

IGV - CIS and Trans interactions:
SRR9071773 (NGS raw data of NDV MT15):

IGV - Visualization of sequence alignments for
SRR28844856 (RNA-seq of human adolescent female stool):
IGV - CIS and Trans interactions for SRR28844856
(RNA-seq of human adolescent female stool):

Errors and Resolutions:
Common Errors:
- Installation issues
- Configuration problems
- Data import/export errors
Resolutions:
- Step-by-step troubleshooting guides
- Community forums and support
- Documentation and user manuals
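
Many installation issues come down to a tool not being on the PATH. A minimal check, sketched below, lists which commands cannot be found. The function name `find_missing` is hypothetical, and the tool list simply mirrors the programs named in these slides.

```shell
# Sketch: print the names of the given commands that are not on the PATH.
find_missing() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    echo "$missing"
}

# Check the command-line tools mentioned in this presentation.
not_found=$(find_missing java fastqc bwa bbduk.sh)
if [ -n "$not_found" ]; then
    echo "Not found on PATH:$not_found"
else
    echo "All tools found."
fi
```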

Group 5
Team Members:
1. Aarushi Desai
2. Ananya Ramgopal
3. Asmita Datta
4. Ayushee Jain
5. Barsha Saraogi
6. Dhormale Pratik Laxman
7. Komal Kala
8. Kumar Anirudh
9. Mishthi Khulla
10. Priyanka Sahoo
11. Shajiya Khan
12. Shashank Maurya
13. Tubbai
14. Siddhi Kadre

Thank You !