Visualize the Bulk ATAC-seq

Visualize the Bulk ATAC-seq#

Many times, we want to visualise bigwig files from bulk ATAC-seq, and we usually use Integrative Genomics Viewer (IGV) for this step, while code visualisation in Python, has been lacking at the moment. We will demonstrate the visualisation process in the bulk module of epiverse.

import os
# EPIONE_DATA_DIR left unset — not needed for this tutorial

import epione as epi
import snapatac2 as snap
import scanpy as sc
import matplotlib.pyplot as plt

epi.pl.plot_set(font_path='Arial')

└─ 🔬 Starting plot initialization...
  ├─ Apply Scanpy/matplotlib settings
  ├─ Custom font setup
  └─ Using already downloaded Arial font from: /tmp/omicverse_arial.ttf
  └─ Registered as: Arial
  ├─ Suppress warnings
  ├─ 
___________      .__                      
\_   _____/_____ |__| ____   ____   ____  
 |    __)_\____ \|  |/  _ \ /    \_/ __ \ 
 |        \  |_> >  (  <_> )   |  \  ___/ 
/_______  /   __/|__|\____/|___|  /\___  >
        \/|__|                  \/     \/ 

  ├─ 🔖 Version: 0.0.1rc1   📚 Tutorials: https://epione.readthedocs.io/
└─ ✅ plot_set complete.

Here, we use the bigwig data from Widespread enhancer dememorization and promoter priming during parental-to-zygotic transition

#!wget ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3359nnn/GSM3359495/suppl/GSM3359495_oocyte_H3K4me1_rep1.bw
#!wget ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3359nnn/GSM3359496/suppl/GSM3359496_oocyte_H3K4me1_rep2.bw

Construct the bigwig object#

We first prepare a directory of bigwig files that we need to visualise. epiverse supports visualising multiple files at the same time, but of course you can also visualise just one file. We provide the directory in dictionary format.

bw_dict={
    'oocyte_H3K4me1_rep1':'data/GSM3359495_oocyte_H3K4me1_rep1.bw',
    'oocyte_H3K4me1_rep2':'data/GSM3359496_oocyte_H3K4me1_rep2.bw'
}
bw_obj=epi.bulk.bigwig(bw_dict)
bw_obj.read()

└─ Load bigWig files
  ├─ Loading oocyte_H3K4me1_rep1...
  └─ Loading oocyte_H3K4me1_rep2...

Read the gene annotation#

(nonecessary) When we visualise chromosomal regions, we usually want to see genetic information as well. This is where we can provide gtf files as input.

bw_obj.load_gtf('genome/danRer7.refGene.gtf.gz')

└─ Load GTF file
  ├─ Reading GTF...
  └─ Reading GTF file from genome/danRer7.refGene.gtf.gz...
└─ GTF file read successfully
└─ GTF loaded

Plot track#

plot obtional#

If you wish to specify personalised plot colours, we provide here a few interesting parameters as input to plot_track.

color_dict: the color of each bigwig
region_dict: the region we interested
plot_names: the names of bigwig, if not provided ,it will use the keys of bigwig_dict

color_dict={
    'oocyte_H3K4me1_rep1':epi.pl.green_color[3],
    'oocyte_H3K4me1_rep2':epi.pl.green_color[3]
}
region_dict={
    'region1':[46050000, 46050000+5000],
    'region2':[46056000,46057000]
}

bw_obj.plot_track(chrom='chr1',chromstart=46039573,chromend=46089418,
                  plot_names=['Rep1','Rep2'],figwidth=6,figheight=1.5,
                  color_dict=color_dict,bp_per_bin=500,
                  region_dict=region_dict)

(<Figure size 480x180 with 3 Axes>,
 array([<Axes: ylabel='Rep1'>, <Axes: ylabel='Rep2'>, <Axes: >],
       dtype=object))

../../_images/07e17a24d4219da156fc8bef77a7b74af8f9a1534dbfbefa6822a04b045b4cf1.png

bw_obj.plot_track(chrom='chr1',chromstart=46039573,chromend=46089418,
                  plot_names=['Rep1','Rep2'],figwidth=6,figheight=1.5,
                  color_dict=color_dict,bp_per_bin=500,
                  region_dict=region_dict)

(<Figure size 480x180 with 3 Axes>,
 array([<Axes: ylabel='Rep1'>, <Axes: ylabel='Rep2'>, <Axes: >],
       dtype=object))

Plot Matrix#

Compute the tss/tes enrichment matrix#

Different genes have inconsistent chromatin activity in Chip-seq/ATAC-seq, as evidenced by differences in chromatin signaling before and after the TSS/TES locus, and we can find sample-specific TSS/TES signaling for downstream analysis. Here, we provide the function of compute_matrix to get the tss/tes enrichment matrix

bw_obj.compute_matrix('oocyte_H3K4me1_rep1',nbins=100,n_jobs=8)

└─ Compute matrix: oocyte_H3K4me1_rep1
  ├─ Prepare features
  ├─ Build matrices
└─ Finalize
└─ oocyte_H3K4me1_rep1 matrix finished
  └─ oocyte_H3K4me1_rep1 tss matrix in bw_tss_scores_dict[oocyte_H3K4me1_rep1]
  └─ oocyte_H3K4me1_rep1 tes matrix in bw_tes_scores_dict[oocyte_H3K4me1_rep1]
  └─ oocyte_H3K4me1_rep1 body matrix in bw_body_scores_dict[oocyte_H3K4me1_rep1]

(AnnData object with n_obs × n_vars = 33737 × 100
     obs: 'region_start', 'region_end'
     uns: 'range', 'bins',
 AnnData object with n_obs × n_vars = 33737 × 100
     obs: 'region_start', 'region_end'
     uns: 'range', 'bins',
 AnnData object with n_obs × n_vars = 33737 × 100
     obs: 'region_start', 'region_end'
     uns: 'range', 'bins')

The tss result will be stored in bw_tss_scores_dict['oocyte_H3K4me1_rep1'], and tes stored in bw_tes_scores_dict['oocyte_H3K4me1_rep1']. The obs represent the genes

bw_obj.bw_tss_scores_dict['oocyte_H3K4me1_rep1']

AnnData object with n_obs × n_vars = 33737 × 100
    obs: 'region_start', 'region_end'
    uns: 'range', 'bins'

After compute the tss/tes enrichment matrix, we can use save/load to save/load the result.

bw_obj.save_bw_result('result')

└─ Saving oocyte_H3K4me1_rep1 results...

bw_obj.load_bw_result('result')

└─ Load computed matrices
  ├─ Loading oocyte_H3K4me1_rep1 results...
└─ Loading oocyte_H3K4me1_rep2 results...
    └─ ⚠️  You need to run the compute_matrix function first for oocyte_H3K4me1_rep2!
    └─ ⚠️  You need to run the compute_matrix function first for oocyte_H3K4me1_rep2!
    └─ ⚠️  You need to run the compute_matrix function first for oocyte_H3K4me1_rep2!

Plot the tss/tes enrichment matrix#

Here, we provide plot_matrix and plot_matrix_line to visualize the signal of tss/tes enrichment

fig,ax=bw_obj.plot_matrix(bw_name='oocyte_H3K4me1_rep1',bw_type='TSS',
                    figsize=(2,4),cmap='Greens',
                    vmax='auto',vmin='auto',
                   fontsize=12,title='oocyte_H3K4me1')
#fig.savefig('data/oocyte_H3K4me1_rep1.png',dpi=300,bbox_inches='tight')

../../_images/9a6520baf5e7e07f7b517f6d79a1549a31c871537a9b3c62c753fee2b3e5ed90.png

bw_obj.plot_matrix_line(bw_name='oocyte_H3K4me1_rep1',bw_type='TSS',
                    figsize=(3,3),color='#0d6a3b')

(<Figure size 240x240 with 1 Axes>,
 <Axes: title={'center': 'oocyte_H3K4me1_rep1'}>)

../../_images/af0a4ae2a891cb76d45f60152821f794413e4b50c176da3966d8ad5ffc3cc86b.png

bw_obj.plot_matrix(bw_name='oocyte_H3K4me1_rep1',bw_type='TES',
                    figsize=(2,4),cmap='Blues',
                    vmax=2,vmin=1,
                   fontsize=12,title='oocyte_H3K4me1')

(<Figure size 160x320 with 2 Axes>, <Axes: title={'center': 'oocyte_H3K4me1'}>)

../../_images/d79b00f01eb058f2fb34f9f3f25dbcfad3212aa80f9a06ff3f903637e47e0e3b.png

bw_obj.plot_matrix_line(bw_name='oocyte_H3K4me1_rep1',bw_type='TES',
                    figsize=(3,3),color='#2c50a4')

(<Figure size 240x240 with 1 Axes>,
 <Axes: title={'center': 'oocyte_H3K4me1_rep1'}>)

../../_images/b0b6557ae9fc8136d04e25fbf0631a4adc27de21d70951b8b5b70f8114bf1aaa.png

bw_obj.plot_matrix(bw_name='oocyte_H3K4me1_rep1',bw_type='all',
                    figsize=(2.5,4),cmap='Blues',
                    vmax='auto',vmin='auto',
                   fontsize=12,title='oocyte_H3K4me1')

(<Figure size 200x320 with 2 Axes>, <Axes: title={'center': 'oocyte_H3K4me1'}>)

../../_images/d2203474ffe723284bb615aa00e543400e86b3ab0e8ee059ad4a4120faaba5a8.png

bw_obj.plot_matrix_line(bw_name='oocyte_H3K4me1_rep1',bw_type='all',
                    figsize=(3,3),color='#2c50a4')

(<Figure size 240x240 with 1 Axes>,
 <Axes: title={'center': 'oocyte_H3K4me1_rep1'}>)

../../_images/563a1f2262773500a8b2ced6906295ff58d11a625562b4b993fee28873205fe8.png

Plot the correlation between bigwig#

In addition to exploring the properties of the bigwig itself, sometimes we need to calculate the correlation between different bigwigs.

We set the size of each bin to 10000, i.e., the size of the region on the chromosome, and calculate the average chromosome signal value for each bin. and store it in the scoreperbindata

scoreperbindata=bw_obj.getscoreperbin(bin_size=10000,
                      number_thread=4,)

scoreperbindata.loc[scoreperbindata['chrom']=='chr24'].head()

	chrom	start	end	oocyte_H3K4me1_rep1	oocyte_H3K4me1_rep2
21338	chr24	0	10000	1.075000	1.244246
21339	chr24	10000	20000	1.860320	2.034103
21340	chr24	20000	30000	1.771524	1.696709
21341	chr24	30000	40000	1.186645	1.600000
21342	chr24	40000	50000	2.283304	2.912770

scoreperbindata.head()

	chrom	start	end	oocyte_H3K4me1_rep1	oocyte_H3K4me1_rep2
0	chr4	0	10000	4.174897	6.547932
1	chr4	10000	20000	4.620073	6.031214
2	chr4	20000	30000	3.288747	4.141955
3	chr4	30000	40000	2.233796	2.431286
4	chr4	40000	50000	3.510958	3.689173

We provide a nice visualization function to compare the correlation of two bigwigs, and for the purpose of visualization parameter tuning, we separate the value of the correlation from the visualization step to reduce the time spent repeating the computation when tuning the visualization parameters

bw_obj.compute_correlation(bw_names=['oocyte_H3K4me1_rep1','oocyte_H3K4me1_rep2'])

└─ The correlation between oocyte_H3K4me1_rep1 and oocyte_H3K4me1_rep2 is 0.93
  └─ Now you can use plot_correlation() to plot the correlation scatter plot

np.float64(0.930985887843024)

fig,ax=bw_obj.plot_correlation_bigwig(figsize=(3.5,3),
                                scatter_size=5,scatter_alpha=0.8,
                                fontsize=14)
#fig.savefig('data/correlation_bigwig.png',dpi=300,bbox_inches='tight')

../../_images/2d48100729989d1b2bb0936f3f83d15a38789976d04a14cde55eb9108a8c0441.png