About HRPDviewer

Construction of HRPDviewer

Human ribosome profiling data collection

Calculation of translational levels of mRNA transcripts and genes

Usage of HRPDviewer

Motivation of HRPDviewer

Many ribo-seq data sets have been generated in various studies, so databases are needed for depositing and visualizing the published ribo-seq data. GWIPS-viz and RPFdb are currently the two largest databases developed for this purpose. However, two challenges remain to be addressed.

1. Both databases align the published ribo-seq data to the genome. Since ribo-seq data aim to reveal the actively translated mRNA transcripts, the data should be aligned to the transcriptome rather than to the genome.

2. Both databases can only display the ribo-seq data around a specific genomic location. Neither of them can simultaneously visualize the ribo-seq data on mRNA transcripts produced from different genes located far away from each other in the genome.

To address these two challenges, we developed HRPDviewer (Human Ribosome Profiling Data viewer).

What is HRPDviewer?

HRPDviewer

1. collects 610 published human ribo-seq data sets from Gene Expression Omnibus (GEO),

2. aligns the data to the transcriptome,

3. provides visualization of the data on mRNA transcripts.

Users can visualize and compare the ribo-seq data mapped on different mRNAs under different physiological conditions. This kind of visualization can provide novel biological insights. By viewing the ribo-seq data mapped on the mRNAs of different genes, users can see which genes' mRNAs are highly translated under a specific physiological condition. By viewing the ribo-seq data mapped on different mRNA isoforms of the same gene, users can see which mRNA isoforms are highly translated under a specific physiological condition. We will keep updating HRPDviewer as new human ribo-seq data appear in the literature. We believe that HRPDviewer is a useful resource for studying translational regulation in humans.

Human ribosome profiling data collection

610 human ribosome profiling data sets from 64 studies were collected from Gene Expression Omnibus (GEO). We assigned these 610 ribo-seq data sets to 14 research topics.

Research Topic	# of ribo-seq data sets	# of publications
Apoptosis	6	1
Cancer Mechanism	85	6
Cell Cycle	27	4
Circadian Rhythms	48	1
Disease	62	4
microRNA Regulatory Effect	24	3
Mitochondrial Translation	4	1
mRNA Modification	34	3
mTOR Pathway	62	3
Protein Stability	25	2
RPF Methodology	20	8
Stress Condition	139	8
Translational Regulation Mechanism	177	22
Virus Infection	106	6

The details of these 610 collected human ribosome profiling data sets can be found in Supplementary Table 1.

Data processing

Step1. Install the following software tools.

1. SRAtoolkit v2.6.3 (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/)

2. Cutadapt v1.4.2 (http://cutadapt.readthedocs.io/en/stable/guide.html)

3. RSEM (https://github.com/bli25wisc/RSEM/archive/master.zip)

4. SAMtools (http://www.htslib.org/)

Step2. Download and decompress our pipeline.zip (http://cosbi4.ee.ncku.edu.tw/HRPDviewer/pipeline.zip) to obtain a pipeline folder.

Step3. Put your ribo-seq data (e.g. SRR493747.sra) in our pipeline folder.

Step4. In our pipeline folder, process SRR493747.sra using the following procedure.

1. SRAtoolkit v2.6.3 was used to convert the .sra files to .fastq files.

Input	SRR493747.sra
Output	SRR493747.fastq
Command	$ fastq-dump SRR493747.sra

2. Cutadapt v1.4.2 was used to trim adaptor linker sequences or poly-(A) tails from the 3’ ends of reads.

Input	SRR493747.fastq
Output	SRR493747_trimmed.fastq
Command	$ cutadapt -a CTGTAGGCACCATCAAT -u 1 --minimum-length 27 --maximum-length 40 --discard-untrimmed -o SRR493747_trimmed.fastq SRR493747.fastq
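
To make the effect of the cutadapt options above concrete, the following is a minimal Python sketch of the same filtering logic. It is an illustration only, not cutadapt's implementation: the function name trim_read is hypothetical, and the sketch assumes an exact, full-length adapter match, whereas cutadapt also tolerates mismatches and partial adapters at the 3' end.

```python
ADAPTER = "CTGTAGGCACCATCAAT"  # 3' adapter taken from the cutadapt command above


def trim_read(seq, adapter=ADAPTER, cut=1, min_len=27, max_len=40):
    """Return the trimmed read, or None if the read would be discarded.

    Simplifying assumption: only exact, full-length adapter matches are found.
    """
    seq = seq[cut:]              # -u 1: remove the first base of the read
    pos = seq.find(adapter)      # -a: locate the 3' adapter
    if pos == -1:
        return None              # --discard-untrimmed: no adapter found
    trimmed = seq[:pos]          # keep only the bases preceding the adapter
    if not (min_len <= len(trimmed) <= max_len):
        return None              # --minimum-length 27 / --maximum-length 40
    return trimmed


if __name__ == "__main__":
    # Toy read: one leading base, a 32-nt insert, the adapter, two extra bases.
    read = "N" + "ACGT" * 8 + ADAPTER + "TT"
    print(trim_read(read))
```

Reads that lack the adapter or whose insert falls outside 27 to 40 nt are dropped, which restricts the output to fragments in the size range kept by this pipeline.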

3. RSEM was used to align the reads to the reference human transcriptome and generate two files (the readdepth file and bam file).

Input	SRR493747_trimmed.fastq, ref_transcriptome_folder
Output	SRR493747_result.transcript.bam, …
Command	$ rsem-calculate-expression -p 8 --strand-specific SRR493747_trimmed.fastq ref_transcriptome_folder/ref_transcriptome SRR493747_result

Input	SRR493747_result, gene_id.txt
Output	SRR493747_result.transcript.readdepth, …
Command	$ rsem-plot-transcript-wiggles --gene-list --show-unique SRR493747_result gene_id.txt figure.pdf

4. SAMtools was used to convert the .bam file to a .sam file.

Input	SRR493747_result.transcript.bam
Output	SRR493747_result.transcript.sam
Command	$ samtools view SRR493747_result.transcript.bam -o SRR493747_result.transcript.sam

5. Our Python scripts were used to generate two folders (called SRR493747_NRPM_folder and SRR493747_TL_folder). The first folder (SRR493747_NRPM_folder) contains 38401 files, each of which contains the NRPM (normalized reads per million mapped reads) values at all nucleotide positions of an mRNA transcript (e.g. NM_00014). The second folder (SRR493747_TL_folder) contains two files. One file (Isoform_TL.csv) contains the translational levels (TLs) of 38401 isoforms. The other file (Gene_TL.csv) contains the TLs of 19242 genes.

Input	SRR493747_result.transcript.sam, SRR493747_result.transcript.readdepth, human_rna.coord
Output	SRR493747_NRPM_folder, SRR493747_TL_folder
Command	$ bash norm_exp.sh SRR493747_result.transcript.sam SRR493747_result.transcript.readdepth human_nm_rna.coord SRR493747_NRPM_folder SRR493747_TL_folder
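
The five steps above can be chained for an arbitrary run accession. The following Python sketch assembles the same commands as argument lists and, by default, only prints them. The helper names build_commands and run_pipeline are hypothetical and not part of pipeline.zip; the sketch assumes it is started inside the pipeline folder with all four tools on the PATH, and it uses the file names exactly as they appear in the commands above.

```python
import shlex
import subprocess


def build_commands(run_id, threads=8):
    """Return the pipeline commands for one run as lists of arguments."""
    fastq = f"{run_id}.fastq"
    trimmed = f"{run_id}_trimmed.fastq"
    result = f"{run_id}_result"
    return [
        # Step 1: convert .sra to .fastq
        ["fastq-dump", f"{run_id}.sra"],
        # Step 2: trim adapters and filter reads by length
        ["cutadapt", "-a", "CTGTAGGCACCATCAAT", "-u", "1",
         "--minimum-length", "27", "--maximum-length", "40",
         "--discard-untrimmed", "-o", trimmed, fastq],
        # Step 3: align to the transcriptome, then write the read depth file
        ["rsem-calculate-expression", "-p", str(threads), "--strand-specific",
         trimmed, "ref_transcriptome_folder/ref_transcriptome", result],
        ["rsem-plot-transcript-wiggles", "--gene-list", "--show-unique",
         result, "gene_id.txt", "figure.pdf"],
        # Step 4: convert .bam to .sam
        ["samtools", "view", f"{result}.transcript.bam",
         "-o", f"{result}.transcript.sam"],
        # Step 5: compute the NRPM and TL folders
        ["bash", "norm_exp.sh", f"{result}.transcript.sam",
         f"{result}.transcript.readdepth", "human_nm_rna.coord",
         f"{run_id}_NRPM_folder", f"{run_id}_TL_folder"],
    ]


def run_pipeline(run_id, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(run_id):
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_pipeline("SRR493747")  # dry run: prints the six commands
```

Passing dry_run=False executes the commands in order; check=True makes the run stop as soon as one tool exits with an error, so later steps never see incomplete input.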

Calculation of translational levels of mRNA transcripts and genes

The translational level (TL) of an mRNA transcript in a ribosome profiling data set (RPD) is defined as the average NRPKM (normalized reads per kilobase per million mapped reads) value of its coding region (CDS) in that RPD and is calculated by the following formula:

\[ TL = \frac{1}{L} \sum_{i=1}^{L} \mathrm{NRPKM}_i \]

where L is the length (in bp) of the coding region, i is the i-th position of the coding region, and NRPKM_i is the NRPKM value at position i. For example, in the RPD (G1-1 synchronized HeLa cells), the TL of NM_004060 (one mRNA isoform of the gene CCNG1) is 2899.23, where L = 887.

The translational level of a gene (denoted as $TL_{\mathrm{gene}}$) in an RPD is defined as the sum of the translational levels of all its mRNA isoforms in that RPD. For example, the gene CCNG1 has two mRNA isoforms (NM_199246 and NM_004060). In the RPD (G1-1 synchronized HeLa cells), the TL of NM_199246 and the TL of NM_004060 are 485.741 and 2899.23, respectively. Therefore, the $TL_{\mathrm{gene}}$ of CCNG1 is 3384.971, which equals the sum of 485.741 and 2899.23.
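
The two definitions in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in our pipeline: the function names isoform_tl and gene_tl are hypothetical, the per-position values passed to isoform_tl are toy numbers, and only the two CCNG1 isoform TLs are taken from the example above.

```python
def isoform_tl(cds_values):
    """Average of the per-position values over a coding region of length L."""
    if not cds_values:
        raise ValueError("coding region is empty")
    return sum(cds_values) / len(cds_values)


def gene_tl(isoform_tls):
    """Sum of the translational levels of all mRNA isoforms of a gene."""
    return sum(isoform_tls.values())


if __name__ == "__main__":
    # Toy per-position values for a 3-nt region (not real data).
    print(isoform_tl([2.0, 4.0, 6.0]))  # 4.0

    # Isoform TLs of CCNG1 in the G1-1 RPD, taken from the example above.
    ccng1 = {"NM_199246": 485.741, "NM_004060": 2899.23}
    print(round(gene_tl(ccng1), 3))  # 3384.971
```

Keeping the isoform TLs in a mapping keyed by isoform ID mirrors the relationship between Isoform_TL.csv and Gene_TL.csv: the gene-level value is derived entirely from the isoform-level values.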

Implementation of HRPDviewer website

HRPDviewer was built using the scripting language PHP and the CodeIgniter framework. All tables were produced with JavaScript and jQuery (a JavaScript library). All figures were generated using the PHP GD library.

Database interface

HRPDviewer provides both a search mode and a browse mode.

Search Mode:

Users have to select the mRNA transcripts and RPDs of interest.

After submission, HRPDviewer returns a result page containing two parts. The first part provides information on the selected mRNA transcripts and RPDs.

The second part provides two different views of the ribosome occupancy patterns on the selected mRNA transcripts in the selected RPDs:

1. Viewing different selected mRNA transcripts in the same RPD.
This kind of view allows users to compare the ribosome occupancy patterns on different mRNA transcripts in the same RPD. Users can then see how actively different mRNA transcripts are translated under a specific physiological condition. For example, NM_199246 (one mRNA isoform of the gene CCNG1) is more actively translated than NM_004354 (one mRNA isoform of the gene CCNG2) in G1 synchronized HeLa cells. In contrast, NM_004354 is more actively translated than NM_199246 in S phase synchronized HeLa cells.

2. Viewing the same mRNA transcript in different RPDs.
This kind of view allows users to compare the ribosome occupancy patterns on an mRNA transcript in different RPDs. Users can then see how actively a specific mRNA transcript is translated under different physiological conditions. For example, NM_199246 (one mRNA isoform of the gene CCNG1) is more actively translated in G1 than in S phase of the cell cycle in HeLa cells.

Browse Mode:

In the browse mode, users have to (i) input a list of genes of interest and (ii) select RPDs to be shown.

After submission, HRPDviewer returns a page containing information (gene name, the number of mRNA isoforms, the translational levels in the selected RPDs) for each gene in the list.

When the user clicks on the “Gene Name” (e.g. CCNG1), HRPDviewer returns a page showing how the translational levels of the gene CCNG1 in different RPDs are calculated.

When the user clicks on the “# of mRNA Isoforms”, HRPDviewer returns a page containing information (isoform ID and the translational level in the selected RPDs) for each mRNA isoform of the selected gene.

When the user clicks on the “isoform ID” (e.g. NM_004060), HRPDviewer returns a page showing how the translational levels of NM_004060 in different RPDs are calculated.