p53BLD (p53 Binding Loci Database)
Usage of p53BLD
Motivation of p53BLD
The importance of TP53 is evident from the fact that it is the most commonly mutated gene in cancers. The tumor suppressor TP53 responds to numerous stress stimuli, including DNA damage and hypoxia. It acts as a transcription factor and regulates the expression of a variety of genes, leading to enhanced DNA repair, cell cycle control, and apoptosis. TP53 contains a DNA-binding domain that binds to the specific consensus sequence RRRCWWGYYY (R=A/G, W=A/T, Y=C/T). Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-seq) has been used to identify TP53 binding loci and has revealed that more than 3000 genes are bound by TP53 in their promoter regions. Although most of these genes were identified in a subset of cell lines, only about 60 were identified as common targets. It remains unclear how TP53 discriminates among these binding targets in different cell types and, in particular, how its binding differs between normal cells and cancer cells.
So far, nearly 2000 different single missense mutations in TP53 have been reported in tumor cells. Mutations in the DNA-binding domain of TP53, such as R273H, R248Q, R248W and R249S, are associated with more aggressive malignancies and can confer novel phenotypes in vivo, including increased metastatic capacity and resistance to chemotherapies. TP53 mutants that acquire these phenotypes are generally referred to as gain-of-function (GOF) mutants. Because TP53 binds to a specific consensus sequence, it remains unclear how these GOF mutations change binding specificity and target genes to promote tumorigenesis. Recently, the number of genome-wide TP53 ChIP-seq datasets derived from normal and cancer cell lines harboring wild-type or GOF mutant TP53 has increased rapidly. These datasets provide an unmatched opportunity to analyze and compare genome-wide TP53 binding patterns under different experimental conditions and in different cell types.
Currently, no database is available for easily comparing and analyzing these ChIP-seq datasets derived from different cell types. Because the datasets are scattered across different publications, collecting and processing them in a uniform way requires extensive work. The TP53 research community therefore urgently needs a database that comprehensively collects the publicly available TP53 ChIP-seq datasets and processes all of them with the same pipeline for further analysis and comparison.
What is p53BLD?
We developed a novel database of the genome-wide binding loci of human TP53 (p53BLD). We collected 13 publicly available TP53 ChIP-seq datasets derived from different normal and cancer cell lines harboring either wild-type or GOF mutant TP53. Because these published datasets were originally mapped to the older human reference genome hg18 (2006 assembly), we re-mapped them to the current human reference genome hg38 (2013 assembly). p53BLD provides a browse mode to visualize the binding loci of TP53 in the genome and a search mode to retrieve genes whose promoters are bound by TP53. In the search mode, users can apply union, intersect, and/or difference operations on the 13 ChIP-seq datasets to generate a list of TP53 binding target genes that satisfies their specifications. The generated gene list can be downloaded for further analysis. p53BLD can therefore also be regarded as a discovery tool that helps users generate gene lists of interest for studying TP53.
Collection of 13 human TP53 ChIP-seq datasets
We collected 13 human TP53 ChIP-seq datasets from the Gene Expression Omnibus (GEO). As shown in the following figure, the collected datasets cover two normal cell lines (neonatal foreskin keratinocytes and human lung fibroblast IMR90), three cancer cell lines harboring wild-type TP53 (colon cancer cell line HCT116, osteosarcoma cell line U2OS and breast cancer cell line MCF7), and four cancer cell lines harboring GOF mutant TP53 (Li-Fraumeni fibroblast MDA-H087 and three breast cancer cell lines HCC70, BT-549 and MDA-MB-468).


Data processing
Several tools were used to process the ChIP-seq data downloaded from GEO.
1. SRA Toolkit 2.3.4 was used to convert the ChIP-seq reads from SRA format to FASTQ format.
2. Bowtie 1.0.0 was used to map the reads to the human reference genome (hg38, downloaded from UCSC) to generate the map file.
3. HOMER (v4.9) was used to convert the map file into a bedGraph file and to generate a list of peaks (regions with significant enrichment of TP53 binding). The bedGraph file contains the density of reads at each nucleotide in the human genome, and the peak list contains all the peak regions in the human genome.
4. For visualization, the PHP GD library was used to implement a genome browser that shows the information contained in the bedGraph file and the peak list.
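To illustrate what the bedGraph file from step 3 holds, the following Python sketch expands bedGraph intervals into a per-nucleotide read density for one region. It is not the code used by p53BLD (the site itself is written in PHP); the function name read_density and the example lines are hypothetical. It assumes the standard bedGraph layout of four whitespace-separated columns (chromosome, 0-based start, exclusive end, value).

```python
def read_density(bedgraph_lines, chrom, start, end):
    """Return the read density at each base of [start, end) on chrom.

    Coordinates follow the bedGraph convention (0-based start, exclusive end).
    Bases not covered by any bedGraph interval get a density of 0.0.
    """
    density = [0.0] * (end - start)
    for line in bedgraph_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        c, s, e, value = line.split()[:4]
        if c != chrom:
            continue
        # Clip the interval to the requested region before filling it in.
        for pos in range(max(int(s), start), min(int(e), end)):
            density[pos - start] = float(value)
    return density


# Hypothetical bedGraph content, for illustration only.
example_lines = [
    "track type=bedGraph name=example",
    "chr1\t100\t103\t2.5",
    "chr1\t103\t105\t4",
    "chr2\t100\t105\t9",
]
print(read_density(example_lines, "chr1", 101, 106))
```

A genome browser track can then be drawn by plotting the returned list against the positions start, start+1, and so on.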
Peak annotation
A peak in the peak list was annotated to a gene if it overlaps the promoter of that gene, defined as the 20 kb region centered at the transcription start site (TSS), i.e. from 10 kb upstream to 10 kb downstream of the TSS. The 23,666 human genes (GRCh38.p10) that have both an NCBI gene ID and an HGNC symbol were downloaded from Ensembl. In this study, the TSS of the longest transcript of a gene was used as the TSS of that gene.
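The annotation rule above can be sketched in Python as an interval-overlap test. This is a minimal illustration, not the p53BLD implementation: the names Peak, Gene, promoter and annotate are hypothetical, intervals are treated as closed, and the gene and peak coordinates in the example are made up.

```python
from typing import NamedTuple

PROMOTER_HALF_WIDTH = 10_000  # 20 kb promoter centered at the TSS


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


class Gene(NamedTuple):
    symbol: str
    chrom: str
    tss: int  # TSS of the longest transcript, as described in the text


def promoter(gene):
    """Return (start, end) of the 20 kb window centered at the gene's TSS."""
    return max(0, gene.tss - PROMOTER_HALF_WIDTH), gene.tss + PROMOTER_HALF_WIDTH


def overlaps_promoter(peak, gene):
    """True if the peak shares at least one position with the promoter."""
    if peak.chrom != gene.chrom:
        return False
    p_start, p_end = promoter(gene)
    return peak.start <= p_end and peak.end >= p_start


def annotate(peaks, genes):
    """Map each gene symbol to the list of peaks overlapping its promoter."""
    hits = {}
    for gene in genes:
        matched = [p for p in peaks if overlaps_promoter(p, gene)]
        if matched:
            hits[gene.symbol] = matched
    return hits


# Made-up coordinates, for illustration only.
genes = [Gene("GENE_A", "chr1", 50_000), Gene("GENE_B", "chr1", 200_000)]
peaks = [Peak("chr1", 59_900, 60_100), Peak("chr1", 70_000, 70_200)]
print(annotate(peaks, genes))
```

With the roughly 23,666 genes and the peak lists described here, a nested loop like this is adequate for a sketch; a sorted index or interval tree would be the usual choice for larger inputs.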
Implementation of the web interface of p53BLD
The web interface of p53BLD was built in PHP with the CodeIgniter MVC framework. The bedGraph files and the peak lists of the 13 ChIP-seq datasets were stored in a MySQL database. Tables were rendered with JavaScript and the jQuery and DataTables libraries. Figures were generated using the PHP GD library.
Database interface
p53BLD provides both a search mode and a browse mode.

Search Mode:

In the search mode, users can survey the TP53 binding target genes derived from different cell lines by selecting the ChIP-seq datasets of interest. By applying the union, intersect, and difference operations, users can identify TP53 binding target genes that occur in at least one dataset, in all datasets, or only in certain datasets, respectively.
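The three operations correspond to ordinary set operations on per-dataset gene lists. The Python sketch below shows one possible reading; it does not reproduce the p53BLD server code. The dataset names and their gene memberships are hypothetical, and the difference operation is assumed to mean "bound in every kept dataset and in none of the excluded ones".

```python
def union(targets, names):
    """Genes bound in at least one of the named datasets."""
    return set().union(*(targets[n] for n in names))


def intersect(targets, names):
    """Genes bound in all of the named datasets."""
    sets = [targets[n] for n in names]
    return set.intersection(*sets) if sets else set()


def difference(targets, keep, exclude):
    """Genes bound in every `keep` dataset and in no `exclude` dataset.

    This interpretation of "specific to certain datasets" is an assumption.
    """
    return intersect(targets, keep) - union(targets, exclude)


# Hypothetical dataset contents, for illustration only.
targets = {
    "dataset_A": {"CDKN1A", "MDM2", "BAX"},
    "dataset_B": {"CDKN1A", "MDM2"},
    "dataset_C": {"CDKN1A", "GENE_X"},
}
everything = list(targets)
print(sorted(union(targets, everything)))
print(sorted(intersect(targets, everything)))
print(sorted(difference(targets, ["dataset_A"], ["dataset_B", "dataset_C"])))
```

The same pattern could be used on a downloaded gene list to reproduce or extend a query offline.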




After submission, p53BLD returns a result page containing a list of TP53 binding target genes that satisfy the operation settings.




When the user clicks the “Detail Link” of a particular gene (e.g. CDKN1A), p53BLD shows a result page with two parts (“Peak Calling” and “Track View”). The “Peak Calling” part shows basic information about CDKN1A and the peaks found in its promoter, with peak locations, fold enrichments and p-values for each chosen ChIP-seq dataset. A high fold enrichment and a low p-value indicate statistically significant TP53 binding.






The “Track View” part shows the density of reads at each nucleotide in the promoter of CDKN1A for each chosen ChIP-seq dataset.





Browse Mode:

In the browse mode, users select the ChIP-seq datasets of interest and specify the “Center” and the “Range” of the region to be shown.




For the “Center” specification, users can enter either a genomic coordinate (e.g. chr12:68808176) or a gene name (e.g. MDM2), in which case the TSS coordinate of that gene is used. After submission, p53BLD returns a result page containing two views.

Peak Calling:

This view shows the locations and fold enrichments of the peaks in the region [Center-0.5*Range, Center+0.5*Range] for each chosen ChIP-seq dataset.





Track View:

This view shows the density of reads at each nucleotide in the region [Center-0.5*Range, Center+0.5*Range] for each chosen ChIP-seq dataset.
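The region shown by both views can be computed from the “Center” and “Range” inputs as in the Python sketch below. It is an illustration only: the function names, the peak tuple layout (chromosome, start, end, fold enrichment) and the example peaks are hypothetical, and only the coordinate form of “Center” is handled (resolving a gene name to its TSS is left out).

```python
def parse_center(text):
    """Split a coordinate such as 'chr12:68808176' into (chrom, position)."""
    chrom, pos = text.split(":")
    return chrom, int(pos.replace(",", ""))


def view_region(center, span):
    """Return [center - 0.5*span, center + 0.5*span], clipped at 0.

    Integer division is used, so an odd span is rounded down by one base.
    """
    half = span // 2
    return max(0, center - half), center + half


def peaks_in_region(peaks, chrom, start, end):
    """Keep peaks on `chrom` that overlap the closed interval [start, end]."""
    return [p for p in peaks if p[0] == chrom and p[1] <= end and p[2] >= start]


chrom, center = parse_center("chr12:68808176")
start, end = view_region(center, 20_000)

# Made-up peaks: (chromosome, start, end, fold enrichment).
peaks = [
    ("chr12", 68_808_000, 68_808_300, 12.5),
    ("chr12", 68_900_000, 68_900_200, 4.0),
    ("chr1", 68_808_000, 68_808_300, 7.0),
]
print(chrom, start, end)
print(peaks_in_region(peaks, chrom, start, end))
```

With a “Range” of 20,000, the coordinate chr12:68808176 yields the window chr12:68798176-68818176, and only the first example peak falls inside it.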