Identify Enriched/Depleted Histone Modifications and Enriched Histone/Chromatin Regulators in Your Gene List
Step 1
Users need to input a list of N genes to be analyzed by YHMI.
Standard names, systematic names or aliases are all acceptable.
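Because standard names, systematic names and aliases are all accepted, a pasted list must be mapped to one identifier per gene before any counting. The sketch below is a minimal illustration that assumes a lookup table mapping every accepted identifier (upper-cased) to a systematic ORF name. The function name `resolve_gene_names` and the toy table, including the made-up alias `MYALIAS1`, are hypothetical. This is not YHMI's actual name-resolution code.

```python
def resolve_gene_names(names, lookup):
    """Map input identifiers to systematic names (hypothetical helper).

    lookup: dict from any accepted identifier (standard name, systematic
    name or alias, upper-cased) to a systematic ORF name.
    Returns (resolved systematic names without duplicates, unrecognized inputs).
    """
    resolved, unknown, seen = [], [], set()
    for raw in names:
        key = raw.strip().upper()
        if not key:
            continue  # skip blank lines in the pasted list
        systematic = lookup.get(key)
        if systematic is None:
            unknown.append(raw.strip())
        elif systematic not in seen:
            # Drop duplicates, e.g. the same gene entered under two names.
            seen.add(systematic)
            resolved.append(systematic)
    return resolved, unknown

# Toy lookup table for illustration only; "MYALIAS1" is a made-up alias.
lookup = {"ACT1": "YFL039C", "YFL039C": "YFL039C", "MYALIAS1": "YFL039C"}
genes, unknown = resolve_gene_names(["act1", "MyAlias1", "YFL039C", "foo1"], lookup)
print(genes)    # ['YFL039C']
print(unknown)  # ['foo1']
```

Counting each gene once matters here, because N in the following steps is the number of distinct input genes.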
Step 2
Users need to define, by setting thresholds, the sets of yeast genes whose promoters or coding regions carry specific histone modifications.
For example, setting log2(H3K9ac/H3) ≥ 1 (i.e., at least two-fold enrichment over the background) in the promoters defines a set of 2129 yeast genes whose promoters carry H3K9ac.
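The thresholding step can be sketched as a simple filter over per-gene log2 ratios. This sketch assumes the values for one region (promoter or coding region) are available as a dictionary. The function name and the gene values below are hypothetical, not YHMI data. It also shows why a cutoff of 1 means two-fold enrichment: log2(ratio) ≥ 1 exactly when ratio ≥ 2.

```python
import math

def genes_with_modification(log2_ratios, threshold=1.0):
    """Return genes whose log2(modification/H3) in a region is >= threshold.

    threshold=1.0 corresponds to at least two-fold enrichment,
    because log2(ratio) >= 1 exactly when ratio >= 2.
    Genes without a measurement (None) are excluded.
    """
    return {gene for gene, value in log2_ratios.items()
            if value is not None and value >= threshold}

# Hypothetical promoter log2(H3K9ac/H3) values for four made-up genes.
promoter_h3k9ac = {"GENE1": 1.4, "GENE2": 0.3, "GENE3": 2.1, "GENE4": None}
print(sorted(genes_with_modification(promoter_h3k9ac)))  # ['GENE1', 'GENE3']
print(math.log2(2.0))  # 1.0: a two-fold ratio sits exactly at the threshold
```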
The expected ratio of promoters having H3K9ac in the yeast genome is then 2129/6572 ≈ 0.32.
Next, intersecting the input list of N genes with this set of 2129 genes gives the number M of input genes whose promoters have H3K9ac.
The observed ratio of promoters having H3K9ac in the input list is then M/N.
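The observed and expected ratios reduce to a set intersection and two divisions. The minimal sketch below uses a hypothetical helper name and a toy 10-gene "genome". It assumes the input genes have already been resolved to the same identifiers used for the genome-wide set.

```python
def observed_vs_expected(input_genes, modified_genes, genome_size):
    """Return (M, N, observed ratio M/N, expected ratio K/G).

    input_genes: the user's N genes; modified_genes: the K genome-wide genes
    carrying the modification; genome_size: G, the number of genes in the genome.
    """
    inputs = set(input_genes)
    modified = set(modified_genes)
    n = len(inputs)
    m = len(inputs & modified)  # input genes that carry the modification
    return m, n, m / n, len(modified) / genome_size

# Toy example: 10-gene "genome", 3 modified genes, 4 input genes.
m, n, observed, expected = observed_vs_expected(
    ["A", "B", "D", "E"], ["A", "B", "C"], genome_size=10)
print(m, n, observed, expected)  # 2 4 0.5 0.3

# Expected ratio for the H3K9ac promoter example in the text.
print(round(2129 / 6572, 2))  # 0.32
```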
Finally, the input list of N genes is said to be enriched with H3K9ac in the promoters if the observed ratio (M/N) is significantly larger than the expected ratio (2129/6572), and depleted if it is significantly smaller.
Statistical significance is assessed with the hypergeometric test.
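A sketch of the hypergeometric test using exact integer arithmetic from the Python standard library (`math.comb`, Python 3.8+). It assumes one-sided tails: P(X ≥ M) for enrichment and P(X ≤ M) for depletion, where X is the overlap between a random set of N genes and the K modified genes among G genome genes. The exact tails YHMI uses are not stated above, and the input of 100 genes with 60 H3K9ac promoters is hypothetical.

```python
from math import comb

def hypergeom_pvalues(M, N, K, G):
    """One-sided hypergeometric p-values for an overlap of size M.

    G: genes in the genome (e.g. 6572), K: genome-wide genes carrying the
    modification (e.g. 2129), N: input genes, M: input genes carrying it.
    Returns (p_enriched, p_depleted) = (P[X >= M], P[X <= M]),
    where X ~ Hypergeometric(G, K, N).
    """
    total = comb(G, N)
    lo, hi = max(0, N - (G - K)), min(N, K)  # support of X

    def ways(i):
        # Number of N-gene draws containing exactly i modified genes.
        return comb(K, i) * comb(G - K, N - i)

    upper = sum(ways(i) for i in range(max(M, lo), hi + 1))
    lower = sum(ways(i) for i in range(lo, min(M, hi) + 1))
    return upper / total, lower / total

# Hypothetical input: 100 genes, 60 of which have H3K9ac in their promoters
# (about 32 would be expected from the genome-wide ratio of 0.32).
p_enriched, p_depleted = hypergeom_pvalues(M=60, N=100, K=2129, G=6572)
print(p_enriched, p_depleted)
```

Python's integer division of the two large counts yields a correctly rounded float, so no log-space tricks are needed at genome scale.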
H3K14ac [H2O2]: yeast cells were grown in rich medium supplemented with H2O2.
log2(H2AK5ac / Input): "Input" denotes the control experiment, i.e., a ChIP-chip/ChIP-seq experiment performed without any antibody against a histone modification (e.g., anti-H3K79me2).
MAT score (H3K79me2 / Input): MAT stands for Model-based Analysis of Tiling-arrays, an algorithm for detecting enriched regions in tiling-array (ChIP-chip) data. The higher the MAT score, the stronger the enrichment.
Step 3
Since YHMI tests the enrichment/depletion of many histone modifications (i.e., multiple hypothesis testing), users have to select a multiple-testing correction method (Bonferroni correction or FDR) and set the p-value threshold (default: 0.01).
Bonferroni correction is more conservative than FDR control: it has a lower type I error rate (fewer false positives) but, as a result, lower statistical power than FDR.
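The difference in conservativeness can be seen by adjusting the same raw p-values both ways. This is a minimal sketch that assumes the FDR option corresponds to the Benjamini-Hochberg procedure and that the p-value threshold is compared with adjusted p-values. Neither detail is stated above. The raw p-values and the 0.05 cutoff are hypothetical and were chosen only to make the contrast visible.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1 (controls the FWER)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four histone modifications.
raw = [0.005, 0.01, 0.03, 0.5]
cutoff = 0.05
print(sum(p <= cutoff for p in bonferroni(raw)))          # 2
print(sum(p <= cutoff for p in benjamini_hochberg(raw)))  # 3
```

With the same cutoff, Bonferroni keeps two modifications and Benjamini-Hochberg keeps three, which illustrates its lower power.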
The p-value threshold sets the significance level for calling a histone modification enriched or depleted.
The more stringent (smaller) the threshold, the higher the statistical significance of the identified enriched/depleted histone modifications, though fewer of them may pass the cutoff.
Therefore, if no enriched/depleted histone modifications are found for a gene list, users may relax the p-value threshold to find less statistically significant ones.
To see all results without applying any significance level, users can choose "No p-value cutoff".
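Applying the threshold, relaxing it, or switching it off can be sketched as a single filter. The function name and the corrected p-values below are hypothetical. Whether YHMI uses "≤" or "<" at the cutoff is an assumption. `cutoff=None` stands in for the "No p-value cutoff" option.

```python
def select_significant(pvalues, cutoff=0.01):
    """Return (modification, p-value) pairs sorted by p-value.

    Only pairs with p-value <= cutoff are kept; cutoff=None mimics the
    "No p-value cutoff" option and keeps every result.
    """
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    if cutoff is None:
        return ranked
    return [(name, p) for name, p in ranked if p <= cutoff]

# Hypothetical corrected p-values.
results = {"H3K9ac (promoter)": 0.002, "H3K4me3 (promoter)": 0.03,
           "H3K36me3 (coding region)": 0.4}
print(len(select_significant(results)))               # 1  (default 0.01)
print(len(select_significant(results, cutoff=0.05)))  # 2  (relaxed threshold)
print(len(select_significant(results, cutoff=None)))  # 3  ("No p-value cutoff")
```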