NOrMAL: Accurate Nucleosome Positioning
using a Modified Gaussian Mixture Model
Anton Polishko*, Karine Le Roch**, Nadia Ponts** and Stefano Lonardi*
*Department of Computer Science and Engineering, University of California, Riverside CA 92521
**Department of Cell Biology and Neuroscience, University of California, Riverside CA 92521
Motivation: Nucleosomes are the basic elements of DNA chromatin
structure. They control the packaging of DNA and play a critical
role in gene regulation by allowing physical access to transcription
factors. The advent of second-generation sequencing has enabled
landmark genome-wide studies of nucleosome position for several
model organisms. Current methods to determine nucleosome
positioning first compute an occupancy coverage profile by mapping
nucleosome-enriched sequenced reads to a reference genome;
then, nucleosomes are placed according to the peaks of the
coverage profile. These methods are quite accurate at placing
isolated nucleosomes, but they do not properly handle “overlapping”
nucleosomes. Also, they can only provide the positions of
nucleosomes and their occupancy level, while it is very beneficial to
supply molecular biologists with additional information about nucleosomes,
such as the probability of placement, the size of DNA fragments enriched
for nucleosomes, and/or whether nucleosomes are well-positioned or
“fuzzy” in the sequenced cell sample.
Results: We address these issues by providing a novel method
based on a parametric probabilistic model. An expectation
maximization (EM) algorithm is used to infer the parameters of
the mixture of distributions.
Description
NOrMAL is a command-line tool for accurate nucleosome positioning.
It is designed to resolve overlapping nucleosomes and to extract additional information about nucleosome placement ("fuzziness", probability, etc.).
To achieve this, the tool clusters the input tags according to the Nucleosome Model (see the paper for a detailed description) using an EM learning process.
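To give a feel for what EM clustering of tag positions involves, here is a minimal sketch of EM for a plain one-dimensional Gaussian mixture. It is not the Nucleosome Model from the paper and not the tool's C++ code; the function names, the default sigma and the iteration count are all assumptions chosen for illustration. In this simplified picture, the fitted standard deviation of a component plays a role loosely similar to "fuzziness".

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gmm_1d(points, initial_means, sigma=20.0, iterations=50, min_sigma=1.0):
    """Fit a plain 1-D Gaussian mixture with EM (illustrative only).

    points        -- tag positions (numbers)
    initial_means -- one starting guess per component (hypothetical input)
    Returns (weights, means, sigmas).
    """
    k = len(initial_means)
    means = list(initial_means)
    sigmas = [sigma] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens)
            if total == 0.0:
                # Point is far from every component: share it equally.
                resp.append([1.0 / k] * k)
            else:
                resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component keeps its old parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, points)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, points)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)  # avoid collapse
            weights[j] = nj / len(points)

    return weights, means, sigmas


# Two well-separated groups of made-up tag positions.
tags = [90, 95, 100, 105, 110, 290, 295, 300, 305, 310]
weights, means, sigmas = em_gmm_1d(tags, initial_means=[80, 320])
```

The lower bound `min_sigma` is a common safeguard against a component collapsing onto a single point; whether NOrMAL uses anything similar is not stated here.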
The tool is written in C++. Compiling and running it requires only the g++ compiler and a *nix environment. It has been compiled successfully with g++ under Ubuntu 11.04 and Mac OS X 10.6.
The software is freely available for academic use. It is still in development, so it may contain bugs and is not 100% bulletproof.
How to install?
- Download the latest source code here
- Unpack the archive to a preferred location, *folder*
- Compile using the make command within the folder
$> cd *folder*
$> make
- To check the compiled executable, run
$> make test
The tool processes a small test case and produces the file test_results.txt. This should take no more than ~20 sec.
NOrMAL in use
Input
The executable takes three input files: a configuration file, a forward-tags file and a reverse-tags file.
The configuration file holds the algorithm parameters. A default config.txt is provided; the parameters are self-explanatory and can be adjusted to your needs.
The main input for the tool is the set of 5' end positions of the forward and reverse mapped nucleosome reads (tags). The tags should be specified in two separate files, each a simple list of locations (numbers).
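As a rough illustration of preparing the two tag files, the sketch below splits mapped reads into forward and reverse 5' end positions. It assumes each read is given as (start, end, strand) in 1-based inclusive coordinates, so that the 5' end of a forward read is its start and the 5' end of a reverse read is its end. It also assumes that "simple list of numbers" means one number per line. The function names and file names are hypothetical, and the sorting is only for readability, since the expected order is not specified here.

```python
def five_prime_tags(reads):
    """Split reads into forward and reverse 5' end positions.

    reads -- iterable of (start, end, strand) tuples, assumed to be
             1-based inclusive coordinates with strand '+' or '-'.
    """
    forward, reverse = [], []
    for start, end, strand in reads:
        if strand == "+":
            forward.append(start)   # 5' end of a forward read
        elif strand == "-":
            reverse.append(end)     # 5' end of a reverse read
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return sorted(forward), sorted(reverse)


def format_tags(positions):
    """Render positions one per line (assumed file layout)."""
    return "".join(f"{p}\n" for p in positions)


def write_tags(path, positions):
    """Write a tag file in the assumed one-number-per-line layout."""
    with open(path, "w") as handle:
        handle.write(format_tags(positions))


# Made-up reads, for illustration only.
reads = [(100, 135, "+"), (220, 255, "-"), (90, 125, "+")]
forward_tags, reverse_tags = five_prime_tags(reads)
# write_tags("forward_tags.txt", forward_tags)
# write_tags("reverse_tags.txt", reverse_tags)
```

If your aligner reports 0-based or half-open coordinates, the reverse-strand position needs adjusting accordingly.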
Usage
- Specify the parameters of the algorithm in *config_file* (see the provided default "config.txt").
- Provide the forward and reverse tags as simple lists of numbers in the files *forward_tags* and *reverse_tags*, respectively.
- Run the tool using the command
./NOrMAL *config_file* *forward_tags* *reverse_tags* *output_file*
- The output is written to *output_file* in the following column format:
<Position of the nucleosome center> <"Fuzziness"> <Nucleosome Size> <confidence score> <Forward votes> <Reverse votes>
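The output file can then be loaded for downstream analysis. The sketch below parses the six columns listed above, assuming they are whitespace-separated with one nucleosome per line and no header line; that layout is an assumption. The field names, the sample values and the fuzziness cutoff are made up for illustration and are not results produced by the tool.

```python
from typing import List, NamedTuple


class Nucleosome(NamedTuple):
    """One output row; field names are hypothetical labels for the columns."""
    center: float
    fuzziness: float
    size: float
    confidence: float
    forward_votes: float
    reverse_votes: float


def parse_normal_output(text: str) -> List[Nucleosome]:
    """Parse output text, assuming six whitespace-separated numeric columns."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(
                f"line {line_no}: expected 6 columns, got {len(fields)}")
        records.append(Nucleosome(*(float(f) for f in fields)))
    return records


# Made-up rows in the assumed layout, not real NOrMAL output.
sample = "1000 12.5 147 0.9 30 28\n\n1200 40 150 0.5 10 12\n"
records = parse_normal_output(sample)

# Example filter: keep nucleosomes below an arbitrary fuzziness cutoff.
well_positioned = [n for n in records if n.fuzziness < 20.0]
```

To read a real file, pass its contents, for example `parse_normal_output(open(path).read())`.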