The problem of characterizing and detecting over- or under-represented words in sequences arises in many applications and has been studied extensively in Computational Molecular Biology. Most approaches to detecting unusual word frequencies enumerate the words (up to a certain length) more or less exhaustively and check each one individually in terms of observed and expected frequencies, variances, and scores of discrepancy together with their significance.
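As a rough illustration of the enumerate-and-check style described above, the following Python sketch counts every word up to a given length and compares its observed count with the count expected under a simple model. The function name `word_scores`, the i.i.d. symbol model, and the binomial variance (which ignores word self-overlap) are all assumptions made for this example; they are not taken from the project itself.

```python
from collections import Counter
from math import prod, sqrt


def word_scores(seq, max_len):
    """Return {word: (observed, expected, z)} for every word of length
    1..max_len that occurs in seq.

    Assumption: symbols are independent and identically distributed,
    with probabilities estimated from seq itself. The variance is the
    plain binomial one, a crude approximation that ignores overlaps.
    """
    n = len(seq)
    p = {c: k / n for c, k in Counter(seq).items()}
    scores = {}
    for m in range(1, max_len + 1):
        positions = n - m + 1  # number of places a word of length m can start
        if positions <= 0:
            break
        observed = Counter(seq[i:i + m] for i in range(positions))
        for word, obs in observed.items():
            pw = prod(p[c] for c in word)  # probability of the word at one position
            expected = positions * pw
            var = positions * pw * (1 - pw)
            z = (obs - expected) / sqrt(var) if var > 0 else 0.0
            scores[word] = (obs, expected, z)
    return scores


if __name__ == "__main__":
    for word, (obs, exp, z) in sorted(word_scores("ATATATAT", 2).items()):
        print(f"{word}: observed={obs} expected={exp:.2f} z={z:.2f}")
```

Each word is scored in isolation here, which is exactly the per-word bookkeeping that becomes expensive as the maximum word length grows.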

We instead take a global approach: we annotate a suffix trie or automaton of the sequence with such values and scores, so that the structure can serve as a collective detector of all unexpected behaviors, or perhaps just as a preliminary filter for words suspicious enough to warrant further, more accurate scrutiny.
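The annotated-trie idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the trie is truncated at a hypothetical `max_depth`, the expected counts come from an i.i.d. symbol model estimated from the sequence, and the score is a simple z-like value with a binomial variance that ignores overlaps. The names `Node`, `build_annotated_trie`, and `unusual_words` are invented for this example.

```python
from collections import Counter
from math import sqrt


class Node:
    """One trie node; the path from the root spells a word of the sequence."""
    __slots__ = ("children", "count", "expected", "score")

    def __init__(self):
        self.children = {}
        self.count = 0       # observed occurrences of the word
        self.expected = 0.0  # expected occurrences under the assumed model
        self.score = 0.0     # z-like discrepancy score


def build_annotated_trie(seq, max_depth):
    """Build a depth-bounded suffix trie of seq and annotate every node."""
    n = len(seq)
    p = {c: k / n for c, k in Counter(seq).items()}  # assumed i.i.d. model
    root = Node()
    # Insert every suffix, truncated to max_depth symbols. Each node visited
    # corresponds to one occurrence of the word spelled so far.
    for i in range(n):
        node = root
        for c in seq[i:i + max_depth]:
            node = node.children.setdefault(c, Node())
            node.count += 1

    def annotate(node, depth, prob):
        for c, child in node.children.items():
            pw = prob * p[c]
            positions = n - depth  # start positions for a word of length depth + 1
            child.expected = positions * pw
            var = positions * pw * (1 - pw)  # crude: ignores overlaps
            child.score = (child.count - child.expected) / sqrt(var) if var > 0 else 0.0
            annotate(child, depth + 1, pw)

    annotate(root, 0, 1.0)
    return root


def unusual_words(root, threshold):
    """Walk the trie once and collect words whose |score| >= threshold."""
    found, stack = [], [("", root)]
    while stack:
        word, node = stack.pop()
        for c, child in node.children.items():
            w = word + c
            if abs(child.score) >= threshold:
                found.append((w, child.count, child.score))
            stack.append((w, child))
    return sorted(found, key=lambda t: -abs(t[2]))


if __name__ == "__main__":
    trie = build_annotated_trie("ATATATAT", 3)
    for w, count, score in unusual_words(trie, 1.5):
        print(f"{w}: count={count} score={score:.2f}")
```

Because the counts and scores live on the nodes, a single traversal surfaces all candidate words at once, and the threshold can act as the preliminary filter before a more careful significance test is applied to the survivors.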


Last updated: Tue Nov 21 20:27:37 EST 2000. For any question, send Email to the stelo @