|
This method is an
improvement on Graziano Pesole's method
[1].
It is based on cluster analysis (the complete-linkage algorithm), which
requires a matrix D containing the distance between each
pair of mRNA sequences. It is a bottom-up method: it starts with n
atomic clusters, each containing a single mRNA sequence. At every step, the two
clusters r and s for which
D(r,s) = Max{d(i,j) : sequence i
is in cluster r and sequence j is in cluster s}
is minimum are merged into a new cluster. The process
stops when every possible merge would produce a cluster whose diameter
is greater than the threshold (an input parameter). After
cluster analysis, backtranslation is performed based on the sequences of the
homogeneous pool. In BBOCUS the homogeneous pool is the cluster that
has the maximum diameter, because its sequences are not too similar
to one another. In Graziano Pesole's method, by contrast, the homogeneous pool
was chosen by a biologist. A Codon Usage Table (CUT) is built
from the sequences of the homogeneous pool. The CUT contains several fields:
frequency, standard deviation, etc. Generally, to backtranslate an
amino acid of a protein, the codon chosen is the one with the maximum
frequency.
|
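
The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the BBOCUS implementation. All function names are hypothetical. It assumes the standard genetic code and that the sequences are in-frame coding regions. It also assumes that "frequency" means the relative frequency of a codon among the codons for the same amino acid. The standard deviation field of the CUT is omitted, and the distance matrix in the demo is a toy example, not computed from the demo sequences.

```python
from collections import Counter
from itertools import combinations

# Standard genetic code (assumption), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def complete_linkage(dist, threshold):
    """Merge clusters bottom-up until every possible merge exceeds the threshold."""
    clusters = [[i] for i in range(len(dist))]  # n atomic clusters

    def linkage(r, s):
        # D(r,s) = max distance between a member of r and a member of s
        return max(dist[i][j] for i in r for j in s)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # even the best merge would be too wide
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


def diameter(cluster, dist):
    """Largest pairwise distance inside a cluster (0 for a single sequence)."""
    return max((dist[i][j] for i, j in combinations(cluster, 2)), default=0)


def homogeneous_pool(clusters, dist):
    """The cluster with the maximum diameter, as in the description above."""
    return max(clusters, key=lambda c: diameter(c, dist))


def codon_usage_table(sequences):
    """Relative frequency of each codon among the codons of the same amino acid."""
    counts = {}
    for seq in sequences:
        seq = seq.upper().replace("U", "T")  # accept mRNA or DNA letters
        for k in range(0, len(seq) - 2, 3):
            codon = seq[k:k + 3]
            aa = GENETIC_CODE.get(codon)
            if aa is not None:  # skip codons with ambiguous letters
                counts.setdefault(aa, Counter())[codon] += 1
    return {aa: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for aa, cnt in counts.items()}


def backtranslate(protein, cut):
    """Pick, for each amino acid, the codon with the maximum frequency."""
    return "".join(max(cut[aa], key=cut[aa].get) for aa in protein)


# Toy demo: four sequences and a made-up symmetric distance matrix.
seqs = ["AUGGCUGCU", "AUGGCCGCU", "AUGGCUGCUGCC", "AUGGCUAAA"]
dist = [[0, 2, 6, 7],
        [2, 0, 5, 8],
        [6, 5, 0, 3],
        [7, 8, 3, 0]]
clusters = complete_linkage(dist, threshold=4)
pool = homogeneous_pool(clusters, dist)
cut = codon_usage_table([seqs[i] for i in pool])
print(clusters, pool, backtranslate("MAK", cut))
```

Note that `backtranslate` raises a `KeyError` for an amino acid that never occurs in the homogeneous pool. A real tool would need a fallback for that case, and a rule for codons with equal frequency.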