********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 2.2 (Release date: 1998/05/05 20:35:42)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://www.sdsc.edu/MEME.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://www.sdsc.edu/MEME.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= crp0.s (deleted by web version of MEME)
ALPHABET= ACGT
Sequence name      Weight Length  Sequence name      Weight Length
-------------      ------ ------  -------------      ------ ------
ce1cg              1.0000    105  ara                1.0000    105
bglr1              1.0000    105  crp                1.0000    105
cya                1.0000    105  deop2              1.0000    105
gale               1.0000    105  ilv                1.0000    105
lac                1.0000    105  male               1.0000    105
malk               1.0000    105  malt               1.0000    105
ompa               1.0000    105  tnaa               1.0000    105
uxu1               1.0000    105  pbr322             1.0000    105
trn9cat            1.0000    105  tdc                1.0000    105
********************************************************************************


********************************************************************************
EXPLANATION OF RESULTS
********************************************************************************
For each motif that it discovers in the training set, MEME prints the
following information:

Summary Line
  This line gives the width (`width') and expected number of occurrences in
  the training set (`sites') of the motif. MEME numbers the motifs
  consecutively from one as it finds them. MEME usually finds the most
  statistically significant motifs first. Each motif describes a pattern of
  a fixed width--no gaps are allowed in MEME motifs. MEME estimates the
  number of places the motif occurs in the training set. This need not be
  an integer value.

Simplified Motif Letter-probability Matrix
  MEME motifs are represented by letter-probability matrices that specify
  the probability of each possible letter appearing at each possible
  position in an occurrence of the motif. In order to make it easier to see
  which letters are most likely in each of the columns of the motif, the
  simplified motif shows the letter probabilities multiplied by 10 rounded
  to the nearest integer.
  Zeros are replaced by ":" (the colon) for readability.

Information Content Diagram
  The information content diagram provides an idea of which positions in the
  motif are most highly conserved. Each column (position) in a motif can be
  characterized by the amount of information it contains (measured in bits).
  Highly conserved positions in the motif have high information; positions
  where all letters are equally likely have low information. The diagram is
  printed so that each column lines up with the same column in the
  simplified motif letter-probability matrix above it. Summing the
  information content for each position in the motif gives the total
  information content of the motif (shown in parentheses to the left of the
  diagram). This gives a measure of the usefulness of the motif for database
  searches. For a motif to be useful for database searches, it must as a
  rule contain at least log_2(N) bits of information where N is the number
  of sequences in the database being searched. For example, to effectively
  search a database containing 100,000 sequences for occurrences of a single
  motif, the motif should have an IC of at least 16.6 bits. Motifs with
  lower information content are still useful when a family of sequences
  shares more than one motif since they can be combined in multiple motif
  searches (using MAST).

Multilevel Consensus Sequence
  The multilevel consensus sequence corresponding to the motif is an aid in
  remembering and understanding the motif. It is calculated from the motif
  letter-probability matrix as follows. Separately for each column of the
  motif, the letters in the alphabet are sorted in decreasing order by the
  probability with which they are expected to occur in that position of
  motif occurrences. The sorted letters are then printed vertically with the
  most probable letter on top. Only letters with probabilities of 0.2 or
  higher at that position in the motif are printed.
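
The column-sorting procedure just described can be sketched in Python as
follows. This is only an illustration under stated assumptions: the function
name, its arguments, and the small three-position matrix are hypothetical and
are not part of MEME. Only the 0.2 cutoff and the rule of printing the most
probable letter on top are taken from the text above.

```python
def multilevel_consensus(matrix, alphabet="ACGT", threshold=0.2):
    """Return the consensus lines for a letter-probability matrix.

    matrix holds one row per motif position, with one probability per
    letter in the order given by alphabet (hypothetical layout).
    """
    columns = []
    for row in matrix:
        # Sort this position's letters by decreasing probability.
        ranked = sorted(zip(row, alphabet), key=lambda pair: pair[0], reverse=True)
        # Keep only letters at or above the printing threshold.
        columns.append([letter for prob, letter in ranked if prob >= threshold])
    depth = max((len(col) for col in columns), default=0)
    # Line k holds the k-th most probable letter of each position, or a blank.
    return [
        "".join(col[k] if k < len(col) else " " for col in columns)
        for k in range(depth)
    ]


# Made-up matrix with three positions (columns are A, C, G, T).
example = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.45, 0.30, 0.20],
    [0.30, 0.05, 0.05, 0.60],
]
for line in multilevel_consensus(example):
    print(line)
```

For the made-up matrix the top line reads ACT; the lower lines carry the
remaining letters that reach the 0.2 cutoff, with blanks where a position has
none.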
  As an example, the multilevel consensus sequence of motif 2 in the sample
  output is:

    Multilevel   LITGAASGIG
    consensus     V  GS
    sequence          G

  This multilevel consensus sequence says several things about the motif.
  First, the most likely form of the motif can be read from the top line as
  LITGAASGIG. Second, only the letter L has probability greater than 0.2 in
  position 1 of the motif, both I and V have probability greater than 0.2 in
  position 2, etc. Third, a rough approximation of the motif can be made by
  converting the multilevel consensus sequence into the Prosite signature
  L-[IV]-T-G-[AG]-[ASG]-S-G-I-G. The multilevel consensus sequence is
  printed so that each column lines up with the same column in the
  simplified motif and information content diagrams above it.

Motif in BLOCKS or FASTA format
  For use with the BLOCKS (http://www.blocks.fhcrc.org/blocks) tools, MEME
  prints the sites in the sequences which were used to construct the motif
  in BLOCKS format. The sites reported are, for the different model types:
    OOPS    position with highest z_i in each sequence,
    ZOOPS   position with highest z_i > 0.5 in each sequence,
    TCM     all positions with z_i > 0.5,
  where z_i is the probability that an occurrence of the motif starts at
  position i in the sequence given the sequence and the motif model. If you
  include the -print_fasta switch on the command line, MEME prints the motif
  sites in FASTA format instead of BLOCKS format.

Possible Examples of the Motif
  As a further aid in understanding the motif, MEME displays a list of
  possible occurrences of the motif in the training set. This list is made
  by converting the motif letter-probability matrix into a
  position-dependent scoring matrix (log-odds matrix) and using that to
  compute a match score between each position in the training set and the
  motif. All positions which score above a threshold score are listed.
  (The threshold score is chosen by MEME such that the expected number of
  non-motif positions listed in error will equal the number of actual motif
  positions not listed.) The format of the list is sequence name, starting
  position of the (putative) occurrence, match score of the position, and
  the actual sequence including the ten positions before and after the motif
  occurrence (`site').

Position-dependent Scoring Matrix
  The position-dependent scoring matrix corresponding to the motif is
  printed for use by database search programs such as MAST. This matrix is a
  log-odds matrix calculated by taking the log (base 2) of the ratio p/f at
  each position in the motif, where p is the probability of a particular
  letter at that position in the motif and f is the average frequency of
  that letter in the non-redundant database as of 9/22/96. The scoring
  matrix is printed "sideways"--columns correspond to the letters in the
  alphabet (in the same order as shown in the simplified motif) and rows
  correspond to the positions of the motif, position one first. The scoring
  matrix is preceded by a line starting with "log-odds matrix:" and
  containing the length of the alphabet, width of the motif, number of
  characters in the training set and the scoring threshold used in the list
  of possible motif examples.

Motif Letter-probability Matrix
  The motif itself is a position-dependent letter-probability matrix giving,
  for each position in the pattern, the probabilities of each possible
  letter occurring there. The letter-probability matrix is printed
  "sideways"--columns correspond to the letters in the alphabet (in the same
  order as shown in the simplified motif) and rows correspond to the
  positions of the motif, position one first. The motif is preceded by a
  line starting with "letter-probability matrix:" and containing the length
  of the alphabet, width of the motif and number of characters in the
  training set.
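
The log-odds calculation described above (log base 2 of p/f) can be sketched
in Python as follows. This is a minimal illustration, not MEME's own code: the
names BACKGROUND and log_odds_row are hypothetical. The background values are
the non-redundant database letter frequencies printed in the debug section of
this file, and the sample probabilities are the first row of the motif 1
letter-probability matrix.

```python
import math

# Non-redundant database letter frequencies as printed in this file's
# debug section (rounded to three decimals there).
BACKGROUND = {"A": 0.282, "C": 0.222, "G": 0.229, "T": 0.267}


def log_odds_row(probs, alphabet="ACGT", background=BACKGROUND):
    """Convert one motif position from letter probabilities to log-odds scores.

    Assumes every probability is greater than zero; math.log2 raises
    ValueError for a probability of exactly zero.
    """
    return [math.log2(p / background[letter]) for p, letter in zip(probs, alphabet)]


# Position 1 of motif 1 (probabilities for A, C, G, T).
position_1 = [0.391635, 0.006620, 0.114033, 0.487713]
print(" ".join(f"{score:7.3f}" for score in log_odds_row(position_1)))
```

Because the background frequencies are only available to three decimals, the
computed scores agree with the first row of the printed log-odds matrix
(0.475 -5.068 -1.005 0.867) to within about 0.01 rather than exactly.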
********************************************************************************


********************************************************************************
MOTIF 1    width = 14    sites = 44.7
********************************************************************************
Simplified        A  4:111628244435
motif letter-     C  :2134:4152:::4
probability       G  135:22:11:2231
matrix            T  5536324:23444:

         bits    2.2
                 2.0
                 1.7
                 1.5
Information      1.3
content          1.1
(5.9 bits)       0.9        *
                 0.7    *   *
                 0.4 ** * * *
                 0.2 **** *********
                 0.0 --------------

Multilevel           TTGTCACACATTTA
consensus            AGTCTTT ATAAGC
sequence              C  GGA  CG A

--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=14 seqs=18
ce1cg   (  19) TTGTGGCATCGGGC  0.819857
ce1cg   (  72) TTTTCACAAAAATG  0.815484
ara     (  66) GCGTCACACTTTGC  0.972698
bglr1   (  30) TTGTTATATATAAC  0.823655
bglr1   (  60) ATTACACAAAGTTA  0.558457
bglr1   (  87) TGGTCATATTTTTA  0.87717
crp     (  16) ATGCTAAAACAGTC  0.657244
crp     (  33) ATGCTACAGTAATA  0.701969
cya     (   4) GTGCTACACTTGTA  0.875662
cya     (  50) AGGTGTTAAATTGA  0.693371
cya     (  87) TCGTGAAACTAAAA  0.847961
deop2   (  21) TCGCATTACAGTGA  0.756776
gale    (  53) ATGTCACACTTTTC  0.989933
gale    (  89) ATTTCATACCATAA  0.9214
ilv     (  29) ATTCAGTACAAAAC  0.615542
ilv     (  50) ACCCCTCAATTTTC  0.554738
ilv     (  85) TTGTCTCCCCTGTA  0.847879
male    (  25) AGATCACACAAAGC  0.941829
male    (  76) TTGCCGTATAAAGA  0.83799
malk    (  72) TGCTTGCAAAAATC  0.569536
malt    (  43) TTGTGACACAGTGC  0.987459
malt    (  77) ACGTCATCGCTTGC  0.542809
ompa    (  23) ACCTTATACAAGAC  0.656809
ompa    (  59) AGTTCACACTTGTA  0.941051
tnaa    (   2) TTTTTAAACATTAA  0.832301
tnaa    (  36) TCTTTAAAAAAAGC  0.621773
tnaa    (  80) TCGATTCACATTTA  0.650799
uxu1    (  57) ATGTCTTACCAAAA  0.905621
pbr322  (   2) TGGCTTAACTATGC  0.810534
pbr322  (  62) ATACCGCACAGATG  0.686647
pbr322  (  88) ATACCGCATCAGGC  0.521535
trn9cat (  16) ACTTCGCAGAATAA  0.567719
trn9cat (  47) TGTTGATACCGGGA  0.835315
trn9cat (  78) TGGCGAAAATGAGA  0.671073
tdc     (   3) TTTTTATACTTTAA  0.851808
tdc     (  42) TTGTAATAACGATA  0.682724
tdc     (  87) TGGTCGCACATATC  0.984354
//

---------------------------------------------------------------------------
Possible examples of motif 1 in the training set
---------------------------------------------------------------------------
Sequence name Start Score            Site
------------- ----- -----            --------------
ce1cg            19  6.61 GTGCTGGTTT TTGTGGCATCGGGC GAGAATAGCG
ce1cg            72  8.23 TTTTTGATCG TTTTCACAAAAATG GAAGTCCACA
ara              66  9.37 TATTTGCACG GCGTCACACTTTGC TATGCCATAG
ara              92  5.69 TGCCATAGCA TTTTTATCCATAAG
bglr1            28  6.72 AATTATTGGG ATTTGTTATATATA ACTTTATAAA
bglr1            30  9.01 TTATTGGGAT TTGTTATATATAAC TTTATAAATT
bglr1            51  6.04 AACTTTATAA ATTCCTAAAATTAC ACAAAGTTAA
bglr1            60  6.29 AATTCCTAAA ATTACACAAAGTTA ATAACTGTGA
bglr1            87  9.53 ACTGTGAGCA TGGTCATATTTTTA TCAAT
crp              16  6.12 AGCGAAAGCT ATGCTAAAACAGTC AGGATGCTAC
crp              33  6.97 AACAGTCAGG ATGCTACAGTAATA CATTGATGTA
cya               4  8.03        ACG GTGCTACACTTGTA TGTAGCGCAT
cya              50  7.47 TCAATCAGCA AGGTGTTAAATTGA TCACGTTTTA
cya              52  5.97 AATCAGCAAG GTGTTAAATTGATC ACGTTTTAGA
cya              87  8.09 CATTTTTTCG TCGTGAAACTAAAA AAACC
deop2             9  5.76   AGTGAATT ATTTGAACCAGATC GCATTACAGT
deop2            21  6.72 TTGAACCAGA TCGCATTACAGTGA TGCAAACTTG
gale             53 11.97 AATTTATTCC ATGTCACACTTTTC GCATCTTTGT
gale             62  6.67 CATGTCACAC TTTTCGCATCTTTG TTATGCTATG
gale             73  5.62 TTTCGCATCT TTGTTATGCTATGG TTATTTCATA
gale             89  9.43 TGCTATGGTT ATTTCATACCATAA GCC
ilv              29  6.08 GTTATCTGCA ATTCAGTACAAAAC GTGATCAACC
ilv              85  7.17 AAATTTTCCA TTGTCTCCCCTGTA AAGCTGT
male             14  5.78 TTACCGCCAA TTCTGTAACAGAGA TCACACAAAG
male             25  8.51 TCTGTAACAG AGATCACACAAAGC GACGGTGGGG
male             76  7.89 TGGAAAGAGG TTGCCGTATAAAGA AACTAGAGTC
malk             72  5.71 TTCGTGATGT TGCTTGCAAAAATC GTGGCGATTT
malt             35  5.37 GTTAATAAAG ATTTGGAATTGTGA CACAGTGCAA
malt             43 11.17 AGATTTGGAA TTGTGACACAGTGC AAATTCAGAC
malt             60  5.32 ACAGTGCAAA TTCAGACACATAAA AAAACGTCAT
ompa             23  5.90 GATTAAACAT ACCTTATACAAGAC TTTTTTTTCA
ompa             59  9.43 TGCCTGACGG AGTTCACACTTGTA AGTTTTCAAC
tnaa              1  6.14            TTTTTTAAACATTA AAATTCTTAC
tnaa              2  9.31          T TTTTTAAACATTAA AATTCTTACG
tnaa             36  6.68 TAATTTATAA TCTTTAAAAAAAGC ATTTAATATT
tnaa             80  6.68 CGATTGTGAT TCGATTCACATTTA AACAATTTCA
uxu1             57  8.79 CGGGATTGAC ATGTCTTACCAAAA GGTAGAACTT
uxu1             70  6.80 TCTTACCAAA AGGTAGAACTTATA CGCCATCTCA
pbr322            2  7.32          C TGGCTTAACTATGC GGCATCAGAG
pbr322           27  5.63 GCATCAGAGC AGATTGTACTGAGA GTGCACCATA
pbr322           62  5.52 GCGGTGTGAA ATACCGCACAGATG CGTAAGGAGA
trn9cat          16  5.57 ACGGAAGATC ACTTCGCAGAATAA ATAAATCCTG
trn9cat          47  7.42 CTGGTGTCCC TGTTGATACCGGGA AGCCCTGGGC
trn9cat          78  6.29 GGCCAACTTT TGGCGAAAATGAGA CGTTGATCGG
tdc               3  9.78         GA TTTTTATACTTTAA CTTGTTGATA
tdc              19  6.85 TACTTTAACT TGTTGATATTTAAA GGTATTTAAT
tdc              42  7.41 AGGTATTTAA TTGTAATAACGATA CTCTGGAAAG
tdc              87 10.63 AATTTGTGAG TGGTCGCACATATC CTGTT
---------------------------------------------------------------------------

log-odds matrix: alength= 4 w= 14 n= 1656 bayes= 5.17312
 0.475  -5.068  -1.005   0.867
-4.774  -0.084   0.270   0.916
-1.358  -1.411   1.128   0.197
-1.902   0.389  -4.219   1.217
-1.054   0.755  -0.153   0.086
 0.991  -4.971  -0.129  -0.259
-0.383   0.904  -4.930   0.433
 1.539  -1.050  -2.176  -3.527
-0.261   1.202  -1.742  -0.526
 0.661  -0.072  -4.343   0.313
 0.333  -5.046   0.060   0.581
 0.384  -4.717  -0.272   0.700
-0.001  -2.729   0.402   0.517
 0.741   0.731  -0.793  -3.226

letter-probability matrix: alength= 4 w= 14 n= 1656
0.391635  0.006620  0.114033  0.487713
0.010298  0.209503  0.275889  0.504310
0.109941  0.083508  0.500094  0.306458
0.075374  0.290797  0.012292  0.621537
0.135689  0.374691  0.205887  0.283733
0.560221  0.007081  0.209306  0.223392
0.216022  0.415487  0.007506  0.360985
0.818959  0.107212  0.050636  0.023194
0.235175  0.510783  0.068433  0.185609
0.445420  0.211229  0.011275  0.332076
0.354858  0.006720  0.238572  0.399850
0.367677  0.008442  0.189549  0.434332
0.281581  0.033482  0.302351  0.382586
0.470819  0.368497  0.132109  0.028575

Time 15.70 secs.
********************************************************************************
MOTIF 2    width = 11    sites = 13.0
********************************************************************************
Simplified        A  51342:::18:
motif letter-     C  :1::11:::2:
probability       G  1112:1818:4
matrix            T  475478191:6

         bits    2.2
                 2.0
                 1.7
                 1.5
Information      1.3
content          1.1       **
(8.2 bits)       0.9      ****
                 0.7      ******
                 0.4  *  *******
                 0.2 ***********
                 0.0 -----------

Multilevel           ATTTTTGTGAT
consensus            T AAA     G
sequence

--------------------------------------------------------------------------------
Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=11 seqs=18
ce1cg   (   3) ATGTTTGTGCT  0.915992
ce1cg   (  59) GTTTTTTTGAT  0.670678
bglr1   (  19) ATTATTGGGAT  0.706672
bglr1   (  74) ATAACTGTGAG  0.742856
deop2   (  58) TTAATTGTGAT  0.684781
ilv     (  13) TTTTTTGTTAT  0.810379
lac     (   7) ATTAATGTGAG  0.947176
malk    (  93) TTTTATGTGCG  0.642473
malt    (  12) TTTTAGGTGAG  0.625713
tnaa    (  69) ACGATTGTGAT  0.792456
uxu1    (  15) ATTGTTGTGAT  0.887523
tdc     (  76) TAATTTGTGAG  0.840163
//

------------------------------------------------------------------------
Possible examples of motif 2 in the training set
------------------------------------------------------------------------
Sequence name Start Score            Site
------------- ----- -----            -----------
ce1cg             3 11.33         TA ATGTTTGTGCT GGTTTTTGTG
ce1cg            59  9.79 GTGAAAGACT GTTTTTTTGAT CGTTTTCACA
bglr1            19 11.03 CAATAACTTA ATTATTGGGAT TTGTTATATA
bglr1            74 10.44 CACAAAGTTA ATAACTGTGAG CATGGTCATA
deop2            25  7.13 ACCAGATCGC ATTACAGTGAT GCAAACTTGT
deop2            58 12.87 GTAGATTTCC TTAATTGTGAT GTGTATCGAA
gale             69  8.91 CACTTTTCGC ATCTTTGTTAT GCTATGGTTA
ilv              13 11.68 TCCGGCGGGG TTTTTTGTTAT CTGCAATTCA
lac               7 12.47     AACGCA ATTAATGTGAG TTAGCTCACT
lac              79  8.30 TATGTTGTGT GGAATTGTGAG CGGATAACAA
malk             59  8.63 TCATGTAAGG AATTTCGTGAT GTTGCTTGCA
malk             93 10.68 ATCGTGGCGA TTTTATGTGCG CA
malt             12  9.10 ATCAGCGTCG TTTTAGGTGAG TTGTTAATAA
tnaa             69  9.96 TGCTCCCCGA ACGATTGTGAT TCGATTCACA
uxu1             15 13.40 TGAGAGTGAA ATTGTTGTGAT GTGGTTAACC
tdc              38  9.03 TTAAAGGTAT TTAATTGTAAT AACGATACTC
tdc              76 10.48 TATTGAAAGT TAATTTGTGAG TGGTCGCACA
------------------------------------------------------------------------

log-odds matrix: alength= 4 w= 11 n= 1710 bayes= 7.0297
 0.794  -3.618  -0.675   0.389
-1.017  -1.340  -1.798   1.404
-0.058  -2.182  -0.631   0.994
 0.402  -3.387  -0.523   0.742
-0.266  -1.241  -3.565   1.287
-3.027  -1.973  -1.658   1.645
-2.878  -3.508   1.891  -1.518
-3.002  -3.399  -1.598   1.699
-2.108  -3.100   1.732  -0.850
 1.419  -0.299  -3.311  -2.635
-2.916  -3.529   0.751   1.062

letter-probability matrix: alength= 4 w= 11 n= 1710
0.488497  0.018083  0.143317  0.350103
0.139261  0.087717  0.065806  0.707216
0.270695  0.048924  0.147828  0.532554
0.372293  0.021221  0.159238  0.447249
0.234346  0.093942  0.019341  0.652371
0.034557  0.056553  0.072538  0.836352
0.038318  0.019515  0.848814  0.093353
0.035169  0.021049  0.075614  0.868168
0.065371  0.025893  0.760448  0.148287
0.753401  0.180483  0.023067  0.043050
0.037324  0.019234  0.385282  0.558159

Time 29.11 secs.

Stopped because nmotifs = 2 reached.

CPU: ghidorah

********************************************************************************
DEBUG INFORMATION
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

model:   mod= tcm  nmotifs= 2  chi= 1
width:   minw= 8  maxw= 57  shorten= yes
lambda:  minsites= 0  maxsites= 0
theta:   prob= 1  spmap= uni  spfuzz= 0.5
em:      prior= dirichlet  b= 1  maxiter= 50  distance= 0.001
data:    n= 1890  N= 18
strands: w53
sample:  seed= 0  seqfrac= 1
LRT:     adj= root

Letter frequencies:
A 0.303 C 0.183 G 0.209 T 0.306
Non-redundant database letter frequencies:
A 0.282 C 0.222 G 0.229 T 0.267
Effective length of alphabet = 4
Entropy of dataset (bits) = -1.96

meme crp0.s -mod tcm -dna -nostatus -nmotifs 2 -gcg
********************************************************************************