********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 2.2 (Release date: 1998/05/05 20:35:42)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://www.sdsc.edu/MEME.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://www.sdsc.edu/MEME.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= crp0.s (deleted by web version of MEME)
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
ce1cg                    1.0000    105  ara                      1.0000    105
bglr1                    1.0000    105  crp                      1.0000    105
cya                      1.0000    105  deop2                    1.0000    105
gale                     1.0000    105  ilv                      1.0000    105
lac                      1.0000    105  male                     1.0000    105
malk                     1.0000    105  malt                     1.0000    105
ompa                     1.0000    105  tnaa                     1.0000    105
uxu1                     1.0000    105  pbr322                   1.0000    105
trn9cat                  1.0000    105  tdc                      1.0000    105
********************************************************************************


********************************************************************************
EXPLANATION OF RESULTS
********************************************************************************
For each motif that it discovers in the training set, MEME prints the
following information:

Summary Line
    This line gives the width (`width') and expected number of occurrences
    in the training set (`sites') of the motif. MEME numbers the motifs
    consecutively from one as it finds them. MEME usually finds the most
    statistically significant motifs first. Each motif describes a pattern
    of a fixed width--no gaps are allowed in MEME motifs. MEME estimates
    the number of places the motif occurs in the training set. This need
    not be an integer value.

Simplified Motif Letter-probability Matrix
    MEME motifs are represented by letter-probability matrices that
    specify the probability of each possible letter appearing at each
    possible position in an occurrence of the motif. In order to make it
    easier to see which letters are most likely in each of the columns of
    the motif, the simplified motif shows the letter probabilities
    multiplied by 10 rounded to the nearest integer.
    Zeros are replaced by ":" (the colon) for readability.

Information Content Diagram
    The information content diagram provides an idea of which positions
    in the motif are most highly conserved. Each column (position) in a
    motif can be characterized by the amount of information it contains
    (measured in bits). Highly conserved positions in the motif have high
    information; positions where all letters are equally likely have low
    information. The diagram is printed so that each column lines up with
    the same column in the simplified motif letter-probability matrix
    above it.

    Summing the information content for each position in the motif gives
    the total information content of the motif (shown in parentheses to
    the left of the diagram). This gives a measure of the usefulness of
    the motif for database searches. For a motif to be useful for database
    searches, it must as a rule contain at least log_2(N) bits of
    information where N is the number of sequences in the database being
    searched. For example, to effectively search a database containing
    100,000 sequences for occurrences of a single motif, the motif should
    have an IC of at least 16.6 bits. Motifs with lower information
    content are still useful when a family of sequences shares more than
    one motif since they can be combined in multiple motif searches (using
    MAST).

Multilevel Consensus Sequence
    The multilevel consensus sequence corresponding to the motif is an aid
    in remembering and understanding the motif. It is calculated from the
    motif letter-probability matrix as follows. Separately for each column
    of the motif, the letters in the alphabet are sorted in decreasing
    order by the probability with which they are expected to occur in that
    position of motif occurrences. The sorted letters are then printed
    vertically with the most probable letter on top. Only letters with
    probabilities of 0.2 or higher at that position in the motif are
    printed.
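The column-by-column procedure just described can be sketched in Python. This is a minimal illustration, not MEME's own code: the function name `multilevel_consensus` and the constant `FIRST_ROWS` are made up for this example, and the seven rows are copied from the first seven positions of the letter-probability matrix printed for MOTIF 1 later in this file (letter order A, C, G, T).

```python
# First seven rows of the MOTIF 1 letter-probability matrix (order: A, C, G, T).
FIRST_ROWS = [
    [0.015057, 0.156827, 0.012126, 0.815991],
    [0.014855, 0.064680, 0.747249, 0.173216],
    [0.068989, 0.064491, 0.012330, 0.854191],
    [0.177560, 0.012667, 0.743013, 0.066760],
    [0.908567, 0.064699, 0.012121, 0.014612],
    [0.338016, 0.224332, 0.172959, 0.264693],
    [0.227945, 0.334674, 0.266551, 0.170831],
]


def multilevel_consensus(matrix, alphabet="ACGT", threshold=0.2):
    """Return the consensus levels as strings, most probable letters first.

    For each motif position the letters are sorted by decreasing
    probability and only those at or above `threshold` are kept.
    Positions with fewer kept letters are padded with spaces so that
    every level has one character per motif column.
    """
    columns = []
    for probs in matrix:
        ranked = sorted(zip(probs, alphabet), key=lambda pair: pair[0], reverse=True)
        columns.append([letter for p, letter in ranked if p >= threshold])
    depth = max(len(col) for col in columns)
    return [
        "".join(col[level] if level < len(col) else " " for col in columns)
        for level in range(depth)
    ]


levels = multilevel_consensus(FIRST_ROWS)
for line in levels:
    print(line)
```

Read against the multilevel consensus printed for MOTIF 1, the three strings produced here correspond to the first seven columns of its three consensus lines.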
    As an example, the multilevel consensus sequence of motif 2 in the
    sample output is:

        Multilevel           LITGAASGIG
        consensus             V  GS
        sequence                  G

    This multilevel consensus sequence says several things about the
    motif. First, the most likely form of the motif can be read from the
    top line as LITGAASGIG. Second, only the letter L has probability
    greater than 0.2 in position 1 of the motif, both I and V have
    probability greater than 0.2 in position 2, etc. Third, a rough
    approximation of the motif can be made by converting the multilevel
    consensus sequence into the Prosite signature
    L-[IV]-T-G-[AG]-[ASG]-S-G-I-G.

    The multilevel consensus sequence is printed so that each column lines
    up with the same column in the simplified motif and information
    content diagrams above it.

Motif in BLOCKS or FASTA format
    For use with the BLOCKS (http://www.blocks.fhcrc.org/blocks) tools,
    MEME prints the sites in the sequences which were used to construct
    the motif in BLOCKS format. The sites reported are, for the different
    model types:

        OOPS     position with highest z_i in each sequence,
        ZOOPS    position with highest z_i > 0.5 in each sequence,
        TCM      all positions with z_i > 0.5,

    where z_i is the probability that an occurrence of the motif starts at
    position i in the sequence given the sequence and the motif model.

    If you include the -print_fasta switch on the command line, MEME
    prints the motif sites in FASTA format instead of BLOCKS format.

Possible Examples of the Motif
    As a further aid in understanding the motif, MEME displays a list of
    possible occurrences of the motif in the training set. This list is
    made by converting the motif letter-probability matrix into a
    position-dependent scoring matrix (log-odds matrix) and using that to
    compute a match score between each position in the training set and
    the motif. All positions which score above a threshold score are
    listed.
    (The threshold score is chosen by MEME such that the expected number
    of non-motif positions listed in error will equal the number of actual
    motif positions not listed.) The format of the list is sequence name,
    starting position of the (putative) occurrence, match score of the
    position, and the actual sequence including the ten positions before
    and after the motif occurrence (`site').

Position-dependent Scoring Matrix
    The position-dependent scoring matrix corresponding to the motif is
    printed for use by database search programs such as MAST. This matrix
    is a log-odds matrix calculated by taking the log (base 2) of the
    ratio p/f at each position in the motif where p is the probability of
    a particular letter at that position in the motif, and f is the
    average frequency of that letter in the non-redundant database as of
    9/22/96. The scoring matrix is printed "sideways"--columns correspond
    to the letters in the alphabet (in the same order as shown in the
    simplified motif) and rows correspond to the positions of the motif,
    position one first. The scoring matrix is preceded by a line starting
    with "log-odds matrix:" and containing the length of the alphabet,
    width of the motif, number of characters in the training set and the
    scoring threshold used in the list of possible motif examples.

Motif Letter-probability Matrix
    The motif itself is a position-dependent letter-probability matrix
    giving, for each position in the pattern, the probabilities of each
    possible letter occurring there. The letter-probability matrix is
    printed "sideways"--columns correspond to the letters in the alphabet
    (in the same order as shown in the simplified motif) and rows
    correspond to the positions of the motif, position one first. The
    motif is preceded by a line starting with "letter-probability matrix:"
    and containing the length of the alphabet, width of the motif and
    number of characters in the training set.
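As a worked check of the two matrix descriptions above, the following Python sketch recomputes log-odds scores from letter probabilities using log2(p/f). Two assumptions are made. First, the background frequencies f are taken from the "Non-redundant database letter frequencies" line in the DEBUG INFORMATION section of this file. Second, the information content of a position is taken to be the relative entropy, the sum over letters of p * log2(p/f); the explanation above does not spell out that formula, so it is an assumption of this sketch. The names `BACKGROUND`, `MOTIF1`, `log_odds_row` and `column_information` are illustrative and are not part of MEME.

```python
import math

# Background frequencies (A, C, G, T) as printed under
# "Non-redundant database letter frequencies" in DEBUG INFORMATION.
BACKGROUND = [0.282, 0.222, 0.229, 0.267]

# Letter-probability matrix of MOTIF 1, one row per motif position.
MOTIF1 = [
    [0.015057, 0.156827, 0.012126, 0.815991],
    [0.014855, 0.064680, 0.747249, 0.173216],
    [0.068989, 0.064491, 0.012330, 0.854191],
    [0.177560, 0.012667, 0.743013, 0.066760],
    [0.908567, 0.064699, 0.012121, 0.014612],
    [0.338016, 0.224332, 0.172959, 0.264693],
    [0.227945, 0.334674, 0.266551, 0.170831],
    [0.122998, 0.167964, 0.380513, 0.328525],
    [0.498983, 0.011756, 0.222798, 0.266463],
    [0.075482, 0.221794, 0.525492, 0.177232],
    [0.234991, 0.261012, 0.171486, 0.332511],
    [0.015085, 0.124455, 0.170390, 0.690069],
    [0.067725, 0.866257, 0.012163, 0.053855],
    [0.751943, 0.012346, 0.169675, 0.066037],
    [0.078484, 0.785185, 0.065152, 0.071179],
]


def log_odds_row(probs, background=BACKGROUND):
    """Return log2(p/f) for each letter at one motif position."""
    return [math.log2(p / f) for p, f in zip(probs, background)]


def column_information(probs, background=BACKGROUND):
    """Relative entropy in bits of one position (assumed IC definition)."""
    return sum(p * math.log2(p / f) for p, f in zip(probs, background))


first_row = log_odds_row(MOTIF1[0])
total_bits = sum(column_information(row) for row in MOTIF1)

print([round(score, 3) for score in first_row])
print(round(total_bits, 2))
```

Under these assumptions the first row should agree with the first row of the log-odds matrix printed for motif 1 to about two decimal places (the background frequencies are only given to three digits), and the total should come out close to the 10.5 bits shown beside the motif 1 information content diagram.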
********************************************************************************


********************************************************************************
MOTIF 1     width = 15     sites = 18.0
********************************************************************************
Simplified        A  ::129321512:181
motif letter-     C  211:1232:2319:8
probability       G  :7:7:2342522:21
matrix            T  8291:3233237111

         bits    2.2
                 2.0
                 1.7
                 1.5
Information      1.3     *       *
content          1.1 * * *       *
(10.5 bits)      0.9 *****       * *
                 0.7 *****      ****
                 0.4 *****      ****
                 0.2 *****   ** ****
                 0.0 ---------------

Multilevel           TGTGAACGAGTTCAC
consensus                 TGTTCC
sequence                  CA G A

--------------------------------------------------------------------------------
    Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=15 seqs=18
ce1cg    (  64) TTTGATCGTTTTCAC  0.998638
ara      (  58) TTTGCACGGCGTCAC  0.997718
bglr1    (  79) TGTGAGCATGGTCAT  0.998894
crp      (  66) TGCAAAGGACGTCAC  0.99809
cya      (  53) TGTTAAATTGATCAC  0.998939
deop2    (  10) TTTGAACCAGATCGC  0.966396
gale     (  27) TGTAAACGATTCCAC  0.980127
ilv      (  42) CGTGATCAACCCCTC  0.984289
lac      (  12) TGTGAGTTAGCTCAC  0.999869
male     (  17) TGTAACAGAGATCAC  0.999982
malk     (  64) CGTGATGTTGCTTGC  0.745014
malt     (  44) TGTGACACAGTGCAA  0.999343
ompa     (  51) CCTGACGGAGTTCAC  0.999679
tnaa     (  74) TGTGATTCGATTCAC  0.978118
uxu1     (  20) TGTGATGTGGTTAAC  0.998883
pbr322   (  56) TGTGAAATACCGCAC  0.999975
trn9cat  (  87) TGAGACGTTGATCGG  0.997936
tdc      (  81) TGTGAGTGGTCGCAC  0.993283
//

----------------------------------------------------------------------------
    Possible examples of motif 1 in the training set
----------------------------------------------------------------------------
Sequence name            Start  Score             Site
-------------            -----  -----             ---------------
ce1cg                       64  13.65  AGACTGTTTT TTTGATCGTTTTCAC AAAAATGGAA
ara                         58  10.28  ACATTGATTA TTTGCACGGCGTCAC ACTTTGCTAT
bglr1                       79  10.99  AGTTAATAAC TGTGAGCATGGTCAT ATTTTTATCA
crp                         66  10.75  ACTGCATGTA TGCAAAGGACGTCAC ATTACCGTGC
cya                         53  12.44  ATCAGCAAGG TGTTAAATTGATCAC GTTTTAGACC
deop2                       10  12.98   AGTGAATTA TTTGAACCAGATCGC ATTACAGTGA
deop2                       63   9.05  TTTCCTTAAT TGTGATGTGTATCGA AGTGTGTTGC
gale                        27  12.52  TAAATTCTTG TGTAAACGATTCCAC TAATTTATTC
ilv                         42   7.64  CAGTACAAAA CGTGATCAACCCCTC AATTTTCCCT
lac                         12  16.46  ACGCAATTAA TGTGAGTTAGCTCAC TCATTAGGCA
male                        17  14.79  CCGCCAATTC TGTAACAGAGATCAC ACAAAGCGAC
malk                        32   7.05  ACACGGCTTC TGTGAACTAAACCGA GGTCATGTAA
malk                        64   8.65  TAAGGAATTT CGTGATGTTGCTTGC AAAAATCGTG
malt                        44  11.14  GATTTGGAAT TGTGACACAGTGCAA ATTCAGACAC
ompa                        51  12.66  TTTTCATATG CCTGACGGAGTTCAC ACTTGTAAGT
tnaa                        74  12.26  CCCGAACGAT TGTGATTCGATTCAC ATTTAAACAA
tnaa                        90   6.82  TCGATTCACA TTTAAACAATTTCAG A
uxu1                        20  12.91  GTGAAATTGT TGTGATGTGGTTAAC CCAATTAGAA
pbr322                      56  14.47  CCATATGCGG TGTGAAATACCGCAC AGATGCGTAA
trn9cat                     87   7.23  TTGGCGAAAA TGAGACGTTGATCGG CACG
tdc                         81  12.44  AAAGTTAATT TGTGAGTGGTCGCAC ATATCCTGTT
----------------------------------------------------------------------------

log-odds matrix: alength= 4 w= 15 n= 1638 bayes= 6.49185
 -4.226  -0.502  -4.238   1.610
 -4.245  -1.779   1.707  -0.626
 -2.030  -1.784  -4.214   1.676
 -0.666  -4.132   1.699  -2.002
  1.689  -1.779  -4.239  -4.193
  0.263   0.015  -0.404  -0.014
 -0.306   0.592   0.220  -0.646
 -1.196  -0.403   0.733   0.297
  0.824  -4.239  -0.039  -0.005
 -1.900  -0.001   1.199  -0.593
 -0.262   0.233  -0.416   0.315
 -4.223  -0.835  -0.426   1.368
 -2.057   1.964  -4.234  -2.311
  1.416  -4.169  -0.432  -2.017
 -1.844   1.822  -1.813  -1.909

letter-probability matrix: alength= 4 w= 15 n= 1638
 0.015057  0.156827  0.012126  0.815991
 0.014855  0.064680  0.747249  0.173216
 0.068989  0.064491  0.012330  0.854191
 0.177560  0.012667  0.743013  0.066760
 0.908567  0.064699  0.012121  0.014612
 0.338016  0.224332  0.172959  0.264693
 0.227945  0.334674  0.266551  0.170831
 0.122998  0.167964  0.380513  0.328525
 0.498983  0.011756  0.222798  0.266463
 0.075482  0.221794  0.525492  0.177232
 0.234991  0.261012  0.171486  0.332511
 0.015085  0.124455  0.170390  0.690069
 0.067725  0.866257  0.012163  0.053855
 0.751943  0.012346  0.169675  0.066037
 0.078484  0.785185  0.065152  0.071179

Time 2.81 secs.
********************************************************************************
MOTIF 2     width = 29     sites = 16.7
********************************************************************************
Simplified        A  112254:2261:131552145:6613456
motif letter-     C  13:::11211121:32:222::1161111
probability       G  325121832215361:122113:225211
matrix            T  65463514417241434452463111242

         bits    2.2
                 2.0
                 1.7
                 1.5
Information      1.3
content          1.1       *
(11.0 bits)      0.9       *
                 0.7       *   *          *
                 0.4 *  *  *   ** *       ** *
                 0.2 *******  *** * ** * ****** **
                 0.0 -----------------------------

Multilevel           TTGTATGTTATGTGTAATTAATAACGAAA
consensus            GCTATA GA  TGACTTAGTTGTGGAGTT
sequence                     G  C   C   C

--------------------------------------------------------------------------------
    Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=29 seqs=18
ce1cg    (  17) TTTTGTGGCATCGGGCGAGAATAGCGCGT  0.998757
ara      (  10) GCGTAACAAAAGTGTCTATAATCACGGCA  0.985606
bglr1    (  19) ATTATTGGGATTTGTTATATATAACTTTA  0.997234
crp      (   7) GCGAAAGCTATGCTAAAACAGTCAGGATG  0.983896
cya      (  13) TTGTATGTAGCGCATCTTTCTTTACGGTC  0.998765
deop2    (  58) TTAATTGTGATGTGTATCGAAGTGTGTTG  0.944264
gale     (  74) TGTTATGCTATGGTTATTTCATACCATAA  0.99921
ilv      (  13) TTTTTTGTTATCTGCAATTCAGTACAAAA  0.993791
lac      (  66) TCGTATGTTGTGTGGAATTGTGAGCGGAT  0.990948
male     (  43) GTGGGGCGTAGGGGCAAGGAGGATGGAAA  0.99294
malk     (  14) GGATGAGAACACGGCTTCTGTGAACTAAA  0.817941
malt     (   9) TCGTTTTAGGTGAGTTGTTAATAAAGATT  0.85896
ompa     (  68) TTGTAAGTTTTCAACTACGTTGTAGACTT  0.998448
tnaa     (  19) TCTTACGTAATTTATAATCTTTAAAAAAA  0.999254
uxu1     (  42) GAATTCGGGATTGACATGTCTTACCAAAA  0.971604
pbr322   (  78) TAAGGAGAAAATACCGCATCAGGCGCTCA  0.143855
trn9cat  (  56) CGGGAAGCCCTGGGCCAACTTTTGGCGAA  0.996117
tdc      (  27) TTTAAAGGTATTTAATTGTAATAACGATA  0.999313
//

------------------------------------------------------------------------------------------
    Possible examples of motif 2 in the training set
------------------------------------------------------------------------------------------
Sequence name            Start  Score             Site
-------------            -----  -----             -----------------------------
ce1cg                       17  11.19  TTGTGCTGGT TTTTGTGGCATCGGGCGAGAATAGCGCGT GGTGTGAAAG
ara                         10  11.93   GACAAAAAC GCGTAACAAAAGTGTCTATAATCACGGCA GAAAAGTCCA
bglr1                       19  11.61  CAATAACTTA ATTATTGGGATTTGTTATATATAACTTTA TAAATTCCTA
crp                          7   9.33      CACAAA GCGAAAGCTATGCTAAAACAGTCAGGATG CTACAGTAAT
cya                         13  13.34  GGTGCTACAC TTGTATGTAGCGCATCTTTCTTTACGGTC AATCAGCAAG
deop2                       58  11.75  GTAGATTTCC TTAATTGTGATGTGTATCGAAGTGTGTTG CGGAGTAGAT
gale                        15   6.60  ATAAAAAACG GCTAAATTCTTGTGTAAACGATTCCACTA ATTTATTCCA
gale                        74  17.69  TTCGCATCTT TGTTATGCTATGGTTATTTCATACCATAA GCC
ilv                         13  21.81  TCCGGCGGGG TTTTTTGTTATCTGCAATTCAGTACAAAA CGTGATCAAC
lac                         66  18.78  TGCTTCCGGC TCGTATGTTGTGTGGAATTGTGAGCGGAT AACAATTTCA
lac                         73  12.65  GGCTCGTATG TTGTGTGGAATTGTGAGCGGATAACAATT TCAC
male                        43   6.76  CAAAGCGACG GTGGGGCGTAGGGGCAAGGAGGATGGAAA GAGGTTGCCG
malk                        14   7.64  GGAGGCGGGA GGATGAGAACACGGCTTCTGTGAACTAAA CCGAGGTCAT
malt                         9  10.67    GATCAGCG TCGTTTTAGGTGAGTTGTTAATAAAGATT TGGAATTGTG
malt                        16   7.96  GCGTCGTTTT AGGTGAGTTGTTAATAAAGATTTGGAATT GTGACACAGT
ompa                        68   9.95  GAGTTCACAC TTGTAAGTTTTCAACTACGTTGTAGACTT TACATCGCC
tnaa                        19  15.59  ACATTAAAAT TCTTACGTAATTTATAATCTTTAAAAAAA GCATTTAATA
uxu1                        42  11.79  AACCCAATTA GAATTCGGGATTGACATGTCTTACCAAAA GGTAGAACTT
trn9cat                     56   6.43  CTGTTGATAC CGGGAAGCCCTGGGCCAACTTTTGGCGAA AATGAGACGT
tdc                         27  18.52  CTTGTTGATA TTTAAAGGTATTTAATTGTAATAACGATA CTCTGGAAAG
------------------------------------------------------------------------------------------

log-odds matrix: alength= 4 w= 29 n= 1386 bayes= 6.35455
 -1.961  -1.691   0.341   1.090
 -1.985   0.400  -0.403   0.791
 -0.723  -4.145   0.990   0.438
 -0.244  -4.146  -0.854   1.220
  0.913  -4.140  -0.410   0.090
  0.315  -0.838  -1.716   0.769
 -4.146  -0.738   1.782  -2.069
 -0.754  -0.297   0.360   0.423
 -0.230  -0.826  -0.014   0.609
  1.162  -0.951  -0.392  -1.734
 -1.243  -1.684  -1.568   1.459
 -4.129   0.087   1.149  -0.153
 -1.217  -0.827   0.605   0.601
  0.074  -4.135   1.298  -1.063
 -1.127   0.641  -0.858   0.576
  0.737   0.097  -4.115   0.065
  0.723  -4.127  -0.938   0.592
 -0.233  -0.321  -0.338   0.587
 -1.799  -0.294   0.049   0.907
  0.497   0.092  -0.877  -0.149
  0.721  -4.145  -0.867   0.573
 -4.139  -4.146   0.596   1.226
  0.990  -0.750  -4.117   0.142
  1.014  -0.844   0.043  -1.904
 -1.216   1.345   0.104  -1.962
  0.119  -1.690   1.149  -1.185
  0.665  -0.820   0.055  -0.496
  0.690  -1.695  -1.705   0.605
  1.010  -1.685  -0.812  -0.197

letter-probability matrix: alength= 4 w= 29 n= 1386
 0.072364  0.068753  0.289863  0.569020
 0.071199  0.293039  0.173140  0.462622
 0.170763  0.012552  0.454492  0.362192
 0.237939  0.012538  0.126666  0.622857
 0.530584  0.012590  0.172315  0.284511
 0.350550  0.124186  0.069646  0.455619
 0.015919  0.133074  0.787297  0.063710
 0.167126  0.180762  0.293688  0.358424
 0.240171  0.125245  0.226730  0.407853
 0.630315  0.114880  0.174430  0.080375
 0.119021  0.069111  0.077172  0.734696
 0.016099  0.235741  0.507732  0.240429
 0.121216  0.125146  0.348182  0.405457
 0.296636  0.012635  0.562735  0.127995
 0.129003  0.346246  0.126308  0.398443
 0.469647  0.237470  0.013207  0.279676
 0.464945  0.012704  0.119423  0.402929
 0.239695  0.177671  0.181024  0.401610
 0.080947  0.181093  0.236734  0.501225
 0.397634  0.236657  0.124605  0.241104
 0.464357  0.012545  0.125519  0.397579
 0.015998  0.012539  0.346039  0.625423
 0.559766  0.132023  0.013191  0.295021
 0.569029  0.123728  0.235804  0.071439
 0.121271  0.564184  0.245935  0.068610
 0.305929  0.068787  0.507724  0.117560
 0.446815  0.125804  0.237815  0.189566
 0.454705  0.068593  0.070206  0.406496
 0.567408  0.069030  0.130328  0.233234

Time 5.77 secs.

Stopped because nmotifs = 2 reached.

CPU: ghidorah

********************************************************************************
DEBUG INFORMATION
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.
model:   mod= oops    nmotifs= 2    chi= 1
width:   minw= 8    maxw= 57    shorten= yes
lambda:  minsites= 18    maxsites= 18
theta:   prob= 1    spmap= uni    spfuzz= 0.5
em:      prior= dirichlet    b= 1    maxiter= 50    distance= 0.001
data:    n= 1890    N= 18
strands: w53
sample:  seed= 0    seqfrac= 1
LRT:     adj= root
Letter frequencies:
A 0.303 C 0.183 G 0.209 T 0.306
Non-redundant database letter frequencies:
A 0.282 C 0.222 G 0.229 T 0.267
Effective length of alphabet = 4
Entropy of dataset (bits) = -1.96

meme crp0.s -mod oops -dna -nostatus -nmotifs 2 -gcg
********************************************************************************