BIOL 471
Multiple sequence alignment and analysis exercise
Nucleotide and amino acid sequences are commonly used to reconstruct the phylogenetic history of organisms. Sequences obtained by researchers are deposited in various online databases and are freely available for use by others. The analysis of sequence data frequently begins by obtaining an alignment, in which multiple sequences are arranged to show positional homology. Since sequences vary in length because of mutations that add or remove bases (insertions and deletions, collectively called indels), gaps are added to sequences to maintain positional homology. However, positional homology is not known with certainty; it must be inferred using an analytical approach. In this exercise, you will use an online program that aligns sequences by matching similar portions of sequences as closely as possible while introducing as few gaps as possible. You will use the alignment to create a preliminary tree representing the phylogeny of a group of organisms.
Nucleotide sequence data can also be obtained from unrecognized or undescribed organisms to provide information about their identity or their proper placement among groups of organisms whose identity is already known. Analysis typically includes alignment of sequence information obtained for the same gene region in several organisms to see how homologous sites in the sequences compare. If the sequences are not properly aligned, valid comparisons cannot be made and identifications may be incorrect. In this exercise, you will also attempt to identify an unknown organism based on preliminary phylogenetic information derived from related sequences retrieved from online databases.
The most commonly used program to align sequences is ClustalW (or ClustalX, the same program with a graphical interface), which is freely available online as source code or an executable program. Clustal first aligns the input sequences in pairs to produce a distance matrix, which is then used to build a simple unrooted neighbor-joining (NJ) guide tree. Using this tree, Clustal gradually builds the multiple alignment by following the branching order of the tree: the most closely related sequences are aligned first, then treated as a single unit (gaps are kept in place) and aligned with the next most closely related sequences. This continues until all sequences are aligned. The process is fast and can be used to align hundreds of input sequences.
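The first stage described above (pairwise alignment followed by a distance matrix) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a basic Needleman-Wunsch global alignment with a linear gap penalty and made-up scores (match = 1, mismatch = -1, gap = -2), and it measures distance as the proportion of differing sites. Clustal itself uses more elaborate scoring, so the function names and parameter values here are hypothetical and the numbers will not match Clustal's output.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two sequences padded with '-' so that they have equal length.
    The scoring values are illustrative, not Clustal's defaults.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two bases
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise distances for a dict {name: sequence}, as a nested dict."""
    names = list(seqs)
    dist = {name: {name: 0.0} for name in names}
    for p, q in combinations(names, 2):
        d = p_distance(*global_align(seqs[p], seqs[q]))
        dist[p][q] = dist[q][p] = d
    return dist
```

A guide tree would then be built from this matrix by joining the least distant sequences first. The quadratic memory use of this sketch makes it suitable only for short toy sequences, not for the full bear data set.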
Aligning existing sequences and creating a preliminary phylogeny:
Below are sequences of a gene amplified from the mitochondrial genome of representatives of the Ursidae (bear family), the relationships of which were studied in a series of papers by Talbot and Shields (1996a,b) and discussed in class. A recent paper (Lindqvist et al. 2010) reported the entire mitochondrial genome sequence of a Pleistocene polar bear, and I’ve included the orthologous portion of this sequence for comparison. You will use these to create an initial multiple sequence alignment for analysis later in class. Based on simple tests of relationships, you should be able to answer the questions below. Since these are published sequences, you can access information about them online at the NCBI (National Center for Biotechnology Information) website using the accession numbers (GenBank accession numbers in this case start with letters, such as U*****, DQ****** or EF******); the nucleotide database also provides information about the collection localities, depositors, literature, etc.
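The website is all you need for this exercise, but records can also be retrieved by accession number from a script. The sketch below builds a request for NCBI's E-utilities "efetch" service against the nucleotide database. The helper names (efetch_url, fetch) are hypothetical, and NCBI's usage guidelines (request rate limits, identifying your tool) should be checked before running anything beyond a single test request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, rettype="fasta"):
    """Build an efetch URL for one or more nucleotide accession numbers.

    rettype="fasta" requests plain sequences; rettype="gb" requests the full
    GenBank record (depositors, locality, literature, etc.).
    """
    params = {
        "db": "nuccore",              # the nucleotide database
        "id": ",".join(accessions),   # comma-separated accession numbers
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch(accessions, rettype="fasta"):
    """Download the records as text (requires an internet connection)."""
    with urlopen(efetch_url(accessions, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Two accession numbers taken from the bear sequence list in this handout
    print(efetch_url(["AY390359", "U18870"], rettype="gb"))
```

Calling fetch(["U18870"], rettype="gb") should return the same flat-file record that the GenBank web page displays for that accession.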
Once you have a preliminary alignment, you can produce a phylogeny using available online freeware. One such program is SplitsTree, available from www.splitstree.org. This is a free program, but you must register a site license. It will take an alignment formatted in Clustal (which you will have), or one of many other common formats (NEXUS, PHYLIP, FASTA) and create trees or networks using a variety of analytical approaches (background discussed by the developers in Huson and Bryant, 2006). You will not be required to use this program for this exercise, but I will briefly introduce its use in class.
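If you ever need to move an alignment between these formats yourself, the conversion is mostly bookkeeping. The following is a minimal sketch, assuming an already aligned FASTA file (all sequences the same length, with '-' for gaps), that writes a "relaxed" sequential PHYLIP file in which each name is separated from its sequence by spaces. The function names are hypothetical, and some programs insist on the strict PHYLIP layout with names padded to exactly ten characters, so check what your target program accepts.

```python
def parse_fasta(text):
    """Read FASTA text into a dict {name: sequence}.

    The name is the first word after '>' on each header line; sequence lines
    are joined together until the next header.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a generated name if the header is empty
            name = fields[0] if fields else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_phylip(aligned):
    """Write aligned sequences as relaxed sequential PHYLIP text."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length first")
    width = max(len(name) for name in aligned) + 2
    # Header line: number of sequences, then number of alignment columns
    lines = [f"{len(aligned)} {lengths.pop()}"]
    for name, seq in aligned.items():
        lines.append(name.ljust(width) + seq)
    return "\n".join(lines) + "\n"
```

The length check matters: unaligned sequences of different lengths cannot be written as PHYLIP at all, which is one practical reminder that the alignment step has to come first.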
Another online source is RAxML Blackbox, which permits a user to upload an aligned sequence file and analyze the sequences remotely at one of several RAxML webservers. For this exercise, we can use this resource (since our files are small and won’t take up much of the server’s time). The program can also be downloaded for free and used on one’s personal computer. RAxML (Randomized Axelerated Maximum Likelihood) is a tool developed by Alexandros Stamatakis to rapidly estimate phylogenies using maximum likelihood (Stamatakis et al. 2008). It can make use of either nucleotide or amino acid sequences. We will use it to analyze our aligned sequences and produce a preliminary rooted ML tree.
Identifying an unknown from sequence information:
You will also be given unpublished sequences obtained from unknown organisms, and you will be asked to find out as much as you can about these unknowns. These sequences were obtained with either fungal primers (nuITS) or universal bacterial primers (16S), so you should be retrieving either fungal or bacterial sequences.
To do these exercises, you must visit GenBank (to access deposited sequences) and the European Bioinformatics Institute (EMBL-EBI) website, which provides access to ClustalW and will enable you to produce an initial multiple alignment of your sequences.
Follow the procedure outlined below and answer the questions at the end of the exercise.
Procedure:
1. Visit GenBank and get information about the bear sequences (using either the Latin names or the accession numbers). Here is where you can find out who deposited them, what gene(s) were sequenced, if they were published and where, etc.
2. Copy and paste the sequence information below (it is already in FASTA format, which can be recognized by ClustalW) into the ClustalW submission form window. Set the input sequence format to “DNA”. Use all default settings EXCEPT “Output Format” (under “More Options”). Under “FORMAT”, this should be set to “PHYLIP” so that we can use the alignment file in RAxML Blackbox later. Push "RUN." Wait for the program to align the sequences. When it is finished, you will see a listing of your sequences, now aligned so that columns represent homologous positions. Notice that they are not all the same length, so the alignment inserts gaps where something is missing or added (indels, positions where a base or bases were inserted or deleted in one or more sequences). The process used by the program to place gaps is complicated, but the goal is to introduce as few gaps as possible so as to create the most likely representation of the homologies among the sequences.
3. Look first at the “Result Summary.” The "Scores Table" tells you the percentage similarity of each pair of sequences (which may or may not be evolutionarily significant). If you sort by alignment score, you can see which sequences are most similar to each other, or to unknowns. Click on “Guide Tree” to see a simple NJ guide tree produced from the pairwise distance measures (a cladogram or phylogram which may or may not represent actual evolutionary relationships). Clustal uses this to construct the alignment in the sequence order defined by the tree. Keep in mind that this tree is unrooted and represents only the most simplistic relationships of the very limited number of specimens examined in this exercise. A more complicated analysis requires a different program and careful consideration of the sequences included in the analysis.
4. Save the alignment file to your computer: right-click the link under “Alignments in PHYLIP format”, save the file, and give it a name you can remember later (e.g., “My bear alignment”).
5. Now produce a reduced version of your alignment by aligning only half (approximately) of each sequence below in ClustalW. Make sure you copy the front end of each sequence that includes the accession information. How does the alignment change (if at all)? How does the guide tree change?
6. What happens if you use only 10 bp sequences? Can you detect a significant loss of signal?
7. Now go to the RAxML Blackbox and upload your aligned sequences for ML analysis. Paste (or browse and choose) your named alignment file (remember, it must have been saved in PHYLIP format). Indicate which sequence you will use as an outgroup to root the tree (“panda”, chosen because previous work has established that the giant panda lies outside the clade containing the other bears), and choose “maximum likelihood”. Push ‘run’. Your analysis should not take long to run (less than a minute), but you may have to check the indicated output page several times before it finishes. When finished, you will have 100 bootstrap trees, a consensus tree and the best-scoring ML tree. You can look at these trees directly or save them to files.
8. If you want to go further with this analysis, you can access a program at www.splitstree.org and produce preliminary visualizations of trees and networks from your aligned sequences. I will do this for the class so you can see how it works, but it is not required for this exercise.
9. To find out the possible identity of the fungal unknown, use the BLAST search tool at GenBank (see below). This tool searches nucleotide databases for sequences similar to yours. These presumably represent potential close relatives of the unknown (perhaps even the same species as the unknown, if sequences have been deposited previously).
To BLAST search the sequences, visit the BLAST page of the NCBI website where GenBank is located. BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. For a better understanding of BLAST you can refer to the BLAST Course which explains the basics of the BLAST algorithm, or to the NCBI BLAST tutorial. For our purposes, it will be sufficient to do the following:
10. For one of the unknowns, add the first five sequences retrieved by your BLAST search to the unknown and produce an alignment of the sequences in Clustal. You must save the sequences you want to align in FASTA format. Produce a NJ guide tree to see how these sequences cluster together.
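Steps 5 and 6 ask you to shorten every sequence before realigning. Copying and pasting by hand works, but the same thing can be done with a short script. This is a sketch under the assumption that the sequences have already been read into a dictionary of names and sequences; the function names (truncate, to_fasta) are hypothetical. Only the leading portion of each sequence is kept, and each FASTA header stays attached to its sequence.

```python
def truncate(records, keep):
    """Keep only the leading part of each sequence.

    records: dict {name: sequence}
    keep:    function that maps the full sequence length to the number of
             leading bases to retain
    """
    return {name: seq[:keep(len(seq))] for name, seq in records.items()}


def to_fasta(records, width=70):
    """Write records as FASTA text, wrapping sequence lines at `width` bases."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy records standing in for the bear sequences
    toy = {"panda": "ATGATCAACATCCGAAAAAC", "Sun": "ATGACCAACATCCGAAAAAC"}
    half = truncate(toy, lambda n: n // 2)   # step 5: roughly half of each sequence
    ten = truncate(toy, lambda n: 10)        # step 6: first 10 bp only
    print(to_fasta(half))
    print(to_fasta(ten))
```

The output of to_fasta can be pasted into the ClustalW submission form in the same way as the original sequences.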
Questions:
1. Which mitochondrial gene is used for this alignment? By looking at the alignment, can you tell whether the sequences all cover the same portion of the gene? Are they all the same length? What can cause the sequences to be unequal in length (why must they be gapped in the alignment)?
2. A quick look at the “Scores Table” in Clustal gives you an idea of the pairwise relationships of the sequences. Based on these scores, can you tell whether the brown bear sequences are the most similar to each other? Are there situations where brown bears are more closely related to some other species than to other brown bears?
3. The brown bear sequences represent various populations distributed throughout
4. We used the giant panda sequence as an outgroup to ‘root’ the tree. What is an ‘outgroup’ and how is it chosen? What will happen if we change the outgroup?
5. What are the bootstrap values on the ML tree an indication of? What does ‘bootstrapping’ mean in phylogenetic analysis? Which groups are best supported? Which are least supported? What do you think could be done to improve the statistical support for our phylogenetic groups?
6. How does the Pleistocene polar bear sequence fit with present-day polar bear and brown bear sequences?
7. What would happen if you used unaligned sequences in the RAxML analysis? How would the support (bootstrap) values change? [In this particular case, there is little difference since the sequences are so similar to each other]
8. Based on your BLAST search, which unknown is a basidiomycete fungus? Which one is a bacterium (actually a cyanobacterium)? Which is a plant? How similar are they to sequences retrieved from GenBank (use scores)?
9. For the unknowns, different genes were sequenced. For which unknown was a mitochondrial gene (small subunit ribosomal) sequenced and used in the search? For which unknown was a bacterial gene (small subunit ribosomal) used? For which was a chloroplast gene used?
10. What information is gained about the unknowns by creating an alignment using additional sequences from GenBank?
Bear Sequence Data:
AY390359_A_melanoleuca_giant_panda
U18870_Ursus_arctos_GB01
U18886_Ursus_arctos_GB17
U18888_Ursus_arctos_GB19
U18878_Ursus_arctos_GB09
U18897_Ursus_arctos_GB28
EU567096_U_maritimus_polar_bear
AF007937_U_americanus_black_bear
U23554_Tremarctos_ornatus_speckled_bear
U23562_Melursus_ursinus_sloth_bear
U23558_S_thibetanus_Asian_black_bear
Helarctos_malayanus_sun_bear
Ursus_arctos_paleo_76824
>panda
ATGATCAACATCCGAAAAACTCATCCATTAGTTAAAATTATCAACAACTCATTCATTGACCTTCCAACAC
CATCAAACATTTCAACATGATGGAACTTTGGGTCTCTGTTAGGAGTGTGTCTGATCTTGCAAATCTTAAC
AGGCTTATTTCTAGCCATACACTATACATCAGATACAGCTACAGCCTTTTCATCAGTCGCACACATTTGT
CGAGACGTCAACTATGGTTGATTTATCCGATATATACATGCCAATGGGGCCTCTATATTTTTTATCTGCC
TATTTATACACGTAGGGCGAGGCTTATACTATGGATCATACCTATTTCCAGAGACATGGAATATCGGAAT
TATTCTCCTACTTACAGTTATAGCCACAGCATTCATAGGGTATGTACTACCTTGAGGACAAATATCCTTC
TGAGGAGCAACCGTCATTACTAACCTACTATCAGCAATTCCTTACATTGGCACTAATCTAGTGGAGTGAG
TCTGAGGGGGTTTCTCCGTAGATAAAGCAACACTAACCCGATTTTTTGCTTTTCACTTTATCCTTCCATT
TATCATCTCAGCACTAGCAATAGTCCATCTATTATTCCTTCACGAAACAGGATCTAATAACCCCTCCGGA
ATTCCATCTGACCCAGACAAAATCCCATTTTACCCCTATCATACAATTAAAGACATCCTAGGCGTCCTAT
TTCTTGTCCTCGCCTTAATAACCCTGGCTTTATTCTCACCAGACCTGTTAGGAGACCCTGATAACTATAC
CCCTGCAAATCCACTAAGTACCCCGCCACATATTAAGCCTGAATGGTACTTTCTATTTGCCTACGCTATC
CTGCGATCTATTCCTAATAAACTAGGAGGGGTGCTAGCTCTAATCTTCTCTATTCTAATTCTAACTATTA
TTCCACTATTACATACATCCAAACAACGAAGCATGATATTCCGACCTCTAAGTCAATGCTTATTCTGACT
CCTAGTAGCAGACCTACTCACACTAACATGAATTGGAGGACAGCCAGTAGAACACCCCTTCATTATTATT
GGGCAATTGGCCTCTATTCTCTACTTTACAATTCTTCTAGTACTTATACCTATCACTAGCATTATTGAGA
ATAGCCTCTCAAAATGAAGA
>BrownGB01
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC
AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC
CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC
TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC
TGAGGAGCAACTGTCATCACCAATCTACTATCGGCCGTTCCCTATATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCACCTATTATTCCTACACGAAACAGGATCCAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTCCACCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC
TTCTCACCCTAGCCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAT
CCCCGCAAATCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA
TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTAT
GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCACCGCTGGAATTATTGAAA
ACAACCTCTTAAAGTGGAGA
>BrownGB17
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC
AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC
CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC
TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC
TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC
TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC
CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC
CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA
TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC
GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA
ACAACCTCTTAAAGTGGAGA
>BrownGB10
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC
AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC
CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC
TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCCCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC
TGAGGAGCGACTGTCATCACCAACCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC
TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC
CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA
TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC
GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA
ACAACCTCTTAAAGTGGAGA
>BrownGB09
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC
AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC
CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC
TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC
TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCCTTCCATCCATACTATACAATTAAAGATATTCTAGGCGCCCTAC
TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC
CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC
CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA
TTCCCCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC
GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCTATCGCTGGAATTATTGAAA
ACAACCTCTTAAAGTGGAGA
>BrownGB28
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC
AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC
CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC
TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC
TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC
TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAC
CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA
TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTCTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC
GGACAACTGGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCATCGCTGGAATTATCGAAA
ACAACCTCTTAAAGTGGAGA
>Polarbear
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGATCTTCCAACAC
CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC
AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC
CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC
TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT
TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC
TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTATATCGGAACGGACCTGGTAGAATGAA
TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT
CATCATCCTAGCACTAGCAGCAGTCCACCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC
TTCTCACCCTAGCCCTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAT
CCCCGCAAATCCACTAAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA
TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT
TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTATC
GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTCATACCCATCGCTGGAATTATTGAAA
ACAACCTCTTAAAGTGGAGA
>Blackbear
GAAACTTCGGATCCCTCCTCGGAGTATGTTTAGTACTACAAATTCTAACGGGCCTATTTCTAGCCATACA
CTACACATCAGATACAACTACAGCCTTTTCATCAATCACCCATATTTGCCGAGATGTTCACTACGGATGA
ATTATCCGATACATACATGCTAACGGAGCTTCCATGTTCTTTATCTGCCTGTTCATGCACGTAGGACGGG
GTCTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCATTATCCTCCTATTTACAGTTAT
AGCCACCGCATTCATAGGATATGTCCTGCCCTGAGGCCAAATATCCTTCTGAGGAGCAACTGTTATCACC
AACCTCCTATCAGCCATCCCCTATATTGGAACAGACCTAGTAGAATGGATCTGAGGGGGCTTTTCTGTGA
ATAAGGCAACTCTGACACGATTCTTTGCCTTCCACTTTATTCTTCCATTCATCATCTTGACACTAGCAGC
AGTCCACCTATTATTCCTACACGAAACAGGATCTAATAACCCCTCTGGAATCCCATCTGACTCAGACAAA
ATCCCATTTCATCCATATTATACAATTAAAGACGCCCTAGGCGCCCTACTTTTCATCCTAGCCCTAGCAA
CTCTAGTCCTATTCTCGCCTGACCTACTAGGAGATCCCGATAACTACACCCCCGCAAACCCACTGAGCAC
CCCACCCCACATCAAACCT
>Speckled
ATGACCAACATCCGAAAAACTCACCCACTAGCTAAAATCATCAACAGCTCATTCATTGACCTCCCAACAC
CATCAAATATCTCAGCGTGATGAAACTTCGGGTCCCTTCTTGGGGTGTGCCTGATCCTACACATCCTAAC
GGGCCTATTCCTGGCCATACACTATACAGCAGACACGACTACAGCCTTCTCATCAGTCGCCCATATCTGT
CGAGACGTTAACTACGGATGAGTTATCCGATACATACACGCGAACGGAGCTTCAATATTCTTTATCTGCT
TGTTCATACACGTGGGACGGGGTCTGTATTACGGCTCATACCTATTCTCAGAAACATGAAACATTGGAAT
TATTCTCCTACTCACAATTATAGCCACAGCATTCATGGGGTACGTGCTGCCCTGAGGCCAAATATCCTTT
TGAGGAGCAACCGTCATCACCAATCTGCTATCAGCTATCCCCTACATTGGAACCGACCTAGTAGAATGAA
TCTGAGGTGGATTCTCAGTAGATAAAGCAACCCTTACCCGATTTTTCGCTTTTCACTTTATCCTTCCATT
CATTATTTTAGCACTAGCCATAGTCCACCTATTATTTCTTCACGAAACAGGATCCAACAATCCCTCTGGA
ATCTCATCGAACTCAGACAAAATCCCATTTCACCCTTACTATACAATTAAAGATATTCTAGGCGTCTTAC
TTCTTCTCCTAGCCCTGGTAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC
CCCTGCAAACCCAGTGAGCACCCCACTACATATCAAGCCTGAATGGTACTTCTTATTTGCCTACGCCATT
CTACGATCTATTCCCAATAAATTGGGAGGAGTACTGGCCCTAATCTTCTCCATTCTAATCCTAGCTATCA
TTCCTCTGCTGCACACATCCAAACAACGAGGAATGATATTCCGACCTTTAAGCCAATGCCTTTTCTGGCT
TCTAGCAGCAGACTTACTAACACTAACATGAATCGGAGGACAACCAGTGGAACATCCTCTTGTTATCATC
GGACAGCTAGCCTCTATCCTCTACTTCACAATCCTCCTAGTACTTATACCCATCGCCGGAATCATTGAAA
ATAACCTCTCAAAGTGAAGA
>Sloth
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTCATTGACCTCCCAGCAC
CGTCAAACATCTCAGCATGATGAAACTTCGGATCCCTCCTCGGAGTGTGCTTAATTCTACAAATTCTAAC
AGGCCTATTTCTAGCCATGCACTATACATCAGACACAACCACAGCCTTTTCATCAGTCACCCATATCTGT
CGAGACGTCCACTACGGATGAATCATCCGATATATACATGCAAACGGGGCCTCCATATTCTTTATCTGCC
TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCATACCTATTCTCGGAGACATGAAACACCGGCAT
TATTCTCCTATTTACAGTCATAGCCACCGCATTCATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC
TGAGGAGCAACTGTCATCACCAATCTGCTATCGGCCATTCCCTATATTGGAGCGGACCTAGTAGAATGAA
TCTGAGGGGGGTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCTTTCCATT
TATCATCCTAGCACTGGTAATAGTCCACCTATTGTTCCTACATGAAACAGGATCTAACAACCCCTCTGGA
ATCCCATCCAACTCAGACAAAATCCCATTTCACCCATATTATACAATTAAAGATATTATAGGCGCCTTAC
TTCTCATCCTAGCCCTGGCAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC
CCCTGCAAACCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCCAATAAACTAGGAGGGGTACTAGCACTAATTTTCTCCATCCTAATCCTAGCTATCA
TTCCCCTTCTACACACATCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT
CCTAGTAGCAGACCTACTAACACTTACATGAATCGGAGGACAACCAGTAGAATATCCCTTCATCACTATT
GGACAACTAGCCTCCATCCTCTACTTCATAATCCTCCTAGTACTCATGCCCATCGCCGGAATCATTGAAA
ATAATCTCTCAAAGTGAAGA
>Asianblack
ATGACCAACATCCGAAAAACCCATCCATTAGCCAAAATCATCAACAACTCACTCATTGATCTCCCAGCAC
CATCAAATATCTCAGCATGATGAAACTTTGGATCCCTCCTCGGAATATGCCTAATCCTACAGATTCTGAC
AGGCCTATTTCTAGCTATACACTACACATCAGACGCGACTACAGCCTTTTCATCAGTCGCCCATATTTGC
CGAGACGTCCATTACGGATGAATTATCCGATACATACATGCAAACGGAGCCTCCATGTTCTTCATCTGCC
TATTCATACACGTAGGACGGGGCTTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCAT
CATCCTCCTATTTACAGTTATAGCCACCGCATTCATAGGATATGTCCTACCCTGAGGCCAAATATCTTTC
TGAGGAGCGACTGTCATTACCAACCTCCTATCAGCCATTCCCTATATTGGAACGGACCTAGTAGAGTGAA
TCTGAGGGGGCTTTTCCGTAGATAAAGCAACCCTAACACGATTCTTTGCTTTCCACTTTATCCTTCCATT
TATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA
ATCCCATCCGACTCGGACAAAATCCCATTCCACCCATACTATACAATTAAGGACGCCCTAGGCGCCCTAC
TTCTCATTCTAGCCCTAGCAACTCTAGTTCTATTCTCGCCCGACTTACTGGGAGACCCTGACAACTATAC
CCCCGCAAACCCACTGAGCACCCCGCCCCACATCAAGCCCGAGTGATACTTTTTATTTGCTTACGCCATC
TTACGATCCATCCCCAACAAACTAGGAGGAGTACTAGCGCTAATCTTCTCTATCCTAATCCTAGCCATTA
TCCCCCTTCTACACACATCCAAACAACGAGGAATAATGTTCCGACCCCTAAGCCAATGCCTATTTTGACT
CCTAGTAGCAGACCTACTAACACTAACATGAATCGGAGGACAACCAGTAGAACATCCCTTCATCATTATC
GGACAGCTAGCCTCCATCCTCTACTTCACAATCCTCCTGGTGCTCATGCCCATCGCTGGAATCATTGAAA
ACAATCTCTCAAAGTGAAGA
>Sun
ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTTATTGACCTCCCAGCAC
CATCAAACATCTCGGCGTGATGAAACTTCGGATCCCTCCTCGGAGTATGCTTAATCCTACAGATTATGAC
AGGCCTATTTCTAGCCATACACTATACATCAGACACAACCACAGCCTTTTCATCAATCACTCATATCTGC
CGAGACGTTCACTACGGATGAATTATCCGATATATACATGCAAACGGAGCCTCCATGTTCTTTATCTGCC
TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCGTACCTATTCTCAGAAACATGAAACATCGGTAT
TATCCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC
TGAGGAGCAACTGTCATTACCAATCTCTTATCAGCCATCCCCTATATTGGAACGGACCTAGTAGAATGAG
TCTGAGGAGGCTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCCTTCCGTT
CATCATCTTGGCACTAACAGCGGTCCACCTATTATTCCTACACGAAACAGGGTCCAACAATCCCTCTGGA
ATCCCATCTGACTCAGACAAAATCCCATTTCACCCGTACTATACAATTAAGGACATCCTAGGCGCCCTAC
TTCTTACCCTAGCCCTAACAACCCTAGTTCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTACAT
CCCCGCAAATCCATTGAGCACCCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC
CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCTCTAGTCTTCTCTATCCTAATCCTAGCCATTA
TCCCCCTCTTACACACATCCAAGCAACGAGGAATGATATTCCGACCTCTGAGCCAATGCCTATTTTGACT
CCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTTACCATTATC
GGACAACTAGCCTCCATTCTCTATTTCATAATCTTCCTAGTATTCATACCCATCGCTGGAATTATTGAAA
ATAACCTCTCAAAATGAAGA
>Paleopolar
tgaccaacatccgaaaaacccacccattagctaaaatcatcaacaactcatttattgaccttcc
aacaccatcaaacatctcagcatgatgaaactttggatccctccttggagtatgtttaat
tctacagattctaacaggcctgttcctagccatacactatacaccagacacaaccacagc
tttttcatcggtcacccacatttgccgagacgttcactacgggtgagttatccgatatgt
acatgcaaatggagcctccatcttctttatctgcctatttatgcacgtaggacggggcct
gtactatggctcatacctattcccagaaacatgaaacattggcattattctcctatttac
aattatagccaccgcatttataggatacgtcctaccctggggccaaatgtccttctgagg
agcgactgtcatcaccaacctactatcggccattccctacatcggaacggacctggtaga
atgaatctgagggggcttttccgtagataaggcgaccctaacacgattctttgctttcca
ctttattctcccgttcatcatcctagcactagcagcagtccatctattgttcctacacga
aacaggatctaacaacccctctggaatcccatctgactcagacaaaatcccattccatcc
atactatacaattaaggatattctaggcgccctacttctcgccctaaccttagcaaccct
agtcctattctcgcccgacttactaggagaccctgataactatacccccgcaaatccact
gagcactccaccccacatcaaacccgaatggtactttctatttgcctacgctatcctacg
atccatccctaataaactaggaggagtactagcactaattttctccattctaatcctagc
catcattcctcttctacacacgtccaaacaacgaggaatgatattccgacccctaagcca
atgcctattttgacttctagtagcagacctactaacactaacatgaattggaggacaacc
agtagaacatcccttcattattatcggacaactagcctccattctctactttacaatcct
cttagtacttatacctatcgctggaattatcgaaaacaacctcttaaagtggagagtctt
tgtag
Unknown sequences:
>unknown_1
ACCCGCTGAAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAA
GGATTCCCCTAGTAACTGCGAGTGAAGCGGGAGAAGCTCAAATTTAGAAT
CTGGCGGTCCCCGCGGCCGTCCGAGTTGTAATCTAGAGAAGCGTCATCCG
CGCCAGCCCGCGTACAAGTCTCTTGGAATAGAGCGTCGCAGAGGGTGAGA
ATCCCGTCTCTGACGCGGACCGCTGGCGCGTTGCGATACGTTCTCGATGA
GTCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTCCATCTAA
AGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAG
ATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAA
GGGAAACGCTTGAAGTCAGTCGCGTACGCTGGGAATCAGCCTTCTCTCGA
GGCGGCGCACTTCCCAGCGAACGGGTCGGCATCAATTTCATCTGCCGGAG
AATGGCGGGGGGAATGTGGCATCTCTTCGGATGTGTTATAGCCTCCCGTC
GCATGCTGCAGATGGGATTGAGGATCTCAGCACGCCGCAAGGCCGGGGCT
CGCCCACGTACGTGCTTAGGATGCCGGCATAATGGCTTTAATCGACCCGT
CTTGAAACACGGACCAAGGAGTCTAACATGCTCGCGAGTGTTTGGGTGTC
AAACCCGAGCGCGCAACGAAAGTGAAAGTTGAGATCTCTGTCGTGGAGAG
CATCGACGCCCAGACCAGACCTTCTGCGACGGATCTGCGGTCGAGCGCGT
ATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGCGAAGCCAG
AGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGT
CGAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGT
TCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCGTATCAGATTTATGT
GGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATT
CTCAAACTTTAAATATGTAAGAACGGGGGGTCTCTTGATTGGACCTCCCG
GCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATG
CGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGA
CACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGGCCATGGAA
GTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATGAACTAG
CCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCGGC
GTTCGAGTGACGCGCCGACGAGTAGGCAGGCGTGGAGGTCCGTGAAGAAG
CCTCGGCAGCGATG
>unknown_2
CGATAAATGGCTCATTGGGATAGATATAAATGAACAATACCCCCCCTAGAAACGTATAAGAGAGGTTTTC
TCCTCATACGGCTCGCGAAAAAACGATTCGAAATTATTATGTATCGAATTAGAATGTCAAATATCAAATA
GATATATAAAATCATCAAATCAATTTCCAGAGATTTAAGTCCTTCTTTTTTTTCTTCTTTTTCGAAAAAG
AAGAAAAAAAGAGCATTCGTACTCTCATAACTCAAGTTGGATAACTTTCAAATAGCCTATAAAGAACAGC
CTTAGGCATTTATTTCATTTTTTGAGCGGTCTCTAACCCCTTTGTTGTTTGTCTCCTTTCGAATCCATTT
TTGGAGTCTCGATTCTGATCTAATTATTGAGACAATTGAAAACGGTATTTCCTTGTTCCAGGATCCTTTA
TCTTTGCCTTGAATCATTGGGTTTAGACATTACTTCGGTGATCTTTAATCGTTTTCAAAAATGGCAGCAA
CAAACCTCTTTTTGTGATTTCTTTCTATGAAAGAATCATACGAACAATTGATTCCTGCATGATACACTTT
TGATCGAAAGAGTTTTACCAATTCAAAAAGATTTTCCTTTTGCATTGAAAAATTGTTCGAATCGGATCCT
TTCGATTTCGATATCAAAAATATACTTACGAAGTTTGTTCCAACGTATTGATTGGTATTAACCCTAGACC
CTTGCCCCTGAGAAATGAATAAATACTTTCTACTCGAGCTCCATCAAGTACTATTTACATTACAACCCAA
CAAAAAACGAGGGTTGTAGTAGAACCGAACAAAGGATGTCGAGCCAAGAGCCCATTCATTCCTAGATAAT
ATAAAATAGAAAATGGTGGATGGAAAAAAAATCCACAGCTGATCATGTCCTTCAA
>unknown_3
GAAAGCCTGACGGAGCAATACCGCGTGAGGGAGGAAGGCTCTTGGGTTGTAAACCCTCTTTTCTCAGGGA
AGAATCAATGAAGGTACTTGAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG
ATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCCGCAGGTGGTTTTTCAAGTCTCCTGTCAAAGC
STCGGGCTCAACTCGAAAARGGCAGGGGAAACTGANAGACTAGAGTAAGGTAGGGGTAGAGGRAATTCCN
GGTGTAGCGGTGAAATGCGTAAAGATCAGGAAAAACACCGGTGGCGAAAGCNCTCTGCTGGACTATTACT
GACACTGAGGGACRAAAGCTAGGGGAGCGAATGGGATTAAATACCCCAGGTAGTCA
Literature:
Huson, D.H. and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254-267.
Lindqvist, C. et al. (12 additional authors). 2010. Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear. PNAS 107: 5053-5057.
Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A rapid bootstrap algorithm for the RAxML webservers. Systematic Biology 57: 758-771.
Talbot, S. L. and G. F. Shields. 1996a. A phylogeny of the bears (Ursidae) inferred from complete sequences of three mitochondrial genes. Molecular Phylogenetics and Evolution 5: 567-575.
Talbot, S. L. and G. F. Shields. 1996b. Phylogeography of brown bears (Ursus arctos) of