BIOL 471

Multiple sequence alignment and analysis exercise

Nucleotide and amino acid sequences are commonly used to reconstruct the phylogenetic history of organisms. Sequences obtained by researchers are deposited in various online databases and are freely available for use by others. The analysis of sequence data frequently begins by obtaining an alignment, in which multiple sequences are arranged to show positional homology. Sequences differ because of mutations (point mutations, insertions or deletions known as indels, translocations, etc.), and because indels change sequence length, gaps are added to sequences to maintain positional homology. However, positional homology is not known with certainty; it must be inferred using an analytical approach. In this exercise, you will use an online program that aligns sequences by matching similar portions as closely as possible while introducing as few gaps as possible. You will use the alignment to create a preliminary tree representing the phylogeny of a group of organisms.
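
To make the idea of gapping concrete, here is a minimal sketch of a global pairwise alignment (the classic Needleman-Wunsch dynamic programming method). It is not the procedure used by the online program in this exercise, and the scoring values (match = 1, mismatch = -1, gap = -2) are arbitrary assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Two short made-up sequences that differ by a single deleted base.
top, bottom, total = needleman_wunsch("ATGACCAAC", "ATGCCAAC")
print(top)
print(bottom)
print(total)
```

Raising the gap penalty makes the method more reluctant to open gaps, which is the trade-off between matching similar portions and keeping gaps few.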

Nucleotide sequence data can also be obtained from unrecognized or undescribed organisms to provide information about their identity or their proper placement among groups of organisms whose identity is already known. Analysis typically includes alignment of sequence information obtained for the same gene region in several organisms to see how homologous sites in the sequences compare. If the sequences are not properly aligned, valid comparisons cannot be made and identifications will be incorrect. In this exercise, you will also attempt to identify an unknown organism based on preliminary phylogenetic information obtained using related sequences obtained from online databases.

The most commonly used program to align sequences is ClustalW (or ClustalX, the same program with a graphical interface), which is freely available online as source code or as an executable program. Clustal first aligns the input sequences in pairs to produce a distance matrix, which is then used to build a simple unrooted neighbor-joining (NJ) guide tree. Clustal then builds the multiple alignment gradually by following the branching order of this tree: the most closely related sequences are aligned first, then treated as a single unit (gaps are kept in place) and aligned with the next most closely related sequences. This continues until all sequences are aligned. The process is fast and can be used to align hundreds of input sequences.
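
The first two stages described above (pairwise distances, then a choice of which sequences to join first) can be sketched as follows. This is a simplified illustration, not Clustal's own code: it assumes the sequences are already the same length, uses the plain proportion of differing sites as the distance, and only picks the first pair that the standard neighbor-joining criterion would join. The function names and the toy sequences are made up for this example.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same (aligned) length")
    compared = differing = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(seqs):
    """Nested dict d[a][b] of pairwise distances for a {name: sequence} dict."""
    names = list(seqs)
    d = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = p_distance(seqs[a], seqs[b])
    return d


def nj_first_pair(d):
    """Pair minimizing the NJ criterion Q(i,j) = (n-2)*d(i,j) - sum_k d(i,k) - sum_k d(j,k)."""
    names = list(d)
    n = len(names)
    totals = {a: sum(d[a].values()) for a in names}
    return min(
        combinations(names, 2),
        key=lambda p: (n - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
    )


# Made-up toy sequences, not taken from the bear data.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGTACGGCC",
    "D": "TTGTACGGAA",
}
d = distance_matrix(toy)
print(nj_first_pair(d))
```

In a progressive alignment, the pair chosen here would be aligned first and then carried forward as a unit.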

Aligning existing sequences and creating a preliminary phylogeny:

Below are sequences of a gene amplified from the mitochondrial genome of representatives of the Ursidae (bear family), the relationships of which were studied in a series of papers by Talbot and Shields (1996a,b), and discussed in class. A recent paper (Lindqvist et al. 2010) reported the entire mitochondrial genome sequence of a Pleistocene polar bear, and I’ve included the orthologous portion of this sequence for comparison. You will use these to create an initial multiple sequence alignment for analysis later in class. Based on simple tests of relationships you should be able to answer the questions below. Since these are published sequences, you can access information about them online at the NCBI (National Center for Biotechnology Information) website using the accession numbers (GenBank accession numbers in this case start with letters, such as U*****, DQ****** or EF******); the nucleotide database also provides information about the collection localities, depositors, literature, etc.
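
If you keep your sequence labels in a file, a short script can pull out the accession number to search for. The sketch below assumes labels written as accession, underscore, description (as in the list under "Bear Sequence Data"), and assumes the two classic GenBank nucleotide accession patterns: one letter plus five digits, or two letters plus six digits. The function name is made up for this example.

```python
import re

# One letter + five digits, or two letters + six digits.
ACCESSION = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def split_label(label):
    """Split a label into (accession or None, description with spaces)."""
    head, _, rest = label.partition("_")
    if ACCESSION.match(head):
        return head, rest.replace("_", " ")
    # No recognizable accession at the front of the label.
    return None, label.replace("_", " ")


for label in ["U18870_Ursus_arctos_GB01", "Helarctos_malayanus_sun_bear"]:
    print(split_label(label))
```

Labels without an accession come back with None, which tells you that the record must be looked up by name instead.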

Once you have a preliminary alignment, you can produce a phylogeny using available online freeware. One such program is SplitsTree, available from www.splitstree.org. This is a free program, but you must register a site license. It will take an alignment formatted in Clustal (which you will have), or one of many other common formats (NEXUS, PHYLIP, FASTA) and create trees or networks using a variety of analytical approaches (background discussed by the developers in Huson and Bryant, 2006). You will not be required to use this program for this exercise, but I will briefly introduce its use in class.
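
Alignment formats differ mostly in layout, not in content. As an illustration, the sketch below converts an aligned FASTA text into a simple sequential PHYLIP layout (a header line with the number of sequences and sites, then one name and sequence per line). It assumes the "relaxed" variant, in which names are separated from the sequence by whitespace rather than padded to exactly ten characters; check what your tree program expects. The function names and the two tiny records are made up.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA text; blank lines are ignored."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.replace(" ", ""))
    return {key: "".join(parts) for key, parts in records.items()}


def to_phylip(records):
    """Sequential, relaxed PHYLIP text for a {name: aligned sequence} dict."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs aligned sequences of equal length")
    width = max(len(name) for name in records) + 2
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name.ljust(width)}{seq}" for name, seq in records.items()]
    return "\n".join(lines) + "\n"


aligned = ">seqA\nAC-T\n\n>seqB\nACGT\n"
print(to_phylip(parse_fasta(aligned)))
```

The length check is the important part: unaligned sequences of unequal length cannot be written as PHYLIP at all.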

Another online source is RAxML Blackbox, which permits a user to upload an aligned sequence file and analyze the sequences remotely at one of several RAxML webservers. For this exercise, we can use this resource (since our files are small and won’t take up much of the server’s time). The program can also be downloaded for free and used on one’s personal computer. RAxML (Randomized Axelerated Maximum Likelihood) is a tool developed by Alexandros Stamatakis to rapidly estimate phylogenies using maximum likelihood (Stamatakis et al. 2008). It can make use of either nucleotide or amino acid sequences. We will use it to analyze our aligned sequences and produce a preliminary rooted ML tree.
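
The support values reported with an ML tree come from bootstrapping. In the standard nonparametric bootstrap, the columns of the alignment are resampled with replacement to make pseudo-replicate alignments, a tree is estimated from each, and the percentage of replicates that recover a group is its support. The sketch below shows only the resampling and the percentage. It is an illustration under those assumptions, not RAxML's rapid bootstrap procedure, and the function names and toy alignment are made up.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; rows stay in register."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # The same list of sampled column indices is applied to every sequence.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def support(recovered):
    """Percentage of replicates (a list of True/False) in which a group was recovered."""
    return 100.0 * sum(recovered) / len(recovered)


toy = {"s1": "ACGTAC", "s2": "AC-TAC", "s3": "ACGTTC"}
rng = random.Random(471)  # fixed seed so the replicate is reproducible
replicate = bootstrap_alignment(toy, rng)
for name, seq in replicate.items():
    print(name, seq)
```

Because whole columns are sampled, a replicate has the same number of sites as the original alignment, but some sites appear more than once and others not at all.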

Identifying an unknown from sequence information:

You will also be given unpublished sequences obtained from unknown organisms, and you will be asked to find out as much as you can about these unknowns. These sequences were obtained with either fungal primers (nuITS) or universal bacterial primers (16S), so you should be retrieving either fungal or bacterial sequences.

To do these exercises, you must visit GenBank (to access deposited sequences) and the European Bioinformatics Institute (EMBL-EBI) website, which provides access to ClustalW and will enable you to get an initial multiple alignment of your sequences.

Follow the procedure outlined below and answer the questions at the end of the exercise.

Procedure:

1. Visit GenBank and get information about the bear sequences (using either the Latin names or the accession numbers). Here is where you can find out who deposited them, what gene(s) were sequenced, if they were published and where, etc.

2. Copy and paste the sequence information below (it is already in FASTA format, which ClustalW recognizes) into the ClustalW submission form window. Set the input sequence format to “DNA”. Use all default settings EXCEPT “Output Format” (under “More Options”). Under “FORMAT”, this should be set to “PHYLIP” so that we can use the alignment file in RAxML Blackbox later. Push "RUN."  Wait for the program to align the sequences. When it is finished, you will see a listing of your sequences, now aligned so that columns represent homologous positions. Notice that the sequences are not all the same length, so the alignment inserts gaps where something is missing or added (indels, positions where a base or bases were inserted or deleted in one or more sequences). The process used by the program to place gaps is complicated, but we would like it to use as few gaps as possible so as to create the most likely representation of the homologies among the sequences.

3. Look first at the “Result Summary.” The "Scores Table" tells you the percentage similarity of each pair of sequences (which may or may not be evolutionarily significant). If you sort by alignment score, you can see which sequences are most similar to each other, or to unknowns. Click on “Guide Tree” to see a simple NJ guide tree produced from the pairwise distance measures (a cladogram or phylogram which may or may not represent actual evolutionary relationships). Clustal uses this to construct the alignment in the sequence order defined by the tree. Keep in mind that this tree is unrooted and represents only the most simplistic relationships of the very limited number of specimens examined in this exercise. A more complicated analysis requires a different program and careful consideration of the sequences included in the analysis.

4. Save the alignment file to your computer under a name you can remember later (e.g., “My bear alignment”): right-click the link under “Alignments in PHYLIP format” and save it to your computer.

5. Now produce a reduced version of your alignment by aligning only half (approximately) of each sequence below in ClustalW. Make sure you copy the front end of each sequence that includes the accession information. How does the alignment change (if at all)? How does the guide tree change?

6. What happens if you use only 10 bp sequences? Can you detect a significant loss of signal?

7. Now go to the RAxML Blackbox and upload your aligned sequences for ML analysis. Paste (or browse and choose) your named alignment file (remember, it must have been saved in PHYLIP format). Indicate which sequence you will use as an outgroup to root the tree (“panda”, chosen because previous work has established that the giant panda lineage diverged before all of the other living bears), and choose “maximum likelihood”. Push ‘run’. Your analysis should not take long to run (less than a minute). You may have to check the indicated output page several times before it finishes. When finished, you will have 100 bootstrap trees, a consensus tree and the best-scoring ML tree. You can look at these trees directly or save them to files.

8. If you want to go further with this analysis, you can access a program at www.splitstree.org and produce preliminary visualizations of trees and networks from your aligned sequences. I will do this for the class so you can see how it works, but it is not required for this exercise.

9. To find out the possible identity of the fungal unknown, use the BLAST search tool at GenBank (see below). This tool searches nucleotide databases for sequences similar to yours. These presumably represent potential close relatives of the unknown (perhaps even the same species as the unknown, if sequences have been deposited previously).

To BLAST search the sequences, visit the BLAST page of the NCBI website where GenBank is located. BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. For a better understanding of BLAST you can refer to the BLAST Course which explains the basics of the BLAST algorithm, or to the NCBI BLAST tutorial. For our purposes, it will be sufficient to do the following:

  • In the Nucleotide box, click on "Nucleotide-nucleotide BLAST (blastn)." 
  • You now must load the sequence of your DNA sample, and the program will search for similar sequences. 
  • Copy ONLY ONE SEQUENCE AT A TIME from your list of unknowns (either in FASTA format or data only), and paste it into the window labeled "search". 
  • Choose the database to search ("others - nr"); the default database is the "human genomic plus transcript database."
  • Then click "BLAST"
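
To get a feel for what a similarity search does, the toy sketch below ranks database sequences by how many short "words" (runs of k letters) they share with the query. This is only the word-matching idea, not BLAST itself: a real search also extends the matches, scores the resulting local alignments, and reports statistics. The word length, function names, and sequences here are assumptions made up for illustration.

```python
def kmers(seq, k):
    """Set of all length-k words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_hits(query, database, k=11):
    """Rank {name: sequence} entries by the number of k-letter words shared with the query."""
    query_words = kmers(query, k)
    scores = {name: len(query_words & kmers(seq, k)) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sequences; a short word length is used because the toy sequences are short.
database = {"x": "TTACGTACGGAA", "y": "GGGGCCCCAAAA"}
for name, shared in rank_hits("ACGTACGGTT", database, k=4):
    print(name, shared)
```

Entries that share many words with the query rise to the top, which is why the first hits in a real search are the best candidates for close relatives.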

 

10. For one of the unknowns, add the first five sequences returned by the BLAST search to the unknown and produce an alignment of the sequences in Clustal. You must save the sequences you want to align in FASTA format. Produce an NJ guide tree to see how these sequences cluster together.
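
Steps 5 and 6 ask for shortened versions of every sequence. Cutting these by hand is error-prone, so here is a minimal sketch that keeps the header line of each FASTA record and only the first part of its sequence. The function names are made up, and the two short records are placeholders rather than the bear data.

```python
def read_fasta(text):
    """Return a list of [name, sequence] pairs; blank lines are ignored."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line.replace(" ", "")
    return records


def truncate(records, fraction=0.5, n_bases=None):
    """Keep the first n_bases of each sequence, or the first fraction if n_bases is None."""
    shortened = []
    for name, seq in records:
        keep = n_bases if n_bases is not None else int(len(seq) * fraction)
        shortened.append((name, seq[:keep]))
    return shortened


def write_fasta(records, width=70):
    """FASTA text with sequences wrapped at `width` characters per line."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


example = ">first\nATGATCAACA\nTCCGAAAAAC\n\n>second\nATGACC\n"
records = read_fasta(example)
print(write_fasta(truncate(records)))              # first half of each sequence
print(write_fasta(truncate(records, n_bases=10)))  # first 10 bp of each sequence
```

The output can be pasted into the alignment form in place of the full-length sequences.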

Questions:

1. Which mitochondrial gene is used for this alignment? By looking at the alignment, can you tell whether the sequences all cover the same portion of the gene? Are they all the same length? What can cause the sequences to be unequal in length (why must they be gapped in the alignment)?

2. A quick look at the “Scores Table” in Clustal gives you an idea of the pairwise relationships of the sequences. Based on these scores, can you tell whether the brown bear sequences are the most similar to each other? Are there situations where brown bears are more closely related to some other species than to other brown bears?

3. The brown bear sequences represent various populations distributed throughout Alaska. Based on the ML tree, these populations of brown bears are paraphyletic with regard to polar bears. Why is this? Of the various brown bear sequences, which one appears to be most closely related to the sequence representing modern polar bears? If you added sequences representing more distantly related populations in Eurasia, do you think this would make brown bears monophyletic?

4. We used the giant panda sequence as an outgroup to ‘root’ the tree. What is an ‘outgroup’ and how is it chosen? What will happen if we change the outgroup?

5. What are the bootstrap values on the ML tree an indication of? What does ‘bootstrapping’ mean in phylogenetic analysis? Which groups are best supported? Which are least supported? What do you think could be done to improve the statistical support for our phylogenetic groups?

6. How does the Pleistocene polar bear sequence fit with present-day polar bear and brown bear sequences?

7. What would happen if you used unaligned sequences in the RAxML analysis? How would the support (bootstrap) values change? [In this particular case, there is little difference since the sequences are so similar to each other]

8. Based on your BLAST search, which unknown is a basidiomycete fungus? Which one is a bacterium (actually a cyanobacterium)? Which is a plant? How similar are they to sequences retrieved from GenBank (use scores)?

9. For the unknowns, different genes were sequenced. For which unknown was a mitochondrial gene (small subunit ribosomal) sequenced and used in the search? For which unknown was a bacterial gene (small subunit ribosomal) used? For which was a chloroplast gene used?

10. What information is gained about the unknowns by creating an alignment using additional sequences from GenBank?
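
Question 3 turns on the difference between monophyletic and paraphyletic groups. On a rooted tree, a group is monophyletic if some node contains exactly the members of that group and nothing else. The sketch below checks this for a tree written as nested tuples. The tree, the tip labels, and the function names are hypothetical and are not the result you should expect from your own analysis.

```python
def tips(tree):
    """All tip labels under a node; a node is a label (str) or a tuple of child nodes."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree contains exactly the tips in `group`."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: the outgroup is sister to everything else.
toy_tree = ("outgroup", ("brown_1", ("brown_2", "polar")))

print(is_monophyletic(toy_tree, {"brown_1", "brown_2"}))
print(is_monophyletic(toy_tree, {"brown_1", "brown_2", "polar"}))
```

In this made-up tree the two "brown" tips alone do not form a clade, because the smallest node containing both of them also contains "polar".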

 

Bear Sequence Data:

AY390359_A_melanoleuca_giant_panda

U18870_Ursus_arctos_GB01

U18886_Ursus_arctos_GB17

U18888_Ursus_arctos_GB19

U18878_Ursus_arctos_GB09

U18897_Ursus_arctos_GB28

EU567096_U_maritimus_polar_bear

AF007937_U_americanus_black_bear

U23554_Tremarctos_ornatus_speckled_bear

U23562_Melursus_ursinus_sloth_bear

U23558_S_thibetanus_Asian_black_bear

Helarctos_malayanus_sun_bear

Ursus_arctos_paleo_76824

>panda

ATGATCAACATCCGAAAAACTCATCCATTAGTTAAAATTATCAACAACTCATTCATTGACCTTCCAACAC

CATCAAACATTTCAACATGATGGAACTTTGGGTCTCTGTTAGGAGTGTGTCTGATCTTGCAAATCTTAAC

AGGCTTATTTCTAGCCATACACTATACATCAGATACAGCTACAGCCTTTTCATCAGTCGCACACATTTGT

CGAGACGTCAACTATGGTTGATTTATCCGATATATACATGCCAATGGGGCCTCTATATTTTTTATCTGCC

TATTTATACACGTAGGGCGAGGCTTATACTATGGATCATACCTATTTCCAGAGACATGGAATATCGGAAT

TATTCTCCTACTTACAGTTATAGCCACAGCATTCATAGGGTATGTACTACCTTGAGGACAAATATCCTTC

TGAGGAGCAACCGTCATTACTAACCTACTATCAGCAATTCCTTACATTGGCACTAATCTAGTGGAGTGAG

TCTGAGGGGGTTTCTCCGTAGATAAAGCAACACTAACCCGATTTTTTGCTTTTCACTTTATCCTTCCATT

TATCATCTCAGCACTAGCAATAGTCCATCTATTATTCCTTCACGAAACAGGATCTAATAACCCCTCCGGA

ATTCCATCTGACCCAGACAAAATCCCATTTTACCCCTATCATACAATTAAAGACATCCTAGGCGTCCTAT

TTCTTGTCCTCGCCTTAATAACCCTGGCTTTATTCTCACCAGACCTGTTAGGAGACCCTGATAACTATAC

CCCTGCAAATCCACTAAGTACCCCGCCACATATTAAGCCTGAATGGTACTTTCTATTTGCCTACGCTATC

CTGCGATCTATTCCTAATAAACTAGGAGGGGTGCTAGCTCTAATCTTCTCTATTCTAATTCTAACTATTA

TTCCACTATTACATACATCCAAACAACGAAGCATGATATTCCGACCTCTAAGTCAATGCTTATTCTGACT

CCTAGTAGCAGACCTACTCACACTAACATGAATTGGAGGACAGCCAGTAGAACACCCCTTCATTATTATT

GGGCAATTGGCCTCTATTCTCTACTTTACAATTCTTCTAGTACTTATACCTATCACTAGCATTATTGAGA

ATAGCCTCTCAAAATGAAGA

 

>BrownGB01

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC

AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATCACCAATCTACTATCGGCCGTTCCCTATATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCACCTATTATTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCACCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCACCCTAGCCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAT

CCCCGCAAATCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTAT

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCACCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>BrownGB17

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>BrownGB10

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCCCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAACCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>BrownGB09

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCCTTCCATCCATACTATACAATTAAAGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCCCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCTATCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>BrownGB28

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTCTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTGGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>Polarbear

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGATCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC

AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTATATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCACCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCACCCTAGCCCTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAT

CCCCGCAAATCCACTAAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTCATACCCATCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>Blackbear

GAAACTTCGGATCCCTCCTCGGAGTATGTTTAGTACTACAAATTCTAACGGGCCTATTTCTAGCCATACA

CTACACATCAGATACAACTACAGCCTTTTCATCAATCACCCATATTTGCCGAGATGTTCACTACGGATGA

ATTATCCGATACATACATGCTAACGGAGCTTCCATGTTCTTTATCTGCCTGTTCATGCACGTAGGACGGG

GTCTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCATTATCCTCCTATTTACAGTTAT

AGCCACCGCATTCATAGGATATGTCCTGCCCTGAGGCCAAATATCCTTCTGAGGAGCAACTGTTATCACC

AACCTCCTATCAGCCATCCCCTATATTGGAACAGACCTAGTAGAATGGATCTGAGGGGGCTTTTCTGTGA

ATAAGGCAACTCTGACACGATTCTTTGCCTTCCACTTTATTCTTCCATTCATCATCTTGACACTAGCAGC

AGTCCACCTATTATTCCTACACGAAACAGGATCTAATAACCCCTCTGGAATCCCATCTGACTCAGACAAA

ATCCCATTTCATCCATATTATACAATTAAAGACGCCCTAGGCGCCCTACTTTTCATCCTAGCCCTAGCAA

CTCTAGTCCTATTCTCGCCTGACCTACTAGGAGATCCCGATAACTACACCCCCGCAAACCCACTGAGCAC

CCCACCCCACATCAAACCT

 

>Speckled

ATGACCAACATCCGAAAAACTCACCCACTAGCTAAAATCATCAACAGCTCATTCATTGACCTCCCAACAC

CATCAAATATCTCAGCGTGATGAAACTTCGGGTCCCTTCTTGGGGTGTGCCTGATCCTACACATCCTAAC

GGGCCTATTCCTGGCCATACACTATACAGCAGACACGACTACAGCCTTCTCATCAGTCGCCCATATCTGT

CGAGACGTTAACTACGGATGAGTTATCCGATACATACACGCGAACGGAGCTTCAATATTCTTTATCTGCT

TGTTCATACACGTGGGACGGGGTCTGTATTACGGCTCATACCTATTCTCAGAAACATGAAACATTGGAAT

TATTCTCCTACTCACAATTATAGCCACAGCATTCATGGGGTACGTGCTGCCCTGAGGCCAAATATCCTTT

TGAGGAGCAACCGTCATCACCAATCTGCTATCAGCTATCCCCTACATTGGAACCGACCTAGTAGAATGAA

TCTGAGGTGGATTCTCAGTAGATAAAGCAACCCTTACCCGATTTTTCGCTTTTCACTTTATCCTTCCATT

CATTATTTTAGCACTAGCCATAGTCCACCTATTATTTCTTCACGAAACAGGATCCAACAATCCCTCTGGA

ATCTCATCGAACTCAGACAAAATCCCATTTCACCCTTACTATACAATTAAAGATATTCTAGGCGTCTTAC

TTCTTCTCCTAGCCCTGGTAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC

CCCTGCAAACCCAGTGAGCACCCCACTACATATCAAGCCTGAATGGTACTTCTTATTTGCCTACGCCATT

CTACGATCTATTCCCAATAAATTGGGAGGAGTACTGGCCCTAATCTTCTCCATTCTAATCCTAGCTATCA

TTCCTCTGCTGCACACATCCAAACAACGAGGAATGATATTCCGACCTTTAAGCCAATGCCTTTTCTGGCT

TCTAGCAGCAGACTTACTAACACTAACATGAATCGGAGGACAACCAGTGGAACATCCTCTTGTTATCATC

GGACAGCTAGCCTCTATCCTCTACTTCACAATCCTCCTAGTACTTATACCCATCGCCGGAATCATTGAAA

ATAACCTCTCAAAGTGAAGA

 

>Sloth

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTCATTGACCTCCCAGCAC

CGTCAAACATCTCAGCATGATGAAACTTCGGATCCCTCCTCGGAGTGTGCTTAATTCTACAAATTCTAAC

AGGCCTATTTCTAGCCATGCACTATACATCAGACACAACCACAGCCTTTTCATCAGTCACCCATATCTGT

CGAGACGTCCACTACGGATGAATCATCCGATATATACATGCAAACGGGGCCTCCATATTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCATACCTATTCTCGGAGACATGAAACACCGGCAT

TATTCTCCTATTTACAGTCATAGCCACCGCATTCATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATCACCAATCTGCTATCGGCCATTCCCTATATTGGAGCGGACCTAGTAGAATGAA

TCTGAGGGGGGTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCTTTCCATT

TATCATCCTAGCACTGGTAATAGTCCACCTATTGTTCCTACATGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCCAACTCAGACAAAATCCCATTTCACCCATATTATACAATTAAAGATATTATAGGCGCCTTAC

TTCTCATCCTAGCCCTGGCAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC

CCCTGCAAACCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCCAATAAACTAGGAGGGGTACTAGCACTAATTTTCTCCATCCTAATCCTAGCTATCA

TTCCCCTTCTACACACATCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTTACATGAATCGGAGGACAACCAGTAGAATATCCCTTCATCACTATT

GGACAACTAGCCTCCATCCTCTACTTCATAATCCTCCTAGTACTCATGCCCATCGCCGGAATCATTGAAA

ATAATCTCTCAAAGTGAAGA

 

>Asianblack

ATGACCAACATCCGAAAAACCCATCCATTAGCCAAAATCATCAACAACTCACTCATTGATCTCCCAGCAC

CATCAAATATCTCAGCATGATGAAACTTTGGATCCCTCCTCGGAATATGCCTAATCCTACAGATTCTGAC

AGGCCTATTTCTAGCTATACACTACACATCAGACGCGACTACAGCCTTTTCATCAGTCGCCCATATTTGC

CGAGACGTCCATTACGGATGAATTATCCGATACATACATGCAAACGGAGCCTCCATGTTCTTCATCTGCC

TATTCATACACGTAGGACGGGGCTTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCAT

CATCCTCCTATTTACAGTTATAGCCACCGCATTCATAGGATATGTCCTACCCTGAGGCCAAATATCTTTC

TGAGGAGCGACTGTCATTACCAACCTCCTATCAGCCATTCCCTATATTGGAACGGACCTAGTAGAGTGAA

TCTGAGGGGGCTTTTCCGTAGATAAAGCAACCCTAACACGATTCTTTGCTTTCCACTTTATCCTTCCATT

TATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCCGACTCGGACAAAATCCCATTCCACCCATACTATACAATTAAGGACGCCCTAGGCGCCCTAC

TTCTCATTCTAGCCCTAGCAACTCTAGTTCTATTCTCGCCCGACTTACTGGGAGACCCTGACAACTATAC

CCCCGCAAACCCACTGAGCACCCCGCCCCACATCAAGCCCGAGTGATACTTTTTATTTGCTTACGCCATC

TTACGATCCATCCCCAACAAACTAGGAGGAGTACTAGCGCTAATCTTCTCTATCCTAATCCTAGCCATTA

TCCCCCTTCTACACACATCCAAACAACGAGGAATAATGTTCCGACCCCTAAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTAACATGAATCGGAGGACAACCAGTAGAACATCCCTTCATCATTATC

GGACAGCTAGCCTCCATCCTCTACTTCACAATCCTCCTGGTGCTCATGCCCATCGCTGGAATCATTGAAA

ACAATCTCTCAAAGTGAAGA

 

>Sun

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTTATTGACCTCCCAGCAC

CATCAAACATCTCGGCGTGATGAAACTTCGGATCCCTCCTCGGAGTATGCTTAATCCTACAGATTATGAC

AGGCCTATTTCTAGCCATACACTATACATCAGACACAACCACAGCCTTTTCATCAATCACTCATATCTGC

CGAGACGTTCACTACGGATGAATTATCCGATATATACATGCAAACGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCGTACCTATTCTCAGAAACATGAAACATCGGTAT

TATCCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATTACCAATCTCTTATCAGCCATCCCCTATATTGGAACGGACCTAGTAGAATGAG

TCTGAGGAGGCTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCCTTCCGTT

CATCATCTTGGCACTAACAGCGGTCCACCTATTATTCCTACACGAAACAGGGTCCAACAATCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTTCACCCGTACTATACAATTAAGGACATCCTAGGCGCCCTAC

TTCTTACCCTAGCCCTAACAACCCTAGTTCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTACAT

CCCCGCAAATCCATTGAGCACCCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCTCTAGTCTTCTCTATCCTAATCCTAGCCATTA

TCCCCCTCTTACACACATCCAAGCAACGAGGAATGATATTCCGACCTCTGAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTTACCATTATC

GGACAACTAGCCTCCATTCTCTATTTCATAATCTTCCTAGTATTCATACCCATCGCTGGAATTATTGAAA

ATAACCTCTCAAAATGAAGA

 

>Paleopolar

tgaccaacatccgaaaaacccacccattagctaaaatcatcaacaactcatttattgaccttcc

aacaccatcaaacatctcagcatgatgaaactttggatccctccttggagtatgtttaat

tctacagattctaacaggcctgttcctagccatacactatacaccagacacaaccacagc

tttttcatcggtcacccacatttgccgagacgttcactacgggtgagttatccgatatgt

acatgcaaatggagcctccatcttctttatctgcctatttatgcacgtaggacggggcct

gtactatggctcatacctattcccagaaacatgaaacattggcattattctcctatttac

aattatagccaccgcatttataggatacgtcctaccctggggccaaatgtccttctgagg

agcgactgtcatcaccaacctactatcggccattccctacatcggaacggacctggtaga

atgaatctgagggggcttttccgtagataaggcgaccctaacacgattctttgctttcca

ctttattctcccgttcatcatcctagcactagcagcagtccatctattgttcctacacga

aacaggatctaacaacccctctggaatcccatctgactcagacaaaatcccattccatcc

atactatacaattaaggatattctaggcgccctacttctcgccctaaccttagcaaccct

agtcctattctcgcccgacttactaggagaccctgataactatacccccgcaaatccact

gagcactccaccccacatcaaacccgaatggtactttctatttgcctacgctatcctacg

atccatccctaataaactaggaggagtactagcactaattttctccattctaatcctagc

catcattcctcttctacacacgtccaaacaacgaggaatgatattccgacccctaagcca

atgcctattttgacttctagtagcagacctactaacactaacatgaattggaggacaacc

agtagaacatcccttcattattatcggacaactagcctccattctctactttacaatcct

cttagtacttatacctatcgctggaattatcgaaaacaacctcttaaagtggagagtctt

tgtag

 

 

 

Unknown sequences:

 

>unknown_1

ACCCGCTGAAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAA

GGATTCCCCTAGTAACTGCGAGTGAAGCGGGAGAAGCTCAAATTTAGAAT

CTGGCGGTCCCCGCGGCCGTCCGAGTTGTAATCTAGAGAAGCGTCATCCG

CGCCAGCCCGCGTACAAGTCTCTTGGAATAGAGCGTCGCAGAGGGTGAGA

ATCCCGTCTCTGACGCGGACCGCTGGCGCGTTGCGATACGTTCTCGATGA

GTCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTCCATCTAA

AGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAG

ATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAA

GGGAAACGCTTGAAGTCAGTCGCGTACGCTGGGAATCAGCCTTCTCTCGA

GGCGGCGCACTTCCCAGCGAACGGGTCGGCATCAATTTCATCTGCCGGAG

AATGGCGGGGGGAATGTGGCATCTCTTCGGATGTGTTATAGCCTCCCGTC

GCATGCTGCAGATGGGATTGAGGATCTCAGCACGCCGCAAGGCCGGGGCT

CGCCCACGTACGTGCTTAGGATGCCGGCATAATGGCTTTAATCGACCCGT

CTTGAAACACGGACCAAGGAGTCTAACATGCTCGCGAGTGTTTGGGTGTC

AAACCCGAGCGCGCAACGAAAGTGAAAGTTGAGATCTCTGTCGTGGAGAG

CATCGACGCCCAGACCAGACCTTCTGCGACGGATCTGCGGTCGAGCGCGT

ATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGCGAAGCCAG

AGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGT

CGAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGT

TCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCGTATCAGATTTATGT

GGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATT

CTCAAACTTTAAATATGTAAGAACGGGGGGTCTCTTGATTGGACCTCCCG

GCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATG

CGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGA

CACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGGCCATGGAA

GTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATGAACTAG

CCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCGGC

GTTCGAGTGACGCGCCGACGAGTAGGCAGGCGTGGAGGTCCGTGAAGAAG

CCTCGGCAGCGATG

 

>unknown_2

CGATAAATGGCTCATTGGGATAGATATAAATGAACAATACCCCCCCTAGAAACGTATAAGAGAGGTTTTC

TCCTCATACGGCTCGCGAAAAAACGATTCGAAATTATTATGTATCGAATTAGAATGTCAAATATCAAATA

GATATATAAAATCATCAAATCAATTTCCAGAGATTTAAGTCCTTCTTTTTTTTCTTCTTTTTCGAAAAAG

AAGAAAAAAAGAGCATTCGTACTCTCATAACTCAAGTTGGATAACTTTCAAATAGCCTATAAAGAACAGC

CTTAGGCATTTATTTCATTTTTTGAGCGGTCTCTAACCCCTTTGTTGTTTGTCTCCTTTCGAATCCATTT

TTGGAGTCTCGATTCTGATCTAATTATTGAGACAATTGAAAACGGTATTTCCTTGTTCCAGGATCCTTTA

TCTTTGCCTTGAATCATTGGGTTTAGACATTACTTCGGTGATCTTTAATCGTTTTCAAAAATGGCAGCAA

CAAACCTCTTTTTGTGATTTCTTTCTATGAAAGAATCATACGAACAATTGATTCCTGCATGATACACTTT

TGATCGAAAGAGTTTTACCAATTCAAAAAGATTTTCCTTTTGCATTGAAAAATTGTTCGAATCGGATCCT

TTCGATTTCGATATCAAAAATATACTTACGAAGTTTGTTCCAACGTATTGATTGGTATTAACCCTAGACC

CTTGCCCCTGAGAAATGAATAAATACTTTCTACTCGAGCTCCATCAAGTACTATTTACATTACAACCCAA

CAAAAAACGAGGGTTGTAGTAGAACCGAACAAAGGATGTCGAGCCAAGAGCCCATTCATTCCTAGATAAT

ATAAAATAGAAAATGGTGGATGGAAAAAAAATCCACAGCTGATCATGTCCTTCAA

 

>unknown_3

GAAAGCCTGACGGAGCAATACCGCGTGAGGGAGGAAGGCTCTTGGGTTGTAAACCCTCTTTTCTCAGGGA

AGAATCAATGAAGGTACTTGAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG

ATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCCGCAGGTGGTTTTTCAAGTCTCCTGTCAAAGC

STCGGGCTCAACTCGAAAARGGCAGGGGAAACTGANAGACTAGAGTAAGGTAGGGGTAGAGGRAATTCCN

GGTGTAGCGGTGAAATGCGTAAAGATCAGGAAAAACACCGGTGGCGAAAGCNCTCTGCTGGACTATTACT

GACACTGAGGGACRAAAGCTAGGGGAGCGAATGGGATTAAATACCCCAGGTAGTCA

 

Literature:

Huson, D.H. and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254-267.

Lindqvist, C. et al. (12 additional authors). 2010. Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear. PNAS 107: 5053-5057.

Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A rapid bootstrap algorithm for the RAxML webservers. Systematic Biology 57: 758-771.

Talbot, S. L. and G. F. Shields. 1996a. A phylogeny of the bears (Ursidae) inferred from complete sequences of three mitochondrial genes. Molecular Phylogenetics and Evolution 5: 567-575.

Talbot, S. L. and G. F. Shields. 1996b. Phylogeography of brown bears (Ursus arctos) of Alaska and paraphyly within the Ursidae. Molecular Phylogenetics and Evolution 5: 477-494.