BIOL 471

Multiple sequence alignment exercise

Nucleotide and amino acid sequences are commonly used to reconstruct the phylogenetic history of organisms. Sequences obtained by researchers are deposited in online databases and are freely available for use by others. The analysis of sequence data frequently begins with an alignment, in which multiple sequences are arranged to show positional homology. Because some mutations, mainly insertions and deletions (indels), change the length of a sequence, gaps are added to keep homologous positions in the same column. However, positional homology is not known with certainty; it must be inferred using an analytical approach. In this exercise, you will use an online program that aligns sequences by matching similar portions as closely as possible while introducing as few gaps as possible. You will use the alignment to create a preliminary tree representing the phylogeny of a group of organisms.
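
To make the trade-off between matching bases and adding gaps concrete, here is a minimal Python sketch of a pairwise global alignment (the Needleman-Wunsch method). It is not what ClustalW runs internally; the scoring values (match, mismatch, gap) and the two toy sequences are assumptions chosen only for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two bases
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Toy sequences (not bear data): the second is missing one T.
aligned_a, aligned_b, score = global_align("ACGTTA", "ACGTA")
print(aligned_a)
print(aligned_b)
print(score)
```

A single gap is placed in the shorter sequence so that the remaining bases line up. Making the gap penalty more or less severe changes how readily the method opens gaps, which is why alignments are inferences rather than observations.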

Nucleotide sequence data can also be obtained from unrecognized or undescribed organisms to provide information about their identity or their placement among groups of organisms whose identity is already known. Analysis typically includes aligning sequences of the same gene region from several organisms to see how homologous sites compare. If the sequences are not properly aligned, valid comparisons cannot be made and identifications may be incorrect. In this exercise, you will also attempt to identify an unknown organism using preliminary phylogenetic information from related sequences in online databases.

One of the most commonly used programs for aligning sequences is ClustalW (or ClustalX, the same program with a graphical interface), which is freely available online as source code or as an executable program. Clustal first aligns the input sequences in pairs to produce a distance matrix, which is then used to build a simple unrooted neighbor-joining (NJ) guide tree. Clustal then builds the multiple alignment progressively by following the branching order of this tree: the most closely related sequences are aligned first, then treated as a single unit (their gaps are kept in place) and aligned with the next most closely related sequences. This continues until all sequences are aligned. The process is fast and can be used to align hundreds of input sequences.
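
The guide-tree step can be sketched in a few lines of Python. The function below applies the standard neighbor-joining criterion to a distance matrix and reports the order in which nodes are joined, which is the order a progressive aligner would follow. The five taxon labels and distances are made up for illustration, and the function name nj_join_order is hypothetical; ClustalW's own implementation differs in detail.

```python
from itertools import combinations


def nj_join_order(names, dist):
    """Return the pairs joined by neighbor joining, in order.

    names: list of taxon labels.
    dist:  dict mapping frozenset({x, y}) -> distance between x and y.
    Joining stops when three nodes remain (they meet at one central node).
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to every other node.
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x)
                 for x in nodes}

        def q(pair):
            # NJ criterion: small values mark pairs that are close to each
            # other but far from everything else.
            x, y = pair
            return (n - 2) * d[frozenset(pair)] - total[x] - total[y]

        f, g = min(combinations(nodes, 2), key=q)
        new = (f, g)  # the new internal node is labeled by its two children
        for k in nodes:
            if k not in (f, g):
                d[frozenset((new, k))] = (d[frozenset((f, k))]
                                          + d[frozenset((g, k))]
                                          - d[frozenset((f, g))]) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [new]
        joins.append(new)
    return joins


# Toy distance matrix (not bear data).
names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
dist = {frozenset((names[i], names[j])): rows[i][j]
        for i in range(5) for j in range(i + 1, 5)}
joins = nj_join_order(names, dist)
print(joins)
```

Note that the first pair joined is not the pair with the smallest raw distance (d and e). The criterion also accounts for how far each member of a pair is from all the other taxa.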

Aligning existing sequences and creating a preliminary phylogeny:

Below are sequences of a gene amplified from the mitochondrial genome of representatives of the Ursidae (bear family), the relationships of which were studied in a series of papers by Talbot and Shields (1996a,b) and discussed in class. You will use these to create an initial multiple sequence alignment for analysis later in class. Based on simple tests of relationships, you should be able to answer the questions below. Since these are published sequences, you can look them up online at the NCBI (National Center for Biotechnology Information) website using their accession numbers (GenBank accession numbers begin with one or two letters followed by digits, e.g. U18870 or AY390359); the nucleotide database also provides information about the collection localities, depositors, literature, etc.

Once you have a preliminary alignment, you can produce a phylogeny using freely available software. One such program is SplitsTree, available from www.splitstree.org. The program is free, but you must register for a license. It accepts an alignment in Clustal format (which you will have) or in one of many other common formats (NEXUS, PHYLIP, FASTA) and creates trees or networks using a variety of analytical approaches (background discussed by the developers in Huson and Bryant, 2006). You will not be required to use this program for this exercise, but I will briefly introduce its use in class.

Identifying an unknown from sequence information:

You will also be given sequences obtained from an unknown organism (a fungus!), and you will be asked to find out as much as you can about this unknown. These sequences were obtained with fungal primers designed to amplify the large subunit (LSU) ribosomal RNA gene, the internal transcribed spacer (ITS) ribosomal gene region, and the second largest subunit of RNA polymerase II (RPB2). These have not yet been published.

To do these exercises, you must visit GenBank (to access deposited sequences) and the European Bioinformatics Institute (EMBL-EBI) website, which provides access to ClustalW and will enable you to produce an initial multiple alignment of your sequences.

Follow the procedure outlined below and answer the questions at the end of the exercise.

Procedure:

1. Visit GenBank and get information about the bear sequences (using either the Latin names or the accession numbers). Here is where you can find out who deposited them, what gene(s) were sequenced, if they were published and where, etc.
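
If you would rather look up a record by accession number without browsing, NCBI also offers the E-utilities web service. The Python sketch below only builds the address for one record and does not download anything; the function name genbank_record_url is hypothetical, and the parameter choices (the nuccore database, GenBank flat-file output as plain text) are assumptions about what is most useful here.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "efetch" service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record.

    rettype="gb" asks for the full GenBank record (depositors, gene,
    literature); rettype="fasta" asks for the sequence only.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # e.g. one of the bear accession numbers
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


url = genbank_record_url("U18870")
print(url)
```

Pasting the printed address into a browser should show the same record you would reach by searching GenBank for U18870.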

2. Copy the sequence data below (it is already in FASTA format, which ClustalW recognizes) and paste it into the ClustalW submission form window. Use the default settings (you don't need to change anything) and click "run." You will be given a multiple alignment of the sequences in two views: one in color, which requires Java (click the "JalView" button), and one shown below the alignment scores. Notice that the sequences are not all the same length, so the alignment contains gaps where bases are missing or added (indels, positions where one or more bases were inserted or deleted in one or more sequences). Placing gaps is complicated, but we want the program to use as few as possible so that the alignment is the most likely representation of the homologies among the sequences.
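
Before submitting, it can help to confirm how long each sequence is. The following Python sketch reads FASTA-formatted text and reports the length of each record. It tolerates blank lines between sequence lines and spaces inside them, both of which occur in this handout. The function name read_fasta and the two toy records are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}.

    Blank lines are skipped and spaces inside sequence lines are removed.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]          # header without the leading ">"
            records[name] = ""
        elif name is not None:
            records[name] += line.replace(" ", "")
    return records


# Toy input (not bear data), laid out like the sequences in this handout.
example = """>seq1_toy

ACGTACGT

ACG

>seq2_toy
ACGTAC GT
"""

records = read_fasta(example)
lengths = {name: len(seq) for name, seq in records.items()}
print(lengths)
```

Running the same function on the bear data copied from this handout would show which sequences are shorter than the others before you look at the alignment.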

3. Look first at the "Scores Table."  This tells you the percentage similarity of each pair of sequences (which may or may not be evolutionarily significant). If you sort by alignment score, you can see which sequences are most similar to each other, or to unknowns. At the end of the results is a simple NJ guide tree produced from the pairwise distance measures (a cladogram or phylogram which may or may not represent actual evolutionary relationships). Clustal uses this to construct the alignment in the sequence order defined by the tree. Keep in mind that this tree is unrooted and represents only the most simplistic relationships of the very limited number of specimens examined in this exercise. A more complicated analysis requires a different program and careful consideration of the sequences included in the analysis.
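
A pairwise score of this kind can be approximated by hand from two aligned sequences. The Python sketch below counts identical positions and divides by the number of positions compared, skipping any column that contains a gap. Treating gapped columns this way is an assumption about how the score is defined, and the function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical bases among ungapped, aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue                 # gapped columns are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned pairs (not bear data).
with_gap = percent_identity("ACGTTA", "ACG-TA")
one_change = percent_identity("ACGT", "ACGA")
print(with_gap, one_change)
```

Because gapped columns are skipped, two sequences that differ only by an indel can still score 100, which is one reason a high score may or may not be evolutionarily significant.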

4. Now produce a reduced version of your alignment by aligning only half (approximately) of each sequence below in ClustalW. Make sure you copy the front end of each sequence that includes the accession information. How does the alignment change (if at all)? How does the guide tree change?

5. What happens if you use only 10 bp sequences? Is there significant loss of signal?
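
Trimming every sequence by hand is tedious, so steps 4 and 5 can also be prepared with a short script. The Python sketch below keeps the front of each sequence, either a fraction of its length or a fixed number of bases, and writes the result back out in FASTA format with the headers intact. The function names and toy records are made up for illustration.

```python
def truncate_records(records, fraction=0.5, n_bases=None):
    """Keep only the front of each sequence.

    If n_bases is given, keep that many bases; otherwise keep the given
    fraction of each sequence's length (rounded down).
    """
    out = {}
    for name, seq in records.items():
        keep = n_bases if n_bases is not None else int(len(seq) * fraction)
        out[name] = seq[:keep]
    return out


def to_fasta(records, width=70):
    """Format {header: sequence} as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Toy records (not bear data).
toy = {"toy1": "ACGTACGTAC", "toy2": "ACGTAC"}
half = truncate_records(toy)               # step 4: about half of each
short = truncate_records(toy, n_bases=4)   # step 5 would use n_bases=10
print(to_fasta(half))
```

The printed text can be pasted directly into the ClustalW submission form.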

6. If you want to go further with this analysis, you can access a program at www.splitstree.org and produce preliminary visualizations of trees and networks from your aligned sequences. I will do this for the class so you can see how it works, but it is not required for this exercise.

7. To find out the possible identity of the fungal unknown, use the BLAST search tool at GenBank (see below). This tool searches nucleotide databases for sequences similar to yours. These presumably represent potential close relatives of the unknown (perhaps even the same species as the unknown, if sequences have been deposited previously).

To BLAST search the sequences, visit the BLAST page of the NCBI website where GenBank is located. BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. For a better understanding of BLAST you can refer to the BLAST Course which explains the basics of the BLAST algorithm, or to the NCBI BLAST tutorial. For our purposes, it will be sufficient to do the following:

  • In the Nucleotide box, click on "Nucleotide-nucleotide BLAST (blastn)." 
  • You now must load the sequence of your DNA sample, and the program will search for similar sequences. 
  • Copy ONLY ONE SEQUENCE AT A TIME from your list of unknowns (either in FASTA format or data only), and paste it into the window labeled "search". 
  • Choose the database to search ("others - nr"); the default database is the "human genomic plus transcript database."
  • Then click "BLAST"
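
The same search can in principle be submitted through NCBI's BLAST URL interface instead of the web form. The Python sketch below only assembles the form fields for one query and sends nothing. The function name blast_submit_params is hypothetical, and the database name "nt" is an assumption about which name corresponds to the nucleotide collection chosen above. The example query is the first 20 bases of the ITS_unknown sequence, with a space added to show the cleanup step.

```python
from urllib.parse import urlencode

# Address of the NCBI BLAST service; nothing is sent in this sketch.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_params(sequence, program="blastn", database="nt"):
    """Form fields for submitting ONE query sequence (CMD=Put)."""
    query = "".join(sequence.split())   # drop spaces and line breaks
    return {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,   # assumed name for the nucleotide collection
        "QUERY": query,
    }


params = blast_submit_params("AAGTCGTAAC AAGGTTTCCG")
body = urlencode(params)
print(BLAST_URL)
print(body)
```

As with the web form, each unknown should be submitted separately, so a script would call this function once per sequence.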

 

8. For the fungal LSU search only, add the first five sequences returned by BLAST to the unknown and produce an alignment of these sequences in Clustal. Produce an NJ guide tree to see how the sequences cluster together.

 

Questions:

1. Which mitochondrial gene is used for this alignment? By looking at the alignment, can you tell whether the sequences all cover the same portion of the gene? Are they all the same length? What can cause the sequences to be unequal in length (why must they be gapped in the alignment)?

 

2. A quick look at the “Scores Table” in Clustal gives you an idea of the pairwise relationships of the sequences. Based on these scores, can you tell whether the brown bear sequences are the most similar to each other? Are there situations where brown bears are more closely related to some other species than to other brown bears?

 

3. The brown bear sequences represent various populations distributed throughout Alaska. Based on the guide tree, these populations of brown bears are paraphyletic with regard to polar bears. Why is this? Of the various brown bear sequences, which one appears to be most closely related to the sequence representing polar bears? If you added sequences representing more distantly related populations in Eurasia, do you think this would make brown bears monophyletic?

 

4. Based on your BLAST search, what is the likely identity of the unknown (genus? species?)? Do you get different results when you use different genes to do the search? Why is this?

 

5. What information about the unknown is gained by creating an alignment using additional sequences from GenBank?

 

Bear Sequence Data:

>AY390359_A_melanoleuca_giant_panda

ATGATCAACATCCGAAAAACTCATCCATTAGTTAAAATTATCAACAACTCATTCATTGACCTTCCAACAC

CATCAAACATTTCAACATGATGGAACTTTGGGTCTCTGTTAGGAGTGTGTCTGATCTTGCAAATCTTAAC

AGGCTTATTTCTAGCCATACACTATACATCAGATACAGCTACAGCCTTTTCATCAGTCGCACACATTTGT

CGAGACGTCAACTATGGTTGATTTATCCGATATATACATGCCAATGGGGCCTCTATATTTTTTATCTGCC

TATTTATACACGTAGGGCGAGGCTTATACTATGGATCATACCTATTTCCAGAGACATGGAATATCGGAAT

TATTCTCCTACTTACAGTTATAGCCACAGCATTCATAGGGTATGTACTACCTTGAGGACAAATATCCTTC

TGAGGAGCAACCGTCATTACTAACCTACTATCAGCAATTCCTTACATTGGCACTAATCTAGTGGAGTGAG

TCTGAGGGGGTTTCTCCGTAGATAAAGCAACACTAACCCGATTTTTTGCTTTTCACTTTATCCTTCCATT

TATCATCTCAGCACTAGCAATAGTCCATCTATTATTCCTTCACGAAACAGGATCTAATAACCCCTCCGGA

ATTCCATCTGACCCAGACAAAATCCCATTTTACCCCTATCATACAATTAAAGACATCCTAGGCGTCCTAT

TTCTTGTCCTCGCCTTAATAACCCTGGCTTTATTCTCACCAGACCTGTTAGGAGACCCTGATAACTATAC

CCCTGCAAATCCACTAAGTACCCCGCCACATATTAAGCCTGAATGGTACTTTCTATTTGCCTACGCTATC

CTGCGATCTATTCCTAATAAACTAGGAGGGGTGCTAGCTCTAATCTTCTCTATTCTAATTCTAACTATTA

TTCCACTATTACATACATCCAAACAACGAAGCATGATATTCCGACCTCTAAGTCAATGCTTATTCTGACT

CCTAGTAGCAGACCTACTCACACTAACATGAATTGGAGGACAGCCAGTAGAACACCCCTTCATTATTATT

GGGCAATTGGCCTCTATTCTCTACTTTACAATTCTTCTAGTACTTATACCTATCACTAGCATTATTGAGA

ATAGCCTCTCAAAATGAAGA

 

>U18870_Ursus_arctos_GB01

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC

AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATCACCAATCTACTATCGGCCGTTCCCTATATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCACCTATTATTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCACCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCACCCTAGCCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAT

CCCCGCAAATCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTAT

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCACCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>U18886_Ursus_arctos_GB17

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>U18888_Ursus_arctos_GB19

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCCCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAACCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCTTAGTACTTATACCTATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>U18878_Ursus_arctos_GB09

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGATGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCCTTCCATCCATACTATACAATTAAAGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCTATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCCCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCTATCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>U18897_Ursus_arctos_GB28

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGACCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTATGTTTAATTCTACAGATTCTAAC

AGGCCTGTTCCTAGCCATACACTATACACCAGACACAACCACAGCTTTTTCATCGGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATCTTCTTTATCTGCC

TATTTATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGGGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTACATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACCCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCGCCCTAACCTTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTATAC

CCCCGCAAATCCACTGAGCACTCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCATCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGACCCCTAAGCCAATGCCTATTCTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTCATTATTATC

GGACAACTGGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTTATACCCATCGCTGGAATTATCGAAA

ACAACCTCTTAAAGTGGAGA

 

>EU567096_U_maritimus_polar_bear

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATCAACAACTCATTTATTGATCTTCCAACAC

CATCAAACATCTCAGCATGATGAAACTTTGGATCCCTCCTTGGAGTGTGTTTAATTCTACAGATTCTAAC

AGGCCTGTTTCTAGCCATACACTATACATCAGACACAACCACAGCTTTTTCATCAGTCACCCACATTTGC

CGAGACGTTCACTACGGGTGAGTTATCCGATATGTACATGCAAATGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGCCTGTACTATGGCTCATACCTATTCTCAGAAACATGAAACATTGGCAT

TATTCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCGACTGTCATCACCAATCTACTATCGGCCATTCCCTATATCGGAACGGACCTGGTAGAATGAA

TCTGAGGGGGCTTTTCCGTAGATAAGGCGACTCTAACACGATTCTTTGCTTTCCACTTTATTCTCCCGTT

CATCATCCTAGCACTAGCAGCAGTCCACCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTCCATCCATACTATACAATTAAGGATATTCTAGGCGCCCTAC

TTCTCACCCTAGCCCTAGCAACCCTAGTCCTATTCTCGCCCGACTTACTAGGAGACCCTGATAACTATAT

CCCCGCAAATCCACTAAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCACTAATTTTCTCCATTCTAATCCTAGCCCTCA

TTCCTCTTCTACACACGTCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

TCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACACCCCTTCATTATTATC

GGACAACTAGCCTCCATTCTCTACTTTACAATCCTCCTAGTACTCATACCCATCGCTGGAATTATTGAAA

ACAACCTCTTAAAGTGGAGA

 

>AF007937_U_americanus_black_bear

GAAACTTCGGATCCCTCCTCGGAGTATGTTTAGTACTACAAATTCTAACGGGCCTATTTCTAGCCATACA

CTACACATCAGATACAACTACAGCCTTTTCATCAATCACCCATATTTGCCGAGATGTTCACTACGGATGA

ATTATCCGATACATACATGCTAACGGAGCTTCCATGTTCTTTATCTGCCTGTTCATGCACGTAGGACGGG

GTCTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCATTATCCTCCTATTTACAGTTAT

AGCCACCGCATTCATAGGATATGTCCTGCCCTGAGGCCAAATATCCTTCTGAGGAGCAACTGTTATCACC

AACCTCCTATCAGCCATCCCCTATATTGGAACAGACCTAGTAGAATGGATCTGAGGGGGCTTTTCTGTGA

ATAAGGCAACTCTGACACGATTCTTTGCCTTCCACTTTATTCTTCCATTCATCATCTTGACACTAGCAGC

AGTCCACCTATTATTCCTACACGAAACAGGATCTAATAACCCCTCTGGAATCCCATCTGACTCAGACAAA

ATCCCATTTCATCCATATTATACAATTAAAGACGCCCTAGGCGCCCTACTTTTCATCCTAGCCCTAGCAA

CTCTAGTCCTATTCTCGCCTGACCTACTAGGAGATCCCGATAACTACACCCCCGCAAACCCACTGAGCAC

CCCACCCCACATCAAACCT

 

>U23554_Tremarctos_ornatus_spectacled_bear

ATGACCAACATCCGAAAAACTCACCCACTAGCTAAAATCATCAACAGCTCATTCATTGACCTCCCAACAC

CATCAAATATCTCAGCGTGATGAAACTTCGGGTCCCTTCTTGGGGTGTGCCTGATCCTACACATCCTAAC

GGGCCTATTCCTGGCCATACACTATACAGCAGACACGACTACAGCCTTCTCATCAGTCGCCCATATCTGT

CGAGACGTTAACTACGGATGAGTTATCCGATACATACACGCGAACGGAGCTTCAATATTCTTTATCTGCT

TGTTCATACACGTGGGACGGGGTCTGTATTACGGCTCATACCTATTCTCAGAAACATGAAACATTGGAAT

TATTCTCCTACTCACAATTATAGCCACAGCATTCATGGGGTACGTGCTGCCCTGAGGCCAAATATCCTTT

TGAGGAGCAACCGTCATCACCAATCTGCTATCAGCTATCCCCTACATTGGAACCGACCTAGTAGAATGAA

TCTGAGGTGGATTCTCAGTAGATAAAGCAACCCTTACCCGATTTTTCGCTTTTCACTTTATCCTTCCATT

CATTATTTTAGCACTAGCCATAGTCCACCTATTATTTCTTCACGAAACAGGATCCAACAATCCCTCTGGA

ATCTCATCGAACTCAGACAAAATCCCATTTCACCCTTACTATACAATTAAAGATATTCTAGGCGTCTTAC

TTCTTCTCCTAGCCCTGGTAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC

CCCTGCAAACCCAGTGAGCACCCCACTACATATCAAGCCTGAATGGTACTTCTTATTTGCCTACGCCATT

CTACGATCTATTCCCAATAAATTGGGAGGAGTACTGGCCCTAATCTTCTCCATTCTAATCCTAGCTATCA

TTCCTCTGCTGCACACATCCAAACAACGAGGAATGATATTCCGACCTTTAAGCCAATGCCTTTTCTGGCT

TCTAGCAGCAGACTTACTAACACTAACATGAATCGGAGGACAACCAGTGGAACATCCTCTTGTTATCATC

GGACAGCTAGCCTCTATCCTCTACTTCACAATCCTCCTAGTACTTATACCCATCGCCGGAATCATTGAAA

ATAACCTCTCAAAGTGAAGA

 

>U23562_Melursus_ursinus_sloth_bear

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTCATTGACCTCCCAGCAC

CGTCAAACATCTCAGCATGATGAAACTTCGGATCCCTCCTCGGAGTGTGCTTAATTCTACAAATTCTAAC

AGGCCTATTTCTAGCCATGCACTATACATCAGACACAACCACAGCCTTTTCATCAGTCACCCATATCTGT

CGAGACGTCCACTACGGATGAATCATCCGATATATACATGCAAACGGGGCCTCCATATTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCATACCTATTCTCGGAGACATGAAACACCGGCAT

TATTCTCCTATTTACAGTCATAGCCACCGCATTCATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATCACCAATCTGCTATCGGCCATTCCCTATATTGGAGCGGACCTAGTAGAATGAA

TCTGAGGGGGGTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCTTTCCATT

TATCATCCTAGCACTGGTAATAGTCCACCTATTGTTCCTACATGAAACAGGATCTAACAACCCCTCTGGA

ATCCCATCCAACTCAGACAAAATCCCATTTCACCCATATTATACAATTAAAGATATTATAGGCGCCTTAC

TTCTCATCCTAGCCCTGGCAACCCTAGTCCTATTCTCACCCGACTTACTAGGAGACCCCGACAACTACAC

CCCTGCAAACCCACTGAGCACCCCACCCCACATCAAACCCGAGTGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCCAATAAACTAGGAGGGGTACTAGCACTAATTTTCTCCATCCTAATCCTAGCTATCA

TTCCCCTTCTACACACATCCAAACAACGAGGAATGATATTCCGGCCCCTAAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTTACATGAATCGGAGGACAACCAGTAGAATATCCCTTCATCACTATT

GGACAACTAGCCTCCATCCTCTACTTCATAATCCTCCTAGTACTCATGCCCATCGCCGGAATCATTGAAA

ATAATCTCTCAAAGTGAAGA

 

>U23558_S_thibetanus_Asian_black_bear

ATGACCAACATCCGAAAAACCCATCCATTAGCCAAAATCATCAACAACTCACTCATTGATCTCCCAGCAC

CATCAAATATCTCAGCATGATGAAACTTTGGATCCCTCCTCGGAATATGCCTAATCCTACAGATTCTGAC

AGGCCTATTTCTAGCTATACACTACACATCAGACGCGACTACAGCCTTTTCATCAGTCGCCCATATTTGC

CGAGACGTCCATTACGGATGAATTATCCGATACATACATGCAAACGGAGCCTCCATGTTCTTCATCTGCC

TATTCATACACGTAGGACGGGGCTTGTACTATGGCTCATACCTACTCTCAGAAACATGAAACATTGGCAT

CATCCTCCTATTTACAGTTATAGCCACCGCATTCATAGGATATGTCCTACCCTGAGGCCAAATATCTTTC

TGAGGAGCGACTGTCATTACCAACCTCCTATCAGCCATTCCCTATATTGGAACGGACCTAGTAGAGTGAA

TCTGAGGGGGCTTTTCCGTAGATAAAGCAACCCTAACACGATTCTTTGCTTTCCACTTTATCCTTCCATT

TATCATCCTAGCACTAGCAGCAGTCCATCTATTGTTCCTACACGAAACAGGATCCAACAACCCCTCTGGA

ATCCCATCCGACTCGGACAAAATCCCATTCCACCCATACTATACAATTAAGGACGCCCTAGGCGCCCTAC

TTCTCATTCTAGCCCTAGCAACTCTAGTTCTATTCTCGCCCGACTTACTGGGAGACCCTGACAACTATAC

CCCCGCAAACCCACTGAGCACCCCGCCCCACATCAAGCCCGAGTGATACTTTTTATTTGCTTACGCCATC

TTACGATCCATCCCCAACAAACTAGGAGGAGTACTAGCGCTAATCTTCTCTATCCTAATCCTAGCCATTA

TCCCCCTTCTACACACATCCAAACAACGAGGAATAATGTTCCGACCCCTAAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTAACATGAATCGGAGGACAACCAGTAGAACATCCCTTCATCATTATC

GGACAGCTAGCCTCCATCCTCTACTTCACAATCCTCCTGGTGCTCATGCCCATCGCTGGAATCATTGAAA

ACAATCTCTCAAAGTGAAGA

 

>Helarctos_malayanus_sun_bear

ATGACCAACATCCGAAAAACCCACCCATTAGCTAAAATCATTAACAACTCACTTATTGACCTCCCAGCAC

CATCAAACATCTCGGCGTGATGAAACTTCGGATCCCTCCTCGGAGTATGCTTAATCCTACAGATTATGAC

AGGCCTATTTCTAGCCATACACTATACATCAGACACAACCACAGCCTTTTCATCAATCACTCATATCTGC

CGAGACGTTCACTACGGATGAATTATCCGATATATACATGCAAACGGAGCCTCCATGTTCTTTATCTGCC

TATTCATGCACGTAGGACGGGGTCTGTACTATGGCTCGTACCTATTCTCAGAAACATGAAACATCGGTAT

TATCCTCCTATTTACAGTTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTC

TGAGGAGCAACTGTCATTACCAATCTCTTATCAGCCATCCCCTATATTGGAACGGACCTAGTAGAATGAG

TCTGAGGAGGCTTTTCCGTAGACAAGGCGACTCTAACACGATTCTTTGCCTTCCACTTTATCCTTCCGTT

CATCATCTTGGCACTAACAGCGGTCCACCTATTATTCCTACACGAAACAGGGTCCAACAATCCCTCTGGA

ATCCCATCTGACTCAGACAAAATCCCATTTCACCCGTACTATACAATTAAGGACATCCTAGGCGCCCTAC

TTCTTACCCTAGCCCTAACAACCCTAGTTCTATTCTCGCCCGACTTACTAGGAGACCCTGACAACTACAT

CCCCGCAAATCCATTGAGCACCCCACCCCACATCAAACCCGAATGGTACTTTCTATTTGCCTACGCTATC

CTACGATCCATCCCTAATAAACTAGGAGGAGTACTAGCTCTAGTCTTCTCTATCCTAATCCTAGCCATTA

TCCCCCTCTTACACACATCCAAGCAACGAGGAATGATATTCCGACCTCTGAGCCAATGCCTATTTTGACT

CCTAGTAGCAGACCTACTAACACTAACATGAATTGGAGGACAACCAGTAGAACATCCCTTTACCATTATC

GGACAACTAGCCTCCATTCTCTATTTCATAATCTTCCTAGTATTCATACCCATCGCTGGAATTATTGAAA

ATAACCTCTCAAAATGAAGA

 

 

 

Unknown fungal sequences:

 

>LSU_unknown

GGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCAGGGTCCTCTCCGTCCGAGTTGTAATCTAGAGAAGCGCTATCCGT GCCGGACCGTGTACAAGTCTCTTGGAACAGAGCGTCGCAGAGGGTGAGAATCCCGTCTTTGACACGGACTGCCGGTGCACTGTGATACGCTCTCAACGAG TCGAGTTGTTTGGGAATGCAGCTCAAAACGGGTGGTAAACTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGA TGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAAGGGAAACGCTTGAAGTCAGTCGCGTCTGCCGGGGATCATCCTTCTCTCGAG TCGGAGTACTTCCCGGTCGACGGGTCAGCATCAGTTTCGACCGTCGGATAAAAGCACGAGGAATGTGGCATCTCCGGATGTGTTATAGCCTCGGGTTGCA TACGACGGTCGGGACTGAGGAACTCAGCACGCCGCGAGGCCGGGGTTCTCGAACCCACGTACGTGCTTAGGATGCTGGCGTAATGGCTTTAATCGACCCG TCTTGAAACACGGACCAAGGAGTCTAACATGCCCGCGAGTGTTTGGGTGCAAAACCCGAGCGCGCAACGAAAGTGAAAGTTGAGATCTCTGTCGCGGAGA GCATCGACGCCCAGACCAGACCTTTTGTGACGGATCTGCGGTCGAGCGTGTATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGCGAAGCCA GAGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGTCGAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGTTCCTGCC

 

>ITS_unknown

AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAATGAATTTAAACACGAGAGTTGGAAAGGGTTGCTGCTGGCCGAAAGGTATTCGTGCACGCCCCTCCTTCTCTCTGTGTTCATCTCCGAACCCCTGTGCACCCGTCGTAGGCCGAGCGATCGGCCTATGTTTTTTTCACAAACACCGTAAAGTTAAACGAATGTCATTCACGCAATGGGTCACTCCTCGAAAGAGGCCGGCGGCTCTTTGTTAAAATAAAAATAATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCTTTGGTATTCCGAAGGGCATGCCTGTTTGAGTGTCATTAAATTATCAACTCTGAAAGCTTTCGTGGCCCTTCAGAGCTTGGATCTTGGAGCGTGCCGGGGATCCCCATCCTCGGCTCCTCTTGAAATGCATAAGTGGAACCTCTACAACTGCAGATCTGTTCTGGCGTGATAATTATCTACGTCGCGGCAGAGAAGCGGGTTGGATCCGCCTACAACCGTCGCGTCTCTGCGACAAAACGGGGGCCAGTCCCCCTCATTGACAATTTGACCTCAAATCAGGTAG

 

>RPB2_unknown

CCGAATGGACACTATGGCCAATATCCTGTACTACCCTCAGAAACCTCTTGCGACGACACGTTCCATGGAGTACCTTAAGTTCCGGGAACTTCCAGCCGGT CAAAACGCGATCGTTGCAATTCTTTGCTACAGCGGATACAACCAGGAAGATTCCGTTATTATGAATCAGAGCTCGATTGATCGAGGCCTTTTCCGCAGTA TCTACTACCGCAGCTACATGGACCTCGAGAAGAAGAGTGGAATTCAACAGCTCGAGGAGTTCGAGAAGCCAACGCGGGATAACACGTTACGCATGAAACA TGGAACGTACGACAAACTGGAGGATGATGGGTTAATCGCCCCTGGAACTGGTGTCCGAGGAGAAGACATTATCATCGGTAAAACGGCGCCGATTCCACCA GACAGCGAAGAGCTTGGTCAACGGACACGAACCCACACGCGGAGAGATGTGTCAACACCCCTGAAGAGTACAGAAAGCGGTATAGTCGATCAGGTTCTGA TCACGACGAATTCGGAAGGTCAAAAGTTTGTCAAGATTCGTGTTCGATCAAGTCGTGTTCCCCAAATCGGGGACAAATTTGCATCACGCCACGGTCAGAA AGGAACTATCGGTATCACGTATCGACAAGAAGACATGCCATTCACCGCCGAAGGTATCGTTCCTGACATTATCATCAATCCCCACGCCATTCCTTCCCGC ATGACGATCGGCCACTTGGTGGAATGCCTACTATCAAAAGTTGCAACTCTGATTGGCAACGAGGGTGATGCTACGCCCTTCACGGACCTCACCGTTGAGT CGGTCTCAACTTTCTTGAGGCAAAAGGGGTACCAGTCACGCGGGCTGGAGGTGATGTTCCACGGGCACACGGGACGCAAGCTCCAGGCTCAGGNTTATCTCGGNCCTACGTACTACC

 

 

Literature:

Huson, D. H. and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254-267.

Talbot, S. L. and G. F. Shields. 1996a. A phylogeny of the bears (Ursidae) inferred from complete sequences of three mitochondrial genes. Molecular Phylogenetics and Evolution 5: 567-575.

Talbot, S. L. and G. F. Shields. 1996b. Phylogeography of brown bears (Ursus arctos) of Alaska and paraphyly within the Ursidae. Molecular Phylogenetics and Evolution 5: 477-494.