Sequence Alignment - Michael's Notes

# Sequence Alignment

Suppose $M$ is a given alignment between $X$ and $Y$. The goal is to find the alignment with the minimal cost, aka the *optimal alignment*. The cost of $M$ is the sum of gap and mismatch costs:

- Gap Penalty: Every
- Mismatch Cost: For each pair of letters $p,q$ in the alphabet, there is a mismatch cost $\alpha_{pq}$ for lining up $p$ and $q$. (Assume that $\alpha_{pp}=0$.)

There are three possible situations we can encounter when comparing characters:

- Alignment: They are identical
- Mismatch

## Types of Alignments

### Global vs Local Alignment

Sequence alignment is the arrangement of biological sequences to identify regions of similarity and help identify any structural and functional overlap.

- Sequences from a sample are often aligned with sequences of a reference genome to identify
- **Global Alignment:** Aims to align every residue in every sequence from start to end
- **Local Alignment:** Aims to align parts of the sequence which share the highest similarity
- Typically uses

![[Pasted image 20240220055256.png|200]]

#### Operational Taxonomic Unit (OTU)

> https://www.cd-genomics.com/microbioseq/operational-taxonomic-unit-otu-and-otu-clustering.html
> https://www.zymoresearch.com/blogs/blog/microbiome-informatics-otu-vs-asv

Ex: 16S rRNA-Seq,

### Multiple vs Pairwise Alignment

[Pairwise vs. Multiple Sequence Alignments: Which has better accuracy?](https://www.biostars.org/p/114718/#114779)

## Scoring/Alignment Matrices
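
The cost definition at the top of these notes (gap penalty plus mismatch costs $\alpha_{pq}$) can be turned into a scoring matrix with a standard dynamic program. Below is a minimal sketch of that idea, not a definitive implementation. Assumptions that are not in the notes: the function name `alignment_cost_matrix`, a single constant gap penalty `gap` applied to every unmatched position, and a default mismatch cost of 1 for any two different letters (with $\alpha_{pp}=0$ as stated above). Entry `cost[i][j]` holds the minimal cost of globally aligning the first `i` letters of `x` with the first `j` letters of `y`.

```python
def alignment_cost_matrix(x, y, gap=2, mismatch=None):
    """Fill the matrix of minimal global alignment costs for prefixes of x and y.

    gap      -- assumed constant penalty per unmatched position (hypothetical default)
    mismatch -- function (p, q) -> cost alpha_pq; defaults to 0 if p == q, else 1
    """
    if mismatch is None:
        # Assumed default: identical letters cost 0, any other pair costs 1.
        mismatch = lambda p, q: 0 if p == q else 1

    m, n = len(x), len(y)
    cost = [[0] * (n + 1) for _ in range(m + 1)]

    # Aligning a prefix against the empty string means every letter is a gap.
    for i in range(1, m + 1):
        cost[i][0] = i * gap
    for j in range(1, n + 1):
        cost[0][j] = j * gap

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch(x[i - 1], y[j - 1]),  # line up x[i-1] with y[j-1]
                cost[i - 1][j] + gap,  # x[i-1] is left unmatched
                cost[i][j - 1] + gap,  # y[j-1] is left unmatched
            )
    return cost


def optimal_alignment_cost(x, y, gap=2, mismatch=None):
    """Return the bottom-right entry: the minimal cost of aligning all of x with all of y."""
    return alignment_cost_matrix(x, y, gap, mismatch)[len(x)][len(y)]
```

With the assumed defaults, `optimal_alignment_cost("ab", "b")` is 2: leaving `a` unmatched (one gap) and lining up `b` with `b` is cheaper than a mismatch plus a gap. Swapping in a different `mismatch` function is how a substitution table would plug into this sketch.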