# Integration of Single Cell Data
> [!NOTE]- Resources
> - [Comprehensive Integration of Single Cell Data—Rahul Satija](https://www.youtube.com/watch?v=omU_-ExMYIE&list=PL22s6k9bpeocCbGG9mjuIJqLlduo1At5E&index=1)
Cells cluster by their biological variation, but also by any technical variation present.
Without correction the two are confounded: cells of the same type can separate by batch rather than by biology.
This problem is similar to sequence alignment: to identify what differs between two datasets, you begin by identifying points of similarity and use them as *anchors*.
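One common way to find such points of similarity is to look for mutual nearest neighbors (MNNs): pairs of cells, one from each dataset, that each rank the other among their closest cross-dataset neighbors. The sketch below is only an illustration on simulated data, not the implementation used by any specific package; the function name `mutual_nearest_neighbors`, the toy "cell types", and the choice of `k=3` are all assumptions made for the example.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=3):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # For each cell in a: indices of its k nearest cells in b.
    nn_ab = np.argsort(d, axis=1)[:, :k]
    # For each cell in b: indices of its k nearest cells in a.
    nn_ba = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs

rng = np.random.default_rng(0)

# Two hypothetical cell types in a 5-dimensional expression space.
centers = np.array([[0.0] * 5, [6.0] * 5])
labels_a = np.repeat([0, 1], 20)
labels_b = np.repeat([0, 1], 20)

batch_a = centers[labels_a] + rng.normal(scale=0.5, size=(40, 5))
# Batch B contains the same cell types plus a constant technical shift.
shift = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
batch_b = centers[labels_b] + rng.normal(scale=0.5, size=(40, 5)) + shift

anchors = mutual_nearest_neighbors(batch_a, batch_b, k=3)
# Fraction of anchor pairs that link cells of the same simulated type.
same_type = np.mean([labels_a[i] == labels_b[j] for i, j in anchors])
print(len(anchors), same_type)
```

In this toy setting the anchors link cells of the same type despite the shift, which is the property that lets the differences between anchored cells be read as an estimate of the batch effect.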
**Batch Effect:** Systematic variability introduced during the experimental process that is unrelated to the biological differences between individual cells.
---
**Training & Embedding of Models**
- Also known as *batch correction* or *transfer learning*
- Once the initial dataset has been used to train a model, the model captures the underlying structure and relationships between cells. Reusing this model allows us to embed new data into the same dimension-reduced space
	- Benefits: enhanced comparability and reduced computational burden
- [ ] While it makes sense to embed highly similar datasets (i.e. same tissue source or disease state), could we learn new information by trying to embed distant sample types?
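As a minimal sketch of "train once, then embed new data into the same space", the example below fits a PCA on a reference matrix and reuses the fitted mean and components to project a query matrix. PCA is a stand-in chosen for simplicity, not the model any particular integration package uses; the names `fit_pca` and `embed` and the random matrices are assumptions made for illustration.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Learn a PCA model (gene means and principal axes) from a cells x genes matrix."""
    mean = reference.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def embed(cells, mean, components):
    """Project cells into the space learned from the reference."""
    return (cells - mean) @ components.T

rng = np.random.default_rng(1)
reference = rng.normal(size=(200, 50))  # hypothetical reference: 200 cells x 50 genes
query = rng.normal(size=(30, 50))       # hypothetical new dataset, same 50 genes

mean, components = fit_pca(reference, n_components=10)
ref_embedding = embed(reference, mean, components)
query_embedding = embed(query, mean, components)  # no refitting required
print(ref_embedding.shape, query_embedding.shape)
```

The query is centered with the reference means rather than its own, so both embeddings share one coordinate system. Note that the projection alone does not remove a batch effect; it only makes the two datasets directly comparable.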
The value of batch correction is that *it enables you to see population heterogeneity* within clusters/cell types *across batches*.
**Cross-Dataset Type Integration**
- [scATAC-Seq x 10X Genomics Multiome Integration](https://stuartlab.org/signac/articles/integrate_atac)
## Correction Techniques
### Anchors
### Unsupervised Approaches
One such method is called "unsupervised domain adaptation" or "domain adaptation without alignment," where the aim is to learn a common representation space for different domains without using labeled data or explicit alignment. Techniques like adversarial domain adaptation and domain adversarial neural networks fall under this category. These methods attempt to align the distributions of different domains in an unsupervised manner, without relying on specific anchor points or labels.
## Packages for Batch Effect Correction
- [Review article on existing batch correction techniques](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9)
- [Batch Effect Correction (10X Genomics)](https://www.10xgenomics.com/analysis-guides/introduction-batch-effect-correction)
### SCTransform (Seurat)
CCA
#### Signac (ATAC + scRNA)
[Joint RNA and ATAC analysis: 10x multiomic](https://stuartlab.org/signac/articles/pbmc_multiomic)
### Harmony (Broad Institute)
[Harmony in motion: visualize an iterative algorithm for aligning multiple datasets](https://slowkow.com/notes/harmony-animation/)
### fastMNN (batchelor)
[Mutual Nearest Neighbor Correction (fastMNN)](https://bioinformatics-core-shared-training.github.io/UnivCambridge_ScRnaSeq_Nov2021/Slides/07_DataIntegrationAndBatchCorrectionSlides.html#8)
![[Single Cell Integration Anchors.png|500]]
- Violin Plot
Why would we use single-cell sequencing vs. bulk sequencing?
- What established/characterized relationships exist between microbiota and immune cells that we can use as anchors?
- Do any joint dimension reduction/integration techniques exist that don't require a traditional anchor?
- We need to identify correlations between the gut microbiome and gene expression within immune cells (as a starting point; we will also need to integrate scATAC-Seq and other techniques in the future).
## Joint Dimension Reduction
Canonical correlation analysis (CCA)
Must be single cell ()
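To make the idea of a shared low-dimensional space concrete, here is a simplified CCA-style sketch: each cell is standardized across a shared gene set, and an SVD of the cross-dataset cell-by-cell similarity matrix yields paired directions along which the two datasets co-vary. This is a simplification that ignores the within-dataset covariance used in classical CCA, and it is not meant to reproduce any package's implementation; the function name `simple_cca` and the simulated gene programs are assumptions for the example.

```python
import numpy as np

def simple_cca(x, y, n_components=2):
    """x: genes x cells_1, y: genes x cells_2; rows must be the same genes in the same order."""
    # Standardize each cell (column) across genes.
    xs = (x - x.mean(axis=0)) / x.std(axis=0)
    ys = (y - y.mean(axis=0)) / y.std(axis=0)
    # Cross-dataset similarity between every cell in x and every cell in y.
    cross = xs.T @ ys
    u, s, vt = np.linalg.svd(cross, full_matrices=False)
    # Columns of u and v give each dataset's cells coordinates on shared components.
    return u[:, :n_components], vt[:n_components].T, s[:n_components]

rng = np.random.default_rng(2)
# Two hypothetical gene programs shared by both datasets (100 genes).
programs = rng.normal(size=(100, 2))
x = programs @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(100, 30))
# The second dataset has different cells and a constant offset.
y = programs @ rng.normal(size=(2, 40)) + 0.1 * rng.normal(size=(100, 40)) + 0.5

cells_1_cc, cells_2_cc, strengths = simple_cca(x, y, n_components=2)
print(cells_1_cc.shape, cells_2_cc.shape)
```

Because the similarity matrix is computed cell against cell over shared genes, the two datasets need the same features but not the same cells.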
![[Pasted image 20240322081710.png|450]]
[aaaa](zotero://select/groups/5402523/items/NII4JU35)