Automatic cell-type harmonization and integration across Human Cell Atlas datasets

Cite

Xu, C., Prete, M., Webb, S., Jardine, L., Stewart, B.J., Hoo, R., He, P., Meyer, K.B. and Teichmann, S.A., 2023. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell, 186(26), pp.5876-5891.

https://doi.org/10.1016/j.cell.2023.11.026

Abstract

Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.

Datasets

1. Heart

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Sample ID
Age group
Leiden clustering
Litvinukova et al. 2020: 451,513 cells
Tucker et al. 2020: 287,269 cells
Koenig et al. 2022: 150,582 cells
Kuppe et al. 2022: 41,648 cells

2. Hippocampus

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Age group
Sample ID
Leiden clustering
Siletti et al. 2022: 345,593 cells
Franjic et al. 2022: 191,529 cells
Ayhan et al. 2021: 129,655 cells
Tran et al. 2021: 10,068 cells

3. Blood

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Age group
Sample ID
Leiden clustering
Ren et al. 2021: 164,945 cells
Stephenson et al. 2021: 97,008 cells
Yoshida et al. 2021: 46,964 cells
Dominguez Conde et al. 2022: 26,999 cells

4. Lung

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Age group
Sample ID
Leiden clustering
Madissoon et al. 2022: 105,135 cells
Adams et al. 2020: 95,301 cells
Travaglini et al. 2020: 60,971 cells
Madissoon et al. 2020: 57,019 cells

5. Intestine

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Age group
Leiden clustering
Sample ID
Elmentaite et al. 2021: 113,121 cells
Smillie et al. 2019: 109,884 cells
James et al. 2020: 27,680 cells
Burclaff et al. 2022: 12,590 cells

6. Liver

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Age
Leiden clustering
Age group
Sample ID
Guilliams et al. 2022: 167,587 cells
Andrews et al. 2022: 73,295 cells
Aizarani et al. 2019: 10,352 cells
MacParland et al. 2018: 8,444 cells

7. Skeletal muscle

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Perez et al. 2022: 143,051 cells
Tabula Sapiens 2022: 30,734 cells
Micheli et al. 2020: 22,058 cells
He et al. 2020: 5,732 cells

8. Spleen

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Madissoon et al. 2020: 92,049 cells
Dominguez Conde et al. 2022: 70,099 cells
Tabula Sapiens 2022: 34,004 cells
He et al. 2020: 4,512 cells

9. Kidney

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Lake et al. 2021: 105,428 cells
HCA Kidney 2022: 48,772 cells
Stewart et al. 2019: 20,319 cells
Muto et al. 2021: 19,985 cells

10. Lymph node

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Dominguez Conde et al. 2022: 82,974 cells
Szabo et al. 2019: 16,527 cells
James et al. 2020: 13,872 cells
Elmentaite et al. 2021: 11,067 cells

11. Pancreas

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Tosti et al. 2021: 79,367 cells
Tritschler et al. 2022: 26,474 cells
Fasolino et al. 2022: 13,960 cells
Muraro et al. 2016: 2,115 cells

12. Bone marrow

Data source
Subject ID
Cell subtype
CellHint harmonised group
Cell type
Assay
Disease
Sex
Tissue
Ethnicity
Age
Leiden clustering
Age group
Sample ID
Dominguez Conde et al. 2022: 39,889 cells
Tabula Sapiens 2022: 12,293 cells
Roy et al. 2021: 11,201 cells
He et al. 2020: 3,230 cells
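
The abstract quotes 38 datasets and roughly 3.7 million cells across 12 tissues, while the listings above show 48 entries (four per tissue). The following is a minimal sketch that tallies the listed counts to check those figures. It is not part of CellHint; the variable and function names are illustrative, and it assumes that entries sharing the same source label (for example, Dominguez Conde et al. 2022 under several tissues) refer to the same dataset.

```python
# Cell counts copied from the 12 tissue listings above.
# Assumption: identical source labels across tissues denote one dataset.
LISTINGS = {
    "Heart": [("Litvinukova et al. 2020", 451_513), ("Tucker et al. 2020", 287_269),
              ("Koenig et al. 2022", 150_582), ("Kuppe et al. 2022", 41_648)],
    "Hippocampus": [("Siletti et al. 2022", 345_593), ("Franjic et al. 2022", 191_529),
                    ("Ayhan et al. 2021", 129_655), ("Tran et al. 2021", 10_068)],
    "Blood": [("Ren et al. 2021", 164_945), ("Stephenson et al. 2021", 97_008),
              ("Yoshida et al. 2021", 46_964), ("Dominguez Conde et al. 2022", 26_999)],
    "Lung": [("Madissoon et al. 2022", 105_135), ("Adams et al. 2020", 95_301),
             ("Travaglini et al. 2020", 60_971), ("Madissoon et al. 2020", 57_019)],
    "Intestine": [("Elmentaite et al. 2021", 113_121), ("Smillie et al. 2019", 109_884),
                  ("James et al. 2020", 27_680), ("Burclaff et al. 2022", 12_590)],
    "Liver": [("Guilliams et al. 2022", 167_587), ("Andrews et al. 2022", 73_295),
              ("Aizarani et al. 2019", 10_352), ("MacParland et al. 2018", 8_444)],
    "Skeletal muscle": [("Perez et al. 2022", 143_051), ("Tabula Sapiens 2022", 30_734),
                        ("Micheli et al. 2020", 22_058), ("He et al. 2020", 5_732)],
    "Spleen": [("Madissoon et al. 2020", 92_049), ("Dominguez Conde et al. 2022", 70_099),
               ("Tabula Sapiens 2022", 34_004), ("He et al. 2020", 4_512)],
    "Kidney": [("Lake et al. 2021", 105_428), ("HCA Kidney 2022", 48_772),
               ("Stewart et al. 2019", 20_319), ("Muto et al. 2021", 19_985)],
    "Lymph node": [("Dominguez Conde et al. 2022", 82_974), ("Szabo et al. 2019", 16_527),
                   ("James et al. 2020", 13_872), ("Elmentaite et al. 2021", 11_067)],
    "Pancreas": [("Tosti et al. 2021", 79_367), ("Tritschler et al. 2022", 26_474),
                 ("Fasolino et al. 2022", 13_960), ("Muraro et al. 2016", 2_115)],
    "Bone marrow": [("Dominguez Conde et al. 2022", 39_889), ("Tabula Sapiens 2022", 12_293),
                    ("Roy et al. 2021", 11_201), ("He et al. 2020", 3_230)],
}


def summarize(listings):
    """Return per-tissue cell totals, the grand total, and the distinct source labels."""
    per_tissue = {tissue: sum(n for _, n in rows) for tissue, rows in listings.items()}
    grand_total = sum(per_tissue.values())
    sources = {label for rows in listings.values() for label, _ in rows}
    return per_tissue, grand_total, sources


per_tissue, grand_total, sources = summarize(LISTINGS)
n_listings = sum(len(rows) for rows in LISTINGS.values())

for tissue, n in per_tissue.items():
    print(f"{tissue:16s}{n:>10,}")
print(f"{'Total':16s}{grand_total:>10,}")          # 3,694,864
print(f"{n_listings} listings, {len(sources)} distinct source labels")  # 48 listings, 38
```

Under that assumption, the listed counts sum to 3,694,864 cells from 38 distinct source labels, which is consistent with the "∼3.7 million cells" and "38 datasets" quoted in the abstract.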

Alias names

PRJEB50820