Hyper-parameters for clustering parameters optimization (CPO)
UNAGI has a built-in clustering parameters optimization (CPO) strategy that keeps the number and size of clusters, as well as the distances between neighboring cells, consistent across time-points. An improper number of neighbors in the neighborhood graph or an improper resolution setting can lead to over-clustering or under-clustering, which complicates downstream analysis. Consistency in the number and size of clusters is important for tracing the lineage of cell populations through the time-points of development or disease progression. The CPO method consists of two primary steps.
1. Searching for the optimal number of neighbors to construct graphs with consistent cell-neighbor distances across time-points. CPO first selects an anchor stage, which is the stage whose cell count is closest to the median cell count of all time-points. The average distance between cells and their \(N_{anchor}\) nearest neighbors in this anchor stage is then calculated to establish the anchor neighbor distance. For every other time-point, the goal is to find a number of neighbors that yields a neighbor distance similar to that of the anchor stage. Note that the number of neighbors must lie within the pre-defined range \([N_{min}, N_{max}]\).
2. Determining the optimal clustering resolution. A resolution range \([R_{min}, R_{max}]\) should be predefined for the time-points. CPO then finds a set of resolutions within this range that gives a similar median number of cells per cluster across time-points.
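The neighbor search in the first step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not UNAGI's implementation: the helper names (`mean_knn_distance`, `pick_anchor_stage`, `search_neighbors`), the brute-force distance computation, and the exhaustive scan over \([N_{min}, N_{max}]\) are all hypothetical choices made for clarity.

```python
import math
from statistics import median


def mean_knn_distance(points, k):
    """Average Euclidean distance from each point to its k nearest neighbors."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k]) / k
    return total / len(points)


def pick_anchor_stage(stages):
    """Index of the stage whose cell count is closest to the median count."""
    counts = [len(cells) for cells in stages]
    med = median(counts)
    return min(range(len(stages)), key=lambda i: abs(counts[i] - med))


def search_neighbors(stages, n_anchor, n_min, n_max):
    """Pick, per stage, the k in [n_min, n_max] whose mean neighbor distance
    is closest to the anchor stage's distance at k = n_anchor (assumed rule)."""
    anchor = pick_anchor_stage(stages)
    target = mean_knn_distance(stages[anchor], n_anchor)
    chosen = []
    for i, cells in enumerate(stages):
        if i == anchor:
            chosen.append(n_anchor)
            continue
        upper = min(n_max, len(cells) - 1)  # k cannot exceed the other cells available
        best = min(range(n_min, upper + 1),
                   key=lambda k: abs(mean_knn_distance(cells, k) - target))
        chosen.append(best)
    return anchor, chosen


# Toy data: three stages of 1-D "cells" with different densities.
sparse = [(2.0 * i,) for i in range(6)]
medium = [(1.0 * i,) for i in range(8)]
dense = [(0.5 * i,) for i in range(12)]
anchor, neighbors = search_neighbors([sparse, medium, dense],
                                     n_anchor=2, n_min=1, n_max=4)
```

In this toy example the medium stage becomes the anchor, the sparsest stage receives the fewest neighbors, and the densest stage receives the most, because a denser stage needs more neighbors to reach the same average neighbor distance.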
By employing the CPO method, UNAGI ensures that the neighborhood graphs for different stages maintain similar cell-neighbor distances. Additionally, this approach ensures a consistent number and size of clusters across different stages, thereby enhancing the coherence and robustness of our analytical framework.
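The resolution search in the second step, which is what keeps cluster sizes comparable, can be sketched in the same spirit. Everything here is an assumption for illustration: `toy_cluster` is a stand-in for a real graph clustering call, `target_size` is a hypothetical target for the median number of cells per cluster, and the evenly spaced grid over \([R_{min}, R_{max}]\) is just one simple search strategy.

```python
from collections import Counter
from statistics import median


def toy_cluster(values, resolution):
    """Stand-in for a real graph clustering call: bins 1-D values into
    intervals of width 1/resolution, so a higher resolution gives more clusters."""
    width = 1.0 / resolution
    return [int(v // width) for v in values]


def median_cluster_size(labels):
    """Median number of cells per cluster."""
    return median(Counter(labels).values())


def search_resolutions(stages, target_size, r_min, r_max, steps=4,
                       cluster_fn=toy_cluster):
    """For each stage, scan an evenly spaced grid in [r_min, r_max] and keep the
    resolution whose median cluster size is closest to target_size."""
    grid = [r_min + i * (r_max - r_min) / (steps - 1) for i in range(steps)]
    chosen = []
    for values in stages:
        best = min(grid,
                   key=lambda r: abs(median_cluster_size(cluster_fn(values, r))
                                     - target_size))
        chosen.append(best)
    return chosen


# Toy data: the second stage is twice as dense as the first.
stage_a = [(i + 0.5) * 0.25 for i in range(16)]
stage_b = [(i + 0.5) * 0.125 for i in range(32)]
resolutions = search_resolutions([stage_a, stage_b], target_size=4,
                                 r_min=0.5, r_max=2.0)
```

With these toy inputs the denser stage ends up with the higher resolution, so both stages reach the same median cluster size.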
Users can specify the hyper-parameters described above with the function UNAGI().register_CPO_parameters. Parameters:
anchor_neighbors: \(N_{anchor}\)
max_neighbors: \(N_{max}\)
min_neighbors: \(N_{min}\)
resolution_min: \(R_{min}\)
resolution_max: \(R_{max}\)
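Before registering these values, a quick consistency check can catch swapped bounds. The helper below is a hypothetical sketch and not part of UNAGI. In particular, the requirement that `anchor_neighbors` falls inside `[min_neighbors, max_neighbors]` is an assumption based on the ranges described above, not a documented constraint.

```python
def validate_cpo_parameters(anchor_neighbors, max_neighbors, min_neighbors,
                            resolution_min, resolution_max):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    if min_neighbors < 1:
        problems.append("min_neighbors must be at least 1")
    if min_neighbors > max_neighbors:
        problems.append("min_neighbors must not exceed max_neighbors")
    # Assumption: the anchor stage uses a neighbor count inside the search range.
    if not (min_neighbors <= anchor_neighbors <= max_neighbors):
        problems.append("anchor_neighbors is assumed to lie in "
                        "[min_neighbors, max_neighbors]")
    if resolution_min <= 0:
        problems.append("resolution_min must be positive")
    if resolution_min > resolution_max:
        problems.append("resolution_min must not exceed resolution_max")
    return problems


params = dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
              resolution_min=0.5, resolution_max=1.2)
issues = validate_cpo_parameters(**params)
```

An empty `issues` list means the same dictionary can be passed on as keyword arguments to `register_CPO_parameters`.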
A larger number of neighbors links each cell to more, and more distant, cells, which smooths the cell neighbor graph and tends to produce larger clusters. A smaller number of neighbors keeps the graph more local, which tends to produce smaller clusters. For the resolution, increasing the value typically leads to more clusters.
import warnings
warnings.filterwarnings("ignore")
from UNAGI import UNAGI
unagi = UNAGI()
#....... load the data and setup the model architecture and hyperparameters ..........#
iDREM_Path = 'directory_to_iDREM_tool'
anchor_neighbors = 15
max_neighbors = 30
min_neighbors = 10
resolution_min = 0.5
resolution_max = 1.2
unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
Increasing the number of neighbors while keeping \(R_{min}\) and \(R_{max}\) fixed.
anchor_neighbors = 30
max_neighbors = 40
min_neighbors = 20
resolution_min = 0.5
resolution_max = 1.2
unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
Decreasing the number of neighbors while keeping \(R_{min}\) and \(R_{max}\) fixed.
anchor_neighbors = 10
max_neighbors = 15
min_neighbors = 5
resolution_min = 0.5
resolution_max = 1.2
unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
Increasing \(R_{min}\) and \(R_{max}\) while keeping the number of neighbors fixed.
anchor_neighbors = 15
max_neighbors = 30
min_neighbors = 10
resolution_min = 0.8
resolution_max = 2.0
unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
Decreasing \(R_{min}\) and \(R_{max}\) while keeping the number of neighbors fixed.
anchor_neighbors = 15
max_neighbors = 30
min_neighbors = 10
resolution_min = 0.2
resolution_max = 0.8
unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
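The five configurations above differ only in their CPO values, so they can be collected in one place and run in a loop. The sketch below is one possible arrangement rather than a recommended workflow: `CPO_SETTINGS`, `run_cpo_sweep`, and the `make_model` factory are hypothetical names, and `make_model` is assumed to return a UNAGI object with data and model hyper-parameters already set up.

```python
# CPO values copied from the five examples above.
CPO_SETTINGS = {
    "baseline": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                     resolution_min=0.5, resolution_max=1.2),
    "more_neighbors": dict(anchor_neighbors=30, max_neighbors=40, min_neighbors=20,
                           resolution_min=0.5, resolution_max=1.2),
    "fewer_neighbors": dict(anchor_neighbors=10, max_neighbors=15, min_neighbors=5,
                            resolution_min=0.5, resolution_max=1.2),
    "higher_resolution": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                              resolution_min=0.8, resolution_max=2.0),
    "lower_resolution": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                             resolution_min=0.2, resolution_max=0.8),
}


def run_cpo_sweep(make_model, idrem_path, settings=CPO_SETTINGS):
    """Run one UNAGI job per named CPO configuration.

    make_model is a hypothetical zero-argument factory that returns a UNAGI
    object with data loaded and the model configured.
    """
    completed = []
    for name, params in settings.items():
        model = make_model()
        model.register_CPO_parameters(**params)
        model.run_UNAGI(idrem_path)
        completed.append(name)
    return completed
```

Creating a fresh object per configuration keeps the runs independent. Whether successive runs share or overwrite output locations depends on how each object is set up, so that is worth checking before launching a sweep.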