{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hyper-parameters for clustering parameters optimization (CPO)\n", "\n", "UNAGI has a built-in clustering parameters optimization strategy to maintain consistency in cluster numbers and sizes, as well as the distances between cell neighbors, across various time-points. The improper number of neighbors in the neighborhood graph or the improper resolution setting can lead to over-clustering or underclustering, introducing complications in the analysis process. The consistency in the number and size of clusters is important for tracing the lineage of cell populations through various time-points of development or disease progression. The proposed CPO method encompasses two primary steps. \n", "\n", "- Searching for the optimal number of neighbors to construct graphs with consistent cellneighbor distances across different time-points. Starting by selecting an anchor stage, which is the stage with a cell count closest to the median count of all time-points, denoted as $N_{anchor}$. Then the average distance between cells is caculated and their neighbors in this anchor stage are identified to establish the `anchor neighbor distance`. The goal for other time-points is to find a number of neighbors that yields a neighbor distance similar to that of the anchor stage. Noted that the number of neighbors should be within the pre-defined range $[N_{min}, N_{max}]$.\n", "\n", "- Determining the optimal clustering resolution. A resolution range $[R_{min},R_{max}]$ should be predefined for different time-points. CPO strategy will find a set of resolutions within the predefined range to have a similar median number of cells per cluster across time-points.\n", "\n", "By employing the CPO method, UNAGI ensures that the neighborhood graphs for different stages maintain similar cell-neighbor distances. Additionally, this approach ensures a consistent number and size of clusters across different stages, thereby enhancing the coherence and robustness of our analytical framework.\n", "\n", "Users can specify hyper-parameters described above using the function `UNAGI().register_CPO_parameters`. Parameters:\n", "\n", "- anchor_neighbors: $N_{anchor}$\n", "- max_neighbors: $N_{max}$\n", "- min_neighbors: $N_{min}$\n", "- resolution_min: $R_{min}$\n", "- resolution_max: $R_{max}$\n", "\n", "The larger number of neighbors will lead to a more sparse cell neighbors graph and potentially lead to larger clusters. On the other hand, the smaller number of neighbors will have a more condensed cell neighbor graphs which could lead to smaller clusters. For the resolution, typically increasing the resolution will lead to more clusters.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "from UNAGI import UNAGI\n", "unagi = UNAGI()\n", "\n", "#....... load the data and setup the model architecture and hyperparameters ..........#\n", "\n", "iDREM_Path = 'directory_to_iDREM_tool'\n", "\n", "anchor_neighbors = 15\n", "max_neighbors = 30\n", "min_neighbors = 10\n", "resolution_min = 0.5\n", "resolution_max = 1.2\n", "\n", "unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)\n", "unagi.run_UNAGI(iDREM_Path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Increasing the number of neighbors while freezing the $R_{min}$ and $R_{max}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "anchor_neighbors = 30\n", "max_neighbors = 40\n", "min_neighbors = 20\n", "resolution_min = 0.5\n", "resolution_max = 1.2\n", "\n", "unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)\n", "unagi.run_UNAGI(iDREM_Path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Decreasing the number of neighbors while freezing the $R_{min}$ and $R_{max}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "anchor_neighbors = 10\n", "max_neighbors = 15\n", "min_neighbors = 5\n", "resolution_min = 0.5\n", "resolution_max = 1.2\n", "\n", "unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)\n", "unagi.run_UNAGI(iDREM_Path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Increasing the $R_{min}$ and $R_{max}$ while keeping the number of neighbors." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "anchor_neighbors = 15\n", "max_neighbors = 30\n", "min_neighbors = 10\n", "resolution_min = 0.8\n", "resolution_max = 2.0\n", "\n", "unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)\n", "unagi.run_UNAGI(iDREM_Path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Decreasing the $R_{min}$ and $R_{max}$ while keeping the number of neighbors." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "anchor_neighbors = 15\n", "max_neighbors = 30\n", "min_neighbors = 10\n", "resolution_min = 0.2\n", "resolution_max = 0.8\n", "\n", "unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)\n", "unagi.run_UNAGI(iDREM_Path)" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }