{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Configure the model architecture and training parameters\n", "\n", "This tutorial provides a step-by-step guide on configuring the model architectures, training hyperparameters, and analysis of time-series single dataset using UNAGI. We demonstrate the capabilities of UNAGI by applying it to scRNA-seq data sampled from a single-nuclei RNA sequencing data.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "from UNAGI import UNAGI\n", "unagi = UNAGI()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Setup and load the datasets\n", "\n", "After loading UNAGI package, we need to setup the data for UNAGI training.\n", "\n", "- We need to specify the data path of your h5ad files after stage segmentation. e.g. '../data/small/0.h5ad'. Then UNAGI will load all h5ad files in the target directory. \n", "\n", "- UNAGI requires the total number of time-points the dataset has as the input. e.g. total_stage=4\n", "\n", "- UNAGI requires the key of time-points attribute in the annData.obs table.\n", "\n", "- If the dataset is not splited into individual stages, you can specify the splited_dataset as False to segment the dataset.\n", "\n", "- To build the K-Nearest Neighbors (KNN) connectivity matrix in Graph convolution training, the neighbors number of KNN should be defined. The default value is 25. \n", "\n", "- You can also specify how many threads you want to use when using UNAGI. The default number of threads is 20. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "mkdir: cannot create directory ‘../UNAGI/data/example/0’: File exists\n" ] } ], "source": [ "unagi.setup_data('../UNAGI/data/example',total_stage=4,stage_key='stage')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Configure the model architecture of UNAGI and training hyper-parameters\n", "\n", "First, it's mandatory to specify the **task** your are executing. (e.g. we call the example dataset as task='small_sample') The **task** is the identifier of your experiments and you can reterive the trained model and the results of each iteration at '../data/**task**/' directory. \n", "\n", "Next, you will need to specify the distribution of you single cell data. UNAGI provides negative binomial (NB), zero-inflated negative binomial (ZINB), zero-inflated log normal, and normal distribution to model your single cell data.\n", "\n", "You can use the *device* keyword to specify the device you want to use for training.\n", "\n", "'epoch_initial': the number of training epochs for the first iteration.\n", "\n", "'epoch_iter': the number of training epochs for the iterative training.\n", "\n", "'max_iter': the total number of iterations UNAGI will run\n", "\n", "'BATCHSIZE': the batch size of a mini-batch\n", "\n", "'lr': the learning rate of Graph VAE\n", "\n", "'lr_dis': the learning rate of the adversarial discriminator\n", "\n", "'latent_dim': the dimension of Z space\n", "\n", "'hiddem_dim': the neuron size of each fully connected layers\n", "\n", "'graph_dim': the dimension of graph representation\n", "\n", "After settingt the training hyper parameters and model architectures, you can use `unagi.run_UNAGI()` to start training. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", "0\n", "loss 9241.198552284153\n", "[epoch 000] average training loss: 9241.1986\n", "(13550, 2484)\n", "top gene\n", "done\n", "write stageadata\n", "update done\n", "(4195, 2484)\n", "top gene\n", "done\n", "write stageadata\n", "update done\n", "(3152, 2484)\n", "top gene\n", "done\n", "write stageadata\n", "update done\n", "(6750, 2484)\n", "top gene\n", "done\n", "write stageadata\n", "update done\n", "edges updated\n", "[[[3], [0], [2, 3, 15], [0, 6, 7, 8, 12]], [[5], [5], [4], [5]]]\n", "b''\n", "[[[3], [0], [2, 3, 15], [0, 6, 7, 8, 12]], [[5], [5], [4], [5]]]\n", "['3', '0', '2n3n15', '0n6n7n8n12']\n", "['5', '5', '4', '5']\n", "['5-5-4-5.txt', '3-0-2n3n15-0n6n7n8n12.txt', '3-0-0n11-0n10n15n16n22.txt', '3-1-1n3-0n4n5n7.txt', '4-5-6-6.txt', '6-10-5-6.txt', '9-14-12-4.txt', '5-4-4-3n9.txt', '7-8n17-6n7-7.txt', '6-6-7-8.txt']\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "mkdir: cannot create directory ‘../UNAGI/data/example/0/idremInput’: File exists\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "b\"java.lang.IllegalArgumentException: WARNING: 'TF-Minimum_Absolute_Log_Ratio_Expression' is an unrecognized variable.\\n\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.parseDefaults(DREM_IO_Batch.java:916)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:233)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:206)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.main(DREM_IO.java:5613)\\n\"\n", "b\"java.lang.IllegalArgumentException: WARNING: 'TF-Minimum_Absolute_Log_Ratio_Expression' is an unrecognized variable.\\n\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.parseDefaults(DREM_IO_Batch.java:916)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:233)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:206)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.main(DREM_IO.java:5613)\\n\"\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "b'java.lang.IllegalArgumentException: All Genes Filtered\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesgeneral(DataSetCore.java:902)\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesthreshold2change(DataSetCore.java:1011)\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesthreshold2(DataSetCore.java:979)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.buildset(DREM_IO.java:1920)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.clusterscript(DREM_IO_Batch.java:1313)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:264)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:206)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.main(DREM_IO.java:5613)\\n'\n", "b'0\\n1\\n2\\n3\\n4\\n5\\n6\\n7\\n8\\n9\\n10\\n11\\n12\\nwriting Json..\\nTime: 30157ms\\n'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "b'java.lang.IllegalArgumentException: All Genes Filtered\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesgeneral(DataSetCore.java:902)\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesthreshold2change(DataSetCore.java:1011)\\n\\tat edu.cmu.cs.sb.core.DataSetCore.filtergenesthreshold2(DataSetCore.java:979)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.buildset(DREM_IO.java:1920)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.clusterscript(DREM_IO_Batch.java:1313)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:264)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:206)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.main(DREM_IO.java:5613)\\n'\n", "b'0\\n1\\n2\\n3\\n4\\n5\\n6\\n7\\n8\\n9\\n10\\n11\\nwriting Json..\\nTime: 27588ms\\n'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "b'0\\n1\\n2\\n3\\n4\\n5\\n6\\n7\\n8\\n9\\n10\\nwriting Json..\\nTime: 30188ms\\n'\n", "b'0\\n1\\n2\\n3\\n4\\n5\\n6\\n7\\n8\\n9\\n10\\n11\\n12\\n13\\n14\\nwriting Json..\\nTime: 33155ms\\n'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n", "Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "b'## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\n## NaN 0 0.0 0.0 0.0 0.0 -10.0 NaN\\njava.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 3\\n\\tat edu.cmu.cs.sb.drem.DREM_Timeiohmm.deleteMinPath(DREM_Timeiohmm.java:2382)\\n\\tat edu.cmu.cs.sb.drem.DREM_Timeiohmm.(DREM_Timeiohmm.java:676)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.clusterscript(DREM_IO_Batch.java:1359)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:264)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO_Batch.(DREM_IO_Batch.java:206)\\n\\tat edu.cmu.cs.sb.drem.DREM_IO.main(DREM_IO.java:5613)\\n'\n", "b'0\\n1\\n2\\n3\\n4\\n5\\n6\\n7\\n8\\n9\\n10\\n11\\n12\\n13\\n14\\n15\\nwriting Json..\\nTime: 32050ms\\n'\n", "/mnt/md0/yumin/to_upload/UNAGI/tutorials\n", "b''\n", "b''\n", "b''\n", "idrem Done\n", "getting TFs from 3-0-0n11-0n10n15n16n22.txt_viz\n", "getting TFs from 5-4-4-3n9.txt_viz\n", "getting TFs from 9-14-12-4.txt_viz\n", "getting TFs from 6-10-5-6.txt_viz\n", "getting TFs from 7-8n17-6n7-7.txt_viz\n", "getting Target genes from 3-0-0n11-0n10n15n16n22.txt_viz\n", "getting Target genes from 5-4-4-3n9.txt_viz\n", "getting Target genes from 9-14-12-4.txt_viz\n", "getting Target genes from 6-10-5-6.txt_viz\n", "getting Target genes from 7-8n17-6n7-7.txt_viz\n", "number of idrem file 0\n", "stage 1\n", "number of idrem file 1\n", "stage 1\n", "number of idrem file 2\n", "stage 1\n", "number of idrem file 3\n", "stage 1\n", "number of idrem file 4\n", "stage 1\n", "number of idrem file 4\n", "stage 1\n", "number of idrem file 0\n", "stage 2\n", "number of idrem file 0\n", "stage 2\n", "number of idrem file 1\n", "stage 2\n", "number of idrem file 2\n", "stage 2\n", "number of idrem file 3\n", "stage 2\n", "number of idrem file 4\n", "stage 2\n", "number of idrem file 4\n", "stage 2\n", "number of idrem file 0\n", "stage 3\n", "number of idrem file 0\n", "stage 3\n", "number of idrem file 0\n", "stage 3\n", "number of idrem file 0\n", "stage 3\n", "number of idrem file 0\n", "stage 3\n", "number of idrem file 1\n", "stage 3\n", "number of idrem file 1\n", "stage 3\n", "number of idrem file 2\n", "stage 3\n", "number of idrem file 3\n", "stage 3\n", "number of idrem file 4\n", "stage 3\n", "27646\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "mkdir: cannot create directory ‘../UNAGI/data/example/1’: File exists\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "...\n", "load last iteration model.....\n", "0\n", "loss 3684.8589019604296\n", "[epoch 000] average training loss: 3684.8589\n", "1\n", "loss 2963.5102881867833\n", "[epoch 001] average training loss: 2963.5103\n", "2\n", "loss 2694.852858131081\n", "[epoch 002] average training loss: 2694.8529\n", "3\n", "loss 2546.214904080913\n", "[epoch 003] average training loss: 2546.2149\n", "4\n", "loss 2461.9431945599886\n", "[epoch 004] average training loss: 2461.9432\n", "(13550, 2484)\n", "top gene\n", "done\n" ] } ], "source": [ "unagi.setup_training('example',dist='ziln',device='cuda:0',GPU=True,epoch_iter=5,epoch_initial=1,max_iter=3,BATCHSIZE=560)\n", "unagi.run_UNAGI(idrem_dir = '../../idrem')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Perform in-silico perturbations and downstream analysis\n", "\n", "After training the UNAGI model, you can perfrom downstream tasks including hierarchical static marker discovries \n", "parameters: \n", "data_path: the directory of the dataset generated by UNAGI\n", "iteration: the iteration of the dataset belongs to\n", "progressionmarker_background_sampling_times: the number of sampling times to generate the dynamic marker backgrounds\n", "target_dir: the directory to store the downstream analysis results and h5ad files\n", "customized_drug: the directory to customized drug profile\n", "cmap_dir: the directory to the precomputed CMAP database which contains the drug/compounds and their regualted genes and regualated directions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "from UNAGI import UNAGI\n", "unagi = UNAGI()\n", "unagi.analyse_UNAGI('../UNAGI/data/example/2/stagedata/org_dataset.h5ad',2,10,target_dir=None,customized_drug='../UNAGI/data/jasper_target_pair.npy',cmap_dir='../../CMAPDirectionDf.npy')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "wks", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 2 }