{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# In-silico perturbation for customized pathway databases\n", "UNAGI allows users to use their own pathway databases or systematically perturb some pathways of their interests. Here is the guidance to show the format of pathway database that UNAGI can recognize." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Built-in pathway databases \n", "We downloaded pathway database from [GSEA](https://www.gsea-msigdb.org/gsea/index.jsp), it includes pathways from REACTOME, MatrisomeDB, and KEGG. Then the database was parsed into a `.npy` file. The format of built-in pathway database is shown as below:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['BIOCARTA_GRANULOCYTES_PATHWAY', 'BIOCARTA_LYM_PATHWAY', 'BIOCARTA_BLYMPHOCYTE_PATHWAY', 'BIOCARTA_CARM_ER_PATHWAY', 'BIOCARTA_LAIR_PATHWAY', 'BIOCARTA_VDR_PATHWAY', 'BIOCARTA_MTA3_PATHWAY', 'BIOCARTA_GABA_PATHWAY', 'BIOCARTA_EGFR_SMRTE_PATHWAY', 'BIOCARTA_MONOCYTE_PATHWAY']\n", "['CXCL8', 'IFNG', 'IL1A', 'CSF3', 'SELP', 'ITGAM', 'ITGAL', 'TNF', 'ITGB2', 'PECAM1', 'ICAM2', 'C5', 'SELPLG', 'ICAM1', 'SELL']\n" ] } ], "source": [ "import numpy as np\n", "built_in_pathway_data = np.load('../data/gesa_pathways.npy',allow_pickle=True).item()\n", "\n", "#The keys are the pathway names\n", "print(list(built_in_pathway_data.keys())[:10]) # show first 10 pathways\n", "#The values are the gene sets\n", "print(built_in_pathway_data['BIOCARTA_GRANULOCYTES_PATHWAY'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Customize pathway database\n", "As long as the pathway database follows the previous `.npy` file format, UNAGI is able to recognize the customized database. Here is an example: " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "customized_pathway_database = {}\n", "customized_pathway_database['Pathway_A'] = ['COL6A3', 'MET', 'COL7A1', 'MMP1', 'COL11A1', 'COL1A2', 'COL5A2', 'COL4A3', 'COL12A1', 'COL10A1', 'COL5A1', 'COL3A1', 'COL4A4', 'COL14A1', 'COL8A1', 'MMP9', 'COL4A1', 'MMP7', 'COL15A1', 'COL1A1', 'COL17A1', 'COL4A6']\n", "customized_pathway_database['Pathway_B'] = ['MAP2','THBS1',]\n", "np.save('customized_pathway_database.npy',customized_pathway_database)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform in-silico pathway perturbation on the customized pathway database\n", "Genes that are not overlapped with the input single data will be ignored." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/yumin/anaconda3/envs/test_unagi/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "calculateDataPathwayOverlapGene done\n", "Start perturbation....\n", "track: 10\n", "processing....\n", "track: 11\n", "processing....\n", "track: 0\n", "processing....\n", "track: 1\n", "processing....\n", "track: 4\n", "processing....\n", "track: 8\n", "processing....\n", "track: 9\n", "processing....\n", "track: 3\n", "processing....\n", "track: 5\n", "processing....\n", "track: 6\n", "processing....\n", "track: 2\n", "processing....\n", "track: 7\n", "processing....\n", "random background done\n", "Finish results analysis\n" ] } ], "source": [ "from UNAGI import UNAGI\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "unagi = UNAGI()\n", "data_path = 'PATH_TO_TARGET/dataset.h5ad'\n", "iteration = 0 #which iteration of the model to use\n", "change_level = 0.5 #reduce the expression to 50% of the original value\n", "customized_pathway = 'PATH_TO/customized_pathway_database.npy'\n", "results = unagi.customize_pathway_perturbation(data_path,iteration,customized_pathway,change_level,target_dir='PATH_TO_TARGET',device='cuda:0')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pathwaysperturbation scorepval_adjustedregulated genesidrem_suggestion
0Pathway_A0.4736840.000713[COL6A3, MET, COL7A1, MMP1, COL11A1, COL1A2, C...[COL6A3:-, MET:-, COL7A1:-, MMP1:-, COL11A1:-,...
\n", "
" ], "text/plain": [ " pathways perturbation score pval_adjusted \\\n", "0 Pathway_A 0.473684 0.000713 \n", "\n", " regulated genes \\\n", "0 [COL6A3, MET, COL7A1, MMP1, COL11A1, COL1A2, C... \n", "\n", " idrem_suggestion \n", "0 [COL6A3:-, MET:-, COL7A1:-, MMP1:-, COL11A1:-,... " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ " from UNAGI.perturbations import get_top_pathways\n", "get_top_pathways(results, change_level, top_n=10)" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }