updated notebook with correct usage

Files changed (1)

test_ablang2_HF_implementation.ipynb +32 -63

test_ablang2_HF_implementation.ipynb CHANGED

@@ -10,14 +10,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
    "id": "7ae54cd0-6253-46dd-a316-4f20b12041e0",
    "metadata": {},
    "outputs": [],
    "source": [
-    "import numpy as np \n",
-    "from transformers import AutoTokenizer, AutoModel\n",
-    "from ablang2.adapter import AbLang2PairedHuggingFaceAdapter"
    ]
   },
   {
@@ -38,7 +40,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
    "id": "99192978-a008-4a32-a80e-bba238e0ec7c",
    "metadata": {},
    "outputs": [],
@@ -85,8 +87,17 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "model = AutoModel.from_pretrained(\"/hemantn/ablang2/\", trust_remote_code=True)\n",
-    "tokenizer = AutoTokenizer.from_pretrained(\"/hemantn/ablang2/\", trust_remote_code=True)\n",
     "ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)"
    ]
   },
@@ -120,7 +131,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
    "id": "ceae4a88-0679-4704-8bad-c06a4569c497",
    "metadata": {},
    "outputs": [],
@@ -145,7 +156,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
    "id": "d22f4302-1262-4cc1-8a1c-a36daa8c710c",
    "metadata": {},
    "outputs": [
@@ -164,7 +175,7 @@
        "        -0.16615383, -0.15569784]], shape=(5, 480))"
       ]
      },
-     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -189,7 +200,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
    "id": "6227f661-575f-4b1e-9646-cfba7b10c3b4",
    "metadata": {},
    "outputs": [
@@ -263,7 +274,7 @@
        "          0.24998347, -0.35954213]], shape=(238, 480), dtype=float32)]"
       ]
      },
-     "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -288,7 +299,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
    "id": "e4bc0cb1-f5b0-4255-9e93-d643ae1396df",
    "metadata": {},
    "outputs": [
@@ -450,7 +461,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
    "id": "83f3064b-48a7-42fb-ba82-ec153ea946da",
    "metadata": {},
    "outputs": [
@@ -460,7 +471,7 @@
        "array([1.96673731, 2.04801253, 2.09881898, 1.82533665, 1.97255249])"
       ]
      },
-     "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -472,7 +483,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
    "id": "42cc8b34-5ae9-4857-93fe-a438a0f2a868",
    "metadata": {},
    "outputs": [
@@ -483,7 +494,7 @@
        "      dtype=float32)"
       ]
      },
-     "execution_count": 11,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -505,7 +516,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
    "id": "2d5b725c-4eac-4a4b-9331-357c3ac140f7",
    "metadata": {},
    "outputs": [
@@ -518,7 +529,7 @@
        "      dtype='<U238')"
       ]
      },
-     "execution_count": 12,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -530,7 +541,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
    "id": "0e9615f7-c490-4947-96f4-7617266c686e",
    "metadata": {},
    "outputs": [
@@ -543,7 +554,7 @@
        "      dtype='<U238')"
       ]
      },
-     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -560,48 +571,6 @@
    "metadata": {},
    "outputs": [],
    "source": []
-  },
-  {
-   "cell_type": "markdown",
-   "id": "98956ca9",
-   "metadata": {},
-   "source": [
-    "## **rescoding / likelihood / probability**\n",
-    "\n",
-    "The rescodings represents each residue as a 480 sized embedding. The likelihoods represents each residue as the predicted logits for each character in the vocabulary. The probabilities represents the normalised likelihoods.\n",
-    "\n",
-    "**NB:** The output includes extra tokens (start, stop and separation tokens) in the format \"<VH_seq>|<VL_seq>\". The length of the output is therefore 5 longer than the VH and VL.\n",
-    "\n",
-    "**NB:** By default the representations are derived using a single forward pass. To prevent the predicted likelihood and probability to be affected by the input residue at each position, setting the \"stepwise_masking\" argument to True can be used. This will run a forward pass for each position with the residue at that position masked. This is much slower than running a single forward pass."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b046ae57",
-   "metadata": {},
-   "source": [
-    "## **rescoding / likelihood / probability**\n",
-    "\n",
-    "The rescodings represents each residue as a 480 sized embedding. The likelihoods represents each residue as the predicted logits for each character in the vocabulary. The probabilities represents the normalised likelihoods.\n",
-    "\n",
-    "**NB:** The output includes extra tokens (start, stop and separation tokens) in the format \"<VH_seq>|<VL_seq>\". The length of the output is therefore 5 longer than the VH and VL.\n",
-    "\n",
-    "**NB:** By default the representations are derived using a single forward pass. To prevent the predicted likelihood and probability to be affected by the input residue at each position, setting the \"stepwise_masking\" argument to True can be used. This will run a forward pass for each position with the residue at that position masked. This is much slower than running a single forward pass."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "78ccf7d8",
-   "metadata": {},
-   "source": [
-    "## **rescoding / likelihood / probability**\n",
-    "\n",
-    "The rescodings represents each residue as a 480 sized embedding. The likelihoods represents each residue as the predicted logits for each character in the vocabulary. The probabilities represents the normalised likelihoods.\n",
-    "\n",
-    "**NB:** The output includes extra tokens (start, stop and separation tokens) in the format \"<VH_seq>|<VL_seq>\". The length of the output is therefore 5 longer than the VH and VL.\n",
-    "\n",
-    "**NB:** By default the representations are derived using a single forward pass. To prevent the predicted likelihood and probability to be affected by the input residue at each position, setting the \"stepwise_masking\" argument to True can be used. This will run a forward pass for each position with the residue at that position masked. This is much slower than running a single forward pass."
-   ]
   }
  ],
  "metadata": {

   },
   {
    "cell_type": "code",
+   "execution_count": 11,
    "id": "7ae54cd0-6253-46dd-a316-4f20b12041e0",
    "metadata": {},
    "outputs": [],
    "source": [
+    "import sys\n",
+    "import os\n",
+    "import numpy as np\n",
+    "from transformers import AutoModel, AutoTokenizer\n",
+    "from transformers.utils import cached_file"
    ]
   },
   {
   },
   {
    "cell_type": "code",
+   "execution_count": 6,
    "id": "99192978-a008-4a32-a80e-bba238e0ec7c",
    "metadata": {},
    "outputs": [],
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Load model and tokenizer from Hugging Face Hub\n",
+    "model = AutoModel.from_pretrained(\"hemantn/ablang2\", trust_remote_code=True)\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"hemantn/ablang2\", trust_remote_code=True)\n",
+    "\n",
+    "# Find the cached model directory and import adapter\n",
+    "adapter_path = cached_file(\"hemantn/ablang2\", \"adapter.py\")\n",
+    "cached_model_dir = os.path.dirname(adapter_path)\n",
+    "sys.path.insert(0, cached_model_dir)\n",
+    "\n",
+    "# Import and create the adapter\n",
+    "from adapter import AbLang2PairedHuggingFaceAdapter\n",
     "ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)"
    ]
   },
   },
   {
    "cell_type": "code",
+   "execution_count": 7,
    "id": "ceae4a88-0679-4704-8bad-c06a4569c497",
    "metadata": {},
    "outputs": [],
   },
   {
    "cell_type": "code",
+   "execution_count": 8,
    "id": "d22f4302-1262-4cc1-8a1c-a36daa8c710c",
    "metadata": {},
    "outputs": [
        "        -0.16615383, -0.15569784]], shape=(5, 480))"
       ]
      },
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
   },
   {
    "cell_type": "code",
+   "execution_count": 9,
    "id": "6227f661-575f-4b1e-9646-cfba7b10c3b4",
    "metadata": {},
    "outputs": [
        "          0.24998347, -0.35954213]], shape=(238, 480), dtype=float32)]"
       ]
      },
+     "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
     }
   },
   {
    "cell_type": "code",
+   "execution_count": 10,
    "id": "e4bc0cb1-f5b0-4255-9e93-d643ae1396df",
    "metadata": {},
    "outputs": [
   },
   {
    "cell_type": "code",
+   "execution_count": 12,
    "id": "83f3064b-48a7-42fb-ba82-ec153ea946da",
    "metadata": {},
    "outputs": [
        "array([1.96673731, 2.04801253, 2.09881898, 1.82533665, 1.97255249])"
       ]
      },
+     "execution_count": 12,
      "metadata": {},
      "output_type": "execute_result"
     }
   },
   {
    "cell_type": "code",
+   "execution_count": 13,
    "id": "42cc8b34-5ae9-4857-93fe-a438a0f2a868",
    "metadata": {},
    "outputs": [
        "      dtype=float32)"
       ]
      },
+     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
   },
   {
    "cell_type": "code",
+   "execution_count": 14,
    "id": "2d5b725c-4eac-4a4b-9331-357c3ac140f7",
    "metadata": {},
    "outputs": [
        "      dtype='<U238')"
       ]
      },
+     "execution_count": 14,
      "metadata": {},
      "output_type": "execute_result"
     }
   },
   {
    "cell_type": "code",
+   "execution_count": 15,
    "id": "0e9615f7-c490-4947-96f4-7617266c686e",
    "metadata": {},
    "outputs": [
        "      dtype='<U238')"
       ]
      },
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
    "metadata": {},
    "outputs": [],
    "source": []
   }
  ],
  "metadata": {