edit imbalanced-learn
khuyentran1401 committed Nov 4, 2024
1 parent 2112273 commit 812db9b
Showing 8 changed files with 249 additions and 402 deletions.
264 changes: 80 additions & 184 deletions Chapter5/machine_learning.ipynb

Large diffs are not rendered by default.

33 changes: 29 additions & 4 deletions Chapter6/logging_debugging.ipynb
@@ -357,14 +357,22 @@
"### Simplify Python Logging with Loguru"
]
},
{
"cell_type": "markdown",
"id": "0a74f279-828b-4d57-90ae-e78a8e4a2340",
"metadata": {},
"source": [
"Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?\n",
"\n",
"With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6f30417a",
"metadata": {},
"source": [
"Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.\n",
"\n",
"Here is the comparison between the standard Python logging library and Loguru:"
]
},
@@ -1452,7 +1460,24 @@
},
{
"data": {
"application/javascript": "\n setTimeout(function() {\n var nbb_cell_id = 17;\n var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n var nbb_cells = Jupyter.notebook.get_cells();\n for (var i = 0; i < nbb_cells.length; ++i) {\n if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n nbb_cells[i].set_text(nbb_formatted_code);\n }\n break;\n }\n }\n }, 500);\n ",
"application/javascript": [
"\n",
" setTimeout(function() {\n",
" var nbb_cell_id = 17;\n",
" var nbb_unformatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
" var nbb_formatted_code = \"from tqdm.notebook import tqdm\\nfrom time import sleep\\n\\n\\ndef lower(word):\\n sleep(1)\\n print(f\\\"Processing {word}\\\")\\n return word.lower()\\n\\n\\nwords = tqdm([\\\"Duck\\\", \\\"dog\\\", \\\"Flower\\\", \\\"fan\\\"])\\n\\n[lower(word) for word in words]\";\n",
" var nbb_cells = Jupyter.notebook.get_cells();\n",
" for (var i = 0; i < nbb_cells.length; ++i) {\n",
" if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n",
" if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n",
" nbb_cells[i].set_text(nbb_formatted_code);\n",
" }\n",
" break;\n",
" }\n",
" }\n",
" }, 500);\n",
" "
],
"text/plain": [
"<IPython.core.display.Javascript object>"
]
@@ -1701,7 +1726,7 @@
"celltoolbar": "Tags",
"hide_input": false,
"kernelspec": {
"display_name": "venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand Down
32 changes: 18 additions & 14 deletions docs/Chapter5/machine_learning.html
@@ -1227,28 +1227,25 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
<span class="expanded">Hide code cell content</span>
</summary>
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>imbalanced-learn<span class="o">==</span><span class="m">0</span>.10.0<span class="w"> </span><span class="nv">mlxtend</span><span class="o">==</span><span class="m">0</span>.21.0<span class="w"> </span>scikit-learn<span class="o">==</span><span class="m">1</span>.2.2
</pre></div>
</div>
</div>
</details>
</div>
<p>To address issues with imbalanced datasets, where one class significantly outweighs others, you can use the <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> library to generate additional samples for under-represented classes.</p>
<p>Here’s how you can use the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> from <code class="docutils literal notranslate"><span class="pre">imbalanced-learn</span></code> to create a balanced dataset by oversampling the minority class:</p>
<p>In machine learning, imbalanced datasets can lead to biased models that perform poorly on minority classes. This is particularly problematic in critical applications like fraud detection or disease diagnosis.</p>
<p>With imbalanced-learn, you can rebalance your dataset using various sampling techniques that work seamlessly with scikit-learn.</p>
<p>To demonstrate this, let’s generate a sample dataset with 5000 samples, 2 features, and 4 classes:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Libraries for plotting</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="nn">sns</span>
<span class="kn">from</span> <span class="nn">mlxtend.plotting</span> <span class="kn">import</span> <span class="n">plot_decision_regions</span>
<span class="kn">import</span> <span class="nn">matplotlib.gridspec</span> <span class="k">as</span> <span class="nn">gridspec</span>

<span class="c1"># Libraries for machine learning</span>
<span class="kn">from</span> <span class="nn">sklearn.datasets</span> <span class="kn">import</span> <span class="n">make_classification</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
<span class="kn">import</span> <span class="nn">warnings</span>

<span class="n">warnings</span><span class="o">.</span><span class="n">simplefilter</span><span class="p">(</span><span class="s2">&quot;ignore&quot;</span><span class="p">,</span> <span class="ne">UserWarning</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>
</pre></div>
</div>
</div>
@@ -1271,17 +1268,16 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
</div>
</div>
</div>
<p>Resample the dataset using the <code class="docutils literal notranslate"><span class="pre">RandomOverSampler</span></code> class from imbalanced-learn to balance the class distribution. This technique works by duplicating minority samples until they match the majority class.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">imblearn.over_sampling</span> <span class="kn">import</span> <span class="n">RandomOverSampler</span>


<span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ros</span> <span class="o">=</span> <span class="n">RandomOverSampler</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">X_resampled</span><span class="p">,</span> <span class="n">y_resampled</span> <span class="o">=</span> <span class="n">ros</span><span class="o">.</span><span class="n">fit_resample</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<p>Plot the decision regions of the dataset before and after resampling using a LinearSVC classifier:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Plotting Decision Regions</span>
@@ -1295,15 +1291,23 @@ <h2><span class="section-number">6.5.10. </span>imbalanced-learn: Deal with an I
<span class="p">):</span>
<span class="n">clf</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
<span class="n">clf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">Xi</span><span class="p">,</span> <span class="n">yi</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#A3D9B1,#06B1CF,#F8D347,#E48789&#39;</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plot_decision_regions</span><span class="p">(</span><span class="n">X</span><span class="o">=</span><span class="n">Xi</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">yi</span><span class="p">,</span> <span class="n">clf</span><span class="o">=</span><span class="n">clf</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;#E583B6,#72FCDB,#72BEFA,#FFFF99&#39;</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">set_title</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">&#39;#000000&#39;</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="cell_output docutils container">
<img alt="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" src="../_images/63d1f487cb4b70c31a65541021dd7a53a6786273a8d698f5e3c373f190e97dc6.png" />
<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;). Matplotlib is ignoring the edgecolor in favor of the facecolor. This behavior may change in the future.
ax.scatter(
/Users/khuyentran/book/venv/lib/python3.11/site-packages/mlxtend/plotting/decision_regions.py:300: UserWarning: You passed a edgecolor/edgecolors (&#39;black&#39;) for an unfilled marker (&#39;x&#39;). Matplotlib is ignoring the edgecolor in favor of the facecolor. This behavior may change in the future.
ax.scatter(
</pre></div>
</div>
<img alt="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" src="../_images/e43620bfb9013971379f44ef7fd72ada3ee7a94c58e698b515b15ee4101cfa2c.png" />
</div>
</div>
<p>The plot reveals that the resampling process has added more data points to the minority class (green), effectively balancing the class distribution.</p>
<p><a class="reference external" href="https://github.com/scikit-learn-contrib/imbalanced-learn">Link to imbalanced-learn</a>.</p>
</section>
<section id="estimate-prediction-intervals-in-scikit-learn-models-with-mapie">
Expand Down
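The new prose in this hunk says `RandomOverSampler` "works by duplicating minority samples until they match the majority class." As a rough illustration of that idea only, here is a minimal sketch in plain Python. The helper `random_oversample` and the toy data are hypothetical and are not taken from imbalanced-learn or from the notebook; the notebook itself calls `RandomOverSampler(random_state=1).fit_resample(X, y)`.

```python
import random
from collections import Counter


def random_oversample(X, y, random_state=None):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(random_state)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class

    X_resampled, y_resampled = list(X), list(y)
    for label, count in counts.items():
        # Rows that belong to this class in the original data
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        # Draw with replacement to close the gap to the majority size
        # (k is 0 for the majority class, so nothing is added for it)
        extra = rng.choices(rows, k=target - count)
        X_resampled.extend(extra)
        y_resampled.extend([label] * len(extra))
    return X_resampled, y_resampled


# Toy data (illustrative only): class 0 has three rows, classes 1 and 2 have one each
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [1.5, 1.7]]
y = [0, 0, 0, 1, 2]

X_res, y_res = random_oversample(X, y, random_state=1)
print(Counter(y_res))
```

Because the added rows are exact copies, the resampled set contains no new information; it only changes how much weight each class carries when a classifier such as `LinearSVC` is fitted.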
23 changes: 12 additions & 11 deletions docs/Chapter6/logging_debugging.html
@@ -211,17 +211,17 @@
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li>
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li>
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li>
</ul>
</li>
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -271,7 +271,7 @@
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/better_pandas.html">6.12. Better Pandas</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/testing.html">6.13. Testing</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/SQL.html">6.14. SQL Libraries</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. PySpark</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/llm.html">6.16. Large Language Model (LLM)</a></li>
</ul>
</li>
@@ -712,7 +712,8 @@ <h2><span class="section-number">7.4.2. </span>Rich’s Console: Debug your Pyth
</section>
<section id="simplify-python-logging-with-loguru">
<h2><span class="section-number">7.4.3. </span>Simplify Python Logging with Loguru<a class="headerlink" href="#simplify-python-logging-with-loguru" title="Permalink to this heading">#</a></h2>
<p>Are you struggling with the complexity of configuring a logger object before logging in Python? With Loguru, you can skip this step and use the logger object directly with pre-built color and format settings.</p>
<p>Have you ever found yourself using print() instead of a proper logger due to the hassle of setup?</p>
<p>With Loguru, you can get started with logging right away. A single import is all you need to begin logging with pre-configured color and format settings.</p>
<p>Here is the comparison between the standard Python logging library and Loguru:</p>
<p>Standard Python logging library:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># loguru_vs_logging/logging_example.py</span>
Expand Down
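The rewritten Loguru intro contrasts "the hassle of setup" in the standard library with a single import. The comparison code itself is cut off in this hunk, so the following is only a sketch of what that setup typically looks like, not the book's example. The logger name, the format string, and the in-memory stream are assumptions made for illustration. The Loguru lines are left as comments because they need the third-party `loguru` package.

```python
import io
import logging

# Standard library: the handler, formatter, and level each need explicit setup.
stream = io.StringIO()  # in-memory sink (assumption) so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s | %(message)s"))

logger = logging.getLogger("logging_example")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep records from also reaching the root logger

logger.debug("This is a debug message")
logger.warning("This is a warning message")

print(stream.getvalue())

# Loguru equivalent (requires `pip install loguru`; not executed here):
#   from loguru import logger
#   logger.debug("This is a debug message")
```

With Loguru the imported `logger` already has a stderr sink with a default format and colors, which is the "single import" the new wording refers to.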
0 comments on commit 812db9b
