diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..abe33cf
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+# files
+.idea
+
diff --git a/.idea/.gitignore b/.idea/.gitignore
new file mode 100644
index 0000000..26d3352
--- /dev/null
+++ b/.idea/.gitignore
@@ -0,0 +1,3 @@
+# Default ignored files
+/shelf/
+/workspace.xml
diff --git a/.idea/inspectionProfiles/Project_Default.xml b/.idea/inspectionProfiles/Project_Default.xml
new file mode 100644
index 0000000..debf80d
--- /dev/null
+++ b/.idea/inspectionProfiles/Project_Default.xml
@@ -0,0 +1,14 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
new file mode 100644
index 0000000..105ce2d
--- /dev/null
+++ b/.idea/inspectionProfiles/profiles_settings.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/lihang-code.iml b/.idea/lihang-code.iml
new file mode 100644
index 0000000..d0876a7
--- /dev/null
+++ b/.idea/lihang-code.iml
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/misc.xml b/.idea/misc.xml
new file mode 100644
index 0000000..7966d0c
--- /dev/null
+++ b/.idea/misc.xml
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/.idea/modules.xml b/.idea/modules.xml
new file mode 100644
index 0000000..423e47e
--- /dev/null
+++ b/.idea/modules.xml
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
new file mode 100644
index 0000000..35eb1dd
--- /dev/null
+++ b/.idea/vcs.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git "a/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/.ipynb_checkpoints/1.Introduction_to_statistical_learning_methods-checkpoint.ipynb" "b/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/.ipynb_checkpoints/1.Introduction_to_statistical_learning_methods-checkpoint.ipynb"
new file mode 100644
index 0000000..f4b6929
--- /dev/null
+++ "b/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/.ipynb_checkpoints/1.Introduction_to_statistical_learning_methods-checkpoint.ipynb"
@@ -0,0 +1,511 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Chapter 1: Introduction to Statistical Learning Methods"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
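+ "1. Statistical learning is the discipline in which computers build probabilistic and statistical models from data and use those models to analyze and predict data. It covers supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.\n",
+ "\n",
+ "2. The three elements of a statistical learning method (model, strategy, and algorithm) give the outline needed to understand any such method.\n",
+ "\n",
+ "3. This book mainly discusses supervised learning, which can be summarized as follows: starting from a finite set of training data, assume that the data are independent and identically distributed and that the model belongs to some hypothesis space; then apply an evaluation criterion to select the optimal model from that space, so that it predicts both the given training data and unseen test data as accurately as possible under that criterion.\n",
+ "\n",
+ "4. Model selection, or equivalently improving the generalization ability of learning, is an important problem in statistical learning. Reducing only the training error can lead to overfitting. Methods for model selection include regularization and cross-validation. Analyzing the generalization ability of learning methods is a major topic of statistical learning theory.\n",
+ "\n",
+ "5. Classification, tagging, and regression are all important supervised learning problems. The methods introduced in this book are the perceptron, $k$-nearest neighbors, naive Bayes, decision trees, logistic regression and the maximum entropy model, support vector machines, boosting, the EM algorithm, hidden Markov models, and conditional random fields. These are the main methods for classification, tagging, and regression, and they can in turn be grouped into generative and discriminative methods.\n"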
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
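+ "### Fitting a curve with least squares\n",
+ "\n",
+ "In 1823 Gauss proved, under the assumption that the errors $e_1,\\dots,e_n$ are independent and identically distributed, an optimality property of least squares: among all unbiased linear estimators, the least-squares estimator has the smallest variance.\n",
+ "\n",
+ "Given data $(x_i, y_i)$, $i=1,2,\\dots,m$,\n",
+ "\n",
+ "we fit a function $h(x)$.\n",
+ "\n",
+ "The fit has errors, the residuals: $r_i=h(x_i)-y_i$\n",
+ "\n",
+ "When the squared $L_2$ norm of the residuals (the residual sum of squares) is smallest, $h(x)$ is closest to $y$ and the fit is best."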
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
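+ "In general $h(x)$ is a polynomial of degree $n$: $h(x)=w_0+w_1x+w_2x^2+\\dots+w_nx^n$\n",
+ "\n",
+ "where $w=(w_0,w_1,w_2,\\dots,w_n)$ are the parameters.\n",
+ "\n",
+ "The least-squares method finds the $w=(w_0,w_1,w_2,\\dots,w_n)$ that minimizes the residual sum of squares $\\sum_{i=1}^m(h(x_i)-y_i)^2$.\n",
+ "\n",
+ "That is, solve $\\min_w\\sum_{i=1}^m(h(x_i)-y_i)^2$\n",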
+ "\n",
+ "----"
+ ]
+ },
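+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a worked form of the minimization above (a sketch: the design matrix $A$ is notation introduced here and does not appear in the original notes), stack the $m$ data points into a matrix with entries $A_{ij}=x_i^{\\,j}$ for $i=1,\\dots,m$ and $j=0,\\dots,n$, so that $h(x_i)=(Aw)_i$. The residual sum of squares and its stationarity condition then read\n",
+ "\n",
+ "$$\\begin{aligned}\n",
+ "S(w) &= \\sum_{i=1}^m (h(x_i)-y_i)^2 = \\lVert Aw-y \\rVert^2 \\\\\n",
+ "\\nabla_w S(w) &= 2A^{\\top}(Aw-y) = 0 \\\\\n",
+ "A^{\\top}A\\,w &= A^{\\top}y\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "Assuming $A^{\\top}A$ is invertible (for example, when the $x_i$ are distinct and $m \\ge n+1$), the minimizer is $w=(A^{\\top}A)^{-1}A^{\\top}y$."
+ ]
+ },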
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Example: take the target function $y=\\sin 2\\pi x$, add normally distributed noise, and fit the noisy points with a polynomial [Example 1.1, p. 11]."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import scipy as sp\n",
+ "from scipy.optimize import leastsq\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Note: `numpy.poly1d([1, 2, 3])` builds $1x^2+2x^1+3x^0$, i.e. the coefficients are listed from the highest degree down.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Target function\n",
+ "def real_func(x):\n",
+ "    return np.sin(2*np.pi*x)\n",
+ "\n",
+ "# Polynomial with coefficients p (highest degree first)\n",
+ "def fit_func(p, x):\n",
+ "    f = np.poly1d(p)\n",
+ "    return f(x)\n",
+ "\n",
+ "# Residuals between the fitted polynomial and the observations\n",
+ "def residuals_func(p, x, y):\n",
+ "    ret = fit_func(p, x) - y\n",
+ "    return ret"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0. 0.11111111 0.22222222 0.33333333 0.44444444 0.55555556\n",
+ " 0.66666667 0.77777778 0.88888889 1. ] [-0.13279908173173613, 0.7367746341452334, 1.0045132782564739, 1.0297101571118858, 0.41600828736613554, -0.26374521862130107, -0.8323785925492532, -1.0074937559168053, -0.8772043052086325, 0.03821360535661758]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Ten sample points\n",
+ "x = np.linspace(0, 1, 10)\n",
+ "x_points = np.linspace(0, 1, 1000)\n",
+ "# Target values with normally distributed noise added\n",
+ "y_ = real_func(x)\n",
+ "y = [np.random.normal(0, 0.1) + y1 for y1 in y_]\n",
+ "print(x,y)\n",
+ "\n",
+ "def fitting(M=0):\n",
+ "    \"\"\"\n",
+ "    M is the degree of the polynomial\n",
+ "    \"\"\"\n",
+ "    # Randomly initialize the polynomial coefficients\n",
+ "    p_init = np.random.rand(M + 1)\n",
+ "    print(\"parameters initialization: \", p_init)\n",
+ "    # Least-squares fit\n",
+ "    p_lsq = leastsq(residuals_func, p_init, args=(x, y))\n",
+ "    print('Fitting Parameters:', p_lsq[0])\n",
+ "\n",
+ "    # Visualization\n",
+ "    plt.plot(x_points, real_func(x_points), label='real')\n",
+ "    plt.plot(x_points, fit_func(p_lsq[0], x_points), label='fitted curve')\n",
+ "    plt.plot(x, y, 'bo', label='noise')\n",
+ "    plt.legend()\n",
+ "    return p_lsq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### M=0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "parameters initialization: [0.51134405]\n",
+ "Fitting Parameters: [0.0111599]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# M=0\n",
+ "p_lsq_0 = fitting(M=0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### M=1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "parameters initialization: [0.33798747 0.7429498 ]\n",
+ "Fitting Parameters: [-1.42280691 0.72256336]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# M=1\n",
+ "p_lsq_1 = fitting(M=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### M=3 "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "parameters initialization: [0.80748537 0.26355156 0.15225828 0.69983347]\n",
+ "Fitting Parameters: [ 23.51836106 -35.90542618 12.59021898 -0.18343697]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# M=3\n",
+ "p_lsq_3 = fitting(M=3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### M=9"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "parameters initialization: [0.91227185 0.61186299 0.8670106 0.46130526 0.24519055 0.66403862\n",
+ " 0.80093522 0.93778575 0.87814087 0.20415735]\n",
+ "Fitting Parameters: [ 1.39123363e+04 -6.39746319e+04 1.24383137e+05 -1.32914986e+05\n",
+ " 8.47957513e+04 -3.27261066e+04 7.35983355e+03 -8.84389244e+02\n",
+ " 4.92269900e+01 -1.32799082e-01]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# M=9\n",
+ "p_lsq_9 = fitting(M=9)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With M=9 the polynomial curve passes through every data point, but it overfits."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Regularization"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
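+ "The result above overfits, so we introduce a regularization term (regularizer) to reduce overfitting:\n",
+ "\n",
+ "$Q(w)=\\sum_{i=1}^m(h(x_i)-y_i)^2+\\frac{\\lambda}{2}\\lVert w \\rVert^2$.\n",
+ "\n",
+ "In regression the loss function is the squared loss, and the regularizer can be the L2 norm of the parameter vector or its L1 norm.\n",
+ "\n",
+ "- L1: `regularization * np.sum(np.abs(p))`\n",
+ "\n",
+ "- L2: `0.5 * regularization * np.sum(np.square(p))`"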
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [],
+ "source": [
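+ "regularization = 0.0001\n",
+ "\n",
+ "\n",
+ "def residuals_func_regularization(p, x, y):\n",
+ "    ret = fit_func(p, x) - y\n",
+ "    # leastsq squares every entry it is given, so appending sqrt(lambda / 2) * p\n",
+ "    # adds (lambda / 2) * ||p||^2 to the objective, i.e. the squared L2 norm times 0.5 * lambda.\n",
+ "    ret = np.append(ret, np.sqrt(0.5 * regularization) * p)\n",
+ "    return ret"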
+ ]
+ },
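+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A short worked equation may help connect the augmented residual vector to the penalized objective (a sketch; the symbols $r_i$ and $a_j$ are notation introduced here). `leastsq` minimizes the sum of squares of the vector returned by the residual function. If that vector consists of the $m$ data residuals $r_i=h(x_i)-y_i$ followed by $n+1$ extra entries $a_j$, the quantity being minimized is\n",
+ "\n",
+ "$$\\sum_{i=1}^m r_i^2+\\sum_{j=0}^n a_j^2$$\n",
+ "\n",
+ "so the penalty is the sum of the squares of whatever is appended. Choosing $a_j=\\sqrt{\\lambda/2}\\,w_j$ gives\n",
+ "\n",
+ "$$\\sum_{i=1}^m (h(x_i)-y_i)^2+\\frac{\\lambda}{2}\\sum_{j=0}^n w_j^2=\\sum_{i=1}^m (h(x_i)-y_i)^2+\\frac{\\lambda}{2}\\lVert w \\rVert^2$$"
+ ]
+ },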
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Least squares with the regularization term\n",
+ "p_init = np.random.rand(9 + 1)\n",
+ "p_lsq_regularization = leastsq(\n",
+ " residuals_func_regularization, p_init, args=(x, y))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.plot(x_points, real_func(x_points), label='real')\n",
+ "plt.plot(x_points, fit_func(p_lsq_9[0], x_points), label='fitted curve')\n",
+ "plt.plot(\n",
+ " x_points,\n",
+ " fit_func(p_lsq_regularization[0], x_points),\n",
+ " label='regularization')\n",
+ "plt.plot(x, y, 'bo', label='noise')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
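+ "## Chapter 1: Introduction to Statistical Learning Methods - Exercises\n",
+ "**Author:** 胡锐锋-天国之影-Relph\n",
+ "\n",
+ "**GitHub:** https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+ "\n",
+ "### Exercise 1.1\n",
+ "Describe the three elements of statistical learning in maximum likelihood estimation and in Bayesian estimation of the Bernoulli model. The Bernoulli model is a probability distribution defined on a random variable that takes the values 0 and 1. Suppose we observe $n$ independent outcomes generated by a Bernoulli model, $k$ of which are 1; the probability of the outcome 1 can then be estimated by maximum likelihood estimation or by Bayesian estimation.\n",
+ "\n",
+ "**Solution:**"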
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The **three elements of statistical learning** in maximum likelihood estimation and Bayesian estimation of the Bernoulli model are as follows:  \n",
+ "1. **Maximum likelihood estimation**  \n",
+ "**Model:** $\\mathcal{F}=\\{f|f_p(x)=p^x(1-p)^{(1-x)}\\}$  \n",
+ "**Strategy:** maximize the likelihood function  \n",
+ "**Algorithm:** $\\displaystyle \\mathop{\\arg\\max}_{p} L(p)= \\mathop{\\arg\\max}_{p} \\binom{n}{k}p^k(1-p)^{(n-k)}$\n",
+ "2. **Bayesian estimation**  \n",
+ "**Model:** $\\mathcal{F}=\\{f|f_p(x)=p^x(1-p)^{(1-x)}\\}$  \n",
+ "**Strategy:** take the posterior expectation of the parameter  \n",
+ "**Algorithm:**\n",
+ "$$\\begin{aligned} E_\\pi\\big[p \\big| y_1,\\cdots,y_n\\big]\n",
+ "& = {\\int_0^1}p\\pi (p|y_1,\\cdots,y_n) dp \\\\\n",
+ "& = {\\int_0^1} p\\frac{f_D(y_1,\\cdots,y_n|p)\\pi(p)}{\\int_{\\Omega}f_D(y_1,\\cdots,y_n|p)\\pi(p)dp}dp \\\\\n",
+ "& = {\\int_0^1}\\frac{p^{k+1}(1-p)^{(n-k)}}{\\int_0^1 p^k(1-p)^{(n-k)}dp}dp\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "**Maximum likelihood estimation of the Bernoulli model:**  \n",
+ "Let $p$ denote the probability $P(Y=1)$. The likelihood function is: $$L(p)=f_D(y_1,y_2,\\cdots,y_n|p)=\\binom{n}{k}p^k(1-p)^{(n-k)}$$Differentiating with respect to $p$ and setting the derivative to zero gives: $$\\begin{aligned}\n",
+ "0 & = \\binom{n}{k}[kp^{k-1}(1-p)^{(n-k)}-(n-k)p^k(1-p)^{(n-k-1)}]\\\\\n",
+ "& = \\binom{n}{k}[p^{(k-1)}(1-p)^{(n-k-1)}(k-np)]\n",
+ "\\end{aligned}$$The candidate solutions are $p=0$, $p=1$, and $p=k/n$. For $0 < k < n$ the likelihood is zero at $p=0$ and $p=1$, so the maximum is attained at $\\displaystyle P(Y=1)=p=\\frac{k}{n}$  \n",
+ "\n",
+ "**Bayesian estimation of the Bernoulli model:**  \n",
+ "Let $p$ denote the probability $P(Y=1)$ and assume every value of $p$ in $[0,1]$ is equally likely, so the prior density is $\\pi(p) = 1$. The likelihood function is: $$L(p)=f_D(y_1,y_2,\\cdots,y_n|p)=\\binom{n}{k}p^k(1-p)^{(n-k)}$$  \n",
+ "From the likelihood function and the prior density, the posterior density of $p$ is: $$\\begin{aligned}\\pi(p|y_1,\\cdots,y_n)&=\\frac{f_D(y_1,\\cdots,y_n|p)\\pi(p)}{\\int_{\\Omega}f_D(y_1,\\cdots,y_n|p)\\pi(p)dp}\\\\\n",
+ "&=\\frac{p^k(1-p)^{(n-k)}}{\\int_0^1p^k(1-p)^{(n-k)}dp}\\\\\n",
+ "&=\\frac{p^k(1-p)^{(n-k)}}{B(k+1,n-k+1)}\n",
+ "\\end{aligned}$$so the posterior expectation of $p$ is: $$\\begin{aligned}\n",
+ "E_\\pi[p|y_1,\\cdots,y_n]&={\\int_0^1}p\\pi(p|y_1,\\cdots,y_n)dp \\\\\n",
+ "& = {\\int_0^1}\\frac{p^{(k+1)}(1-p)^{(n-k)}}{B(k+1,n-k+1)}dp \\\\\n",
+ "& = \\frac{B(k+2,n-k+1)}{B(k+1,n-k+1)}\\\\\n",
+ "& = \\frac{k+1}{n+2}\n",
+ "\\end{aligned}$$\n",
+ "$\\therefore \\displaystyle P(Y=1)=\\frac{k+1}{n+2}$"
+ ]
+ },
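+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The same maximizer can be cross-checked through the log-likelihood, which is often easier to differentiate (a sketch; the values $n=10$, $k=3$ below are illustrative and not from the book):\n",
+ "\n",
+ "$$\\begin{aligned}\n",
+ "\\ln L(p) &= \\ln\\binom{n}{k}+k\\ln p+(n-k)\\ln(1-p) \\\\\n",
+ "\\frac{d}{dp}\\ln L(p) &= \\frac{k}{p}-\\frac{n-k}{1-p}=0 \\\\\n",
+ "k(1-p) &= (n-k)p \\quad\\Rightarrow\\quad p=\\frac{k}{n}\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "With $n=10$ and $k=3$, maximum likelihood gives $p=3/10=0.3$, while the Bayesian estimate under the uniform prior gives $(3+1)/(10+2)=1/3\\approx 0.333$, slightly closer to $1/2$."
+ ]
+ },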
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
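+ "### Exercise 1.2\n",
+ "Derive maximum likelihood estimation from empirical risk minimization. Prove that when the model is a conditional probability distribution and the loss function is the log loss, empirical risk minimization is equivalent to maximum likelihood estimation.\n",
+ "\n",
+ "**Solution:**\n",
+ "\n",
+ "Assume the model's conditional probability distribution is $P_{\\theta}(Y|X)$. We show that, when the loss function is the log loss, maximum likelihood estimation is equivalent to empirical risk minimization.\n",
+ "The likelihood function for maximum likelihood estimation is: $$L(\\theta)=\\prod_D P_{\\theta}(Y|X)$$Taking the logarithm of both sides: $$\\ln L(\\theta) = \\sum_D \\ln P_{\\theta}(Y|X) \\\\ \n",
+ "\\mathop{\\arg \\max}_{\\theta} \\sum_D \\ln P_{\\theta}(Y|X) = \\mathop{\\arg \\min}_{\\theta} \\sum_D (- \\ln P_{\\theta}(Y|X))$$  \n",
+ "The right-hand side is, up to a positive constant factor, the empirical risk under the log loss. Read in the other direction, empirical risk minimization is therefore equivalent to maximum likelihood estimation, and maximum likelihood estimation can be derived from empirical risk minimization."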
+ ]
+ },
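+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Written out with explicit samples, the step reads as follows (a sketch; the indexed data set $D=\\{(x_i,y_i)\\}_{i=1}^N$ and the symbol $\\ell$ for the log loss are notation introduced here):\n",
+ "\n",
+ "$$\\begin{aligned}\n",
+ "\\ell(y, P_{\\theta}(y|x)) &= -\\ln P_{\\theta}(y|x) \\\\\n",
+ "R_{emp}(\\theta) &= \\frac{1}{N}\\sum_{i=1}^N \\ell(y_i, P_{\\theta}(y_i|x_i)) = -\\frac{1}{N}\\sum_{i=1}^N \\ln P_{\\theta}(y_i|x_i) = -\\frac{1}{N}\\ln L(\\theta)\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "Because $1/N$ is a positive constant, $\\mathop{\\arg\\min}_{\\theta} R_{emp}(\\theta)=\\mathop{\\arg\\max}_{\\theta}\\ln L(\\theta)=\\mathop{\\arg\\max}_{\\theta}L(\\theta)$."
+ ]
+ },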
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
+ "Reference code: https://github.com/wzyonggege/statistical-learning-method\n",
+ "\n",
+ "Latest version of this code: https://github.com/fengdu78/lihang-code\n",
+ "\n",
+ "Exercise solutions: https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+ "\n",
+ "Environment: Python 3.8+\n",
+ "\n",
+ "All code has been tested and runs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git "a/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/1.Introduction_to_statistical_learning_methods.ipynb" "b/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/1.Introduction_to_statistical_learning_methods.ipynb"
index f66c3a5..f4b6929 100644
--- "a/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/1.Introduction_to_statistical_learning_methods.ipynb"
+++ "b/\347\254\25401\347\253\240 \347\273\237\350\256\241\345\255\246\344\271\240\346\226\271\346\263\225\346\246\202\350\256\272/1.Introduction_to_statistical_learning_methods.ipynb"
@@ -103,9 +103,18 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 17,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0. 0.11111111 0.22222222 0.33333333 0.44444444 0.55555556\n",
+ " 0.66666667 0.77777778 0.88888889 1. ] [-0.13279908173173613, 0.7367746341452334, 1.0045132782564739, 1.0297101571118858, 0.41600828736613554, -0.26374521862130107, -0.8323785925492532, -1.0074937559168053, -0.8772043052086325, 0.03821360535661758]\n"
+ ]
+ }
+ ],
"source": [
"# Ten sample points\n",
"x = np.linspace(0, 1, 10)\n",
@@ -113,7 +122,7 @@
"# Target values with normally distributed noise added\n",
"y_ = real_func(x)\n",
"y = [np.random.normal(0, 0.1) + y1 for y1 in y_]\n",
- "\n",
+ "print(x,y)\n",
"\n",
"def fitting(M=0):\n",
"    \"\"\"\n",
@@ -121,6 +130,7 @@
"    \"\"\"\n",
"    # Randomly initialize the polynomial coefficients\n",
"    p_init = np.random.rand(M + 1)\n",
+ "    print(\"parameters initialization: \", p_init)\n",
"    # Least-squares fit\n",
"    p_lsq = leastsq(residuals_func, p_init, args=(x, y))\n",
"    print('Fitting Parameters:', p_lsq[0])\n",
@@ -142,24 +152,27 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Fitting Parameters: [0.02515259]\n"
+ "parameters initialization: [0.51134405]\n",
+ "Fitting Parameters: [0.0111599]\n"
]
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
- ""
+ ""
]
},
- "metadata": {},
+ "metadata": {
+ "needs_background": "light"
+ },
"output_type": "display_data"
}
],
@@ -177,24 +190,27 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Fitting Parameters: [-1.50626624 0.77828571]\n"
+ "parameters initialization: [0.33798747 0.7429498 ]\n",
+ "Fitting Parameters: [-1.42280691 0.72256336]\n"
]
},
{
"data": {
- "image/png": "\n",
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD4CAYAAADhNOGaAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAABGrUlEQVR4nO3dd3hU1dbA4d9KJ3QIvSQBQg8ECL1KBxFQQUFURBQLVZSrXLx2LFeu9CIqRYwFUQQRpCMoICT0nlACoYZQQ0jf3x9n8IuYQMKUk5nZ7/PMkzl9nRBmzT77nLVFKYWmaZrmvjzMDkDTNE0zl04EmqZpbk4nAk3TNDenE4GmaZqb04lA0zTNzXmZHcC9CAgIUEFBQWaHoWma5lSioqIuKqVK3T7fKRNBUFAQkZGRZoehaZrmVEQkNrv5+tKQpmmam9OJQNM0zc3pRKBpmubmnLKPQNM055GWlkZcXBzJyclmh+I2/Pz8qFixIt7e3rlaXycCTdPsKi4ujsKFCxMUFISImB2Oy1NKkZCQQFxcHMHBwbnaRl8a0kwTEQFBQeDhYfyMiDA7Is0ekpOTKVmypE4CDiIilCxZMk8tMN0i0EwREQFDhkBSkjEdG2tMAwwYYF5cmn3oJOBYef196xaBZopx4/4/CdySlGTM1zTNsWySCERkjohcEJF9OSwXEZkiIjEiskdEGmZZNlBEoi2vgbaIR7s7sy/LnDyZt/maZqagoCAuXrxodhh2Y6sWwTyg6x2WdwNCLK8hwEwAESkBvAk0BZoAb4pIcRvFpOXg1mWZ2FhQ6v8vyzgyGVSunLf5mmYrSikyMzPNDiNfsUkfgVJqo4gE3WGVXsCXyhgObauIFBORckA7YLVS6hKAiKzGSCjf2CIuLXs5XZb512uZFKl7jlOXk7iSlMa1m2l4eAg+nh4U8/emUnF/Akv6U7t8Efx9rPvTGT/+730EAP7+xnxNs7UTJ07QpUsXmjZtSlRUFI888gjLli0jJSWFBx98kLfffhuA3r17c+rUKZKTkxk5ciRDbnVcuThHdRZXAE5lmY6zzMtp/j+IyBCM1gSV9ddGq+R0+eVMnDD8m50A+Hp5ULSAN5kKUtIzSExJ59aopl4eQu3yRWhVLYDuoeWoU75InjunbnUIjxtnxFO5spEEdEexa3v75/0cOHPNpvusXb4Ibz5Q567rRUdHM3/+fK5du8aiRYvYtm0bSil69uzJxo0badOmDXPmzKFEiRLcvHmTxo0b8/DDD1OyZEmbxpsfOc1dQ0qp2cBsgPDwcD3Q8j1Ky8gkoGwm8Wf/+U9fqlwGy0e0JijA/x/f+FPSMzhzJZlj8YnsOHmZ7Scu8+nGY8zYcJSgkv4MaBrII+GVKOqfuwdYwPjQ1x/8mqMEBgbSrFkzXnnlFVatWkWDBg0ASExMJDo6mjZt2jBlyhQWL14MwKlTp4iOjtaJwIZOA5WyTFe0zDuNcXko6/wNDorJraSkZ/DttlPM3niMzEbF8VhZj8w0z7+W+/vDxI+9qF2+SLbb+3p5EhxQkOCAgnSoVQaAyzdSWX3gPIui4hi//CCfrD7CY00r82K7qpQs5OuQ89KcS26+udtLwYIFAaOPYOzYsTz33HN/W75hwwbWrFnDli1b8Pf3p127dm7zNLSjbh9dCjxpuXuoGXBVKXUWWAl0FpHilk7izpZ5mo0opVi+9ywdP/mNN5fup2xRP777b3nmz/EgMBBEIDAQZs/O+7fz4gV9eKRxJRY+35xfRrSiW2hZ5v5xnDb/Xc+kNUdITsuwz0lpmhW6dOnCnDlzSExMBOD06dNcuHCBq1evUrx4cfz9/Tl06BBbt241OVLHsUmLQES+wfhmHyAicRh3AnkDKKVmAcuB7kAMkAQMsiy7JCLvAtstu3rnVsexZr1Tl5J49Yc9bD6aQI0yhVkwuAmtQyxjUtSCxx+33bHqlC/KJ4+E8WK7qvxv1REmrYnmp52nea93KK1C
Amx3IE2zUufOnTl48CDNmzcHoFChQnz11Vd07dqVWbNmUatWLWrUqEGzZs1MjtRxRCnnu9weHh6u9MA0OcvMVERsO8kHyw/iIcKr3WrSv3ElvDwd9/zg79EXef2nvZxISKJ/k0r8p0dtq+800pzTwYMHqVWrltlhuJ3sfu8iEqWUCr99Xf0/08VcTUrjpYW7WHfoAq2qBfBRn3pUKFbA4XG0Cgng11FtmLjmCLM3HmPb8UtM7d8wxz4ITdPMo0tMuJB9p6/SY9omNkXH8+YDtVkwuIkpSeAWP29PxnarxVeDm3I9OZ3e0//gh6g40+LRNC17OhG4iGV7zvDwzM2kpSu+HdKcQS2D802hr5bVAlgxsjXhQcV5+fvdfLDiIBmZzndJUtNclU4ETk4pxWcbjzHs652EVijKshGtaBSY/6p0lCzky/ynm/B4s8p8+tsxnlsQxc1UfVeRpuUHOhE4scxMxds/H2D88oN0Dy3LV880JSAf37/v7enBe71DebtnHdYeOs/AOdu4lpxmdlia5vZ0InBSmZmKfy/ey7zNJxjcKphp/Rvi5+159w3zgYEtgpjSrwE7Tl6m/+ytXExMMTskTXNrOhE4oYxMxb9+2MO3208xvH01Xr+/Fh4e+aM/ILceqF+ezweGczQ+kf6zt5Kgk4FmR1OmTKFWrVoMGDCApUuX8uGHHwLw008/ceDAgb/WmzdvHmfOnMnTvk+cOEHdunVtGq+j6UTgZDIzFWMW7WZRVByjOobwcuca+aZTOK/a1SjN3KeacPJSEk98sY0rSalmh6S5qBkzZrB69WoiIiLo2bMnr732GmCbRGAP6enpDj2eTgRORCnF2z/v58cdpxndqTqjOlY3OySrNa9aks+eDCfmQqLuM9Ds4vnnn+fYsWN069aNiRMnMm/ePIYNG8bmzZtZunQpY8aMISwsjI8++ojIyEgGDBhAWFgYN2/eJCoqirZt29KoUSO6dOnC2bNnAYiKiqJ+/frUr1+f6dOn53jsjz76iNDQUOrXr/9X8mnXrh23Hoi9ePEiQUFBgJGEevbsSfv27enQoQP9+vXjl19++WtfTz31FIsWLSIjI4MxY8bQuHFj6tWrx6effmr170g/UOZEpq6LYf6WWIa0qcKIDiFmh2MzbaqXYsaAhjz/VRTPzIvky8FNnKa/Q8ujFa/Bub223WfZUOj2YY6LZ82axa+//sr69esJCAhg3rx5ALRo0YKePXvSo0cP+vTpY4S3YgUTJkwgPDyctLQ0hg8fzpIlSyhVqhTfffcd48aNY86cOQwaNIhp06bRpk0bxowZk/2prljBkiVL+PPPP/H39+fSpbtXz9mxYwd79uyhRIkSLF68mIULF3L//feTmprK2rVrmTlzJl988QVFixZl+/btpKSk0LJlSzp37kxwcHDef3cWukXgJBZsjeWT1Ud4uGFFxnaraXY4Ntexdhk+eTSMbScuMWbRHjL1cwaayQ4fPsy+ffvo1KkTYWFhvPfee8TFxXHlyhWuXLlCmzZtAHjiiSey3X7NmjUMGjQIf39/AEqUKHHXY3bq1Omv9bp168b69etJSUlhxYoVtGnThgIFCrBq1Sq+/PJLwsLCaNq0KQkJCURHR1t1rrpF4ATWHjzPG0v20aFmaT56ONRp+wTupmf98py5cpMPVxyifDE/xnbT9Wlczh2+uec3Sinq1KnDli1b/jb/ypUrVu3Xy8vrr6Eyby9zfatUNoCfnx/t2rVj5cqVfPfdd/Tr1++vuKZOnUqXLl2siiMr3SLI5w6fu86Ib3ZSt3xRpj3W0KGF48zwXJsqfz10tmBrrNnhaC6ucOHCXL9+PdvpGjVqEB8f/1ciSEtLY//+/RQrVoxixYrx+++/AxCRw2DfnTp1Yu7cuSRZxmO9dWkoKCiIqKgoABYtWnTH+B599FHmzp3Lpk2b6NrVGBa+S5cuzJw5k7Q0oz/tyJEj3Lhx457O/xbX/lRxcgmJKQyev52Cvl589mQ4
BXxc/7q5iPDWA3VoX7M0by3dz9ZjCWaHpLmwfv368fHHH9OgQQOOHj3KU089xfPPP09YWBgZGRksWrSIV199lfr16xMWFsbmzZsBmDt3LkOHDiUsLIycKjh37dqVnj17Eh4eTlhYGBMmTADglVdeYebMmTRo0ICLFy/eMb7OnTvz22+/0bFjR3x8fAB45plnqF27Ng0bNqRu3bo899xzVt9lpMtQ51Mp6Rk8/vmf7Im7ysLnmlO/UjGzQ3Ko68lp9Jr+B1eT0lg6vJWpxfM06+gy1ObISxlqm7QIRKSriBwWkRgReS2b5RNFZJfldURErmRZlpFl2VJbxOMK3l12gO0nLjOhb323SwIAhf28+ezJcFLTM3luQaQe7UzT7MjqRCAinsB0oBtQG+gvIrWzrqOUekkpFaaUCgOmAj9mWXzz1jKlVE9r43EFP+08zVdbT/Jc2yo8UL+82eGYpmqpQkzqF8b+M9f49497c2yCa5pmHVu0CJoAMUqpY0qpVOBboNcd1u8PfGOD4zqliAgICgIPD+Pn7f1M0eevM/bHvTQJLsGYzjXMCDFf6VCrDC91rM6PO0/z7fZTZoejaS7JFomgApD1f2icZd4/iEggEAysyzLbT0QiRWSriPTO6SAiMsSyXmR8fLwNwna8iAgYMgRiY0Ep4+eQIf+fDG6kpPNCxA4K+noyrX8Dl79DKLeG3VeN1iEBvLV0P4fOXTM7HE1zOY7+pOkHLFJKZb3gG2jpvHgMmCQiVbPbUCk1WykVrpQKL1WqlCNitblx48ByJ9lfkpKM+QCv/7SPY/GJTOnfgNJF/BwfYD7l4SF88kgYRQp4MzRiB0mpjq3DommuzhaJ4DRQKct0Rcu87PTjtstCSqnTlp/HgA1AAxvElL0NH8Hyf8Gl43Y7xJ2cPJnz/CW7TrN452lGdqhOi6oBjg3MCZQq7MukR8M4dvEGby7Zb3Y4muZSbJEItgMhIhIsIj4YH/b/uPtHRGoCxYEtWeYVFxFfy/sAoCVw4PZtbebmZYicA1MbwsKBEBdlt0Nlp3Ll7OeXr5DJ64v30SiwOEPvy7ZBpGEMeTnsvmp8HxXHkl05fdfQNOu98cYbrFmzxuwwHMbqRKCUSgeGASuBg8BCpdR+EXlHRLLeBdQP+Fb9/daPWkCkiOwG1gMfKqXslwi6fQij9kLLkXB0PXzeHuZ0g0PLwfLItz2NHw+WsiN/8fdXVOgUgwImPRqm+wXuYmSHEBoFFuf1n/Zx9upNs8PR7OBuN1Q4wjvvvEPHjh0df2CzKKWc7tWoUSNlteRrSm2ZodQndZV6s4hSUxopFTlXqdSb1u/7Dr76SqnAQKVEjJ8D/31WBb66TC2KPGXX47qS4/GJqubrK9SAz7aqjIxMs8PR7uLAgQO5Xverr5Ty91fKuJ3CePn7G/Otcfz4cVWzZk31zDPPqNq1a6tOnTqppKQktXPnTtW0aVMVGhqqevfurS5duqSUUmrgwIHq+++/V0op9eqrr6patWqp0NBQ9fLLLyullLpw4YJ66KGHVHh4uAoPD1e///67dQHaQXa/dyBSZfOZavqH+r28bJIIbklPU2rP90rNam0khP9WVWrDR0olXrTdMXKw59QVVXXsL+rFiCiVmak/0PJiwZYTKvDVZWr+5uNmh6LdRV4SQWDg35PArVdgoHUxHD9+XHl6eqqdO3cqpZTq27evWrBggQoNDVUbNmxQSin1n//8R40cOVIp9f+J4OLFi6p69ep//f+8fPmyUkqp/v37q02bNimllIqNjVU1a9a0LkA7yEsi0NVHPb0gtA/UfRhO/A6bp8L68bDpE2gwAJq9CCVtf90+NT2TV77fTclCPrzf23UritrLgKaVWXXgPO8vP0iragFUKVXI7JA0G7jTDRXWCg4OJiwsDIBGjRpx9OhRrly5Qtu2bQEYOHAgffv2/ds2RYsWxc/Pj8GDB9OjRw969OgBGCWms45sdu3aNRITEylUyDn/DvUF6VtE
ILg1DFgIL/5pJIcdX8LURvDd43Bqu00PN219DIfPX+f9B0Mp6u9t0327AxHhvw/Xw9fLk9ELd5OeYf8+Hs3+crqhIqf5eeHr6/vXe09Pz1yVk/by8mLbtm306dOHZcuW/VUBNDMzk61bt7Jr1y527drF6dOnnTYJgE4E2StdE3pNg1H7oPVoOL4JvugIX3SBg8sg07q6NwfOXGPG+hgebFCBDrXK2Cho91O2qB/v9KrDrlNXmLf5hNnhaDaQ/Q0VxnxbK1q0KMWLF2fTpk0ALFiw4K/WwS2JiYlcvXqV7t27M3HiRHbv3g0YVUGnTp3613q7du2yfYAOpBPBnRQuAx3egJf2Q7f/wvWz8N0AmNYYtn8BaXm/ayUtI5Mxi3ZTzN+bN3rUvvsG2h31rF+eDjVLM2HVYU4mJN19Ay1fGzAAZs+GwECjkR4YaEwPGGCf482fP58xY8ZQr149du3axRtvvPG35devX6dHjx7Uq1ePVq1a8cknnwAwZcoUIiMjqVevHrVr12bWrFn2CdBBdBnqvMhIh0M/wx9T4MwO8C8JjZ+FJs9Cwdw9BDZ9fQwfrzzMzAEN6RZazs4Bu4ezV2/S6ZONNKhcjC+fbqL7W/IZXYbaHA4vQ+02PL2gzoPw7DoYtAIqNoHfPoSJdWDZS3Ax5o6bx1y4zuQ10XQPLauTgA2VK1qAV7vWYFP0RX7YoR8007S80ongXohAYAt47FsYuh3qPQo7I2BaOHw7AE5uNe56y0Ipxb8X76OAjydv96xrUuCua0DTQBoFFufdZQeIv55idjia5lR0IrBWqerQcwq8tA/ajIHYP2BOF/iiExxY8lfH8g87TrPt+CXGdqtJqcK+d9mpllceHsJHD4dyMzWDt3/WtYjyG2e8BO3M8vr71onAVgqVhvbjjI7l7hPgxkVY+CRMbUjSppl88stOGlYuxiPhle6+L+2eVCtdmKH3VWPZnrP8dsQ5S5W7Ij8/PxISEnQycBClFAkJCfj55b6Cse4stpfMDDj0C2yeAnHbuawKkd5oMKXaDzOShmYXKekZdJ1k3A7466jW+Hp5mhyRlpaWRlxcHMnJyWaH4jb8/PyoWLEi3t5/f0Ypp85inQjsLCr2Mu/Pmsv7ZdZT48om8PSB+v2g+TDjspJmc78diWfgnG280rk6w9qHmB2OpuUb+q4hE6RnZDJu8V7OFKlPxRcWw7BIo2zFnu9gemP4uh+c+OMfHcuaddpWL0W3umWZtj6GU5f0swWadjc6EdjRvM0nOHTuOm8+UIeCvl4QUA16TDT6EdqNhbhtMK87fNYe9i82nlPQbOI/PWojCO8us19Vc01zFToR2En89RQmr4nmvhql6FLntjISBQOg3WtGQrj/E0i+Ct8/ZQyY8+enkJJoSsyupHyxAozoEMKqA+dZf+iC2eFoWr5mk0QgIl1F5LCIxIjIa9ksf0pE4kVkl+X1TJZlA0Uk2vIaaIt48oMJKw+TnJ5hfDPN6UlX7wLQeDAM2w6PRkDhcrDiX8YDamvfgevnHBu0ixncKpgqpQry5tL9JKdZVx9K01yZ1YlARDyB6UA3oDbQX0SyK6LznVIqzPL63LJtCeBNoCnQBHhTRIpbG5PZ9sZdZWHUKQa1DM5deWQPT6jVAwavhMGrIbiNUQZ7UigsGQoXDtk/aBfk4+XBu73qcvJSEp9vOmZ2OJqWb9miRdAEiFFKHVNKpQLfAr1yuW0XYLVS6pJS6jKwGuhqg5hMo5TirZ/3U7KgD8PaV8v7Dio1gUcXwPAoaDgQ9v4AM5pCRF84vlF3LOdRy2oBdKlThhkbjnLhmr59UdOyY4tEUAE4lWU6zjLvdg+LyB4RWSQit56qyu22TmPp7jNExV7mX11qUsTPinEGSlaF+ycY/Qj3jYPTO2D+AzC7HexdpDuW8+Df3WuRlpHJxysPmx2KpuVLjuos/hkIUkrVw/jWPz+vOxCRISISKSKR8fH586nRpNR0Plh+iNAKRenTqKJt
dlqwJLT9l5EQHpgMqTfgh8EwpQFsmQEp121zHBcWWLIgT7cMZtGOOPbGXTU7HE3Ld2yRCE4DWesmVLTM+4tSKkEpdasS2OdAo9xum2Ufs5VS4Uqp8FKlStkgbNubteEo564l8+YDtfHwsHEpZG8/aPQUDN0G/b+FYpVg5Vj4pA6sfhOunbXt8VzM0PbVKOHvw7vLDuhSB5p2G1skgu1AiIgEi4gP0A9YmnUFEclac7kncNDyfiXQWUSKWzqJO1vmOZ24y0l8uvEYPeuXJzyohP0O5OEBNbrBoOXwzDqo1t4oYzEpFBa/AOd1wbXsFPHz5uXONdh24hIr9um7sTQtK6sTgVIqHRiG8QF+EFiolNovIu+ISE/LaiNEZL+I7AZGAE9Ztr0EvIuRTLYD71jmOZ0JluvPr3Wr6biDVmwEfefB8B0Q/jQc+AlmtoAFD8GxDbpj+TaPNq5EzbKFeX/5QX07qaZloWsN2cDeuKs8MO13XmxXlX91dWAiuF3SJYicYzyUduMClA2FFiOMwXQ8rei4diGbYy7y2Od/8q+uNXix3T3c1aVpTkzXGrITpRTvLz9IiYI+PN+uqrnB+JeANq8YYyP0nAbpqfDjszC5PmyeCsnXzI0vH2hRLYCOtcowfV0MCYlGt1VEBAQFGVfdgoKMaU1zJzoRWGnD4Xi2HEtgRPtq1t0uaktevtDwCXhxKzz2PZSoAqteN55YXvU6XHXv4Rxf61aT5PRMpq6LISIChgyB2FjjSlpsrDGtk4HmTvSlISukZ2TSfcomUtMzWfVSW3y88nFePb0DtkyD/T8ZQ23W7QMthhmXj9zQ2B/3sijqFDe/7MrpuH/+uwUGwokTjo9L0+xJXxqyg0VRcRw5n8irXWvm7yQAUKEh9JkDI3ZCkyFw8GeY1Qq+7A0xa92uY/mljiF4eXhwOi7723xPnnRwQJpmonz+6ZV/JaWm88nqIzSsXIyudcuaHU7uFQ+Erh/A6P3Q8S24cBC+eghmtoRd3xj9Cm6gdBE/nmkdjGeRm9kur1zZwQFpmol0IrhHn286zoXrKYy7v1bO1UXzswLFodVLMGov9J4JKPjpeaNj+Y/JRmlsFzekTRUqdjqKp8/fbyX194fx400KStNMoBPBPYi/nsKnvx2la52yNAq048NjjuDlA2GPwQubYcAPEBACq98wnlheOQ6unLr7PpxUYT9v3nqpEMU676F0+QxEjL6B2bNhwACzo9M0x/EyOwBnNG1dNCnpmfyraw2zQ7EdEQjpaLzO7obN02DrTONV9yFoMRzK1Tc7Spt7rGkgc9ucoECHP/hlRGs8bV0aRNOcgG4R5NGpS0l8ve0kjzSulLuxBpxRufrw8Gcwcjc0ewEO/wqftjGqn0avdqmOZR8vD17pXIND566zeKd731aruS+dCPJo0ppoPEQY0T7E7FDsr1gl6DLe6Fju9A5cjIGIPjCjOeyMgPSUu+/DCdwfWo56FYvyyarDuvSE5pZ0IsiD6PPXWbwzjoEtgihb1M/scBzHryi0HGm0EB781BhRbcmLMKmeMZLazctmR2gVDw/h1a41OXM1mYg/9X2jmvvRiSAP/rfqCP4+Xjzf1uRSEmbx8oH6/eD53+GJxVCmNqx92+hYXvEaXI41O8J71rJaAC2qlmTG+hhupOhBfzT3ohNBLu0+dYVf95/jmdbBlCjoY3Y45hKBqu2NZPD871C7J2z/DKaEwfeDjKeYndArXWqQcCOVeZtPmB2KpjmUTgS5NGHVYUoU9OGZ1lXMDiV/KRsKD86CkXug+TCIWQOf3QfzesCRlZCZaXaEudawcnE61irNrN+OcjUpzexwNM1hdCLIhc1HL7Ip+iIvtqtKIV99x222ilaAzu8aQ2p2Hg+XjsPXj8CMZrDjS0hzjoHjR3eqwfXkdGZvOmp2KJrmMDoR3IVSio9XHqZsET8ebxZodjj5n18Ro5jdyF3w0OdGv8LS4cYIahs/NsZMyMdqly9Cj3rlmPvH
CeKvu8ZdUZp2NzZJBCLSVUQOi0iMiLyWzfLRInJARPaIyFoRCcyyLENEdlleS2/f1mxrD15g58krjOwYgp+3p9nhOA9Pb6jXF57bBE8uhXL1YN17Rins5f8yWgz51OhO1UlJz2TGhhizQ9E0h7A6EYiIJzAd6AbUBvqLSO3bVtsJhCul6gGLgP9mWXZTKRVmefUkH8nMVExYdZigkv70aVTR7HCckwhUaQuP/wAvbDFGS4ucA1MbwsKBEBdldoT/UKVUIR5uWIGIrSc5cyX7onSa5kps0SJoAsQopY4ppVKBb4FeWVdQSq1XSiVZJrcCTvGp+vOeMxw6d52XOlXH21NfRbNamdrQe4ZR6K7lSDi6Hj5vD3O6waHl+apjeUSHEBSKqeuizQ5F0+zOFp9uFYCslcniLPNyMhhYkWXaT0QiRWSriPTOaSMRGWJZLzI+Pt6qgHMjPSOTiauPULNsYR6oV97ux3MrRcoZJbBH74euH8LVOPi2P0xvDJFzIc38b+EVi/szoGkgCyPjOH7xhtnhaJpdOfRrrog8DoQDH2eZHWgZMecxYJKIZPu0llJqtlIqXCkVXqpUKbvHunjnaU4kJDG6U3U8dCEy+/AtbNQyGrETHv4CfArCslEwsS789l+4kWBqeC/eVxVvT2HSmiOmxqFp9maLRHAaqJRluqJl3t+ISEdgHNBTKfXX7RhKqdOWn8eADUADG8RklbQMYzzbuhWK0Kl2GbPDcX2eXhDaB4b8BgOXQYVGsH680bH8y8uQYM6tnKUL+/FUi2CW7j7D4XPXTYlB0xzBFolgOxAiIsEi4gP0A/5294+INAA+xUgCF7LMLy4ivpb3AUBL4IANYrLKjzviOHkpiVEdqjvnoDPOSgSCW8OAhfDin0Zy2PElTG0E3z0Op7Y5PKTn21ahoI8Xk9fqVoHmuqxOBEqpdGAYsBI4CCxUSu0XkXdE5NZdQB8DhYDvb7tNtBYQKSK7gfXAh0opUxNBarrRGqhXsSgdapU2MxT3Vrom9JoGo/ZB65fh+Cb4ohN80RkOLoNMx1QJLebvw6CWQSzfe46DZ6855Jia5miinLC2fHh4uIqMjLTLvr/ZdpKxP+5lzlPhtK+pLwvlG6k3YOdXsGU6XImFElWh+VCo3x98/O166KtJabT6aB0tqwUw64lGdj2WptmTiERZ+mT/Rt8TmUVqeibT1sVQv1Ix7quhWwP5ik9BaPocDN8BfecZpbF/GQ2T6sL6D+DGRbsduqi/N4NaBfPr/nPsP+P6Yzlr7kcngiy+jzrF6Ss3GdUxRPcN5FeeXsZDac+ug0EroFJT+O1Do2P551HG4Dl2MLhVMIX9vJi8Rj9XoLkenQgsUtIzmL4uhrBKxWhX3f63p2pWEoHAFtD/Gxi63RgnYdfXMC0cvnkMYrfYdEjNogW8GdwqmFUHzrPvtG4VaK5FJwKLhZFxnLmazOhO+k4hp1OqOjwwGV7aB23GwMnNMLer0bl8YInNOpafbhVMET8vJulWgeZidCLAaA3MWB9Do8DitA4JMDsc7V4VKg3txxmlsLtPMPoNFj5p1DXa9pnR4WyFIn7ePNO6CmsOnmdP3BXbxKxp+YBOBMB3209x9moyL3XUrQGX4FMQmjwLw6PgkQVQsBQsf8XoR1j3HiReuPs+cjCoZRBFC3jrVoHmUtw+ESSnZTB9fQyNg4rTslpJs8PRbMnD0xhG85k18PQqCGwJGycYJSyWDof4w3neZWE/b4a0qcK6QxfYdeqK7WPWNBO4fSL4dttJzl9L0a0BV1e5KfSLgGGR0GAA7FkI05vA14/CiT/y1LE8sEUQxf29dQ0izWW4dSJITstg+oajNAkuQfOqujXgFgKqQY+JRj9Cu7EQtx3mdYfP2sO+HyEj/a67KOTrxbNtqrDhcDw7Tl52QNCaZl9unQgi/jxJ/HXdGnBLBQOg3WtGQugxEZKvwqJBMLUBbJ0FKYl33Hxg8yBKFPTRfQWaS3DbRHAzNYOZ
G47SrIpuDbg17wIQ/rRxyajf11C4PPz6KkysDWvehuvnst2soK8XQ9pUYeOReKJidatAc25umwgi/ozlYqLRGtA0PDyg5v0weCUMXg3BbeH3iTApFJYMhQuH/rHJk80DKVnQR/cVaHYXEQFBQcafaVCQMW1LbpkIklLTmfXbUVpWK0nTKro1oN2mUhN4dIFx+2nDgbD3B5jRFCL6wvGNf3Us+/t48VzbKmyKvsj2E5dMDlpzVRERMGQIxMYaf3qxsca0LZOBWyaCr7bGcjExVbcGtDsrWRXunwCjD8B9r8OZnTD/AZjdFvYugox0nmgWREAhXyau1q0CzT7GjYOkpL/PS0oy5tuK2yWCpNR0Pv3tGK1DAggPKmF2OJoz8C8BbccYYyM8MBlSk+CHwTAljAJRnzK8ZRk2H03gz2PmDq2puaaTJ/M2/17YJBGISFcROSwiMSLyWjbLfUXkO8vyP0UkKMuysZb5h0Wkiy3iuZMvt8SScCOVUbo1oOWVtx80egqGboP+30KxyrByLE9u7c7b/guZt3Kz2RFqLqhy5bzNvxdWJwIR8QSmA92A2kB/Eal922qDgctKqWrAROAjy7a1MYa2rAN0BWZY9mcXN1LSmb3xGG2ql6JRYHF7HUZzdR4eUKMbDFoOz6xDqnXgicylTD43kAtfPg3n95sdoeZCxo8H/9vGXvL3N+bbii1aBE2AGKXUMaVUKvAt0Ou2dXoB8y3vFwEdxLhxvxfwrVIqRSl1HIix7M8u5m85waUbqbzUMcReh9DcTcVG0HceaUOj+NGzC0WOL4OZLWDBQ3B0vU1LYWvuacAAmD0bKlTMBFFUqJTJ7NnGfFuxRSKoAJzKMh1nmZftOpYxjq8CJXO5LQAiMkREIkUkMj4+/p4CPXT2Ou1qlKJBZd0a0GzLt1QVkju8T9ObU4kNexnO7YUFveHT1kY5i4w0s0PUnNiAAfD4pH2EjP2VyL2pNk0C4ESdxUqp2UqpcKVUeKlS9zZwzJT+DZj1uB5zVrOPfk0qU6BIAK+c64gatRd6ToP0VPjxWZhcHzZPheRrZoepOaFTl5JYFBVH/yaVKFvUz+b7t0UiOA1UyjJd0TIv23VExAsoCiTkclub8vO2WxeE5ub8vD158b6qbD9xmT9OJELDJ+DFrfDY91CiCqx63SiFvep1uBpndriaE5m+PgYPD+GFdtXssn9bJILtQIiIBIuID0bn79Lb1lkKDLS87wOsU0opy/x+lruKgoEQYJsNYtI0UzzauBLlivoxcc0RlFJGx3L1zvDUMhiyAUI6wZYZRgvhxyHGJSRNu4NbrYHHmlS2S2sAbJAILNf8hwErgYPAQqXUfhF5R0R6Wlb7AigpIjHAaOA1y7b7gYXAAeBXYKhSyjbjCmqaCXy9PHnxvmpExV5mU/TFvy8s3wD6zIERO6HJEDi4DGa1gi97Q8xa3bGsZWvqumhLa6Cq3Y4hygn/+MLDw1VkZKTZYWhatlLSM7jv4w2UKerHjy+0yLmy7c3LEDXPqHaaeA5K14EWw6Huw+Dl49CYtfwpNuEG7f/3G080C+StnnWs3p+IRCmlwm+f7zSdxZrmLHy9PBnavho7T17htyN3uMOtQHFo9RKM2gu9ZwIKfnoeJteD3yfBzSsOiljLr6aui8HLQ3jRjq0B0IlA0+yib6NKVChWgIlrorlrq9vLB8Iegxc2w4AfIKA6rHnTGFJz5Ti4curO22su6cTFGyzeeZoBTQMpXcQ+fQO36ESgaXbg4+XBsPbV2H3qChsO5/K5FxEI6QgDl8JzG42nl7fONDqWf3gGzuyya8xa/jJ1XQzensLz7arY/Vg6EWianfRpVJGKxQsw6dYdRHlRrj48/BmM3A3NXoDDvxpVT+c/ANGrdceyizt+8QaLd8bxeNNAShe2b2sA3CgR2HtgB027nbenB8PbV2N33FXWH75wbzspVgm6jIfR+6HTu3AxBiL6wIzmsPMrSE+xbdBavjB1bTQ+Xh48
19a+fQO3uEUicMTADpqWnYcaVqRyCX8m5aav4E78ikLLEUYL4cFPwcPTGDltUj3Y9IlxB5LmEo7GJ/LTrtM80SyQUoV9HXJMt0gEjhjYQdOy4+1p9BXsibvK2oP32CrIyssH6veD53+HJxZDmdqw9m34pA6seA0ux1p/DM1UU9dG4+vl6bDWALhJInDEwA6alpOHGlQgsKQ/k9beQ19BTkSgansjGTz/B9TuCds/gylh8P0gOL3DNsfRHCrmQiJLd5/hyeaBBBRyTGsA3CQROGJgB03LiZenB8Pbh7Dv9DVWHzhv+wOUrQsPzoKRe6D5MIhZA5/dB3PvhyMrITPT9sfU7GLqumj8vD0Z0sb+dwpl5RaJwBEDO2janfQOK09wQEHr+wrupGgF6PwuvLQfOo+Hyyfg60dgRjPY8SWkJdvnuJpNxFy4bmkNBFHSga0BcJNEcGtgh8BAo0UdGIjNB3bQtDvxstxBdODsNVbut0OrICu/ItBiGIzcBQ99bvQrLB0Ok0Jh48eQdMm+x9fuyeS1Mfib0BoAXWtI0xwmPSOTzhM34uPlwfIRrfHwyKEGka0pBcc3wuYpxmUjb39o8Dg0exFKBDsmBu2Ojpy/TpdJG3mhbVX+1bWm3Y6jaw1pmsm8PD0Y0SGEQ+eus3L/OccdWASqtIXHf4AXtkCdByFyLkxtCAufhDj9pcpsk9dG4+/tybOtHd8aAJ0INM2hHqhfnqqljL6CzEwTWuNlakPvGUahu5Yj4egG+LwDzOkGh5brjmUTHD53neV7z/JUyyCKFzSn6qxOBJrmQJ4ewogOIRw+f50V+xzYKrhdkXLQ8S3jieWuHxojpn3bH6Y3NloLaTfNi83NTFx9hII+XjzTypzWAFiZCESkhIisFpFoy89/jAovImEiskVE9ovIHhF5NMuyeSJyXER2WV5h1sSjac6gR73yVCtdiMlrj5jTKsjKt7BRy2jETmPQHJ9CsGyUUfl0w0dwI8Hc+Fzc3rir/Lr/HM+0DjatNQDWtwheA9YqpUKAtZbp2yUBTyql6gBdgUkiUizL8jFKqTDLa5eV8WhavufpIYzsEMKR84n8sves2eEYPL2MAXGGbICBy6BCI9jwvjHG8i8vQ8JRsyN0SRNWHaa4vzeDW5nbaW9tIugFzLe8nw/0vn0FpdQRpVS05f0Z4AJQysrjappTuz+0HNXLFGLy2mgyLK2CfFEYUQSCW8OAhfDinxDax3gGYWoj+O5xOKWHFLeV7Scu8duReJ5vW5XCft6mxmJtIiijlLr1leYcUOZOK4tIE8AHyPr1YrzlktFEEcnxKQoRGSIikSISGR+fy/rumpZPeXgIIztUJ+ZCIj/vPpM/CyOWrgm9psGofdD6ZTi+Cb7oBF90hoM/Q6YeXvxeKaX4eOVhShX25cnmQWaHc/fnCERkDVA2m0XjgPlKqWJZ1r2slPpHP4FlWTlgAzBQKbU1y7xzGMlhNnBUKfXO3YLWzxForiAzU9Fj6u/cSE3nxLR2nDz5z+cKAgPhxAnHx5at1BtG6est0+FKLJSoAs2HQv3HwMf/7ttrf9kUHc8TX2zj7Z51GNgiyGHHvefnCJRSHZVSdbN5LQHOWz7Mb32oZ1teUUSKAL8A424lAcu+zypDCjAXaHJvp6dpzsfDQxjTpQaxCUmczGE0ynxVGNGnIDR9DobvgL7zwK+Y0X8wsQ6sfx8SdUs9N5RSTFh5mArFCtCvSSWzwwGsvzS0FBhoeT8QWHL7CiLiAywGvlRKLbpt2a0kIhj9C/usjEfTnEq7GqVoHFQcnyLZ1wHKl4URPb2Mh9KeXQeDVkDlZvDbRzCpLvw8yhg8R8vR6gPn2R13lZEdQvD18jQ7HMD6RPAh0ElEooGOlmlEJFxEPres8wjQBngqm9tEI0RkL7AXCADeszIeTXMqIsKYLjUp3PoQPr5/f5gr3xdGFIHAFtD/Gxi63RgnYdfXMC0cvnkMYrfoITVvk5mp+GT1EaoEFOSh
hhXMDucvutaQpuUDT83dxrplfrA9lLg4oXJlIwk4XWHExAuw7TNjbISbl6FCOLQYDrUeMEZVc3NLd59hxDc7mdK/AT3rl3f48XWtIU3Lx17pXAOPkFOM/PwImZlGB7HTJQGAQqWh/Th46QB0nwBJCfD9QKOu0bbPjA5nN5Wekcmk1UeoWbYwPULLmR3O3+hEoGn5QN0KRelRrxxf/H6c+OsuMCC9jz80eRaGR8EjC6BgKVj+itGxvO49o+XgZn7ccZpjF28wulN1x1WezSWdCDQtnxjdqTop6ZnM2OBCna0ensYwms+sgadXQWBL2DjBKGGxdDjEHzY7QodITstg8tpo6lcsSqfad3zcyhQ6EWhaPlGlVCH6NqpIxNaTxF1OMjsc26vcFPpFwLBIaDAA9iyE6U3g60fhxO8u3bG8YEssp6/c5NVuNTFuksxfdCLQtHxkRIcQEJi8JtrsUOwnoBr0mGgMqdluLMRth3n3w2ftYd+PkJFudoQ2dfVmGtPWx9C2eilaVA0wO5xs6USgaflI+WIFeKJZID/siCPmwnWzw7GvggHQ7jUjIfSYCMlXYdEgmNoAts6ClESzI7SJmRuOci05jVftOPKYtXQi0LR85sV2VSng7cmElUfMDsUxvAtA+NPGJaN+X0Ph8vDrqzCxNqx5G66bOG6Dlc5evcncP47zYFgFapcvYnY4OdKJQNPymZKFfBnSpiq/7j9HVKwbDTTv4QE174fBK2HwaghuC79PhEmh8NNQuHDQ7AjzbOLqIygFL3WqbnYod6QTgablQ8+2CaZ0YV/G/3IQZ3zo02qVmsCjC2DEDmg4EPb9ADOaQURfOL7RKTqWj5y/zqKoOJ5sHkilEvm7KJ9OBJqWD/n7ePFy5+rsOHnF3CEtzVaiCtw/AUYfgPtehzM7Yf4DMLst7F0EGWlmR5ij//56mIK+Xgy9r5rZodyVTgSalk/1aVSJGmUK89Gvh0hNd/NB5f1LQNsxxtgID0yB1CT4YTBMaWCUxU7JXx3r209cYs3B87zQrqqpQ1Dmlk4EmpZPeXoIY7vXJDYhia+2xpodTv7g7QeNBsLQbdD/WyhWGVb+Gz6pA6vfhGtnzI4QpRQfLD9I2SJ+DGph7hCUuaUTgablY22rl6JVtQCmrIvm6s38exnE4Tw8oEY3GLQcnlkH1drD5ikwqR4sfgHO7zcttF/2nmXHySu81CmEAj7OUWhPJwJNy8dEjFbB1ZtprlV6wpYqNjIGyhmxExoPhgM/wcwWsOAhOLreoR3LyWkZfLD8ELXLFaFPo/wx6Exu6ESgaflcnfJFeahBReb+ccI1S0/YSvEg6PaR8YBa+//Aub2woDfMag27v3NIx/IXvx/n9JWbvN6jFp75rLDcnehEoGlO4JUu1RHg45XuUaTNKv4loM0r8NI+6DkNMlJh8RCYXB82T4Xka3Y57IXrycxYH0Pn2mXybSmJnFiVCESkhIisFpFoy8+cBq7PyDI62dIs84NF5E8RiRGR7yzDWmqadptyRQvwTOtgluw6Q1TsZbPDcQ5evtDwCXhxKzz2vXEr6qrXjVLYq16Hq3E2Pdwnq46QmpHJv7vXsul+HcHaFsFrwFqlVAiw1jKdnZtKqTDLq2eW+R8BE5VS1YDLwGAr49E0l/Viu2qUKeLL2z/vJzMz/z9QlW94eED1zvDUMhiyAUI6w5YZRgvhxyFwdo/Vhzhw5hrfRZ5iYPMgggIKWh+zg1mbCHoB8y3v52MMQJ8rlgHr2wO3BrTP0/aa5m4K+noxtlst9sRdZVGUbb/Nuo3yDaDPFzByFzQZAgeXwaet4cteELP2njqWlVK8u+wAxQp4M7xDiO1jdgBrE0EZpdRZy/tzQE4jLviJSKSIbBWR3pZ5JYErSqlbNWfjgBxHcxaRIZZ9RMbHx1sZtqY5p15h5WkUWJz/rjzEtWR9O+k9K1YZun4Ao/dDx7fgwiH46iGY2RJ2fQPpqbne1eoD59lyLIGXOlWnaAFv
+8VsR3dNBCKyRkT2ZfPqlXU9ZRREySmdBloGTH4MmCQiVfMaqFJqtlIqXCkVXqpUqbxurmkuQUR464E6JNxIZepaFx6zwFEKFIdWL8GovdB7JqDgp+dhcj34fRLcvHLHzZPTMnj3lwOElC7EY00qOyJiu7hrIlBKdVRK1c3mtQQ4LyLlACw/sx2IVCl12vLzGLABaAAkAMVExMuyWkXgtNVnpGkuLrRiUR4Nr8TcP04Qc8E1avabzssHwh6DFzbD4z9AQHVY86bRsfzrv+HKyWw3m/XbUU5dusk7veri5em8N2FaG/lSYKDl/UBgye0riEhxEfG1vA8AWgIHLC2I9UCfO22vado/vdKlBgV8PHl32QH3rE5qLyJQrSMMXArPbYQa3eHPWTA5DBYNhjO7/lr1ZEISMzYc5YH65WletaRpIduCtYngQ6CTiEQDHS3TiEi4iHxuWacWECkiuzE++D9USh2wLHsVGC0iMRh9Bl9YGY+muYWAQr6M7BDCb0fiWXsw24a4Zq1y9eHhz2Dkbmj2AhxZaVQ9nf8ARK/mnZ/34e0hjHPC20VvJ874bSI8PFxFRkaaHYammSotI5PukzeRlJrBmtFtnaaujdNKvgpR82HrTLh+hsOZFTlbezDt+gw1nllwAiISZemv/RvnvailaW7O29OD93rX5fSVm0zWHcf251cUWo4geegO3vUZhaeXN+0OvW2MoLbpf3DTeR/004lA05xY0yol6duoIp9vOsbhc/mrJr+r+vT3OL641oQL/VfDE4uhTB1Y+45RCnvFa3DZ+UqG60SgaU5ubPdaFPbzYtzivfqJYzs7Fp/I9A0x9KhXjhYhpaBqeyMZPP8H1O4J2z+DKWHw/SA4vcPscHNNJwJNc3IlCvowtnstImMv833UKbPDcVmZmYqxP+7Fz8uDNx6o/feFZevCg7Ng5B5oMRxi1sBn98Hc++Hwr5CZv0eY04lA01xA30YVaRJUgg9WHCIhMcXscFzSd5Gn+PP4Jf7dvRalC/tlv1LRCtDpHaMUdufxcPkEfPMozGhqdDSnJTs05tzSiUDTXICIMP7BuiQmpzP+l4Nmh+NyLlxL5v3lB2kaXIJHG+diwBm/ItBimFHT6KHPjbuKfh4Bk+rCbx9D0iW7x5wXOhFomosIKVOYF9pV5cedp1l36LzZ4biUt37eT0p6Jh88FIpRLzOXPL2hXl94bhM8uRTKhcH694wnlpePgUvH7RZzXuhEoGkuZFj7alQvU4h//7hPj3FsI6sPnGf53nOMaF+NKqUK3dtORKBKW3h8EbywBeo8BJFzYWpDWPgkxJn7XJROBJrmQny9PJnQtz7xiSmM/+XA3TfQ7uhqUhqv/7SXGmUKM6RNnmtlZq9Mbeg93Sh013IkHNsAn3eAOV3h0C+mdCzrRKBpLqZexWIMaVOFhZFxbDisy09Y482l+0hITGVC3/r4eNn447JIOaME9kv7oeuHcPU0fPsYTG9stBbSbtr2eHegE4GmuaCRHUKoVroQY3/cq8ctuEcr9p7lp11nGNa+GqEVi9rvQL6FjVpGI3ZCnzngUwiWjYKJdWHDR3AjwX7HttCJQNNckJ+3Jx/3qcf5a8mMX6bvIsqr+OspjPtpH6EVijL0vmqOOainF9R92BhO86lfoGI4bHjf6FheNhoSjtrt0DoRaJqLalC5OM+3rcp3kaf4dd/Zu2+gAcbQk+MW7yUxJZ3/PVIfb0ePMyACQa3gse/gxT8htA/sXABTG8G3A4xnE2xMJwJNc2GjOlanXsWivPbjXs5dzZ8PM+U3i6LiWHXgPK90rk71MoXNDaZ0Teg1DUbtg9Yvw+ko8C5o88PoRKBpLszHy4PJ/RqQkpbJ6IW7dC2iuzgan8gbS/bTNLgEg1tVMTuc/1e4DHT4j5EQCtl+qF6rEoGIlBCR1SISbflZPJt17hORXVleybcGsBeReSJyPMuyMGvi0TTtn4IDCvJWz9psPprAZ5uOmR1OvpWclsHw
r3fi520kT0+PPDw45iieXndf5x5Y2yJ4DVirlAoB1lqm/0YptV4pFaaUCgPaA0nAqiyrjLm1XCm1y8p4NE3LxiPhlehWtywTVh1mT9wVs8PJlz5ccYgDZ68xoW99yhbNoZaQi7I2EfQC5lvezwd632X9PsAKpVSSlcfVNC0PRIQPHgqldGE/XvhqB1eSUs0OKV9ZfeA88zaf4OmWwXSoVcbscBzO2kRQRil163aEc8DdfoP9gG9umzdeRPaIyMRbg9xnR0SGiEikiETGx8dbEbKmuadi/j5MH9CQC9eTGfWd7i+45cTFG7y8cBd1KxTh1W41zA7HFHdNBCKyRkT2ZfPqlXU9ZQx+nONfloiUA0KBlVlmjwVqAo2BEhiD2WdLKTVbKRWulAovVcr2nSWa5g7CKhXjjQfqsOFwPNPXx5gdjumSUtN5/qsoPDyEmQMa4evlnuM+37XnQSnVMadlInJeRMoppc5aPujv9Dz7I8BipdRfjzlmaU2kiMhc4JVcxq1p2j16vGlldsRe5pM1R6hfqRhtqrvnFyulFP9atIcj568zb1ATKpXwNzsk01h7aWgpMNDyfiCw5A7r9ue2y0KW5IEYdV17A/usjEfTtLu4NXZB9dKFGf7NTo7FJ5odkik+33ScZXvO8kqXGm6bDG+xNhF8CHQSkWigo2UaEQkXkc9vrSQiQUAl4Lfbto8Qkb3AXiAAeM/KeDRNywV/Hy8+HxiOp4fw9LztXL7hXp3H6w9f4IMVB+lWtywvtLVRVVEnZlUiUEolKKU6KKVClFIdlVKXLPMjlVLPZFnvhFKqglIq87bt2yulQpVSdZVSjyul3POriaaZoFIJf2Y/0YgzV5J5ISKK1HTjv2dEBAQFgYeH8TMiwtQwbW7/masMi9hBrXJFmNC3ft4GmnFR+sliTXNj4UEl+KhPKFuPXeL1n/YSEaEYMgRiY0Ep4+eQIa6TDM5evcnT87ZTpIA3c55qTEFf+zyg5Wx0ItA0N/dgg4oMb1+NhZFxDBudTtJtT/kkJcG4cebEZkvXktN4el4kN1IymPNUY8oUca+Hxu5EJwJN0xjdqTr9GlfiyoXsvyGfPOnggGwsKTWdp+duJ/r8dWYMaEitckXMDilf0YlA0zTLnUShFCqZfadx5coODsiGktMyGPJlFDtOXmZyvwZ5vkPI1ftMQCcCTdMsPD2EaRO98fTJ+Nt8f38YP96koKyUmp7JsK938HvMRf7bpz731yuXp+0jInDpPpNbdCLQNO0vA5/w4LPZ4F8iBVAElE1n9mwYMMDsyPIuOS2D57+KYs3BC7zbqw59GlXM8z7GjcNl+0yy0olA07S/GTTQk4RzXjz5xXYKDlxJRpUTZoeUZ4kp6Qyau531hy8w/sG6PNE86J72k1PfiLP3mdxOJwJN0/7Bz9uT2U82omOtMryxZD//W3UYo5xY/hd/PYUBn//JthOXmPhIGAOaBt7zvnLqG3HmPpPs6ESgaVq2fL08mfl4Qx4Nr8TUdTGM+m4XKekZd9/QRIfPXaf39D84fO4aMwc0pHeDClbtb/x4o48kK2fuM8mJTgSapuXI29ODDx8OZUyXGizZdYbHPvsz3459vPbgeR6euZm0jEy+f64FneuUtXqfAwbA7NkQGGiMKR8YiNP2mdyJOEtzL6vw8HAVGRlpdhia5lZ+2XOWMYt2U8Dbk0n9wmgdkj8KtaVlZPLxysPM3niMOuWL8PnAcMoVLWB2WPmSiEQppcJvn69bBJqm5cr99cqxdFhLShby4ck52/jo10Mkp5l7qSg24QZ9Z21h9sZjPN6sMj+80EIngXugE4GmablWrXRhfhrakkcaVWLmhqPcP2UTUbGXHR5HekYmn/52lC6TNnL0QiLTH2vIe71D8fN2z4FlrKUvDWmadk9+OxLPv3/cy5mrN+nbqCKjO9VwyKDvm6LjeX/5IQ6evUan2mV4t1ddtxts/l7ldGlIJwJN0+5ZYko6k9ccYf7mWDw8YGCL
IJ5uGWzzgm5KKbafuMzUddFsir5IxeIF+Hf3WnSrW1aXkc4DuyQCEekLvAXUApoopbL9dBaRrsBkwBP4XCl1awCbYOBboCQQBTyhlLrrCBk6EWha/nLqUhITVh3m591n8PQQetavwCPhFWkcVAIPj3v/oL6enMbK/ef5cssJ9sRdpbi/N8Pah/B4s8puO76wNeyVCGoBmcCnwCvZJQIR8QSOAJ2AOGA70F8pdUBEFgI/KqW+FZFZwG6l1My7HVcnAk3Ln2ITbjDn9+N8HxVHUmoG5Yv60blOWZpVKUnT4BIUL+hzx+2VUhyNv8HWYwnMnZ/Bxq/LkX7ND99iKTw96jr/G1uCAj46Adwru14aEpEN5JwImgNvKaW6WKbHWhZ9CMQDZZVS6bevdyc6EWha/paUms7qA+dZsusMm49eJDnNGP2sVGFfqgQUpFRhXwr5euHj5UFSagbXk9M4dekmJxJukJSaQeL+8lxeWY/MtP//0Pf3d817+B0pp0TgiOF5KgCnskzHAU0xLgddUUqlZ5mf42OAIjIEGAJQ2dWe79Y0F+Pv40WvsAr0CqtASnoGu09dZcfJyxy9kMixizc4cOYaiSnppGZkUtDHC38fTyoUL0DTKiWoUaYwr0RUJDPt7zc13ir2phOB7d01EYjIGiC7R/TGKaWW2D6k7CmlZgOzwWgROOq4mqZZx9fLkybBJWgSXCLX2zx2Ovv5rlbsLb+4ayJQSnW08hingUpZpita5iUAxUTEy9IquDVf0zQ3V7myUfs/u/ma7TnigbLtQIiIBIuID9APWKqMzon1QB/LegMBh7UwNE3Lv9yl2Ft+YVUiEJEHRSQOaA78IiIrLfPLi8hyAMu3/WHASuAgsFAptd+yi1eB0SISg9Fn8IU18Wia5hrcpdhbfqEfKNM0TXMTuuicpmmali2dCDRN09ycTgSapmluTicCTdM0N6cTgaZpmptzyruGRCQeyOZxk1wJAC7aMBxnoM/ZPehzdn3Wnm+gUuofY4w6ZSKwhohEZnf7lCvT5+we9Dm7Pnudr740pGma5uZ0ItA0TXNz7pgIZpsdgAn0ObsHfc6uzy7n63Z9BJqmadrfuWOLQNM0TctCJwJN0zQ357KJQES6ishhEYkRkdeyWe4rIt9Zlv8pIkEmhGlTuTjn0SJyQET2iMhaEQk0I05buts5Z1nvYRFRIuLUtxrm5nxF5BHLv/N+Efna0THaWi7+riuLyHoR2Wn52+5uRpy2JCJzROSCiOzLYbmIyBTL72SPiDS06oBKKZd7AZ7AUaAK4APsBmrfts6LwCzL+37Ad2bH7YBzvg/wt7x/wR3O2bJeYWAjsBUINztuO/8bhwA7geKW6dJmx+2Ac54NvGB5Xxs4YXbcNjjvNkBDYF8Oy7sDKwABmgF/WnM8V20RNAFilFLHlFKpwLdAr9vW6QXMt7xfBHQQEXFgjLZ213NWSq1XSiVZJrdiDA/qzHLz7wzwLvARkOzI4OwgN+f7LDBdKXUZQCl1wcEx2lpuzlkBRSzviwJnHBifXSilNgKX7rBKL+BLZdiKMexvuXs9nqsmggrAqSzTcZZ52a6jjFHUrmKMkuascnPOWQ3G+EbhzO56zpYmcyWl1C+ODMxOcvNvXB2oLiJ/iMhWEenqsOjsIzfn/BbwuGW0xOXAcMeEZqq8/n+/o7sOXq+5HhF5HAgH2podiz2JiAfwCfCUyaE4khfG5aF2GC2+jSISqpS6YmZQdtYfmKeU+p+INAcWiEhdpVSm2YE5C1dtEZwGKmWZrmiZl+06IuKF0aRMcEh09pGbc0ZEOgLjgJ5KqRQHxWYvdzvnwkBdYIOInMC4lrrUiTuMc/NvHAcsVUqlKaWOA0cwEoOzys05DwYWAiiltgB+GMXZXFmu/r/nlqsmgu1AiIgEi4gPRmfw0tvWWQoMtLzvA6xTll4YJ3XXcxaRBsCnGEnA2a8dw13OWSl1VSkVoJQKUkoFYfSL9FRKOeuA17n5u/4J
ozWAiARgXCo65sAYbS0353wS6AAgIrUwEkG8Q6N0vKXAk5a7h5oBV5VSZ+91Zy55aUgplS4iw4CVGHcdzFFK7ReRd4BIpdRS4AuMJmQMRqdMP/Mitl4uz/ljoBDwvaVf/KRSqqdpQVspl+fsMnJ5viuBziJyAMgAxiilnLalm8tzfhn4TERewug4fsrJv9QhIt9gJPQAS9/Hm4A3gFJqFkZfSHcgBkgCBll1PCf/fWmapmlWctVLQ5qmaVou6USgaZrm5nQi0DRNc3M6EWiaprk5nQg0TdPcnE4EmqZpbk4nAk3TNDf3f9EdoecPxp74AAAAAElFTkSuQmCC\n",
"text/plain": [
- ""
+ ""
]
},
- "metadata": {},
+ "metadata": {
+ "needs_background": "light"
+ },
"output_type": "display_data"
}
],
@@ -212,24 +228,27 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Fitting Parameters: [ 2.21147559e+01 -3.34560175e+01 1.13639167e+01 -2.82318048e-02]\n"
+ "parameters initialization: [0.80748537 0.26355156 0.15225828 0.69983347]\n",
+ "Fitting Parameters: [ 23.51836106 -35.90542618 12.59021898 -0.18343697]\n"
]
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
- ""
+ ""
]
},
- "metadata": {},
+ "metadata": {
+ "needs_background": "light"
+ },
"output_type": "display_data"
}
],
@@ -247,26 +266,30 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "Fitting Parameters: [-1.70872086e+04 7.01364939e+04 -1.18382087e+05 1.06032494e+05\n",
- " -5.43222991e+04 1.60701108e+04 -2.65984526e+03 2.12318870e+02\n",
- " -7.15931412e-02 3.53804263e-02]\n"
+ "parameters initialization: [0.91227185 0.61186299 0.8670106 0.46130526 0.24519055 0.66403862\n",
+ " 0.80093522 0.93778575 0.87814087 0.20415735]\n",
+ "Fitting Parameters: [ 1.39123363e+04 -6.39746319e+04 1.24383137e+05 -1.32914986e+05\n",
+ " 8.47957513e+04 -3.27261066e+04 7.35983355e+03 -8.84389244e+02\n",
+ " 4.92269900e+01 -1.32799082e-01]\n"
]
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
- ""
+ ""
]
},
- "metadata": {},
+ "metadata": {
+ "needs_background": "light"
+ },
"output_type": "display_data"
}
],
@@ -295,18 +318,18 @@
"source": [
"结果显示过拟合, 引入正则化项(regularizer),降低过拟合\n",
"\n",
- "$Q(x)=\\sum_{i=1}^n(h(x_i)-y_i)^2+\\lambda||w||^2$。\n",
+ "$Q(x)=\\sum_{i=1}^n(h(x_i)-y_i)^2+\\frac{\\lambda}{2}||w||^2$。\n",
"\n",
"回归问题中,损失函数是平方损失,正则化可以是参数向量的L2范数,也可以是L1范数。\n",
"\n",
- "- L1: regularization\\*abs(p)\n",
+ "- L1: regularization \\* abs(p)\n",
"\n",
- "- L2: 0.5 \\* regularization \\* np.square(p)"
+ "- L2: regularization \\* np.sqrt(np.square(p))"
]
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
@@ -316,13 +339,13 @@
"def residuals_func_regularization(p, x, y):\n",
" ret = fit_func(p, x) - y\n",
" ret = np.append(ret,\n",
- " np.sqrt(0.5 * regularization * np.square(p))) # L2范数作为正则化项\n",
+ " 0.5 * regularization * np.square(p)) # L2范数的平方作为正则化项,前面需要乘以0.5\n",
" return ret"
]
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
@@ -334,27 +357,29 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 11,
+ "execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
- ""
+ ""
]
},
- "metadata": {},
+ "metadata": {
+ "needs_background": "light"
+ },
"output_type": "display_data"
}
],
@@ -449,13 +474,17 @@
"\n",
"习题解答:https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
"\n",
- "中文注释制作:机器学习初学者公众号:ID:ai-start-com\n",
+ "配置环境:python 3.8+\n",
"\n",
- "配置环境:python 3.5+\n",
- "\n",
- "代码全部测试通过。\n",
- "![gongzhong](../gongzhong.jpg)"
+ "代码全部测试通过。"
]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
}
],
"metadata": {
@@ -474,7 +503,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.3"
}
},
"nbformat": 4,
diff --git "a/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/.ipynb_checkpoints/2.Perceptron-checkpoint.ipynb" "b/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/.ipynb_checkpoints/2.Perceptron-checkpoint.ipynb"
new file mode 100644
index 0000000..b2ec2b1
--- /dev/null
+++ "b/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/.ipynb_checkpoints/2.Perceptron-checkpoint.ipynb"
@@ -0,0 +1,625 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "# Chapter 2 Perceptron"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
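+    "1. The perceptron is a linear model for binary classification that assigns a class to an input instance based on its feature vector $x$:\n",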
+ "\n",
+ "$$\n",
+ "f(x)=\\operatorname{sign}(w \\cdot x+b)\n",
+ "$$\n",
+ "\n",
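+    "The perceptron model corresponds to the separating hyperplane $w \\cdot x+b=0$ in the input (feature) space.\n",
+    "\n",
+    "2. The learning strategy of the perceptron is to minimize the loss function:\n",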
+ "\n",
+ "$$\n",
+ "\\min _{w, b} L(w, b)=-\\sum_{x_{i} \\in M} y_{i}\\left(w \\cdot x_{i}+b\\right)\n",
+ "$$\n",
+ "\n",
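+    "Up to the factor $1/\\|w\\|$, the loss function is the total distance from the misclassified points in $M$ to the separating hyperplane.\n",
+    "\n",
+    "3. The perceptron learning algorithm optimizes this loss function by stochastic gradient descent and has a primal form and a dual form. The algorithm is simple and easy to implement. In the primal form, an arbitrary hyperplane is chosen first, and the objective is then minimized step by step with gradient descent, each step using one randomly chosen misclassified point.\n",
+    " \n",
+    "4. When the training set is linearly separable, the perceptron learning algorithm converges. The number of misclassifications $k$ made on the training set satisfies the bound below, where $R$ is the largest norm of the (augmented) training inputs and $\\gamma$ is the margin attained by a unit-norm separating hyperplane:\n",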
+ "\n",
+ "$$\n",
+ "k \\leqslant\\left(\\frac{R}{\\gamma}\\right)^{2}\n",
+ "$$\n",
+ "\n",
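+    "When the training set is linearly separable, the perceptron learning algorithm has infinitely many solutions, which may differ depending on the initial values or the order of iteration.\n"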
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### Binary classification model\n",
+    "$f(x) = \\operatorname{sign}(w\\cdot x + b)$\n",
+ "\n",
+ "$\\operatorname{sign}(x)=\\left\\{\\begin{array}{ll}{+1,} & {x \\geqslant 0} \\\\ {-1,} & {x<0}\\end{array}\\right.$\n",
+ "\n",
+    "Given the training set:\n",
+ "\n",
+ "$T=\\left\\{\\left(x_{1}, y_{1}\\right),\\left(x_{2}, y_{2}\\right), \\cdots,\\left(x_{N}, y_{N}\\right)\\right\\}$\n",
+ "\n",
+    "The perceptron loss function, summed over the set $M$ of misclassified points, is defined as\n",
+ "\n",
+ "$L(w, b)=-\\sum_{x_{i} \\in M} y_{i}\\left(w \\cdot x_{i}+b\\right)$\n",
+ "\n",
+ "---\n",
+    "#### Algorithm\n",
+    "\n",
+    "Stochastic Gradient Descent\n",
+    "\n",
+    "Pick one misclassified point at random and take a gradient step on it.\n",
+ "\n",
+ "$w = w + \\eta y_{i}x_{i}$\n",
+ "\n",
+ "$b = b + \\eta y_{i}$\n",
+ "\n",
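+    "When an instance is misclassified, i.e. it lies on the wrong side of the separating hyperplane, $w$ and $b$ are adjusted so that the hyperplane moves toward that misclassified point, until the point is classified correctly"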
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Take the two classes from the iris dataset and use [sepal length, sepal width] as the features"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "from sklearn.datasets import load_iris\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load data\n",
+ "iris = load_iris()\n",
+ "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
+ "df['label'] = iris.target"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " sepal length sepal width petal length petal width label\n",
+ "0 5.1 3.5 1.4 0.2 0\n",
+ "1 4.9 3.0 1.4 0.2 0\n",
+ "2 4.7 3.2 1.3 0.2 0\n",
+ "3 4.6 3.1 1.5 0.2 0\n",
+ "4 5.0 3.6 1.4 0.2 0\n",
+ ".. ... ... ... ... ...\n",
+ "145 6.7 3.0 5.2 2.3 2\n",
+ "146 6.3 2.5 5.0 1.9 2\n",
+ "147 6.5 3.0 5.2 2.0 2\n",
+ "148 6.2 3.4 5.4 2.3 2\n",
+ "149 5.9 3.0 5.1 1.8 2\n",
+ "\n",
+ "[150 rows x 5 columns]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "0 50\n",
+ "1 50\n",
+ "2 50\n",
+ "Name: label, dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.columns = [\n",
+ " 'sepal length', 'sepal width', 'petal length', 'petal width', 'label'\n",
+ "]\n",
+ "print(df)\n",
+ "df.label.value_counts()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
+ "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data = np.array(df.iloc[:100, [0, 1, -1]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X, y = data[:,:-1], data[:,-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y = np.array([1 if i == 1 else -1 for i in y])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Perceptron"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# The data are linearly separable, with two classes\n",
+    "# The decision boundary here is a straight line in the 2-D feature plane\n",
+ "class Model:\n",
+ " def __init__(self):\n",
+ " self.w = np.ones(len(data[0]) - 1, dtype=np.float32)\n",
+ " self.b = 0\n",
+ " self.l_rate = 0.1\n",
+ " # self.data = data\n",
+ "\n",
+ " def sign(self, x, w, b):\n",
+ " y = np.dot(x, w) + b\n",
+ " return y\n",
+ "\n",
+    "    # Gradient descent on misclassified points, visited in order (one point per update)\n",
+ " def fit(self, X_train, y_train):\n",
+ " is_wrong = False\n",
+ " while not is_wrong:\n",
+ " wrong_count = 0\n",
+    "            # Sweep over all training instances\n",
+ " for d in range(len(X_train)):\n",
+ " X = X_train[d]\n",
+ " y = y_train[d]\n",
+    "                # Update on each misclassified point, i.e. y * (w·x + b) <= 0\n",
+ " if y * self.sign(X, self.w, self.b) <= 0:\n",
+ " self.w = self.w + self.l_rate * np.dot(y, X)\n",
+ " self.b = self.b + self.l_rate * y\n",
+ " wrong_count += 1\n",
+    "            # Stop once a full sweep makes no mistakes; otherwise sweep again\n",
+ " if wrong_count == 0:\n",
+ " is_wrong = True\n",
+ " return 'Perceptron Model!'\n",
+ "\n",
+ " def score(self):\n",
+ " pass"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'Perceptron Model!'"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "perceptron = Model()\n",
+ "perceptron.fit(X, y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "x_points = np.linspace(4, 7, 10)\n",
+ "y_ = -(perceptron.w[0] * x_points + perceptron.b) / perceptron.w[1]\n",
+ "plt.plot(x_points, y_)\n",
+ "\n",
+ "plt.scatter(data[:50, 0], data[:50, 1], color='b', label='0')\n",
+ "plt.scatter(data[50:100, 0], data[50:100, 1], color='orange', label='1')\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### scikit-learn example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sklearn\n",
+ "from sklearn.linear_model import Perceptron"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'0.23.1'"
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sklearn.__version__"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Perceptron()"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf = Perceptron(fit_intercept=True, \n",
+ " max_iter=1000, \n",
+ " shuffle=True)\n",
+ "clf.fit(X, y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[ 23.2 -38.7]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Weights assigned to the features.\n",
+ "print(clf.coef_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[-5.]\n"
+ ]
+ }
+ ],
+ "source": [
+    "# Intercept: constants in the decision function.\n",
+ "print(clf.intercept_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+    "# Figure size\n",
+    "plt.figure(figsize=(10,10))\n",
+    "\n",
+    "# Title\n",
+ "plt.rcParams['font.sans-serif']=['SimHei']\n",
+ "plt.rcParams['axes.unicode_minus'] = False\n",
+ "plt.title('Perceptron for iris dataset')\n",
+ "\n",
+ "plt.scatter(data[:50, 0], data[:50, 1], c='b', label='Iris-setosa',)\n",
+ "plt.scatter(data[50:100, 0], data[50:100, 1], c='orange', label='Iris-versicolor')\n",
+ "\n",
+    "# Draw the perceptron decision boundary\n",
+    "x_points = np.arange(4, 8)\n",
+    "y_ = -(clf.coef_[0][0]*x_points + clf.intercept_)/clf.coef_[0][1]\n",
+    "plt.plot(x_points, y_)\n",
+    "\n",
+    "# Legend, grid and axis labels\n",
+    "plt.legend()  # show the legend\n",
+    "plt.grid(False)  # hide the grid\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
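+    "**Note!**\n",
+    "\n",
+    "In the plot above, one blue point in the lower-left corner is misclassified. This is because scikit-learn's Perceptron has a `tol` parameter.\n",
+    "\n",
+    "`tol` is a stopping criterion: training stops once the loss no longer improves on the previous iteration by at least `tol`. Setting `tol=None` disables this check so that training can keep iterating:"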
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "clf = Perceptron(fit_intercept=True, \n",
+ " max_iter=1000,\n",
+ " tol=None,\n",
+ " shuffle=True)\n",
+ "clf.fit(X, y)\n",
+ "\n",
+    "# Figure size\n",
+ "plt.figure(figsize=(10,10))\n",
+ "\n",
+    "# Chinese title (needs a CJK-capable font such as SimHei)\n",
+ "plt.rcParams['font.sans-serif']=['SimHei']\n",
+ "plt.rcParams['axes.unicode_minus'] = False\n",
+ "plt.title('鸢尾花线性数据示例')\n",
+ "\n",
+ "plt.scatter(data[:50, 0], data[:50, 1], c='b', label='Iris-setosa',)\n",
+ "plt.scatter(data[50:100, 0], data[50:100, 1], c='orange', label='Iris-versicolor')\n",
+ "\n",
+    "# Draw the perceptron decision boundary\n",
+    "x_points = np.arange(4, 8)\n",
+    "y_ = -(clf.coef_[0][0]*x_points + clf.intercept_)/clf.coef_[0][1]\n",
+    "plt.plot(x_points, y_)\n",
+    "\n",
+    "# Legend, grid and axis labels\n",
+    "plt.legend()  # show the legend\n",
+    "plt.grid(False)  # hide the grid\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Now both iris classes are classified correctly.\n",
+ "\n",
+ "----"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
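+    "## Chapter 2 Perceptron: Exercises\n",
+    "\n",
+    "### Exercise 2.1\n",
+    " Minsky and Papert pointed out that, because the perceptron is a linear model, it cannot represent complex functions such as exclusive-or (XOR). Verify why the perceptron cannot represent XOR."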
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Solution:**  \n",
+    "\n",
+    "For the XOR function, all inputs and the corresponding outputs are as follows:  \n",
+    "\n",
+    "|$x^{(1)}$|$x^{(2)}$|$y$|\n",
+ "|:-: | :-: | :-: | \n",
+ "| 1 | 1 |-1 | \n",
+ "| 1 | -1 | 1 | \n",
+ "|-1 | 1 | 1 | \n",
+ "|-1 | -1 |-1 | "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Reference code: https://github.com/wzyonggege/statistical-learning-method\n",
+    "\n",
+    "Code updates for this notebook: https://github.com/fengdu78/lihang-code\n",
+    "\n",
+    "Exercise solutions: https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+    "\n",
+    "Chinese annotations by the WeChat public account 机器学习初学者 (ID: ai-start-com)\n",
+    "\n",
+    "Environment: Python 3.5+\n",
+    "\n",
+    "All code has been tested.\n",
+ "![gongzhong](../gongzhong.jpg)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git "a/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/2.Perceptron.ipynb" "b/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/2.Perceptron.ipynb"
index 70fc961..dcc9bc1 100644
--- "a/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/2.Perceptron.ipynb"
+++ "b/\347\254\25402\347\253\240 \346\204\237\347\237\245\346\234\272/2.Perceptron.ipynb"
@@ -58,7 +58,7 @@
"---\n",
"#### 算法\n",
"\n",
- "随即梯度下降法 Stochastic Gradient Descent\n",
+ "随机梯度下降法 Stochastic Gradient Descent\n",
"\n",
"随机抽取一个误分类点使其梯度下降。\n",
"\n",
@@ -91,7 +91,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -103,19 +103,39 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 7,
"metadata": {},
"outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " sepal length sepal width petal length petal width label\n",
+ "0 5.1 3.5 1.4 0.2 0\n",
+ "1 4.9 3.0 1.4 0.2 0\n",
+ "2 4.7 3.2 1.3 0.2 0\n",
+ "3 4.6 3.1 1.5 0.2 0\n",
+ "4 5.0 3.6 1.4 0.2 0\n",
+ ".. ... ... ... ... ...\n",
+ "145 6.7 3.0 5.2 2.3 2\n",
+ "146 6.3 2.5 5.0 1.9 2\n",
+ "147 6.5 3.0 5.2 2.0 2\n",
+ "148 6.2 3.4 5.4 2.3 2\n",
+ "149 5.9 3.0 5.1 1.8 2\n",
+ "\n",
+ "[150 rows x 5 columns]\n"
+ ]
+ },
{
"data": {
"text/plain": [
- "2 50\n",
- "1 50\n",
"0 50\n",
+ "1 50\n",
+ "2 50\n",
"Name: label, dtype: int64"
]
},
- "execution_count": 3,
+ "execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -124,27 +144,28 @@
"df.columns = [\n",
" 'sepal length', 'sepal width', 'petal length', 'petal width', 'label'\n",
"]\n",
+ "print(df)\n",
"df.label.value_counts()"
]
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 4,
+ "execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -165,7 +186,7 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -174,7 +195,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
@@ -183,7 +204,7 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
@@ -199,7 +220,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
@@ -221,13 +242,16 @@
" is_wrong = False\n",
" while not is_wrong:\n",
" wrong_count = 0\n",
+    "            # Sweep over all training instances\n",
" for d in range(len(X_train)):\n",
" X = X_train[d]\n",
" y = y_train[d]\n",
+    "                # Update on each misclassified point, i.e. y * (w·x + b) <= 0\n",
" if y * self.sign(X, self.w, self.b) <= 0:\n",
" self.w = self.w + self.l_rate * np.dot(y, X)\n",
" self.b = self.b + self.l_rate * y\n",
" wrong_count += 1\n",
+    "            # Stop once a full sweep makes no mistakes; otherwise sweep again\n",
" if wrong_count == 0:\n",
" is_wrong = True\n",
" return 'Perceptron Model!'\n",
@@ -238,7 +262,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 20,
"metadata": {},
"outputs": [
{
@@ -247,7 +271,7 @@
"'Perceptron Model!'"
]
},
- "execution_count": 9,
+ "execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -259,22 +283,22 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 10,
+ "execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -290,8 +314,8 @@
"y_ = -(perceptron.w[0] * x_points + perceptron.b) / perceptron.w[1]\n",
"plt.plot(x_points, y_)\n",
"\n",
- "plt.plot(data[:50, 0], data[:50, 1], 'bo', color='blue', label='0')\n",
- "plt.plot(data[50:100, 0], data[50:100, 1], 'bo', color='orange', label='1')\n",
+ "plt.scatter(data[:50, 0], data[:50, 1], color='b', label='0')\n",
+ "plt.scatter(data[50:100, 0], data[50:100, 1], color='orange', label='1')\n",
"plt.xlabel('sepal length')\n",
"plt.ylabel('sepal width')\n",
"plt.legend()"
@@ -306,7 +330,7 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
@@ -316,7 +340,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 30,
"metadata": {},
"outputs": [
{
@@ -325,7 +349,7 @@
"'0.23.1'"
]
},
- "execution_count": 12,
+ "execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
@@ -336,7 +360,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 31,
"metadata": {},
"outputs": [
{
@@ -345,7 +369,7 @@
"Perceptron()"
]
},
- "execution_count": 13,
+ "execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
@@ -359,7 +383,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 32,
"metadata": {},
"outputs": [
{
@@ -377,7 +401,7 @@
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": 33,
"metadata": {},
"outputs": [
{
@@ -395,22 +419,22 @@
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 16,
+ "execution_count": 39,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -425,10 +449,10 @@
"# 画布大小\n",
"plt.figure(figsize=(10,10))\n",
"\n",
- "# 中文标题\n",
+ "# 标题\n",
"plt.rcParams['font.sans-serif']=['SimHei']\n",
"plt.rcParams['axes.unicode_minus'] = False\n",
- "plt.title('鸢尾花线性数据示例')\n",
+ "plt.title('Perceptron for iris dataset')\n",
"\n",
"plt.scatter(data[:50, 0], data[:50, 1], c='b', label='Iris-setosa',)\n",
"plt.scatter(data[50:100, 0], data[50:100, 1], c='orange', label='Iris-versicolor')\n",
@@ -459,22 +483,22 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 17,
+ "execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -495,10 +519,10 @@
"# 画布大小\n",
"plt.figure(figsize=(10,10))\n",
"\n",
- "# 中文标题\n",
+ "# 标题\n",
"plt.rcParams['font.sans-serif']=['SimHei']\n",
"plt.rcParams['axes.unicode_minus'] = False\n",
- "plt.title('鸢尾花线性数据示例')\n",
+ "plt.title('Perceptron for iris dataset')\n",
"\n",
"plt.scatter(data[:50, 0], data[:50, 1], c='b', label='Iris-setosa',)\n",
"plt.scatter(data[50:100, 0], data[50:100, 1], c='orange', label='Iris-versicolor')\n",
@@ -563,10 +587,9 @@
"\n",
"中文注释制作:机器学习初学者公众号:ID:ai-start-com\n",
"\n",
- "配置环境:python 3.5+\n",
+ "配置环境:python 3.8+\n",
"\n",
- "代码全部测试通过。\n",
- "![gongzhong](../gongzhong.jpg)"
+ "代码全部测试通过。"
]
},
{
@@ -593,7 +616,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.3"
}
},
"nbformat": 4,
diff --git "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb" "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb"
new file mode 100644
index 0000000..a2ac61a
--- /dev/null
+++ "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb"
@@ -0,0 +1,1113 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "# Chapter 3 k-Nearest Neighbors"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
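+    "1. The $k$-nearest-neighbor method is a basic and simple method for classification and regression. Its basic procedure: given the training instances and an input instance, first find the $k$ training instances nearest to the input, then predict the class of the input by majority vote among the classes of these $k$ training instances.\n",
+    "\n",
+    "2. The $k$-nearest-neighbor model corresponds to a partition of the feature space induced by the training set. Once the training set, the distance metric, the value of $k$ and the classification decision rule are fixed, the result is uniquely determined.\n",
+    "\n",
+    "3. The three elements of the $k$-nearest-neighbor method are the distance metric, the choice of $k$ and the classification decision rule. A common distance metric is the Euclidean distance or, more generally, the $L_p$ distance. A small $k$ gives a more complex model, and a large $k$ gives a simpler model. The choice of $k$ reflects a trade-off between approximation error and estimation error; the best $k$ is usually selected by cross-validation.\n",
+    "\n",
+    "A common classification decision rule is majority voting, which corresponds to empirical risk minimization.\n",
+    "\n",
+    "4. Implementing the $k$-nearest-neighbor method requires a fast way to search for the $k$ nearest neighbors. A **kd** tree is a data structure for fast retrieval of data in a $k$-dimensional space. A kd tree is a binary tree that represents a partition of the $k$-dimensional space; each node corresponds to a hyperrectangle in that partition. With a **kd** tree, most data points need not be searched, which reduces the computational cost of the search."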
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### Distance metric"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Let the feature space $\\mathcal{X}$ be the $n$-dimensional real vector space, with $x_{i}, x_{j} \\in \\mathcal{X}$, $x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(n)}\\right)^{\\mathrm{T}}$, $x_{j}=\\left(x_{j}^{(1)}, x_{j}^{(2)}, \\cdots, x_{j}^{(n)}\\right)^{\\mathrm{T}}$.\n",
+    "Then the $L_p$ distance between $x_i$ and $x_j$ is defined as:\n",
+    "\n",
+    "\n",
+    "$L_{p}\\left(x_{i}, x_{j}\\right)=\\left(\\sum_{l=1}^{n}\\left|x_{i}^{(l)}-x_{j}^{(l)}\\right|^{p}\\right)^{\\frac{1}{p}}$\n",
+    "\n",
+    "- $p= 1$: Manhattan distance\n",
+    "- $p= 2$: Euclidean distance\n",
+    "- $p= \\infty$: Chebyshev distance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import math\n",
+ "from itertools import combinations"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "def L(x, y, p=2):\n",
+    "    # L_p (Minkowski) distance between x and y, e.g. x1 = [1, 1], x2 = [5, 1]\n",
+    "    if len(x) == len(y) and len(x) > 1:\n",
+    "        total = 0  # named total to avoid shadowing the built-in sum()\n",
+    "        for i in range(len(x)):\n",
+    "            total += math.pow(abs(x[i] - y[i]), p)\n",
+    "        return math.pow(total, 1 / p)\n",
+    "    else:\n",
+    "        return 0"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### Textbook Example 3.1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x1 = [1, 1]\n",
+ "x2 = [5, 1]\n",
+ "x3 = [4, 4]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(4.0, '1-[5, 1]')\n",
+ "(4.0, '1-[5, 1]')\n",
+ "(3.7797631496846193, '1-[4, 4]')\n",
+ "(3.5676213450081633, '1-[4, 4]')\n"
+ ]
+ }
+ ],
+ "source": [
+    "# For p = 1..4, find which of x2 and x3 is nearer to x1 under the L_p distance\n",
+ "for i in range(1, 5):\n",
+ " r = {'1-{}'.format(c): L(x1, c, p=i) for c in [x2, x3]}\n",
+ " print(min(zip(r.values(), r.keys())))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "A Python implementation: scan all data points, find the $n$ nearest ones, and decide the class by majority vote among them"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from collections import Counter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data\n",
+ "iris = load_iris()\n",
+ "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
+ "df['label'] = iris.target\n",
+ "df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n",
+ "# data = np.array(df.iloc[:100, [0, 1, -1]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " sepal length sepal width petal length petal width label\n",
+ "0 5.1 3.5 1.4 0.2 0\n",
+ "1 4.9 3.0 1.4 0.2 0\n",
+ "2 4.7 3.2 1.3 0.2 0\n",
+ "3 4.6 3.1 1.5 0.2 0\n",
+ "4 5.0 3.6 1.4 0.2 0\n",
+ ".. ... ... ... ... ...\n",
+ "145 6.7 3.0 5.2 2.3 2\n",
+ "146 6.3 2.5 5.0 1.9 2\n",
+ "147 6.5 3.0 5.2 2.0 2\n",
+ "148 6.2 3.4 5.4 2.3 2\n",
+ "149 5.9 3.0 5.1 1.8 2\n",
+ "\n",
+ "[150 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
+ "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# Select the instances with label 0 and 1\n",
+    "data = np.array(df.iloc[:100, [0, 1, -1]])\n",
+    "X, y = data[:,:-1], data[:,-1]\n",
+    "# Hold-out split (a single random split, not cross-validation), train:test = 8:2\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class KNN:\n",
+ " def __init__(self, X_train, y_train, n_neighbors=3, p=2):\n",
+ " \"\"\"\n",
+    "        parameter: n_neighbors number of nearest neighbors\n",
+    "        parameter: p order of the L_p distance metric\n",
+ " \"\"\"\n",
+ " self.n = n_neighbors\n",
+ " self.p = p\n",
+ " self.X_train = X_train\n",
+ " self.y_train = y_train\n",
+ "\n",
+ " def predict(self, X):\n",
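+    "        # Seed the candidate list with the first n training points\n",
+    "        knn_list = []\n",
+    "        for i in range(self.n):\n",
+    "            # Compute the p-norm distance\n",
+    "            dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
+    "            # Store the distance together with the instance label\n",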
+ " knn_list.append((dist, self.y_train[i]))\n",
+ " \n",
+    "        # Scan the remaining training points\n",
+    "        for i in range(self.n, len(self.X_train)):\n",
+    "            # Find the farthest of the current n candidates\n",
+    "            max_index = knn_list.index(max(knn_list, key=lambda x: x[0]))\n",
+    "            # Compute the p-norm distance\n",
+    "            dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
+    "            # If the farthest candidate is farther than this point, replace it with this point\n",
+ " if knn_list[max_index][0] > dist:\n",
+ " knn_list[max_index] = (dist, self.y_train[i])\n",
+ "\n",
+    "        # Count the labels of the n nearest points\n",
+ " knn = [k[-1] for k in knn_list]\n",
+ " count_pairs = Counter(knn)\n",
+ "# max_count = sorted(count_pairs, key=lambda x: x)[-1]\n",
+    "        # Pick the most frequent label\n",
+ " max_count = sorted(count_pairs.items(), key=lambda x: x[1])[-1][0]\n",
+ " return max_count\n",
+ "\n",
+ " def score(self, X_test, y_test):\n",
+ " right_count = 0\n",
+    "        # Count the test points whose predicted label matches the true label\n",
+    "        for X, y in zip(X_test, y_test):\n",
+    "            label = self.predict(X)\n",
+    "            if label == y:\n",
+    "                right_count += 1\n",
+    "        # Return the fraction of correctly predicted labels\n",
+ " return right_count / len(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "clf = KNN(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1.0"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Test Point: 1.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "test_point = [6.0, 3.0]\n",
+ "print('Test Point: {}'.format(clf.predict(test_point)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
+ "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
+ "plt.scatter(test_point[0], test_point[1], label='test_point')\n",
+ "plt.xlabel('sepal length')\n",
+ "plt.ylabel('sepal width')\n",
+ "plt.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "### scikit-learn example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "KNeighborsClassifier()"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf_sk = KNeighborsClassifier()\n",
+ "clf_sk.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1.0"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf_sk.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### sklearn.neighbors.KNeighborsClassifier\n",
+ "\n",
+ "- n_neighbors: number of neighbors to use, default=5\n",
+ "- p: power parameter of the Minkowski distance metric (p=1 is Manhattan distance, p=2 is Euclidean distance), default=2\n",
+ "- algorithm: algorithm used to find the nearest neighbors, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'\n",
+ "- weights: weight function applied to the neighbors, default='uniform'\n",
+ "\n",
+ " Weight function used in prediction. Possible values:\n",
+ " \n",
+ " 'uniform' : uniform weights. All points in each neighborhood are weighted equally.\n",
+ " \n",
+ " 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### kd-tree"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
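+ "A **kd-tree** is a tree data structure that stores instance points in $k$-dimensional space so that they can be retrieved quickly.\n",
+ "\n",
+ "A **kd-tree** is a binary tree that represents a partition of $k$-dimensional space. Constructing a **kd-tree** amounts to repeatedly splitting the $k$-dimensional space with hyperplanes perpendicular to the coordinate axes, producing a series of $k$-dimensional hyperrectangular regions. Each node of the kd-tree corresponds to one such region.\n",
+ "\n",
+ "A **kd-tree** is constructed as follows.\n",
+ "\n",
+ "Construct the root node, which corresponds to the hyperrectangular region of $k$-dimensional space containing all instance points. Then split the space recursively to generate child nodes: on a hyperrectangular region (node), choose a coordinate axis and a split point on that axis; these determine a hyperplane that passes through the split point, is perpendicular to the chosen axis, and divides the current region into left and right subregions\n",
+ "(child nodes), with the instances distributed between the two subregions. The process stops when a subregion contains no instances (the node at which it stops is a leaf node). Along the way, instances are stored at the corresponding nodes.\n",
+ "\n",
+ "Usually the coordinate axes are chosen in turn, and the median of the training instances along the chosen axis\n",
+ "is taken as the split point, which yields a balanced **kd-tree**. Note that a balanced **kd-tree** does not necessarily give the most efficient search.\n"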
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
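+ "### Algorithm: constructing a balanced kd-tree\n",
+ "Input: a data set in $k$-dimensional space, $T=\\{x_1,x_2,\\ldots,x_N\\}$,\n",
+ "\n",
+ "where $x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(k)}\\right)^{\\mathrm{T}}$, $i=1,2,\\ldots,N$;\n",
+ "\n",
+ "Output: a **kd-tree**.\n",
+ "\n",
+ "(1) Start: construct the root node, which corresponds to the hyperrectangular region of $k$-dimensional space containing $T$.\n",
+ "\n",
+ "Choose $x^{(1)}$ as the coordinate axis and take the median of the $x^{(1)}$ coordinates of all instances in $T$ as the split point, dividing the root's hyperrectangular region into two subregions. The split is made by the hyperplane that passes through the split point and is perpendicular to the $x^{(1)}$ axis.\n",
+ "\n",
+ "The root generates left and right child nodes of depth 1: the left child corresponds to the subregion where $x^{(1)}$ is less than the split point, and the right child to the subregion where $x^{(1)}$ is greater than the split point.\n",
+ "\n",
+ "Instances that lie on the splitting hyperplane are stored at the root node.\n",
+ "\n",
+ "(2) Repeat: for a node of depth $j$, choose $x^{(l)}$ as the splitting axis, where $l=(j \\bmod k)+1$, and take the median of the $x^{(l)}$ coordinates of all instances in the node's region as the split point, dividing the node's hyperrectangular region into two subregions. The split is made by the hyperplane that passes through the split point and is perpendicular to the $x^{(l)}$ axis.\n",
+ "\n",
+ "The node generates left and right child nodes of depth $j+1$: the left child corresponds to the subregion where $x^{(l)}$ is less than the split point, and the right child to the subregion where $x^{(l)}$ is greater than the split point.\n",
+ "\n",
+ "Instances that lie on the splitting hyperplane are stored at that node.\n",
+ "\n",
+ "(3) Stop when neither subregion contains any instances. This completes the region partition of the **kd-tree**."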
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Main data held by each kd-tree node\n",
+ "class KdNode(object):\n",
+ "    def __init__(self, dom_elt, split, left, right):\n",
+ "        self.dom_elt = dom_elt  # k-dimensional vector (a sample point in k-dimensional space)\n",
+ "        self.split = split  # integer index of the splitting dimension\n",
+ "        self.left = left  # kd-tree built from the subspace to the left of the splitting hyperplane\n",
+ "        self.right = right  # kd-tree built from the subspace to the right of the splitting hyperplane\n",
+ "\n",
+ "\n",
+ "class KdTree(object):\n",
+ "    def __init__(self, data):\n",
+ "        k = len(data[0])  # dimensionality of the data\n",
+ "\n",
+ "        def CreateNode(split, data_set):  # build a KdNode by splitting data_set along dimension split\n",
+ "            if not data_set:  # empty data set\n",
+ "                return None\n",
+ "            # key takes a one-argument function whose return value is used for comparison\n",
+ "            # itemgetter from the operator module fetches items of an object; its argument is the index of the item to fetch\n",
+ "            # data_set.sort(key=itemgetter(split))  # sort by the splitting dimension\n",
+ "            data_set.sort(key=lambda x: x[split])\n",
+ "            split_pos = len(data_set) // 2  # // is integer (floor) division in Python\n",
+ "            median = data_set[split_pos]  # median point, used as the split point\n",
+ "            split_next = (split + 1) % k  # cycle coordinates\n",
+ "\n",
+ "            # recursively build the kd-tree\n",
+ "            return KdNode(\n",
+ "                median,\n",
+ "                split,\n",
+ "                CreateNode(split_next, data_set[:split_pos]),  # build the left subtree\n",
+ "                CreateNode(split_next, data_set[split_pos + 1:]))  # build the right subtree\n",
+ "\n",
+ "        self.root = CreateNode(0, data)  # build the kd-tree starting from dimension 0; returns the root node\n",
+ "\n",
+ "\n",
+ "# Preorder traversal of the kd-tree\n",
+ "def preorder(root):\n",
+ "    print(root.dom_elt)\n",
+ "    if root.left:  # node is not empty\n",
+ "        preorder(root.left)\n",
+ "    if root.right:\n",
+ "        preorder(root.right)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Search the constructed kd-tree for the sample point nearest to the target point:\n",
+ "from math import sqrt\n",
+ "from collections import namedtuple\n",
+ "\n",
+ "# namedtuple holding the nearest point, the nearest distance, and the number of nodes visited\n",
+ "result = namedtuple(\"Result_tuple\",\n",
+ "                    \"nearest_point nearest_dist nodes_visited\")\n",
+ "\n",
+ "\n",
+ "def find_nearest(tree, point):\n",
+ "    k = len(point)  # dimensionality of the data\n",
+ "\n",
+ "    def travel(kd_node, target, max_dist):\n",
+ "        if kd_node is None:\n",
+ "            return result([0] * k, float(\"inf\"),\n",
+ "                          0)  # float(\"inf\") and float(\"-inf\") represent positive and negative infinity in Python\n",
+ "\n",
+ "        nodes_visited = 1\n",
+ "\n",
+ "        s = kd_node.split  # splitting dimension\n",
+ "        pivot = kd_node.dom_elt  # pivot point of the split\n",
+ "\n",
+ "        if target[s] <= pivot[s]:  # target's s-th coordinate is at most the pivot's (target is closer to the left subtree)\n",
+ "            nearer_node = kd_node.left  # visit the root of the left subtree next\n",
+ "            further_node = kd_node.right  # and remember the right subtree\n",
+ "        else:  # target is closer to the right subtree\n",
+ "            nearer_node = kd_node.right  # visit the root of the right subtree next\n",
+ "            further_node = kd_node.left\n",
+ "\n",
+ "        temp1 = travel(nearer_node, target, max_dist)  # descend to the region containing the target point\n",
+ "\n",
+ "        nearest = temp1.nearest_point  # take that leaf as the current nearest point\n",
+ "        dist = temp1.nearest_dist  # update the nearest distance\n",
+ "\n",
+ "        nodes_visited += temp1.nodes_visited\n",
+ "\n",
+ "        if dist < max_dist:\n",
+ "            max_dist = dist  # the nearest point lies inside the hypersphere centered at the target with radius max_dist\n",
+ "\n",
+ "        temp_dist = abs(pivot[s] - target[s])  # distance from the target to the splitting hyperplane along dimension s\n",
+ "        if max_dist < temp_dist:  # check whether the hypersphere intersects the hyperplane\n",
+ "            return result(nearest, dist, nodes_visited)  # no intersection: return without checking the other side\n",
+ "\n",
+ "        #----------------------------------------------------------------------\n",
+ "        # Euclidean distance between the target point and the split point\n",
+ "        temp_dist = sqrt(sum((p1 - p2)**2 for p1, p2 in zip(pivot, target)))\n",
+ "\n",
+ "        if temp_dist < dist:  # the split point is closer\n",
+ "            nearest = pivot  # update the nearest point\n",
+ "            dist = temp_dist  # update the nearest distance\n",
+ "            max_dist = dist  # update the hypersphere radius\n",
+ "\n",
+ "        # check whether the region of the other child node contains a closer point\n",
+ "        temp2 = travel(further_node, target, max_dist)\n",
+ "\n",
+ "        nodes_visited += temp2.nodes_visited\n",
+ "        if temp2.nearest_dist < dist:  # the other child contains a closer point\n",
+ "            nearest = temp2.nearest_point  # update the nearest point\n",
+ "            dist = temp2.nearest_dist  # update the nearest distance\n",
+ "\n",
+ "        return result(nearest, dist, nodes_visited)\n",
+ "\n",
+ "    return travel(tree.root, point, float(\"inf\"))  # start the recursion from the root node"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example 3.2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[7, 2]\n",
+ "[5, 4]\n",
+ "[2, 3]\n",
+ "[4, 7]\n",
+ "[9, 6]\n",
+ "[8, 1]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data = [[2,3],[5,4],[9,6],[4,7],[8,1],[7,2]]\n",
+ "kd = KdTree(data)\n",
+ "preorder(kd.root)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "from random import random\n",
+ "\n",
+ "# Generate a random k-dimensional vector with each component in [0, 1)\n",
+ "def random_point(k):\n",
+ "    return [random() for _ in range(k)]\n",
+ "\n",
+ "# Generate n random k-dimensional vectors\n",
+ "def random_points(k, n):\n",
+ "    return [random_point(k) for _ in range(n)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Result_tuple(nearest_point=[2, 3], nearest_dist=1.8027756377319946, nodes_visited=4)\n"
+ ]
+ }
+ ],
+ "source": [
+ "ret = find_nearest(kd, [3,4.5])\n",
+ "print (ret)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "time: 5.403439998626709 s\n",
+ "Result_tuple(nearest_point=[0.09910258486020529, 0.5042390385455003, 0.8050068418001802], nearest_dist=0.006621424811579601, nodes_visited=86)\n"
+ ]
+ }
+ ],
+ "source": [
+ "N = 400000\n",
+ "t0 = time.time()\n",
+ "kd2 = KdTree(random_points(3, N))  # build a kd-tree over 400,000 sample points in 3-dimensional space\n",
+ "ret2 = find_nearest(kd2, [0.1,0.5,0.8])  # find the point nearest to the target among the 400,000 samples\n",
+ "t1 = time.time()\n",
+ "print (\"time: \",t1-t0, \"s\")\n",
+ "print (ret2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
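+ "## Chapter 3: k-Nearest Neighbors - Exercises\n",
+ "\n",
+ "### Exercise 3.1\n",
+ "Following Figure 3.1, place instance points in two-dimensional space, draw the partitions of the space produced by the $k$-nearest neighbor method for $k=1$ and $k=2$, and compare them to get a feel for how the choice of $k$ relates to model complexity and prediction accuracy.\n",
+ "\n",
+ "**Solution:**"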
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%matplotlib inline\n",
+ "import numpy as np\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "import matplotlib.pyplot as plt\n",
+ "from matplotlib.colors import ListedColormap\n",
+ "\n",
+ "data = np.array([[5, 12, 1], [6, 21, 0], [14, 5, 0], [16, 10, 0],\n",
+ "                 [13, 19, 0], [13, 32, 1], [17, 27, 1], [18, 24, 1],\n",
+ "                 [20, 20, 0], [23, 14, 1], [23, 25, 1], [23, 31, 1],\n",
+ "                 [26, 8, 0], [30, 17, 1], [30, 26, 1], [34, 8, 0],\n",
+ "                 [34, 19, 1], [37, 28, 1]])\n",
+ "X_train = data[:, 0:2]\n",
+ "y_train = data[:, 2]\n",
+ "\n",
+ "models = (KNeighborsClassifier(n_neighbors=1, n_jobs=-1),\n",
+ " KNeighborsClassifier(n_neighbors=2, n_jobs=-1))\n",
+ "models = (clf.fit(X_train, y_train) for clf in models)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAA2cAAAE/CAYAAADCCbvWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOzde3SbV3rf++8GQRIkQRLUlZIoCbJ1oWTZA9myTc/IM8jMJOPcmqRx0s6k6aBN4/S0OeewzVmrbv4Jpj1t1HNyQdZJmmTSZGHSZJJmOUnTpMmkk8trR7bpsWzDtmxTliy9omgJkigJpCAKJEHs8wdAiDdJJEXyBcDfZy2sAV7cHtAabDzv3vt5jLUWERERERER8ZbP6wBEREREREREyZmIiIiIiEhFUHImIiIiIiJSAZSciYiIiIiIVAAlZyIiIiIiIhVAyZmIiIiIiEgFUHImFccYkzXGPLDAx1pjzO473Bczxhxb3uiWjzHmR4wx/+su90eNMYOLeD3HGPPPlic6ERGRmTQ+l+/X+CwrRsmZ3JMxxjXGfH7a7X9ojLlujPnMPI+Nlr6Qf2XW8WPGmNhC3s9aG7TWnrnvwCuctfZ3rbXfMXX7bgPZajHG/LAx5hVjzKgxxvEyFhERuTuNzyujQsfnnzPGnDLG3DDG9Btj/rGX8cjKUXImi2KM+TLwK8B3W2tfvMPDbgL/2BgTXq24Vpoxxu91DKvkGpAAjnodiIiILJzG55p3E/heoB34MvBLxphPehuSrAQlZ7JgxpjngJ8HvmCtfeUuD80ASeBn7vJa/9QY80HpDN9fGmN2TruvfIbKGLPeGPOnxpgRY8zrxpj/e56lEJ8vnU26boz5FWOMmflW5v8zxgyXzjR9btodW40x/8MYc80Yc9oY8+PT7osbY14wxvyOMWYEiBljnjDGHC/FcskY8wt3+GwvGmN+sHT9SOnzfFfp9ueNManS9fKyDmPMS6Wnv11aNvIPpr3eTxljLhtjLhpj/skd/+ozY9hijHnHGPN/LeTxU6y1f2Wt/QPgwmKeJyIi3tH4vCbG55+x1vZbawvW2teAvwOeWsxrSHVQciYL9b8B/x74nLX2+AIe/x+AHzTG7Jt9hzHm+4GfBv4+sJHiF8zv3eF1foXi2aJOimeKvjzPY74HeBz4BPDDwBem3fckcAbYQHEw+iNjzLrSfb8HDAJbgWeB/zh9cAC+D3gBCAG/C/wS8EvW2jbgQeAP7hDzi0C0dP3Tpff/zLTbc85oWms/Xbr6idKykf9Wut1J8SzZNuDHgF8xxnTc4X0BKJ0RfRH4ZWvtz5WO/WdjTOYOl3fu9noiIlLRND6vsfHZGNNE8e/63t3eT6qTkjNZqG8H+oB3F/Jga20a+DXg381z908AP2ut/cBamwf+IxCZfnYOwBhTB/wg8DPW2lFr7fvA1+Z5vaPW2oy1dgD4WyAy7b7LQMJaO1H6Qj0JfLcxZjtwBPg31tqctTYF/BfgR6c991Vr7X8vnaW6BUwAu40xG6y1WWtt3x0+/ovM/LL/2Wm3P8M8X/53MQH8u1L8fw5kgTkD6jQHAIfi3+yrUwettf/CWhu6w+WRRcQjIiKVRePz2huffw14G/jLRcQrVULJmSzUPwf2Av9l1rKEu/lPwBeMMZ+YdXwnxbXSGWNMhuI+J0Px7NN0GwE/cH7asfPMlZ52fRQITrv9sbXWTrt9juKZuK3ANWvtjVn3TY9h9nv9GMW/QX9pCcf3zBMLwKvAXmPMZooD0W8D240xG4AngJfu8Lz5XC0NkFNmf77ZfgT4mOIZRRERqX0an9fQ+GyM+X+Bg8APz/r7SY1QciYLdRn4HPA08J8X8gRr7VWKxSX+/ay7zgM/MevsUNM86+SvAHmga9qx7YuMe9uswWoHxf1UF4B1xpjWWfd9PP0jzPo8p6y1XwQ2URzYXjDGtMx+Q2vtKPAG8H8CJ6y148Ar
wL8GPrLWDi3yMyxGHBgCvl46swmAMebXSmvl57toWYSISPXS+LxGxmdjzFeA7wS+w1o7soKxioeUnMmCWWsvAJ8FnjHG/OICn/YLwCeB/dOO/Rrwb40xDwEYY9qNMT80z/tNAn8ExI0xzcaYbmCxpWM3Af+HMaa+9B77gT+31p6n+IX8s8aYgDHmEYpn3n73Ti9kjPlHxpiN1toCxU3VAJN3ePiLwE9ye4mEM+v2fC4BC+ofcxcTwA8BLcB/Ncb4AKy1/7y0Vn6+y0NTTzbG1BljAhTPiPpKf5v6+4xJRERWkMbnNTE+/1vgS8C3l5JrqVFKzmRRSl+anwWeNcb87AIePwL8P8C6acf+mOKZrd83xUpLJyieCZrPT1LccJsG/ivFTcJjiwj5NWAPxbNV/wF4dtqX2heBMMWzdH9McR34N+/yWs8A7xljshQ3H/9Da23uDo99EWjl9hKJ2bfnEwe+VlpO8sP3+Fx3VDoT+PcpDny/NTUALNCPAreAX6V4FvYW8BtLjUVERFaHxueaH5//I8UZxFPTZtZ+eqmxSOUyWq4q1cQY85+ATmvtfFWhRERExAMan0WWh2bOpKIZY7qNMY+YoicoLm34Y6/jEhERWcs0PousjLXSVV2qVyvFpRJbKW56/nngTzyNSERERDQ+i6wALWsUERERERGpAFrWKCIiIiIiUgGUnImIiIiIiFSAVd1z1tzebEOdodV8SxGRNevihxeHrLUbvY5DKt+G5mYbDq3S+Hz1Kql141BXx6aWTavzniIiFeRu4/OqJmehzhDP/fpzq/mWIiJr1le+7SvnvI5BqkM4FOL4c6s4PieThJ8dJLNhlN6e3tV7XxGRCnC38VnLGkVERGR1xWK4L3RB7k59gkVE1iYlZzXIWsvwpWEG3h3g4w8+ZvzWuNchiYiIzMtxHa9DWFW3btzi/HvnGXh3gOy1rNfhiEiFUZ+zGpMfz/PGN9/g4tBF7HqLyRt8r/iI9ETY8dAOr8MTEREpCocJZQdJuX0ARMNRb+NZYdZaTr1+ig/e/QC70YIBjsOu8C4e/vTD+Op0vlxElJzVnPdffp8LExcIfTqEMQaA/GieN7/1Jq0drXRs7fA4QhERESAaxXUg2t1HitpP0C5+eJETH56g/el26hrqALCTlo+Of0RLqoXdj+32OEIRqQQ6TVNDxm+N455xaX+ovZyYAfib/dSF6zhz4oyH0YmIiMwSjeJ8o5MQAa8jWXGn3j1F096mcmIGYOoMrQ+1curEKWzBehidiFQKJWc1JJfNQQB89XP/szaua2T4+rAHUYmIiNxDLoebcb2OYkWNZEZoXNc453h9sJ7x/DgTYxMeRCUilUbJWQ1pbGnE5iyFfGHOfePD4wTbgh5EJSIichexGOEMZIYGSaaSXkezYlpaWxgfnlugKz+ax+/z42/QThMRUXJWUxqbG+na3sXIhyNYe3t5xOT4JBNnJ3jgoQc8jE5ERGR+Tufz9J4Iksmka7Z6456Dexj9cBQ7eXt8tgXLSP8ID3Y/qIIgIgKoIEjNefjph7n5P29yte8qvg0+7ISFS3Dw4EHWb1/vdXgiIiLziqdCJCMZr8NYMV37u7h2+Rpnj52FzWB8BnvJsm3dNvYc3uN1eCJSIZSc1ZiGpgaO/MARrp6/ytWLV6lvqKfzqU5aOlq8Dk1EROTusllS6VRNVm00PsMj0UcIXwlz2b2MtZYND22gY2vHjCJeIrK2aQ69BvnqfGwMb6T7qW4efOxBJWYVqDBZoDA5d2+giMiaFYsROx2EkRESfQmvo1kRxhjaN7Wz54k97H1yL+u2rVNiVmFswWp8Fk9p5kxkFY1cGeHk8ZNcOH8BgK07ttJ9uJvWDa0eRyYi4r14pJd4MkkoVtx7VoszaFKZxm+N8+HrH+J+6JKfzNOxvoPuR7vZ/MBmr0OTNUYzZyKrZOTKCC/92Uukm9K0fraV1s+2kg6keelPX+LG1RtehyciUjHWQt8zqRz58Twv
/4+XOX3jNE1PN9H+He2M7hjllRdfYbB/0OvwZI1RciaySk6+cRK7y9IabsXn9+Hz+2gNt1IIF/jwjQ+9Dk9EpHJks6Tcvpqt3CiV5eOTH5OpzxB6KERdYx3GGJo2NdHyWAvvvvauljnKqlJyJrIKCpMFLgxcoKVr7v6/lu0tDJ4dnNH+QERkzYrFcF/oIpTNK0GTVTF4dpBA19zZ2oa2Bsbrxxm5MuJBVLJWKTkTWQXa8C0isghTCZqWN4rXdN5UVpmSM5FVYHyGbTu2cfP8zTn3ZQeydO3qUgInIjJbLud1BLIGbN+1ndz5uf/WxofHaZxspG1jmwdRyVql5Exklex7fB8+18fImREK+QKFfIGRMyPUnatj3+F9XocnIlJZwmHCGUidPkYylfQ6GqlhW/dtpWOyg8y7GfK38tiCZTQ9ys03b/LIk4/gq9PPZVk9+tcmskpa17fyme/7DNvy28j+TZbs32TpmuziM9/3GYLrgl6HJyJSWaJRnP4eIkN+MkODStBkxfgb/Hzyez/JvnX7GHt1jOFvDtN2sY2nP/c0W/dt9To8WWPU50xkFQXXBXnsOx7jUfsooL1oIiJ3FY3iECWeSpA4WEzQYpGY11FJDWpoamD/p/bT/clusMXtCCJe0MyZiAeMMUrMREQWKB7ppfdEUDNosuKMMUrMxFP3TM6MMQFjzLeMMW8bY94zxnyldHyXMeY1Y8wpY8x/M8Y0rHy4IiIiAmtvfI5nIkSyWgIuIrVtITNnY8BnrbWfACLAM8aYHuA/Ab9ord0DXAd+bOXCFBERkVnW3vicy5HJZbyOQkRkxdwzObNF2dLN+tLFAp8FXigd/xrw/SsSoYiIiMyx5sbnaBTnG50wMkKiL+F1NCIiK2JBe86MMXXGmBRwGfgm8BGQsdbmSw8ZBLbd4bnPGWOOG2OOX75ymWQqqfXiIiIiy2C5xucro6OrE/D9isXIfH0HoUxOvyVEpCYtKDmz1k5aayNAF/AEsH++h93huV+11h621h5uqoewm9GGXhERkWWwXOPzxubmlQxzeYXDhHMBr6MQEVkRi6rWaK3NAA7QA4SMMVOl+LuAC/d6/r7RJpxUZEbFJc2kiYiI3J/7HZ+rTi5HJpPGcR2vIxERWVYLqda40RgTKl1vAj4PfAD8LfBs6WFfBv7knu/W2grRaLkkbrg/Tbg/TebygNaPi4iILMKyjs/VpNScOpTNk3L7lKCJSE1ZyMzZFuBvjTHvAK8D37TW/hnwb4B/bYw5DawHfnMxbxyP9OJ0Po/T+Ty977dpg6+IiMjirMj4XBWiUdzjR4hkArgZ1+toRESWjf9eD7DWvgMcmuf4GYrr2+9bPNJLPJkk9KXiDFpvT+9yvKyIiEjNWo3xWUREVtc9k7NVE4uRSSYJPztIwjkKgeJm30hnhGg46m1sIiIiUnlyufIe9lgk5nU0IiL3bVEFQVZcLIb7Qhe9qQC9fRAaymo9uYiIiMwVjeJ0Pk9kyK8q0CJSMypn5mxKLEa8dDXuOIQPHyPl9gFoBk1ERERmcPp7iEZSuBu8jkRE5P5V1szZbFMbftOQOn1MM2giIiIyVy7ndQQiIsuispMzKJfMjQz5SZ0+RsI5SsI5quULIiIiAtEo0XRAbXlEpCZUfnIG5XXlmd/pIpPspDdV/BJWgiYiIiLxSG+5LY9+G4hINauO5GxKLFbck1b6EtZZMhEREYFigha51uB1GCIi96W6krNppp8lS/QlcFxHe9JERETWuEwu43UIIiJLVrXJGRQTtMzXdxAayuL29xX3pGkmTUREZE1yvtFJaChb7JcqIlKFKq+U/mLFYriOU7zuuuUm1qFQJ+FQGFAJfhGpHSNXRjjff54bIzdo72hnx/4dtHS0eB2WSGWIxXCTyfJvgUi4R78BRGRV5LI5BvsHuXr5KoGmANv3bqdjawfGmEW9TvUnZwDRaPmqm0wSfSYNmQxuoI9MoPQQfTmLSJUbeG+AN7/1Jr4uH/71
fi5lLnHqj0/xZPRJNj+w2evwRCpD6aSt+qSKyGoZvjTMsT8/Rn5Tnvr19eRH85z9q7N07+1mX8++RSVotZGcTReL4UzNpAHR7j5S+WPF6/pyFpEqdWvkFm+99hbBniD+5tJXdyeMbxnnded1nul6Bn9D7X2liyxJNIrrlH4DBFIa/0VkxdiC5fhfH8fsN7R3tpePF7YX6H+ln807N9OxtWPBr1fVe87uKBotX6b3SFN5XRGpVhc/uojdZG8nZiUN7Q3k2/Ncca94FJlIhSr1PxMRWUnDl4e5YW/Q3Nk847iv3odvu4+BkwOLer3aTM6mm5agZYYGSaaS5YuISLUYuzWGr+kOX9kBmBibWN2ARKpFNqsxX0RWzERuAhOYf9miv9lP7lZuUa9X+8kZlJtY954IEu5PE+5Pq4m1iFSVjk0dFK4V5hy31sI1aF3f6kFUIpUtHumdcXJWRGS5BdcHsRlLIT93jB6/Ms76zesX9XprIzkriUd6cTqfLyZqamItIlVkY3gjrROt3Dhzo5iQUVznPnJyhA0tGwhtCXkcoUhlmjo5q/5nIrISmlqbCIfDDJ8YLido1lpG06M0XG2ga1/Xol5vTSVn081uYi0iUsnq/HU89d1PEcqEGH5pmOE3hhl+cZjN+c08/szjiy7VKyIiIsvj4U8/zK7WXYy8OMLIGyOMvDxC49lGPvVdnyIQXNze1zVd2ise6SWeTBL60swZtN6eXg+jEhGZX3N7M0d+4AjZq1ly2RzN7c3qcSayUKWTsRrjRWS51dXXEflchH039pG9lqW+sZ72ze1LOnG6ZmfOymIxMl/fQW8f9PZBaChLwjmK4zpeR1bzCpMFMukM1z6+Rn4873U4IlXBGEPrhlY2hjcqMRNZIK2WWRxrLTeGbnB18CpjN8e8DkekajS1NrFx50ZCnaElr2hZ0zNnZbEY8dLVeDJJ+NlBNa5cYemP0qReTjFWNwZ14L/l58ChA4Q/EdbyLBERWXbxSC+kEiR6vI6ksmWvZXnjb97g+s3rmCYDNyD8QJiDRw5SV1/ndXgiNU/J2WyxGK7jED58TAnaCrl+4Tp9L/XRdKiJ9lCxWV9+NM9bb7yFv8HP9gPbPY5QRERk7Rm/Nc7Lf/YyE7smaO8qLskq5AucefcMhRcLHPr8Ia9DFKl5WtY4n2gU9/gRImlInT5GwjlKwjmqMrzL5FTqFP4H/TSGGsvH/M1+Wh5u4YM3P8AWrIfRiYhIrYqnQtq+cBcXPrzArbZbBLcHy6tYfH4foUdCnDt3jtHhUY8jFKl9Ss7upNS8OvM7XWSSnUTSlPukOK6jL/X7cCV9habNTXOON4YaGR0bZTw37kFUIiJS82Ix3Be6CGXzpNIpr6OpOEOXhqjfWD/nuKkzmA7Djas3PIhKZG3Rssa7iUbLVx0gnkqQ3D2Im8mQIUcqnVLVpyVoDDQymZukrnHm2vXCRAFfwUedX2vaRURkhcRixFIJEhu8DqTyTI3P8xoDf4N+NoqsNM2cLUI80ov7QhduMkRvKqCqT0u0a98ubp6+WW6kO+XG2Rt0hbv05S8iIisvl9MqmFm69nRRGCxQmCjMOJ4byhGYCNCxpcOjyETWDiVnixWLFas7qizvku18eCeb6zaT+VaG0Yuj3Lp0i0wqQ3AoyIGnDngdnoiI1Lh4JlLeV6795LeFOkPs27OP4VeHuXHuBrmhHMP9w0y8O8Hjn3scX51+NoqsNE1R3IfZTaxDgRAAsUjM28AqnL/Bz5Pf8yTpj9KcP32eQqHA1ge2sm3vNuoDc9e6i4iILKtoFMeBaHcfKX/a62gqhjGG7qe62bR9E+dOnuPWx7cIbwqz46kdNLc3ex2eyJqg5Ox+xWJkkkmiz6QhkMEN5Eg4R4mEe1SC/y7q/HVs27eNbfu2eR2KiIisRdEoTtIlFFNyNp0xhvXb17N++3qvQxFZk5ScLYdYDMdxyjfVI01kaQqTBT7u/xj3Q5fxsXE2b93Mrod30dLR
4nVoIlKr8nmSqaRWvYjchbWWoYEhzpw4Q/ZGlrZQGw8cfID1XUril5sWDy+XaLR8md4jTZuNRRamMFng9b94nddPvE52W5bJA5N8lPuIv/3vf8v1i9e9Dk9EalEsRu+JIJnLA9o/LnIXp4+f5tiLxxhqH2LywCSXmi/x0l+9xNm3z3odWs1RcrYSSj3SIkP+8mbjqYuIzO/iqYt8fPNjOp7ooGlTEw1tDbTvbce330fqpdSc6p4iIstBBb5E7u5m5ibvv/s+bT1ttHS10NDWQHBHkNYnW3n3+LuM3RzzOsSaouRspUxL0ML9acL96XITaxGZ69ypcwR2BjDGzDjetLmJ4dwwN6/f9CgyEal1StBE7uzS2UvYzZa6hpl9aP1NfgobClw5d8WjyGqTkrOVFI3idD5fvkwtnVCCJjLXxPjEnC9+KG5ONw2GyYk7NEYVEVkG8Ugvma/vUP8zkVnyE3lMvZn3PltvNT4vMyVnq2jqzJzWtovM1dnVSe5ibs7x/Ggef86voiAisipCBLwOQaSirOtch71i52wvsNZihgztm9s9iqw2KTlbZVo6ITK/nQ/tpOFKA9mBLLZQHAAmbk4w8tYI3Z/oxt+g4rIisgqyWVJuH4m+BIm+hGbRZM1b37WeDc0bGH5vmEK+AEBhokDmnQxb1m9RcrbMlJx5oLx0QgmaSFkgGODI9x4hdDXE8IvDDB8bZuL1CSIPRXjg0ANehycia0EshvtCF72pAL19EBoqJmpK0GQtMz7Dk9/5JDubdpJ9Mcvwy8NkX8ryYPuDPPbtj83ZKy73R6eivRKL0ZtKkOjxOhCRytG6vpUjP3CEWyO3yE/kaW5vps4/dx+aiMiKicWIl67GHYdodx+p/O3+pVNCoU71RpM1oz5Qz6HPHeKhWw8xdnOMQDBAfaDe67BqkpIzr+Xm7rGR2jZ2c4xrH18DYN22dTS2NHocUeVpamvyOgQRkWJhLwdw3ZmHn0mTyg+qeXWNmZyY5OrgVfLjedo2thFcF/Q6pIrT0NRAQ1OD12HUtHsmZ8aY7cBvA51AAfiqtfaXjDFx4MeBqfqZP22t/fOVCrQWxVMhkrsHSThHiYR7iIajXockK8hay6nXT9H/bj92ncVi8f2dj/2P7Gf34d1aFiAii6LxeZVEo3MOOUA8lSBxYEAJWo24dOYSx188zkRwAhqAV6BrSxeRz0a051lW1UL2nOWBn7LW7gd6gH9pjDlQuu8XrbWR0kVf/IsVi+EeP0Iom5+zXEJqz8cffMyJD0/Q8nQL7YfaCR0K0XKkhXdPvsuFkxe8Dk9Eqo/GZw+pAnPtyF7L8tqLr1H/aD2hx0OEPhGi/TPtDOYGOXHshNfhyRpzz+TMWnvRWvtm6foN4ANg20oHtmZEo7gvdHkdhawway0n3z5Jy4GWGb286hrraN7fzMm3T3oYnYhUI43P3lMF5tpw7r1z2C5LQ/vt5XrGZ2g72Ma5s+cYuznmYXSy1iyqWqMxJgwcAl4rHfpJY8w7xpjfMsZ0LHNsa0s+j+M65YvUFluw3Bi5QUPH3HXajesaGbk+Ui4fLyKyWBqfvaMKzNXv2tA1GtfN3f/t8/sgCKPDox5EJWvVgpMzY0wQ+EOg11o7Avwq8CAQAS4CP3+H5z1njDlujDl+ZVT/uOcVDhMZ8uP29+H295E6fYxkKul1VLKMjM/Q0NhAfjQ/5778zTyBpgDGpz1nIrJ4Gp8rQCxG5us7CA1lSThHvY5GFqkl2MJEdmLOcVuw2FFLQ7MKYMjqWVByZoypp/jF/7vW2j8CsNZestZOWmsLwG8AT8z3XGvtV621h621hzc2Ny9X3LUlGsXpfB63rwe3r4feE0Eyl4ubjKcuUt2MMew+sJsbJ29g7e0ZMluwZE9mefDAgx5GJyLVSuNzBSn1SAtl8ySco1oFU0V2du8k7+YpTBRmHM+6WTZ2bKQl1OJRZLIWLaRaowF+E/jAWvsL045vsdZeLN38AUA7Ju9XqSJU
nCikEjjX0hAIkApmSfQl6O3p9TQ8uT8PHnqQa5eucfHVi9R1FvedFS4W2BraqibLIrJoGp8rUCyG6ziED9/ui6ZKzJVvXdc6Htr/EO8fex+z1eAL+JgcmiSYC3Loew55HZ6sMQupDfop4EeBd40xqdKxnwa+aIyJABZwgZ9YkQjXqHikl2JzFYiHUiQOFNeyhwIhAJXtrUJ19XU8+d1PcnXwKhfdi/iMj81Pb2Z913otaRSRpdD4XImiUVyHcvNqUIJW6Ywx7H1iL527Orlw+gLjY+OsO7COzgc771pGX20UZCWY6UusVtrhrVvt8eeeW7X3qynJJNFn0gC4wTyZoJ/e6PMeByUilewr3/aVN6y1h72OQyqfxucV4DjFBG1DntCGLv2IrzGJvgRks+D3q1etLNrdxmd11asWsRjO1PVkkvCzxebVoVAnAOFQuCa/GAqTBQZODHD6vdOMZkdpX9fO3k/spXN3p5o2i4hI5YpGcZzSDBqDNTnLcvnsZT5Mfci1K9cINAfYfWA3Ox/ZSZ2/7t5PrjKO6+BmXAAymTShbB73hS7Czw5qCassq0WV0pcKUWpeHUlDuD8Ng8UvhlrbfGyt5a2/eos3+99k8sAkwc8GGd0xyquvvsqZN894HZ6IiMjdlQp+RYb8ZIYGa6rA17l3z/HySy8zsmWE4GeD8DCkzqQ4/pfHKUwW7v0CVcRxnWICNjhIuD9NJE2xR23p91gom6/J32HiDc2cVatoFIdo8frU0on8MVLp4raDUCBU9Wforl+4zvnL5+n4VEd5T1bTpiYa2hs48Xcn6NrfRWPz3L4kIiIilcTpfJ54KkHiwEBNzKBNjE3wzrfeofWpVvzNxZ+SDe0NdDzawYW+C1w9f5WN4Y0eR3l/kqkkmVymeCObJTLkx+k/Ui7eRqz0wKk9hpFUeWZN5H5o5qwWRKM4/cUS/L19EHFzNXGGLu2mMVvMnGIZdY11sB6unr/qUWQiIiKLE4/00vt+G5nLA1XfrPr6hesU2grlxGyK8Rn8W/1cOHvBo8iWRzKVJDM0SMTN0dsHvSeCOP09txMzkRWkmbNaEY0WS/CX1MIZOmvtHasYWrN6hWxERESWQzzSC6lEuQLzlGprlWOtvePpfR+L/mcAACAASURBVOMzc/qFVYN5Z8o6n4fOBb5A7vaJ8Wr93SWVQTNnNaoWztBt2r6JyfQksyuKFiYKmKuGjq0dHkUmIiKyNPFIL5mv7yjOyPRBaChLwjnqdViL0rGlA5MxTI5NzjhurSV/Mc+WnVs8imxpEn0JMpcHyv9Nek8Ei4nZQtXw3kJZfZo5q2Gzz9BV25m5Dds30BnsJP12mtZ9rfib/IyPjJN9L8u+7n00tTZ5HaKIiMjixWLES1fj0yowT6n00uwNTQ10P9zNieMnaHmohcZQI5O5SUZOjbDBv4FNuzbNeHyiLwG53D1fdzU/94yYxsfpfb+t+LvpPtTa3kLxhpKzGlfNCZrxGR5/5nFOHT/FmVfPkC1kaWps4tFHHmXnwzu9Dk9EROT+xWK4jgOuC1A1pdn3PL6HQEuA/rf6yeQy+I2fvfv2svfxvfjqiguzpqocTpWdv5vV/NyJvgSMjJD5+o7bB2OxZXnt8u8ulKDJ0ig5WwPikd7bZ+aqLEHzN/jZ/8n97HtyH5P5Sfz1/jvuQxMREalK0wpNuNMrMJeSlVCos+J+5Btj2PHQDrYf2E5+PE+dvw5fne922fmSUDaPe/wIxKJ3fT3XcQgfPrYiCdq8Mb2wY9kSstnikV6c9FHcTfd+rMhsSs7WilgMd9rSiUpfMjGbr85XPhMnIiJSs0rNq6dm0qLPpEnli/uYwqFw8VgFjd/GGF6++HL5dur0sWIxjW+UKmmEwwurcjhVkr6UmM64awmfd3rPsXln71YoMZuuXGBEZBGUnK0l0xK0algyISIisiZNS2YcihWYk7sHcTMZMuRwM27FzKQlU0kymTQhApDLEcn4i2Xn7zFTNq9S
Yhrt7sPN90EgsKTPOzcmij3KlhLTEjnf6KzKFUviPTO7Et5KOrx1qz3+3HOr9n5yB6WlA5mgv+pm0ERk4b7ybV95w1p72Os4pPJpfK4CySQA8UiGxIERaGsj0hkBVu9E6/TZKAA34xarHL7fRjwVKh5crhmpWZ83tGnHXWcOp2JzM26xR9lSZu+WW+mEeGZDUAmazHC38VkzZ2vRPEsHlKCJiIhUsFLSEwdIJUjuzuJm+sj486TSqRX/8Z/oS0A2Syg/7adjPk/v6VKVw8gyv+Hszzs+iJtOz/t5p5pGT8UWyZR6lMWWOabFisWIpRIkNngch1QVJWdr1bSlAymOVdQSiUriuI4SVxERqSjxSC9xxynecN3yfvJQqDhTFA6Fl2XsmurXlcllblc3DIdnPmgVZqSmf954KFWuQB0KhMqx9b7fRjwTWbWYFiWX0+8JWTAlZ2tZNIpDlGj6KCnU1X66cmWn/OqckRQREVmU6RUek0miz6Qhk8EN5Ehl0sWHLDEZKJfAz0E4V9y35Xxj5aobLkjp88aJEp/6vIEM5HJE0/ffo2ylxDMRnHQfKbTXXxZGyZmoaWKJ4zq4GReguJE4myd2OjjjDB2wZv8+IiJSoWIxnKmZNCiXpJ8a0xYykzbfGOi+0HV7pmwVi2nc06zPW3EzZdNNrVSiuJVEK5XkXpScCTCzaeJarCw0tV49MnT7/xJO/5HiF34qgdOZA9Kk1o2vyb+PiIhUuBm90orbFkgXZ9BSGwbvmhQ4rlMugV8+tsrVDRetkhOy2WZsJdFKJbk7JWdSVk7QSjNFtZ6ATK2lB25Xdurvuf2FXyryNGOpxLReccu9tl9ERGRZlLYtlG9O274wn7uNgbJMpm8lCar/mdyZuvrKDPFIb3HD70gxQatVyVSSzOUBwv1pwv1pek8Ei5Wd7nUmLhbDfaGLSBrC/WkYLPaMm11eWEREpFI4nc/TeyJYHvNmXxY8Bsp9i6YDXocgFU4zZzJXLEZm2gxRb/R5ryNaFjOSzanKTlOzYos5QxiL4Uxddxy1JBARkYp314IZmiVbXSNrY4WSLI1mzmR+pRmiUDZPwjla9TNDib5EMSHro3h5f5kqO0WjOP09RIb8pE4fu+OSEREREZF4pJfe99tqfoWSLJ1mzuTOYjHc0gxayq2+ErCzZ8oyX1+hMsCzNvpOvW8oENKGXxEREZmhvMe/x+tIpBIpOZO7i8Vwp5buVUmPjnJ/llIpfIB4aoX7s5Q2+sZTtxPCtd6aQEREREQWR8mZ3FsV9OgoN40uCWXzuMeP3N7cHFmdOGYslZxqTeAcLd4OBLS+XERERIinQiR319beflke2nMmCzNtb1Vm6M7leL2QTCWL/VnSkEl2kkl2zkzMPDJV+TKT7KQ3FdD6chERESmqsb39snw0cyYLN71Hh8dNFJOpJJlMmhAByOWIZEr9WSqtYWZpKWUciE9VwFSFJhEREanyvf2yMjRzJos21S8lc3lgVWfQHNfBcZ1yj7LeVAA3GcL9/c7q6M8ydZZsKFs+SzZ1ERERkTUoFsM9foRImhnbM2Tt0syZLEm50hADqzITlOhLQC5XnCnLZuk9XSqFv0p7yZbNtLNkrpsCIEMO0NkyERGRNSkaxUm6hGJpryORClCVydnI2BhvfvwxV4aHaWtu5lBXF5taWrwOa80pJ2gHVqaZ4tSMUiqdut00OhUCQitbeXGllSpg4roARJ9Jl5tYT1GiJiLVaHxyknfSadzLl2loaOChLVt4oKMDY4zXoYmIVIWqS84+unaNP3z1VQ5OTNDt93NlcpKv9ffzbY89xuGuLq/DW3PikV7iySShLy3vDFrCOUqoOKFEKJ/HfaFUCr/aZsruZNoSTKfUqsDNF5czZPz5iqyIKSJyN8O5HMlXXmHzyAj7/X5uFQp849QptjzwAN//8MP4lKCJiNxTVSVnE5OT/NG3vsU/qKtj57SZskMTE/zGG2/wwPr1rGtq8jDCNSoWIzNV
7GKJJWEd18HNuADFQh/ZPO4LXRAOl94jumzhVpxSoRUcp3iz1MxaPdJEpJr82bvv8uiNGzwdCpWPPVYo8NunT/P2xo0c2rLFw+hEKpjjEH1GSxqlqKoKgpy8epUtuRw7A4EZxzvq64lYS+rCBY8ik/spCTvVoyzsZgi7GSJpiqXwY7HiDFOlF/pYLqXP6lXBFRGRpboxNsbghQv0tLbOOF7v8/F0IMBbZ896FJlIhSutnkltyBMKdXodjVSAqpo5y46Ps/4O962vq+PjW7dWNR6ZpbSXKnz42D1Lwk5POjJDg0SG/Dj909YsrpWE7A5mF1wJBYpnojWTJiKV6ObEBG0Uk7HZ1tfXkx0dXf2gRKqB65KKQiR8RPvNBaiy5GxzSwtvAdbaOZuL3clJutrbvQlMbotGcZ3S0rxSkYvZXzbJVLKckBWVepSt8YRstqkEzenMAWlS68bVI01EKlJHIMBwXR3ZyUmCdXUz7nNzOTZv2+ZRZCJVIBBQYiZlVZWchUMhzPr1HLt2jSNtbeUE7YObNznb1MR3az17ZYhGcZypvVPHynvJpmQuDxQrL0amJRmayZ/XjL/RtH19U0sfwqGwvtBFxHONfj+R3bv50w8+4Nn29vIM2tDEBC8WCvzgAw94HKGISHWoquTMGMMXH3+cP3jjDVJXrrDdGK5Yy83WVr70xBME/FX1cWpbKUGLh1I4nbc3uUbTAWBWYiYLU+qRFo9kcDrTuME8qUzxb6sETUS89vm9e/nT8XF+8cwZdhvDLWDQ7+cLTzzBzmlFQkRE5M6qLptpDwT4Z5/8JIMjIwyNjnKwsZEHOjpUorcSRaPEic48phmy+xOLEZ+6vsD9fSIiq8Hv8/EDjzzCtT17GBgept7n49l162jUiVOR+ZVWxEDQ60ikglTlN6Yxhu3t7WzXHjNZooK1XMpmsRT3MtbNs4m94i1gf5+IyGpb19SktjZyX66OjnIrn2dDc3Ptrooq9YilrU17yWWGe/6LN8ZsB36b4pxHAfiqtfaXjDHrgP8GhAEX+GFr7fWVC1Vkebx/+TL/6+238d+8iQ+41dTEZx9+mENbt3od2uLN2t+XSqcACAVCquwoUuM0PkutuZTN8qdvv83wlSu0GcNVn49De/fy+T17qvMk6l3EIxklZjKvhfxLzwM/Za3dD/QA/9IYcwB4Hvhra+0e4K9Lt0Uq2pnr1/mLV17hBycn+clQiH8RCvEjxvDia6/x3qVLXoe3NNN6o/X2QW8f6pEmsjZofJaakR0f57++/DKPXrvGv2pv58fb2/nJ5mauvP8+3+jv9zo8kVVzz+TMWnvRWvtm6foN4ANgG/B9wNdKD/sa8P0rFaTIcnnp5Em+UF/P9mmNzDsbGvh7gQAv9fdjrfUwuvsTj/SWL73vt5G5XOyRJiK1SeOz1JLjg4N037rFo62t5ToCwbo6nm1v593Tp8mOj3sc4fKKp0KEhrIknKNehyIVZlFzxMaYMHAIeA3YbK29CMUBAti03MGJLLfzly+zb569ELsCAa5dv8745KQHUS2/qQSNkRHNoImsARqfpdqdv3yZfQ0Nc44HfD62W8vHIyMeRLWCYjHcF7oIZfMknKM4ruN1RFIhFpycGWOCwB8CvdbaBf8/xBjznDHmuDHm+JXR0aXEKLJsGvx+RguFOcfHrAWfr3bWtDtOsXm1z0c4FPY6GhFZQRqfpRY0NDTMOz4D3AQaZjU3rwnTErSU26cETYAFJmfGmHqKX/y/a639o9LhS8aYLaX7twCX53uutfar1trD1trDG5ublyNmkSU7uGsXr2Szc46/duMG+3bswF8jyVm0u49UJ0R2H1EFR5EapvFZasXBri5em5hgctb2gnO5HNmmptrtlReL4R4/QiQNqdPHcFynfJG16Z6/RI0xBvhN4ANr7S9Mu+t/AF8uXf8y8CfLH57I8oo++CCn29v54+vXOZfLcT6X439mMrzZ0sLnu7u9Dm9ZhUKdSsxEapjGZ6kl
+zduJBQO87VMhpOjo6THxzk2PMwfTEzwvY89Vtv9bKNRnP4eIkN+3P4+XDdFyu3TtoQ1aiHNIz4F/CjwrjEmVTr208BR4A+MMT8GDAA/tDIhiiyfloYG/tmRI7w+OMhfnj+PtZY9u3fz4zt2EJxnrbuISAXT+Cw1w2cMz0YivLNlC6+ePcutsTG2bt3KPw6H2RxcA02aS61xcN3izWfSpEjjuI5OtK4x90zOrLXHgDudrvjc8oYjsvKa6uv59K5dfHrXLq9DERFZMo3PUmt8xhDp7CTS2el1KN6IRstXnWSScCzjXSzimRptu762DedyvDYwwMClSzTU13Nw504+sXlz7RS7EPFYJp3Bfc9lZGSEtvY2wgfChDprdD+EiCyb8clJ3rxwgQ/On6dQKLB72zYe7+qiub7e69CkUjgOUGxSnSHnbSxVaHR4lIH3B7iSvkJDYwPhfWE27dqE8VXPslglZzXm4o0b/M6xY3xibIxvDwS4VSjQd/Ei73V18cXHHquZghciXjn37jneOv4Wvp0+Gnc2kslkcP/C5dEnHmXHQzu8Dk9EKlQunyf56qt0DA3xdCBAHfDu22/z6x99xD85coTQtP6bsgY5DtHuPtye4s2MP08o1KUljYuQSWd4+S9eJt+Zp3FHIyO5ES68doGwGybybZGqSdCUnNWYP3v7bb5jcpJPTKtqtLepid8ZHOStLVt4vKvLw+hkNcRTCVIH82geZ/nlsjne/tbbBD8ZxN9U/PpsXNdIvjNP6tUUm8ObaWxp9DhKEalEf3f2LFuGhvh7oRCmVNxiV1MTLw4P883+fn4oEvE4Qll1jjNzj9mGPJHdR8p3KzFbOGstbzpvYvYb2jvby8ebtzTjvuqy7dw2Nu2qjpaPSs5qyNXRUUaGhni4vX3GcZ8xfDIQ4EXXVXJW4+KpBIkDIxBsIxaJeR1Ozbl05hKFjYVyYjbF3+ynsKHA5bOX2X5wu0fRiUgle+fMGb4cDJYTsylPtrbyCwMDTDz8MPW12MtL5pdMEn52EKLFYicZIBJW+5ulujF0g5GJEdo3z/wNbHyG+h31DJwcUHImqy+XzxM0Zt5ys8G6OsYmJjyISlaN4+B056Ctjd6eXq+jqUkT4xNwh4kx22CL94uIzCM3Pk5rS8uc443G4CsUmCgUlJzVumSyfDX87CCZoJ9I+PaMqRKzpcuP5zENZs7JD4C6QB1j18Y8iGpplJzVkI0tLWT8fobzedr9M//Tnrp1i65t2zyKTFZNIEAooAWNKyW0KQQfFpdPTB8ArLWYa4b2h9vv8mwRWcu2b9rEh1eu8PCssvCDY2M0t7bS5NdPspqWTBL60gCU2/b46Y0+72lItSS4LojJGibHJ6lrmHmSY+zyGJu2VMesGSg5qykNdXU8sW8ff/jOO/xwezvBujqstXyUy/FqXR2xcNjrEGUFxFOJ4pUQpIJZQtpttmLWd61nXcM6rvdfp21vG6bOYCctwx8Os6FpA+u2rfM6RBGpUEf27uWPL15k/dgYWxuLU/BXJyb4k1u3ePqRR+Y94y+1Ix7JaGXLCmpoamD3/t30v9VPe6Sdusbib+DRC6M0XGlg+2eqZ8uBkrMa85kHH6RgLb988iSbJicZtZZCays/9OijbJpnOYVUsVJlp9TBPJTOxIYCXdprtoKMz/Dkdz3JOy++w4UXL0ALcBO6tnbxyHfqx5WI3NkDHR184amn+P2336Z5eBg/cK2+nk8/9hiHtm71OjyRqtf9ZDc+4+PUsVMUWgrYMUuoOcSj3/0ogWD1VENVclZjfMbwuT17+NSuXVy8cYNGv58t82xAlirlOMRDqeLV7ly5spPWqa+exuZGHv/Ox7l14xa5GzkCrQGaWpu8DktEqsDBzZvZ//nPc+HGDQrWsrW1VfvM1oBysS7avA6lpvnqfHQ/1c2Dhx4kez2Lv8FfXO5YZb+BlZzVqIDfz66ODq/DkOVUquyUCfohEAACRDojSsw80tTapKRMRBat
zudje7v2p64VU4lZaNMOrWxZJfWBejq2VO9vYCVnIpUsmSyuUweSz2ZLlZ16lJCJiIhUiza1t5GFU3ImUqmmKju1TS2DCGojsYiIiEgNU3ImUkHiqQROZw6A1JfGVdlJREREZA1RcibiNccBIB5KFTcMt7URChQL4msZhIiISJVyHJzuHFA9lQLFe0rORDwUTR+FSPFLOxXMEtqgDcMiIiJVb6rdTSdEOiNeRyNVRMmZyGoqzZIBxS/tDXlCG4pNo0OElJiJiIjUgHJipiJeskhKzkRWS+ksmlvMxcj41aNMRESkVoVCnRrjZdGUnImspKmZMtct9yiLhHvKd+tLW0RERESmKDkTWSmlptH4/dADmYB6lImIiIjInSk5E1lOyWT56lSPsqmNwErK7i0/nmfgvQHOnjzLxPgEm7ZsYndkN20b2+79ZBEREVkR1lounrrIR+99RHYkS1uojd0P72bTrk0YY7wOr6YoORNZJvFUguSzWQgGyZCDgHqULcbkxCR9/7OPK/YKzfua8Qf8DF4cZPBPB/nUt3+K9dvXex2iiIjImtT/Sj8fnP2Apj1N1HfXM5wZ5pVjr3Dw2kH2HN7jdXg1RcmZyP0ozZTFI5lyjzLNlC3NhQ8vcCV/hdDhUPksXNsDbdxqvcVbf/cWn/vi53R2TkREKl48lSB1YJyQ14Esk+y1LCc/PEnoSAhfvQ8Af5OfxnWNvH/sfbr2ddHU2uRxlLVDyZnIEkXTR0n9o3xxTxkQCqlH2f0YOD1AYGdgTgIW2BBg+INhsleztG5o9Sg6ERGRe4unEuWTtbXym+DS2UvQSTkxm1LXWIfdaBk6N8T2g9s9iq72KDkTWSjHIR5KFa925ko9yrpq5svXa5OTk5i6uTNjxhhMnaEwWfAgKhERkYVzOnOENtXWydrJyUmou8OddWh8XmZKzkQWwnEIHz5GJuiHQAAIEOmMaOniMtqyfQvvXXiPwPrAjOMT2QnqJ+oJrg96FJmIiMgCOA5EAvd8WLXZsG0D9qTF7rEzVrfYgoUr0PFEh4fR1Z5VTc4ujF4mnkrMOBaPqGCCVKhkkngkU7x6OFvuUaaEbGXsOLCDM/1nGPlohNZwK6bOMD48TvadLI8+9ih1/judthMREZGV0rG1gy2hLVx8+yJt+9uoa6wjfyvPjfdvsGPLDm05WGarmpxdDkKiZ9qBkRFIJZSgSeUp9Si7PVMW1EzZCmtsaeTpv/c0J14+wcW/vQh+aPI38fhjj9O1v8vr8ERERNYkYwyHv3CYk986yZljZyj4C9QV6tjfvZ89j+9Rsa5ltqrJ2aaWTTzX81z5djKVJOEbxEkfJZq+v2ngeCoEsdh9Rihr2fRZ3cSXipt5VQp/dTW3N/PEdz3BRG6C/ESeQEsA49OXvoiIVDjHIdrdRyqYJ1QzdRpv8zf4eejIQ3Q/2c34rXEamhqoq9eKlpXg6Z6zWCRGMpUkFcyQCt/fayUODJBJJpWgyZJMr65UpMTMS/WBeuoD9V6HISIicm9TiVknRMJHanqVTV19HU31Kpu/kjwvCLJc1WwSfQlCXxogkj5636/l9PdANHr/QUlFi6cSOJ05gGI/khqrriQiIiKrJBAgFArVdGImq8Pz5Gy59Pb0kkwlcTfd3+tkMmnCwWO4zrSDStRqg+OUr0a7+0gdLJbCBwixfCcKRERERESWomaSM1ieH9eO65By+wj39BHOBSCXw0m6Wi5Z7UpLDorFPSiuCVePMhERERGpIDWVnC2HqeloN+PiUppJe3YQV/vZqs+0mbKpHmWhUHGTbiSkyosiIiKyDFwXtyfndRRSI5SczWP2j/ap/WwqOFJFSqXw8Rf/iWcC6lEmIiIiyyyZJPSlgWKFZ63GkWXg8zqAatDb0wttbYS+NDCnibZUEMcpXqb1KAt39xDu7lFiJiIiIsvLcYg+k1brHVlWmjlboN6eXhJ9pXLrapxdceKpBMnDWfD7yUSBQFBflCIiIrKyAgFCgdrr
aybeUXK2CFMVIRMMKEGrBMkkAPFIptyjLNIZAeYuTRURERERqXRKzhZpqnF2wjeIkz6K0/m81yGtSfFUguSzWQgGyZAjFFKPMhERERGpbvfcc2aM+S1jzGVjzIlpx+LGmI+NManS5btWNszKEovECG3oIrUhTzR9dEZVQFkhpb1kJJNE00dJHMxCVxfhcIRIuEeJmYisSRqjRURqy0JmzpLALwO/Pev4L1prf27ZI6oSsUis2BONY0S7+4r5mZpVr4xSj7LUkXy5+mIopB5lIiJojBYRqSn3TM6stS8ZY8IrH0r1mdrXlPL3EQ4ew3VQgrbcHKfcoywSPqK9ZCIi02iMFvHI1InjYJ4QKggiC5fou3vl9/sppf+Txph3SksqOu7jdapaNBwlEu4plm0/fKxcpEKWQTI5LTFTKXwRkUXQGC2yUqYSsw15Qhu0kkcWLtGXgJGRuz5mqQVBfhX494At/e/PA/90vgcaY54DngNo39y+xLerbOUZNLeP8LODuGpWvTzCYWDQ6yhERKrNgsbo6ePzjvbaHJ9FVkwgQGhDSImZzMtxHVLp1JzjoaEs7gs7MJy743OXNHNmrb1krZ201haA3wCeuMtjv2qtPWytPdzc3ryUt6sK0XCU3ujzZDYECT87qBm05RCN4h4/QiQNqdPHcFzH64hERCreQsfo6ePzxubaHZ9FRFaT4zqk3D5CQ1l6+5hxcV/ouucEzpJmzowxW6y1F0s3fwA4cbfHryWRzghubm6mLEsUjeI4FJcPcAw34+oslYjIXWiMFhFZXVMJGQD5PJEhP07/kbm1KCL3fq17JmfGmN8DosAGY8wg8DNA1BgTobhkwgV+YuHh174MOeKRDHGvA6kVMxK0QRzX0f4zERE0Rot4wnVxe3JeRyEem1rR5WZcMkODxYTsG53FO8PhJRcJXEi1xi/Oc/g3l/Rua0A0HCWVTpE4MAKpBPFIr9ch1YZSghbu6fM6EhGRiqExWmSVJZOEnx0kE/DTq5U8a1YylSSTSRMiALkckYwfp/N5iN3/ay+1IIjcRW9PL8lUkgQDStCWWz6Pm3G9jmLJrLVcv3CdywOXAdi0YxMdWzswxngcmYiIyNp2KZvlrcsXyebH2d22noc2bqS+ru72AxyH6DNpMhuCRDojWsmzxkzNlKXSKRgZoff9NuKpUhuFZSwEqORshcQisWKC5hvESR8tZtNyf6JRYqkUiYYBEn0JenuqK+mdzE/y5jff5OOhjzGdxWSs/2/62bZhG499x2P46u6ns4WIiIgs1V+ePc3vXzyB2WKoazH82eVTdLlt/NQnnqKjqan4oGiUaCqFG8ziZvrI+POk0qmq+z0ii5foS0A2SyjvJ5TPEzvdVpx8WcAessVScraCphK0FINE00dx+nvUpPo+xSO9kEqQOJglmUpWVXGQs6mznL95no5PdWB8xeTMPmg5/8Z51qXW8eBjD3ocoYiIyNrz0bVr/N7lE2x7vJX6+tJM2Tb4+PwIyf4U/+rQU+XHxiO9xB2neD1U3MaS6EsQ6Zz5K10zatVpvsrgUzNlma/vKLV5YkV/zys5W2EzErTuPhwHJWj3KZ6J4GRTuBu8jmThrLWcfu80rY+1lhMzAOMzBPcFOZ06reRMRETEAy9ePEfD9rrbiVnJ1q5W3jl/iaHRUTZMbzdR+h0XJ0o8mST6TBo3M3NPfMLtIxLuUZJWRcol8GfVeolkwPnGjlXrYazkbBXEIrHif3COKUFbLrkcmVzG6ygWzBYsuVyOUDA057761nqGR4exBTsjcRMREZGVlx7L0tJSP+e4MYa6ZsPI2NjM5Gy6WAynNJNW5rqEnx0k5faV98mHQ2ElahXIcZ3yf6NMJk0om8c9fmTuA2PRVYtJydkqmfo/ZMrfRzh4DNdBCdpSRaM4yeIXX8I5Sm+08vfzGZ+hpbWFsetjNHY0zrhv/Po4re2tSsxEREQ8sKs5xJmR67S3B2Ycn5wsULgJ66f2nN3JPL/n
XMch2t0HmeKJ5FTwWPGhStAqRjKVLJbAzwaLB3LM35tslSk5W0XlBM3tI3z4GG7SXbUp0poTiq9DOAAAHhhJREFUi+Emk4T/Yboq9p4ZY9j3yD6Ov3Oc+sfr8dUXi38UJgrc/OAmjx963OMIRURE1qbPbAvzzXfPMLp+gubm4gyatZbzZ0Z4un0H7YHAPV5hHqUWQOWb3X2kODaj4nSl/3aZbmqGqZpink8ylSxfL/cm65+2X7ACJk6UnK2yaDhKNBwl4Rwl/OwgbjKpBG2pwmHCuQyu13Es0PaHtnNz5CYnXzoJpf1yZshw4MABuvZ3eRuciIjIGrW1tZX//YEn+PU33+BKx01oAHsNDjV28qWHHl76C0/7oe8QJZo+CkNpAFLrxqvi5DJM24uVzZPIVMeKpfkkU0kylweIXGsAIJoOFgvNdXoc2CzGWrtqb7Z131b73K8/t2rvV+kSfYnb1V+UoC1eaclAqpOq2nR7a+QWVwevYoxhXdc6mlrvsVxCKprjOsVKThVo+N8Ov2GtPex1HFL5Dm/dao8/p/FZ1rZbExP0Dw2Ry+fpamujq61txfqQ/v/t3W1sXNd95/HvIYfi04gaihJFPdG0ZNmyYytj+SGyrCjjuMmmybbpInG6CVJk2kXdAl0g3L6pdt+UKVBAWHR3580iRbLNTnYRtwmUJummhZEg9oWtpHQs2zeyJTMSbY0pShpJjDSihuKIHPLsi5mh+CiNyCHvvcPfBzBI3qHJ/8EV58xvzlOPmyDx0DC0tExd89OW/DP6tlyOaBqcFzsKB3CHQ7CY0USvlc4m88H5w+ZrX1uwf9bImYe693WT6E0Q+dIA3Tqs+u4VpwzE6MXNB2cud2NLI9se0khZNZj+bmK8P+x1OXN8zesCREQCpLGujkc3b16R31U6GqiktCW/HwLa1FqsoRCxdAPQQE8mCvEYqWSSnmhwNmSbyR/B7E4UzjxWCmiySKWANm0udxCmCEjwJN3k3B1Cc7lbOzv5YJ76bF/7keKZiIhfTQ8KPckkkS8NzPuaMNoRXfY3n2f0cdlsYS1Wx6G5U/7icXqWtRJROJPgmxHQBgMzh1uCY+67iCXFdxN9GMxERCRA4nEy84xKJe/L4uYKZ6gtV0ArLbPpPlmaYhkOxAhTtVI4k+pQCmjRYB1OLf5V9ruIIiIilTDPqFSP49D1+FHcVO/UGrBKjKTNGKHz0VosUTiTapPL3fl7RO5g9ruIPe42bdojIiIrLxYj5UBPpBDMnI4cbnZpyzjm9nHamM5PFM6kesRixFyXxPoB3yyqleBI9CZuhfuxsZnvIkYX/v9ERESWVSxGD7FbX6YP4zJIwjkMQCTScdugVtq8qiSSzZM6Mi2QqY/zFYUzqSql3Y8SDw1r7ZnckZNyAKZ2XEwdmbaLpt5FFBERH3I6DkEyCUBPNEPioQGSbpKuSNe8368+LlgUzjyWcA4TyeYL06b0zkVF9ES7cdKHSbV7XYn41dQW+BQ295jqtNRZiYhIEBT7qx4AN0FybJBUZp4t7nM5ohlw+g5APLZy9cmiKZx5KNGb0IvCZTRn23NZstJI0+347ay5+Wq+9S5ipHglor9BEREJpJ5oNz3FkbR5dXVpV+EAUTjzWLw/rBeFy6B0in3COUx37JDX5VSFRG8Cslki+YWfNjKhPG7a9c16v9IW+LNr1ruIIiJSVfRasmr4JpxZa7ly7gpnf32Wmzdv0rapje27t1PfXO91aRJE8TipZLIQ0LQ5yKI5KYdUJlUYhRweJvNCZ+EduIWkUlOhOBLpoCvSteIjaaWagamzyZy+fXO/Ue8iipTtYjbLz8+f5ezoNbY2ruXpzZ1sXrvW67JERKqOL8KZtZaTR09y6swpQp0hatfXcuH8BU6/fZqnP/M0LRtb7vxDRGaLx4m7CRI69+yuTa3JykFXrgFyOZwXy9tqN5VMEvtUGjIZUg29JFK9RLv2rUhIK43uRbNhyOWIpYsHaepsMpFFezN9
gf/53i+xWyG8YQ3Hr1/kX97u50+6HuMjW7bd+QeIiEjZfBHOfnP2N5xKnWLd/nXUhGoAaOpoYuTcCG+8/Aax52IYYzyuUgIrl8NJOb5bC+U3M0acMulb6yFLI2XlTgGMx3EcZ+rL0uGZpZ9d6dG0pJsEmBrd6z7ZQk+muLuORsdElmRkbIy/7T9G294mmprqAFi/vpHR9nH+1xtv8mDbRlrqNcNFRKRSfBHOBn49QKgzNBXMSpq2NHHtvWtcH7qu0TNZlJ5MFCfdi5s/Cvhvswq/cFIObv/RwogTQG6Ja7KmhaKUA7HdvVDcRcoNL+3wzOlmj5SVO7onIuV559IlbrZNTAWzksbGOsY2Zjl+Mc2Bzns8qk5EpPr4IpzlcjlCkbmlGGMwDYbxm+MeVCVVIRbDcQrhwKVyoaAalEacYPrarGkjTpWaChiL4RCD4mha4V4Mzvj95d6Tio3uiUhZbuTHMQ123sdqGgzZcfXPIiKV5Itw1tbexuXLl2nc2Djj+uT4JAxDuDXsUWVSFYrhIJY+PBUKVntAS7pJMpcGiF5ZU7wSKhxquZxrs4qjaQ4xetwETkcaAHf9WFn3ZGodXDZPV7bw1OW8qGMoRJbTlvBaGDBw79zH7FXYtkWzWkREKskX4azzwU76v9/P6IZRGtsLAW0yP8m149fYed9O7dgoFeF0HKLHTZDYtzrPP0v0Jm59UVqbFfVmF8vpv7fHTZBgYGZ988lmi6N7B25Nm4wvW4kiAuxqa2PH+6188EGGbZ0tGGOw1nJ+8Drbb67jwQ3acUlEpJJ8Ec4aWxrZ/6n9vPHSG2ROZzD1Bq7Bjh07+NCBD3ldnkjgJXoTU4GswLtgNltPtBvcOwQzAMKFjT60yYfIiqkxhq/u+Qh/9+5bvH3hIjVhw2TW8mDdRv54z15qa2ru/ENERKRsvghnAK2bW3n2i89y7dI1xm+Os7ZtLQ3hBq/LWjalF8ugKSErbnh4VZx9NnukLPOCfzfL8EtQFJG51jU08OePPkU6m+XK6Cit9zTQEQ5rF2URkWXgm3AGYGoMkY6I12Usu+mjGHpRurJKozSJh6o3oE1fmxXvL6zX7HH9G8xEJBg6wmE6wloDLiKynHwVzlYTBTPvTAW0fV5XUjlOysFNu4UvcrnCLobHpq3NinpWmoiIiIiUSeFMJOCmzigbCuG8WNxusatLa7NEREREAkbhTFalHjdC8r5BEs5humOHvC7nriXdJJlMYSt68vniLob7dM6XiIiISIApnMnqFI+TSibp+vxgYNaeOSkHgFQmRebSQGFqrFtcoxmPL+8ZZSIiIiKy7BTOZPWKx4m7CRIBOKYn0ZsorCWjAbJZuvuLaxa1lkxERESkaiicieRyOCmHWFfM60pmKI2UuWn31u6ebgSIaOdFERERkSqkcCarWk8mipPuxc0fBfBNQEs4h4nkCp9H8nlSR4pb4WukTERERKRqKZytMCflQDYL6KwYX4jFcByI7e7F5SipTIp4NL7iZTgph1QmBUAmky5shX9kW2HXRdBGHyIiIiKrgMLZCkq6STJDg0SHQvRkNATiG7EYDjFi6cO4DJJ0kysa0KYOjc5BV64BcuD0HVAgExEREVll7hjOjDHfAv4tcMla+3Dx2nrgu0AXkAK+YK29unxlVodMLnNry/MqPYPqYjbLv545w+ClS9SvWcOeri4e27qVUE2N16XdkdNxiB43QSKaXvY1aEk3OfV5KbA7fdNOxa7Sfx8iUlnqo6VcI2Nj9A4McGpwEGst92/bxkc6O1lbX+91aSIyTTmvmJPAp2ZdOwT8zFq7C/hZ8WspQyzdULUvvN+/epVvv/wyG95/ny9MTPBsNsvpY8f4zuuvk5+c9Lq8svS4kcKOiMuoNILalcrQlZoV2Ev/iYiUJ4n6aLmD4Zs3+earrzL6zjv87s2b/N7YGOMnTvDNV14hk8t5XZ6ITHPHkTNr7SvGmK5Zlz8LxIqffxtwgL+oYF0SMNZafvzm
m3wuFGJnYyMA7cC9DQ383/Pn+VU6zWNbtnhbZLmyWdy0W9GRsxkjZaUzykpTW2MxnVEmIouiPlrK4fT388j16zwbiUxd21JfT9O1a/zs17/mcx/+sIfVich0i51rtslaewGg+LG9ciVJEF3IZqnNZtnRMHPUyRjDk/X1nBgY8KiyuxSPE+8Pw/Bw4WyxCkj0JshcGqCrL01XX7oQzKLdGiUTkeWiPlqmWGt558wZPrJ27ZzHngyHefeDD5i01oPKRGQ+y74hiDHmeeB5gHWb1i33rxOPjE1M0EghjM3WVFPD2Pj4yhe1SD3RbnqSSSLxxa89mxHsSmeURbsLX2uUTER8YHr/3LlO/XM1G8/naZxn7Xd9TQ12cpJJa6mZp/8WkZW32JGzi8aYzQDFj5cW+kZr7TestY9bax9vWte0yF8nfrc5HOZSbS3X8/k5j72by3FPR/ASyWLXniWcw0SGsnT3QncvZF7ovBXMRESWX1l99PT+eWOT+udqZYyhc9Mm+m7cmPPY6dFROjZsCMSmXSKrxWJHzv4J+ApwuPjxRxWrSAKpPhTiyd27+d7x43yupYVIKMSktRwfGeF4fT3Pb9/udYl3L5vFTfUCtz+c2kk5uGm38EUud+uMsni8cE2nJojIylIfLTMcfOABfvjKK4RzOTrr6zHGcDaX45/HxvjM7t1elyci05Szlf7fU1hYvMEYMwj8JYUn/O8ZY/4DMAA8t5xFSjA8s3MntTU1fKOvj7UjI4wAkQ0b+IMPf5h1Dcu7A2LFxeOkkkm6Pj9424A2dUZZNl9Yq0ZDYaMPnVEmIitAfbSUY+f69Xx6/35+9PbbMDyMASaamvjkU0/xwIYNXpcnItOUs1vjFxd46NkK11LVEs5hItk8Pe62qh1JMcbwsR072H/PPQzduEFDKERrcefGQCoFtHhmxuWkmySTSRe+yOeLW+Ef0OYeIrLi1EdLuR5sb2f3xz/O5Rs3sNaysblZ68xEfGjZNwSRwuYQc6a6VbG62lo2z7MrVDUonVEWHQrhvFhcR9fVpWAmIiK+Z4yhvbnZ6zJE5DYUzlZCLrdqglnV6eqiK92Lmz9amN44NnZr58W418WJiIiISDVROFtmTsrxugRZilgMxwFSqVvXFLJFREREZBkonC2j0mYR0TSFqW8STJqyKCIiIiIrQOFsmcxYm9S3Ty/wRURERETktnTq4DJQMBMRERERkbulkbMKmxHMOg5Bh9cViYiIiIhIEGjkrIKSbpLMpQG63wkXgpmIiIiIiEiZFM4qxEk5ZDLpW9usi4iIiIiI3AWFswqK0ECPG/G6DBERERERCSCtOasQN+0SyWYBhTPxp3Q2yyunT9N/7hzGGHZv387Hdu1ifWOj16UFyqS1vD44yOunT3M1m6U1HObJ++/nia1bMcZ4XZ6IiATMyNgYr7z/Pm+fOcPY+Didmzbx0fvv597WVq9LC5zTv/kNR0+d4tzlyzSsWcOeHTs4uGMHDaHgRJ7gVOpjid4EkaEsqSPbdECx+NK54WG+8+qrfGxigt9pbmYCePPMGb514QJ/dPCgAtpd+H8nTnDl1Ck+29TE5rVrOX/zJj/95S+5+MAD/M6HPuR1eSIiEiCj4+N86xe/YOfVq/xxOExzfT19ly/z/XSazzz1FA+2t3tdYmC4Fy7w8muv8W/q6rgvHOb6xARHT5wgefEif7R/P2tqa70usSya1rgETsoh4RxWMBPf+9m77/KJyUk+0tJCY20t4dpaDq5bxxO5HK/093tdXmBcuH6d9/r7+XIkwvaGBkLG0NnQwB9EIpw6fZpLIyNelygiIgHy+uAg265e5dOtrbTW1bGmpoY94TDP1dfzk+PHmbTW6xIDIT85yU9/9Su+1NTEQ83NrKmpoa2ujt+NRFg3NIR74YLXJZZN4WyRnJSDm+olks0rmImvjU1McDad5pFweM5je5ub6Tt71oOqgunXQ0PssZa6mplPnWtqanjYWvouX/aoMhERCaK+
s2fZO8/slc76empGRvSmX5kGh4dpvXmTTWvWzLhujGFvfT19g4MeVXb3NK1xEWYEs2MHIB7zuiSRBVlrwdp534mpNYbJyckVrymoJq2lZoF1ZbXFx0VERMplraV2nn7FGEOttepXyjS5wOscCN5rHY2c3SUn5eD2HyWaphDMYjGvSxK5rfpQiI6NG+m7cWPOY8dHRti1dasHVQXTfW1tnAAmZnWWE9Zywhh2tbV5U5iIiATSfVu3cnx0dM719NgYo42NbGpu9qCq4NnW0sLlUIir4+NzHjueywXqtY7C2V1IuslCMBsK4fTtUzCTwHjmoYf453yekyMjTFpL3lrevH6dV2pr+eiuXV6XFxjbW1rYsH07RzKZqQ7gyvg437t6lY577mFrS4vHFYqISJA8uX07fc3NvHrtGjcnJ7HWcmZ0lO+OjPDMI49QW6OX6uVYU1vLwUce4YXr1zmby2GtZXRigpeuXePsunXs3bLF6xLLpmmNZUq6STJDg4Vg1nEIOryuSKR8O1pb+dzBg7x88iQ/uHwZjKFz82a+/OCDdMyzFk3mZ4zhuUcfxWlp4Zv9/UyOjFBTX89je/bwsR07vC5PREQCZm19PX944AA/7evjb86epWZyknWRCM/u3cvDmzZ5XV6gPNXZSUMoxA/7+sheu4atrWX3vffyhw88QGNdndfllU3h7C5Es2GcvqiCmQTSjtZWdjz9NDfzeYwxgdlS1m9CNTX81q5dfPy++8jl8zSEQguuQxMREbmT1sZGvvDoo4zv2UN+cpKGUEjnZi7So1u2EN28mVw+T11tLaEAjjwqnN2NXM7rCkSWrD5ABzH6WY0xNAXonTgREfG3utpa6vTG6ZIZYwI1UjabXqWVIdGbgOFhYukWrTMTEREREZFlEbyxvhVWCmbdJ1voiXZ7XY6IiIiIiFQpjZzdRimYZV7o1CHTIiIiIiKyrBTOFpBwDhcOmT6iYCYiIiIiIstP4WwWJ+XgpnqLwWybgpmIiIiIiKwIhbNZUplUIZgdOwDxmNfliIiIiIjIKqENQebRlQ1pV0YREREREVlRCmfTOCmHzNCg12WIiIiIiMgqpGmNRUk3SWZokOhQCKdvH3R4XZGIiIiIiKwmCmfMCmYdhxTMRERERERkxa36aY1JN0nm0gDd74QLwUxERERERMQDq3rkrHTIdPfJFnqi3V6XIyIiIiIiq9iqHTlTMBMRERERET9ZlSNnpWCWeaFTh0yLiIiIiIgvrLqRs0RvgshQVsFMRERERER8ZdWFM4B4f1jBTEREREREfGVVhTMn5UAu53UZIiIiIiIic6yacOakHNxUL5Fsnp5M1OtyREREREREZlgVG4I4KQe3/2jhkOm+AxCLeV2SL1hr6b9yhZPnz5OfmKCrvZ09mzZRV1vrdWkiIiKr2sVslrfOneP6jRu0RyLs3bKFtfX1XpclIstsSeHMGJMCrgMTQN5a+3gliqqkpJskMzRYDGb7FMyKJq3liOvymw8+YG9NDfU1NZxMpfhFJEL8qafUAYiIBFwQ+miZ378ODPDzt97icWBrbS0DAwN8/d13+cL+/XRFIl6XJyLLqBIjZ89Ya4cq8HMqTsFsYW+cO8eNM2d4vrWVWmMAiAIvXbvGv5w4we/v3ettgSIiUgm+7aNlfpdGRvj5W2/xfHMzLaHCy7RHgAdHRzny2mv8p098gtqaVbMqRWTVqdq/7lIw634njNNxSMFslrfef5+DjY1Twazk6bVrOTM4yI3xcY8qExERWb3cc+fYC1PBrGRHYyNto6OcvnLFm8JEZEUsNZxZ4CfGmDeMMc/P9w3GmOeNMceMMcduXLuxxF9XHiflkMmk6X4nTE+0e0V+Z9BkR0dpq6ubc72+poYmaxlVOBMRCbrb9tHT++fLN1amf5Y7y46O0rbA2u82YGRsbGULEpEVtdRw9rS1di/w28CfGWMOzv4Ga+03rLWPW2sfb1rXtMRfV74IDfS4mpe9kPbWVlLzHCuQ
yecZDYVo0ZozEZGgu20fPb1/3ti0cv2z3F77unWk8vk51621fAC0NzevfFEismKWFM6steeLHy8BPwCerERRS+WmXchmvS7D15667z5eGh/nyrQRsrHJSX58/TqP3X+/dmwUEQk4v/bRcnuPbtnCqfp6Tk0bzbTW4gwP09jezraWFg+rE5HltugNQYwxzUCNtfZ68fNPAn9VscoWKdGbgOFhUkc6IR73uhzf2rl+PR994gm+4bp0joxQD7xnDLvvv59ndu70ujwREVkCv/bRcmfNa9bw7/fv58jrr/NqJkObMQxYS3N7O7//2GOYWWvFRaS6LGW3xk3AD4pPEiHgBWvtixWpapESzmEi2byCWZke37aNRzo6OH3lCvnJSZ6NRIg0NHhdloiILJ3v+mgp3/Z16/jqs8/y3pUrZMfGeKK5mS1r1yqYiawCiw5n1tr3gQ9XsJZFc1IObqq3GMy2KZjdhfpQiIfb270uQ0REKshPfbQsTo0x7Gpr87oMEVlhgd9KX8FMRERERESqQSUOofbMjGB27ADEY16XJCIiIiIisiiBHTlzUg5u/1GiaQrBTIdMi4iIiIhIgAUynJVGzKJDIZy+fQpmIiIiIiISeMZau3K/zJjLwAdlfOsGYGiZy1lJ1dSeamoLqD1+Vk1tAW/ac4+1duMK/04JoLvon6G6/jarqS1QXe2pprZAdbWnmtoCPuufVzSclcsYc8xa+7jXdVRKNbWnmtoCao+fVVNboPraI6tXNf1brqa2QHW1p5raAtXVnmpqC/ivPYGc1igiIiIiIlJtFM5ERERERER8wK/h7BteF1Bh1dSeamoLqD1+Vk1tgeprj6xe1fRvuZraAtXVnmpqC1RXe6qpLeCz9vhyzZmIiIiIiMhq49eRMxERERERkVXFd+HMGJMyxrxtjHGNMce8ruduGWO+ZYy5ZIx5Z9q19caYnxpjThc/tnpZY7kWaEuPMeZc8f64xphPe1ljuYwx240xLxtj3jXGnDDGfLV4Paj3ZqH2BPX+NBhjfmmM+VWxPV8rXr/XGPNa8f581xizxuta7+Q2bUkaY85MuzdRr2sVuRvqn/2jmvpnqK4+Wv2zfwWlf/bdtEZjTAp43FobyPMTjDEHgSzwf6y1Dxev/VfgirX2sDHmENBqrf0LL+ssxwJt6QGy1tq/8bK2u2WM2Qxstta+aYxZC7wB/B4QJ5j3ZqH2fIFg3h8DNFtrs8aYOuAo8FXgz4F/tNb+gzHmb4FfWWu/7mWtd3Kbtvwp8GNr7RFPCxRZJPXP/lFN/TNUVx+t/tm/gtI/+27kLOista8AV2Zd/izw7eLn36bwR+p7C7QlkKy1F6y1bxY/vw68C2wluPdmofYEki3IFr+sK/5ngY8DpSfLQNyf27RFRDyk/tm/qqmPVv/sX0Hpn/0YzizwE2PMG8aY570upkI2WWsvQOGPFmj3uJ6l+o/GmOPFaRW+n2IwmzGmC3gUeI0quDez2gMBvT/GmFpjjAtcAn4KvAdkrLX54rcMEpAObnZbrLWle/PXxXvzP4wx9R6WKLIY6p/9L5DP/9NVUx+t/tl/gtA/+zGcPW2t3Qv8NvBnxaF78Y+vAzuBKHAB+G/elnN3jDFh4PtAt7V22Ot6lmqe9gT2/lhrJ6y1UWAb8CTw4HzftrJVLc7sthhjHgb+M7AbeAJYD/h6ao7IPNQ/+1tgn/9LqqmPVv/sT0Hon30Xzqy154sfLwE/oPCPIOguFucgl+YiX/K4nkWz1l4s/sOeBL5JgO5PcX7x94HvWGv/sXg5sPdmvvYE+f6UWGszgAPsAyLGmFDxoW3Aea/qWoxpbflUcaqLtdbeBP43Abw3srqpf/a3oD//V1Mfrf7Z//zcP/sqnBljmouLJzHGNAOfBN65/f8VCP8EfKX4+VeAH3lYy5KUniSL/h0BuT/FRaB/B7xrrf3v0x4K5L1ZqD0Bvj8bjTGR4ueNwG9RmKf/MvD54rcF4v4s0Ja+aS8wDIW5
+YG4NyKg/jkIgvr8D9XVR6t/9q+g9M++2q3RGLODwrtxACHgBWvtX3tY0l0zxvw9EAM2ABeBvwR+CHwP6AQGgOestb5fyLtAW2IUhuQtkAL+pDQf3M+MMQeAV4G3gcni5f9CYR54EO/NQu35IsG8P3soLCiupfCm0festX9VfE74BwrTDN4Cvlx8Z8u3btOWl4CNgAFc4E+nLUwW8TX1z/5STf0zVFcfrf7Zv4LSP/sqnImIiIiIiKxWvprWKCIiIiIislopnImIiIiIiPiAwpmIiIiIiIgPKJyJiIiIiIj4gMKZiIiIiIiIDyiciYiIiIiI+IDCmYiIiIiIiA8onImIiIiIiPjA/weyvjipu20WGQAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "titles = ('K Neighbors with k=1', 'K Neighbors with k=2')\n",
+ "\n",
+ "fig = plt.figure(figsize=(15, 5))\n",
+ "plt.subplots_adjust(wspace=0.4, hspace=0.4)\n",
+ "\n",
+ "X0, X1 = X_train[:, 0], X_train[:, 1]\n",
+ "\n",
+ "x_min, x_max = X0.min() - 1, X0.max() + 1\n",
+ "y_min, y_max = X1.min() - 1, X1.max() + 1\n",
+ "xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.2),\n",
+ "                     np.arange(y_min, y_max, 0.2))\n",
+ "\n",
+ "for clf, title, ax in zip(models, titles, fig.subplots(1, 2).flatten()):\n",
+ "    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
+ "    Z = Z.reshape(xx.shape)\n",
+ "    colors = ('red', 'green', 'lightgreen', 'gray', 'cyan')\n",
+ "    cmap = ListedColormap(colors[:len(np.unique(Z))])\n",
+ "    ax.contourf(xx, yy, Z, cmap=cmap, alpha=0.5)\n",
+ "    ax.scatter(X0, X1, c=y_train, s=50, edgecolors='k', cmap=cmap, alpha=0.5)\n",
+ "    ax.set_title(title)\n",
+ "\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercise 3.2\n",
+ "Using the $kd$-tree constructed in Example 3.2, find the nearest neighbor of the point $x=(3,4.5)^T$.\n",
+ "\n",
+ "**Solution:**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The nearest neighbor of x is (2, 3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.neighbors import KDTree\n",
+ "\n",
+ "train_data = np.array([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])\n",
+ "tree = KDTree(train_data, leaf_size=2)\n",
+ "dist, ind = tree.query(np.array([(3, 4.5)]), k=1)\n",
+ "x1 = train_data[ind[0]][0][0]\n",
+ "x2 = train_data[ind[0]][0][1]\n",
+ "\n",
+ "print(\"The nearest neighbor of x is ({0}, {1})\".format(x1, x2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercise 3.3\n",
+ "Following Algorithm 3.3, write an algorithm whose output is the $k$ nearest neighbors of $x$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
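+ "**Solution:**  \n",
+ "**Algorithm: $k$-nearest-neighbor search with a $kd$-tree**  \n",
+ "Input: a constructed $kd$-tree; a target point $x$;  \n",
+ "Output: the $k$ nearest neighbors of $x$  \n",
+ "1. Find the leaf node of the $kd$-tree that contains the target point $x$: starting from the root, descend the tree recursively. If the coordinate of $x$ in the current splitting dimension is smaller than that of the splitting point, move to the left child; otherwise move to the right child. Continue until the child is a leaf node;  \n",
+ "2. If the “current $k$-nearest-neighbor set” holds fewer than $k$ points, or the leaf node is closer to the target than the farthest point in that set, insert the leaf node into the set (replacing the farthest point when the set already holds $k$ points);  \n",
+ "3. Back up recursively, performing the following at each node:  \n",
+ "(a) If the “current $k$-nearest-neighbor set” holds fewer than $k$ points, or the current node is closer to the target than the farthest point in that set, insert the node into the set.  \n",
+ "(b) Check whether the region of the other child intersects the hypersphere centered at the target point whose radius is the distance from the target point to the farthest point in the “current $k$-nearest-neighbor set” (while the set holds fewer than $k$ points, treat the regions as intersecting). If they intersect, the other child's region may contain a point closer to the target, so move to the other child and continue the $k$-nearest-neighbor search recursively from there; if they do not intersect, back up;\n",
+ "4. The search ends when it backs up to the root; the final “current $k$-nearest-neighbor set” contains the $k$ nearest neighbors of $x$."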
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
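+ "# Build a kd-tree and search it for the nearest neighbor of a query point\n",
+ "from collections import namedtuple\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "# Tree node: a splitting point plus its left and right subtrees\n",
+ "class Node(namedtuple(\"Node\", \"location left_child right_child\")):\n",
+ "    def __repr__(self):\n",
+ "        return str(tuple(self))\n",
+ "\n",
+ "\n",
+ "# kd-tree class\n",
+ "class KdTree():\n",
+ "    def __init__(self, k=1):\n",
+ "        # k is stored for interface compatibility; the search returns the single nearest point\n",
+ "        self.k = k\n",
+ "        self.kdtree = None\n",
+ "\n",
+ "    # Recursively build the kd-tree\n",
+ "    def _fit(self, X, depth=0):\n",
+ "        if X.shape[0] == 0:\n",
+ "            return None\n",
+ "        # Cycle through the coordinate axes (choosing the axis by variance is a possible extension)\n",
+ "        axis = depth % X.shape[1]\n",
+ "        X = X[X[:, axis].argsort()]\n",
+ "        median = X.shape[0] // 2\n",
+ "        return Node(location=X[median],\n",
+ "                    left_child=self._fit(X[:median], depth + 1),\n",
+ "                    right_child=self._fit(X[median + 1:], depth + 1))\n",
+ "\n",
+ "    # Recursive nearest-neighbor search with backtracking\n",
+ "    def _search(self, point, tree=None, depth=0, best=None):\n",
+ "        if tree is None:\n",
+ "            return best\n",
+ "        target = point[0]\n",
+ "        axis = depth % len(target)\n",
+ "        # Update the current best if this node is closer to the target\n",
+ "        if best is None or np.linalg.norm(target - tree.location) < np.linalg.norm(target - best):\n",
+ "            best = tree.location\n",
+ "        # Descend first into the branch on the target's side of the splitting plane\n",
+ "        if target[axis] < tree.location[axis]:\n",
+ "            next_branch, other_branch = tree.left_child, tree.right_child\n",
+ "        else:\n",
+ "            next_branch, other_branch = tree.right_child, tree.left_child\n",
+ "        best = self._search(point, tree=next_branch, depth=depth + 1, best=best)\n",
+ "        # Search the other branch only if the splitting plane is closer than the current best\n",
+ "        if abs(target[axis] - tree.location[axis]) < np.linalg.norm(target - best):\n",
+ "            best = self._search(point, tree=other_branch, depth=depth + 1, best=best)\n",
+ "        return best\n",
+ "\n",
+ "    def fit(self, X):\n",
+ "        self.kdtree = self._fit(X)\n",
+ "        return self.kdtree\n",
+ "\n",
+ "    def predict(self, X):\n",
+ "        # X has shape (1, n_features); returns the training point nearest to X\n",
+ "        res = self._search(X, self.kdtree)\n",
+ "        return res"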
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The nearest neighbor of x is (2, 3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "KNN = KdTree()\n",
+ "X_train = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])\n",
+ "KNN.fit(X_train)\n",
+ "X_new = np.array([[3, 4.5]])\n",
+ "res = KNN.predict(X_new)\n",
+ "\n",
+ "x1 = res[0]\n",
+ "x2 = res[1]\n",
+ "\n",
+ "print(\"The nearest neighbor of x is ({0}, {1})\".format(x1, x2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
+ "Reference code: https://github.com/wzyonggege/statistical-learning-method\n",
+ "\n",
+ "Latest version of this code: https://github.com/fengdu78/lihang-code\n",
+ "\n",
+ "Exercise solutions: https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+ "\n",
+ "Chinese annotations by the WeChat public account 机器学习初学者 (ID: ai-start-com)\n",
+ "\n",
+ "Environment: Python 3.5+\n",
+ "\n",
+ "All code has been tested.\n",
+ "![gongzhong](../gongzhong.jpg)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb" "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb"
index a07d4fd..10e5296 100644
--- "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb"
+++ "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb"
@@ -80,7 +80,7 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -91,7 +91,7 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -121,7 +121,7 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -137,7 +137,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -151,7 +151,7 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -293,7 +293,7 @@
"[150 rows x 5 columns]"
]
},
- "execution_count": 7,
+ "execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -304,22 +304,22 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 8,
+ "execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEJCAYAAACZjSCSAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAAAfpElEQVR4nO3df5xddX3n8de7YTSxIllhbGEmMSpuHkpCCRn5IRYVtGiIIQWL8FDbKG26rgqWig9xrbLRNli2ai27KIJVixuMFCOgQFnA3wJOCCQQDKBgk5Fd0tAE0ADJ9LN/3DPJ5HJn5p6Z+733nHvez8djHjPn3HO/+ZxzYT5zzvl8zlcRgZmZVddvdToAMzPrLCcCM7OKcyIwM6s4JwIzs4pzIjAzqzgnAjOzikueCCRNk7RO0nUNXlsmaauku7KvP00dj5mZ7Wu/Nvwb5wD3AS8Y4/WvR8T72hCHmZk1kDQRSOoHTgb+Gji3FWMedNBBMWfOnFYMZWZWGWvXrv23iOht9FrqM4LPAh8C9h9nm9MkHQ/cD/xFRGweb8A5c+YwODjYugjNzCpA0i/Hei3ZPQJJi4FHI2LtOJtdC8yJiMOBm4CvjDHWckmDkga3bt2aIFozs+pKebP4OGCJpIeBK4ETJF0xeoOI2BYRT2eLlwELGw0UEZdGxEBEDPT2NjyzMTOzSUqWCCLi/Ijoj4g5wBnALRHxjtHbSDp41OISajeVzcysjdpRNbQPSSuAwYi4Bjhb0hJgN/AYsKzd8ZiZNWvXrl1s2bKFp556qtOhjGn69On09/fT09PT9HtUtsdQDwwMhG8Wm1knPPTQQ+y///4ceOCBSOp0OM8SEWzbto0nnniCl7zkJfu8JmltRAw0el/bzwjMqmLNuiEuunETv9q+k0NmzuC8k+aydEFfp8OyKXjqqaeYM2dOIZMAgCQOPPBA8hbVOBGYJbBm3RDnX72BnbuGARjavpPzr94A4GRQckVNAiMmE5+fNWSWwEU3btqTBEbs3DXMRTdu6lBEZmNzIjBL4Ffbd+Zab9asG264gblz53LooYdy4YUXtmRMJwKzBA6ZOSPXerNmDA8P8973vpfrr7+ejRs3smrVKjZu3DjlcZ0IzBI476S5zOiZts+6GT3TOO+kuR2KyDphzbohjrvwFl7y4W9z3IW3sGbd0JTGu+OOOzj00EN56UtfynOe8xzOOOMMvvWtb005TicCswSWLuhj5anz6Zs5AwF9M2ew8tT5vlFcISMFA0PbdxLsLRiYSjIYGhpi1qxZe5b7+/sZGppacgFXDZkls3RBn3/xV9h4BQNF++/CZwRmZgmkKBjo6+tj8+a9D2jesmULfX1TTypOBGZmCaQoGHjVq17FAw88wEMPPcQzzzzDlVdeyZIlSyY93ggnAjOzBFIUDOy3335cfPHFnHTSSbziFa/g9NNP57DDDptqqL5HYGaWwsh9gFY/ZmTRokUsWrSoFSHu4URgZpZIWQoGfGnIzKzinAjMzCrOicDMrOKcCMzMKs43i63yPIGMVZ3PCKzSUjwPxiyld7/73bzoRS9i3rx5LRvTicAqzRPIWNksW7aMG264oaVjOhFYpXkCGUtq/Wr4zDy4YGbt+/rVUx7y+OOP54UvfOHUYxvFicAqzRPIWDLrV8O1Z8OOzUDUvl97dkuSQas5EVileQIZS+bmFbCr7sxy187a+oJx1ZBVWqrnwZixY0u+9R3kRGCVV5bnwVjJHNCfXRZqsL5gfGnIOqbV87maFcqJH4OeuntNPTNq66fgzDPP5Nhjj2XTpk309/dz+eWXT2k88BmBdchI/f5I6eZI/T7gv86tOxx+eu37zStql4MO6K8lgZH1k7Rq1aoWBLcvJwLriDLN52o2aYefPuVf/O3gS0PWEa7fNysOJwLrCNfvW1lFRKdDGNdk4nMisI5w/b6V0fTp09m2bVthk0FEsG3bNqZPn57rfb5HYB3h+n0ro/7+frZs
2cLWrVs7HcqYpk+fTn9/vhJVpc5skqYBg8BQRCyue+25wFeBhcA24G0R8fB44w0MDMTg4GCiaM3MupOktREx0Oi1dpwRnAPcB7ygwWtnAf8eEYdKOgP4FPC2NsRkVjieF8E6Jek9Akn9wMnAZWNscgrwleznq4ATJSllTGZF5HkRrJNS3yz+LPAh4D/GeL0P2AwQEbuBHcCBiWMyKxzPi2CdlCwRSFoMPBoRa1sw1nJJg5IGi3yTxmyy3FdhnZTyjOA4YImkh4ErgRMkXVG3zRAwC0DSfsAB1G4a7yMiLo2IgYgY6O3tTRiyWWe4r8I6KVkiiIjzI6I/IuYAZwC3RMQ76ja7BviT7Oe3ZtsUs0DXLCH3VVgntb2PQNIKYDAirgEuB/5J0oPAY9QShlnluK/COil5H0GruY/AzCy/TvcRmLXVR9dsYNXtmxmOYJrEmUfP4pNL53c6LLPCciKwrvLRNRu44rZ/3bM8HLFn2cnArDE/dM66yqrbG0wNOM56M3MisC4zPMY9r7HWm5kTgXWZaWM8oWSs9WbmRGBd5syjZ+Vab2a+WWxdZuSGsKuGzJrnPgIzswoYr4/Al4bMzCrOl4aspd7+xZ/wo58/tmf5uJe9kK/92bEdjKhzPNGMlYXPCKxl6pMAwI9+/hhv/+JPOhRR53iiGSsTJwJrmfokMNH6buaJZqxMnAjMEvBEM1YmTgRmCXiiGSsTJwJrmeNe9sJc67uZJ5qxMnEisJb52p8d+6xf+lWtGlq6oI+Vp86nb+YMBPTNnMHKU+e7asgKyQ1lZmYV4IlprG1S1c7nGdf1+2b5OBFYy4zUzo+UTY7UzgNT+kWcZ9xUMZh1M98jsJZJVTufZ1zX75vl50RgLZOqdj7PuK7fN8vPicBaJlXtfJ5xXb9vlp8TgbVMqtr5POO6ft8sP98stpYZuRnb6oqdPOOmisGsm7mPwMysAtxHUBBFqW93Tb6ZjeZE0CZFqW93Tb6Z1fPN4jYpSn27a/LNrJ4TQZsUpb7dNflmVs+JoE2KUt/umnwzq+dE0CZFqW93Tb6Z1fPN4jYpSn27a/LNrJ77CMzMKqAjfQSSpgPfB56b/TtXRcTH67ZZBlwEDGWrLo6Iy1LFZPl9dM0GVt2+meEIpkmcefQsPrl0fku2L0qPQlHiMOuUCROBpOcCpwFzRm8fESsmeOvTwAkR8aSkHuCHkq6PiNvqtvt6RLwvX9jWDh9ds4ErbvvXPcvDEXuWG/1yz7N9UXoUihKHWSc1c7P4W8ApwG7g16O+xhU1T2aLPdlXua5DVdyq2zcnW1+UHoWixGHWSc1cGuqPiDdNZnBJ04C1wKHA/4yI2xtsdpqk44H7gb+IiGf91pC0HFgOMHv27MmEYpMwPMb9o1asL0qPQlHiMOukZs4Ifixp7IvC44iI4Yg4AugHjpI0r26Ta4E5EXE4cBPwlTHGuTQiBiJioLe3dzKh2CRMk5KtL0qPQlHiMOukMROBpA2S1gOvAe6UtEnS+lHrmxYR24FbgTfVrd8WEU9ni5cBC3NFb0mdefSsZOuL0qNQlDjMOmm8S0OLpzKwpF5gV0RslzQDeCPwqbptDo6IR7LFJcB9U/k3rbVGbvA2WwWUZ/ui9CgUJQ6zTpqwj0DSP0XEOyda1+B9h1O71DON2pnH6ohYIWkFMBgR10haSS0B7AYeA94TET8bb1z3EZiZ5TfVPoLD6gabRhOXcCJiPbCgwfqPjfr5fOD8JmIwM7NExkwEks4HPgLMkPT4yGrgGeDSNsTWdVI2LuVt/Eo1bhEmvUl1LEpr/Wq4eQXs2AIH9MOJH4PDT+90VFYgYyaCiFgJrJS0MvvL3aYgZeNS3savVOMWYdKbVMeitNavhmvPhl1ZOeyOzbVlcDKwPcarGjpS0pHAN0Z+Hv3Vxhi7QsrGpbwNXqnGLcKkN6mORWndvGJvEhixa2dtvVlmvHsEf5d9nw4MAHdTuzR0ODAIHJs2tO6SsnEpb4NXqnGL
MOlNqmNRWju25FtvlTTmGUFEvD4iXg88AhyZNXQtpHYDeGis91ljKRuX8jZ4pRq3CJPepDoWpXVAf771VknNdBbPjYgNIwsRcQ/winQhdaeUjUt5G7xSjVuESW9SHYvSOvFj0FOXXHtm1NabZZopH10v6TLgimz57UCuzmJL27iUt/Er1bhFmPQm1bEorZEbwq4asnE001A2HXgPcHy26vvAJRHxVOLYGnJDmZlZflNqKMt+4X8m+7KKyVvr70lebEzuZyis8RrKVkfE6ZI20GAegeyJodbF8tb6e5IXG5P7GQptvJvF52TfFwNvafBlXS5vrb8nebExuZ+h0MbrLB55KugbgO9HxAPtCcmKIm+tvyd5sTG5n6HQmikfnQ18QdIvJH1D0vslHZE4LiuAvLX+nuTFxuR+hkKbMBFExMcj4gRqTyH9AXAeteknrcvlrfX3JC82JvczFNqEVUOSPgocBzwfWAd8kFpCsC6Xt9bfk7zYmNzPUGjN9BHcSW3imG8D3wN+Mmp6ybZzH4GZWX5T7SM4UtILqJ0VvBG4VNKjEfGaFsdZGKlq4fOOW4Tn6rsvoKC6vSa/2/cvr8THo5lLQ/OA3wdeS+0ppJvp4ktDqWrh845bhOfquy+goLq9Jr/b9y+vNhyPZqqGLgT2Bz4HvCJ7KmnX3uFJVQufd9wiPFfffQEF1e01+d2+f3m14Xg0c2loccv+tRJIVQufd9wiPFfffQEF1e01+d2+f3m14Xg0c0ZQKalq4fOOW4Tn6rsvoKC6vSa/2/cvrzYcDyeCOqlq4fOOW4Tn6rsvoKC6vSa/2/cvrzYcj2bmI6iUVLXwecctwnP13RdQUN1ek9/t+5dXG47HmH0Ekq6lwVNHR0TEkpZFkYP7CMzM8ptsH8H/SBRPZaWsyc8zdhH6E8xK4bpzYe2XIYZB02DhMlj86daMXaBeifGePvq9dgbS7VLW5OcZuwj9CWalcN25MHj53uUY3rs81WRQsF6JCW8WS3q5pKskbcyeQPoLSb9oR3DdJGVNfp6xi9CfYFYKa7+cb30eBeuVaKZq6B+BS6g9b+j1wFfZO5G9NSllTX6esYvQn2BWCjGcb30eBeuVaCYRzIiIm6ndWP5lRFwAnJw2rO6TsiY/z9hF6E8wKwVNy7c+j4L1SjSTCJ6W9FvAA5LeJ+kPqT2S2nJIWZOfZ+wi9CeYlcLCZfnW51GwXolm+gjOAZ4HnA18AjgB+JOUQXWjlDX5ecYuQn+CWSmM3BBOUTVUsF6JCecj2LNh7VHUERFPpA1pfO4jMDPLb7w+gmaqhgYkbQDWAxsk3S1pYRPvmy7pjmz7eyX99wbbPFfS1yU9KOl2SXOa2B8zM2uhZi4NfQn4rxHxAwBJr6FWSXT4BO97GjghIp6U1AP8UNL1EXHbqG3OAv49Ig6VdAbwKeBtufdiAnkbuco4GUueJrE8+1fGY5G0USdPg1HKOFKNXaAmp2Ty7GMVjgfNJYLhkSQAEBE/lLR7ojdF7ZrTk9liT/ZVfx3qFOCC7OergIslKZq9XtWEvI1cZZyMJU+TWJ79K+OxSNqok6fBKGUcqcYuWJNTEnn2sQrHI9NM1dD3JH1B0uskvVbS/wK+K+lISUeO90ZJ0yTdBTwK3BQRt9dt0kdtxjMiYjewAzgw916MI28jVxknY8nTJJZn/8p4LJI26uRpMEoZR6qxC9bklESefazC8cg0c0bwe9n3j9etX0DtL/wTxnpjRAwDR0iaCXxT0ryIuCdvkJKWA8sBZs+eneu9eRu5yjgZS54msTz7V8ZjkbRRJ0+DUco4Uo1dsCanJPLsYxWOR2bCM4JsasqxvsZMAnVjbAduBd5U99IQMAtA0n7AAcC2Bu+/NCIGImKgt7e3mX9yj7yNXGWcjCVPk1ie/SvjsUjaqJOnwShlHKnGLliTUxJ59rEKxyPTTNXQ70i6XNL12fIrJZ3VxPt6szMBJM0A3gj8rG6za9jbk/BW4JZW3h+A/I1c
ZZyMJU+TWJ79K+OxSNqok6fBKGUcqcYuWJNTEnn2sQrHI9PMPYIvAzcCh2TL9wMfaOJ9BwO3SloP/JTaPYLrJK2QNDKXweXAgZIeBM4FPpwj9qYsXdDHylPn0zdzBgL6Zs5g5anzx7zZmXf7Ivjk0vm845jZe84Apkm845jZDauG8uxfGY8Fh58Ob/kcHDALUO37Wz7Xmpt7iz8NA2ftPQPQtNpyo6qhlHGkGjtlzEWRZx+rcDwyEzaUSfppRLxK0rqIWJCtuysijmhHgPXcUGZmlt9kJ6YZ8WtJB5KVfko6hlp1T9cqZe28tUcZa9BTxlzGfoaifC4F0kwiOJfatfyXSfoR0Evten5XKmXtvLVHGWvQU8Zcxn6GonwuBdNM1dCdwGuBVwN/DhwWEetTB9Yppaydt/YoYw16ypjL2M9QlM+lYJqpGvojanMS3AssBb4+USNZmZWydt7ao4w16CljLmM/Q1E+l4JppmroryLiiewZQydSq/S5JG1YnVPK2nlrjzLWoKeMuYz9DEX5XAqmmUQwcp3kZOCLEfFt4DnpQuqsUtbOW3uUsQY9Zcxl7GcoyudSMM0kgiFJX6D2VNDvSHpuk+8rpVLWzlt7lLEGPWXMZexnKMrnUjDN9BE8j9qjITZExAOSDgbmR8S/tCPAeu4jMDPLb0p9BBHxG+DqUcuPAI+0LjyzLpVn7oKiKGPMRekLKEock9BMH4GZ5ZVn7oKiKGPMRekLKEock9S11/rNOirP3AVFUcaYi9IXUJQ4JsmJwCyFPHMXFEUZYy5KX0BR4pgkJwKzFPLMXVAUZYy5KH0BRYljkpwIzFLIM3dBUZQx5qL0BRQljklyIjBLIc/cBUVRxpiL0hdQlDgmacI+gqJxH4GZWX5TnY/ALI0y1l2njDlVDX8Zj7O1lROBdUYZ665Txpyqhr+Mx9nazvcIrDPKWHedMuZUNfxlPM7Wdk4E1hllrLtOGXOqGv4yHmdrOycC64wy1l2njDlVDX8Zj7O1nROBdUYZ665Txpyqhr+Mx9nazonAOqOMddcpY05Vw1/G42xt5z4CM7MKGK+PwGcEZutXw2fmwQUza9/Xr+7MuKniMJuA+wis2lLV2ecd1/X+1kE+I7BqS1Vnn3dc1/tbBzkRWLWlqrPPO67r/a2DnAis2lLV2ecd1/X+1kFOBFZtqers847ren/rICcCq7ZUdfZ5x3W9v3WQ+wjMzCqgI30EkmZJulXSRkn3SjqnwTavk7RD0l3Zl8+DzczaLGUfwW7gLyPiTkn7A2sl3RQRG+u2+0FELE4Yh7VTGSdByRNzGfevKHzsCitZIoiIR4BHsp+fkHQf0AfUJwLrFmVsisoTcxn3ryh87AqtLTeLJc0BFgC3N3j5WEl3S7pe0mHtiMcSKWNTVJ6Yy7h/ReFjV2jJHzEh6fnAPwMfiIjH616+E3hxRDwpaRGwBnh5gzGWA8sBZs+enTZgm7wyNkXlibmM+1cUPnaFlvSMQFIPtSTwtYi4uv71iHg8Ip7Mfv4O0CPpoAbbXRoRAxEx0NvbmzJkm4oyNkXlibmM+1cUPnaFlrJqSMDlwH0R0fCh6pJ+N9sOSUdl8WxLFZMlVsamqDwxl3H/isLHrtBSXho6DngnsEHSXdm6jwCzASLi88BbgfdI2g3sBM6IsjU22F4jN/3KVBmSJ+Yy7l9R+NgVmhvKzMwqYLyGMs9HUEWu597XdefC2i9DDNemiFy4bOpTRJqViBNB1biee1/XnQuDl+9djuG9y04GVhF+6FzVuJ57X2u/nG+9WRdyIqga13PvK4bzrTfrQk4EVeN67n1pWr71Zl3IiaBqXM+9r4XL8q0360JOBFXjCVD2tfjTMHDW3jMATast+0axVYj7CMzMKsB9BAmtWTfERTdu4lfbd3LIzBmcd9Jcli7o63RYrVOFnoMq7GMR+DgXlhPBFKxZN8T5V29g565ahcnQ9p2cf/UGgO5IBlXoOajCPhaBj3Oh+R7BFFx0
46Y9SWDEzl3DXHTjpg5F1GJV6Dmowj4WgY9zoTkRTMGvtu/Mtb50qtBzUIV9LAIf50JzIpiCQ2bOyLW+dKrQc1CFfSwCH+dCcyKYgvNOmsuMnn0bj2b0TOO8k+Z2KKIWq0LPQRX2sQh8nAvNN4unYOSGcNdWDVXhGfJV2Mci8HEuNPcRmJlVwHh9BL40ZNbt1q+Gz8yDC2bWvq9fXY6xrW18acism6Ws33dvQNfwGYFZN0tZv+/egK7hRGDWzVLW77s3oGs4EZh1s5T1++4N6BpOBGbdLGX9vnsDuoYTgVk3Szn/hOe26BruIzAzqwD3EZiZ2ZicCMzMKs6JwMys4pwIzMwqzonAzKzinAjMzCrOicDMrOKcCMzMKi5ZIpA0S9KtkjZKulfSOQ22kaTPSXpQ0npJR6aKx8zMGkt5RrAb+MuIeCVwDPBeSa+s2+bNwMuzr+XAJQnjscnwxCNmXS9ZIoiIRyLizuznJ4D7gPrJfE8Bvho1twEzJR2cKibLaWTikR2bgdg78YiTgVlXacs9AklzgAXA7XUv9QGbRy1v4dnJwjrFE4+YVULyRCDp+cA/Ax+IiMcnOcZySYOSBrdu3draAG1snnjErBKSJgJJPdSSwNci4uoGmwwBs0Yt92fr9hERl0bEQEQM9Pb2pgnWns0Tj5hVQsqqIQGXA/dFxKfH2Owa4I+z6qFjgB0R8UiqmCwnTzxiVgn7JRz7OOCdwAZJd2XrPgLMBoiIzwPfARYBDwK/Ad6VMB7La2SCkZtX1C4HHdBfSwKeeMSsq3hiGjOzCvDENGZmNiYnAjOzinMiMDOrOCcCM7OKcyIwM6u40lUNSdoK/LLTcTRwEPBvnQ4ioW7fP+j+ffT+ld9U9vHFEdGwI7d0iaCoJA2OVZrVDbp9/6D799H7V36p9tGXhszMKs6JwMys4pwIWufSTgeQWLfvH3T/Pnr/yi/JPvoegZlZxfmMwMys4pwIcpI0TdI6Sdc1eG2ZpK2S7sq+/rQTMU6FpIclbcjif9bT/bJHhn9O0oOS1ks6shNxTkUT+/g6STtGfY6leu62pJmSrpL0M0n3STq27vVSf4ZN7F/ZP7+5o2K/S9Ljkj5Qt01LP8OUj6HuVudQm3/5BWO8/vWIeF8b40nh9RExVq3ym4GXZ19HA5dk38tmvH0E+EFELG5bNK3198ANEfFWSc8Bnlf3etk/w4n2D0r8+UXEJuAIqP3hSW2yrm/WbdbSz9BnBDlI6gdOBi7rdCwddArw1ai5DZgp6eBOB2U1kg4Ajqc2KRQR8UxEbK/brLSfYZP7101OBH4eEfVNtC39DJ0I8vks8CHgP8bZ5rTsVO0qSbPG2a6oAvgXSWslLW/weh+wedTylmxdmUy0jwDHSrpb0vWSDmtncFP0EmAr8I/ZJczLJP123TZl/gyb2T8o7+dX7wxgVYP1Lf0MnQiaJGkx8GhErB1ns2uBORFxOHAT8JW2BNdar4mII6mder5X0vGdDiiBifbxTmrt+L8H/AOwps3xTcV+wJHAJRGxAPg18OHOhtRSzexfmT+/PbLLXkuAb6T+t5wImnccsETSw8CVwAmSrhi9QURsi4ins8XLgIXtDXHqImIo+/4oteuSR9VtMgSMPtPpz9aVxkT7GBGPR8ST2c/fAXokHdT2QCdnC7AlIm7Plq+i9otztDJ/hhPuX8k/v9HeDNwZEf+vwWst/QydCJoUEedHRH9EzKF2unZLRLxj9DZ11+iWULupXBqSflvS/iM/A38A3FO32TXAH2dVC8cAOyLikTaHOmnN7KOk35Wk7OejqP1/sq3dsU5GRPxfYLOkudmqE4GNdZuV9jNsZv/K/PnVOZPGl4WgxZ+hq4amSNIKYDAirgHOlrQE2A08BizrZGyT8DvAN7P/h/YD/ndE3CDpvwBExOeB7wCLgAeB3wDv6lCsk9XMPr4VeI+k3cBO4IwoV+fl+4GvZZcWfgG8q8s+w4n2r+yf38gfKW8E/nzUumSf
oTuLzcwqzpeGzMwqzonAzKzinAjMzCrOicDMrOKcCMzMKs6JwCyn7OmWjZ4+23B9C/69pZJeOWr5u5K6em5eay8nArPiWwq8cqKNzCbLicC6TtY9/O3soWP3SHpbtn6hpO9lD5u7caQTPPsL+++zZ7/fk3WjIukoST/JHm7241HdrM3G8CVJd2TvPyVbv0zS1ZJukPSApL8d9Z6zJN2fveeLki6W9GpqXeoXZfG9LNv8j7Lt7pf0+y06dFZR7iy2bvQm4FcRcTLUHl0sqYfaA8hOiYitWXL4a+Dd2XueFxFHZA+g+xIwD/gZ8PsRsVvSG4C/AU5rMob/Ru0xJO+WNBO4Q9L/yV47AlgAPA1skvQPwDDwV9Sem/MEcAtwd0T8WNI1wHURcVW2PwD7RcRRkhYBHwfekP8wmdU4EVg32gD8naRPUfsF+gNJ86j9cr8p+0U6DRj9bJZVABHxfUkvyH557w98RdLLqT26uidHDH9A7SGFH8yWpwOzs59vjogdAJI2Ai8GDgK+FxGPZeu/Afzncca/Ovu+FpiTIy6zZ3EisK4TEferNnXfIuCTkm6m9pTReyPi2LHe1mD5E8CtEfGHkuYA380RhoDTstmm9q6UjqZ2JjBimMn9fzgyxmTfb7aH7xFY15F0CPCbiLgCuIja5ZZNQK+y+W0l9WjfCUtG7iO8htqTHHcAB7D30b7LcoZxI/D+UU/BXDDB9j8FXivpP0naj30vQT1B7ezELAknAutG86ldk7+L2vXzT0bEM9SeSvkpSXcDdwGvHvWepyStAz4PnJWt+1tgZbY+71/dn6B2KWm9pHuz5TFlcyT8DXAH8CPgYWBH9vKVwHnZTeeXNR7BbPL89FGrPEnfBT4YEYMdjuP5EfFkdkbwTeBLEVE/ablZy/mMwKw4LsjOYu4BHqKkUyxa+fiMwMys4nxGYGZWcU4EZmYV50RgZlZxTgRmZhXnRGBmVnFOBGZmFff/ATbjY/3/keu8AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
@@ -340,18 +340,20 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
+ "# Select the instances with labels 0 and 1\n",
"data = np.array(df.iloc[:100, [0, 1, -1]])\n",
"X, y = data[:,:-1], data[:,-1]\n",
+ "# Hold-out split: training set : test set = 8 : 2\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)"
]
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
@@ -370,19 +372,26 @@
" # 取出n个点\n",
" knn_list = []\n",
" for i in range(self.n):\n",
+ " # Compute the Lp-norm distance\n",
" dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
+ " # Store the distance together with the instance's class\n",
" knn_list.append((dist, self.y_train[i]))\n",
- "\n",
+ " \n",
+ " # Iterate over the remaining training instances\n",
" for i in range(self.n, len(self.X_train)):\n",
+ " # Find the farthest of the current n points\n",
" max_index = knn_list.index(max(knn_list, key=lambda x: x[0]))\n",
+ " # Compute the Lp-norm distance\n",
" dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
+ " # If the farthest point in knn_list is farther away than this point, replace it with this point\n",
" if knn_list[max_index][0] > dist:\n",
" knn_list[max_index] = (dist, self.y_train[i])\n",
"\n",
- " # 统计\n",
+ " # Count the classes of the n points\n",
" knn = [k[-1] for k in knn_list]\n",
" count_pairs = Counter(knn)\n",
- "# max_count = sorted(count_pairs, key=lambda x: x)[-1]\n",
+ "# max_count = sorted(count_pairs, key=lambda x: x)[-1]\n",
+ " # Pick the most frequent class\n",
" max_count = sorted(count_pairs.items(), key=lambda x: x[1])[-1][0]\n",
" return max_count\n",
"\n",
@@ -393,12 +402,13 @@
" label = self.predict(X)\n",
" if label == y:\n",
" right_count += 1\n",
+ " # Return the fraction of correctly predicted labels\n",
" return right_count / len(X_test)"
]
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
@@ -407,7 +417,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 20,
"metadata": {},
"outputs": [
{
@@ -416,7 +426,7 @@
"1.0"
]
},
- "execution_count": 12,
+ "execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -427,7 +437,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 21,
"metadata": {},
"outputs": [
{
@@ -445,22 +455,22 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 14,
+ "execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -474,7 +484,7 @@
"source": [
"plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
"plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
- "plt.plot(test_point[0], test_point[1], 'bo', label='test_point')\n",
+ "plt.scatter(test_point[0], test_point[1], label='test_point')\n",
"plt.xlabel('sepal length')\n",
"plt.ylabel('sepal width')\n",
"plt.legend()"
@@ -489,7 +499,7 @@
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
@@ -498,7 +508,7 @@
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 26,
"metadata": {},
"outputs": [
{
@@ -507,7 +517,7 @@
"KNeighborsClassifier()"
]
},
- "execution_count": 16,
+ "execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
@@ -519,7 +529,7 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 27,
"metadata": {},
"outputs": [
{
@@ -528,7 +538,7 @@
"1.0"
]
},
- "execution_count": 17,
+ "execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
@@ -545,10 +555,20 @@
"source": [
"### sklearn.neighbors.KNeighborsClassifier\n",
"\n",
- "- n_neighbors: 临近点个数\n",
- "- p: 距离度量\n",
- "- algorithm: 近邻算法,可选{'auto', 'ball_tree', 'kd_tree', 'brute'}\n",
- "- weights: 确定近邻的权重"
+ "- n_neighbors: number of neighbors, default=5\n",
+ "- p: power parameter of the Minkowski distance (p=1 is Manhattan, p=2 is Euclidean), default=2\n",
+ "- algorithm: nearest-neighbor search algorithm, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'\n",
+ "- weights: weighting of the neighbors, default='uniform'\n",
+ "\n",
+ " Weight function used in prediction. Possible values:\n",
+ " \n",
+ " 'uniform': uniform weights. All points in each neighborhood are weighted equally.\n",
+ " \n",
+ " 'distance': weight points by the inverse of their distance. In this case, closer neighbors of a query point have a greater influence than neighbors that are farther away.\n",
+ "\n",
+ "- n_jobs: int, default=None \n",
+ "\n",
+ " The number of parallel jobs to run for the neighbors search. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. Does not affect the fit method."
]
},
{
@@ -605,7 +625,7 @@
},
{
"cell_type": "code",
- "execution_count": 18,
+ "execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
@@ -627,7 +647,7 @@
" return None\n",
" # key参数的值为一个函数,此函数只有一个参数且返回一个值用来进行比较\n",
" # operator模块提供的itemgetter函数用于获取对象的哪些维的数据,参数为需要获取的数据在对象中的序号\n",
- " #data_set.sort(key=itemgetter(split)) # 按要进行分割的那一维数据排序\n",
+ " # data_set.sort(key=itemgetter(split)) # sort by the dimension being split on\n",
" data_set.sort(key=lambda x: x[split])\n",
" split_pos = len(data_set) // 2 # //为Python中的整数除法\n",
" median = data_set[split_pos] # 中位数分割点\n",
@@ -654,7 +674,7 @@
},
{
"cell_type": "code",
- "execution_count": 19,
+ "execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
@@ -732,7 +752,7 @@
},
{
"cell_type": "code",
- "execution_count": 20,
+ "execution_count": 30,
"metadata": {},
"outputs": [
{
@@ -756,11 +776,11 @@
},
{
"cell_type": "code",
- "execution_count": 21,
+ "execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
- "from time import clock\n",
+ "import time\n",
"from random import random\n",
"\n",
"# 产生一个k维随机向量,每维分量值在0~1之间\n",
@@ -774,7 +794,7 @@
},
{
"cell_type": "code",
- "execution_count": 22,
+ "execution_count": 37,
"metadata": {},
"outputs": [
{
@@ -792,40 +812,24 @@
},
{
"cell_type": "code",
- "execution_count": 23,
+ "execution_count": 40,
"metadata": {},
"outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead\n",
- " \n"
- ]
- },
{
"name": "stdout",
"output_type": "stream",
"text": [
- "time: 5.170202400000001 s\n",
- "Result_tuple(nearest_point=[0.09916902877403755, 0.5005978535517558, 0.7997848590100571], nearest_dist=0.0010460533893058112, nodes_visited=38)\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:5: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead\n",
- " \"\"\"\n"
+ "time: 5.403439998626709 s\n",
+ "Result_tuple(nearest_point=[0.09910258486020529, 0.5042390385455003, 0.8050068418001802], nearest_dist=0.006621424811579601, nodes_visited=86)\n"
]
}
],
"source": [
"N = 400000\n",
- "t0 = clock()\n",
+ "t0 = time.perf_counter()\n",
"kd2 = KdTree(random_points(3, N)) # 构建包含四十万个3维空间样本点的kd树\n",
"ret2 = find_nearest(kd2, [0.1,0.5,0.8]) # 四十万个样本点中寻找离目标最近的点\n",
- "t1 = clock()\n",
+ "t1 = time.perf_counter()\n",
"print (\"time: \",t1-t0, \"s\")\n",
"print (ret2)"
]
@@ -844,7 +848,7 @@
},
{
"cell_type": "code",
- "execution_count": 24,
+ "execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
@@ -869,12 +873,12 @@
},
{
"cell_type": "code",
- "execution_count": 25,
+ "execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -922,7 +926,7 @@
},
{
"cell_type": "code",
- "execution_count": 26,
+ "execution_count": 43,
"metadata": {},
"outputs": [
{
@@ -946,6 +950,21 @@
"print(\"x点的最近邻点是({0}, {1})\".format(x1, x2))"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## sklearn.neighbors.KDTree\n",
+ "- leaf_size: positive int, default=40\n",
+ "\n",
+ "Number of points at which to switch to brute-force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.\n",
+ "\n",
+ "- metric: str or DistanceMetric64 object, default='minkowski'\n",
+ "\n",
+ "Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. A list of valid metrics for KDTree is given by KDTree.valid_metrics. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for more information on any distance metric.\n",
+ "\n"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -972,7 +991,7 @@
},
{
"cell_type": "code",
- "execution_count": 27,
+ "execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
@@ -1038,7 +1057,7 @@
},
{
"cell_type": "code",
- "execution_count": 28,
+ "execution_count": 50,
"metadata": {},
"outputs": [
{
@@ -1073,12 +1092,9 @@
"\n",
"习题解答:https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
"\n",
- "中文注释制作:机器学习初学者公众号:ID:ai-start-com\n",
- "\n",
- "配置环境:python 3.5+\n",
+ "配置环境:python 3.8+\n",
"\n",
- "代码全部测试通过。\n",
- "![gongzhong](../gongzhong.jpg)"
+ "代码全部测试通过。"
]
},
{
@@ -1105,7 +1121,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.3"
}
},
"nbformat": 4,
diff --git "a/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/.ipynb_checkpoints/4.NaiveBayes-checkpoint.ipynb" "b/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/.ipynb_checkpoints/4.NaiveBayes-checkpoint.ipynb"
new file mode 100644
index 0000000..876bad9
--- /dev/null
+++ "b/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/.ipynb_checkpoints/4.NaiveBayes-checkpoint.ipynb"
@@ -0,0 +1,594 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 第4章 朴素贝叶斯"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
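+ "1. Naive Bayes is a typical generative learning method. A generative method learns the joint probability distribution\n",
+ "$P(X,Y)$ from the training data and then derives the posterior distribution $P(Y|X)$. Concretely, the training data are used to estimate $P(X|Y)$ and $P(Y)$, which gives the joint distribution:\n",
+ "\n",
+ "$$P(X,Y)=P(Y)P(X|Y)$$\n",
+ "\n",
+ "The probabilities can be estimated by maximum likelihood estimation or by Bayesian estimation.\n",
+ "\n",
+ "2. The basic assumption of naive Bayes is conditional independence:\n",
+ "\n",
+ "$$\\begin{aligned} P(X&=x | Y=c_{k} )=P\\left(X^{(1)}=x^{(1)}, \\cdots, X^{(n)}=x^{(n)} | Y=c_{k}\\right) \\\\ &=\\prod_{j=1}^{n} P\\left(X^{(j)}=x^{(j)} | Y=c_{k}\\right) \\end{aligned}$$\n",
+ "\n",
+ "\n",
+ "This is a strong assumption. It greatly reduces the number of conditional probabilities in the model and simplifies both learning and prediction, so naive Bayes is efficient and easy to implement. The drawback is that its classification performance is not necessarily high.\n",
+ "\n",
+ "3. Naive Bayes uses Bayes' theorem together with the learned joint probability model to make classification predictions.\n",
+ "\n",
+ "$$P(Y | X)=\\frac{P(X, Y)}{P(X)}=\\frac{P(Y) P(X | Y)}{\\sum_{Y} P(Y) P(X | Y)}$$\n",
+ " \n",
+ "The input $x$ is assigned to the class $y$ with the largest posterior probability.\n",
+ "\n",
+ "$$y=\\arg \\max _{c_{k}} P\\left(Y=c_{k}\\right) \\prod_{j=1}^{n} P\\left(X^{(j)}=x^{(j)} | Y=c_{k}\\right)$$\n",
+ "\n",
+ "Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function.\n",
+ "\n",
+ "\n",
+ "Models:\n",
+ "\n",
+ "- Gaussian model\n",
+ "- Multinomial model\n",
+ "- Bernoulli model"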
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "from collections import Counter\n",
+ "import math"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load the first 100 iris samples (classes 0 and 1) as features and labels\n",
+ "def create_data():\n",
+ "    iris = load_iris()\n",
+ "    df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
+ "    df['label'] = iris.target\n",
+ "    df.columns = [\n",
+ "        'sepal length', 'sepal width', 'petal length', 'petal width', 'label'\n",
+ "    ]\n",
+ "    data = np.array(df.iloc[:100, :])\n",
+ "    print(data)\n",
+ "    return data[:, :-1], data[:, -1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[5.1 3.5 1.4 0.2 0. ]\n",
+ " [4.9 3. 1.4 0.2 0. ]\n",
+ " [4.7 3.2 1.3 0.2 0. ]\n",
+ " [4.6 3.1 1.5 0.2 0. ]\n",
+ " [5. 3.6 1.4 0.2 0. ]\n",
+ " [5.4 3.9 1.7 0.4 0. ]\n",
+ " [4.6 3.4 1.4 0.3 0. ]\n",
+ " [5. 3.4 1.5 0.2 0. ]\n",
+ " [4.4 2.9 1.4 0.2 0. ]\n",
+ " [4.9 3.1 1.5 0.1 0. ]\n",
+ " [5.4 3.7 1.5 0.2 0. ]\n",
+ " [4.8 3.4 1.6 0.2 0. ]\n",
+ " [4.8 3. 1.4 0.1 0. ]\n",
+ " [4.3 3. 1.1 0.1 0. ]\n",
+ " [5.8 4. 1.2 0.2 0. ]\n",
+ " [5.7 4.4 1.5 0.4 0. ]\n",
+ " [5.4 3.9 1.3 0.4 0. ]\n",
+ " [5.1 3.5 1.4 0.3 0. ]\n",
+ " [5.7 3.8 1.7 0.3 0. ]\n",
+ " [5.1 3.8 1.5 0.3 0. ]\n",
+ " [5.4 3.4 1.7 0.2 0. ]\n",
+ " [5.1 3.7 1.5 0.4 0. ]\n",
+ " [4.6 3.6 1. 0.2 0. ]\n",
+ " [5.1 3.3 1.7 0.5 0. ]\n",
+ " [4.8 3.4 1.9 0.2 0. ]\n",
+ " [5. 3. 1.6 0.2 0. ]\n",
+ " [5. 3.4 1.6 0.4 0. ]\n",
+ " [5.2 3.5 1.5 0.2 0. ]\n",
+ " [5.2 3.4 1.4 0.2 0. ]\n",
+ " [4.7 3.2 1.6 0.2 0. ]\n",
+ " [4.8 3.1 1.6 0.2 0. ]\n",
+ " [5.4 3.4 1.5 0.4 0. ]\n",
+ " [5.2 4.1 1.5 0.1 0. ]\n",
+ " [5.5 4.2 1.4 0.2 0. ]\n",
+ " [4.9 3.1 1.5 0.2 0. ]\n",
+ " [5. 3.2 1.2 0.2 0. ]\n",
+ " [5.5 3.5 1.3 0.2 0. ]\n",
+ " [4.9 3.6 1.4 0.1 0. ]\n",
+ " [4.4 3. 1.3 0.2 0. ]\n",
+ " [5.1 3.4 1.5 0.2 0. ]\n",
+ " [5. 3.5 1.3 0.3 0. ]\n",
+ " [4.5 2.3 1.3 0.3 0. ]\n",
+ " [4.4 3.2 1.3 0.2 0. ]\n",
+ " [5. 3.5 1.6 0.6 0. ]\n",
+ " [5.1 3.8 1.9 0.4 0. ]\n",
+ " [4.8 3. 1.4 0.3 0. ]\n",
+ " [5.1 3.8 1.6 0.2 0. ]\n",
+ " [4.6 3.2 1.4 0.2 0. ]\n",
+ " [5.3 3.7 1.5 0.2 0. ]\n",
+ " [5. 3.3 1.4 0.2 0. ]\n",
+ " [7. 3.2 4.7 1.4 1. ]\n",
+ " [6.4 3.2 4.5 1.5 1. ]\n",
+ " [6.9 3.1 4.9 1.5 1. ]\n",
+ " [5.5 2.3 4. 1.3 1. ]\n",
+ " [6.5 2.8 4.6 1.5 1. ]\n",
+ " [5.7 2.8 4.5 1.3 1. ]\n",
+ " [6.3 3.3 4.7 1.6 1. ]\n",
+ " [4.9 2.4 3.3 1. 1. ]\n",
+ " [6.6 2.9 4.6 1.3 1. ]\n",
+ " [5.2 2.7 3.9 1.4 1. ]\n",
+ " [5. 2. 3.5 1. 1. ]\n",
+ " [5.9 3. 4.2 1.5 1. ]\n",
+ " [6. 2.2 4. 1. 1. ]\n",
+ " [6.1 2.9 4.7 1.4 1. ]\n",
+ " [5.6 2.9 3.6 1.3 1. ]\n",
+ " [6.7 3.1 4.4 1.4 1. ]\n",
+ " [5.6 3. 4.5 1.5 1. ]\n",
+ " [5.8 2.7 4.1 1. 1. ]\n",
+ " [6.2 2.2 4.5 1.5 1. ]\n",
+ " [5.6 2.5 3.9 1.1 1. ]\n",
+ " [5.9 3.2 4.8 1.8 1. ]\n",
+ " [6.1 2.8 4. 1.3 1. ]\n",
+ " [6.3 2.5 4.9 1.5 1. ]\n",
+ " [6.1 2.8 4.7 1.2 1. ]\n",
+ " [6.4 2.9 4.3 1.3 1. ]\n",
+ " [6.6 3. 4.4 1.4 1. ]\n",
+ " [6.8 2.8 4.8 1.4 1. ]\n",
+ " [6.7 3. 5. 1.7 1. ]\n",
+ " [6. 2.9 4.5 1.5 1. ]\n",
+ " [5.7 2.6 3.5 1. 1. ]\n",
+ " [5.5 2.4 3.8 1.1 1. ]\n",
+ " [5.5 2.4 3.7 1. 1. ]\n",
+ " [5.8 2.7 3.9 1.2 1. ]\n",
+ " [6. 2.7 5.1 1.6 1. ]\n",
+ " [5.4 3. 4.5 1.5 1. ]\n",
+ " [6. 3.4 4.5 1.6 1. ]\n",
+ " [6.7 3.1 4.7 1.5 1. ]\n",
+ " [6.3 2.3 4.4 1.3 1. ]\n",
+ " [5.6 3. 4.1 1.3 1. ]\n",
+ " [5.5 2.5 4. 1.3 1. ]\n",
+ " [5.5 2.6 4.4 1.2 1. ]\n",
+ " [6.1 3. 4.6 1.4 1. ]\n",
+ " [5.8 2.6 4. 1.2 1. ]\n",
+ " [5. 2.3 3.3 1. 1. ]\n",
+ " [5.6 2.7 4.2 1.3 1. ]\n",
+ " [5.7 3. 4.2 1.2 1. ]\n",
+ " [5.7 2.9 4.2 1.3 1. ]\n",
+ " [6.2 2.9 4.3 1.3 1. ]\n",
+ " [5.1 2.5 3. 1.1 1. ]\n",
+ " [5.7 2.8 4.1 1.3 1. ]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "X, y = create_data()\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([5. , 3.4, 1.5, 0.2]), 0.0)"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_test[0], y_test[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "参考:https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/\n",
+ "http://www.zhangzhenhu.com/probability_model/%E8%B4%9D%E5%8F%B6%E6%96%AF%E5%88%86%E7%B1%BB%E5%99%A8.html#id5\n",
+ "\n",
+ "## GaussianNB 高斯朴素贝叶斯\n",
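+ "In the basic naive Bayes model, all feature variables are assumed to be discrete, but real applications are varied and often cannot satisfy such a strong assumption. What if a feature variable $X$ is continuous? One option is to discretize the continuous values and then apply the naive Bayes model as before, but this is not very elegant. In fact, a Bayes classifier places no constraint on which probability distribution the variables follow: in principle any distribution is supported, for the label variable $Y$ as well as for the feature variable $X$. A classification model in which the feature variable $X$ is Gaussian within each class while the label variable $Y$ remains a discrete (categorical) variable is called Gaussian discriminant analysis (GDA); when the features are additionally assumed to be conditionally independent given the class, it is Gaussian naive Bayes (GaussianNB).\n",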
+ "\n",
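+ "Each feature $x_i$ of the feature vector $x$ is modeled by a Gaussian probability density given the class $y_k$ (the features are fully conditionally independent), with the mean and variance estimated separately for each feature and each class:\n",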
+ "$$P(x_i | y_k)=\\frac{1}{\\sqrt{2\\pi\\sigma^2_{yk}}}exp(-\\frac{(x_i-\\mu_{yk})^2}{2\\sigma^2_{yk}})$$\n",
+ "\n",
+ "数学期望(mean):$\\mu$\n",
+ "\n",
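+ "Variance: $\\sigma^2=\\frac{\\sum(X-\\mu)^2}{N}$\n",
+ "\n",
+ "As in the parameter estimation for the discrete naive Bayes model, the observed samples are partitioned by the value of $Y$ (two classes in this example), and each subset is used to estimate the parameters of the Gaussian distributions for its class.\n",
+ ">The input $x$ is assigned to the class $y$ with the largest posterior probability.\n",
+ "$$y=\\arg \\max _{y_k} P\\left(Y=y_k\\right) \\prod_{i=1}^{n} \\frac{1}{\\sqrt{2\\pi\\sigma^2_{yk}}}exp(-\\frac{(x_i-\\mu_{yk})^2}{2\\sigma^2_{yk}})$$"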
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class NaiveBayes:\n",
+ "    def __init__(self):\n",
+ "        self.model = None\n",
+ "\n",
+ "    # Mean (mathematical expectation)\n",
+ "    @staticmethod\n",
+ "    def mean(X):\n",
+ "        return sum(X) / float(len(X))\n",
+ "\n",
+ "    # Standard deviation (square root of the population variance)\n",
+ "    def stdev(self, X):\n",
+ "        avg = self.mean(X)\n",
+ "        return math.sqrt(sum([pow(x - avg, 2) for x in X]) / float(len(X)))\n",
+ "\n",
+ "    # Gaussian probability density function\n",
+ "    def gaussian_probability(self, x, mean, stdev):\n",
+ "        exponent = math.exp(-(math.pow(x - mean, 2) /\n",
+ "                              (2 * math.pow(stdev, 2))))\n",
+ "        return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent\n",
+ "\n",
+ "    # Summarize the training samples: one (mean, stdev) pair per feature column\n",
+ "    def summarize(self, train_data):\n",
+ "        summaries = [(self.mean(i), self.stdev(i)) for i in zip(*train_data)]\n",
+ "        return summaries\n",
+ "\n",
+ "    # Compute the mean and standard deviation of every feature, separately for each class\n",
+ "    def fit(self, X, y):\n",
+ "        # All distinct labels\n",
+ "        labels = list(set(y))\n",
+ "        # Group the samples by label; the (mean, stdev) pairs are computed from these groups\n",
+ "        data = {label: [] for label in labels}\n",
+ "        for f, label in zip(X, y):\n",
+ "            data[label].append(f)\n",
+ "        # value holds the n-dimensional samples of one class; summarize() yields n (mean, stdev) pairs\n",
+ "        self.model = {\n",
+ "            label: self.summarize(value)\n",
+ "            for label, value in data.items()\n",
+ "        }\n",
+ "        print(self.model)\n",
+ "        return 'gaussianNB train done!'\n",
+ "\n",
+ "    # Product of the per-feature Gaussian likelihoods for each class (the class prior P(Y) is not included)\n",
+ "    def calculate_probabilities(self, input_data):\n",
+ "        # summaries:{0.0: [(5.0, 0.37),(3.42, 0.40)], 1.0: [(5.8, 0.449),(2.7, 0.27)]}\n",
+ "        # input_data:[1.1, 2.2]\n",
+ "        probabilities = {}\n",
+ "        for label, value in self.model.items():\n",
+ "            probabilities[label] = 1\n",
+ "            for i in range(len(value)):\n",
+ "                mean, stdev = value[i]\n",
+ "                probabilities[label] *= self.gaussian_probability(\n",
+ "                    input_data[i], mean, stdev)\n",
+ "        return probabilities\n",
+ "\n",
+ "    # Predict the class with the largest score\n",
+ "    def predict(self, X_test):\n",
+ "        # {0.0: 2.9680340789325763e-27, 1.0: 3.5749783019849535e-26}\n",
+ "        label = sorted(\n",
+ "            self.calculate_probabilities(X_test).items(),\n",
+ "            key=lambda x: x[-1])[-1][0]\n",
+ "        return label\n",
+ "\n",
+ "    def score(self, X_test, y_test):\n",
+ "        right = 0\n",
+ "        for X, y in zip(X_test, y_test):\n",
+ "            label = self.predict(X)\n",
+ "            if label == y:\n",
+ "                right += 1\n",
+ "\n",
+ "        return right / float(len(X_test))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = NaiveBayes()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{0.0: [(5.042424242424242, 0.3709368181357421), (3.4818181818181815, 0.3545843024560018), (1.4484848484848483, 0.17077579155885297), (0.2424242424242425, 0.08886592908251627)], 1.0: [(5.932432432432432, 0.4644404096149176), (2.7540540540540546, 0.31928517677021306), (4.232432432432433, 0.4702236696389048), (1.3135135135135134, 0.19194898275205088)]}\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'gaussianNB train done!'"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(model.predict([4.4, 3.2, 1.3, 0.2]))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1.0"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### scikit-learn实例"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.naive_bayes import GaussianNB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GaussianNB()"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf = GaussianNB()\n",
+ "clf.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1.0"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0.])"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf.predict([[4.4, 3.2, 1.3, 0.2]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.naive_bayes import BernoulliNB, MultinomialNB # 伯努利模型和多项式模型"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
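+ "## Chapter 4 Naive Bayes: Exercises\n",
+ "\n",
+ "### Exercise 4.1\n",
+ " Use maximum likelihood estimation to derive the probability estimation formulas (4.8) and (4.9) of the naive Bayes method."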
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Solution:**  \n",
+ "**Step 1:** Prove formula (4.8): $\\displaystyle P(Y=c_k) = \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k)}{N}$  \n",
+ "Naive Bayes assumes that $Y$ is a random variable defined on the output space $\\mathcal{Y}$, so we can denote the probability $P(Y=c_k)$ by $p$.  \n",
+ "Let $\\displaystyle m=\\sum_{i=1}^NI(y_i=c_k)$. The likelihood function is $$L(p)=f_D(y_1,y_2,\\cdots,y_N|p)=\\binom{N}{m}p^m(1-p)^{(N-m)}$$To find the extremum, differentiate with respect to $p$ and set the derivative to zero:$$\\begin{aligned}\n",
+ "0 &= \\binom{N}{m}\\left[mp^{(m-1)}(1-p)^{(N-m)}-(N-m)p^m(1-p)^{(N-m-1)}\\right] \\\\\n",
+ "& = \\binom{N}{m}\\left[p^{(m-1)}(1-p)^{(N-m-1)}(m-Np)\\right]\n",
+ "\\end{aligned}$$The solutions are $\\displaystyle p=0,p=1,p=\\frac{m}{N}$.  \n",
+ "For $0<m<N$ the likelihood is zero at $p=0$ and $p=1$, so the maximum is attained at $\\displaystyle P(Y=c_k)=p=\\frac{m}{N}=\\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k)}{N}$, which proves formula (4.8).\n",
+ "\n",
+ "----\n",
+ "\n",
+ "**Step 2:** Prove formula (4.9): $\\displaystyle P(X^{(j)}=a_{jl}|Y=c_k) = \\frac{\\displaystyle \\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k)}{\\displaystyle \\sum_{i=1}^N I(y_i=c_k)}$  \n",
+ "Let $P(X^{(j)}=a_{jl}|Y=c_k)=p$, and let $\\displaystyle m=\\sum_{i=1}^N I(y_i=c_k), q=\\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k)$. The likelihood function is $$L(p)=\\binom{m}{q}p^q(1-p)^{m-q}$$To find the extremum, differentiate with respect to $p$ and set the derivative to zero:$$\\begin{aligned}\n",
+ "0 &= \\binom{m}{q}\\left[qp^{(q-1)}(1-p)^{(m-q)}-(m-q)p^q(1-p)^{(m-q-1)}\\right] \\\\\n",
+ "& = \\binom{m}{q}\\left[p^{(q-1)}(1-p)^{(m-q-1)}(q-mp)\\right]\n",
+ "\\end{aligned}$$The solutions are $\\displaystyle p=0,p=1,p=\\frac{q}{m}$.  \n",
+ "For $0<q<m$ the likelihood is zero at $p=0$ and $p=1$, so the maximum is attained at $\\displaystyle P(X^{(j)}=a_{jl}|Y=c_k)=p=\\frac{q}{m}=\\frac{\\displaystyle \\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k)}{\\displaystyle \\sum_{i=1}^N I(y_i=c_k)}$, which proves formula (4.9)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercise 4.2\n",
+ " Use Bayesian estimation to derive the probability estimation formulas (4.10) and (4.11) of the naive Bayes method."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Solution:**  \n",
+ "**Step 1:** Prove formula (4.11): $\\displaystyle P(Y=c_k) = \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + \\lambda}{N+K \\lambda}$  \n",
+ "Introduce a prior. With no other information, the prior can be assumed to be uniform (every class is equally probable).  \n",
+ "This gives $\\displaystyle p=\\frac{1}{K} \\Leftrightarrow pK-1=0\\quad(1)$  \n",
+ "By Exercise 4.1, the maximum likelihood estimate of the prior probability satisfies $\\displaystyle pN - \\sum_{i=1}^N I(y_i=c_k) = 0\\quad(2)$  \n",
+ "There is a parameter $\\lambda$ such that $(1) \\cdot \\lambda + (2) = 0$  \n",
+ "so that $$\\lambda(pK-1) + pN - \\sum_{i=1}^N I(y_i=c_k) = 0$$which gives $\\displaystyle P(Y=c_k) = \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + \\lambda}{N+K \\lambda}$, proving formula (4.11).  \n",
+ "\n",
+ "----\n",
+ "\n",
+ "**Step 2:** Prove formula (4.10): $\\displaystyle P_{\\lambda}(X^{(j)}=a_{jl} | Y = c_k) = \\frac{\\displaystyle \\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k) + \\lambda}{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + S_j \\lambda}$  \n",
+ "By the same argument as in step 1, applied to the $K S_j$ possible value pairs $(c_k, a_{jl})$, we obtain$$\n",
+ "P(Y=c_k, x^{(j)}=a_{j l})=\\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k, x_i^{(j)}=a_{jl})+\\lambda}{N+K S_j \\lambda}$$ \n",
+ "$$\\begin{aligned} \n",
+ "P(x^{(j)}=a_{jl} | Y=c_k)\n",
+ "&= \\frac{P(Y=c_k, x^{(j)}=a_{j l})}{P(Y=c_k)} \\\\\n",
+ "&= \\frac{\\displaystyle \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k, x_i^{(j)}=a_{jl})+\\lambda}{N+K S_j \\lambda}}{\\displaystyle \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + \\lambda'}{N+K \\lambda'}} \\\\\n",
+ "&\\quad (\\text{the smoothing parameter } \\lambda' \\text{ of the prior can take any value, so take } \\lambda' = S_j \\lambda) \\\\\n",
+ "&= \\frac{\\displaystyle \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k, x_i^{(j)}=a_{jl})+\\lambda}{N+K S_j \\lambda}}{\\displaystyle \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + S_j \\lambda}{N+K S_j \\lambda}} \\\\ \n",
+ "&= \\frac{\\displaystyle \\sum_{i=1}^N I(y_i=c_k, x_i^{(j)}=a_{jl})+\\lambda}{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + S_j \\lambda} \\\\\n",
+ "&= \\frac{\\displaystyle \\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k) + \\lambda}{\\displaystyle \\sum_{i=1}^N I(y_i=c_k) + S_j \\lambda}\n",
+ "\\end{aligned} $$ \n",
+ "This proves formula (4.10)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "----\n",
+ "参考代码:https://github.com/wzyonggege/statistical-learning-method\n",
+ "\n",
+ "本文代码更新地址:https://github.com/fengdu78/lihang-code\n",
+ "\n",
+ "习题解答:https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+ "\n",
+ "中文注释制作:机器学习初学者公众号:ID:ai-start-com\n",
+ "\n",
+ "配置环境:python 3.5+\n",
+ "\n",
+ "代码全部测试通过。\n",
+ "![gongzhong](../gongzhong.jpg)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git "a/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/4.NaiveBayes.ipynb" "b/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/4.NaiveBayes.ipynb"
index a64ec2b..876bad9 100644
--- "a/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/4.NaiveBayes.ipynb"
+++ "b/\347\254\25404\347\253\240 \346\234\264\347\264\240\350\264\235\345\217\266\346\226\257/4.NaiveBayes.ipynb"
@@ -45,7 +45,7 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -63,7 +63,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
@@ -76,15 +76,122 @@
" 'sepal length', 'sepal width', 'petal length', 'petal width', 'label'\n",
" ]\n",
" data = np.array(df.iloc[:100, :])\n",
- " # print(data)\n",
+ " print(data)\n",
" return data[:, :-1], data[:, -1]"
]
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 6,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[5.1 3.5 1.4 0.2 0. ]\n",
+ " [4.9 3. 1.4 0.2 0. ]\n",
+ " [4.7 3.2 1.3 0.2 0. ]\n",
+ " [4.6 3.1 1.5 0.2 0. ]\n",
+ " [5. 3.6 1.4 0.2 0. ]\n",
+ " [5.4 3.9 1.7 0.4 0. ]\n",
+ " [4.6 3.4 1.4 0.3 0. ]\n",
+ " [5. 3.4 1.5 0.2 0. ]\n",
+ " [4.4 2.9 1.4 0.2 0. ]\n",
+ " [4.9 3.1 1.5 0.1 0. ]\n",
+ " [5.4 3.7 1.5 0.2 0. ]\n",
+ " [4.8 3.4 1.6 0.2 0. ]\n",
+ " [4.8 3. 1.4 0.1 0. ]\n",
+ " [4.3 3. 1.1 0.1 0. ]\n",
+ " [5.8 4. 1.2 0.2 0. ]\n",
+ " [5.7 4.4 1.5 0.4 0. ]\n",
+ " [5.4 3.9 1.3 0.4 0. ]\n",
+ " [5.1 3.5 1.4 0.3 0. ]\n",
+ " [5.7 3.8 1.7 0.3 0. ]\n",
+ " [5.1 3.8 1.5 0.3 0. ]\n",
+ " [5.4 3.4 1.7 0.2 0. ]\n",
+ " [5.1 3.7 1.5 0.4 0. ]\n",
+ " [4.6 3.6 1. 0.2 0. ]\n",
+ " [5.1 3.3 1.7 0.5 0. ]\n",
+ " [4.8 3.4 1.9 0.2 0. ]\n",
+ " [5. 3. 1.6 0.2 0. ]\n",
+ " [5. 3.4 1.6 0.4 0. ]\n",
+ " [5.2 3.5 1.5 0.2 0. ]\n",
+ " [5.2 3.4 1.4 0.2 0. ]\n",
+ " [4.7 3.2 1.6 0.2 0. ]\n",
+ " [4.8 3.1 1.6 0.2 0. ]\n",
+ " [5.4 3.4 1.5 0.4 0. ]\n",
+ " [5.2 4.1 1.5 0.1 0. ]\n",
+ " [5.5 4.2 1.4 0.2 0. ]\n",
+ " [4.9 3.1 1.5 0.2 0. ]\n",
+ " [5. 3.2 1.2 0.2 0. ]\n",
+ " [5.5 3.5 1.3 0.2 0. ]\n",
+ " [4.9 3.6 1.4 0.1 0. ]\n",
+ " [4.4 3. 1.3 0.2 0. ]\n",
+ " [5.1 3.4 1.5 0.2 0. ]\n",
+ " [5. 3.5 1.3 0.3 0. ]\n",
+ " [4.5 2.3 1.3 0.3 0. ]\n",
+ " [4.4 3.2 1.3 0.2 0. ]\n",
+ " [5. 3.5 1.6 0.6 0. ]\n",
+ " [5.1 3.8 1.9 0.4 0. ]\n",
+ " [4.8 3. 1.4 0.3 0. ]\n",
+ " [5.1 3.8 1.6 0.2 0. ]\n",
+ " [4.6 3.2 1.4 0.2 0. ]\n",
+ " [5.3 3.7 1.5 0.2 0. ]\n",
+ " [5. 3.3 1.4 0.2 0. ]\n",
+ " [7. 3.2 4.7 1.4 1. ]\n",
+ " [6.4 3.2 4.5 1.5 1. ]\n",
+ " [6.9 3.1 4.9 1.5 1. ]\n",
+ " [5.5 2.3 4. 1.3 1. ]\n",
+ " [6.5 2.8 4.6 1.5 1. ]\n",
+ " [5.7 2.8 4.5 1.3 1. ]\n",
+ " [6.3 3.3 4.7 1.6 1. ]\n",
+ " [4.9 2.4 3.3 1. 1. ]\n",
+ " [6.6 2.9 4.6 1.3 1. ]\n",
+ " [5.2 2.7 3.9 1.4 1. ]\n",
+ " [5. 2. 3.5 1. 1. ]\n",
+ " [5.9 3. 4.2 1.5 1. ]\n",
+ " [6. 2.2 4. 1. 1. ]\n",
+ " [6.1 2.9 4.7 1.4 1. ]\n",
+ " [5.6 2.9 3.6 1.3 1. ]\n",
+ " [6.7 3.1 4.4 1.4 1. ]\n",
+ " [5.6 3. 4.5 1.5 1. ]\n",
+ " [5.8 2.7 4.1 1. 1. ]\n",
+ " [6.2 2.2 4.5 1.5 1. ]\n",
+ " [5.6 2.5 3.9 1.1 1. ]\n",
+ " [5.9 3.2 4.8 1.8 1. ]\n",
+ " [6.1 2.8 4. 1.3 1. ]\n",
+ " [6.3 2.5 4.9 1.5 1. ]\n",
+ " [6.1 2.8 4.7 1.2 1. ]\n",
+ " [6.4 2.9 4.3 1.3 1. ]\n",
+ " [6.6 3. 4.4 1.4 1. ]\n",
+ " [6.8 2.8 4.8 1.4 1. ]\n",
+ " [6.7 3. 5. 1.7 1. ]\n",
+ " [6. 2.9 4.5 1.5 1. ]\n",
+ " [5.7 2.6 3.5 1. 1. ]\n",
+ " [5.5 2.4 3.8 1.1 1. ]\n",
+ " [5.5 2.4 3.7 1. 1. ]\n",
+ " [5.8 2.7 3.9 1.2 1. ]\n",
+ " [6. 2.7 5.1 1.6 1. ]\n",
+ " [5.4 3. 4.5 1.5 1. ]\n",
+ " [6. 3.4 4.5 1.6 1. ]\n",
+ " [6.7 3.1 4.7 1.5 1. ]\n",
+ " [6.3 2.3 4.4 1.3 1. ]\n",
+ " [5.6 3. 4.1 1.3 1. ]\n",
+ " [5.5 2.5 4. 1.3 1. ]\n",
+ " [5.5 2.6 4.4 1.2 1. ]\n",
+ " [6.1 3. 4.6 1.4 1. ]\n",
+ " [5.8 2.6 4. 1.2 1. ]\n",
+ " [5. 2.3 3.3 1. 1. ]\n",
+ " [5.6 2.7 4.2 1.3 1. ]\n",
+ " [5.7 3. 4.2 1.2 1. ]\n",
+ " [5.7 2.9 4.2 1.3 1. ]\n",
+ " [6.2 2.9 4.3 1.3 1. ]\n",
+ " [5.1 2.5 3. 1.1 1. ]\n",
+ " [5.7 2.8 4.1 1.3 1. ]]\n"
+ ]
+ }
+ ],
"source": [
"X, y = create_data()\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
@@ -92,16 +199,16 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "(array([5.7, 2.6, 3.5, 1. ]), 1.0)"
+ "(array([5. , 3.4, 1.5, 0.2]), 0.0)"
]
},
- "execution_count": 4,
+ "execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -115,22 +222,26 @@
"metadata": {},
"source": [
"参考:https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/\n",
+ "http://www.zhangzhenhu.com/probability_model/%E8%B4%9D%E5%8F%B6%E6%96%AF%E5%88%86%E7%B1%BB%E5%99%A8.html#id5\n",
"\n",
"## GaussianNB 高斯朴素贝叶斯\n",
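+ "In the basic naive Bayes model, all feature variables are assumed to be discrete, but real applications are varied and often cannot satisfy such a strong assumption. What if a feature variable $X$ is continuous? One option is to discretize the continuous values and then apply the naive Bayes model as before, but this is not very elegant. In fact, a Bayes classifier places no constraint on which probability distribution the variables follow: in principle any distribution is supported, for the label variable $Y$ as well as for the feature variable $X$. A classification model in which the feature variable $X$ is Gaussian within each class while the label variable $Y$ remains a discrete (categorical) variable is called Gaussian discriminant analysis (GDA); when the features are additionally assumed to be conditionally independent given the class, it is Gaussian naive Bayes (GaussianNB).\n",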
"\n",
- "特征的可能性被假设为高斯\n",
- "\n",
- "概率密度函数:\n",
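+ "Each feature $x_i$ of the feature vector $x$ is modeled by a Gaussian probability density given the class $y_k$ (the features are fully conditionally independent), with the mean and variance estimated separately for each feature and each class:\n",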
"$$P(x_i | y_k)=\\frac{1}{\\sqrt{2\\pi\\sigma^2_{yk}}}exp(-\\frac{(x_i-\\mu_{yk})^2}{2\\sigma^2_{yk}})$$\n",
"\n",
"数学期望(mean):$\\mu$\n",
"\n",
- "方差:$\\sigma^2=\\frac{\\sum(X-\\mu)^2}{N}$"
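+ "Variance: $\\sigma^2=\\frac{\\sum(X-\\mu)^2}{N}$\n",
+ "\n",
+ "As in the parameter estimation for the discrete naive Bayes model, the observed samples are partitioned by the value of $Y$ (two classes in this example), and each subset is used to estimate the parameters of the Gaussian distributions for its class.\n",
+ ">The input $x$ is assigned to the class $y$ with the largest posterior probability.\n",
+ "$$y=\\arg \\max _{y_k} P\\left(Y=y_k\\right) \\prod_{i=1}^{n} \\frac{1}{\\sqrt{2\\pi\\sigma^2_{yk}}}exp(-\\frac{(x_i-\\mu_{yk})^2}{2\\sigma^2_{yk}})$$"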
]
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
@@ -148,7 +259,7 @@
" avg = self.mean(X)\n",
" return math.sqrt(sum([pow(x - avg, 2) for x in X]) / float(len(X)))\n",
"\n",
- " # 概率密度函数\n",
+ "    # Gaussian probability density function\n",
" def gaussian_probability(self, x, mean, stdev):\n",
" exponent = math.exp(-(math.pow(x - mean, 2) /\n",
" (2 * math.pow(stdev, 2))))\n",
@@ -161,14 +272,18 @@
"\n",
" # 分类别求出数学期望和标准差\n",
" def fit(self, X, y):\n",
+ "        # All distinct labels\n",
" labels = list(set(y))\n",
+ "        # Group the samples by label; the (mean, stdev) pairs are computed from these groups\n",
" data = {label: [] for label in labels}\n",
" for f, label in zip(X, y):\n",
" data[label].append(f)\n",
+ "        # value holds the n-dimensional samples of one class; summarize() yields n (mean, stdev) pairs\n",
" self.model = {\n",
" label: self.summarize(value)\n",
" for label, value in data.items()\n",
" }\n",
+ " print(self.model)\n",
" return 'gaussianNB train done!'\n",
"\n",
" # 计算概率\n",
@@ -204,7 +319,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
@@ -213,16 +328,23 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 20,
"metadata": {},
"outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{0.0: [(5.042424242424242, 0.3709368181357421), (3.4818181818181815, 0.3545843024560018), (1.4484848484848483, 0.17077579155885297), (0.2424242424242425, 0.08886592908251627)], 1.0: [(5.932432432432432, 0.4644404096149176), (2.7540540540540546, 0.31928517677021306), (4.232432432432433, 0.4702236696389048), (1.3135135135135134, 0.19194898275205088)]}\n"
+ ]
+ },
{
"data": {
"text/plain": [
"'gaussianNB train done!'"
]
},
- "execution_count": 7,
+ "execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -233,7 +355,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 21,
"metadata": {},
"outputs": [
{
@@ -250,7 +372,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 22,
"metadata": {},
"outputs": [
{
@@ -259,7 +381,7 @@
"1.0"
]
},
- "execution_count": 9,
+ "execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
@@ -279,7 +401,7 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
@@ -288,7 +410,7 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 24,
"metadata": {},
"outputs": [
{
@@ -297,7 +419,7 @@
"GaussianNB()"
]
},
- "execution_count": 11,
+ "execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -309,7 +431,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 25,
"metadata": {},
"outputs": [
{
@@ -318,7 +440,7 @@
"1.0"
]
},
- "execution_count": 12,
+ "execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
@@ -329,7 +451,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 26,
"metadata": {},
"outputs": [
{
@@ -338,7 +460,7 @@
"array([0.])"
]
},
- "execution_count": 13,
+ "execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
@@ -394,7 +516,7 @@
"metadata": {},
"source": [
"### 习题4.2\n",
- " 用贝叶斯估计法推出朴素贝叶斯法中的慨率估计公式(4.10)及公式(4.11)"
+ " Use Bayesian estimation to derive the probability estimation formulas (4.10) and (4.11) of the naive Bayes method."
]
},
{
@@ -464,7 +586,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.3"
}
},
"nbformat": 4,
diff --git "a/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/.ipynb_checkpoints/5.DecisonTree-checkpoint.ipynb" "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/.ipynb_checkpoints/5.DecisonTree-checkpoint.ipynb"
new file mode 100644
index 0000000..4f862a4
--- /dev/null
+++ "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/.ipynb_checkpoints/5.DecisonTree-checkpoint.ipynb"
@@ -0,0 +1,1370 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 第5章 决策树"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
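+ "1. A classification decision tree is a tree structure that classifies instances based on their features. A decision tree can be converted into a set of **if-then** rules, or viewed as a conditional probability distribution of the classes defined on a partition of the feature space.\n",
+ "\n",
+ "2. Decision tree learning aims to build a tree that fits the training data well and has low complexity. Because selecting the optimal tree directly from all possible trees is an NP-complete problem, in practice heuristic methods are used to learn a suboptimal tree.\n",
+ "\n",
+ "Decision tree learning algorithms consist of three parts: feature selection, tree generation, and tree pruning. Commonly used algorithms are ID3,\n",
+ "C4.5, and CART.\n",
+ "\n",
+ "3. The purpose of feature selection is to pick features that can classify the training data. The key to feature selection is its criterion. Commonly used criteria are:\n",
+ "\n",
+ "(1) Information gain of feature $A$ for the sample set $D$ (ID3)\n",
+ "\n",
+ "\n",
+ "$$g(D, A)=H(D)-H(D|A)$$\n",
+ "\n",
+ "$$H(D)=-\\sum_{k=1}^{K} \\frac{\\left|C_{k}\\right|}{|D|} \\log _{2} \\frac{\\left|C_{k}\\right|}{|D|}$$\n",
+ "\n",
+ "$$H(D | A)=\\sum_{i=1}^{n} \\frac{\\left|D_{i}\\right|}{|D|} H\\left(D_{i}\\right)$$\n",
+ "\n",
+ "Here $H(D)$ is the entropy of the data set $D$, $H(D_i)$ is the entropy of the data set $D_i$, and $H(D|A)$ is the conditional entropy of $D$ given feature $A$. $D_i$ is the subset of samples in $D$ for which feature $A$ takes its $i$-th value, and $C_k$ is the subset of samples in $D$ that belong to class $k$. $n$ is the number of values that feature $A$ can take, and $K$ is the number of classes.\n",
+ "\n",
+ "(2) Information gain ratio of feature $A$ for the sample set $D$ (C4.5)\n",
+ "\n",
+ "\n",
+ "$$g_{R}(D, A)=\\frac{g(D, A)}{H_A(D)}$$\n",
+ "\n",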
+ "$$H_A(D)=\\sum_{i=1}^{n} \\frac{\\left|D_{i}\\right|}{|D|}\\log _{2} \\frac{\\left|D_{i}\\right|}{|D|} $$\n",
+ "\n",
+ "其中,$g(D,A)$是信息增益,$H_A(D)$是 D 关于特征 A 的值的熵。\n",
+ "\n",
+ "(3)样本集合$D$的基尼指数(CART)\n",
+ "\n",
+ "$$\\operatorname{Gini}(D)=1-\\sum_{k=1}^{K}\\left(\\frac{\\left|C_{k}\\right|}{|D|}\\right)^{2}$$\n",
+ "\n",
+ "特征$A$条件下集合$D$的基尼指数:\n",
+ "\n",
+ " $$\\operatorname{Gini}(D, A)=\\frac{\\left|D_{1}\\right|}{|D|} \\operatorname{Gini}\\left(D_{1}\\right)+\\frac{\\left|D_{2}\\right|}{|D|} \\operatorname{Gini}\\left(D_{2}\\right)$$\n",
+ " \n",
+ "4.决策树的生成。通常使用信息增益最大、信息增益比最大或基尼指数最小作为特征选择的准则。决策树的生成往往通过计算信息增益或其他指标,从根结点开始,递归地产生决策树。这相当于用信息增益或其他准则不断地选取局部最优的特征,或将训练集分割为能够基本正确分类的子集。\n",
+ "\n",
+ "5.决策树的剪枝。由于生成的决策树存在过拟合问题,需要对它进行剪枝,以简化学到的决策树。决策树的剪枝,往往从已生成的树上剪掉一些叶结点或叶结点以上的子树,并将其父结点或根结点作为新的叶结点,从而简化生成的决策树。\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from collections import Counter\n",
+ "import math\n",
+ "from math import log\n",
+ "import pprint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Textbook problem 5.1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Textbook problem 5.1\n",
+ "def create_data():\n",
+ "    datasets = [['青年', '否', '否', '一般', '否'],\n",
+ "                ['青年', '否', '否', '好', '否'],\n",
+ "                ['青年', '是', '否', '好', '是'],\n",
+ "                ['青年', '是', '是', '一般', '是'],\n",
+ "                ['青年', '否', '否', '一般', '否'],\n",
+ "                ['中年', '否', '否', '一般', '否'],\n",
+ "                ['中年', '否', '否', '好', '否'],\n",
+ "                ['中年', '是', '是', '好', '是'],\n",
+ "                ['中年', '否', '是', '非常好', '是'],\n",
+ "                ['中年', '否', '是', '非常好', '是'],\n",
+ "                ['老年', '否', '是', '非常好', '是'],\n",
+ "                ['老年', '否', '是', '好', '是'],\n",
+ "                ['老年', '是', '否', '好', '是'],\n",
+ "                ['老年', '是', '否', '非常好', '是'],\n",
+ "                ['老年', '否', '否', '一般', '否'],\n",
+ "                ]\n",
+ "    labels = [u'年龄', u'有工作', u'有自己的房子', u'信贷情况', u'类别']\n",
+ "    # return the dataset and the name of each column\n",
+ "    return datasets, labels"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datasets, labels = create_data()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_data = pd.DataFrame(datasets, columns=labels)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 年龄 有工作 有自己的房子 信贷情况 类别\n",
+ "0 青年 否 否 一般 否\n",
+ "1 青年 否 否 好 否\n",
+ "2 青年 是 否 好 是\n",
+ "3 青年 是 是 一般 是\n",
+ "4 青年 否 否 一般 否\n",
+ "5 中年 否 否 一般 否\n",
+ "6 中年 否 否 好 否\n",
+ "7 中年 是 是 好 是\n",
+ "8 中年 否 是 非常好 是\n",
+ "9 中年 否 是 非常好 是\n",
+ "10 老年 否 是 非常好 是\n",
+ "11 老年 否 是 好 是\n",
+ "12 老年 是 否 好 是\n",
+ "13 老年 是 否 非常好 是\n",
+ "14 老年 否 否 一般 否"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# empirical entropy H(D); the class label is the last column of each row\n",
+ "def calc_ent(datasets):\n",
+ "    data_length = len(datasets)\n",
+ "    label_count = {}\n",
+ "    for i in range(data_length):\n",
+ "        label = datasets[i][-1]\n",
+ "        if label not in label_count:\n",
+ "            label_count[label] = 0\n",
+ "        label_count[label] += 1\n",
+ "    ent = -sum([(p / data_length) * log(p / data_length, 2)\n",
+ "                for p in label_count.values()])\n",
+ "    return ent\n",
+ "\n",
+ "\n",
+ "# empirical conditional entropy H(D|A) for the feature in column `axis`\n",
+ "def cond_ent(datasets, axis=0):\n",
+ "    data_length = len(datasets)\n",
+ "    feature_sets = {}\n",
+ "    for i in range(data_length):\n",
+ "        feature = datasets[i][axis]\n",
+ "        if feature not in feature_sets:\n",
+ "            feature_sets[feature] = []\n",
+ "        feature_sets[feature].append(datasets[i])\n",
+ "    cond_ent = sum(\n",
+ "        [(len(p) / data_length) * calc_ent(p) for p in feature_sets.values()])\n",
+ "    return cond_ent\n",
+ "\n",
+ "\n",
+ "# information gain g(D, A) = H(D) - H(D|A)\n",
+ "def info_gain(ent, cond_ent):\n",
+ "    return ent - cond_ent\n",
+ "\n",
+ "\n",
+ "def info_gain_train(datasets):\n",
+ "    count = len(datasets[0]) - 1\n",
+ "    ent = calc_ent(datasets)\n",
+ "    best_feature = []\n",
+ "    for c in range(count):\n",
+ "        c_info_gain = info_gain(ent, cond_ent(datasets, axis=c))\n",
+ "        best_feature.append((c, c_info_gain))\n",
+ "        print('特征({}) - info_gain - {:.3f}'.format(labels[c], c_info_gain))\n",
+ "    # pick the feature with the largest information gain\n",
+ "    best_ = max(best_feature, key=lambda x: x[-1])\n",
+ "    return '特征({})的信息增益最大,选择为根节点特征'.format(labels[best_[0]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "特征(年龄) - info_gain - 0.083\n",
+ "特征(有工作) - info_gain - 0.324\n",
+ "特征(有自己的房子) - info_gain - 0.420\n",
+ "特征(信贷情况) - info_gain - 0.363\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'特征(有自己的房子)的信息增益最大,选择为根节点特征'"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "info_gain_train(np.array(datasets))"
+ ]
+ },
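+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The printed gains can be checked by hand. The following is a worked sketch for the feature 有自己的房子 only, assuming the class counts read off the table above: 9 是 and 6 否 overall, all 6 house owners are 是, and the 9 non-owners split into 3 是 and 6 否. Values are rounded to three decimals.\n",
+ "\n",
+ "$$H(D)=-\\frac{9}{15}\\log_2\\frac{9}{15}-\\frac{6}{15}\\log_2\\frac{6}{15}\\approx 0.971$$\n",
+ "\n",
+ "$$H(D|A)=\\frac{6}{15}\\cdot 0+\\frac{9}{15}\\left(-\\frac{3}{9}\\log_2\\frac{3}{9}-\\frac{6}{9}\\log_2\\frac{6}{9}\\right)\\approx 0.551$$\n",
+ "\n",
+ "$$g(D,A)=H(D)-H(D|A)\\approx 0.971-0.551=0.420$$\n",
+ "\n",
+ "This agrees with the value 0.420 printed for this feature."
+ ]
+ },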
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "---\n",
+ "\n",
+ "Generating a decision tree with the ID3 algorithm (Example 5.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Tree node. Each internal node has one child per value of its split feature,\n",
+ "# so the tree is multiway rather than binary.\n",
+ "class Node:\n",
+ "    def __init__(self, root=True, label=None, feature_name=None, feature=None):\n",
+ "        # root=True marks a leaf node that carries a class label\n",
+ "        self.root = root\n",
+ "        self.label = label\n",
+ "        self.feature_name = feature_name\n",
+ "        self.feature = feature\n",
+ "        self.tree = {}\n",
+ "        self.result = {\n",
+ "            'label:': self.label,\n",
+ "            'feature': self.feature,\n",
+ "            'tree': self.tree\n",
+ "        }\n",
+ "\n",
+ "    def __repr__(self):\n",
+ "        return '{}'.format(self.result)\n",
+ "\n",
+ "    def add_node(self, val, node):\n",
+ "        self.tree[val] = node\n",
+ "\n",
+ "    def predict(self, features):\n",
+ "        if self.root is True:\n",
+ "            return self.label\n",
+ "        # NOTE: self.feature is a column index into the (reduced) DataFrame this\n",
+ "        # node was trained on, so indexing the full feature list with it is only\n",
+ "        # safe when the two positions coincide, as they do in this example.\n",
+ "        return self.tree[features[self.feature]].predict(features)\n",
+ "\n",
+ "\n",
+ "class DTree:\n",
+ "    def __init__(self, epsilon=0.1):\n",
+ "        self.epsilon = epsilon\n",
+ "        self._tree = {}\n",
+ "\n",
+ "    # empirical entropy H(D)\n",
+ "    @staticmethod\n",
+ "    def calc_ent(datasets):\n",
+ "        data_length = len(datasets)\n",
+ "        label_count = {}\n",
+ "        for i in range(data_length):\n",
+ "            label = datasets[i][-1]\n",
+ "            if label not in label_count:\n",
+ "                label_count[label] = 0\n",
+ "            label_count[label] += 1\n",
+ "        ent = -sum([(p / data_length) * log(p / data_length, 2)\n",
+ "                    for p in label_count.values()])\n",
+ "        return ent\n",
+ "\n",
+ "    # empirical conditional entropy H(D|A)\n",
+ "    def cond_ent(self, datasets, axis=0):\n",
+ "        data_length = len(datasets)\n",
+ "        feature_sets = {}\n",
+ "        for i in range(data_length):\n",
+ "            feature = datasets[i][axis]\n",
+ "            if feature not in feature_sets:\n",
+ "                feature_sets[feature] = []\n",
+ "            feature_sets[feature].append(datasets[i])\n",
+ "        cond_ent = sum([(len(p) / data_length) * self.calc_ent(p)\n",
+ "                        for p in feature_sets.values()])\n",
+ "        return cond_ent\n",
+ "\n",
+ "    # information gain\n",
+ "    @staticmethod\n",
+ "    def info_gain(ent, cond_ent):\n",
+ "        return ent - cond_ent\n",
+ "\n",
+ "    def info_gain_train(self, datasets):\n",
+ "        count = len(datasets[0]) - 1\n",
+ "        ent = self.calc_ent(datasets)\n",
+ "        best_feature = []\n",
+ "        for c in range(count):\n",
+ "            c_info_gain = self.info_gain(ent, self.cond_ent(datasets, axis=c))\n",
+ "            best_feature.append((c, c_info_gain))\n",
+ "        # pick the (index, gain) pair with the largest gain\n",
+ "        best_ = max(best_feature, key=lambda x: x[-1])\n",
+ "        return best_\n",
+ "\n",
+ "    def train(self, train_data):\n",
+ "        \"\"\"\n",
+ "        input: dataset D (a DataFrame whose last column is the class); the\n",
+ "               feature set A is its other columns, the threshold is self.epsilon\n",
+ "        output: decision tree T\n",
+ "        \"\"\"\n",
+ "        y_train = train_data.iloc[:, -1]\n",
+ "        features = train_data.columns[:-1]\n",
+ "        # 1. If all instances in D belong to the same class Ck, T is a\n",
+ "        #    single-node tree labelled Ck; return T\n",
+ "        if len(y_train.value_counts()) == 1:\n",
+ "            return Node(root=True, label=y_train.iloc[0])\n",
+ "\n",
+ "        # 2. If A is empty, T is a single-node tree labelled with the class\n",
+ "        #    that has the most instances in D; return T\n",
+ "        if len(features) == 0:\n",
+ "            return Node(\n",
+ "                root=True,\n",
+ "                label=y_train.value_counts().sort_values(\n",
+ "                    ascending=False).index[0])\n",
+ "\n",
+ "        # 3. Compute the information gains as in the 5.1 code above;\n",
+ "        #    Ag is the feature with the largest gain\n",
+ "        max_feature, max_info_gain = self.info_gain_train(np.array(train_data))\n",
+ "        max_feature_name = features[max_feature]\n",
+ "\n",
+ "        # 4. If the gain of Ag is below the threshold, T is a single-node tree\n",
+ "        #    labelled with the class that has the most instances in D; return T\n",
+ "        if max_info_gain < self.epsilon:\n",
+ "            return Node(\n",
+ "                root=True,\n",
+ "                label=y_train.value_counts().sort_values(\n",
+ "                    ascending=False).index[0])\n",
+ "\n",
+ "        # 5. Split D into subsets by the values of Ag\n",
+ "        node_tree = Node(\n",
+ "            root=False, feature_name=max_feature_name, feature=max_feature)\n",
+ "\n",
+ "        feature_list = train_data[max_feature_name].value_counts().index\n",
+ "        for f in feature_list:\n",
+ "            sub_train_df = train_data.loc[train_data[max_feature_name] ==\n",
+ "                                          f].drop([max_feature_name], axis=1)\n",
+ "\n",
+ "            # 6. Recursively build the subtree\n",
+ "            sub_tree = self.train(sub_train_df)\n",
+ "            node_tree.add_node(f, sub_tree)\n",
+ "\n",
+ "        # pprint.pprint(node_tree.tree)\n",
+ "        return node_tree\n",
+ "\n",
+ "    def fit(self, train_data):\n",
+ "        self._tree = self.train(train_data)\n",
+ "        return self._tree\n",
+ "\n",
+ "    def predict(self, X_test):\n",
+ "        return self._tree.predict(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datasets, labels = create_data()\n",
+ "data_df = pd.DataFrame(datasets, columns=labels)\n",
+ "dt = DTree()\n",
+ "tree = dt.fit(data_df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'label:': None, 'feature': 2, 'tree': {'否': {'label:': None, 'feature': 1, 'tree': {'否': {'label:': '否', 'feature': None, 'tree': {}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tree"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'是'"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "dt.predict(['老年', '否', '是', '一般'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### scikit-learn example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data\n",
+ "def create_data():\n",
+ "    iris = load_iris()\n",
+ "    df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
+ "    df['label'] = iris.target\n",
+ "    df.columns = [\n",
+ "        'sepal length', 'sepal width', 'petal length', 'petal width', 'label'\n",
+ "    ]\n",
+ "    # first 100 rows (two classes), sepal length and sepal width only\n",
+ "    data = np.array(df.iloc[:100, [0, 1, -1]])\n",
+ "    # print(data)\n",
+ "    return data[:, :2], data[:, -1]\n",
+ "\n",
+ "\n",
+ "X, y = create_data()\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.tree import export_graphviz\n",
+ "import graphviz"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier()"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf = DecisionTreeClassifier()  # the X features are continuous variables\n",
+ "clf.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9333333333333333"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "clf.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# export_graphviz produces DOT source (text), not a PDF;\n",
+ "# with out_file=None it returns that source as a string.\n",
+ "dot_graph = export_graphviz(clf, out_file=None)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "graphviz.Source(dot_graph)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
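+ "## Chapter 5 Decision Trees: Exercises\n",
+ "\n",
+ "### Exercise 5.1\n",
+ "Using the training dataset in Table 5.1, generate a decision tree with the information gain ratio (the C4.5 algorithm)."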
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
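+ "**Solution:**  \n",
+ "\n",
+ "Table 5.1: Loan application sample data (values: 青年/中年/老年 = young/middle-aged/elderly, 是/否 = yes/no, 一般/好/非常好 = fair/good/very good)  \n",
+ "\n",
+ "ID | 年龄 (age) | 有工作 (has a job) | 有自己的房子 (owns a house) | 信贷情况 (credit rating) | 类别 (class)\n",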
+ ":-: | :-: | :-: | :-: | :-: | :-: \n",
+ "1 | 青年 | 否 | 否 | 一般 | 否\n",
+ "2 | 青年 | 否 | 否 | 好 | 否\n",
+ "3 | 青年 | 是 | 否 | 好 | 是\n",
+ "4 | 青年 | 是 | 是 | 一般 | 是\n",
+ "5 | 青年 | 否 | 否 | 一般 | 否\n",
+ "6 | 中年 | 否 | 否 | 一般 | 否\n",
+ "7 | 中年 | 否 | 否 | 好 | 否\n",
+ "8 | 中年 | 是 | 是 | 好 | 是\n",
+ "9 | 中年 | 否 | 是 | 非常好 | 是\n",
+ "10 | 中年 | 否 | 是 | 非常好 | 是\n",
+ "11 | 老年 | 否 | 是 | 非常好 | 是\n",
+ "12 | 老年 | 否 | 是 | 好 | 是\n",
+ "13 | 老年 | 是 | 否 | 好 | 是\n",
+ "14 | 老年 | 是 | 否 | 非常好 | 是\n",
+ "15 | 老年 | 否 | 否 | 一般 | 否"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X_train: 0 1 2 3\n",
+ "0 6 2 2 0\n",
+ "1 6 2 2 3\n",
+ "2 6 4 2 3\n",
+ "3 6 4 4 0\n",
+ "4 6 2 2 0\n",
+ "5 1 2 2 0\n",
+ "6 1 2 2 3\n",
+ "7 1 4 4 3\n",
+ "8 1 2 4 7\n",
+ "9 1 2 4 7\n",
+ "10 5 2 4 7\n",
+ "11 5 2 4 3\n",
+ "12 5 4 2 3\n",
+ "13 5 4 2 7\n",
+ "14 5 2 2 0\n",
+ "y_train: 0\n",
+ "0 0\n",
+ "1 0\n",
+ "2 1\n",
+ "3 1\n",
+ "4 0\n",
+ "5 0\n",
+ "6 0\n",
+ "7 1\n",
+ "8 1\n",
+ "9 1\n",
+ "10 1\n",
+ "11 1\n",
+ "12 1\n",
+ "13 1\n",
+ "14 0\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 50,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn import preprocessing\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "from sklearn import tree\n",
+ "import graphviz\n",
+ "\n",
+ "features = [\"年龄\", \"有工作\", \"有自己的房子\", \"信贷情况\"]\n",
+ "X_train = pd.DataFrame([\n",
+ " [\"青年\", \"否\", \"否\", \"一般\"],\n",
+ " [\"青年\", \"否\", \"否\", \"好\"],\n",
+ " [\"青年\", \"是\", \"否\", \"好\"],\n",
+ " [\"青年\", \"是\", \"是\", \"一般\"],\n",
+ " [\"青年\", \"否\", \"否\", \"一般\"],\n",
+ " [\"中年\", \"否\", \"否\", \"一般\"],\n",
+ " [\"中年\", \"否\", \"否\", \"好\"],\n",
+ " [\"中年\", \"是\", \"是\", \"好\"],\n",
+ " [\"中年\", \"否\", \"是\", \"非常好\"],\n",
+ " [\"中年\", \"否\", \"是\", \"非常好\"],\n",
+ " [\"老年\", \"否\", \"是\", \"非常好\"],\n",
+ " [\"老年\", \"否\", \"是\", \"好\"],\n",
+ " [\"老年\", \"是\", \"否\", \"好\"],\n",
+ " [\"老年\", \"是\", \"否\", \"非常好\"],\n",
+ " [\"老年\", \"否\", \"否\", \"一般\"]\n",
+ "])\n",
+ "y_train = pd.DataFrame([\"否\", \"否\", \"是\", \"是\", \"否\", \n",
+ " \"否\", \"否\", \"是\", \"是\", \"是\", \n",
+ " \"是\", \"是\", \"是\", \"是\", \"否\"])\n",
+ "# preprocessing\n",
+ "le_x = preprocessing.LabelEncoder()  # encode the categorical X values as integer codes\n",
+ "le_x.fit(np.unique(X_train))\n",
+ "X_train = X_train.apply(le_x.transform)\n",
+ "print(\"X_train: \", X_train)\n",
+ "le_y = preprocessing.LabelEncoder()\n",
+ "le_y.fit(np.unique(y_train))\n",
+ "y_train = y_train.apply(le_y.transform)\n",
+ "print(\"y_train: \", y_train)\n",
+ "# build and fit the model with sklearn's DecisionTreeClassifier\n",
+ "model_tree = DecisionTreeClassifier()  # expects numeric feature values\n",
+ "model_tree.fit(X_train, y_train)\n",
+ "\n",
+ "# visualization\n",
+ "dot_data = tree.export_graphviz(model_tree, out_file=None,\n",
+ " feature_names=features,\n",
+ " class_names=[str(k) for k in np.unique(y_train)],\n",
+ " filled=True, rounded=True,\n",
+ " special_characters=True)\n",
+ "graph = graphviz.Source(dot_data)\n",
+ "graph"
+ ]
+ },
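+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The cell above fits scikit-learn's `DecisionTreeClassifier`, which is a CART-style learner (Gini impurity by default) rather than C4.5. As a hand-worked sketch of the gain-ratio criterion the exercise asks for, the ratios at the root can be computed from the information gains printed earlier (0.083, 0.324, 0.420, 0.363) and the split entropy $H_A(D)=-\\sum_{i}\\frac{|D_i|}{|D|}\\log_2\\frac{|D_i|}{|D|}$. The value counts are read off Table 5.1, and all numbers are rounded to three decimals.\n",
+ "\n",
+ "$$\\begin{aligned}\n",
+ "g_R(D,A_1)&\\approx\\frac{0.083}{1.585}\\approx 0.052 &&(\\text{年龄: } 5/5/5)\\\\\n",
+ "g_R(D,A_2)&\\approx\\frac{0.324}{0.918}\\approx 0.352 &&(\\text{有工作: } 5/10)\\\\\n",
+ "g_R(D,A_3)&\\approx\\frac{0.420}{0.971}\\approx 0.433 &&(\\text{有自己的房子: } 6/9)\\\\\n",
+ "g_R(D,A_4)&\\approx\\frac{0.363}{1.566}\\approx 0.232 &&(\\text{信贷情况: } 5/6/4)\n",
+ "\\end{aligned}$$\n",
+ "\n",
+ "The feature 有自己的房子 has the largest gain ratio, so it is the root. Its 是 branch is already pure. In the 否 branch (9 samples, 3 是 and 6 否), 有工作 separates the classes perfectly, so its gain equals its split entropy ($\\approx 0.918$) and its gain ratio is 1, the largest possible. Under these counts, the C4.5 tree therefore has the same structure as the ID3 tree built earlier."
+ ]
+ },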
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercise 5.2\n",
+ "Given the training data in Table 5.2, generate a binary regression tree using the squared-error loss criterion.  \n",
+ "Table 5.2: Training data  \n",
+ "\n",
+ "| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | \n",
+ "| - | - | - | - | - | - | - | - | - | - | - | \n",
+ "| $y_i$ | 4.50 | 4.75 | 4.91 | 5.34 | 5.80 | 7.05 | 7.90 | 8.23 | 8.70 | 9.00"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
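+ "**Solution:**  \n",
+ "Generating a decision tree means recursively building a binary tree: feature selection uses the squared-error minimization criterion for regression trees and the Gini index minimization criterion for classification trees.  \n",
+ "> Algorithm 5.5 (least-squares regression tree generation)  \n",
+ "Input: training dataset $D$  \n",
+ "Output: regression tree $f(x)$  \n",
+ "In the input space of the training dataset, recursively split each region into two subregions and determine the output value on each subregion, building a binary decision tree:  \n",
+ "(1) Choose the optimal splitting variable $j$ and split point $s$ by solving $$\\min_{j,s} \\left[ \\min_{c_1} \\sum_{x_i \\in R_1(j,s)} (y_i - c_1)^2 + \\min_{c_2} \\sum_{x_i \\in R_2(j,s)} (y_i - c_2)^2\\right]$$ Iterate over the variables $j$; for each fixed $j$ scan the split points $s$, and choose the pair $(j,s)$ that minimizes the expression above.  \n",
+ "(2) Use the chosen pair $(j,s)$ to split the region and determine the corresponding output values: $$R_1(j,s)=\\{x|x^{(j)}\\leqslant s\\}, R_2(j,s)=\\{x|x^{(j)} > s\\} \\\\ \n",
+ "\\hat{c_m} = \\frac{1}{N_m} \\sum_{x_i \\in R_m(j,s)} y_i, x \\in R_m, m=1,2 $$\n",
+ "(3) Repeat steps (1) and (2) on the two subregions until the stopping condition is met.  \n",
+ "(4) Partition the input space into $M$ regions $R_1,R_2,\\cdots,R_M$ and produce the decision tree: $$f(x)=\\sum_{m=1}^M \\hat{c_m} I(x \\in R_m)$$"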
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "class LeastSqRTree:\n",
+ "    def __init__(self, train_X, y, epsilon):\n",
+ "        # training features\n",
+ "        self.x = train_X\n",
+ "        # target values\n",
+ "        self.y = y\n",
+ "        # number of features\n",
+ "        self.feature_count = train_X.shape[1]\n",
+ "        # squared-error threshold\n",
+ "        self.epsilon = epsilon\n",
+ "        # regression tree\n",
+ "        self.tree = None\n",
+ "\n",
+ "    def _fit(self, x, y, feature_count, epsilon):\n",
+ "        # choose the best splitting variable j and split point s\n",
+ "        (j, s, minval, c1, c2) = self._divide(x, y, feature_count)\n",
+ "        # initialise the tree node\n",
+ "        tree = {\"feature\": j, \"value\": x[s, j], \"left\": None, \"right\": None}\n",
+ "        # a child becomes a leaf if the squared error is below epsilon\n",
+ "        # or its region holds at most one sample\n",
+ "        if minval < self.epsilon or len(y[np.where(x[:, j] <= x[s, j])]) <= 1:\n",
+ "            tree[\"left\"] = c1\n",
+ "        else:\n",
+ "            tree[\"left\"] = self._fit(x[np.where(x[:, j] <= x[s, j])],\n",
+ "                                     y[np.where(x[:, j] <= x[s, j])],\n",
+ "                                     self.feature_count, self.epsilon)\n",
+ "        if minval < self.epsilon or len(y[np.where(x[:, j] > x[s, j])]) <= 1:\n",
+ "            tree[\"right\"] = c2\n",
+ "        else:\n",
+ "            tree[\"right\"] = self._fit(x[np.where(x[:, j] > x[s, j])],\n",
+ "                                      y[np.where(x[:, j] > x[s, j])],\n",
+ "                                      self.feature_count, self.epsilon)\n",
+ "        return tree\n",
+ "\n",
+ "    def fit(self):\n",
+ "        self.tree = self._fit(self.x, self.y, self.feature_count, self.epsilon)\n",
+ "\n",
+ "    @staticmethod\n",
+ "    def _divide(x, y, feature_count):\n",
+ "        # squared-error cost of every candidate (feature, split point) pair\n",
+ "        cost = np.zeros((feature_count, len(x)))\n",
+ "        # Eq. 5.21\n",
+ "        for i in range(feature_count):\n",
+ "            for k in range(len(x)):\n",
+ "                # feature value at row k, column i\n",
+ "                value = x[k, i]\n",
+ "                y1 = y[np.where(x[:, i] <= value)]\n",
+ "                c1 = np.mean(y1)\n",
+ "                y2 = y[np.where(x[:, i] > value)]\n",
+ "                # the right region is empty when value is the column maximum\n",
+ "                c2 = np.mean(y2) if len(y2) else 0.0\n",
+ "                y1[:] = y1[:] - c1\n",
+ "                y2[:] = y2[:] - c2\n",
+ "                cost[i, k] = np.sum(y1 * y1) + np.sum(y2 * y2)\n",
+ "        # locate the minimum cost\n",
+ "        cost_index = np.where(cost == np.min(cost))\n",
+ "        # index of the chosen feature\n",
+ "        j = cost_index[0][0]\n",
+ "        # row index of the chosen split point\n",
+ "        s = cost_index[1][0]\n",
+ "        # means c1, c2 of the two regions\n",
+ "        c1 = np.mean(y[np.where(x[:, j] <= x[s, j])])\n",
+ "        c2 = np.mean(y[np.where(x[:, j] > x[s, j])])\n",
+ "        # return the scalar minimum so the threshold test stays valid on ties\n",
+ "        return j, s, cost[j, s], c1, c2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'feature': 0,\n",
+ " 'value': 5,\n",
+ " 'left': {'feature': 0, 'value': 3, 'left': 4.72, 'right': 5.57},\n",
+ " 'right': {'feature': 0,\n",
+ " 'value': 7,\n",
+ " 'left': {'feature': 0, 'value': 6, 'left': 7.05, 'right': 7.9},\n",
+ " 'right': {'feature': 0, 'value': 8, 'left': 8.23, 'right': 8.85}}}"
+ ]
+ },
+ "execution_count": 58,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]).T\n",
+ "y = np.array([4.50, 4.75, 4.91, 5.34, 5.80, 7.05, 7.90, 8.23, 8.70, 9.00])\n",
+ "\n",
+ "model_tree = LeastSqRTree(train_X, y, .2)\n",
+ "model_tree.fit()\n",
+ "model_tree.tree"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "From the program output above, the binary regression tree generated with the squared-error loss criterion is: $$f(x)=\\begin{cases}\n",
+ "4.72 & x \\le 3\\\\\n",
+ "5.57 & 3 < x \\le 5\\\\\n",
+ "7.05 & 5 < x \\le 6\\\\\n",
+ "7.9 & 6 < x \\le 7 \\\\\n",
+ "8.23 & 7 < x \\le 8\\\\\n",
+ "8.85 & x > 8\\\\\n",
+ "\\end{cases}$$"
+ ]
+ },
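+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a hand check of the root split reported above ($s=5$), the region means and the squared error can be worked out directly from Table 5.2. This is a sketch for that one split only, with values rounded:\n",
+ "\n",
+ "$$\\hat{c}_1=\\frac{4.50+4.75+4.91+5.34+5.80}{5}=5.06,\\qquad \\hat{c}_2=\\frac{7.05+7.90+8.23+8.70+9.00}{5}=8.176$$\n",
+ "\n",
+ "$$\\sum_{x_i\\leqslant 5}(y_i-5.06)^2\\approx 1.058,\\qquad \\sum_{x_i>5}(y_i-8.176)^2\\approx 2.301,\\qquad \\text{total}\\approx 3.359$$\n",
+ "\n",
+ "The leaf values follow the same averaging rule, for example $(4.50+4.75+4.91)/3=4.72$ for $x\\leqslant 3$ and $(8.70+9.00)/2=8.85$ for $x>8$."
+ ]
+ },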
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
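+ "### Exercise 5.3\n",
+ "\n",
+ "Prove that in the CART pruning algorithm, for a fixed $\\alpha$, there exists a unique smallest subtree $T_{\\alpha}$ that minimizes the loss function $C_{\\alpha}(T)$."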
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
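+ "**Solution:**  \n",
+ "**Step 1:** Whether an internal node is pruned depends only on the subtree rooted at that node.  \n",
+ "Pruning process:  \n",
+ "Compute the loss function of the subtree: $$C_{\\alpha}(T)=C(T)+\\alpha|T|$$ where $\\displaystyle C(T) = \\sum_{t=1}^{|T|}N_t (1 - \\sum_{k=1}^K (\\frac{N_{tk}}{N_t})^2)$, $|T|$ is the number of leaf nodes, and $K$ is the number of classes.  \n",
+ "Let $T_0$ be the subtree before pruning and $T_1$ the subtree after pruning; prune if $C_{\\alpha}(T_1) \\leqslant C_{\\alpha}(T_0)$.  \n",
+ "\n",
+ "----\n",
+ "\n",
+ "**Step 2 (proof by contradiction):** Suppose that for a fixed $\\alpha$ there are two subtrees $T_1,T_2$ that both minimize the loss function $C_{\\alpha}$.  \n",
+ "Case 1: the pruned subtrees lie on the same side. Then one of the two subtrees can be obtained from the other by further pruning, so two distinct smallest optimal subtrees cannot exist, which proves the claim.  \n",
+ "Case 2: the pruned subtrees do not lie on the same side. Then pruning either of them attains the minimal loss $C_{\\alpha}$, so both subtrees can be pruned further and neither is the smallest; two distinct smallest optimal subtrees cannot exist, which proves the claim."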
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
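+ "### Exercise 5.4\n",
+ "\n",
+ "Prove that in the CART pruning algorithm, each subtree in the resulting sequence $\\{T_0,T_1,\\cdots,T_n\\}$ is the optimal subtree $T_{\\alpha}$ for the interval $\\alpha \\in [\\alpha_i,\\alpha_{i+1})$, where $i=0,1,\\cdots,n$ and $0=\\alpha_0 < \\alpha_1 < \\cdots < \\alpha_n < +\\infty$."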
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
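+ "**Solution:**  \n",
+ "The claim can be restated as follows: as $\\alpha$ increases through $0=\\alpha_0<\\alpha_1<\\cdots<\\alpha_n < +\\infty$, the subtree $T_i$ is optimal on each interval $[\\alpha_i,\\alpha_{i+1})$.  \n",
+ "**Step 1:** It is easy to show that when $\\alpha=0$ the full tree $T_0$ is optimal, and when $\\alpha \\rightarrow +\\infty$ the single-node tree consisting of the root (that is, $T_n$) is optimal.\n",
+ "\n",
+ "----\n",
+ "\n",
+ "**Step 2:**  \n",
+ "Each pruning step removes the children of some internal node: all descendants of that node are collapsed back into it, and the node becomes a leaf. In the overall loss function nothing outside this internal node changes; only its local loss changes. So instead of the global loss, it suffices to compare the loss at this internal node before and after pruning.  \n",
+ "Start pruning from the full tree $T_0$. For any internal node $t$ of $T_0$:  \n",
+ "Before pruning: the subtree has $|T_t|$ leaf nodes and prediction error $C(T_t)$.  \n",
+ "After pruning: only the node itself remains as a leaf, with prediction error $C(t)$.  \n",
+ "Hence the loss of the subtree rooted at $t$ before pruning is $$C_{\\alpha}(T_t) = C(T_t) + \\alpha|T_t|$$ and the loss after pruning is $$C_{\\alpha}(t) = C(t) + \\alpha$$ Setting the two equal shows that there is a value of $\\alpha$ at which $C_{\\alpha}(T_t) = C_{\\alpha}(t)$, namely $$\\alpha=\\frac{C(t)-C(T_t)}{|T_t|-1}$$ Finding this $\\alpha$ therefore identifies the node $t$ to prune at, which completes the pruning step and yields the optimal subtree $T_1$.  \n",
+ "According to page 73 of the book, the following quantity measures how much the overall loss function decreases after pruning: $$g(t)=\\frac{C(t)-C(T_t)}{|T_t|-1}$$ In $T_0$, prune the $T_t$ with the smallest $g(t)$ and take the resulting subtree as $T_1$; set $\\alpha_1$ to that smallest $g(t)$. Then $T_1$ is the optimal subtree on the interval $[\\alpha_1,\\alpha_2)$.  \n",
+ "Continuing in the same way, the subtree $T_i$ is optimal on $[\\alpha_i,\\alpha_{i+1})$, which proves the claim.\n",
+ "\n",
+ "----\n",
+ "\n",
+ "**References:**  \n",
+ "1. MrTriste:https://blog.csdn.net/wjc1182511338/article/details/76793164\n",
+ "2. http://www.pianshen.com/article/1752163397/\n",
+ "\n",
+ "----\n",
+ "\n",
+ "**Discussion:** Why should $\\alpha$ be taken as the smallest $g(t)$?  \n",
+ "\n",
+ "Figure 5.1: The smallest $g(t)$\n",
+ "\n",
+ "Take two nodes in the figure as an example, node 1 and node 2, with $g(t)_2$ greater than $g(t)_1$, and assume that among all nodes $g(t)_1$ is the smallest and $g(t)_2$ the largest. There are two choices. If we choose the maximum $g(t)_2$, that is, prune at node 2, then at node 1 the loss before pruning exceeds the loss after pruning, so leaving node 1 unpruned makes the loss larger; the same holds for the $g(t)$ of every other node, so the accumulated overall loss is larger. Conversely, if we choose the minimum $g(t)_1$, that is, prune at node 1, then for every other node the loss without pruning is smaller than the loss with pruning, so they are better left unpruned and the overall loss is smallest. The subtree obtained by pruning at the smallest $g(t)$ is therefore the optimal subtree for that value of $\\alpha$."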
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
+ "Reference code: https://github.com/wzyonggege/statistical-learning-method\n",
+ "\n",
+ "Code updates for this notebook: https://github.com/fengdu78/lihang-code\n",
+ "\n",
+ "Exercise solutions: https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
+ "\n",
+ "Chinese annotations by the WeChat public account 机器学习初学者 (ID: ai-start-com)\n",
+ "\n",
+ "Environment: Python 3.5+\n",
+ "\n",
+ "All code has been tested.\n",
+ "![gongzhong](../gongzhong.jpg)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git "a/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/5.DecisonTree.ipynb" "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/5.DecisonTree.ipynb"
index fced97d..4f862a4 100644
--- "a/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/5.DecisonTree.ipynb"
+++ "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/5.DecisonTree.ipynb"
@@ -34,10 +34,11 @@
"(2)样本集合$D$对特征$A$的信息增益比(C4.5)\n",
"\n",
"\n",
- "$$g_{R}(D, A)=\\frac{g(D, A)}{H(D)}$$\n",
+ "$$g_{R}(D, A)=\\frac{g(D, A)}{H_A(D)}$$\n",
"\n",
+ "$$H_A(D)=-\\sum_{i=1}^{n} \\frac{\\left|D_{i}\\right|}{|D|}\\log _{2} \\frac{\\left|D_{i}\\right|}{|D|} $$\n",
"\n",
- "其中,$g(D,A)$是信息增益,$H(D)$是数据集$D$的熵。\n",
+ "其中,$g(D,A)$是信息增益,$H_A(D)$是 D 关于特征 A 的值的熵。\n",
"\n",
"(3)样本集合$D$的基尼指数(CART)\n",
"\n",
@@ -313,7 +314,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
@@ -329,13 +330,6 @@
" ent = -sum([(p / data_length) * log(p / data_length, 2)\n",
" for p in label_count.values()])\n",
" return ent\n",
- "# def entropy(y):\n",
- "# \"\"\"\n",
- "# Entropy of a label sequence\n",
- "# \"\"\"\n",
- "# hist = np.bincount(y)\n",
- "# ps = hist / np.sum(hist)\n",
- "# return -np.sum([p * np.log2(p) for p in ps if p > 0])\n",
"\n",
"\n",
"# 经验条件熵\n",
@@ -360,7 +354,6 @@
"def info_gain_train(datasets):\n",
" count = len(datasets[0]) - 1\n",
" ent = calc_ent(datasets)\n",
- "# ent = entropy(datasets)\n",
" best_feature = []\n",
" for c in range(count):\n",
" c_info_gain = info_gain(ent, cond_ent(datasets, axis=c))\n",
@@ -373,7 +366,7 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -392,7 +385,7 @@
"'特征(有自己的房子)的信息增益最大,选择为根节点特征'"
]
},
- "execution_count": 7,
+ "execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -414,7 +407,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
@@ -549,7 +542,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
@@ -561,7 +554,7 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 15,
"metadata": {
"scrolled": true
},
@@ -572,7 +565,7 @@
"{'label:': None, 'feature': 2, 'tree': {'否': {'label:': None, 'feature': 1, 'tree': {'否': {'label:': '否', 'feature': None, 'tree': {}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}"
]
},
- "execution_count": 10,
+ "execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -583,22 +576,22 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "'否'"
+ "'是'"
]
},
- "execution_count": 11,
+ "execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "dt.predict(['老年', '否', '否', '一般'])"
+ "dt.predict(['老年', '否', '是', '一般'])"
]
},
{
@@ -610,7 +603,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
@@ -633,7 +626,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
@@ -644,7 +637,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 37,
"metadata": {},
"outputs": [
{
@@ -653,28 +646,28 @@
"DecisionTreeClassifier()"
]
},
- "execution_count": 14,
+ "execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "clf = DecisionTreeClassifier()\n",
+ "clf = DecisionTreeClassifier() #X特征为连续变量\n",
"clf.fit(X_train, y_train,)"
]
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "0.9666666666666667"
+ "0.9333333333333333"
]
},
- "execution_count": 15,
+ "execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
@@ -685,7 +678,7 @@
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
@@ -696,140 +689,186 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
- "\r\n",
- "\r\n",
- "\r\n",
- "\r\n",
- "\r\n"
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
],
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 17,
+ "execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
@@ -877,95 +916,142 @@
},
{
"cell_type": "code",
- "execution_count": 18,
+ "execution_count": 50,
"metadata": {},
"outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X_train: 0 1 2 3\n",
+ "0 6 2 2 0\n",
+ "1 6 2 2 3\n",
+ "2 6 4 2 3\n",
+ "3 6 4 4 0\n",
+ "4 6 2 2 0\n",
+ "5 1 2 2 0\n",
+ "6 1 2 2 3\n",
+ "7 1 4 4 3\n",
+ "8 1 2 4 7\n",
+ "9 1 2 4 7\n",
+ "10 5 2 4 7\n",
+ "11 5 2 4 3\n",
+ "12 5 4 2 3\n",
+ "13 5 4 2 7\n",
+ "14 5 2 2 0\n",
+ "y_train: 0\n",
+ "0 0\n",
+ "1 0\n",
+ "2 1\n",
+ "3 1\n",
+ "4 0\n",
+ "5 0\n",
+ "6 0\n",
+ "7 1\n",
+ "8 1\n",
+ "9 1\n",
+ "10 1\n",
+ "11 1\n",
+ "12 1\n",
+ "13 1\n",
+ "14 0\n"
+ ]
+ },
{
"data": {
"image/svg+xml": [
- "\r\n",
- "\r\n",
- "\r\n",
- "\r\n",
- "\r\n"
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
],
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 18,
+ "execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
@@ -1001,14 +1087,17 @@
" \"否\", \"否\", \"是\", \"是\", \"是\", \n",
" \"是\", \"是\", \"是\", \"是\", \"否\"])\n",
"# 数据预处理\n",
- "le_x = preprocessing.LabelEncoder()\n",
+ "le_x = preprocessing.LabelEncoder() #将X特征编码为连续变量\n",
"le_x.fit(np.unique(X_train))\n",
"X_train = X_train.apply(le_x.transform)\n",
+ "print(\"X_train: \", X_train)\n",
"le_y = preprocessing.LabelEncoder()\n",
+ "le_y\n",
"le_y.fit(np.unique(y_train))\n",
"y_train = y_train.apply(le_y.transform)\n",
+ "print(\"y_train: \", y_train)\n",
"# Build and train the model with sklearn's DecisionTreeClassifier\n",
- "model_tree = DecisionTreeClassifier()\n",
+ "model_tree = DecisionTreeClassifier()  # requires numeric (encoded) X features\n",
"model_tree.fit(X_train, y_train)\n",
"\n",
"# Visualization\n",
@@ -1053,7 +1142,7 @@
},
{
"cell_type": "code",
- "execution_count": 19,
+ "execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
@@ -1078,13 +1167,14 @@
" (j, s, minval, c1, c2) = self._divide(x, y, feature_count)\n",
" # Initialize the tree\n",
" tree = {\"feature\": j, \"value\": x[s, j], \"left\": None, \"right\": None}\n",
+ " # Stop splitting on the left if the squared error is below epsilon or the left subset has at most one sample\n",
" if minval < self.epsilon or len(y[np.where(x[:, j] <= x[s, j])]) <= 1:\n",
" tree[\"left\"] = c1\n",
" else:\n",
" tree[\"left\"] = self._fit(x[np.where(x[:, j] <= x[s, j])],\n",
" y[np.where(x[:, j] <= x[s, j])],\n",
" self.feature_count, self.epsilon)\n",
- " if minval < self.epsilon or len(y[np.where(x[:, j] > s)]) <= 1:\n",
+ " if minval < self.epsilon or len(y[np.where(x[:, j] > x[s, j])]) <= 1:\n",
" tree[\"right\"] = c2\n",
" else:\n",
" tree[\"right\"] = self._fit(x[np.where(x[:, j] > x[s, j])],\n",
@@ -1125,19 +1215,9 @@
},
{
"cell_type": "code",
- "execution_count": 20,
+ "execution_count": 58,
"metadata": {},
"outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\numpy\\core\\fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.\n",
- " out=out, **kwargs)\n",
- "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\numpy\\core\\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars\n",
- " ret = ret.dtype.type(ret / rcount)\n"
- ]
- },
{
"data": {
"text/plain": [
@@ -1150,7 +1230,7 @@
" 'right': {'feature': 0, 'value': 8, 'left': 8.23, 'right': 8.85}}}"
]
},
- "execution_count": 20,
+ "execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
@@ -1282,7 +1362,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.8.3"
}
},
"nbformat": 4,
diff --git "a/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/mytree.pdf" "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/mytree.pdf"
index a3b5149..f0be410 100644
Binary files "a/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/mytree.pdf" and "b/\347\254\25405\347\253\240 \345\206\263\347\255\226\346\240\221/mytree.pdf" differ
diff --git "a/\347\254\25406\347\253\240 \351\200\273\350\276\221\346\226\257\350\260\233\345\233\236\345\275\222/.ipynb_checkpoints/6.LogisticRegression-checkpoint.ipynb" "b/\347\254\25406\347\253\240 \351\200\273\350\276\221\346\226\257\350\260\233\345\233\236\345\275\222/.ipynb_checkpoints/6.LogisticRegression-checkpoint.ipynb"
new file mode 100644
index 0000000..276668d
--- /dev/null
+++ "b/\347\254\25406\347\253\240 \351\200\273\350\276\221\346\226\257\350\260\233\345\233\236\345\275\222/.ipynb_checkpoints/6.LogisticRegression-checkpoint.ipynb"
@@ -0,0 +1,2208 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Chapter 6: Logistic Regression"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
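+ "Logistic regression (LR) is a classic classification method.\n",
+ "\n",
+ "1. The logistic regression model is a classification model represented by the following conditional probability distribution. It can be used for binary or multi-class classification.\n",
+ "\n",
+ "$$P(Y=k | x)=\\frac{\\exp \\left(w_{k} \\cdot x\\right)}{1+\\sum_{j=1}^{K-1} \\exp \\left(w_{j} \\cdot x\\right)}, \\quad k=1,2, \\cdots, K-1$$\n",
+ "\n",
+ "$$P(Y=K | x)=\\frac{1}{1+\\sum_{j=1}^{K-1} \\exp \\left(w_{j} \\cdot x\\right)}$$\n",
+ "Here, $x$ is the input feature vector and $w_k$ are the feature weights for class $k$.\n",
+ "\n",
+ "The logistic regression model derives from the logistic distribution, whose distribution function $F(x)$ is an S-shaped function. In this model, the log-odds of the output is a linear function of the input.\n",
+ "\n",
+ "Likelihood function for binary classification:\n",
+ "$$L(w)=\\prod_{i=1}^{n}[p(x_{i})]^{y_{i}}[1-p(x_{i})]^{1-y_{i}}$$\n",
+ "Taking the negative average log-likelihood gives the loss function:\n",
+ "$$J(w) = -\\frac{1}{n}\\sum_{i=1}^n\\left(y_i\\ln p(x_i)+(1-y_i)\\ln(1-p(x_i))\\right)$$\n",
+ "The model parameters can be estimated by maximum likelihood using gradient descent or Newton's method.\n",
+ "Gradient descent uses the first derivative of $J(w)$ with respect to $w$ to find a descent direction and updates the parameters iteratively. With the per-sample term $g_i$, the gradient and the update rule are:\n",
+ "$$g_i = (p(x_i)-y_i)x_i, \\quad \\nabla_w J(w) = \\frac{1}{n}\\sum_{i=1}^n g_i \\\\ w^{k+1}=w^k-\\alpha \\nabla_w J(w^k)$$\n",
+ "where $k$ is the iteration index. After each update, iteration stops when $||J(w^{k+1})-J(w^k)||$ falls below a threshold or the maximum number of iterations is reached.\n",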
+ "\n",
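+ "2. The maximum entropy model is a classification model represented by the following conditional probability distribution. It can also be used for binary or multi-class classification.\n",
+ "\n",
+ "$$P_{w}(y | x)=\\frac{1}{Z_{w}(x)} \\exp \\left(\\sum_{i=1}^{n} w_{i} f_{i}(x, y)\\right)$$\n",
+ "$$Z_{w}(x)=\\sum_{y} \\exp \\left(\\sum_{i=1}^{n} w_{i} f_{i}(x, y)\\right)$$\n",
+ "\n",
+ "where $Z_w(x)$ is the normalization factor, $f_i$ is a feature function, and $w_i$ is the weight of that feature.\n",
+ "\n",
+ "3. The maximum entropy model can be derived from the maximum entropy principle, a criterion for learning or estimating probability models. The principle holds that, among all candidate probability models (distributions), the one with the largest entropy is the best.\n",
+ "\n",
+ "Applying the maximum entropy principle to learning a classification model gives the following constrained optimization problem:\n",
+ "\n",
+ "$$\\min -H(P)=\\sum_{x, y} \\tilde{P}(x) P(y | x) \\log P(y | x)$$\n",
+ "\n",
+ "$$s.t. \\quad P\\left(f_{i}\\right)-\\tilde{P}\\left(f_{i}\\right)=0, \\quad i=1,2, \\cdots, n$$\n",
+ " \n",
+ " $$\\sum_{y} P(y | x)=1$$\n",
+ " \n",
+ "Solving the dual of this optimization problem yields the maximum entropy model.\n",
+ "\n",
+ "4. Both the logistic regression model and the maximum entropy model are log-linear models.\n",
+ "\n",
+ "5. Logistic regression and maximum entropy models are generally learned by maximum likelihood estimation or regularized maximum likelihood estimation. Learning can be formulated as an unconstrained optimization problem, which can be solved with improved iterative scaling, gradient descent, or quasi-Newton methods.\n"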
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Regression model: $f(x) = \\frac{1}{1+e^{-wx}}$\n",
+ "\n",
+ "where $wx$ is the linear function: $wx = w_0\\cdot x_0 + w_1\\cdot x_1 + w_2\\cdot x_2 + \\cdots + w_n\\cdot x_n, \\quad x_0=1$\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from math import exp\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data\n",
+ "def create_data():\n",
+ " iris = load_iris()\n",
+ " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
+ " df['label'] = iris.target\n",
+ " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n",
+ " data = np.array(df.iloc[:100, [0,1,-1]])\n",
+ " # print(data)\n",
+ " return data[:,:2], data[:,-1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X, y = create_data()\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class LogisticReressionClassifier:\n",
+ " def __init__(self, max_iter=200, learning_rate=0.01):\n",
+ " self.max_iter = max_iter\n",
+ " self.learning_rate = learning_rate\n",
+ "\n",
+ " def sigmoid(self, x):\n",
+ " return 1 / (1 + np.exp(-x))\n",
+ "\n",
+ " def data_matrix(self, X):\n",
+ " data_mat = []\n",
+ " for d in X:\n",
+ " data_mat.append([1.0, *d])\n",
+ " return data_mat\n",
+ "\n",
+ " def fit(self, X, y):\n",
+ " # label = np.mat(y)\n",
+ " data_mat = self.data_matrix(X) # m*n\n",
+ " self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)\n",
+ " # Gradient descent, updating the weights after each sample\n",
+ " for iter_ in range(self.max_iter):\n",
+ " for i in range(len(X)):\n",
+ " result = self.sigmoid(np.dot(data_mat[i], self.weights))\n",
+ " error = y[i] - result\n",
+ " self.weights += self.learning_rate * error * np.transpose(\n",
+ " [data_mat[i]])\n",
+ " print('LogisticRegression Model(learning_rate={},max_iter={})'.format(\n",
+ " self.learning_rate, self.max_iter))\n",
+ "\n",
+ " # def f(self, x):\n",
+ " # return -(self.weights[0] + self.weights[1] * x) / self.weights[2]\n",
+ "\n",
+ " def score(self, X_test, y_test):\n",
+ " right = 0\n",
+ " X_test = self.data_matrix(X_test)\n",
+ " for x, y in zip(X_test, y_test):\n",
+ " result = np.dot(x, self.weights)\n",
+ " if (result > 0 and y == 1) or (result < 0 and y == 0):\n",
+ " right += 1\n",
+ " return right / len(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LogisticRegression Model(learning_rate=0.01,max_iter=200)\n"
+ ]
+ }
+ ],
+ "source": [
+ "lr_clf = LogisticReressionClassifier()\n",
+ "lr_clf.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1.0"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lr_clf.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ "