{"id":1679,"date":"2025-02-06T20:44:58","date_gmt":"2025-02-06T12:44:58","guid":{"rendered":"https:\/\/www.forillusion.com\/?p=1679"},"modified":"2025-02-14T11:39:01","modified_gmt":"2025-02-14T03:39:01","slug":"3-12-weight-decay","status":"publish","type":"post","link":"https:\/\/www.forillusion.com\/index.php\/3-12-weight-decay\/","title":{"rendered":"3.12 Weight Decay"},"content":{"rendered":"\n<p><div class=\"has-toc have-toc\"><\/div><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Norms<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>L1 Norm<\/strong><\/h3>\n\n\n\n<p>The L1 norm is the sum of the absolute values of all elements of a vector. For a vector $ w = [w_1, w_2, \\ldots, w_n] $, the L1 norm is defined as:<\/p>\n\n\n\n<p>$$<br>\\|w\\|_1 = |w_1| + |w_2| + \\ldots + |w_n|<br>$$<\/p>\n\n\n\n<p>The L1 norm is often used as a sparsity constraint because it drives some weights to zero, which amounts to feature selection.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">L2 Norm<\/h3>\n\n\n\n<p>The L2 norm is the square root of the sum of the squares of all elements of a vector. For the same vector $w$, the L2 norm is defined as:<\/p>\n\n\n\n<p>$$<br>\\|w\\|_2 = \\sqrt{w_1^2 + w_2^2 + \\ldots + w_n^2}<br>$$<\/p>\n\n\n\n<p>In machine learning, the L2 norm is commonly used for regularization to prevent overfitting. In weight decay, the L2 norm penalty limits the magnitude of the weight parameters and keeps them close to zero.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Method<\/h2>\n\n\n\n<p>Weight decay is equivalent to $L_2$ 
norm regularization. Regularization adds a penalty term to the model's loss function so that the learned parameter values stay small, and it is a common way to combat overfitting. Specifically, weight decay adds to the loss function a penalty proportional to the squared L2 norm of the weight parameters, which limits the growth of the model weights.<\/p>\n\n\n\n<p>$L_2$ norm regularization adds an $L_2$ norm penalty term to the model's original loss function, which gives the function that training minimizes. The $L_2$ norm penalty is the sum of the squares of all elements of the model's weight parameters multiplied by a positive constant. Take the linear regression loss function from Section 3.1 (Linear Regression)<\/p>\n\n\n\n<p>$$<br>\\ell(w_1, w_2, b) = \\frac{1}{n} \\sum_{i=1}^n \\frac{1}{2}\\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\\right)^2<br>$$<\/p>\n\n\n\n<p>as an example, where $w_1, w_2$ are the weight parameters, $b$ is the bias parameter, the input of sample $i$ is $x_1^{(i)}, x_2^{(i)}$, its label is $y^{(i)}$, and the number of samples is $n$. Writing the weight parameters as the vector $\\boldsymbol{w} = [w_1, w_2]$, the new loss function with the $L_2$ norm penalty is<\/p>\n\n\n\n<p>$$<br>\\ell(w_1, w_2, b) + \\frac{\\lambda}{2n} \\|\\boldsymbol{w}\\|^2,<br>$$<\/p>\n\n\n\n<p>where the hyperparameter $\\lambda &gt; 
0$. The penalty is smallest when all weight parameters are 0. When $\\lambda$ is large, the penalty carries more weight in the loss function, which usually pushes the elements of the learned weight parameters closer to 0. When $\\lambda$ is set to 0, the penalty has no effect at all. Expanding the squared $L_2$ norm $\\|\\boldsymbol{w}\\|^2$ in the expression above gives $w_1^2 + w_2^2$. With the $L_2$ norm penalty in place, the minibatch stochastic gradient descent updates for the weights $w_1$ and $w_2$ from the linear regression section become<\/p>\n\n\n\n<p>$$<br>\\begin{aligned}<br>w_1 &amp;\\leftarrow \\left(1- \\frac{\\eta\\lambda}{|\\mathcal{B}|} \\right)w_1 - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}}x_1^{(i)} \\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\\right),\\\\<br>w_2 &amp;\\leftarrow \\left(1- \\frac{\\eta\\lambda}{|\\mathcal{B}|} \\right)w_2 - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}}x_2^{(i)} \\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - 
y^{(i)}\\right).<br>\\end{aligned}<br>$$<\/p>\n\n\n\n<p>As the updates show, $L_2$ norm regularization first multiplies the weights $w_1$ and $w_2$ by a number smaller than 1 and then subtracts the gradient term that does not contain the penalty. This is why $L_2$ norm regularization is also called weight decay. By penalizing model parameters with large absolute values, weight decay constrains the model being learned, which can help against overfitting. In practice, the sum of the squares of the bias elements is sometimes added to the penalty as well.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How Does L2 Norm Regularization Combat Overfitting?<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Limiting Model Complexity<\/strong><\/h4>\n\n\n\n<p>Model complexity is usually related to the magnitude of the weight parameters. Large weights can make the model overly sensitive to small variations in the training data, so that it fits noise. By constraining the magnitude of the weights, L2 regularization lowers the model's complexity and makes it more inclined to learn the overall trend of the data rather than noise or outliers.<\/p>\n\n\n\n<h4 
class=\"wp-block-heading\"><strong>Smoothing the Decision Boundary<\/strong><\/h4>\n\n\n\n<p>In classification tasks, large weights can make the decision boundary very steep or complex. Such complex decision boundaries overfit easily because they may mistake noise in the training data for important features. By shrinking the weights, L2 regularization makes the decision boundary smoother and thereby improves the model's ability to generalize.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Reducing the Influence of Extreme Weights<\/strong><\/h4>\n\n\n\n<p>In deep learning or with high-dimensional data, some features may receive extremely large weights, which makes the model overly sensitive to changes in those features. By penalizing large weights, L2 regularization reduces the influence of these extreme weights and makes the model more robust.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">High-Dimensional Linear Regression Experiment<\/h2>\n\n\n\n<p>Let the feature dimension of the data samples be $p$. For any sample in the training and test datasets with features $x_1, x_2, \\ldots, x_p$, the following linear function generates its label:<\/p>\n\n\n\n<p>$$<br>y = 0.05 + \\sum_{i = 1}^p 0.01x_i + 
\\epsilon<br>$$<\/p>\n\n\n\n<p>where the noise term $\\epsilon$ follows a normal distribution with mean 0 and standard deviation 0.01. To make overfitting easy to observe, consider a high-dimensional linear regression problem, for example with dimension $p=200$, and deliberately keep the number of training samples low, for example 20.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nimport torch.nn as nn\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom IPython import display\n\nn_train, n_test, num_inputs = 20, 100, 200\ntrue_w, true_b = torch.ones(num_inputs, 1) * 0.01, 0.05 # torch.ones creates a tensor filled with ones\n\n\nfeatures = torch.randn((n_train + n_test, num_inputs)) # generate x\nlabels = torch.matmul(features, true_w) + true_b # generate y\nlabels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float) # add noise\ntrain_features, test_features = features&#91;:n_train, :], features&#91;n_train:, :]  # split the features into training and test sets\ntrain_labels, test_labels = labels&#91;:n_train], labels&#91;n_train:] # split the labels into training and test sets<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Implementation from Scratch<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Initializing Model Parameters<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def init_params():\n    w = torch.randn((num_inputs, 1), requires_grad=True) # create w and track its gradient\n    b = torch.zeros(1, requires_grad=True) # create b and track its gradient\n    return &#91;w, b]  # return w and b<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">
Defining the $L_2$ Norm Penalty<\/h3>\n\n\n\n<p>The $L_2$ norm penalty is defined below. Only the model's weight parameters are penalized here.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def l2_penalty(w):\n    return (w**2).sum() \/ 2 # half of the squared L2 norm of w<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Defining Training and Testing<\/h3>\n\n\n\n<p>Define the linear model, the loss function, and the optimization algorithm.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def linreg(X, w, b):\n    return torch.mm(X, w) + b\n\ndef squared_loss(y_hat, y):\n    # Note: this returns a vector; also, PyTorch's MSELoss does not divide by 2\n    return ((y_hat - y.view(y_hat.size())) ** 2) \/ 2\n\ndef sgd(params, lr, batch_size):\n    for param in params:\n        param.data -= lr * param.grad \/ batch_size # update through param.data so autograd does not track the in-place change<\/code><\/pre>\n\n\n\n<p>Define the plotting functions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def set_figsize(figsize=(3.5, 2.5)):\n    display.set_matplotlib_formats('png')\n    plt.rcParams&#91;'figure.figsize'] = figsize\n\ndef semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,\n             legend=None, figsize=(3.5, 2.5)):\n    set_figsize(figsize)\n    plt.xlabel(x_label)\n    plt.ylabel(y_label)\n    plt.semilogy(x_vals, y_vals) # use a logarithmic scale on the y-axis\n    if x2_vals and y2_vals:\n        plt.semilogy(x2_vals, y2_vals, linestyle=':')\n        plt.legend(legend)\n    plt.show()<\/code><\/pre>\n\n\n\n<p>Train the model.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>batch_size, num_epochs, lr = 1, 100, 0.003 # set the hyperparameters\nnet, loss = linreg, squared_loss # set the model and the loss function\n\ndataset = 
torch.utils.data.TensorDataset(train_features, train_labels) # build the dataset\ntrain_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True) # build the data iterator\n\ndef fit_and_plot(lambd):\n    w, b = init_params() # initialize the parameters\n    train_ls, test_ls = &#91;], &#91;] # loss histories\n    for _ in range(num_epochs):\n        for X, y in train_iter:\n            l = loss(net(X, w, b), y) + lambd * l2_penalty(w) # add the L2 norm penalty\n            l = l.sum() # sum the loss\n\n            if w.grad is not None: # zero the gradients\n                w.grad.data.zero_()\n                b.grad.data.zero_()\n            l.backward() # backpropagate\n            sgd(&#91;w, b], lr, batch_size) # update the parameters\n        train_ls.append(loss(net(train_features, w, b), train_labels).mean().item()) # record the training loss\n        test_ls.append(loss(net(test_features, w, b), test_labels).mean().item()) # record the test loss\n    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',\n                 range(1, num_epochs + 1), test_ls, &#91;'train', 'test']) # plot both curves\n    print('L2 norm of w:', w.norm().item()) # print the L2 norm of w<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Observing Overfitting<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>fit_and_plot(lambd=0) # no weight decay<\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>L2 norm of w: 14.070374488830566<\/code><\/pre>\n\n\n\n<p>The training error ends up far smaller than the error on the test set, which is a typical sign of overfitting.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\"   class=\"lazyload\" 
data-src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/02\/3.12_output1.png\" src=\"https:\/\/cdn.forillusion.com\/moezx\/img\/svg\/loader\/trans.ajax-spinner-preloader.svg\" onerror=\"imgError(this)\"  alt=\"\"\/><\/figure>\n<noscript><img decoding=\"async\" src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/02\/3.12_output1.png\" alt=\"\"\/><\/noscript>\n\n\n\n<h3 class=\"wp-block-heading\">Using Weight Decay<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>fit_and_plot(lambd=3) # use weight decay<\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>L2 norm of w: 0.033579129725694656<\/code><\/pre>\n\n\n\n<p>Although the training error goes up, the error on the test set goes down, so overfitting is mitigated to some extent. In addition, the $L_2$ norm of the weight parameters is smaller than without weight decay, which means the weights are closer to 0.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\"   class=\"lazyload\" data-src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/02\/3.12_output2.png\" src=\"https:\/\/cdn.forillusion.com\/moezx\/img\/svg\/loader\/trans.ajax-spinner-preloader.svg\" onerror=\"imgError(this)\"  alt=\"\"\/><\/figure>\n<noscript><img decoding=\"async\" src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/02\/3.12_output2.png\" alt=\"\"\/><\/noscript>\n\n\n\n<h2 
class=\"wp-block-heading\">Concise Implementation<\/h2>\n\n\n\n<p>The weight decay hyperparameter can be specified directly through the <code>weight_decay<\/code> argument when constructing the optimizer instance. By default, PyTorch decays both the weights and the biases. Here, separate optimizer instances are constructed for the weight and the bias so that only the weight is decayed.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def fit_and_plot_pytorch(wd):\n    # Decay only the weight parameters. Weight names usually end with 'weight'\n    net = nn.Linear(num_inputs, 1) # linear regression model\n    nn.init.normal_(net.weight, mean=0, std=1) # initialize the weight\n    nn.init.normal_(net.bias, mean=0, std=1) # initialize the bias\n    optimizer_w = torch.optim.SGD(params=&#91;net.weight], lr=lr, weight_decay=wd) # decay the weight\n    optimizer_b = torch.optim.SGD(params=&#91;net.bias], lr=lr)  # do not decay the bias\n\n    train_ls, test_ls = &#91;], &#91;] # loss histories\n    for _ in range(num_epochs):\n        for X, y in train_iter:\n            l = loss(net(X), y).mean() # compute the loss\n            optimizer_w.zero_grad() # zero the weight gradient\n            optimizer_b.zero_grad() # zero the bias gradient\n\n            l.backward() # backpropagate\n\n            # call step on each optimizer instance to update the weight and the bias separately\n            optimizer_w.step()\n            optimizer_b.step()\n        train_ls.append(loss(net(train_features), train_labels).mean().item())\n        test_ls.append(loss(net(test_features), test_labels).mean().item())\n    semilogy(range(1, num_epochs 
+ 1), train_ls, 'epochs', 'loss',\n                 range(1, num_epochs + 1), test_ls, &#91;'train', 'test'])\n    print('L2 norm of w:', net.weight.data.norm().item())<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Norms L1 Norm The L1 norm is the sum of the absolute values of all elements of a vector. For a vector $ w = [w_1, w_2, \\ldots, w_n] $, the L1 norm &#8230;<\/p>","protected":false},"author":1,"featured_media":1681,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,3],"tags":[45,44,12,22],"class_list":["post-1679","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-46","category-3","tag-45","tag-44","tag-12","tag-22"],"_links":{"self":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1679","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/comments?post=1679"}],"version-history":[{"count":1,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1679\/revisions"}],"predecessor-version":[{"id":1710,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1679\/revisions\/1710"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/media\/1681"}],"wp:attachment":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/media?parent=1679"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/cat
egories?post=1679"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/tags?post=1679"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}