{"id":1867,"date":"2025-03-07T11:54:54","date_gmt":"2025-03-07T03:54:54","guid":{"rendered":"https:\/\/www.forillusion.com\/?p=1867"},"modified":"2025-03-07T12:03:28","modified_gmt":"2025-03-07T04:03:28","slug":"6-6-bptt","status":"publish","type":"post","link":"https:\/\/www.forillusion.com\/index.php\/6-6-bptt\/","title":{"rendered":"6.6 Backpropagation Through Time"},"content":{"rendered":"\n<p><div class=\"has-toc have-toc\"><\/div><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Defining the Model<\/h2>\n\n\n\n<p>For simplicity, consider a recurrent neural network without bias terms whose activation function is the identity mapping ($\\phi(x)=x$). Let the input at time step $t$ be a single example $\\boldsymbol{x}_t \\in \\mathbb{R}^d$ with label $y_t$. The hidden state $\\boldsymbol{h}_t \\in \\mathbb{R}^h$ is then computed as<\/p>\n\n\n\n<p>$$<br>\\boldsymbol{h}_t = \\boldsymbol{W}_{hx} \\boldsymbol{x}_t + \\boldsymbol{W}_{hh} \\boldsymbol{h}_{t-1},<br>$$<\/p>\n\n\n\n<p>where $\\boldsymbol{W}_{hx} \\in \\mathbb{R}^{h \\times d}$ and $\\boldsymbol{W}_{hh} \\in \\mathbb{R}^{h \\times h}$ are the weight parameters of the hidden layer. Let $\\boldsymbol{W}_{qh} \\in \\mathbb{R}^{q \\times h}$ be the weight parameter of the output layer. The output-layer variable $\\boldsymbol{o}_t \\in \\mathbb{R}^q$ at time step $t$ is computed as<\/p>\n\n\n\n<p>$$<br>\\boldsymbol{o}_t = \\boldsymbol{W}_{qh} \\boldsymbol{h}_{t}.<br>$$<\/p>\n\n\n\n<p>Let the loss at time step $t$ be $\\ell(\\boldsymbol{o}_t, y_t)$. The loss function $L$ over $T$ time steps is defined as<\/p>\n\n\n\n<p>$$<br>L = \\frac{1}{T} \\sum_{t=1}^T \\ell (\\boldsymbol{o}_t, 
y_t).<br>$$<\/p>\n\n\n\n<p>We call $L$ the objective function for the data examples over the given time steps.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Computational Graph of the Model<\/h2>\n\n\n\n<p>To visualize how the model variables and parameters of a recurrent neural network depend on one another during computation, we can draw the computational graph of the model, as shown in the figure below.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\"   class=\"lazyload\" data-src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/03\/6.6_rnn-bptt.png\" src=\"https:\/\/cdn.forillusion.com\/moezx\/img\/svg\/loader\/trans.ajax-spinner-preloader.svg\" onerror=\"imgError(this)\"  alt=\"\"\/><\/figure >\n<noscript><img decoding=\"async\" src=\"https:\/\/cos.forillusion.top\/wp-content\/uploads\/2025\/03\/6.6_rnn-bptt.png\" alt=\"\"\/><\/figure><\/noscript>\n\n\n\n<h2 class=\"wp-block-heading\">Method<\/h2>\n\n\n\n<p>As noted above, the parameters of the model in the figure are $\\boldsymbol{W}_{hx}$, $\\boldsymbol{W}_{hh}$, and $\\boldsymbol{W}_{qh}$. Training the model usually requires the gradients with respect to these parameters: $\\partial L\/\\partial \\boldsymbol{W}_{hx}$, $\\partial L\/\\partial \\boldsymbol{W}_{hh}$, and $\\partial L\/\\partial \\boldsymbol{W}_{qh}$.<br>Following the dependencies in Figure 6.3, we compute and store the gradients in turn, moving in the direction opposite to the arrows.<\/p>\n\n\n\n<p>First, the gradient of the objective function with respect to the output-layer variable at each time step, $\\partial L\/\\partial \\boldsymbol{o}_t \\in \\mathbb{R}^q$, is<\/p>\n\n\n\n<p>$$<br>\\frac{\\partial L}{\\partial \\boldsymbol{o}_t} = \\frac{\\partial \\ell 
(\\boldsymbol{o}_t, y_t)}{T \\cdot \\partial \\boldsymbol{o}_t}.<br>$$<\/p>\n\n\n\n<p>Next, we can compute the gradient of the objective function with respect to the model parameter $\\boldsymbol{W}_{qh}$, namely $\\partial L\/\\partial \\boldsymbol{W}_{qh} \\in \\mathbb{R}^{q \\times h}$. According to the figure above, $L$ depends on $\\boldsymbol{W}_{qh}$ through $\\boldsymbol{o}_1, \\ldots, \\boldsymbol{o}_T$. By the chain rule,<\/p>\n\n\n\n<p>$$<br>\\frac{\\partial L}{\\partial \\boldsymbol{W}_{qh}}<br>= \\sum_{t=1}^T \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{o}_t}, \\frac{\\partial \\boldsymbol{o}_t}{\\partial \\boldsymbol{W}_{qh}}\\right)<br>= \\sum_{t=1}^T \\frac{\\partial L}{\\partial \\boldsymbol{o}_t} \\boldsymbol{h}_t^\\top.<br>$$<\/p>\n\n\n\n<p>Furthermore, note that the hidden states also depend on one another.<br>In the figure above, $L$ depends on the hidden state $\\boldsymbol{h}_T$ of the final time step $T$ only through $\\boldsymbol{o}_T$. We therefore first compute the gradient of the objective function with respect to the final hidden state, $\\partial L\/\\partial \\boldsymbol{h}_T \\in \\mathbb{R}^h$. By the chain rule,<\/p>\n\n\n\n<p>$$<br>\\frac{\\partial L}{\\partial \\boldsymbol{h}_T} = \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{o}_T}, \\frac{\\partial \\boldsymbol{o}_T}{\\partial \\boldsymbol{h}_T} \\right) = \\boldsymbol{W}_{qh}^\\top \\frac{\\partial L}{\\partial \\boldsymbol{o}_T}.<br>$$<\/p>\n\n\n\n<p>Then, for time steps $t &lt; T$, $L$ depends on $\\boldsymbol{h}_t$ through $\\boldsymbol{h}_{t+1}$ and $\\boldsymbol{o}_t$ in the figure. By the chain rule,<br>the gradient of the objective function with respect to the hidden state at time step $t 
&lt; T$, $\\partial L\/\\partial \\boldsymbol{h}_t \\in \\mathbb{R}^h$, must be computed in order from the largest time step to the smallest:<\/p>\n\n\n\n<p>$$<br>\\frac{\\partial L}{\\partial \\boldsymbol{h}_t}<br>= \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{h}_{t+1}}, \\frac{\\partial \\boldsymbol{h}_{t+1}}{\\partial \\boldsymbol{h}_t}\\right) + \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{o}_t}, \\frac{\\partial \\boldsymbol{o}_t}{\\partial \\boldsymbol{h}_t}\\right) = \\boldsymbol{W}_{hh}^\\top \\frac{\\partial L}{\\partial \\boldsymbol{h}_{t+1}} + \\boldsymbol{W}_{qh}^\\top \\frac{\\partial L}{\\partial \\boldsymbol{o}_t}.<br>$$<\/p>\n\n\n\n<p>Unrolling the recurrence above, for any time step $1 \\leq t \\leq T$ we obtain the general formula for the gradient of the objective function with respect to the hidden state:<\/p>\n\n\n\n<p>$$<br>\\frac{\\partial L}{\\partial \\boldsymbol{h}_t}<br>= \\sum_{i=t}^T {\\left(\\boldsymbol{W}_{hh}^\\top\\right)}^{T-i} \\boldsymbol{W}_{qh}^\\top \\frac{\\partial L}{\\partial \\boldsymbol{o}_{T+t-i}}.<br>$$<\/p>\n\n\n\n<p>This formula contains powers of $\\boldsymbol{W}_{hh}^\\top$ up to exponent $T-t$, so when the number of time steps $T$ is large or the time step $t$ is small, the gradient of the objective function with respect to the hidden state is prone to vanishing or exploding. This in turn affects other gradients that contain the term $\\partial L \/ \\partial \\boldsymbol{h}_t$, such as the gradients of the hidden-layer model parameters, $\\partial L \/ \\partial \\boldsymbol{W}_{hx} \\in \\mathbb{R}^{h \\times d}$ and $\\partial L \/ \\partial \\boldsymbol{W}_{hh} \\in \\mathbb{R}^{h \\times h}$.<br>In the figure, $L$ depends, through $\\boldsymbol{h}_1, \\ldots, 
\\boldsymbol{h}_T$, on these model parameters.<br>By the chain rule,<\/p>\n\n\n\n<p>$$<br>\\begin{aligned}<br>\\frac{\\partial L}{\\partial \\boldsymbol{W}_{hx}}<br>&amp;= \\sum_{t=1}^T \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{h}_t}, \\frac{\\partial \\boldsymbol{h}_t}{\\partial \\boldsymbol{W}_{hx}}\\right)<br>= \\sum_{t=1}^T \\frac{\\partial L}{\\partial \\boldsymbol{h}_t} \\boldsymbol{x}_t^\\top,\\\\<br>\\frac{\\partial L}{\\partial \\boldsymbol{W}_{hh}}<br>&amp;= \\sum_{t=1}^T \\text{prod}\\left(\\frac{\\partial L}{\\partial \\boldsymbol{h}_t}, \\frac{\\partial \\boldsymbol{h}_t}{\\partial \\boldsymbol{W}_{hh}}\\right)<br>= \\sum_{t=1}^T \\frac{\\partial L}{\\partial \\boldsymbol{h}_t} \\boldsymbol{h}_{t-1}^\\top.<br>\\end{aligned}<br>$$<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Defining the Model For simplicity, consider a recurrent neural network without bias terms whose activation function is the identity mapping ($\\phi(x)=x$). Let the input at time step $t$ be a single 
&#8230;<\/p>","protected":false},"author":1,"featured_media":1866,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,3],"tags":[45,44,12,22],"class_list":["post-1867","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-46","category-3","tag-45","tag-44","tag-12","tag-22"],"_links":{"self":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1867","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/comments?post=1867"}],"version-history":[{"count":5,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1867\/revisions"}],"predecessor-version":[{"id":1874,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/posts\/1867\/revisions\/1874"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/media\/1866"}],"wp:attachment":[{"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/media?parent=1867"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/categories?post=1867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.forillusion.com\/index.php\/wp-json\/wp\/v2\/tags?post=1867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}