<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>均衡 on heyaohua's Blog</title><link>https://blog.heyaohua.com/tags/%E5%9D%87%E8%A1%A1/</link><description>Recent content in 均衡 on heyaohua's Blog</description><image><title>heyaohua's Blog</title><url>https://blog.heyaohua.com/og-image.png</url><link>https://blog.heyaohua.com/og-image.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Wed, 01 May 2024 11:00:00 +0800</lastBuildDate><atom:link href="https://blog.heyaohua.com/tags/%E5%9D%87%E8%A1%A1/index.xml" rel="self" type="application/rss+xml"/><item><title>HDFS均衡操作快速参考</title><link>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer-fast/</link><pubDate>Wed, 01 May 2024 11:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer-fast/</guid><description>查看日志：</description><content:encoded><![CDATA[<h2 id="快速判断是否需要均衡">快速判断是否需要均衡</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># 计算当前均衡度（标准差）</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys<span style="color:#ff79c6">,</span> re
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;标准差: </span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> std_dev <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">15</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;⚠️  需要立即均衡&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">elif</span> std_dev <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">10</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;⚠️  建议进行均衡&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;✅ 集群已均衡&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span></code></pre></div><h2 id="常用均衡命令">常用均衡命令</h2>
<h3 id="基本均衡">基本均衡</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 标准均衡（推荐）
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 严格均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 宽松均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 15 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="高级均衡">高级均衡</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 排除特定节点
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -exclude 192.168.1.100,192.168.1.101 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 只均衡特定节点
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -include 192.168.1.102,192.168.1.103 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 指定源节点
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -source 192.168.1.100,192.168.1.101 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h2 id="参数说明">参数说明</h2>
<table>
  <thead>
      <tr>
          <th>参数</th>
          <th>用途</th>
          <th>默认值</th>
          <th>推荐值</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-threshold</code></td>
          <td>均衡阈值(%)</td>
          <td>10</td>
          <td>5-15</td>
      </tr>
      <tr>
          <td><code>-policy</code></td>
          <td>均衡策略</td>
          <td>datanode</td>
          <td>datanode</td>
      </tr>
      <tr>
          <td><code>-exclude</code></td>
          <td>排除节点</td>
          <td>-</td>
          <td>维护节点</td>
      </tr>
      <tr>
          <td><code>-include</code></td>
          <td>包含节点</td>
          <td>-</td>
          <td>特定节点</td>
      </tr>
      <tr>
          <td><code>-source</code></td>
          <td>源节点</td>
          <td>-</td>
          <td>高负载节点</td>
      </tr>
      <tr>
          <td><code>-idleiterations</code></td>
          <td>空闲迭代次数</td>
          <td>5</td>
          <td>3-5</td>
      </tr>
  </tbody>
</table>
<h2 id="监控命令">监控命令</h2>
<h3 id="检查均衡状态">检查均衡状态</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查均衡进程</span>
</span></span><span style="display:flex;"><span>ps aux | grep balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 查看均衡日志</span>
</span></span><span style="display:flex;"><span>tail -f /tmp/balancer.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 实时监控均衡进度</span>
</span></span><span style="display:flex;"><span>python3 /tmp/monitor_hdfs_balancer.py
</span></span></code></pre></div><h3 id="停止均衡">停止均衡</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 查找并停止均衡进程
</span></span><span style="display:flex;"><span>pkill -f &#34;hdfs.*balancer&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 或者通过PID停止
</span></span><span style="display:flex;"><span>kill $(cat /tmp/balancer.pid)
</span></span></code></pre></div><h2 id="性能优化">性能优化</h2>
<h3 id="调整均衡带宽">调整均衡带宽</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- 在hdfs-site.xml中添加 --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>52428800<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="系统优化">系统优化</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 网络优化
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 磁盘优化
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span></code></pre></div><h2 id="故障排除">故障排除</h2>
<h3 id="常见问题">常见问题</h3>
<ol>
<li><strong>均衡进程无法启动</strong></li>
<li>检查HDFS服务状态：<code>hdfs dfsadmin -report</code></li>
<li>检查权限：<code>whoami</code></li>
<li></li>
</ol>
<p>查看日志：<code>tail -f $HADOOP_LOG_DIR/hadoop-*-balancer-*.log</code></p>
<ol start="5">
<li></li>
</ol>
<p><strong>均衡速度过慢</strong></p>
<ol start="6">
<li>检查网络：<code>iperf3 -c &lt;target_node&gt;</code></li>
<li>检查磁盘I/O：<code>iostat -x 1 5</code></li>
<li></li>
</ol>
<p>调整均衡带宽</p>
<ol start="9">
<li></li>
</ol>
<p><strong>均衡进程异常退出</strong></p>
<ol start="10">
<li>检查系统资源：<code>free -h</code>, <code>df -h</code></li>
<li>查看系统日志：<code>dmesg | tail -50</code></li>
<li>重新启动均衡</li>
</ol>
<h2 id="最佳实践">最佳实践</h2>
<ol>
<li><strong>时间选择</strong>：在业务低峰期进行均衡</li>
<li><strong>参数设置</strong>：生产环境使用5-10%阈值</li>
<li><strong>监控告警</strong>：设置自动化监控和告警</li>
<li><strong>分批进行</strong>：大型集群可以分批均衡</li>
<li><strong>数据验证</strong>：均衡后检查数据完整性</li>
</ol>
<h2 id="自动化脚本">自动化脚本</h2>
<h3 id="一键均衡脚本">一键均衡脚本</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查均衡度并自动启动均衡</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys<span style="color:#ff79c6">,</span> re
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;当前均衡度: $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; 10&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;启动均衡...&#34;</span>
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;均衡进程已启动，PID: $!&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;集群已均衡，无需操作&#34;</span>
</span></span><span style="display:flex;"><span>fi
</span></span></code></pre></div><h2 id="监控脚本">监控脚本</h2>
<h3 id="简化监控脚本">简化监控脚本</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff79c6">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4"># 简化版均衡监控</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">while</span> true; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;=== </span><span style="color:#ff79c6">$(</span>date<span style="color:#ff79c6">)</span><span style="color:#f1fa8c"> ===&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 检查均衡进程</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> pgrep -f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> &gt; /dev/null; <span style="color:#ff79c6">then</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;✅ 均衡进程正在运行&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;❌ 均衡进程未运行&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 显示各节点使用率</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;各节点使用率:&#34;</span>
</span></span><span style="display:flex;"><span>    hdfs dfsadmin -report | grep -E <span style="color:#f1fa8c">&#34;(Name:|DFS Used%:)&#34;</span> | <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>        awk <span style="color:#f1fa8c">&#39;NR%2==1{name=$0} NR%2==0{print name &#34; &#34; $0}&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;----------------------------------------&#34;</span>
</span></span><span style="display:flex;"><span>    sleep <span style="color:#bd93f9">60</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">done</span>
</span></span></code></pre></div><hr>
<p><strong>注意</strong>：本快速参考适用于日常运维，详细操作请参考完整版文档。</p>
]]></content:encoded></item><item><title>HDFS均衡操作完整指南</title><link>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer/</link><pubDate>Wed, 01 May 2024 10:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer/</guid><description>HDFS均衡器（Balancer）是Hadoop分布式文件系统中的一个重要工具，用于重新分布数据块，确保集群中所有DataNode的存储使用率保持相对均衡。当集群中添加新节点或删除节点后，数据分布可能会变得不均匀，这时就需要使用均衡器来重新分布数据。</description><content:encoded><![CDATA[<h2 id="目录">目录</h2>
<ul>
<li><a href="#%E6%A6%82%E8%BF%B0">概述</a></li>
<li><a href="#%E4%BB%80%E4%B9%88%E6%97%B6%E5%80%99%E9%9C%80%E8%A6%81hdfs%E5%9D%87%E8%A1%A1">什么时候需要HDFS均衡</a></li>
<li><a href="#hdfs%E5%9D%87%E8%A1%A1%E5%8E%9F%E7%90%86">HDFS均衡原理</a></li>
<li><a href="#%E5%9D%87%E8%A1%A1%E5%8F%82%E6%95%B0%E8%AF%A6%E8%A7%A3">均衡参数详解</a></li>
<li><a href="#%E6%93%8D%E4%BD%9C%E6%AD%A5%E9%AA%A4">操作步骤</a></li>
<li><a href="#%E7%9B%91%E6%8E%A7%E5%92%8C%E7%AE%A1%E7%90%86">监控和管理</a></li>
<li><a href="#%E6%95%85%E9%9A%9C%E6%8E%92%E9%99%A4">故障排除</a></li>
<li><a href="#%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5">最佳实践</a></li>
<li><a href="#%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96%E5%BB%BA%E8%AE%AE">性能优化建议</a></li>
</ul>
<h2 id="概述">概述</h2>
<p>HDFS均衡器（Balancer）是Hadoop分布式文件系统中的一个重要工具，用于重新分布数据块，确保集群中所有DataNode的存储使用率保持相对均衡。当集群中添加新节点或删除节点后，数据分布可能会变得不均匀，这时就需要使用均衡器来重新分布数据。</p>
<h2 id="什么时候需要hdfs均衡">什么时候需要HDFS均衡</h2>
<h3 id="1-集群扩容后">1. 集群扩容后</h3>
<ul>
<li><strong>新增DataNode节点</strong>：新节点加入集群后，存储使用率为0%，而原有节点可能已经接近满载</li>
<li><strong>添加存储设备</strong>：为现有DataNode添加新的磁盘后</li>
</ul>
<h3 id="2-集群缩容后">2. 集群缩容后</h3>
<ul>
<li><strong>移除DataNode节点</strong>：节点下线前需要将其数据迁移到其他节点</li>
<li><strong>磁盘故障</strong>：某个磁盘故障后，需要重新分布数据</li>
</ul>
<h3 id="3-数据倾斜">3. 数据倾斜</h3>
<ul>
<li><strong>节点间使用率差异过大</strong>：标准差超过10-15%</li>
<li><strong>热点数据</strong>：某些节点存储了过多的热点数据</li>
<li><strong>写入模式不均</strong>：应用写入模式导致的数据分布不均</li>
</ul>
<h3 id="4-性能优化">4. 性能优化</h3>
<ul>
<li><strong>负载均衡</strong>：提高集群整体I/O性能</li>
<li><strong>故障恢复</strong>：确保数据副本分布合理</li>
</ul>
<h3 id="判断标准">判断标准</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 计算节点使用率标准差
</span></span><span style="display:flex;"><span># 标准差 &gt; 10%：建议进行均衡
</span></span><span style="display:flex;"><span># 标准差 &gt; 20%：强烈建议立即均衡
</span></span><span style="display:flex;"><span># 标准差 &lt; 5%：认为已均衡
</span></span></code></pre></div><h2 id="hdfs均衡原理">HDFS均衡原理</h2>
<h3 id="1-均衡策略">1. 均衡策略</h3>
<ul>
<li><strong>DataNode策略</strong>：基于整个DataNode的使用率进行均衡</li>
<li><strong>BlockPool策略</strong>：基于命名空间的使用率进行均衡（适用于Federation）</li>
</ul>
<h3 id="2-均衡算法">2. 均衡算法</h3>
<ol>
<li><strong>识别源节点</strong>：使用率高于平均值的节点</li>
<li><strong>识别目标节点</strong>：使用率低于平均值的节点</li>
<li><strong>选择数据块</strong>：从源节点选择合适的数据块</li>
<li><strong>数据迁移</strong>：通过三阶段复制进行数据迁移</li>
<li><strong>验证完整性</strong>：确保数据迁移成功</li>
</ol>
<h3 id="3-数据迁移过程">3. 数据迁移过程</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>源节点 → 中间节点 → 目标节点
</span></span></code></pre></div><ul>
<li>避免直接复制，减少网络压力</li>
<li>通过中间节点进行数据转发</li>
<li>确保数据完整性和一致性</li>
</ul>
<h2 id="均衡参数详解">均衡参数详解</h2>
<h3 id="1-基本参数">1. 基本参数</h3>
<h4 id="-threshold-threshold"><code>-threshold &lt;threshold&gt;</code></h4>
<ul>
<li><strong>用途</strong>：设置均衡阈值，单位为百分比</li>
<li><strong>默认值</strong>：10%</li>
<li><strong>说明</strong>：只有当节点使用率差异超过此阈值时才开始均衡</li>
<li><strong>推荐值</strong>：</li>
<li>生产环境：5-10%</li>
<li>测试环境：10-15%</li>
<li>紧急情况：20%</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 示例</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">5</span>    <span style="color:#6272a4"># 5%阈值，更严格的均衡</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">15</span>   <span style="color:#6272a4"># 15%阈值，更宽松的均衡</span>
</span></span></code></pre></div><h4 id="-policy-policy"><code>-policy &lt;policy&gt;</code></h4>
<ul>
<li><strong>用途</strong>：指定均衡策略</li>
<li><strong>可选值</strong>：</li>
<li><code>datanode</code>：基于DataNode使用率（默认）</li>
<li><code>blockpool</code>：基于BlockPool使用率</li>
<li><strong>推荐</strong>：一般使用<code>datanode</code>策略</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 示例</span>
</span></span><span style="display:flex;"><span>hdfs balancer -policy datanode    <span style="color:#6272a4"># DataNode策略</span>
</span></span><span style="display:flex;"><span>hdfs balancer -policy blockpool   <span style="color:#6272a4"># BlockPool策略</span>
</span></span></code></pre></div><h3 id="2-节点选择参数">2. 节点选择参数</h3>
<h4 id="-exclude--f-hosts-file--comma-separated-list-of-hosts"><code>-exclude [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
<li><strong>用途</strong>：排除指定的DataNode节点</li>
<li><strong>使用场景</strong>：</li>
<li>节点维护期间</li>
<li>性能较差的节点</li>
<li>网络不稳定的节点</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 排除单个节点</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude 192.168.1.100
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 排除多个节点</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude 192.168.1.100,192.168.1.101
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 从文件读取排除列表</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude -f /path/to/exclude_hosts.txt
</span></span></code></pre></div><h4 id="-include--f-hosts-file--comma-separated-list-of-hosts"><code>-include [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
<li><strong>用途</strong>：只对指定的DataNode节点进行均衡</li>
<li><strong>使用场景</strong>：</li>
<li>只均衡特定节点</li>
<li>测试环境</li>
<li>部分节点维护</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 只均衡指定节点</span>
</span></span><span style="display:flex;"><span>hdfs balancer -include 192.168.1.100,192.168.1.101
</span></span></code></pre></div><h4 id="-source--f-hosts-file--comma-separated-list-of-hosts"><code>-source [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
<li><strong>用途</strong>：指定源节点（数据来源）</li>
<li><strong>使用场景</strong>：</li>
<li>特定节点需要减少负载</li>
<li>节点即将下线</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 指定源节点</span>
</span></span><span style="display:flex;"><span>hdfs balancer -source 192.168.1.100,192.168.1.101
</span></span></code></pre></div><h3 id="3-性能控制参数">3. 性能控制参数</h3>
<h4 id="-idleiterations-idleiterations"><code>-idleiterations &lt;idleiterations&gt;</code></h4>
<ul>
<li><strong>用途</strong>：设置连续空闲迭代次数</li>
<li><strong>默认值</strong>：5</li>
<li><strong>说明</strong>：连续N次迭代没有数据移动时退出</li>
<li><strong>推荐值</strong>：</li>
<li>生产环境：3-5</li>
<li>测试环境：1-2</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 示例</span>
</span></span><span style="display:flex;"><span>hdfs balancer -idleiterations <span style="color:#bd93f9">3</span>  <span style="color:#6272a4"># 连续3次无移动则退出</span>
</span></span></code></pre></div><h4 id="-runduringupgrade"><code>-runDuringUpgrade</code></h4>
<ul>
<li><strong>用途</strong>：在HDFS升级期间运行均衡器</li>
<li><strong>默认值</strong>：false</li>
<li><strong>说明</strong>：通常不建议在升级期间运行</li>
</ul>
<h3 id="4-高级参数">4. 高级参数</h3>
<h4 id="-blockpools-comma-separated-list-of-blockpool-ids"><code>-blockpools &lt;comma-separated list of blockpool ids&gt;</code></h4>
<ul>
<li><strong>用途</strong>：指定要均衡的BlockPool ID</li>
<li><strong>适用场景</strong>：Federation环境</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 示例</span>
</span></span><span style="display:flex;"><span>hdfs balancer -blockpools BP-REPLACE_WITH_NEW_PASSWORD789-192.168.1.100-REPLACE_WITH_NEW_PASSWORD7890123
</span></span></code></pre></div><h2 id="操作步骤">操作步骤</h2>
<h3 id="1-环境检查">1. 环境检查</h3>
<h4 id="检查集群状态">检查集群状态</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查HDFS状态</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查NameNode状态</span>
</span></span><span style="display:flex;"><span>hdfs haadmin -getServiceState nn1
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查DataNode状态</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -printTopology
</span></span></code></pre></div><h4 id="检查磁盘空间">检查磁盘空间</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-php" data-lang="php"><span style="display:flex;"><span><span style="color:#6272a4"># 检查各节点磁盘使用情况
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> node in $(hdfs dfsadmin <span style="color:#ff79c6">-</span>printTopology <span style="color:#ff79c6">|</span> grep <span style="color:#ff79c6">-</span>o <span style="color:#f1fa8c">&#39;[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+&#39;</span>); <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">echo</span> <span style="color:#f1fa8c">&#34;=== </span><span style="color:#f1fa8c">$node</span><span style="color:#f1fa8c"> ===&#34;</span>
</span></span><span style="display:flex;"><span>    ssh <span style="color:#8be9fd;font-style:italic">$node</span> <span style="color:#f1fa8c">&#34;df -h&#34;</span>
</span></span><span style="display:flex;"><span>done
</span></span></code></pre></div><h4 id="检查网络状况">检查网络状况</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查节点间网络延迟</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -printTopology | grep -o <span style="color:#f1fa8c">&#39;[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+&#39;</span> | <span style="color:#ff79c6">while</span> <span style="color:#8be9fd;font-style:italic">read</span> node; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Testing </span><span style="color:#8be9fd;font-style:italic">$node</span><span style="color:#f1fa8c">...&#34;</span>
</span></span><span style="display:flex;"><span>    ping -c <span style="color:#bd93f9">3</span> <span style="color:#8be9fd;font-style:italic">$node</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">done</span>
</span></span></code></pre></div><h3 id="2-均衡前准备">2. 均衡前准备</h3>
<h4 id="备份重要配置">备份重要配置</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 备份HDFS配置</span>
</span></span><span style="display:flex;"><span>cp -r <span style="color:#8be9fd;font-style:italic">$HADOOP_CONF_DIR</span> /backup/hdfs_conf_<span style="color:#ff79c6">$(</span>date +%Y%m%d<span style="color:#ff79c6">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 记录当前状态</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report &gt; /backup/hdfs_report_<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>.txt
</span></span></code></pre></div><h4 id="设置均衡参数">设置均衡参数</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span># 设置均衡带宽（可选）
</span></span><span style="display:flex;"><span># 在hdfs-site.xml中添加：
</span></span><span style="display:flex;"><span># <span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>#   <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>#   <span style="color:#ff79c6">&lt;value&gt;</span>10485760<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 10MB/s --&gt;</span>
</span></span><span style="display:flex;"><span># <span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="3-执行均衡">3. 执行均衡</h3>
<h4 id="基本均衡命令">基本均衡命令</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 标准均衡命令
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 获取进程ID
</span></span><span style="display:flex;"><span>BALANCER_PID=$!
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 记录PID
</span></span><span style="display:flex;"><span>echo $BALANCER_PID &gt; /tmp/hdfs_balancer.pid
</span></span></code></pre></div><h4 id="高级均衡命令">高级均衡命令</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 排除特定节点的均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode \
</span></span><span style="display:flex;"><span>    -exclude 192.168.1.100,192.168.1.101 \
</span></span><span style="display:flex;"><span>    -idleiterations 3 &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 只对特定节点进行均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode \
</span></span><span style="display:flex;"><span>    -include 192.168.1.102,192.168.1.103 &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="4-监控均衡进度">4. 监控均衡进度</h3>
<h4 id="实时监控脚本">实时监控脚本</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4">#!/usr/bin/env python3</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># monitor_hdfs_balancer.py</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> subprocess
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> time
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> datetime <span style="color:#ff79c6">import</span> datetime
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">get_hdfs_report</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;hdfs&#39;</span>, <span style="color:#f1fa8c">&#39;dfsadmin&#39;</span>, <span style="color:#f1fa8c">&#39;-report&#39;</span>],
</span></span><span style="display:flex;"><span>                              stdout<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE, stderr<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE,
</span></span><span style="display:flex;"><span>                              universal_newlines<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>, check<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> result<span style="color:#ff79c6">.</span>stdout
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> subprocess<span style="color:#ff79c6">.</span>CalledProcessError <span style="color:#ff79c6">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;获取HDFS报告失败: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(e))
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">parse_datanode_info</span>(report):
</span></span><span style="display:flex;"><span>    datanodes <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>    lines <span style="color:#ff79c6">=</span> report<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    current_node <span style="color:#ff79c6">=</span> {}
</span></span><span style="display:flex;"><span>    in_datanode_section <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">False</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> lines:
</span></span><span style="display:flex;"><span>        line <span style="color:#ff79c6">=</span> line<span style="color:#ff79c6">.</span>strip()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;Name:&#39;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> current_node:
</span></span><span style="display:flex;"><span>                datanodes<span style="color:#ff79c6">.</span>append(current_node)
</span></span><span style="display:flex;"><span>            current_node <span style="color:#ff79c6">=</span> {<span style="color:#f1fa8c">&#39;name&#39;</span>: line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>, <span style="color:#bd93f9">1</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()}
</span></span><span style="display:flex;"><span>            in_datanode_section <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">True</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">elif</span> in_datanode_section <span style="color:#ff79c6">and</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span>):
</span></span><span style="display:flex;"><span>            current_node[<span style="color:#f1fa8c">&#39;used_percent&#39;</span>] <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()<span style="color:#ff79c6">.</span>replace(<span style="color:#f1fa8c">&#39;%&#39;</span>, <span style="color:#f1fa8c">&#39;&#39;</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">elif</span> in_datanode_section <span style="color:#ff79c6">and</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;DFS Remaining%:&#39;</span>):
</span></span><span style="display:flex;"><span>            current_node[<span style="color:#f1fa8c">&#39;remaining_percent&#39;</span>] <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()<span style="color:#ff79c6">.</span>replace(<span style="color:#f1fa8c">&#39;%&#39;</span>, <span style="color:#f1fa8c">&#39;&#39;</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> current_node:
</span></span><span style="display:flex;"><span>        datanodes<span style="color:#ff79c6">.</span>append(current_node)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> datanodes
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">calculate_balance_metrics</span>(datanodes):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> datanodes:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    used_percents <span style="color:#ff79c6">=</span> [node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>) <span style="color:#ff79c6">for</span> node <span style="color:#ff79c6">in</span> datanodes]
</span></span><span style="display:flex;"><span>    avg_used_percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 计算标准差</span>
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg_used_percent) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 找出最高和最低使用率节点</span>
</span></span><span style="display:flex;"><span>    max_used_node <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">max</span>(datanodes, key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>))
</span></span><span style="display:flex;"><span>    min_used_node <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">min</span>(datanodes, key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;avg_used_percent&#39;</span>: avg_used_percent,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;std_dev&#39;</span>: std_dev,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;max_used_node&#39;</span>: max_used_node,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;min_used_node&#39;</span>: min_used_node,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;datanodes&#39;</span>: datanodes
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">monitor_balancer</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;HDFS均衡监控开始...&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=&#34;</span> <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">80</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    start_time <span style="color:#ff79c6">=</span> datetime<span style="color:#ff79c6">.</span>now()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">while</span> <span style="color:#ff79c6">True</span>:
</span></span><span style="display:flex;"><span>            current_time <span style="color:#ff79c6">=</span> datetime<span style="color:#ff79c6">.</span>now()
</span></span><span style="display:flex;"><span>            elapsed <span style="color:#ff79c6">=</span> current_time <span style="color:#ff79c6">-</span> start_time
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            report <span style="color:#ff79c6">=</span> get_hdfs_report()
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> report:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] 无法获取HDFS报告&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>)))
</span></span><span style="display:flex;"><span>                time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">30</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            datanodes <span style="color:#ff79c6">=</span> parse_datanode_info(report)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> datanodes:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] 无法解析datanode信息&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>)))
</span></span><span style="display:flex;"><span>                time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">30</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            metrics <span style="color:#ff79c6">=</span> calculate_balance_metrics(datanodes)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> metrics:
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># 显示当前状态</span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] 运行时间: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>), elapsed))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;平均使用率: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;avg_used_percent&#39;</span>]))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;均衡度(标准差): </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>]))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;最高使用率节点: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c"> (</span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;max_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>),
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;max_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;最低使用率节点: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c"> (</span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;min_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>),
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;min_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">各节点使用率:&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">for</span> node <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">sorted</span>(metrics[<span style="color:#f1fa8c">&#39;datanodes&#39;</span>], key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>), reverse<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>):
</span></span><span style="display:flex;"><span>                name <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>)
</span></span><span style="display:flex;"><span>                used_pct <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)
</span></span><span style="display:flex;"><span>                remaining_pct <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;remaining_percent&#39;</span>, <span style="color:#bd93f9">0</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;  </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">% (剩余: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(name, used_pct, remaining_pct))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># 检查是否达到均衡</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>] <span style="color:#ff79c6">&lt;</span> <span style="color:#bd93f9">5.0</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">🎉 均衡完成! 标准差: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>]))
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=&#34;</span> <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">80</span>)
</span></span><span style="display:flex;"><span>            time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">60</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> KeyboardInterrupt:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n\n</span><span style="color:#f1fa8c">监控已停止&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> Exception <span style="color:#ff79c6">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">监控出错: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(e))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">__name__</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;__main__&#34;</span>:
</span></span><span style="display:flex;"><span>    monitor_balancer()
</span></span></code></pre></div><h4 id="手动检查命令">手动检查命令</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># 检查均衡进程状态</span>
</span></span><span style="display:flex;"><span>ps aux <span style="color:#ff79c6">|</span> grep balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 查看均衡日志</span>
</span></span><span style="display:flex;"><span>tail <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>hdfs_balancer<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查集群状态</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> grep <span style="color:#ff79c6">-</span>A <span style="color:#bd93f9">20</span> <span style="color:#f1fa8c">&#34;Live datanodes&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 计算当前均衡度</span>
</span></span><span style="display:flex;"><span>python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> subprocess
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;hdfs&#39;</span>, <span style="color:#f1fa8c">&#39;dfsadmin&#39;</span>, <span style="color:#f1fa8c">&#39;-report&#39;</span>],
</span></span><span style="display:flex;"><span>                       stdout<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE, universal_newlines<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>report <span style="color:#ff79c6">=</span> result<span style="color:#ff79c6">.</span>stdout
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> report<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;平均使用率: </span><span style="color:#f1fa8c">{</span>avg<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;标准差: </span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;最高使用率: </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">max</span>(used_percents)<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;最低使用率: </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">min</span>(used_percents)<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span></code></pre></div><h2 id="监控和管理">监控和管理</h2>
<h3 id="1-均衡状态监控">1. 均衡状态监控</h3>
<h4 id="实时监控">实时监控</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 启动监控脚本</span>
</span></span><span style="display:flex;"><span>python3 /tmp/monitor_hdfs_balancer.py
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 后台运行监控</span>
</span></span><span style="display:flex;"><span>nohup python3 /tmp/monitor_hdfs_balancer.py &gt; /tmp/balancer_monitor.log 2&gt;&amp;<span style="color:#bd93f9">1</span> &amp;
</span></span></code></pre></div><h4 id="定期检查">定期检查</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 创建定期检查脚本</span>
</span></span><span style="display:flex;"><span>cat &gt; /tmp/check_balance.sh <span style="color:#f1fa8c">&lt;&lt; &#39;EOF&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">LOG_FILE=&#34;/tmp/balance_check.log&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">DATE=$(date &#39;+%Y-%m-%d %H:%M:%S&#39;)
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;[$DATE] 开始检查HDFS均衡状态&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c"># 检查均衡进程
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">if pgrep -f &#34;hdfs.*balancer&#34; &gt; /dev/null; then
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    echo &#34;[$DATE] 均衡进程正在运行&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">else
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    echo &#34;[$DATE] 警告: 均衡进程未运行&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">fi
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c"># 检查集群状态
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">hdfs dfsadmin -report | grep -A 20 &#34;Live datanodes&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;[$DATE] 检查完成&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;----------------------------------------&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">EOF</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod +x /tmp/check_balance.sh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 添加到crontab，每10分钟检查一次</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;*/10 * * * * /tmp/check_balance.sh&#34;</span> | crontab -
</span></span></code></pre></div><h3 id="2-均衡进程管理">2. 均衡进程管理</h3>
<h4 id="启动均衡">启动均衡</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 基本启动
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 高级启动
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode \
</span></span><span style="display:flex;"><span>    -exclude 192.168.1.100 -idleiterations 3 \
</span></span><span style="display:flex;"><span>    &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h4 id="停止均衡">停止均衡</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 查找均衡进程
</span></span><span style="display:flex;"><span>BALANCER_PID=$(pgrep -f &#34;hdfs.*balancer&#34;)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 停止均衡进程
</span></span><span style="display:flex;"><span>if [ ! -z &#34;$BALANCER_PID&#34; ]; then
</span></span><span style="display:flex;"><span>    kill $BALANCER_PID
</span></span><span style="display:flex;"><span>    echo &#34;均衡进程 $BALANCER_PID 已停止&#34;
</span></span><span style="display:flex;"><span>else
</span></span><span style="display:flex;"><span>    echo &#34;未找到运行中的均衡进程&#34;
</span></span><span style="display:flex;"><span>fi
</span></span></code></pre></div><h4 id="重启均衡">重启均衡</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 停止现有均衡
</span></span><span style="display:flex;"><span>pkill -f &#34;hdfs.*balancer&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 等待进程完全停止
</span></span><span style="display:flex;"><span>sleep 5
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 重新启动均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="3-日志分析">3. 日志分析</h3>
<h4 id="均衡日志分析">均衡日志分析</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 统计移动的数据块数量
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | wc -l
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 统计移动的数据量
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | \
</span></span><span style="display:flex;"><span>    awk &#39;{sum += $NF} END {print &#34;总移动数据量: &#34; sum/1024/1024/1024 &#34; GB&#34;}&#39;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 分析移动速度
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | \
</span></span><span style="display:flex;"><span>    awk &#39;{print $1, $2, $NF}&#39; | \
</span></span><span style="display:flex;"><span>    tail -100 | \
</span></span><span style="display:flex;"><span>    awk &#39;BEGIN{prev_time=&#34;&#34;} {
</span></span><span style="display:flex;"><span>        if(prev_time != &#34;&#34;) {
</span></span><span style="display:flex;"><span>            split($1&#34; &#34;$2, time_arr, &#34;:&#34;)
</span></span><span style="display:flex;"><span>            current_sec = time_arr[1]*3600 + time_arr[2]*60 + time_arr[3]
</span></span><span style="display:flex;"><span>            split(prev_time, prev_arr, &#34;:&#34;)
</span></span><span style="display:flex;"><span>            prev_sec = prev_arr[1]*3600 + prev_arr[2]*60 + prev_arr[3]
</span></span><span style="display:flex;"><span>            if(current_sec &gt; prev_sec) {
</span></span><span style="display:flex;"><span>                speed = $3 / (current_sec - prev_sec)
</span></span><span style="display:flex;"><span>                print &#34;移动速度: &#34; speed/1024/1024 &#34; MB/s&#34;
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        prev_time = $1&#34; &#34;$2
</span></span><span style="display:flex;"><span>    }&#39;
</span></span></code></pre></div><h2 id="故障排除">故障排除</h2>
<h3 id="1-常见问题">1. 常见问题</h3>
<h4 id="均衡进程无法启动">均衡进程无法启动</h4>
<p><strong>症状</strong>：执行均衡命令后立即退出
<strong>可能原因</strong>：</p>
<ul>
<li>HDFS服务未正常运行</li>
<li>权限不足</li>
<li>配置错误</li>
</ul>
<p><strong>解决方法</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查HDFS服务状态</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查权限</span>
</span></span><span style="display:flex;"><span>whoami
</span></span><span style="display:flex;"><span>groups
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查配置</span>
</span></span><span style="display:flex;"><span>hdfs getconf -confKey dfs.namenode.rpc-address
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 查看错误日志</span>
</span></span><span style="display:flex;"><span>tail -f <span style="color:#8be9fd;font-style:italic">$HADOOP_LOG_DIR</span>/hadoop-*-balancer-*.log
</span></span></code></pre></div><h4 id="均衡速度过慢">均衡速度过慢</h4>
<p><strong>症状</strong>：数据移动速度很慢，均衡时间过长
<strong>可能原因</strong>：</p>
<ul>
<li>网络带宽限制</li>
<li>磁盘I/O性能差</li>
<li>均衡带宽设置过低</li>
</ul>
<p><strong>解决方法</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查网络带宽</span>
</span></span><span style="display:flex;"><span>iperf3 -s &amp;  <span style="color:#6272a4"># 在源节点启动服务器</span>
</span></span><span style="display:flex;"><span>iperf3 -c &lt;source_node&gt; -t <span style="color:#bd93f9">60</span>  <span style="color:#6272a4"># 在目标节点测试</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查磁盘I/O</span>
</span></span><span style="display:flex;"><span>iostat -x <span style="color:#bd93f9">1</span> <span style="color:#bd93f9">5</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 调整均衡带宽</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 在hdfs-site.xml中设置：</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># &lt;property&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#   &lt;name&gt;dfs.datanode.balance.bandwidthPerSec&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#   &lt;value&gt;52428800&lt;/value&gt;  &lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># &lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 重启DataNode服务</span>
</span></span><span style="display:flex;"><span>sudo systemctl restart hadoop-datanode
</span></span></code></pre></div><h4 id="均衡进程异常退出">均衡进程异常退出</h4>
<p><strong>症状</strong>：均衡进程运行一段时间后自动退出
<strong>可能原因</strong>：</p>
<ul>
<li>内存不足</li>
<li>网络中断</li>
<li>磁盘空间不足</li>
</ul>
<p><strong>解决方法</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 检查系统资源
</span></span><span style="display:flex;"><span>free -h
</span></span><span style="display:flex;"><span>df -h
</span></span><span style="display:flex;"><span>dmesg | tail -50
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查均衡日志
</span></span><span style="display:flex;"><span>tail -100 /tmp/hdfs_balancer.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查HDFS日志
</span></span><span style="display:flex;"><span>tail -100 $HADOOP_LOG_DIR/hadoop-*-balancer-*.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 重新启动均衡
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer_retry.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="2-性能问题">2. 性能问题</h3>
<h4 id="网络瓶颈">网络瓶颈</h4>
<p><strong>诊断</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 检查网络使用情况
</span></span><span style="display:flex;"><span>iftop -i eth0
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查网络延迟
</span></span><span style="display:flex;"><span>ping -c 10 &lt;target_node&gt;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查网络丢包
</span></span><span style="display:flex;"><span>mtr -r -c 10 &lt;target_node&gt;
</span></span></code></pre></div><p><strong>优化</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 调整网络参数
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span></code></pre></div><h4 id="磁盘io瓶颈">磁盘I/O瓶颈</h4>
<p><strong>诊断</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 检查磁盘使用情况
</span></span><span style="display:flex;"><span>iostat -x 1 10
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查磁盘队列
</span></span><span style="display:flex;"><span>iostat -x 1 10 | grep -E &#34;(Device|sd)&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 检查磁盘错误
</span></span><span style="display:flex;"><span>dmesg | grep -i error
</span></span></code></pre></div><p><strong>优化</strong>：</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 调整I/O调度器
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 调整I/O参数
</span></span><span style="display:flex;"><span>echo 1024 &gt; /sys/block/sda/queue/nr_requests
</span></span></code></pre></div><h3 id="3-数据完整性检查">3. 数据完整性检查</h3>
<h4 id="均衡后验证">均衡后验证</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 检查数据块完整性</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks -locations
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查副本数量</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks | grep -E <span style="color:#f1fa8c">&#34;(Missing|Under-replicated)&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查损坏的数据块</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks | grep -i corrupt
</span></span></code></pre></div><h4 id="数据恢复">数据恢复</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 修复损坏的数据块</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -delete
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 重新平衡副本</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">1</span>
</span></span></code></pre></div><h2 id="最佳实践">最佳实践</h2>
<h3 id="1-均衡策略-1">1. 均衡策略</h3>
<h4 id="时间选择">时间选择</h4>
<ul>
<li><strong>业务低峰期</strong>：选择业务访问量最低的时间段</li>
<li><strong>维护窗口</strong>：在计划维护期间进行</li>
<li><strong>分批进行</strong>：对于大型集群，可以分批进行均衡</li>
</ul>
<h4 id="参数设置">参数设置</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 生产环境推荐参数</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">5</span> -policy datanode -idleiterations <span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 测试环境参数</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">10</span> -policy datanode -idleiterations <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 紧急情况参数</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">20</span> -policy datanode
</span></span></code></pre></div><h3 id="2-监控策略">2. 监控策略</h3>
<h4 id="实时监控-1">实时监控</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># 创建监控脚本</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查均衡进程</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> ! pgrep <span style="color:#ff79c6">-</span>f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): 均衡进程未运行，尝试重启&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查集群状态</span>
</span></span><span style="display:flex;"><span>REPORT<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> [ $? <span style="color:#ff79c6">-</span>ne <span style="color:#bd93f9">0</span> ]; then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): HDFS服务异常&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 计算均衡度</span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(echo <span style="color:#f1fa8c">&#34;$REPORT&#34;</span> <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;$(date): 当前均衡度: $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 如果均衡度过高，启动均衡</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; 15&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): 均衡度过高，启动均衡&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>sh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 添加到crontab，每30分钟检查一次</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;*/30 * * * * /tmp/balance_monitor.sh&#34;</span> <span style="color:#ff79c6">|</span> crontab <span style="color:#ff79c6">-</span>
</span></span></code></pre></div><h3 id="3-自动化脚本">3. 自动化脚本</h3>
<h4 id="完整均衡脚本">完整均衡脚本</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>auto_balance<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 配置参数</span>
</span></span><span style="display:flex;"><span>THRESHOLD<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>
</span></span><span style="display:flex;"><span>LOG_FILE<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/auto_balance.log&#34;</span>
</span></span><span style="display:flex;"><span>BALANCE_LOG<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/hdfs_balancer.log&#34;</span>
</span></span><span style="display:flex;"><span>MAX_RUNTIME<span style="color:#ff79c6">=</span><span style="color:#bd93f9">7200</span>  <span style="color:#6272a4"># 最大运行时间（秒）</span>
</span></span><span style="display:flex;"><span>CHECK_INTERVAL<span style="color:#ff79c6">=</span><span style="color:#bd93f9">300</span>  <span style="color:#6272a4"># 检查间隔（秒）</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 日志函数</span>
</span></span><span style="display:flex;"><span>log() {
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;[$(date &#39;+%Y-%m-</span><span style="color:#f1fa8c">%d</span><span style="color:#f1fa8c"> %H:%M:%S&#39;)] $1&#34;</span> <span style="color:#ff79c6">|</span> tee <span style="color:#ff79c6">-</span>a $LOG_FILE
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查HDFS状态</span>
</span></span><span style="display:flex;"><span>check_hdfs_status() {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> ! hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span>; then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;错误: HDFS服务不可用&#34;</span>
</span></span><span style="display:flex;"><span>        exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;HDFS服务状态正常&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 计算均衡度</span>
</span></span><span style="display:flex;"><span>calculate_balance_degree() {
</span></span><span style="display:flex;"><span>    local report<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [ $? <span style="color:#ff79c6">-</span>ne <span style="color:#bd93f9">0</span> ]; then
</span></span><span style="display:flex;"><span>        echo <span style="color:#f1fa8c">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$report&#34;</span> <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 启动均衡</span>
</span></span><span style="display:flex;"><span>start_balancer() {
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;启动HDFS均衡，阈值: $</span><span style="color:#f1fa8c">{THRESHOLD}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold $THRESHOLD <span style="color:#ff79c6">-</span>policy datanode <span style="color:#ff79c6">&gt;</span> $BALANCE_LOG <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>    BALANCER_PID<span style="color:#ff79c6">=</span>$!
</span></span><span style="display:flex;"><span>    echo $BALANCER_PID <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;均衡进程已启动，PID: $BALANCER_PID&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 停止均衡</span>
</span></span><span style="display:flex;"><span>stop_balancer() {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [ <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid ]; then
</span></span><span style="display:flex;"><span>        local pid<span style="color:#ff79c6">=</span>$(cat <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> kill <span style="color:#ff79c6">-</span><span style="color:#bd93f9">0</span> $pid <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>            kill $pid
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;均衡进程已停止，PID: $pid&#34;</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>        rm <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查均衡进度</span>
</span></span><span style="display:flex;"><span>check_balance_progress() {
</span></span><span style="display:flex;"><span>    local current_degree<span style="color:#ff79c6">=</span>$(calculate_balance_degree)
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;当前均衡度: $</span><span style="color:#f1fa8c">{current_degree}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$current_degree &lt; 5&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;均衡完成！&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 主函数</span>
</span></span><span style="display:flex;"><span>main() {
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;开始自动均衡流程&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 检查HDFS状态</span>
</span></span><span style="display:flex;"><span>    check_hdfs_status
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 计算初始均衡度</span>
</span></span><span style="display:flex;"><span>    initial_degree<span style="color:#ff79c6">=</span>$(calculate_balance_degree)
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;初始均衡度: $</span><span style="color:#f1fa8c">{initial_degree}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 如果已经均衡，退出</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$initial_degree &lt; $THRESHOLD&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;集群已经均衡，无需执行均衡操作&#34;</span>
</span></span><span style="display:flex;"><span>        exit <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 启动均衡</span>
</span></span><span style="display:flex;"><span>    start_balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 监控均衡进度</span>
</span></span><span style="display:flex;"><span>    start_time<span style="color:#ff79c6">=</span>$(date <span style="color:#ff79c6">+%</span>s)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">while</span> true; do
</span></span><span style="display:flex;"><span>        current_time<span style="color:#ff79c6">=</span>$(date <span style="color:#ff79c6">+%</span>s)
</span></span><span style="display:flex;"><span>        elapsed<span style="color:#ff79c6">=</span>$((current_time <span style="color:#ff79c6">-</span> start_time))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># 检查是否超时</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> [ $elapsed <span style="color:#ff79c6">-</span>gt $MAX_RUNTIME ]; then
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;均衡超时，停止均衡进程&#34;</span>
</span></span><span style="display:flex;"><span>            stop_balancer
</span></span><span style="display:flex;"><span>            exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># 检查均衡进程是否还在运行</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> [ <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid ]; then
</span></span><span style="display:flex;"><span>            local pid<span style="color:#ff79c6">=</span>$(cat <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> ! kill <span style="color:#ff79c6">-</span><span style="color:#bd93f9">0</span> $pid <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>                log <span style="color:#f1fa8c">&#34;均衡进程异常退出&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>            fi
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;均衡进程PID文件不存在&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># 检查均衡进度</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> check_balance_progress; then
</span></span><span style="display:flex;"><span>            stop_balancer
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;均衡成功完成&#34;</span>
</span></span><span style="display:flex;"><span>            exit <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># 等待下次检查</span>
</span></span><span style="display:flex;"><span>        sleep $CHECK_INTERVAL
</span></span><span style="display:flex;"><span>    done
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 清理</span>
</span></span><span style="display:flex;"><span>    stop_balancer
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;均衡流程结束&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 信号处理</span>
</span></span><span style="display:flex;"><span>trap <span style="color:#f1fa8c">&#39;log &#34;收到中断信号，停止均衡&#34;; stop_balancer; exit 1&#39;</span> INT TERM
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 执行主函数</span>
</span></span><span style="display:flex;"><span>main <span style="color:#f1fa8c">&#34;$@&#34;</span>
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>auto_balance<span style="color:#ff79c6">.</span>sh
</span></span></code></pre></div><h2 id="性能优化建议">性能优化建议</h2>
<h3 id="1-系统级优化">1. 系统级优化</h3>
<h4 id="网络优化">网络优化</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 调整网络缓冲区
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_default = 65536&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_default = 65536&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span></code></pre></div><h4 id="磁盘优化">磁盘优化</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># 调整I/O调度器
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 调整I/O参数
</span></span><span style="display:flex;"><span>echo 1024 &gt; /sys/block/sda/queue/nr_requests
</span></span><span style="display:flex;"><span>echo 0 &gt; /sys/block/sda/queue/add_random
</span></span></code></pre></div><h3 id="2-hdfs配置优化">2. HDFS配置优化</h3>
<h4 id="均衡带宽设置">均衡带宽设置</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- hdfs-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>52428800<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h4 id="复制参数优化">复制参数优化</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- hdfs-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.replication<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>3<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.namenode.replication.work.multiplier.per.iteration<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>2<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="3-监控和告警">3. 监控和告警</h3>
<h4 id="设置告警">设置告警</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># 创建告警脚本</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_alert<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 配置参数</span>
</span></span><span style="display:flex;"><span>ALERT_THRESHOLD<span style="color:#ff79c6">=</span><span style="color:#bd93f9">20</span>
</span></span><span style="display:flex;"><span>EMAIL_LIST<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;admin@company.com&#34;</span>
</span></span><span style="display:flex;"><span>LOG_FILE<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/balance_alert.log&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 计算均衡度</span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 检查是否需要告警</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; $ALERT_THRESHOLD&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): 警告: HDFS均衡度过高 ($</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%)&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> $LOG_FILE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 发送邮件告警</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;HDFS集群均衡度过高: $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span> <span style="color:#ff79c6">|</span> \
</span></span><span style="display:flex;"><span>        mail <span style="color:#ff79c6">-</span>s <span style="color:#f1fa8c">&#34;HDFS均衡告警&#34;</span> $EMAIL_LIST
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 自动启动均衡</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> ! pgrep <span style="color:#ff79c6">-</span>f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>        nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#f1fa8c">&#34;$(date): 已自动启动均衡进程&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> $LOG_FILE
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_alert<span style="color:#ff79c6">.</span>sh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 添加到crontab，每小时检查一次</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;0 * * * * /tmp/balance_alert.sh&#34;</span> <span style="color:#ff79c6">|</span> crontab <span style="color:#ff79c6">-</span>
</span></span></code></pre></div><h2 id="总结">总结</h2>
<p>HDFS均衡是维护集群健康状态的重要操作。通过合理使用均衡参数、建立完善的监控体系、遵循最佳实践，可以确保集群始终保持良好的数据分布和性能表现。</p>
<h3 id="关键要点">关键要点：</h3>
<ol>
<li><strong>及时均衡</strong>：在节点使用率差异超过10%时及时进行均衡</li>
<li><strong>合理参数</strong>：根据集群规模和环境选择合适的均衡参数</li>
<li><strong>持续监控</strong>：建立自动化监控和告警机制</li>
<li><strong>性能优化</strong>：从系统、网络、存储等多个层面进行优化</li>
<li><strong>安全操作</strong>：在业务低峰期进行均衡，确保数据安全</li>
</ol>
<p>通过遵循本指南，您可以有效地管理和维护HDFS集群的数据均衡，确保集群的高可用性和高性能。</p>
]]></content:encoded></item></channel></rss>