<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Big Data on heyaohua's Blog</title><link>https://blog.heyaohua.com/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/</link><description>Recent content in Big Data on heyaohua's Blog</description><image><title>heyaohua's Blog</title><url>https://blog.heyaohua.com/og-image.png</url><link>https://blog.heyaohua.com/og-image.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Tue, 09 Sep 2025 01:00:00 +0800</lastBuildDate><atom:link href="https://blog.heyaohua.com/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/index.xml" rel="self" type="application/rss+xml"/><item><title>Best Practices: Tuning Resource Contention Between Impala and Hive to Avoid Impala Query OOM</title><link>https://blog.heyaohua.com/posts/2025/09/impala-hive-resource-optimization/</link><pubDate>Tue, 09 Sep 2025 01:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2025/09/impala-hive-resource-optimization/</guid><description>Key takeaway: To keep Impala queries from hitting OOM when batch workloads (Hive/Tez) saturate the cluster, tune at both the cluster level and the service level, focusing on resource isolation, queue configuration, and carefully chosen per-query memory and concurrency settings.</description><content:encoded><![CDATA[<p><strong>Key takeaway:</strong>
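To keep Impala queries from hitting OOM when batch workloads (Hive/Tez) saturate the cluster, tune at both the cluster level and the service level, focusing on resource isolation, queue configuration, and carefully chosen per-query memory and concurrency settings.</p>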
<hr>
<h2 id="一集群级资源隔离">I. Cluster-Level Resource Isolation</h2>
<h3 id="1-使用-yarn-容器隔离-hivetez批处理与-impala">1. Use YARN Containers to Isolate Hive (Tez) Batch Processing from Impala</h3>
<p>Run Hive-on-Tez on YARN and use separate YARN queues to isolate batch jobs from interactive queries.</p>
<p><strong>Example configuration (<code>capacity-scheduler.xml</code>):</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>yarn.scheduler.capacity.root.interactive.capacity<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>30<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>yarn.scheduler.capacity.root.batch.capacity<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>70<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
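</span></span></code></pre></div><p>With this configuration the batch queue is guaranteed 70% of YARN capacity and the interactive queue 30%. Keep in mind that these percentages only divide YARN-managed workloads (such as Hive LLAP): in current releases the Impala daemons are not scheduled by YARN, so the share intended for Impala also has to be protected outside YARN, for example by capping NodeManager memory and setting Impala's own memory limit.</p>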
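<p>To make the 30/70 split concrete, the percentages can be converted into absolute guarantees. The following is a minimal Python sketch rather than part of the configuration itself; the function name <code>queue_guarantees</code> and the cluster size (10 NodeManagers offering 64 GB each to YARN) are assumptions chosen purely for illustration.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">def queue_guarantees(total_memory_mb, capacities):
    """Return the guaranteed memory (MB) per queue for percentage capacities."""
    # Sibling queues under the same parent must add up to 100 percent.
    if sum(capacities.values()) != 100:
        raise ValueError("sibling queue capacities must sum to 100")
    return {name: total_memory_mb * pct // 100 for name, pct in capacities.items()}


# Hypothetical cluster: 10 NodeManagers, each offering 64 GB to YARN
shares = queue_guarantees(10 * 64 * 1024, {"interactive": 30, "batch": 70})
print(shares)
</code></pre></div>
<p>Running the same calculation against the real NodeManager totals shows quickly whether the interactive guarantee is large enough for the expected workload.</p>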
<h3 id="2-cloudera-manager或-ambari中的-cgroup-资源池">2. cGroup Resource Pools in Cloudera Manager (or Ambari)</h3>
<ul>
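<li>In Cloudera Manager, enable CPU &amp; memory cGroup limits for the Impala service</li>
<li>Set the maximum share of each node's memory that Impala may use, along with minimum/maximum resource guarantees for the different workloads within each service</li>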
</ul>
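<p><strong>Configuration steps:</strong></p>
<ol>
<li><p><strong>Enable cGroup resource management</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span># Enable cGroup on every node
</span></span><span style="display:flex;"><span>sudo systemctl enable cgconfig
</span></span><span style="display:flex;"><span>sudo systemctl start cgconfig
</span></span></code></pre></div></li>
<li><p><strong>Configure the resource pool</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span># Create a dedicated resource pool for Impala
</span></span><span style="display:flex;"><span>echo &#39;group impala {
</span></span><span style="display:flex;"><span>    memory {
</span></span><span style="display:flex;"><span>        memory.limit_in_bytes = 32G;
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>    cpu {
</span></span><span style="display:flex;"><span>        cpu.shares = 1024;
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>}&#39; &gt;&gt; /etc/cgconfig.conf
</span></span></code></pre></div></li>
<li><p><strong>Apply the configuration</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo cgconfigparser -l /etc/cgconfig.conf
</span></span></code></pre></div></li>
</ol>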
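<p>Before applying a limit such as 32G, it can help to check that the per-node memory budget actually adds up. The following Python sketch is only an illustration: the helper names, the 128G of physical memory, the 80 GB NodeManager allocation (the value one would put in <code>yarn.nodemanager.resource.memory-mb</code>) and the 8G operating-system reserve are all assumptions, not values from this article.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a cgroup-style size such as '32G' into bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def headroom_bytes(physical, impala_limit, yarn_nm_mb, os_reserve):
    """Bytes left on a node once Impala, YARN and the OS reserve are subtracted."""
    return (parse_size(physical) - parse_size(impala_limit)
            - yarn_nm_mb * 1024 ** 2 - parse_size(os_reserve))


# Hypothetical node: 128G RAM, Impala cgroup 32G, NodeManager 80 GB, 8G for the OS
left = headroom_bytes("128G", "32G", 81920, "8G")
print(left // 1024 ** 3, "GiB of headroom")
</code></pre></div>
<p>A negative result means the node is oversubscribed, which is exactly the situation in which batch containers and Impala end up competing for the same memory.</p>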
<hr>
<h2 id="二impala-层面调优">II. Impala-Level Tuning</h2>
<h3 id="1-配置-admission-control">1. Configure Admission Control</h3>
<p>Enable and configure Impala's <strong>Admission Control</strong> (Impala Daemon → Admission Control).</p>
<p><strong>Key settings:</strong></p>
<ul>
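<li><strong>Concurrent queries limit</strong>: caps the number of queries that run at the same time</li>
<li><strong>Queue timeout</strong>: keeps queries from piling up in the queue for long periods</li>
<li><strong>Memory limit per pool</strong>: sets lower and upper memory bounds for each resource pool</li>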
</ul>
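<p>How these three settings interact can be sketched with a small model. The Python below is a deliberately simplified illustration, not Impala's actual admission controller: the <code>PoolState</code> class, the <code>decide</code> function and all the numbers are assumptions, and the queue timeout (after which a waiting query is cancelled) is not modelled.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">from dataclasses import dataclass


@dataclass
class PoolState:
    max_running: int   # concurrent queries limit
    max_queued: int    # how many queries may wait in the queue
    max_mem_gb: int    # memory limit for the pool
    running: int = 0
    queued: int = 0
    reserved_gb: int = 0


def decide(pool, query_mem_gb):
    """Return 'admit', 'queue' or 'reject' for one incoming query."""
    has_slot = pool.max_running > pool.running
    has_memory = pool.max_mem_gb >= pool.reserved_gb + query_mem_gb
    if has_slot and has_memory:
        pool.running += 1
        pool.reserved_gb += query_mem_gb
        return "admit"
    if pool.max_queued > pool.queued:
        pool.queued += 1
        return "queue"
    return "reject"


# Hypothetical pool: 2 concurrent queries, 1 queue slot, 20 GB of memory
pool = PoolState(max_running=2, max_queued=1, max_mem_gb=20)
decisions = [decide(pool, 8) for _ in range(4)]
print(decisions)
</code></pre></div>
<p>The point of the model is that a query is held back or refused before it starts, instead of being admitted and failing later with an out-of-memory error.</p>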
<p><strong>Example configuration:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Add to the Impala daemon startup flags
</span></span><span style="display:flex;"><span>--admission_control_slots=16
</span></span><span style="display:flex;"><span>--admission_control_stale_topic_threshold_ms=30000
</span></span><span style="display:flex;"><span>--queue_wait_timeout_ms=60000
</span></span></code></pre></div><h3 id="2-定义并使用资源池resource-pools">2. Define and Use Resource Pools</h3>
<p>Route queries to different resource pools (for example <code>high_mem_pool</code> and <code>standard_pool</code>) and configure the following at the pool level:</p>
<ul>
<li><code>max_requests</code>: maximum number of requests running at the same time</li>
<li><code>max_mem</code>: maximum memory quota</li>
<li><code>query_timeout_s</code>: timeout setting</li>
</ul>
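<p>One way to decide which pool a query belongs to is to route it by its estimated memory footprint. The sketch below is an assumption-laden illustration: the per-query ceilings, the helper names and the use of a <code>light_pool</code> for small queries are hypothetical and would need to match the pools actually defined on the cluster.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"># Hypothetical per-query memory ceilings (GB), ordered from smallest to largest
POOL_CEILINGS_GB = [("light_pool", 2), ("standard_pool", 8), ("high_mem_pool", 40)]


def choose_pool(estimated_mem_gb):
    """Pick the smallest pool whose ceiling covers the estimate."""
    for name, ceiling in POOL_CEILINGS_GB:
        if ceiling >= estimated_mem_gb:
            return name
    raise ValueError("estimate exceeds every pool ceiling")


def pool_statement(estimated_mem_gb):
    """Build the SET statement to run before the query itself."""
    return "SET REQUEST_POOL=" + choose_pool(estimated_mem_gb) + ";"


print(pool_statement(6))
</code></pre></div>
<p>Keeping the routing rule in one place makes it easier to adjust the ceilings later without touching every client.</p>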
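<p><strong>Example configuration</strong> (illustrative pseudo-syntax: Impala does not provide an <code>ALTER RESOURCE POOL</code> statement; in practice these limits are set in Cloudera Manager's Dynamic Resource Pool Configuration or in <code>fair-scheduler.xml</code> and <code>llama-site.xml</code>):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#6272a4">-- High-memory resource pool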
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">ALTER</span> RESOURCE POOL high_mem_pool <span style="color:#ff79c6">SET</span> MAX_MEM<span style="color:#ff79c6">=</span><span style="color:#bd93f9">200</span>GB, MAX_QUERIES<span style="color:#ff79c6">=</span><span style="color:#bd93f9">5</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Standard resource pool
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">ALTER</span> RESOURCE POOL standard_pool <span style="color:#ff79c6">SET</span> MAX_MEM<span style="color:#ff79c6">=</span><span style="color:#bd93f9">100</span>GB, MAX_QUERIES<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Lightweight resource pool
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">ALTER</span> RESOURCE POOL light_pool <span style="color:#ff79c6">SET</span> MAX_MEM<span style="color:#ff79c6">=</span><span style="color:#bd93f9">50</span>GB, MAX_QUERIES<span style="color:#ff79c6">=</span><span style="color:#bd93f9">20</span>;
</span></span></code></pre></div><p><strong>Using a resource pool:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#6272a4">-- Select the resource pool for the queries that follow in this session
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SET</span> REQUEST_POOL<span style="color:#ff79c6">=</span>high_mem_pool;
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span> <span style="color:#ff79c6">FROM</span> large_table <span style="color:#ff79c6">WHERE</span> complex_condition;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Or set it when connecting, as a query option:
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4">-- impala-shell -i hostname:21000 --query_option=REQUEST_POOL=standard_pool
</span></span></span></code></pre></div><h3 id="3-调整单查询内存限制">3. Adjust the Per-Query Memory Limit</h3>
<p>By default Impala applies no per-query memory limit, so a single query can use all the memory available to the daemon. Cap it with a startup flag or a query option:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>-- Per-query memory limit
</span></span><span style="display:flex;"><span>SET MEM_LIMIT=8g;  -- upper bound on the memory one query may use on each node
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Query timeout
</span></span><span style="display:flex;"><span>SET QUERY_TIMEOUT_S=3600;  -- idle query timeout of 1 hour
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Row batch size
</span></span><span style="display:flex;"><span>SET BATCH_SIZE=1024;
</span></span></code></pre></div><p><strong>Global configuration in Cloudera Manager:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Impala Daemon → Configuration → Query Options
</span></span><span style="display:flex;"><span>--default_query_options=MEM_LIMIT=8GB,QUERY_TIMEOUT_S=3600
</span></span></code></pre></div><h3 id="4-优化查询执行参数">4. Tune Query Execution Parameters</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>-- Enable runtime filtering
</span></span><span style="display:flex;"><span>SET RUNTIME_FILTER_MODE=GLOBAL;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Keep code generation enabled and run on all nodes
</span></span><span style="display:flex;"><span>SET DISABLE_CODEGEN=false;
</span></span><span style="display:flex;"><span>SET NUM_NODES=0;  -- 0 = use all available nodes
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Control parallelism
</span></span><span style="display:flex;"><span>SET NUM_SCANNER_THREADS=4;
</span></span><span style="display:flex;"><span>SET MT_DOP=4;  -- degree of multithreaded parallelism
</span></span></code></pre></div><hr>
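<h2 id="三hivellap-层面调优">III. Hive/LLAP-Level Tuning</h2>
<h3 id="1-限制-llap-容器内存">1. Limit LLAP Container Memory</h3>
<p>For Hive LLAP, size the memory and concurrency of the LLAP daemon containers sensibly so that LLAP does not consume an excessive share of YARN containers.</p>
<p><strong>Key configuration parameters:</strong></p>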
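<p>As a rough sizing aid, the daemon settings can be turned into per-executor and per-daemon figures. This Python sketch assumes that the IO cache is allocated in addition to the executor memory and ignores any extra headroom; the function name <code>llap_sizing</code> is hypothetical, and the inputs (16384 MB of daemon memory, 8 executors, 8192 MB of IO cache) mirror the example values used in this section's configuration.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">def llap_sizing(daemon_memory_mb, num_executors, io_cache_mb):
    """Return (memory per executor, total memory per daemon) in MB."""
    per_executor_mb = daemon_memory_mb // num_executors
    # Assumption: the IO cache sits on top of the executor memory.
    total_mb = daemon_memory_mb + io_cache_mb
    return per_executor_mb, total_mb


per_executor, total = llap_sizing(16384, 8, 8192)
print(per_executor, total)
</code></pre></div>
<p>If the total per daemon approaches what a NodeManager can offer, little is left for Tez containers on the same node, so the figure is worth checking before raising either value.</p>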
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- hive-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>hive.llap.daemon.memory.per.instance.mb<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>16384<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 16GB per LLAP daemon --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>hive.llap.daemon.num.executors<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>8<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- number of executors per daemon --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>hive.llap.io.memory.size<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>8192<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- IO cache size --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>hive.llap.daemon.vcpus.per.instance<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>8<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- virtual CPUs per instance --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="2-控制-hive-并发与队列">2. Control Hive Concurrency and Queues</h3>
<p>On HiveServer2 or Tez, set the following parameters to keep a single large job from filling the entire queue.</p>
<p><strong>Tez configuration:</strong></p>
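<p>The effect of the ApplicationMaster limits can be estimated with simple arithmetic: the share of a queue reserved for AMs, divided by the memory of one Tez AM, bounds how many Tez applications can run at once. The Python sketch below is an approximation under stated assumptions: the function name and the 448 GB batch queue are hypothetical, while 0.3 and 4096 MB are the example values used in this section's configuration.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">def max_concurrent_ams(queue_memory_mb, am_resource_percent, am_memory_mb):
    """Approximate number of ApplicationMasters a queue can run at once."""
    am_budget_mb = queue_memory_mb * am_resource_percent
    return int(am_budget_mb // am_memory_mb)


# Hypothetical batch queue of 448 GB; AM share 0.3; 4096 MB per Tez AM
concurrent = max_concurrent_ams(448 * 1024, 0.3, 4096)
print(concurrent)
</code></pre></div>
<p>If this number is lower than the application limit configured for the queue, the AM share rather than the application count becomes the effective concurrency cap.</p>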
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- tez-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>tez.am.resource.memory.mb<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>4096<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- Application Master memory --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>tez.task.resource.memory.mb<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>2048<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- memory per task --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>tez.am.container.reuse.enabled<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>true<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- enable container reuse --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>tez.am.container.idle.release-timeout-min.millis<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>10000<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- minimum idle time (ms) before a container is released --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><p><strong>YARN queue configuration:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- capacity-scheduler.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>yarn.scheduler.capacity.root.batch.maximum-applications<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>50<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- maximum number of applications in the batch queue --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>yarn.scheduler.capacity.root.batch.maximum-am-resource-percent<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>0.3<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- cap on the share of queue resources used by AMs --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>yarn.scheduler.capacity.root.interactive.user-limit-factor<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>2<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- multiple of the queue capacity a single user may use --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="3-hive-查询优化">3. Hive Query Optimization</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>-- Enable vectorized execution
</span></span><span style="display:flex;"><span>SET hive.vectorized.execution.enabled=true;
</span></span><span style="display:flex;"><span>SET hive.vectorized.execution.reduce.enabled=true;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Optimize the join strategy
</span></span><span style="display:flex;"><span>SET hive.auto.convert.join=true;
</span></span><span style="display:flex;"><span>SET hive.mapjoin.smalltable.filesize=25000000;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Enable the cost-based optimizer (CBO)
</span></span><span style="display:flex;"><span>SET hive.cbo.enable=true;
</span></span><span style="display:flex;"><span>SET hive.compute.query.using.stats=true;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>-- Control parallelism
</span></span><span style="display:flex;"><span>SET hive.exec.parallel=true;
</span></span><span style="display:flex;"><span>SET hive.exec.parallel.thread.number=8;
</span></span></code></pre></div><hr>
<h2 id="四运维与监控建议">IV. Operations and Monitoring Recommendations</h2>
<h3 id="1-实时监控与告警">1. Real-Time Monitoring and Alerting</h3>
<p><strong>Monitoring with Cloudera Manager:</strong></p>
<ul>
<li><strong>Impala metrics</strong>:
<ul>
<li>Query queue length</li>
<li>Memory utilization</li>
<li>Query execution time</li>
<li>Number of failed queries</li>
</ul>
</li>
<li><strong>YARN queue monitoring</strong>:
<ul>
<li>Queue resource utilization</li>
<li>Application wait time</li>
<li>Container allocation</li>
</ul>
</li>
</ul>
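<p>Whichever tool collects these metrics, the checks themselves are simple threshold comparisons. The Python sketch below illustrates the idea on a plain dictionary; the key names, the <code>evaluate</code> function and the thresholds (90% memory, 10 queued queries) are assumptions for illustration and do not correspond to any particular monitoring API.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">def evaluate(sample, max_mem_ratio=0.9, max_queue=10):
    """Return warning strings for one metrics sample (a plain dict)."""
    warnings = []
    mem_ratio = sample["mem_used_bytes"] / sample["mem_limit_bytes"]
    if mem_ratio > max_mem_ratio:
        warnings.append("impala memory at {:.0%}".format(mem_ratio))
    if sample["queued_queries"] > max_queue:
        warnings.append("{} queries queued".format(sample["queued_queries"]))
    return warnings


# Hypothetical sample: memory nearly exhausted, short queue
sample = {"mem_used_bytes": 95, "mem_limit_bytes": 100, "queued_queries": 3}
alerts = evaluate(sample)
print(alerts)
</code></pre></div>
<p>Watching memory and queue length together helps distinguish a cluster that is merely busy from one that is about to run out of memory.</p>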
<p><strong>Grafana dashboard configuration:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&#34;dashboard&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;title&#34;</span>: <span style="color:#f1fa8c">&#34;Impala &amp; Hive Resource Monitor&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;panels&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;title&#34;</span>: <span style="color:#f1fa8c">&#34;Impala Memory Usage&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;type&#34;</span>: <span style="color:#f1fa8c">&#34;graph&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;targets&#34;</span>: [
</span></span><span style="display:flex;"><span>          {
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;expr&#34;</span>: <span style="color:#f1fa8c">&#34;impala_daemon_mem_rss / impala_daemon_mem_limit * 100&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;legendFormat&#34;</span>: <span style="color:#f1fa8c">&#34;Memory Usage %&#34;</span>
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;title&#34;</span>: <span style="color:#f1fa8c">&#34;YARN Queue Utilization&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;type&#34;</span>: <span style="color:#f1fa8c">&#34;graph&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;targets&#34;</span>: [
</span></span><span style="display:flex;"><span>          {
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;expr&#34;</span>: <span style="color:#f1fa8c">&#34;yarn_queue_used_capacity{queue=\&#34;interactive\&#34;}&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;legendFormat&#34;</span>: <span style="color:#f1fa8c">&#34;Interactive Queue&#34;</span>
</span></span><span style="display:flex;"><span>          },
</span></span><span style="display:flex;"><span>          {
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;expr&#34;</span>: <span style="color:#f1fa8c">&#34;yarn_queue_used_capacity{queue=\&#34;batch\&#34;}&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">&#34;legendFormat&#34;</span>: <span style="color:#f1fa8c">&#34;Batch Queue&#34;</span>
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p><strong>Alert rule configuration:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Prometheus alerting rules
</span></span><span style="display:flex;"><span>groups:
</span></span><span style="display:flex;"><span>- name: impala_alerts
</span></span><span style="display:flex;"><span>  rules:
</span></span><span style="display:flex;"><span>  - alert: ImpalaHighMemoryUsage
</span></span><span style="display:flex;"><span>    expr: impala_daemon_mem_rss / impala_daemon_mem_limit &gt; 0.9
</span></span><span style="display:flex;"><span>    for: 5m
</span></span><span style="display:flex;"><span>    labels:
</span></span><span style="display:flex;"><span>      severity: warning
</span></span><span style="display:flex;"><span>    annotations:
</span></span><span style="display:flex;"><span>      summary: &#34;Impala daemon memory usage is high&#34;
</span></span><span style="display:flex;"><span>      description: &#34;Memory usage is {{ $value | humanizePercentage }}&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - alert: ImpalaQueryQueueHigh
</span></span><span style="display:flex;"><span>    expr: impala_admission_controller_queue_size &gt; 10
</span></span><span style="display:flex;"><span>    for: 2m
</span></span><span style="display:flex;"><span>    labels:
</span></span><span style="display:flex;"><span>      severity: critical
</span></span><span style="display:flex;"><span>    annotations:
</span></span><span style="display:flex;"><span>      summary: &#34;Impala query queue is too long&#34;
</span></span><span style="display:flex;"><span>      description: &#34;Queue size: {{ $value }}&#34;
</span></span></code></pre></div><h3 id="2-定期审计大查询">2. Audit Large Queries Regularly</h3>
<p><strong>Query performance analysis script:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4">#!/usr/bin/env python3</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -*- coding: utf-8 -*-</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">Impala query performance analysis script.
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">Identifies and analyzes slow or memory-hungry queries.
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> impala.dbapi
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> pandas <span style="color:#ff79c6">as</span> pd
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> datetime <span style="color:#ff79c6">import</span> datetime, timedelta
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">analyze_slow_queries</span>(host<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;localhost&#39;</span>, port<span style="color:#ff79c6">=</span><span style="color:#bd93f9">21050</span>, days<span style="color:#ff79c6">=</span><span style="color:#bd93f9">7</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Analyze slow queries.
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Args:
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        host: Impala host address
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        port: Impala port (HiveServer2 protocol)
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        days: number of recent days to analyze
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    &#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    conn <span style="color:#ff79c6">=</span> impala<span style="color:#ff79c6">.</span>dbapi<span style="color:#ff79c6">.</span>connect(host<span style="color:#ff79c6">=</span>host, port<span style="color:#ff79c6">=</span>port)
</span></span><span style="display:flex;"><span>    cursor <span style="color:#ff79c6">=</span> conn<span style="color:#ff79c6">.</span>cursor()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Fetch the recent slow queries</span>
</span></span><span style="display:flex;"><span>    query <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    SELECT
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        query_id,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        user,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        default_db,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        statement,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        start_time,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        end_time,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        duration_ms,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        rows_produced,
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        peak_memory_usage
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    FROM sys.impala_query_log
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    WHERE start_time &gt;= NOW() - INTERVAL </span><span style="color:#f1fa8c">{</span>days<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> DAYS
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        AND duration_ms &gt; 60000  -- queries that ran longer than 1 minute
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    ORDER BY duration_ms DESC
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    LIMIT 50
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    &#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    cursor<span style="color:#ff79c6">.</span>execute(query)
</span></span><span style="display:flex;"><span>    results <span style="color:#ff79c6">=</span> cursor<span style="color:#ff79c6">.</span>fetchall()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Convert to a DataFrame for analysis</span>
</span></span><span style="display:flex;"><span>    df <span style="color:#ff79c6">=</span> pd<span style="color:#ff79c6">.</span>DataFrame(results, columns<span style="color:#ff79c6">=</span>[
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;query_id&#39;</span>, <span style="color:#f1fa8c">&#39;user&#39;</span>, <span style="color:#f1fa8c">&#39;default_db&#39;</span>, <span style="color:#f1fa8c">&#39;statement&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;start_time&#39;</span>, <span style="color:#f1fa8c">&#39;end_time&#39;</span>, <span style="color:#f1fa8c">&#39;duration_ms&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;rows_produced&#39;</span>, <span style="color:#f1fa8c">&#39;peak_memory_usage&#39;</span>
</span></span><span style="display:flex;"><span>    ])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Summarize the results</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=== Slow Query Analysis Report ===&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Time range: last </span><span style="color:#f1fa8c">{</span>days<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> days&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Total slow queries: </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">len</span>(df)<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Average duration: </span><span style="color:#f1fa8c">{</span>df[<span style="color:#f1fa8c">&#39;duration_ms&#39;</span>]<span style="color:#ff79c6">.</span>mean()<span style="color:#ff79c6">/</span><span style="color:#bd93f9">1000</span><span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> s&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Peak memory usage: </span><span style="color:#f1fa8c">{</span>df[<span style="color:#f1fa8c">&#39;peak_memory_usage&#39;</span>]<span style="color:#ff79c6">.</span>max()<span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> GB&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Per-user statistics</span>
</span></span><span style="display:flex;"><span>    user_stats <span style="color:#ff79c6">=</span> df<span style="color:#ff79c6">.</span>groupby(<span style="color:#f1fa8c">&#39;user&#39;</span>)<span style="color:#ff79c6">.</span>agg({
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;query_id&#39;</span>: <span style="color:#f1fa8c">&#39;count&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;duration_ms&#39;</span>: <span style="color:#f1fa8c">&#39;mean&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;peak_memory_usage&#39;</span>: <span style="color:#f1fa8c">&#39;max&#39;</span>
</span></span><span style="display:flex;"><span>    })<span style="color:#ff79c6">.</span>round(<span style="color:#bd93f9">2</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">=== Per-User Query Statistics ===&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(user_stats)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Identify queries that need optimization</span>
</span></span><span style="display:flex;"><span>    high_memory_queries <span style="color:#ff79c6">=</span> df[df[<span style="color:#f1fa8c">&#39;peak_memory_usage&#39;</span>] <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">10</span><span style="color:#ff79c6">*</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">*</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">*</span><span style="color:#bd93f9">1024</span>]  <span style="color:#6272a4"># over 10 GB</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">=== High-memory queries (&gt;10 GB): </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">len</span>(high_memory_queries)<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> ===&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> _, query <span style="color:#ff79c6">in</span> high_memory_queries<span style="color:#ff79c6">.</span>iterrows():
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Query ID: </span><span style="color:#f1fa8c">{</span>query[<span style="color:#f1fa8c">&#39;query_id&#39;</span>]<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;User: </span><span style="color:#f1fa8c">{</span>query[<span style="color:#f1fa8c">&#39;user&#39;</span>]<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Memory: </span><span style="color:#f1fa8c">{</span>query[<span style="color:#f1fa8c">&#39;peak_memory_usage&#39;</span>]<span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#ff79c6">/</span><span style="color:#bd93f9">1024</span><span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> GB&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Duration: </span><span style="color:#f1fa8c">{</span>query[<span style="color:#f1fa8c">&#39;duration_ms&#39;</span>]<span style="color:#ff79c6">/</span><span style="color:#bd93f9">1000</span><span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> seconds&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;Statement: </span><span style="color:#f1fa8c">{</span>query[<span style="color:#f1fa8c">&#39;statement&#39;</span>][:<span style="color:#bd93f9">100</span>]<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">...&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;-&#34;</span> <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">50</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    conn<span style="color:#ff79c6">.</span>close()
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> df
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">get_query_profile</span>(query_id, host<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;localhost&#39;</span>, port<span style="color:#ff79c6">=</span><span style="color:#bd93f9">21050</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Fetch the detailed execution profile of a query.
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Args:
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        query_id: ID of the query
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        host: Impala host address
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        port: Impala port
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    &#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    conn <span style="color:#ff79c6">=</span> impala<span style="color:#ff79c6">.</span>dbapi<span style="color:#ff79c6">.</span>connect(host<span style="color:#ff79c6">=</span>host, port<span style="color:#ff79c6">=</span>port)
</span></span><span style="display:flex;"><span>    cursor <span style="color:#ff79c6">=</span> conn<span style="color:#ff79c6">.</span>cursor()
</span></span><span style="display:flex;"><span>
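</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># NOTE: PROFILE is an impala-shell command rather than a SQL statement,</span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># so this call may be rejected over HiveServer2; the impalad debug web UI</span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># also exposes per-query profiles and is the usual fallback.</span>
</span></span><span style="display:flex;"><span>    cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;PROFILE </span><span style="color:#f1fa8c">{</span>query_id<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)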
</span></span><span style="display:flex;"><span>    profile <span style="color:#ff79c6">=</span> cursor<span style="color:#ff79c6">.</span>fetchall()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;=== Query Profile for </span><span style="color:#f1fa8c">{</span>query_id<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> ===&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> profile:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(line[<span style="color:#bd93f9">0</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    conn<span style="color:#ff79c6">.</span>close()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">__name__</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;__main__&#34;</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Analyze slow queries from the last 7 days</span>
</span></span><span style="display:flex;"><span>    df <span style="color:#ff79c6">=</span> analyze_slow_queries()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># If any slow query was found, fetch the profile of the slowest one (results are sorted by duration)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">len</span>(df) <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">0</span>:
</span></span><span style="display:flex;"><span>        top_query_id <span style="color:#ff79c6">=</span> df<span style="color:#ff79c6">.</span>iloc[<span style="color:#bd93f9">0</span>][<span style="color:#f1fa8c">&#39;query_id&#39;</span>]
</span></span><span style="display:flex;"><span>        get_query_profile(top_query_id)
</span></span></code></pre></div><p><strong>SQL optimization checklist:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#6272a4">-- Query optimization checklist
</span></span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 1. Check whether table statistics are up to date; recompute them if stale
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SHOW</span> <span style="color:#ff79c6">TABLE</span> STATS your_table;
</span></span><span style="display:flex;"><span>COMPUTE STATS your_table;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 2. Check that partition pruning takes effect
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">EXPLAIN</span> <span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span> <span style="color:#ff79c6">FROM</span> partitioned_table <span style="color:#ff79c6">WHERE</span> partition_col <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">&#39;value&#39;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 3. Check that column pruning takes effect
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">EXPLAIN</span> <span style="color:#ff79c6">SELECT</span> col1, col2 <span style="color:#ff79c6">FROM</span> large_table <span style="color:#ff79c6">WHERE</span> condition;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 4. Check the join strategy
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SET</span> EXPLAIN_LEVEL<span style="color:#ff79c6">=</span><span style="color:#bd93f9">2</span>;
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">EXPLAIN</span> <span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span> <span style="color:#ff79c6">FROM</span> table1 t1 <span style="color:#ff79c6">JOIN</span> table2 t2 <span style="color:#ff79c6">ON</span> t1.id <span style="color:#ff79c6">=</span> t2.id;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 5. Optimize large-table joins
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Broadcast the small table; in Impala the join hint follows the JOIN keyword
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SET</span> RUNTIME_FILTER_MODE<span style="color:#ff79c6">=</span><span style="color:#ff79c6">GLOBAL</span>;
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> large_table
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> <span style="color:#6272a4">/* +BROADCAST */</span> small_table <span style="color:#ff79c6">ON</span> large_table.id <span style="color:#ff79c6">=</span> small_table.id;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 6. Use a partition-wise join
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> partitioned_table1 pt1
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> partitioned_table2 pt2
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">ON</span> pt1.partition_key <span style="color:#ff79c6">=</span> pt2.partition_key
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">AND</span> pt1.join_key <span style="color:#ff79c6">=</span> pt2.join_key
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">WHERE</span> pt1.partition_key <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">&#39;specific_partition&#39;</span>;
</span></span></code></pre></div><h3 id="3-版本与补丁管理">3. Version and Patch Management</h3>
<p><strong>Version compatibility check:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff79c6">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check Impala and Hive version compatibility</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;=== Component versions ===&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Impala Version:&#34;</span>
</span></span><span style="display:flex;"><span>impala-shell --version
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nHive Version:&#34;</span>
</span></span><span style="display:flex;"><span>hive --version
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nHadoop Version:&#34;</span>
</span></span><span style="display:flex;"><span>hadoop version
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nYARN Version:&#34;</span>
</span></span><span style="display:flex;"><span>yarn version
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check key configuration</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n=== Key configuration check ===&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;YARN scheduler type:&#34;</span>
</span></span><span style="display:flex;"><span>hadoop conf -get yarn.resourcemanager.scheduler.class
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nImpala Admission Control:&#34;</span>
</span></span><span style="display:flex;"><span>impala-shell -q <span style="color:#f1fa8c">&#34;SHOW CONFIG&#34;</span> | grep admission
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nHive LLAP status:&#34;</span>
</span></span><span style="display:flex;"><span>hive --service llapstatus
</span></span></code></pre></div><p><strong>Automated patch-check script:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4">#!/usr/bin/env python3</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -*- coding: utf-8 -*-</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">Automatically check the patch status of Impala/Hive components
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> subprocess
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> packaging <span style="color:#ff79c6">import</span> version
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">check_component_version</span>(component_name, current_version, min_recommended_version):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Check whether a component meets its minimum recommended version.
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    Args:
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        component_name: name of the component
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        current_version: currently installed version
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">        min_recommended_version: minimum recommended version
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    &#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> version<span style="color:#ff79c6">.</span>parse(current_version) <span style="color:#ff79c6">&gt;=</span> version<span style="color:#ff79c6">.</span>parse(min_recommended_version):
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;✅ </span><span style="color:#f1fa8c">{</span>component_name<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{</span>current_version<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> (recommended: </span><span style="color:#f1fa8c">{</span>min_recommended_version<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">)&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">True</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;⚠️  </span><span style="color:#f1fa8c">{</span>component_name<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{</span>current_version<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> (upgrade to at least: </span><span style="color:#f1fa8c">{</span>min_recommended_version<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">)&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">False</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> Exception <span style="color:#ff79c6">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;❌ </span><span style="color:#f1fa8c">{</span>component_name<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">: version check failed - </span><span style="color:#f1fa8c">{</span>e<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">False</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">get_impala_version</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Return the Impala version, or &#34;unknown&#34; if it cannot be determined.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;impala-shell&#39;</span>, <span style="color:#f1fa8c">&#39;--version&#39;</span>],
</span></span><span style="display:flex;"><span>                              capture_output<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>, text<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>        version_match <span style="color:#ff79c6">=</span> re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(?:version |v)(\d+\.\d+\.\d+)&#39;</span>, result<span style="color:#ff79c6">.</span>stdout)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> version_match<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>) <span style="color:#ff79c6">if</span> version_match <span style="color:#ff79c6">else</span> <span style="color:#f1fa8c">&#34;unknown&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> Exception:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#f1fa8c">&#34;unknown&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">get_hive_version</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Return the Hive version, or &#34;unknown&#34; if it cannot be determined.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;hive&#39;</span>, <span style="color:#f1fa8c">&#39;--version&#39;</span>],
</span></span><span style="display:flex;"><span>                              capture_output<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>, text<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>        version_match <span style="color:#ff79c6">=</span> re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;Hive (\d+\.\d+\.\d+)&#39;</span>, result<span style="color:#ff79c6">.</span>stdout)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> version_match<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>) <span style="color:#ff79c6">if</span> version_match <span style="color:#ff79c6">else</span> <span style="color:#f1fa8c">&#34;unknown&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> Exception:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#f1fa8c">&#34;unknown&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">main</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Entry point.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=== Component version check ===&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Minimum recommended versions</span>
</span></span><span style="display:flex;"><span>    min_versions <span style="color:#ff79c6">=</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;Impala&#39;</span>: <span style="color:#f1fa8c">&#39;3.4.0&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;Hive&#39;</span>: <span style="color:#f1fa8c">&#39;3.1.2&#39;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;Hadoop&#39;</span>: <span style="color:#f1fa8c">&#39;3.2.0&#39;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Check each component&#39;s version</span>
</span></span><span style="display:flex;"><span>    impala_version <span style="color:#ff79c6">=</span> get_impala_version()
</span></span><span style="display:flex;"><span>    hive_version <span style="color:#ff79c6">=</span> get_hive_version()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    results <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>    results<span style="color:#ff79c6">.</span>append(check_component_version(<span style="color:#f1fa8c">&#39;Impala&#39;</span>, impala_version, min_versions[<span style="color:#f1fa8c">&#39;Impala&#39;</span>]))
</span></span><span style="display:flex;"><span>    results<span style="color:#ff79c6">.</span>append(check_component_version(<span style="color:#f1fa8c">&#39;Hive&#39;</span>, hive_version, min_versions[<span style="color:#f1fa8c">&#39;Hive&#39;</span>]))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Print the summary</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">=== Results ===&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">all</span>(results):
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;✅ All components meet the recommended versions&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;⚠️  Some components need upgrading; see the official upgrade guides&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;   - Impala: https://impala.apache.org/docs/build.html&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;   - Hive: https://hive.apache.org/downloads.html&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">__name__</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;__main__&#34;</span>:
</span></span><span style="display:flex;"><span>    main()
</span></span></code></pre></div><hr>
<h2 id="五故障排查与应急处理">V. Troubleshooting and Emergency Response</h2>
<h3 id="1-常见-oom-场景分析">1. Common OOM Scenarios</h3>
<p><strong>Scenario 1: OOM caused by joining large tables</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#6272a4">-- Problematic query example
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> large_table1 lt1
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> large_table2 lt2 <span style="color:#ff79c6">ON</span> lt1.id <span style="color:#ff79c6">=</span> lt2.id;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Optimizations
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 1. Add a filter so less data reaches the join
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> large_table1 lt1
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> large_table2 lt2 <span style="color:#ff79c6">ON</span> lt1.id <span style="color:#ff79c6">=</span> lt2.id
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">WHERE</span> lt1.date_col <span style="color:#ff79c6">&gt;=</span> <span style="color:#f1fa8c">&#39;2024-01-01&#39;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 2. Join on the partition column as well, so only the matching partitions are read
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> <span style="color:#ff79c6">*</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> large_table1 lt1
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> large_table2 lt2
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">ON</span> lt1.id <span style="color:#ff79c6">=</span> lt2.id
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">AND</span> lt1.partition_col <span style="color:#ff79c6">=</span> lt2.partition_col
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">WHERE</span> lt1.partition_col <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">&#39;specific_partition&#39;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 3. Process in stages: filter one side first, then materialize the result
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">CREATE</span> <span style="color:#ff79c6">TABLE</span> temp_result <span style="color:#ff79c6">AS</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span> lt1.id, lt1.col1, lt2.col2
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> large_table1 lt1
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">JOIN</span> (
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">SELECT</span> id, col2
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">FROM</span> large_table2
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">WHERE</span> filter_condition
</span></span><span style="display:flex;"><span>) lt2 <span style="color:#ff79c6">ON</span> lt1.id <span style="color:#ff79c6">=</span> lt2.id;
</span></span></code></pre></div><p><strong>Scenario 2: Aggregation query runs out of memory</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#6272a4">-- Problematic query
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span>
</span></span><span style="display:flex;"><span>  high_cardinality_col,
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">COUNT</span>(<span style="color:#ff79c6">*</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">SUM</span>(large_numeric_col),
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">AVG</span>(another_col)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> huge_table
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">GROUP</span> <span style="color:#ff79c6">BY</span> high_cardinality_col;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- Optimizations
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 1. Pre-aggregate a filtered date range into a smaller table
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">CREATE</span> <span style="color:#ff79c6">TABLE</span> pre_aggregated <span style="color:#ff79c6">AS</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span>
</span></span><span style="display:flex;"><span>  partition_col,
</span></span><span style="display:flex;"><span>  high_cardinality_col,
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">COUNT</span>(<span style="color:#ff79c6">*</span>) <span style="color:#ff79c6">as</span> cnt,
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">SUM</span>(large_numeric_col) <span style="color:#ff79c6">as</span> sum_val
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> huge_table
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">WHERE</span> date_col <span style="color:#ff79c6">&gt;=</span> <span style="color:#f1fa8c">&#39;2024-01-01&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">GROUP</span> <span style="color:#ff79c6">BY</span> partition_col, high_cardinality_col;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">-- 2. Inspect a filtered sample with a window function (returns one row per input row, not one per group)
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">SELECT</span>
</span></span><span style="display:flex;"><span>  high_cardinality_col,
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">COUNT</span>(<span style="color:#ff79c6">*</span>) OVER (PARTITION <span style="color:#ff79c6">BY</span> high_cardinality_col) <span style="color:#ff79c6">as</span> cnt
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">FROM</span> huge_table
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">WHERE</span> sample_condition;
</span></span></code></pre></div><h3 id="2-应急处理流程">2. Emergency Response Procedure</h3>
<p><strong>Emergency handling script:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff79c6">#!/bin/bash</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Emergency response script for Impala OOM incidents.</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># The SQL statements and sys.impala_query_log columns used here depend on the Impala</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># version; verify them on your cluster (e.g. with DESCRIBE) before relying on this script.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;=== Impala OOM emergency response ===&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Time: $(date)&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 1. Check the currently running queries (echo -e is needed for \n to print a newline)</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n1. Checking running queries...&#34;</span>
</span></span><span style="display:flex;"><span>impala-shell -q <span style="color:#f1fa8c">&#34;SHOW QUERIES&#34;</span> | head -20
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 2. Check resource usage</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n2. Checking resource usage...&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Memory usage:&#34;</span>
</span></span><span style="display:flex;"><span>free -h
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\nCPU usage:&#34;</span>
</span></span><span style="display:flex;"><span>top -bn1 | head -10
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 3. Check YARN queue status</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n3. Checking YARN queue status...&#34;</span>
</span></span><span style="display:flex;"><span>yarn queue -status interactive
</span></span><span style="display:flex;"><span>yarn queue -status batch
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 4. Cancel long-running queries</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -B prints tab-delimited rows; CAST keeps the duration an integer for the -gt test.</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n4. Checking long-running queries...&#34;</span>
</span></span><span style="display:flex;"><span>impala-shell -B -q <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">SELECT query_id, user, CAST(duration_ms / 1000 AS BIGINT) AS duration_sec, statement</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">FROM sys.impala_query_log</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">WHERE end_time IS NULL</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">  AND start_time &lt; NOW() - INTERVAL 10 MINUTES</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">ORDER BY start_time</span>
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span> | <span style="color:#ff79c6">while</span> IFS=$<span style="color:#f1fa8c">&#39;\t&#39;</span> <span style="color:#8be9fd;font-style:italic">read</span> -r query_id user duration statement; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [ <span style="color:#f1fa8c">&#34;$duration&#34;</span> -gt <span style="color:#bd93f9">600</span> ]; <span style="color:#ff79c6">then</span>  <span style="color:#6272a4"># longer than 10 minutes</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Long-running query found: $query_id (user: $user, duration: ${duration}s)&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Statement: ${statement:0:100}...&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># stdin is the pipe inside this loop, so read the answer from the terminal</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">read</span> -r -p <span style="color:#f1fa8c">&#34;Cancel this query? (y/N): &#34;</span> confirm &lt;/dev/tty
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> [ <span style="color:#f1fa8c">&#34;$confirm&#34;</span> = <span style="color:#f1fa8c">&#34;y&#34;</span> ] || [ <span style="color:#f1fa8c">&#34;$confirm&#34;</span> = <span style="color:#f1fa8c">&#34;Y&#34;</span> ]; <span style="color:#ff79c6">then</span>
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># &lt;/dev/null keeps impala-shell from consuming the remaining rows</span>
</span></span><span style="display:flex;"><span>            impala-shell -q <span style="color:#f1fa8c">&#34;CANCEL &#39;$query_id&#39;&#34;</span> &lt;/dev/null \
</span></span><span style="display:flex;"><span>                &amp;&amp; <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Query cancelled: $query_id&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 5. Temporarily adjust resource limits</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># MEM_LIMIT is a session-level query option: this SET only affects the impala-shell</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># session that runs it. To cap other clients, issue the SET in their own sessions.</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n5. Temporarily adjusting resource limits...&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">read</span> -r -p <span style="color:#f1fa8c">&#34;Temporarily lower the memory limit? (y/N): &#34;</span> adjust_mem
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> [ <span style="color:#f1fa8c">&#34;$adjust_mem&#34;</span> = <span style="color:#f1fa8c">&#34;y&#34;</span> ] || [ <span style="color:#f1fa8c">&#34;$adjust_mem&#34;</span> = <span style="color:#f1fa8c">&#34;Y&#34;</span> ]; <span style="color:#ff79c6">then</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Setting a temporary memory limit of 4GB...&#34;</span>
</span></span><span style="display:flex;"><span>    impala-shell -q <span style="color:#f1fa8c">&#34;SET MEM_LIMIT=4GB&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 6. Restart the Impala services (last resort)</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n6. Service restart option...&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">read</span> -r -p <span style="color:#f1fa8c">&#34;Restart the Impala services? (y/N): &#34;</span> restart_service
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> [ <span style="color:#f1fa8c">&#34;$restart_service&#34;</span> = <span style="color:#f1fa8c">&#34;y&#34;</span> ] || [ <span style="color:#f1fa8c">&#34;$restart_service&#34;</span> = <span style="color:#f1fa8c">&#34;Y&#34;</span> ]; <span style="color:#ff79c6">then</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Restarting the Impala services...&#34;</span>
</span></span><span style="display:flex;"><span>    sudo systemctl restart impala-server
</span></span><span style="display:flex;"><span>    sudo systemctl restart impala-state-store
</span></span><span style="display:flex;"><span>    sudo systemctl restart impala-catalog
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Impala services restarted&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> -e <span style="color:#f1fa8c">&#34;\n=== Emergency response finished ===&#34;</span>
</span></span></code></pre></div><h3 id="3-预防性维护">3. Preventive Maintenance</h3>
<p><strong>Periodic maintenance script:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>#!/bin/bash
</span></span><span style="display:flex;"><span># Preventive maintenance script for Impala
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo &#34;=== Impala preventive maintenance ===&#34;
</span></span><span style="display:flex;"><span>echo &#34;Start time: $(date)&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 1. Refresh table statistics (echo -e is needed for \n to print a newline)
</span></span><span style="display:flex;"><span>echo -e &#34;\n1. Refreshing table statistics...&#34;
</span></span><span style="display:flex;"><span>impala-shell -f - &lt;&lt;EOF
</span></span><span style="display:flex;"><span>-- List the databases that will be processed
</span></span><span style="display:flex;"><span>SHOW DATABASES;
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Walk every database and table. SHOW DATABASES returns a name and a comment
</span></span><span style="display:flex;"><span># column, so keep only the first field of each delimited row.
</span></span><span style="display:flex;"><span>impala-shell -q &#34;SHOW DATABASES&#34; --delimited | while read -r db _; do
</span></span><span style="display:flex;"><span>    if [ &#34;$db&#34; != &#34;_impala_builtins&#34; ]; then
</span></span><span style="display:flex;"><span>        echo &#34;Processing database: $db&#34;
</span></span><span style="display:flex;"><span>        impala-shell -q &#34;USE $db; SHOW TABLES&#34; --delimited &lt;/dev/null | while read -r table; do
</span></span><span style="display:flex;"><span>            echo &#34;  Computing stats: $db.$table&#34;
</span></span><span style="display:flex;"><span>            impala-shell -q &#34;COMPUTE STATS $db.$table&#34; &lt;/dev/null 2&gt;/dev/null || echo &#34;    Skipped: $db.$table&#34;
</span></span><span style="display:flex;"><span>        done
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>done
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 2. Purge old query log entries (skipped if the table does not support DELETE)
</span></span><span style="display:flex;"><span>echo -e &#34;\n2. Purging historical query logs...&#34;
</span></span><span style="display:flex;"><span>impala-shell -q &#34;
</span></span><span style="display:flex;"><span>DELETE FROM sys.impala_query_log
</span></span><span style="display:flex;"><span>WHERE start_time &lt; NOW() - INTERVAL 30 DAYS
</span></span><span style="display:flex;"><span>&#34; 2&gt;/dev/null || echo &#34;Query log purge skipped&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 3. Check disk space
</span></span><span style="display:flex;"><span>echo -e &#34;\n3. Checking disk space...&#34;
</span></span><span style="display:flex;"><span>df -h | grep -E &#39;(hdfs|/var|/tmp)&#39;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 4. Check service status
</span></span><span style="display:flex;"><span>echo -e &#34;\n4. Checking service status...&#34;
</span></span><span style="display:flex;"><span>sudo systemctl status impala-server impala-state-store impala-catalog --no-pager
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># 5. Generate the maintenance report
</span></span><span style="display:flex;"><span>echo -e &#34;\n5. Generating maintenance report...&#34;
</span></span><span style="display:flex;"><span>cat &gt; /tmp/impala_maintenance_report_$(date +%Y%m%d).txt &lt;&lt;EOF
</span></span><span style="display:flex;"><span>Impala maintenance report
</span></span><span style="display:flex;"><span>Generated at: $(date)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>=== System resources ===
</span></span><span style="display:flex;"><span>$(free -h)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>=== Disk usage ===
</span></span><span style="display:flex;"><span>$(df -h)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>=== Service status ===
</span></span><span style="display:flex;"><span>$(sudo systemctl status impala-server --no-pager -l)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>=== Recent query statistics ===
</span></span><span style="display:flex;"><span>$(impala-shell -q &#34;SELECT COUNT(*) as total_queries, AVG(duration_ms)/1000 as avg_duration_sec FROM sys.impala_query_log WHERE start_time &gt;= NOW() - INTERVAL 1 DAY&#34; --delimited)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo &#34;Maintenance report written to: /tmp/impala_maintenance_report_$(date +%Y%m%d).txt&#34;
</span></span><span style="display:flex;"><span>echo -e &#34;\n=== Preventive maintenance finished ===&#34;
</span></span></code></pre></div><hr>
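<h2 id="总结">Summary</h2>
<p>Combining YARN queue isolation, cgroup limits, Impala Admission Control, resource pools, and per-query memory caps addresses the problem at both the cluster level and the service level. Together these settings balance Hive batch processing against Impala interactive queries and keep Impala queries from running out of memory under resource contention.</p>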
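<p>As a rough illustration of the cluster-level isolation summarized here, the Python sketch below checks whether the memory promised to YARN and to Impala on one worker node fits in physical RAM. The <code>NodeBudget</code> class and every number in it are hypothetical and only illustrate the arithmetic; the two fields are assumed to mirror <code>yarn.nodemanager.resource.memory-mb</code> and the impalad <code>mem_limit</code> startup flag.</p>

```python
from dataclasses import dataclass


@dataclass
class NodeBudget:
    """Hypothetical per-node memory plan; all sizes are illustrative, in GB."""
    physical_gb: int       # total RAM on the worker node
    os_reserve_gb: int     # headroom for the OS, DataNode and other daemons
    yarn_nm_gb: int        # assumed to mirror yarn.nodemanager.resource.memory-mb
    impalad_limit_gb: int  # assumed to mirror the impalad mem_limit flag

    def committed_gb(self) -> int:
        """Memory promised to the OS reserve, YARN containers and impalad."""
        return self.os_reserve_gb + self.yarn_nm_gb + self.impalad_limit_gb

    def is_overcommitted(self) -> bool:
        """True when the promises exceed the RAM that physically exists."""
        return self.committed_gb() > self.physical_gb


# Two made-up plans for a 128 GB node.
risky = NodeBudget(physical_gb=128, os_reserve_gb=16, yarn_nm_gb=64, impalad_limit_gb=64)
safer = NodeBudget(physical_gb=128, os_reserve_gb=16, yarn_nm_gb=48, impalad_limit_gb=64)

for name, plan in (("risky", risky), ("safer", safer)):
    print(name, plan.committed_gb(), "GB committed, overcommitted:", plan.is_overcommitted())
```

<p>In this sketch the first plan promises 16 + 64 + 64 = 144 GB on a 128 GB node, so batch containers and impalad can jointly exhaust RAM; lowering the YARN share to 48 GB brings the total back to 128 GB.</p>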
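<h3 id="关键要点总结">Key Takeaways:</h3>
<ol>
<li><strong>Cluster-level isolation</strong>: isolate resources with YARN queues and cgroups</li>
<li><strong>Service-level control</strong>: configure Admission Control and resource pools</li>
<li><strong>Query-level tuning</strong>: set memory limits and timeout parameters</li>
<li><strong>Monitoring and alerting</strong>: build thorough monitoring and alerting</li>
<li><strong>Regular maintenance</strong>: run preventive maintenance and performance tuning</li>
</ol>
<h3 id="最佳实践建议">Best-Practice Recommendations:</h3>
<ul>
<li><strong>Tune incrementally</strong>: start from a conservative configuration and optimize step by step</li>
<li><strong>Let monitoring drive changes</strong>: adjust settings based on monitoring data</li>
<li><strong>Document everything</strong>: record every configuration change and its effect</li>
<li><strong>Prepare for incidents</strong>: define a complete failure-handling procedure</li>
<li><strong>Review regularly</strong>: periodically reassess and adjust resource allocation</li>
</ul>
<p>Systematic resource management and tuning effectively prevents Impala query OOM failures and improves the stability and performance of the whole cluster.</p>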
]]></content:encoded></item><item><title>A Complete Guide to Changing the Time Zone of Hue in Docker</title><link>https://blog.heyaohua.com/posts/2025/09/docker-hue-timezone/</link><pubDate>Mon, 08 Sep 2025 10:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2025/09/docker-hue-timezone/</guid><description>After starting Hue with Docker, the time zone is wrong: times are shown in UTC instead of China Standard Time (CST). Symptoms: - HDFS file times are displayed in UTC (e.g. 06:00-06:01) - The files were actually created at China time (e.g. 14:00-14:01) - Timestamps in the Hue logs are inconsistent</description><content:encoded><![CDATA[<h2 id="问题描述">Problem Description</h2>
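<p>After starting Hue with Docker, the time zone is wrong: times are shown in UTC instead of China Standard Time (CST). The symptoms are:</p>
<ul>
<li>HDFS file times are displayed in UTC (e.g. 06:00-06:01)</li>
<li>The files were actually created at China time (e.g. 14:00-14:01)</li>
<li>Timestamps in the Hue logs are inconsistent</li>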
</ul>
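<p>The symptom is a plain eight-hour offset. A minimal Python sketch of the conversion, assuming CST can be modelled as a fixed UTC+8 offset (China observes no daylight saving time); the helper name <code>utc_to_cst</code> and the sample date are made up for illustration:</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: China Standard Time modelled as a fixed UTC+8 offset (no DST).
CST = timezone(timedelta(hours=8), name="CST")


def utc_to_cst(utc_dt):
    """Convert an aware UTC datetime to China Standard Time."""
    return utc_dt.astimezone(CST)


# Hypothetical file time as Hue shows it while the container runs in UTC.
shown_in_hue = datetime(2025, 9, 8, 6, 0, tzinfo=timezone.utc)
actual_local = utc_to_cst(shown_in_hue)

print(shown_in_hue.strftime("%H:%M"), "UTC is", actual_local.strftime("%H:%M"), "CST")
```

<p>Both values describe the same instant; only the zone used for display differs, which is why the fix is a configuration change rather than a data correction.</p>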
<h2 id="解决方案概述">Solution Overview</h2>
<p>The time zone has to be changed at several layers:</p>
<ol>
<li>Container system time zone</li>
<li>Time zone in the Hue configuration files</li>
<li>Django time zone setting</li>
<li>Time zone handling in the file browser module</li>
</ol>
<h2 id="详细修改步骤">Detailed Steps</h2>
<h3 id="1-检查当前容器状态">1. Check the current container state</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># List the Hue containers (-a also shows stopped ones)</span>
</span></span><span style="display:flex;"><span>docker ps -a | grep hue
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the time zone inside the container</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> &lt;container_name&gt; date
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the timestamp format in the Hue logs</span>
</span></span><span style="display:flex;"><span>docker logs &lt;container_name&gt; --tail <span style="color:#bd93f9">10</span>
</span></span></code></pre></div><h3 id="2-备份原始配置">2. Back up the original configuration</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Back up the Hue configuration files</span>
</span></span><span style="display:flex;"><span>cp /data/server/hue-server/config/hue.ini /data/server/hue-server/config/hue.ini.backup.<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>
</span></span><span style="display:flex;"><span>cp /data/server/hue-server/config/z-hue-overrides.ini /data/server/hue-server/config/z-hue-overrides.ini.backup.<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>
</span></span></code></pre></div><h3 id="3-修改hue配置文件中的时区设置">3. Change the time zone in the Hue configuration files</h3>
<h4 id="31-修改主配置文件">3.1 Edit the main configuration file</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Change the time zone setting in hue.ini</span>
</span></span><span style="display:flex;"><span>sed -i &#39;s/time_zone=America\/Los_Angeles/time_zone=Asia\/Shanghai/g&#39; /data/server/hue-server/config/hue.ini
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Add the Django time zone setting on the line after time_zone</span>
</span></span><span style="display:flex;"><span>sed -i &#39;/time_zone=Asia\/Shanghai/a use_tz=true&#39; /data/server/hue-server/config/hue.ini
</span></span></code></pre></div><h4 id="32-修改覆盖配置文件">3.2 Edit the override configuration file</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Change the time zone setting in z-hue-overrides.ini</span>
</span></span><span style="display:flex;"><span>sed -i &#39;s/time_zone=America\/Los_Angeles/time_zone=Asia\/Shanghai/g&#39; /data/server/hue-server/config/z-hue-overrides.ini
</span></span></code></pre></div><h3 id="4-重新创建容器包含时区和dns设置">4. Recreate the container (with time zone and DNS settings)</h3>
<h4 id="41-停止并删除旧容器">4.1 Stop and remove the old container</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker stop &lt;old_container_name&gt;
</span></span><span style="display:flex;"><span>docker rm &lt;old_container_name&gt;
</span></span></code></pre></div><h4 id="42-创建新容器">4.2 Create the new container</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker run -d --name hue_new <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -p 8888:8888 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -e <span style="color:#8be9fd;font-style:italic">TZ</span><span style="color:#ff79c6">=</span>Asia/Shanghai <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v /etc/localtime:/etc/localtime:ro <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v /usr/share/zoneinfo/Asia/Shanghai:/etc/timezone:ro <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v /data/server/hue-server/config:/usr/share/hue/desktop/conf <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  --dns<span style="color:#ff79c6">=</span>100.100.2.136 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  --dns<span style="color:#ff79c6">=</span>8.8.8.8 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  gethue/hue:latest
</span></span></code></pre></div><h3 id="5-修改文件浏览器模块时区处理">5. Patch time zone handling in the file browser module</h3>
<h4 id="51-备份原始文件">5.1 Back up the original file</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new cp /usr/share/hue/apps/filebrowser/src/filebrowser/views.py /usr/share/hue/apps/filebrowser/src/filebrowser/views.py.backup
</span></span></code></pre></div><h4 id="52-修改时区处理代码">5.2 Patch the time zone handling code</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Add the Django timezone import</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new sed -i <span style="color:#f1fa8c">&#34;s/from datetime import datetime/from datetime import datetime, timezone, timedelta\nfrom django.utils import timezone as django_timezone/g&#34;</span> /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Make the formatted modification time timezone-aware</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new sed -i <span style="color:#f1fa8c">&#34;s/datetime.fromtimestamp(stats.mtime).strftime(&#39;%B %d, %Y %I:%M %p&#39;)/django_timezone.make_aware(datetime.fromtimestamp(stats.mtime)).strftime(&#39;%B %d, %Y %I:%M %p&#39;)/g&#34;</span> /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span></code></pre></div><h4 id="53-清除python缓存">5.3 Clear the Python cache</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new find /usr/share/hue -name <span style="color:#f1fa8c">&#34;*.pyc&#34;</span> -path <span style="color:#f1fa8c">&#34;*/filebrowser/*&#34;</span> -delete
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new find /usr/share/hue -name <span style="color:#f1fa8c">&#34;__pycache__&#34;</span> -path <span style="color:#f1fa8c">&#34;*/filebrowser/*&#34;</span> -exec rm -rf <span style="color:#ff79c6">{}</span> <span style="color:#f1fa8c">\;</span> 2&gt;/dev/null <span style="color:#ff79c6">||</span> <span style="color:#8be9fd;font-style:italic">true</span>
</span></span></code></pre></div><h3 id="6-重启容器应用修改">6. Restart the container to apply the changes</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker restart hue_new
</span></span></code></pre></div><h3 id="7-验证修改结果">7. Verify the result</h3>
<h4 id="71-检查系统时区">7.1 Check the system time zone</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check the system time inside the container</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new date
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the TZ environment variable</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new env | grep TZ
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the time zone file</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new cat /etc/timezone
</span></span></code></pre></div><h4 id="72-检查hue应用时区">7.2 Check the Hue application time zone</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Inspect the Hue logs and confirm the timestamp format</span>
</span></span><span style="display:flex;"><span>docker logs hue_new --tail <span style="color:#bd93f9">10</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the Django time zone setting</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> hue_new /usr/share/hue/build/env/bin/python3 -c <span style="color:#f1fa8c">&#34;import os; os.environ.setdefault(&#39;DJANGO_SETTINGS_MODULE&#39;, &#39;desktop.settings&#39;); import django; django.setup(); from django.utils import timezone; print(&#39;Django timezone:&#39;, timezone.get_current_timezone())&#34;</span>
</span></span></code></pre></div><h4 id="73-检查文件浏览器时间显示">7.3 Check the time shown in the File Browser</h4>
<p>Open the Hue File Browser and check that the times shown for HDFS files are correct.</p>
<h2 id="关键配置说明">Key configuration notes</h2>
<h3 id="1-环境变量设置">1. Environment variables</h3>
<ul>
<li><code>TZ=Asia/Shanghai</code>: sets the container's system time zone</li>
<li><code>-v /etc/localtime:/etc/localtime:ro</code>: mounts the host's time zone file</li>
<li><code>-v /usr/share/zoneinfo/Asia/Shanghai:/etc/timezone:ro</code>: mounts the time zone information file</li>
</ul>
<h3 id="2-dns设置">2. DNS settings</h3>
<ul>
<li><code>--dns=100.100.2.136</code>: internal DNS server</li>
<li><code>--dns=8.8.8.8</code>: public DNS server</li>
</ul>
<h3 id="3-配置文件修改">3. Configuration file changes</h3>
<ul>
<li><code>hue.ini</code>: <code>time_zone=Asia/Shanghai</code> and <code>use_tz=true</code> in the main configuration file</li>
<li><code>z-hue-overrides.ini</code>: <code>time_zone=Asia/Shanghai</code> in the override configuration file</li>
</ul>
<h3 id="4-代码修改">4. Code changes</h3>
<ul>
<li>File: <code>/usr/share/hue/apps/filebrowser/src/filebrowser/views.py</code></li>
<li>Change: use Django's time zone settings when formatting file times</li>
</ul>
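<p>To illustrate why the code change matters: a naive <code>datetime.fromtimestamp()</code> call depends on the time zone of the running process, while passing an explicit zone does not. The following is a minimal standalone sketch, not Hue's actual code. The helper name <code>format_mtime</code> is hypothetical, and the fixed UTC+8 offset stands in for Asia/Shanghai on the assumption that the zone has no daylight saving time for the timestamps of interest.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +08:00 offset is used instead of a named zone so that
# the sketch does not depend on tzdata being installed.
CHINA_TZ = timezone(timedelta(hours=8), name="CST")


def format_mtime(mtime_seconds: float) -> str:
    """Format an epoch timestamp in UTC+8, independent of the process TZ."""
    local = datetime.fromtimestamp(mtime_seconds, tz=CHINA_TZ)
    # Same display format as the File Browser patch in the script.
    return local.strftime("%B %d, %Y %I:%M %p")


if __name__ == "__main__":
    print(format_mtime(0))
```

<p>Where tzdata is available, <code>zoneinfo.ZoneInfo("Asia/Shanghai")</code> could replace the fixed offset and would also cover historical daylight saving rules.</p>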
<h2 id="完整的一键脚本">Complete one-shot script</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff79c6">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4"># Complete script for changing the Hue time zone</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">CONTAINER_NAME</span><span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;hue_new&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">CONFIG_PATH</span><span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/data/server/hue-server/config&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Starting the Hue time zone change...&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 1. Back up the configuration</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Backing up the original configuration...&#34;</span>
</span></span><span style="display:flex;"><span>cp <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/hue.ini <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/hue.ini.backup.<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>
</span></span><span style="display:flex;"><span>cp <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/z-hue-overrides.ini <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/z-hue-overrides.ini.backup.<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 2. Edit the configuration files</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Updating the time zone configuration...&#34;</span>
</span></span><span style="display:flex;"><span>sed -i <span style="color:#f1fa8c">&#39;s/time_zone=America\/Los_Angeles/time_zone=Asia\/Shanghai/g&#39;</span> <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/hue.ini
</span></span><span style="display:flex;"><span>sed -i <span style="color:#f1fa8c">&#39;s/time_zone=America\/Los_Angeles/time_zone=Asia\/Shanghai/g&#39;</span> <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/z-hue-overrides.ini
</span></span><span style="display:flex;"><span>sed -i <span style="color:#f1fa8c">&#39;/time_zone=Asia\/Shanghai/a use_tz=true&#39;</span> <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>/hue.ini
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 3. Stop the old container</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Stopping the old container...&#34;</span>
</span></span><span style="display:flex;"><span>docker stop <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> 2&gt;/dev/null <span style="color:#ff79c6">||</span> <span style="color:#8be9fd;font-style:italic">true</span>
</span></span><span style="display:flex;"><span>docker rm <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> 2&gt;/dev/null <span style="color:#ff79c6">||</span> <span style="color:#8be9fd;font-style:italic">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 4. Create the new container</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Creating the new container...&#34;</span>
</span></span><span style="display:flex;"><span>docker run -d --name <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -p 8888:8888 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -e <span style="color:#8be9fd;font-style:italic">TZ</span><span style="color:#ff79c6">=</span>Asia/Shanghai <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v /etc/localtime:/etc/localtime:ro <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v /usr/share/zoneinfo/Asia/Shanghai:/etc/timezone:ro <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  -v <span style="color:#8be9fd;font-style:italic">$CONFIG_PATH</span>:/usr/share/hue/desktop/conf <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  --dns<span style="color:#ff79c6">=</span>100.100.2.136 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  --dns<span style="color:#ff79c6">=</span>8.8.8.8 <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>  gethue/hue:latest
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 5. Wait for startup</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Waiting for the container to start...&#34;</span>
</span></span><span style="display:flex;"><span>sleep <span style="color:#bd93f9">20</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 6. Patch the File Browser code</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Patching File Browser time zone handling...&#34;</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> cp /usr/share/hue/apps/filebrowser/src/filebrowser/views.py /usr/share/hue/apps/filebrowser/src/filebrowser/views.py.backup
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> sed -i <span style="color:#f1fa8c">&#34;s/from datetime import datetime/from datetime import datetime, timezone, timedelta\nfrom django.utils import timezone as django_timezone/g&#34;</span> /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> sed -i <span style="color:#f1fa8c">&#34;s/datetime.fromtimestamp(stats.mtime).strftime(&#39;%B %d, %Y %I:%M %p&#39;)/django_timezone.make_aware(datetime.fromtimestamp(stats.mtime)).strftime(&#39;%B %d, %Y %I:%M %p&#39;)/g&#34;</span> /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 7. Clear caches</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Clearing the Python cache...&#34;</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> find /usr/share/hue -name <span style="color:#f1fa8c">&#34;*.pyc&#34;</span> -path <span style="color:#f1fa8c">&#34;*/filebrowser/*&#34;</span> -delete
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> find /usr/share/hue -name <span style="color:#f1fa8c">&#34;__pycache__&#34;</span> -path <span style="color:#f1fa8c">&#34;*/filebrowser/*&#34;</span> -exec rm -rf <span style="color:#ff79c6">{}</span> <span style="color:#f1fa8c">\;</span> 2&gt;/dev/null <span style="color:#ff79c6">||</span> <span style="color:#8be9fd;font-style:italic">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 8. Restart the container</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Restarting the container to apply the changes...&#34;</span>
</span></span><span style="display:flex;"><span>docker restart <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 9. Wait for the restart</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Waiting for the container to restart...&#34;</span>
</span></span><span style="display:flex;"><span>sleep <span style="color:#bd93f9">25</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 10. Verify the result</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Verifying the time zone settings...&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;System time:&#34;</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> date
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Hue log timestamp format:&#34;</span>
</span></span><span style="display:flex;"><span>docker logs <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> --tail <span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Django time zone setting:&#34;</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> <span style="color:#8be9fd;font-style:italic">$CONTAINER_NAME</span> /usr/share/hue/build/env/bin/python3 -c <span style="color:#f1fa8c">&#34;import os; os.environ.setdefault(&#39;DJANGO_SETTINGS_MODULE&#39;, &#39;desktop.settings&#39;); import django; django.setup(); from django.utils import timezone; print(&#39;Django timezone:&#39;, timezone.get_current_timezone())&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Time zone change complete!&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Open http://localhost:8888 and check that the File Browser shows the correct times.&#34;</span>
</span></span></code></pre></div><h2 id="常见问题排查">Troubleshooting</h2>
<h3 id="1-dns解析问题">1. DNS resolution problems</h3>
<p>If you see a <code>Name or service not known</code> error:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check the DNS configuration</span>
</span></span><span style="display:flex;"><span>docker inspect &lt;container_name&gt; | grep -A <span style="color:#bd93f9">5</span> -B <span style="color:#bd93f9">5</span> -i dns
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Make sure the container was created with the correct DNS options (these are flags for docker run)</span>
</span></span><span style="display:flex;"><span>--dns<span style="color:#ff79c6">=</span>100.100.2.136 --dns<span style="color:#ff79c6">=</span>8.8.8.8
</span></span></code></pre></div><h3 id="2-时区仍然不正确">2. The time zone is still wrong</h3>
<p>Check the time zone settings in every configuration file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>grep -r -i &#34;time_zone\|timezone&#34; /data/server/hue-server/config/ | grep -v &#34;.backup&#34;
</span></span></code></pre></div><h3 id="3-文件浏览器时间显示不正确">3. The File Browser shows the wrong time</h3>
<p>Check that the code change was applied correctly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> &lt;container_name&gt; grep -n -A <span style="color:#bd93f9">2</span> -B <span style="color:#bd93f9">2</span> <span style="color:#f1fa8c">&#34;mtime.*datetime&#34;</span> /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span></code></pre></div><h3 id="4-容器无法启动">4. The container fails to start</h3>
<p>Check that the mount paths are correct:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Make sure the configuration path exists
</span></span><span style="display:flex;"><span>ls -la /data/server/hue-server/config/
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check the permissions on the mounted file
</span></span><span style="display:flex;"><span>ls -la /data/server/hue-server/config/hue.ini
</span></span></code></pre></div><h2 id="验证成功标志">Signs of success</h2>
<ul>
<li>Container system time: <code>Mon Sep  8 15:08:02 Asia 2025</code></li>
<li>Hue log timestamp format: <code>[08/Sep/2025 15:08:02 +0800]</code></li>
<li>Environment variable: <code>TZ=Asia/Shanghai</code></li>
<li>Django time zone: <code>Asia/Shanghai</code></li>
<li>HDFS file times: correct China time (for example 14:00-14:01)</li>
<li>The HDFS connection works, with no DNS resolution errors</li>
</ul>
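<p>The log-format check in the list above can also be done programmatically. The following is a small sketch that parses a timestamp in the bracketed format shown and returns its UTC offset in hours; the function name <code>log_offset_hours</code> is hypothetical, and the sketch assumes the Hue log line has already been reduced to just the bracketed timestamp.</p>

```python
from datetime import datetime, timedelta


def log_offset_hours(stamp: str) -> float:
    """Return the UTC offset, in hours, of a stamp like '[08/Sep/2025 15:08:02 +0800]'."""
    # Drop the surrounding brackets, then parse day/month-name/year and the offset.
    parsed = datetime.strptime(stamp.strip("[]"), "%d/%b/%Y %H:%M:%S %z")
    return parsed.utcoffset() / timedelta(hours=1)


if __name__ == "__main__":
    print(log_offset_hours("[08/Sep/2025 15:08:02 +0800]"))
```

<p>A result of 8.0 matches Asia/Shanghai; a result of 0.0 would suggest the application is still logging in UTC.</p>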
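<h2 id="注意事项">Notes</h2>
<ol>
<li><strong>Back up first</strong>: always back up the original configuration and code files before making changes</li>
<li><strong>DNS settings</strong>: make sure the container has the correct DNS configuration, otherwise it cannot connect to HDFS</li>
<li><strong>Configuration files</strong>: two files must be changed, <code>hue.ini</code> and <code>z-hue-overrides.ini</code></li>
<li><strong>Code changes</strong>: the time zone handling in the File Browser module must be patched</li>
<li><strong>Restart required</strong>: configuration and code changes only take effect after the container is restarted</li>
<li><strong>Permissions</strong>: make sure the mounted configuration files have the correct read and write permissions</li>
<li><strong>Cache cleanup</strong>: clear the Python cache after changing Python code</li>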
</ol>
<h2 id="回滚方法">Rollback</h2>
<p>If something goes wrong after the change, roll back with the following steps:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># 1. Restore the configuration files (pick the newest backup, since the glob can match several)</span>
</span></span><span style="display:flex;"><span>cp <span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">$(</span>ls -t /data/server/hue-server/config/hue.ini.backup.* | head -n 1<span style="color:#ff79c6">)</span><span style="color:#f1fa8c">&#34;</span> /data/server/hue-server/config/hue.ini
</span></span><span style="display:flex;"><span>cp <span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">$(</span>ls -t /data/server/hue-server/config/z-hue-overrides.ini.backup.* | head -n 1<span style="color:#ff79c6">)</span><span style="color:#f1fa8c">&#34;</span> /data/server/hue-server/config/z-hue-overrides.ini
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 2. Restore the code file</span>
</span></span><span style="display:flex;"><span>docker <span style="color:#8be9fd;font-style:italic">exec</span> &lt;container_name&gt; cp /usr/share/hue/apps/filebrowser/src/filebrowser/views.py.backup /usr/share/hue/apps/filebrowser/src/filebrowser/views.py
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 3. Restart the container</span>
</span></span><span style="display:flex;"><span>docker restart &lt;container_name&gt;
</span></span></code></pre></div>]]></content:encoded></item><item><title>HDFS Balancer Quick Reference</title><link>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer-fast/</link><pubDate>Wed, 01 May 2024 11:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer-fast/</guid><description>A quick reference for HDFS balancer operations: deciding when to balance, common commands, monitoring, tuning, and troubleshooting.</description><content:encoded><![CDATA[<h2 id="快速判断是否需要均衡">Quickly check whether balancing is needed</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Compute the current balance (standard deviation of DFS Used%)</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys<span style="color:#ff79c6">,</span> re
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
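</span></span><span style="display:flex;"><span><span style="color:#6272a4"># The first match is the cluster-wide summary line, not a DataNode, so drop it</span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> used_percents[<span style="color:#bd93f9">1</span>:]
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents: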
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;Std dev: </span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> std_dev <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">15</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;⚠️  Balance immediately&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">elif</span> std_dev <span style="color:#ff79c6">&gt;</span> <span style="color:#bd93f9">10</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;⚠️  Balancing recommended&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;✅ Cluster is balanced&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span></code></pre></div><h2 id="常用均衡命令">Common balancer commands</h2>
<h3 id="基本均衡">Basic balancing</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Standard balancing (recommended)
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Strict balancing
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Loose balancing
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 15 -policy datanode &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="高级均衡">Advanced balancing</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Exclude specific nodes
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -exclude 192.168.1.100,192.168.1.101 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Balance only specific nodes
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -include 192.168.1.102,192.168.1.103 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Specify the source nodes
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -source 192.168.1.100,192.168.1.101 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h2 id="参数说明">Parameters</h2>
<table>
  <thead>
      <tr>
          <th>Parameter</th>
          <th>Purpose</th>
          <th>Default</th>
          <th>Recommended</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>-threshold</code></td>
          <td>Balancing threshold (%)</td>
          <td>10</td>
          <td>5-15</td>
      </tr>
      <tr>
          <td><code>-policy</code></td>
          <td>Balancing policy</td>
          <td>datanode</td>
          <td>datanode</td>
      </tr>
      <tr>
          <td><code>-exclude</code></td>
          <td>Nodes to exclude</td>
          <td>-</td>
          <td>Nodes under maintenance</td>
      </tr>
      <tr>
          <td><code>-include</code></td>
          <td>Nodes to include</td>
          <td>-</td>
          <td>Specific nodes</td>
      </tr>
      <tr>
          <td><code>-source</code></td>
          <td>Source nodes</td>
          <td>-</td>
          <td>High-utilization nodes</td>
      </tr>
      <tr>
          <td><code>-idleiterations</code></td>
          <td>Maximum idle iterations before exit</td>
          <td>5</td>
          <td>3-5</td>
      </tr>
  </tbody>
</table>
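<p>The parameters in the table can be combined into a command line mechanically. The following is a minimal sketch of such a helper, assuming the six flags listed above; the function name <code>build_balancer_cmd</code> and its keyword arguments are hypothetical and exist only for illustration. It builds the argument list and does not run anything.</p>

```python
from typing import List, Optional, Sequence


def build_balancer_cmd(
    threshold: float = 10,
    policy: str = "datanode",
    exclude: Optional[Sequence[str]] = None,
    include: Optional[Sequence[str]] = None,
    source: Optional[Sequence[str]] = None,
    idle_iterations: Optional[int] = None,
) -> List[str]:
    """Build an `hdfs balancer` argument list from the table's parameters."""
    cmd = ["hdfs", "balancer", "-threshold", str(threshold), "-policy", policy]
    # Host lists are passed as a single comma-separated argument.
    for flag, hosts in (("-exclude", exclude), ("-include", include), ("-source", source)):
        if hosts:
            cmd += [flag, ",".join(hosts)]
    if idle_iterations is not None:
        cmd += ["-idleiterations", str(idle_iterations)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_balancer_cmd(exclude=["192.168.1.100", "192.168.1.101"])))
```

<p>The resulting list can be handed to <code>subprocess.Popen</code> with output redirected to a log file, mirroring the <code>nohup</code> commands above.</p>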
<h2 id="监控命令">Monitoring commands</h2>
<h3 id="检查均衡状态">Check balancer status</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check for a running balancer process</span>
</span></span><span style="display:flex;"><span>ps aux | grep balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Follow the balancer log</span>
</span></span><span style="display:flex;"><span>tail -f /tmp/balancer.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Monitor balancing progress in real time</span>
</span></span><span style="display:flex;"><span>python3 /tmp/monitor_hdfs_balancer.py
</span></span></code></pre></div><h3 id="停止均衡">Stop balancing</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Find and stop the balancer process
</span></span><span style="display:flex;"><span>pkill -f &#34;hdfs.*balancer&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Or stop it by PID (requires that the PID was saved at launch, e.g. with: echo $! &gt; /tmp/balancer.pid)
</span></span><span style="display:flex;"><span>kill $(cat /tmp/balancer.pid)
</span></span></code></pre></div><h2 id="性能优化">Performance tuning</h2>
<h3 id="调整均衡带宽">Adjust the balancer bandwidth</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- Add to hdfs-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>52428800<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="系统优化">System tuning</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Network tuning
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Disk tuning (kernels 5.0+ use blk-mq, where the scheduler is named none rather than noop; list the available values with: cat /sys/block/sda/queue/scheduler)
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span></code></pre></div><h2 id="故障排除">Troubleshooting</h2>
<h3 id="常见问题">Common problems</h3>
<ol>
<li><strong>The balancer will not start</strong>
<ul>
<li>Check the HDFS service status: <code>hdfs dfsadmin -report</code></li>
<li>Check which user you are running as: <code>whoami</code></li>
<li>Check the logs: <code>tail -f $HADOOP_LOG_DIR/hadoop-*-balancer-*.log</code></li>
</ul>
</li>
<li><strong>Balancing is too slow</strong>
<ul>
<li>Check the network: <code>iperf3 -c &lt;target_node&gt;</code></li>
<li>Check disk I/O: <code>iostat -x 1 5</code></li>
<li>Adjust the balancer bandwidth</li>
</ul>
</li>
<li><strong>The balancer exits unexpectedly</strong>
<ul>
<li>Check system resources: <code>free -h</code>, <code>df -h</code></li>
<li>Check the system log: <code>dmesg | tail -50</code></li>
<li>Restart the balancer</li>
</ul>
</li>
</ol>
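<p>For the slow-balancing case, it helps to know what a bandwidth setting means in practice. The following is a rough sketch of the arithmetic: it converts MiB/s to the byte value used by <code>dfs.datanode.balance.bandwidthPerSec</code> and estimates how long a given volume would take to move. The function names are hypothetical, and the estimate assumes the per-DataNode bandwidth limit is the only bottleneck and that <code>nodes</code> DataNodes transfer in parallel at that limit, so treat it as an optimistic lower bound.</p>

```python
MIB = 1024 * 1024


def bandwidth_bytes(mib_per_sec: int) -> int:
    """Convert MiB/s into the byte value for dfs.datanode.balance.bandwidthPerSec."""
    return mib_per_sec * MIB


def hours_to_move(gib_to_move: float, mib_per_sec: int, nodes: int = 1) -> float:
    """Optimistic time estimate, assuming `nodes` DataNodes each run at the limit."""
    total_mib = gib_to_move * 1024
    seconds = total_mib / (mib_per_sec * nodes)
    return seconds / 3600


if __name__ == "__main__":
    print(bandwidth_bytes(50))
    print(round(hours_to_move(1024, 50), 2))
```

<p>With the 50 MiB/s value from the configuration above, a single DataNode would need several hours to move 1 TiB, which is one reason to run the balancer during off-peak windows.</p>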
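<h2 id="最佳实践">Best practices</h2>
<ol>
<li><strong>Timing</strong>: run the balancer during off-peak hours</li>
<li><strong>Parameters</strong>: use a 5-10% threshold in production</li>
<li><strong>Monitoring and alerting</strong>: set up automated monitoring and alerts</li>
<li><strong>Batching</strong>: balance large clusters in batches</li>
<li><strong>Data verification</strong>: check data integrity after balancing</li>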
</ol>
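<p>For the monitoring item above, the standard-deviation check can be written as a standalone function that is easier to call from an alerting job. The following is a sketch under one stated assumption about the <code>hdfs dfsadmin -report</code> layout: each DataNode section starts with a <code>Name:</code> line, and the cluster-wide <code>DFS Used%</code> summary appears before the first such line. The function names are hypothetical.</p>

```python
import re
import statistics
from typing import List

USED_RE = re.compile(r"DFS Used%:\s*([\d.]+)%")


def node_used_percents(report: str) -> List[float]:
    """Collect one DFS Used% value per DataNode section of the report."""
    percents = []
    in_node = False
    for line in report.splitlines():
        if line.startswith("Name:"):
            in_node = True  # a new DataNode section begins
        match = USED_RE.search(line)
        if match and in_node:
            percents.append(float(match.group(1)))
            in_node = False  # take only the first value of each section
    return percents


def balance_std_dev(report: str) -> float:
    """Population standard deviation of per-node usage, or 0.0 with no nodes."""
    percents = node_used_percents(report)
    return statistics.pstdev(percents) if percents else 0.0
```

<p>Feeding it the captured report text, for example from <code>subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout</code>, yields a number that can be compared against the 10 and 15 thresholds used earlier.</p>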
<h2 id="自动化脚本">Automation scripts</h2>
<h3 id="一键均衡脚本">One-shot balancing script</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the balance and start the balancer automatically</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys<span style="color:#ff79c6">,</span> re
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;Current usage std dev: $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; 10&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;Starting balancer...&#34;</span>
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;Balancer started, PID: $!&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;Cluster is already balanced, nothing to do&#34;</span>
</span></span><span style="display:flex;"><span>fi
</span></span></code></pre></div><h2 id="监控脚本">Monitoring Script</h2>
<h3 id="简化监控脚本">Simplified Monitoring Script</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff79c6">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#6272a4"># Simplified balancer monitor</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">while</span> true; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;=== </span><span style="color:#ff79c6">$(</span>date<span style="color:#ff79c6">)</span><span style="color:#f1fa8c"> ===&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Check whether the balancer process is running</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> pgrep -f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> &gt; /dev/null; <span style="color:#ff79c6">then</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;✅ Balancer process is running&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;❌ Balancer process is not running&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Show per-node usage</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Per-node usage:&#34;</span>
</span></span><span style="display:flex;"><span>    hdfs dfsadmin -report | grep -E <span style="color:#f1fa8c">&#34;(Name:|DFS Used%:)&#34;</span> | <span style="color:#f1fa8c">\
</span></span></span><span style="display:flex;"><span>        awk <span style="color:#f1fa8c">&#39;/^Name:/{name=$0} /^DFS Used%:/ &amp;&amp; name{print name &#34; &#34; $0; name=&#34;&#34;}&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;----------------------------------------&#34;</span>
</span></span><span style="display:flex;"><span>    sleep <span style="color:#bd93f9">60</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">done</span>
</span></span></code></pre></div><hr>
<p><strong>Note</strong>: This quick reference is intended for day-to-day operations. For detailed procedures, see the full guide.</p>
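]]></content:encoded></item><item><title>Complete Guide to HDFS Balancer Operations</title><link>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer/</link><pubDate>Wed, 01 May 2024 10:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/05/hdfs-balancer/</guid><description>The HDFS Balancer is an important tool in the Hadoop Distributed File System that redistributes data blocks so that storage utilization stays relatively even across all DataNodes in the cluster. After nodes are added to or removed from a cluster, data distribution can become uneven, and the Balancer is then used to redistribute the data.</description><content:encoded><![CDATA[<h2 id="目录">Table of Contents</h2>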
<ul>
<li><a href="#%E6%A6%82%E8%BF%B0">Overview</a></li>
<li><a href="#%E4%BB%80%E4%B9%88%E6%97%B6%E5%80%99%E9%9C%80%E8%A6%81hdfs%E5%9D%87%E8%A1%A1">When HDFS Balancing Is Needed</a></li>
<li><a href="#hdfs%E5%9D%87%E8%A1%A1%E5%8E%9F%E7%90%86">How HDFS Balancing Works</a></li>
<li><a href="#%E5%9D%87%E8%A1%A1%E5%8F%82%E6%95%B0%E8%AF%A6%E8%A7%A3">Balancer Parameters in Detail</a></li>
<li><a href="#%E6%93%8D%E4%BD%9C%E6%AD%A5%E9%AA%A4">Procedure</a></li>
<li><a href="#%E7%9B%91%E6%8E%A7%E5%92%8C%E7%AE%A1%E7%90%86">Monitoring and Management</a></li>
<li><a href="#%E6%95%85%E9%9A%9C%E6%8E%92%E9%99%A4">Troubleshooting</a></li>
<li><a href="#%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5">Best Practices</a></li>
<li><a href="#%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96%E5%BB%BA%E8%AE%AE">Performance Tuning Recommendations</a></li>
</ul>
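<h2 id="概述">Overview</h2>
<p>The HDFS Balancer is an important tool in the Hadoop Distributed File System that redistributes data blocks so that storage utilization stays relatively even across all DataNodes in the cluster. After nodes are added to or removed from a cluster, data distribution can become uneven, and the Balancer is then used to redistribute the data.</p>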
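<h2 id="什么时候需要hdfs均衡">When HDFS Balancing Is Needed</h2>
<h3 id="1-集群扩容后">1. After Scaling Out the Cluster</h3>
<ul>
<li><strong>New DataNodes</strong>: A newly added node starts at 0% storage utilization, while existing nodes may already be close to full</li>
<li><strong>Added storage devices</strong>: After new disks are added to an existing DataNode</li>
</ul>
<h3 id="2-集群缩容后">2. After Scaling In the Cluster</h3>
<ul>
<li><strong>Removed DataNodes</strong>: A node's data must be migrated to other nodes before the node is taken offline</li>
<li><strong>Disk failure</strong>: After a disk fails, data needs to be redistributed</li>
</ul>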
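<h3 id="3-数据倾斜">3. Data Skew</h3>
<ul>
<li><strong>Large utilization gap between nodes</strong>: The standard deviation of per-node utilization exceeds 10-15%</li>
<li><strong>Hot data</strong>: Some nodes store a disproportionate share of hot data</li>
<li><strong>Uneven write patterns</strong>: Application write patterns lead to uneven data distribution</li>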
</ul>
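<p>The standard-deviation heuristic above can be sketched in a few lines of Python. This is a minimal illustration, not something the Balancer itself computes: the function name <code>classify_balance</code>, the verdict labels, and the "monitor" band between 5% and 10% are assumptions made for the example, while the 5/10/20 cut-offs are the criteria this guide uses.</p>

```python
from statistics import pstdev


def classify_balance(used_percents, recommend_above=10.0,
                     urgent_above=20.0, balanced_below=5.0):
    """Classify a cluster by the population std dev of per-node DFS Used%.

    The thresholds default to this guide's criteria; the returned labels
    are hypothetical names chosen for the example.
    """
    if not used_percents:
        raise ValueError("need at least one DataNode usage value")
    std_dev = pstdev(used_percents)
    if std_dev > urgent_above:
        verdict = "balance immediately"
    elif std_dev > recommend_above:
        verdict = "balancing recommended"
    elif std_dev < balanced_below:
        verdict = "balanced"
    else:
        # Between the "balanced" and "recommended" cut-offs: keep watching.
        verdict = "monitor"
    return std_dev, verdict
```

<p>The input would typically be the per-DataNode <code>DFS Used%</code> values from <code>hdfs dfsadmin -report</code>, excluding the cluster-wide summary line.</p>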
<h3 id="4-性能优化">4. Performance Optimization</h3>
<ul>
<li><strong>Load balancing</strong>: Improves overall cluster I/O performance</li>
<li><strong>Failure recovery</strong>: Keeps replica placement reasonable</li>
</ul>
<h3 id="判断标准">Decision Criteria</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Compute the standard deviation of per-node utilization
</span></span><span style="display:flex;"><span># Std dev &gt; 10%: balancing is recommended
</span></span><span style="display:flex;"><span># Std dev &gt; 20%: balancing immediately is strongly recommended
</span></span><span style="display:flex;"><span># Std dev &lt; 5%: considered balanced
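</span></span></code></pre></div><h2 id="hdfs均衡原理">How HDFS Balancing Works</h2>
<h3 id="1-均衡策略">1. Balancing Policies</h3>
<ul>
<li><strong>DataNode policy</strong>: Balances based on the utilization of each DataNode as a whole</li>
<li><strong>BlockPool policy</strong>: Balances based on the utilization of each namespace's block pool (for Federation)</li>
</ul>
<h3 id="2-均衡算法">2. Balancing Algorithm</h3>
<ol>
<li><strong>Identify source nodes</strong>: Nodes whose utilization is above the average</li>
<li><strong>Identify target nodes</strong>: Nodes whose utilization is below the average</li>
<li><strong>Select blocks</strong>: Pick suitable blocks from the source nodes</li>
<li><strong>Move data</strong>: Copy each block to the target through a proxy node, then remove the replica from the source</li>
<li><strong>Verify integrity</strong>: Confirm that the move succeeded</li>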
</ol>
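<p>The first two steps can be sketched as follows. This is a simplified illustration and not the Balancer's implementation: it takes the plain mean of per-node percentages as the average (which matches the cluster-wide utilization only when all nodes have equal capacity), and the function and group names are hypothetical. The four-way split mirrors the idea that nodes more than the threshold above or below the average are the ones that most need data moved off or onto them.</p>

```python
def classify_nodes(usage, threshold=10.0):
    """Group DataNodes by how far their DFS Used% is from the average.

    usage: dict mapping node name -> DFS Used% (float).
    Returns (average, groups), where groups has four lists of node names.
    """
    if not usage:
        raise ValueError("usage must not be empty")
    avg = sum(usage.values()) / len(usage)
    groups = {"over": [], "above_avg": [], "below_avg": [], "under": []}
    for name, used in sorted(usage.items()):
        if used > avg + threshold:
            groups["over"].append(name)       # candidate source, outside the band
        elif used > avg:
            groups["above_avg"].append(name)  # candidate source, inside the band
        elif used >= avg - threshold:
            groups["below_avg"].append(name)  # candidate target, inside the band
        else:
            groups["under"].append(name)      # candidate target, outside the band
    return avg, groups
```

<p>With this grouping, a cluster would count as balanced for the given threshold when both the <code>over</code> and <code>under</code> lists are empty.</p>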
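<h3 id="3-数据迁移过程">3. Data Migration Process</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Source node → proxy node → target node
</span></span></code></pre></div><ul>
<li>The target does not have to read from the source directly, which eases network pressure on the over-utilized node</li>
<li>A proxy node that already holds a replica of the block serves the copy to the target</li>
<li>The replica on the source is removed only after the copy completes, which preserves data integrity and consistency</li>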
</ul>
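<h2 id="均衡参数详解">Balancer Parameters in Detail</h2>
<h3 id="1-基本参数">1. Basic Parameters</h3>
<h4 id="-threshold-threshold"><code>-threshold &lt;threshold&gt;</code></h4>
<ul>
<li><strong>Purpose</strong>: Sets the balancing threshold, as a percentage of disk capacity</li>
<li><strong>Default</strong>: 10%</li>
<li><strong>Details</strong>: The cluster counts as balanced once every DataNode's utilization is within this many percentage points of the cluster-wide average; nodes outside that band are the ones the balancer works on</li>
<li><strong>Recommended values</strong>:</li>
<li>Production: 5-10%</li>
<li>Test environments: 10-15%</li>
<li>Emergencies: 20%</li>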
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Examples</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">5</span>    <span style="color:#6272a4"># 5% threshold: stricter balancing</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">15</span>   <span style="color:#6272a4"># 15% threshold: looser balancing</span>
</span></span></code></pre></div><h4 id="-policy-policy"><code>-policy &lt;policy&gt;</code></h4>
<ul>
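<li><strong>Purpose</strong>: Selects the balancing policy</li>
<li><strong>Allowed values</strong>:</li>
<li><code>datanode</code>: based on DataNode utilization (default)</li>
<li><code>blockpool</code>: based on block pool utilization</li>
<li><strong>Recommendation</strong>: Use the <code>datanode</code> policy in most cases</li>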
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Examples</span>
</span></span><span style="display:flex;"><span>hdfs balancer -policy datanode    <span style="color:#6272a4"># DataNode policy</span>
</span></span><span style="display:flex;"><span>hdfs balancer -policy blockpool   <span style="color:#6272a4"># BlockPool policy</span>
</span></span></code></pre></div><h3 id="2-节点选择参数">2. Node Selection Parameters</h3>
<h4 id="-exclude--f-hosts-file--comma-separated-list-of-hosts"><code>-exclude [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
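<li><strong>Purpose</strong>: Excludes the specified DataNodes from balancing</li>
<li><strong>Use cases</strong>:</li>
<li>Nodes under maintenance</li>
<li>Nodes with poor performance</li>
<li>Nodes with unstable network connectivity</li>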
</ul>
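<p>When the exclusion list is maintained by a script, the command line can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The helper name <code>build_balancer_command</code> is hypothetical, and writing one host per line to the hosts file is an assumption about the file format; the <code>-threshold</code>, <code>-exclude</code>, and <code>-f</code> flags are the ones described in this guide.</p>

```python
from pathlib import Path


def build_balancer_command(threshold=10, exclude_hosts=None, hosts_file=None):
    """Return the argv list for an `hdfs balancer` run.

    If hosts_file is given, the excluded hosts are written to it
    (one per line, an assumed format) and passed with `-exclude -f`;
    otherwise they are passed inline as a comma-separated list.
    """
    cmd = ["hdfs", "balancer", "-threshold", str(threshold)]
    if exclude_hosts:
        if hosts_file is not None:
            Path(hosts_file).write_text("\n".join(exclude_hosts) + "\n")
            cmd += ["-exclude", "-f", str(hosts_file)]
        else:
            cmd += ["-exclude", ",".join(exclude_hosts)]
    return cmd
```

<p>The resulting list can be handed to <code>subprocess.Popen</code> or joined for logging. Keeping it as a list avoids shell-quoting problems with host names.</p>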
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Exclude a single node</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude 192.168.1.100
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Exclude multiple nodes</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude 192.168.1.100,192.168.1.101
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Read the exclusion list from a file</span>
</span></span><span style="display:flex;"><span>hdfs balancer -exclude -f /path/to/exclude_hosts.txt
</span></span></code></pre></div><h4 id="-include--f-hosts-file--comma-separated-list-of-hosts"><code>-include [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
<li><strong>Purpose</strong>: Balances only the specified DataNodes</li>
<li><strong>Use cases</strong>:</li>
<li>Balancing only particular nodes</li>
<li>Test environments</li>
<li>Maintenance on a subset of nodes</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Balance only the specified nodes</span>
</span></span><span style="display:flex;"><span>hdfs balancer -include 192.168.1.100,192.168.1.101
</span></span></code></pre></div><h4 id="-source--f-hosts-file--comma-separated-list-of-hosts"><code>-source [-f &lt;hosts-file&gt; | &lt;comma-separated list of hosts&gt;]</code></h4>
<ul>
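<li><strong>Purpose</strong>: Restricts the source nodes, that is, the nodes data is moved off</li>
<li><strong>Use cases</strong>:</li>
<li>Specific nodes need their load reduced</li>
<li>A node is about to be taken offline</li>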
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Specify source nodes</span>
</span></span><span style="display:flex;"><span>hdfs balancer -source 192.168.1.100,192.168.1.101
</span></span></code></pre></div><h3 id="3-性能控制参数">3. Performance Control Parameters</h3>
<h4 id="-idleiterations-idleiterations"><code>-idleiterations &lt;idleiterations&gt;</code></h4>
<ul>
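<li><strong>Purpose</strong>: Sets the maximum number of consecutive idle iterations before the balancer exits</li>
<li><strong>Default</strong>: 5</li>
<li><strong>Details</strong>: The balancer exits after N consecutive iterations in which no data was moved</li>
<li><strong>Recommended values</strong>:</li>
<li>Production: 3-5</li>
<li>Test environments: 1-2</li>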
</ul>
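<p>The exit rule described above can be illustrated with a small simulation. This is a sketch of the counting logic only, under the assumption that the idle counter resets whenever an iteration moves any data. The function name and the list of per-iteration byte counts are hypothetical inputs, not Balancer output.</p>

```python
def iterations_until_exit(bytes_moved_per_iteration, idle_iterations=5):
    """Return the 1-based iteration at which the loop would stop.

    Returns None if the sequence ends before `idle_iterations`
    consecutive iterations with no data moved have been seen.
    """
    idle = 0
    for i, moved in enumerate(bytes_moved_per_iteration, start=1):
        if moved > 0:
            idle = 0          # progress was made, so reset the idle counter
        else:
            idle += 1
            if idle >= idle_iterations:
                return i
    return None
```

<p>A lower value makes the balancer give up sooner when it stops making progress, while a higher value tolerates temporary stalls, for example while other nodes are busy.</p>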
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Example</span>
</span></span><span style="display:flex;"><span>hdfs balancer -idleiterations <span style="color:#bd93f9">3</span>  <span style="color:#6272a4"># Exit after 3 consecutive iterations with no data moved</span>
</span></span></code></pre></div><h4 id="-runduringupgrade"><code>-runDuringUpgrade</code></h4>
<ul>
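<li><strong>Purpose</strong>: Allows the balancer to run while an HDFS upgrade is in progress</li>
<li><strong>Default</strong>: false</li>
<li><strong>Details</strong>: Running the balancer during an upgrade is generally not recommended</li>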
</ul>
<h3 id="4-高级参数">4. Advanced Parameters</h3>
<h4 id="-blockpools-comma-separated-list-of-blockpool-ids"><code>-blockpools &lt;comma-separated list of blockpool ids&gt;</code></h4>
<ul>
<li><strong>Purpose</strong>: Specifies the block pool IDs to balance</li>
<li><strong>Applies to</strong>: Federation environments</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Example</span>
</span></span><span style="display:flex;"><span>hdfs balancer -blockpools BP-XXXXXXXXX-192.168.1.100-XXXXXXXXXXXXX
</span></span></code></pre></div><h2 id="操作步骤">Procedure</h2>
<h3 id="1-环境检查">1. Environment Checks</h3>
<h4 id="检查集群状态">Check Cluster Status</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check HDFS status</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check NameNode state</span>
</span></span><span style="display:flex;"><span>hdfs haadmin -getServiceState nn1
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check DataNode status (rack topology)</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -printTopology
</span></span></code></pre></div><h4 id="检查磁盘空间">Check Disk Space</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-php" data-lang="php"><span style="display:flex;"><span><span style="color:#6272a4"># Check disk usage on each node
</span></span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> node in $(hdfs dfsadmin <span style="color:#ff79c6">-</span>printTopology <span style="color:#ff79c6">|</span> grep <span style="color:#ff79c6">-</span>o <span style="color:#f1fa8c">&#39;[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+&#39;</span>); <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">echo</span> <span style="color:#f1fa8c">&#34;=== </span><span style="color:#f1fa8c">$node</span><span style="color:#f1fa8c"> ===&#34;</span>
</span></span><span style="display:flex;"><span>    ssh <span style="color:#8be9fd;font-style:italic">$node</span> <span style="color:#f1fa8c">&#34;df -h&#34;</span>
</span></span><span style="display:flex;"><span>done
</span></span></code></pre></div><h4 id="检查网络状况">Check Network Conditions</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check network latency to each node</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -printTopology | grep -o <span style="color:#f1fa8c">&#39;[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+&#39;</span> | <span style="color:#ff79c6">while</span> <span style="color:#8be9fd;font-style:italic">read</span> node; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;Testing </span><span style="color:#8be9fd;font-style:italic">$node</span><span style="color:#f1fa8c">...&#34;</span>
</span></span><span style="display:flex;"><span>    ping -c <span style="color:#bd93f9">3</span> <span style="color:#8be9fd;font-style:italic">$node</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">done</span>
</span></span></code></pre></div><h3 id="2-均衡前准备">2. Preparation Before Balancing</h3>
<h4 id="备份重要配置">Back Up Important Configuration</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Back up the HDFS configuration</span>
</span></span><span style="display:flex;"><span>cp -r <span style="color:#8be9fd;font-style:italic">$HADOOP_CONF_DIR</span> /backup/hdfs_conf_<span style="color:#ff79c6">$(</span>date +%Y%m%d<span style="color:#ff79c6">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Record the current state</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report &gt; /backup/hdfs_report_<span style="color:#ff79c6">$(</span>date +%Y%m%d_%H%M%S<span style="color:#ff79c6">)</span>.txt
</span></span></code></pre></div><h4 id="设置均衡参数">Set Balancing Parameters</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span># Set the balancing bandwidth (optional)
</span></span><span style="display:flex;"><span># Add the following to hdfs-site.xml:
</span></span><span style="display:flex;"><span># <span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>#   <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>#   <span style="color:#ff79c6">&lt;value&gt;</span>10485760<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 10MB/s --&gt;</span>
</span></span><span style="display:flex;"><span># <span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="3-执行均衡">3. Run the Balancer</h3>
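<p>Before starting a run, the bandwidth limit from the previous step can also be adjusted at runtime with <code>hdfs dfsadmin -setBalancerBandwidth</code>, which takes a value in bytes per second and is not persisted across DataNode restarts. The sketch below converts a figure in MB/s to that value and builds the command without executing it. The helper names are hypothetical, and MB is taken to mean 1024 × 1024 bytes, which matches the 10485760 used for 10 MB/s in this guide.</p>

```python
def bandwidth_bytes(mb_per_sec):
    """Convert MB/s (1 MB = 1024 * 1024 bytes) to bytes per second."""
    if mb_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return int(mb_per_sec * 1024 * 1024)


def set_bandwidth_command(mb_per_sec):
    """Return the argv list for changing the balancer bandwidth at runtime."""
    return ["hdfs", "dfsadmin", "-setBalancerBandwidth",
            str(bandwidth_bytes(mb_per_sec))]
```

<p>For example, <code>set_bandwidth_command(10)</code> yields the same 10485760 bytes per second as the <code>hdfs-site.xml</code> example, so the two settings can be kept consistent.</p>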
<h4 id="基本均衡命令">Basic Balancing Command</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Standard balancing command
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Capture the process ID
</span></span><span style="display:flex;"><span>BALANCER_PID=$!
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Save the PID
</span></span><span style="display:flex;"><span>echo $BALANCER_PID &gt; /tmp/hdfs_balancer.pid
</span></span></code></pre></div><h4 id="高级均衡命令">Advanced Balancing Commands</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Balance while excluding specific nodes
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode \
</span></span><span style="display:flex;"><span>    -exclude 192.168.1.100,192.168.1.101 \
</span></span><span style="display:flex;"><span>    -idleiterations 3 &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Balance only specific nodes
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 -policy datanode \
</span></span><span style="display:flex;"><span>    -include 192.168.1.102,192.168.1.103 &gt; /tmp/hdfs_balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="4-监控均衡进度">4. Monitor Balancing Progress</h3>
<h4 id="实时监控脚本">Real-Time Monitoring Script</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4">#!/usr/bin/env python3</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># monitor_hdfs_balancer.py</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> subprocess
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> time
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> datetime <span style="color:#ff79c6">import</span> datetime
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">get_hdfs_report</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;hdfs&#39;</span>, <span style="color:#f1fa8c">&#39;dfsadmin&#39;</span>, <span style="color:#f1fa8c">&#39;-report&#39;</span>],
</span></span><span style="display:flex;"><span>                              stdout<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE, stderr<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE,
</span></span><span style="display:flex;"><span>                              universal_newlines<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>, check<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> result<span style="color:#ff79c6">.</span>stdout
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> subprocess<span style="color:#ff79c6">.</span>CalledProcessError <span style="color:#ff79c6">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;Failed to get HDFS report: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(e))
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">parse_datanode_info</span>(report):
</span></span><span style="display:flex;"><span>    datanodes <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>    lines <span style="color:#ff79c6">=</span> report<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    current_node <span style="color:#ff79c6">=</span> {}
</span></span><span style="display:flex;"><span>    in_datanode_section <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">False</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> lines:
</span></span><span style="display:flex;"><span>        line <span style="color:#ff79c6">=</span> line<span style="color:#ff79c6">.</span>strip()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;Name:&#39;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> current_node:
</span></span><span style="display:flex;"><span>                datanodes<span style="color:#ff79c6">.</span>append(current_node)
</span></span><span style="display:flex;"><span>            current_node <span style="color:#ff79c6">=</span> {<span style="color:#f1fa8c">&#39;name&#39;</span>: line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>, <span style="color:#bd93f9">1</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()}
</span></span><span style="display:flex;"><span>            in_datanode_section <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">True</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">elif</span> in_datanode_section <span style="color:#ff79c6">and</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span>):
</span></span><span style="display:flex;"><span>            current_node[<span style="color:#f1fa8c">&#39;used_percent&#39;</span>] <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()<span style="color:#ff79c6">.</span>replace(<span style="color:#f1fa8c">&#39;%&#39;</span>, <span style="color:#f1fa8c">&#39;&#39;</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">elif</span> in_datanode_section <span style="color:#ff79c6">and</span> line<span style="color:#ff79c6">.</span>startswith(<span style="color:#f1fa8c">&#39;DFS Remaining%:&#39;</span>):
</span></span><span style="display:flex;"><span>            current_node[<span style="color:#f1fa8c">&#39;remaining_percent&#39;</span>] <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(line<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;:&#39;</span>)[<span style="color:#bd93f9">1</span>]<span style="color:#ff79c6">.</span>strip()<span style="color:#ff79c6">.</span>replace(<span style="color:#f1fa8c">&#39;%&#39;</span>, <span style="color:#f1fa8c">&#39;&#39;</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> current_node:
</span></span><span style="display:flex;"><span>        datanodes<span style="color:#ff79c6">.</span>append(current_node)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> datanodes
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">calculate_balance_metrics</span>(datanodes):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> datanodes:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#ff79c6">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    used_percents <span style="color:#ff79c6">=</span> [node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>) <span style="color:#ff79c6">for</span> node <span style="color:#ff79c6">in</span> datanodes]
</span></span><span style="display:flex;"><span>    avg_used_percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Compute the standard deviation</span>
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg_used_percent) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Find the most- and least-utilized nodes</span>
</span></span><span style="display:flex;"><span>    max_used_node <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">max</span>(datanodes, key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>))
</span></span><span style="display:flex;"><span>    min_used_node <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">min</span>(datanodes, key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;avg_used_percent&#39;</span>: avg_used_percent,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;std_dev&#39;</span>: std_dev,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;max_used_node&#39;</span>: max_used_node,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;min_used_node&#39;</span>: min_used_node,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#39;datanodes&#39;</span>: datanodes
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">monitor_balancer</span>():
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;HDFS balancer monitoring started...&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=&#34;</span> <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">80</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    start_time <span style="color:#ff79c6">=</span> datetime<span style="color:#ff79c6">.</span>now()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">try</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">while</span> <span style="color:#ff79c6">True</span>:
</span></span><span style="display:flex;"><span>            current_time <span style="color:#ff79c6">=</span> datetime<span style="color:#ff79c6">.</span>now()
</span></span><span style="display:flex;"><span>            elapsed <span style="color:#ff79c6">=</span> current_time <span style="color:#ff79c6">-</span> start_time
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            report <span style="color:#ff79c6">=</span> get_hdfs_report()
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> report:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] Failed to fetch the HDFS report&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>)))
</span></span><span style="display:flex;"><span>                time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">30</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            datanodes <span style="color:#ff79c6">=</span> parse_datanode_info(report)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> datanodes:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] Failed to parse DataNode information&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>)))
</span></span><span style="display:flex;"><span>                time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">30</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            metrics <span style="color:#ff79c6">=</span> calculate_balance_metrics(datanodes)
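</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> <span style="color:#ff79c6">not</span> metrics:
</span></span><span style="display:flex;"><span>                time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">30</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">continue</span>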
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># Show the current status</span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">[</span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">] Elapsed: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(current_time<span style="color:#ff79c6">.</span>strftime(<span style="color:#f1fa8c">&#39;%H:%M:%S&#39;</span>), elapsed))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;Average usage: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;avg_used_percent&#39;</span>]))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;Balance (std dev): </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>]))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;Most-utilized node: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c"> (</span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;max_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>),
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;max_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)))
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;Least-utilized node: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c"> (</span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;min_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>),
</span></span><span style="display:flex;"><span>                metrics[<span style="color:#f1fa8c">&#39;min_used_node&#39;</span>]<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">Per-node usage:&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">for</span> node <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">sorted</span>(metrics[<span style="color:#f1fa8c">&#39;datanodes&#39;</span>], key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">lambda</span> x: x<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>), reverse<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>):
</span></span><span style="display:flex;"><span>                name <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;name&#39;</span>, <span style="color:#f1fa8c">&#39;N/A&#39;</span>)
</span></span><span style="display:flex;"><span>                used_pct <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;used_percent&#39;</span>, <span style="color:#bd93f9">0</span>)
</span></span><span style="display:flex;"><span>                remaining_pct <span style="color:#ff79c6">=</span> node<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;remaining_percent&#39;</span>, <span style="color:#bd93f9">0</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;  </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">% (remaining: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%)&#34;</span><span style="color:#ff79c6">.</span>format(name, used_pct, remaining_pct))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># Stop once the standard deviation drops below 5 percentage points</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>] <span style="color:#ff79c6">&lt;</span> <span style="color:#bd93f9">5.0</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">🎉 Balancing complete! Std dev: </span><span style="color:#f1fa8c">{:.2f}</span><span style="color:#f1fa8c">%&#34;</span><span style="color:#ff79c6">.</span>format(metrics[<span style="color:#f1fa8c">&#39;std_dev&#39;</span>]))
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;=&#34;</span> <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">80</span>)
</span></span><span style="display:flex;"><span>            time<span style="color:#ff79c6">.</span>sleep(<span style="color:#bd93f9">60</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> KeyboardInterrupt:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n\n</span><span style="color:#f1fa8c">Monitoring stopped&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">except</span> Exception <span style="color:#ff79c6">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">Monitoring error: </span><span style="color:#f1fa8c">{}</span><span style="color:#f1fa8c">&#34;</span><span style="color:#ff79c6">.</span>format(e))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">__name__</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;__main__&#34;</span>:
</span></span><span style="display:flex;"><span>    monitor_balancer()
</span></span></code></pre></div><h4 id="手动检查命令">Manual check commands</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check whether the balancer process is running</span>
</span></span><span style="display:flex;"><span>ps aux <span style="color:#ff79c6">|</span> grep balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Follow the balancer log</span>
</span></span><span style="display:flex;"><span>tail <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>hdfs_balancer<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check cluster status</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> grep <span style="color:#ff79c6">-</span>A <span style="color:#bd93f9">20</span> <span style="color:#f1fa8c">&#34;Live datanodes&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Compute the current balance</span>
</span></span><span style="display:flex;"><span>python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> subprocess
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>result <span style="color:#ff79c6">=</span> subprocess<span style="color:#ff79c6">.</span>run([<span style="color:#f1fa8c">&#39;hdfs&#39;</span>, <span style="color:#f1fa8c">&#39;dfsadmin&#39;</span>, <span style="color:#f1fa8c">&#39;-report&#39;</span>],
</span></span><span style="display:flex;"><span>                       stdout<span style="color:#ff79c6">=</span>subprocess<span style="color:#ff79c6">.</span>PIPE, universal_newlines<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>report <span style="color:#ff79c6">=</span> result<span style="color:#ff79c6">.</span>stdout
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> report<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;Average usage: </span><span style="color:#f1fa8c">{</span>avg<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;Std dev: </span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;Highest usage: </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">max</span>(used_percents)<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;Lowest usage: </span><span style="color:#f1fa8c">{</span><span style="color:#8be9fd;font-style:italic">min</span>(used_percents)<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">%&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span></code></pre></div><h2 id="监控和管理">Monitoring and management</h2>
<h3 id="1-均衡状态监控">1. Balance status monitoring</h3>
<h4 id="实时监控">Real-time monitoring</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Start the monitoring script</span>
</span></span><span style="display:flex;"><span>python3 /tmp/monitor_hdfs_balancer.py
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Run the monitor in the background</span>
</span></span><span style="display:flex;"><span>nohup python3 /tmp/monitor_hdfs_balancer.py &gt; /tmp/balancer_monitor.log 2&gt;&amp;<span style="color:#bd93f9">1</span> &amp;
</span></span></code></pre></div><h4 id="定期检查">Periodic checks</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Create the periodic check script</span>
</span></span><span style="display:flex;"><span>cat &gt; /tmp/check_balance.sh <span style="color:#f1fa8c">&lt;&lt; &#39;EOF&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">LOG_FILE=&#34;/tmp/balance_check.log&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">DATE=$(date &#39;+%Y-%m-%d %H:%M:%S&#39;)
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;[$DATE] Starting HDFS balance check&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c"># Check the balancer process
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">if pgrep -f &#34;hdfs.*balancer&#34; &gt; /dev/null; then
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    echo &#34;[$DATE] Balancer process is running&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">else
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">    echo &#34;[$DATE] Warning: balancer process is not running&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">fi
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c"># Check cluster status
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">hdfs dfsadmin -report | grep -A 20 &#34;Live datanodes&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;[$DATE] Check complete&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">echo &#34;----------------------------------------&#34; &gt;&gt; $LOG_FILE
</span></span></span><span style="display:flex;"><span><span style="color:#f1fa8c">EOF</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod +x /tmp/check_balance.sh
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Append to the current crontab instead of overwriting it; runs every 10 minutes</span>
</span></span><span style="display:flex;"><span>(crontab -l 2&gt;/dev/null; <span style="color:#8be9fd;font-style:italic">echo</span> <span style="color:#f1fa8c">&#34;*/10 * * * * /tmp/check_balance.sh&#34;</span>) | crontab -
</span></span></code></pre></div><h3 id="2-均衡进程管理">2. Balancer process management</h3>
<h4 id="启动均衡">Starting the balancer</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Basic start
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Advanced start
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 5 -policy datanode \
</span></span><span style="display:flex;"><span>    -exclude 192.168.1.100 -idleiterations 3 \
</span></span><span style="display:flex;"><span>    &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h4 id="停止均衡">Stopping the balancer</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Find the balancer process
</span></span><span style="display:flex;"><span>BALANCER_PID=$(pgrep -f &#34;hdfs.*balancer&#34;)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Stop the balancer process
</span></span><span style="display:flex;"><span>if [ -n &#34;$BALANCER_PID&#34; ]; then
</span></span><span style="display:flex;"><span>    kill $BALANCER_PID
</span></span><span style="display:flex;"><span>    echo &#34;Sent SIGTERM to balancer process $BALANCER_PID&#34;
</span></span><span style="display:flex;"><span>else
</span></span><span style="display:flex;"><span>    echo &#34;No running balancer process found&#34;
</span></span><span style="display:flex;"><span>fi
</span></span></code></pre></div><h4 id="重启均衡">Restarting the balancer</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Stop the running balancer
</span></span><span style="display:flex;"><span>pkill -f &#34;hdfs.*balancer&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Wait for the process to exit
</span></span><span style="display:flex;"><span>sleep 5
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Start the balancer again
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="3-日志分析">3. Log analysis</h3>
<h4 id="均衡日志分析">Balancer log analysis</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Count the blocks moved
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | wc -l
</span></span><span style="display:flex;"><span>
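</span></span><span style="display:flex;"><span># Sum the data moved (assumes each line carries a size=&lt;bytes&gt; field; verify against your log format)
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | \
</span></span><span style="display:flex;"><span>    awk &#39;match($0, /size=[0-9]+/) {sum += substr($0, RSTART + 5, RLENGTH - 5)} END {print &#34;Total data moved: &#34; sum/1024/1024/1024 &#34; GB&#34;}&#39;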
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Estimate the move speed from the last 100 moves
</span></span><span style="display:flex;"><span># (assumes field 2 is an HH:MM:SS timestamp and each line carries a size=&lt;bytes&gt; field)
</span></span><span style="display:flex;"><span>grep &#34;Successfully moved&#34; /tmp/hdfs_balancer.log | \
</span></span><span style="display:flex;"><span>    tail -100 | \
</span></span><span style="display:flex;"><span>    awk &#39;match($0, /size=[0-9]+/) {
</span></span><span style="display:flex;"><span>        bytes = substr($0, RSTART + 5, RLENGTH - 5)
</span></span><span style="display:flex;"><span>        split($2, time_arr, &#34;:&#34;)
</span></span><span style="display:flex;"><span>        current_sec = time_arr[1]*3600 + time_arr[2]*60 + time_arr[3]
</span></span><span style="display:flex;"><span>        if (prev_sec != &#34;&#34; &amp;&amp; current_sec &gt; prev_sec) {
</span></span><span style="display:flex;"><span>            speed = bytes / (current_sec - prev_sec)
</span></span><span style="display:flex;"><span>            print &#34;Move speed: &#34; speed/1024/1024 &#34; MB/s&#34;
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        prev_sec = current_sec
</span></span><span style="display:flex;"><span>    }&#39;
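</span></span></code></pre></div><h2 id="故障排除">Troubleshooting</h2>
<h3 id="1-常见问题">1. Common problems</h3>
<h4 id="均衡进程无法启动">The balancer fails to start</h4>
<p><strong>Symptom</strong>: the balancer exits immediately after the command is run.</p>
<p><strong>Possible causes</strong>:</p>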
<ul>
<li>The HDFS service is not running properly</li>
<li>Insufficient permissions</li>
<li>Configuration errors</li>
</ul>
<p><strong>Resolution</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check HDFS service status</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -report
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the current user and groups</span>
</span></span><span style="display:flex;"><span>whoami
</span></span><span style="display:flex;"><span>groups
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check the configuration</span>
</span></span><span style="display:flex;"><span>hdfs getconf -confKey dfs.namenode.rpc-address
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Follow the balancer error log</span>
</span></span><span style="display:flex;"><span>tail -f <span style="color:#8be9fd;font-style:italic">$HADOOP_LOG_DIR</span>/hadoop-*-balancer-*.log
</span></span></code></pre></div><h4 id="均衡速度过慢">Balancing is too slow</h4>
<p><strong>Symptom</strong>: data moves slowly and balancing takes too long.</p>
<p><strong>Possible causes</strong>:</p>
<ul>
<li>Limited network bandwidth</li>
<li>Poor disk I/O performance</li>
<li>Balancer bandwidth set too low</li>
</ul>
<p><strong>Resolution</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check network bandwidth</span>
</span></span><span style="display:flex;"><span>iperf3 -s &amp;  <span style="color:#6272a4"># start the server on the source node</span>
</span></span><span style="display:flex;"><span>iperf3 -c &lt;source_node&gt; -t <span style="color:#bd93f9">60</span>  <span style="color:#6272a4"># run the test from the target node</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check disk I/O</span>
</span></span><span style="display:flex;"><span>iostat -x <span style="color:#bd93f9">1</span> <span style="color:#bd93f9">5</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 调整均衡带宽</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># 在hdfs-site.xml中设置：</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># &lt;property&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#   &lt;name&gt;dfs.datanode.balance.bandwidthPerSec&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#   &lt;value&gt;52428800&lt;/value&gt;  &lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># &lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
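</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Restart the DataNode service so the hdfs-site.xml change takes effect</span>
</span></span><span style="display:flex;"><span>sudo systemctl restart hadoop-datanode
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Alternatively, apply the limit at runtime without a restart (not persisted across DataNode restarts)</span>
</span></span><span style="display:flex;"><span>hdfs dfsadmin -setBalancerBandwidth <span style="color:#bd93f9">52428800</span>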
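</span></span></code></pre></div><h4 id="均衡进程异常退出">Balancer process exits unexpectedly</h4>
<p><strong>Symptom</strong>: The balancer process runs for a while and then exits on its own.
<strong>Possible causes</strong>:</p>
<ul>
<li>Insufficient memory</li>
<li>Network interruption</li>
<li>Insufficient disk space</li>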
</ul>
<p><strong>Solution</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Check system resources
</span></span><span style="display:flex;"><span>free -h
</span></span><span style="display:flex;"><span>df -h
</span></span><span style="display:flex;"><span>dmesg | tail -50
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check the balancer output log
</span></span><span style="display:flex;"><span>tail -100 /tmp/hdfs_balancer.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check the HDFS balancer logs
</span></span><span style="display:flex;"><span>tail -100 $HADOOP_LOG_DIR/hadoop-*-balancer-*.log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Restart the balancer
</span></span><span style="display:flex;"><span>nohup hdfs balancer -threshold 10 &gt; /tmp/balancer_retry.log 2&gt;&amp;1 &amp;
</span></span></code></pre></div><h3 id="2-性能问题">2. Performance issues</h3>
<h4 id="网络瓶颈">Network bottleneck</h4>
<p><strong>Diagnosis</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Check network usage
</span></span><span style="display:flex;"><span>iftop -i eth0
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check network latency
</span></span><span style="display:flex;"><span>ping -c 10 &lt;target_node&gt;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check packet loss
</span></span><span style="display:flex;"><span>mtr -r -c 10 &lt;target_node&gt;
</span></span></code></pre></div><p><strong>Optimization</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Tune network parameters (maximum socket buffer sizes)
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span></code></pre></div><h4 id="磁盘io瓶颈">Disk I/O bottleneck</h4>
<p><strong>Diagnosis</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Check disk utilization
</span></span><span style="display:flex;"><span>iostat -x 1 10
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check disk queues
</span></span><span style="display:flex;"><span>iostat -x 1 10 | grep -E &#34;(Device|sd)&#34;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Check for disk errors
</span></span><span style="display:flex;"><span>dmesg | grep -i error
</span></span></code></pre></div><p><strong>Optimization</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Change the I/O scheduler (kernels that use blk-mq offer &#34;none&#34; instead of noop; list the options with: cat /sys/block/sda/queue/scheduler)
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Tune I/O parameters (request queue depth)
</span></span><span style="display:flex;"><span>echo 1024 &gt; /sys/block/sda/queue/nr_requests
</span></span></code></pre></div><h3 id="3-数据完整性检查">3. Data integrity checks</h3>
<h4 id="均衡后验证">Verification after balancing</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Check block integrity</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks -locations
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check replica counts</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks | grep -E <span style="color:#f1fa8c">&#34;(Missing|Under-replicated)&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check for corrupt blocks</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -files -blocks | grep -i corrupt
</span></span></code></pre></div><h4 id="数据恢复">Data recovery</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Delete files that contain corrupt blocks (irreversible: this removes the files rather than repairing them, so restore from a backup first if one exists)</span>
</span></span><span style="display:flex;"><span>hdfs fsck / -delete
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Rebalance the cluster</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">1</span>
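</span></span></code></pre></div><h2 id="最佳实践">Best practices</h2>
<h3 id="1-均衡策略-1">1. Balancing strategy</h3>
<h4 id="时间选择">Timing</h4>
<ul>
<li><strong>Off-peak hours</strong>: Choose the period with the lowest business traffic.</li>
<li><strong>Maintenance window</strong>: Run balancing during planned maintenance.</li>
<li><strong>In batches</strong>: On large clusters, balance in several batches.</li>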
</ul>
<h4 id="参数设置">Parameter settings</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Recommended parameters for production</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">5</span> -policy datanode -idleiterations <span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Parameters for test environments</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">10</span> -policy datanode -idleiterations <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Parameters for emergencies</span>
</span></span><span style="display:flex;"><span>hdfs balancer -threshold <span style="color:#bd93f9">20</span> -policy datanode
</span></span></code></pre></div><h3 id="2-监控策略">2. Monitoring strategy</h3>
<h4 id="实时监控-1">Real-time monitoring</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># Create the monitoring script</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check whether the balancer process is running</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> ! pgrep <span style="color:#ff79c6">-</span>f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): Balancer is not running; attempting to restart it&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check cluster status</span>
</span></span><span style="display:flex;"><span>REPORT<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> [ $? <span style="color:#ff79c6">-</span>ne <span style="color:#bd93f9">0</span> ]; then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): HDFS service is unavailable&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Compute the imbalance (standard deviation of DFS Used%)</span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(echo <span style="color:#f1fa8c">&#34;$REPORT&#34;</span> <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>echo <span style="color:#f1fa8c">&#34;$(date): Current imbalance (std dev of DFS Used%): $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Start the balancer if the imbalance is too high</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; 15&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): Imbalance is too high; starting the balancer&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_monitor<span style="color:#ff79c6">.</span>sh
</span></span><span style="display:flex;"><span>
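</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Append to the existing crontab (check every 30 minutes) without overwriting other entries</span>
</span></span><span style="display:flex;"><span>(crontab <span style="color:#ff79c6">-</span>l <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; echo <span style="color:#f1fa8c">&#34;*/30 * * * * /tmp/balance_monitor.sh&#34;</span>) <span style="color:#ff79c6">|</span> crontab <span style="color:#ff79c6">-</span>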
</span></span></code></pre></div><h3 id="3-自动化脚本">3. Automation scripts</h3>
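<p>The monitoring script above derives its balance metric from the <code>DFS Used%</code> lines of <code>hdfs dfsadmin -report</code>. As a minimal sketch, the same calculation can be kept in a standalone Python function so that it can be tested without a cluster. The function name <code>usage_std_dev</code> and the sample report text are hypothetical. The sketch also assumes that each per-DataNode section of the report starts with a <code>Name:</code> line, so the cluster-wide summary percentage at the top is not counted as a node.</p>

```python
import re


def usage_std_dev(report_text):
    """Population standard deviation of per-DataNode 'DFS Used%' values."""
    values = []
    in_node_section = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            in_node_section = True  # per-DataNode sections follow the summary
        elif in_node_section and stripped.startswith("DFS Used%:"):
            match = re.search(r"(\d+(?:\.\d+)?)%", stripped)
            if match:
                values.append(float(match.group(1)))
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance ** 0.5


# Hypothetical, heavily shortened report text for illustration only.
sample = """DFS Used%: 50.00%
Name: 10.0.0.1:9866 (dn1)
DFS Used%: 40.00%
Name: 10.0.0.2:9866 (dn2)
DFS Used%: 60.00%
"""
print(f"{usage_std_dev(sample):.2f}")
```

<p>Skipping the summary line is a deliberate choice in this sketch: counting it would add a value equal to the cluster average and pull the computed deviation slightly downward.</p>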
<h4 id="完整均衡脚本">Complete balancing script</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>auto_balance<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Configuration</span>
</span></span><span style="display:flex;"><span>THRESHOLD<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>
</span></span><span style="display:flex;"><span>LOG_FILE<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/auto_balance.log&#34;</span>
</span></span><span style="display:flex;"><span>BALANCE_LOG<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/hdfs_balancer.log&#34;</span>
</span></span><span style="display:flex;"><span>MAX_RUNTIME<span style="color:#ff79c6">=</span><span style="color:#bd93f9">7200</span>  <span style="color:#6272a4"># Maximum runtime (seconds)</span>
</span></span><span style="display:flex;"><span>CHECK_INTERVAL<span style="color:#ff79c6">=</span><span style="color:#bd93f9">300</span>  <span style="color:#6272a4"># Check interval (seconds)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Logging helper</span>
</span></span><span style="display:flex;"><span>log() {
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;[$(date &#39;+%Y-%m-</span><span style="color:#f1fa8c">%d</span><span style="color:#f1fa8c"> %H:%M:%S&#39;)] $1&#34;</span> <span style="color:#ff79c6">|</span> tee <span style="color:#ff79c6">-</span>a $LOG_FILE
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check HDFS status</span>
</span></span><span style="display:flex;"><span>check_hdfs_status() {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> ! hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span>; then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;Error: HDFS service is unavailable&#34;</span>
</span></span><span style="display:flex;"><span>        exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;HDFS service is healthy&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
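</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Compute the imbalance (standard deviation of DFS Used%)</span>
</span></span><span style="display:flex;"><span>calculate_balance_degree() {
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Declare and assign separately so that $? below reflects hdfs, not local</span>
</span></span><span style="display:flex;"><span>    local report
</span></span><span style="display:flex;"><span>    report<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null)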
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [ $? <span style="color:#ff79c6">-</span>ne <span style="color:#bd93f9">0</span> ]; then
</span></span><span style="display:flex;"><span>        echo <span style="color:#f1fa8c">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$report&#34;</span> <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Start the balancer</span>
</span></span><span style="display:flex;"><span>start_balancer() {
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Starting the HDFS balancer, threshold: $</span><span style="color:#f1fa8c">{THRESHOLD}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>    nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold $THRESHOLD <span style="color:#ff79c6">-</span>policy datanode <span style="color:#ff79c6">&gt;</span> $BALANCE_LOG <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>    BALANCER_PID<span style="color:#ff79c6">=</span>$!
</span></span><span style="display:flex;"><span>    echo $BALANCER_PID <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Balancer started, PID: $BALANCER_PID&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Stop the balancer</span>
</span></span><span style="display:flex;"><span>stop_balancer() {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [ <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid ]; then
</span></span><span style="display:flex;"><span>        local pid<span style="color:#ff79c6">=</span>$(cat <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> kill <span style="color:#ff79c6">-</span><span style="color:#bd93f9">0</span> $pid <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>            kill $pid
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;Balancer stopped, PID: $pid&#34;</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>        rm <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check balancing progress</span>
</span></span><span style="display:flex;"><span>check_balance_progress() {
</span></span><span style="display:flex;"><span>    local current_degree<span style="color:#ff79c6">=</span>$(calculate_balance_degree)
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Current imbalance (std dev of DFS Used%): $</span><span style="color:#f1fa8c">{current_degree}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$current_degree &lt; 5&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;Balancing complete!&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Main function</span>
</span></span><span style="display:flex;"><span>main() {
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Starting the automatic balancing run&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Check HDFS status</span>
</span></span><span style="display:flex;"><span>    check_hdfs_status
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Compute the initial imbalance</span>
</span></span><span style="display:flex;"><span>    initial_degree<span style="color:#ff79c6">=</span>$(calculate_balance_degree)
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Initial imbalance (std dev of DFS Used%): $</span><span style="color:#f1fa8c">{initial_degree}</span><span style="color:#f1fa8c">%&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Exit if the cluster is already balanced</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$initial_degree &lt; $THRESHOLD&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>        log <span style="color:#f1fa8c">&#34;Cluster is already balanced; nothing to do&#34;</span>
</span></span><span style="display:flex;"><span>        exit <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Start the balancer</span>
</span></span><span style="display:flex;"><span>    start_balancer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Monitor balancing progress</span>
</span></span><span style="display:flex;"><span>    start_time<span style="color:#ff79c6">=</span>$(date <span style="color:#ff79c6">+%</span>s)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">while</span> true; do
</span></span><span style="display:flex;"><span>        current_time<span style="color:#ff79c6">=</span>$(date <span style="color:#ff79c6">+%</span>s)
</span></span><span style="display:flex;"><span>        elapsed<span style="color:#ff79c6">=</span>$((current_time <span style="color:#ff79c6">-</span> start_time))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Check for timeout</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> [ $elapsed <span style="color:#ff79c6">-</span>gt $MAX_RUNTIME ]; then
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;Balancing timed out; stopping the balancer&#34;</span>
</span></span><span style="display:flex;"><span>            stop_balancer
</span></span><span style="display:flex;"><span>            exit <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Check whether the balancer process is still running</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> [ <span style="color:#ff79c6">-</span>f <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid ]; then
</span></span><span style="display:flex;"><span>            local pid<span style="color:#ff79c6">=</span>$(cat <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>pid)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">if</span> ! kill <span style="color:#ff79c6">-</span><span style="color:#bd93f9">0</span> $pid <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>                log <span style="color:#f1fa8c">&#34;Balancer process exited unexpectedly&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>            fi
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;Balancer PID file not found&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">break</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Check balancing progress</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> check_balance_progress; then
</span></span><span style="display:flex;"><span>            stop_balancer
</span></span><span style="display:flex;"><span>            log <span style="color:#f1fa8c">&#34;Balancing completed successfully&#34;</span>
</span></span><span style="display:flex;"><span>            exit <span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span>        fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Wait until the next check</span>
</span></span><span style="display:flex;"><span>        sleep $CHECK_INTERVAL
</span></span><span style="display:flex;"><span>    done
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Clean up</span>
</span></span><span style="display:flex;"><span>    stop_balancer
</span></span><span style="display:flex;"><span>    log <span style="color:#f1fa8c">&#34;Balancing workflow finished&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Signal handling</span>
</span></span><span style="display:flex;"><span>trap <span style="color:#f1fa8c">&#39;log &#34;Interrupt signal received; stopping the balancer&#34;; stop_balancer; exit 1&#39;</span> INT TERM
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Run the main function</span>
</span></span><span style="display:flex;"><span>main <span style="color:#f1fa8c">&#34;$@&#34;</span>
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>auto_balance<span style="color:#ff79c6">.</span>sh
</span></span></code></pre></div><h2 id="性能优化建议">Performance Tuning Recommendations</h2>
<h3 id="1-系统级优化">1. System-Level Tuning</h3>
<h4 id="网络优化">Network Tuning</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Increase network buffer sizes
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_max = 134217728&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.rmem_default = 65536&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>echo &#39;net.core.wmem_default = 65536&#39; &gt;&gt; /etc/sysctl.conf
</span></span><span style="display:flex;"><span>sysctl -p
</span></span></code></pre></div><h4 id="磁盘优化">Disk Tuning</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Set the I/O scheduler (noop exists only on legacy kernels; blk-mq kernels offer none instead)
</span></span><span style="display:flex;"><span>echo noop &gt; /sys/block/sda/queue/scheduler
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># Tune I/O queue parameters
</span></span><span style="display:flex;"><span>echo 1024 &gt; /sys/block/sda/queue/nr_requests
</span></span><span style="display:flex;"><span>echo 0 &gt; /sys/block/sda/queue/add_random
</span></span></code></pre></div><h3 id="2-hdfs配置优化">2. HDFS Configuration Tuning</h3>
<h4 id="均衡带宽设置">Balancer Bandwidth Setting</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- hdfs-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.datanode.balance.bandwidthPerSec<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>52428800<span style="color:#ff79c6">&lt;/value&gt;</span>  <span style="color:#6272a4">&lt;!-- 50MB/s --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h4 id="复制参数优化">Replication Parameter Tuning</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#6272a4">&lt;!-- hdfs-site.xml --&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.replication<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>3<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;property&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;name&gt;</span>dfs.namenode.replication.work.multiplier.per.iteration<span style="color:#ff79c6">&lt;/name&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">&lt;value&gt;</span>2<span style="color:#ff79c6">&lt;/value&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">&lt;/property&gt;</span>
</span></span></code></pre></div><h3 id="3-监控和告警">3. Monitoring and Alerting</h3>
<h4 id="设置告警">Setting Up Alerts</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># Create the alert script</span>
</span></span><span style="display:flex;"><span>cat <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_alert<span style="color:#ff79c6">.</span>sh <span style="color:#ff79c6">&lt;&lt;</span> <span style="color:#f1fa8c">&#39;EOF&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Configuration parameters</span>
</span></span><span style="display:flex;"><span>ALERT_THRESHOLD<span style="color:#ff79c6">=</span><span style="color:#bd93f9">20</span>
</span></span><span style="display:flex;"><span>EMAIL_LIST<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;admin@company.com&#34;</span>
</span></span><span style="display:flex;"><span>LOG_FILE<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;/tmp/balance_alert.log&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Compute the imbalance metric (standard deviation of DFS Used% across DataNodes)</span>
</span></span><span style="display:flex;"><span>STD_DEV<span style="color:#ff79c6">=</span>$(hdfs dfsadmin <span style="color:#ff79c6">-</span>report <span style="color:#ff79c6">|</span> python3 <span style="color:#ff79c6">-</span>c <span style="color:#f1fa8c">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> sys
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> line <span style="color:#ff79c6">in</span> sys<span style="color:#ff79c6">.</span>stdin:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;DFS Used%:&#39;</span> <span style="color:#ff79c6">in</span> line:
</span></span><span style="display:flex;"><span>        percent <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">float</span>(re<span style="color:#ff79c6">.</span>search(<span style="color:#f1fa8c">r</span><span style="color:#f1fa8c">&#39;(\d+\.?\d*)%&#39;</span>, line)<span style="color:#ff79c6">.</span>group(<span style="color:#bd93f9">1</span>))
</span></span><span style="display:flex;"><span>        used_percents<span style="color:#ff79c6">.</span>append(percent)
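</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># The first match is the cluster-wide summary, not a DataNode; drop it</span>
</span></span><span style="display:flex;"><span>used_percents <span style="color:#ff79c6">=</span> used_percents[<span style="color:#bd93f9">1</span>:]
</span></span><span style="display:flex;"><span>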
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> used_percents:
</span></span><span style="display:flex;"><span>    avg <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>(used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    variance <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">sum</span>((x <span style="color:#ff79c6">-</span> avg) <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">for</span> x <span style="color:#ff79c6">in</span> used_percents) <span style="color:#ff79c6">/</span> <span style="color:#8be9fd;font-style:italic">len</span>(used_percents)
</span></span><span style="display:flex;"><span>    std_dev <span style="color:#ff79c6">=</span> variance <span style="color:#ff79c6">**</span> <span style="color:#bd93f9">0.5</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>std_dev<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.2f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">&#39;0&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Check whether an alert is needed</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> (( $(echo <span style="color:#f1fa8c">&#34;$STD_DEV &gt; $ALERT_THRESHOLD&#34;</span> <span style="color:#ff79c6">|</span> bc <span style="color:#ff79c6">-</span>l) )); then
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;$(date): WARNING: HDFS usage imbalance too high (std dev $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%)&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> $LOG_FILE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Send an email alert</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#f1fa8c">&#34;HDFS cluster usage imbalance too high: std dev $</span><span style="color:#f1fa8c">{STD_DEV}</span><span style="color:#f1fa8c">%&#34;</span> <span style="color:#ff79c6">|</span> \
</span></span><span style="display:flex;"><span>        mail <span style="color:#ff79c6">-</span>s <span style="color:#f1fa8c">&#34;HDFS balance alert&#34;</span> $EMAIL_LIST
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Start the balancer automatically</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> ! pgrep <span style="color:#ff79c6">-</span>f <span style="color:#f1fa8c">&#34;hdfs.*balancer&#34;</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>dev<span style="color:#ff79c6">/</span>null; then
</span></span><span style="display:flex;"><span>        nohup hdfs balancer <span style="color:#ff79c6">-</span>threshold <span style="color:#bd93f9">10</span> <span style="color:#ff79c6">&gt;</span> <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balancer<span style="color:#ff79c6">.</span>log <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;&amp;</span><span style="color:#bd93f9">1</span> <span style="color:#ff79c6">&amp;</span>
</span></span><span style="display:flex;"><span>        echo <span style="color:#f1fa8c">&#34;$(date): Balancer process started automatically&#34;</span> <span style="color:#ff79c6">&gt;&gt;</span> $LOG_FILE
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>EOF
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chmod <span style="color:#ff79c6">+</span>x <span style="color:#ff79c6">/</span>tmp<span style="color:#ff79c6">/</span>balance_alert<span style="color:#ff79c6">.</span>sh
</span></span><span style="display:flex;"><span>
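</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Append an hourly check to the crontab without overwriting existing entries</span>
</span></span><span style="display:flex;"><span>(crontab <span style="color:#ff79c6">-</span>l <span style="color:#bd93f9">2</span><span style="color:#ff79c6">&gt;/</span>dev<span style="color:#ff79c6">/</span>null; echo <span style="color:#f1fa8c">&#34;0 * * * * /tmp/balance_alert.sh&#34;</span>) <span style="color:#ff79c6">|</span> crontab <span style="color:#ff79c6">-</span>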
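</span></span></code></pre></div><h2 id="总结">Summary</h2>
<p>HDFS balancing is an important operation for keeping a cluster healthy. With well-chosen balancer parameters, a solid monitoring setup, and adherence to best practices, a cluster can maintain an even data distribution and consistent performance.</p>
<h3 id="关键要点">Key Points:</h3>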
<ol>
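<li><strong>Balance promptly</strong>: run the balancer once DataNode utilization differs by more than 10%</li>
<li><strong>Sensible parameters</strong>: choose balancer parameters that suit the cluster's size and environment</li>
<li><strong>Continuous monitoring</strong>: set up automated monitoring and alerting</li>
<li><strong>Performance tuning</strong>: tune at the system, network, and storage levels</li>
<li><strong>Safe operation</strong>: run balancing during off-peak hours to keep data safe</li>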
</ol>
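<p>The 10% rule in the first key point can be checked offline. The following is a minimal sketch, not the balancer's own logic: it assumes the usual <code>hdfs dfsadmin -report</code> text layout (a cluster summary followed by one <code>Name:</code> section per DataNode), uses a made-up <code>SAMPLE_REPORT</code>, and approximates cluster utilization with the plain mean of the per-node percentages, which is only accurate when nodes have similar capacity. The function names are hypothetical.</p>

```python
import re
import statistics

# Hypothetical excerpt shaped like `hdfs dfsadmin -report` output (assumed layout).
SAMPLE_REPORT = """\
Configured Capacity: 300 (300 B)
DFS Used%: 50.00%

Live datanodes (3):

Name: 10.0.0.1:9866 (dn1)
DFS Used%: 35.00%

Name: 10.0.0.2:9866 (dn2)
DFS Used%: 50.00%

Name: 10.0.0.3:9866 (dn3)
DFS Used%: 65.00%
"""

USED_RE = re.compile(r"DFS Used%:\s*(\d+(?:\.\d+)?)%")


def parse_datanode_usage(report):
    """Map each DataNode name to its DFS Used%, skipping the cluster summary."""
    usage = {}
    current = None  # stays None until the first "Name:" line is seen
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            current = line.split(":", 1)[1].strip()
        elif current is not None:
            match = USED_RE.match(line)
            if match:
                usage[current] = float(match.group(1))
                current = None  # wait for the next DataNode section
    return usage


def nodes_outside_threshold(usage, threshold=10.0):
    """Return nodes whose usage deviates from the mean by more than threshold points."""
    mean = statistics.fmean(usage.values())
    return {
        name: round(percent - mean, 2)
        for name, percent in usage.items()
        if abs(percent - mean) > threshold
    }


if __name__ == "__main__":
    usage = parse_datanode_usage(SAMPLE_REPORT)
    print(nodes_outside_threshold(usage, threshold=10.0))
```

<p>In practice the report text would come from the real command output rather than a constant, and a non-empty result suggests that a balancer run is worth scheduling.</p>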
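<p>By following this guide, you can manage and maintain data balance across an HDFS cluster effectively and keep the cluster highly available and performant.</p>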
]]></content:encoded></item></channel></rss>