<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Python on heyaohua's Blog</title><link>https://blog.heyaohua.com/tags/python/</link><description>Recent content in Python on heyaohua's Blog</description><image><title>heyaohua's Blog</title><url>https://blog.heyaohua.com/og-image.png</url><link>https://blog.heyaohua.com/og-image.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Fri, 26 Sep 2025 14:00:00 +0800</lastBuildDate><atom:link href="https://blog.heyaohua.com/tags/python/index.xml" rel="self" type="application/rss+xml"/><item><title>Choosing a Taobao Automation Framework</title><link>https://blog.heyaohua.com/posts/2025/09/taobao-automation-framework/</link><pubDate>Fri, 26 Sep 2025 14:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2025/09/taobao-automation-framework/</guid><description>A China-developed framework with thorough Chinese documentation</description><content:encoded><![CDATA[<h1 id="淘宝自动化框架选择方案">Choosing a Taobao Automation Framework</h1>
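<h2 id="-推荐方案drissionpage--现有架构">🎯 Recommended Approach: DrissionPage + Existing Architecture</h2>
<h3 id="为什么选择-drissionpage">Why DrissionPage?</h3>
<ol>
<li><strong>Designed for Chinese websites</strong>
<ul>
<li>Tuned for e-commerce sites such as Taobao and JD.com</li>
<li>Built-in workarounds for common anti-crawler mechanisms</li>
<li>Developed in China, with thorough Chinese documentation</li>
</ul>
</li>
<li><strong>Fits cleanly into the existing architecture</strong>
<ul>
<li>Can reuse the existing requests session directly</li>
<li>Supports integration with the mitmproxy proxy</li>
<li>Compatible with the existing data-processing pipeline</li>
</ul>
</li>
<li><strong>Combines performance with ease of use</strong>
<ul>
<li>Built on the Chromium engine, with strong performance</li>
<li>Simple, intuitive API design</li>
<li>Supports switching between page mode and requests mode</li>
</ul>
</li>
</ol>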
<h2 id="-框架对比分析">📊 Framework Comparison</h2>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>DrissionPage</th>
          <th>Playwright</th>
          <th>Selenium</th>
          <th>Requests-HTML</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Performance</strong></td>
          <td>Very fast</td>
          <td>Fastest</td>
          <td>Moderate</td>
          <td>Fast</td>
      </tr>
      <tr>
          <td><strong>Anti-crawler handling</strong></td>
          <td>Excellent</td>
          <td>Excellent</td>
          <td>Average</td>
          <td>Weak</td>
      </tr>
      <tr>
          <td><strong>Fit for Taobao</strong></td>
          <td>Excellent</td>
          <td>Good</td>
          <td>Average</td>
          <td>Weak</td>
      </tr>
      <tr>
          <td><strong>Learning cost</strong></td>
          <td>Low</td>
          <td>Medium</td>
          <td>Medium</td>
          <td>Low</td>
      </tr>
      <tr>
          <td><strong>Chinese documentation</strong></td>
          <td>Excellent</td>
          <td>Average</td>
          <td>Good</td>
          <td>Average</td>
      </tr>
      <tr>
          <td><strong>Community support</strong></td>
          <td>Active</td>
          <td>Active</td>
          <td>Largest</td>
          <td>Small</td>
      </tr>
  </tbody>
</table>
<h2 id="-技术实施路线">🛠️ Implementation Roadmap</h2>
<h3 id="阶段一环境准备">Phase 1: Environment Setup</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Install DrissionPage</span>
</span></span><span style="display:flex;"><span>pip install DrissionPage
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Install the fallback options (optional)</span>
</span></span><span style="display:flex;"><span>pip install playwright
</span></span><span style="display:flex;"><span>pip install selenium
</span></span></code></pre></div><h3 id="阶段二基础集成">Phase 2: Basic Integration</h3>
<ol>
<li>Create the <code>TaobaoAutomator</code> class</li>
<li>Integrate the existing proxy server</li>
<li>Implement basic search and data extraction</li>
</ol>
<h3 id="阶段三高级功能">Phase 3: Advanced Features</h3>
<ol>
<li>Tune the anti-crawler strategy</li>
<li>Data cleaning and storage</li>
<li>Error handling and retry logic</li>
</ol>
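<p>As a rough illustration of the error handling and retry item, here is a minimal sketch of retrying with exponential backoff and jitter. The names <code>retry_with_backoff</code> and <code>flaky_fetch</code> are hypothetical and not part of DrissionPage or the existing project; real code would catch narrower exception types.</p>

```python
import random
import time


def retry_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    The delay doubles after each failure and gets a small random jitter,
    so retries do not arrive at a fixed rhythm.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # assumption: real code catches narrower types
            last_error = exc
            if attempt == max_attempts:
                break
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)
    raise last_error


# Hypothetical flaky operation: fails on the first two calls, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] in (1, 2):
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky_fetch, max_attempts=5, base_delay=0.01)
print(result, calls["count"])
```

<p>Passing <code>sleep</code> as a parameter keeps the helper easy to test, because a test can swap in a no-op instead of really waiting.</p>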
<h3 id="阶段四性能优化">Phase 4: Performance Optimization</h3>
<ol>
<li>Concurrent processing</li>
<li>Resource management</li>
<li>Monitoring and logging</li>
</ol>
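<p>The three items in this phase can be sketched together with the standard library: a bounded thread pool for concurrency, a context manager for resource cleanup, and <code>logging</code> for monitoring. <code>process_keyword</code> is a hypothetical placeholder; a real task would drive the browser or call an API.</p>

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("taobao_automator")


def process_keyword(keyword):
    """Hypothetical placeholder task that returns a small result record."""
    return {"keyword": keyword, "length": len(keyword)}


def run_batch(keywords, max_workers=3):
    """Run tasks in a bounded pool, logging each success and failure."""
    results, failures = [], []
    # The context manager shuts the pool down even if a task raises.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_keyword, kw): kw for kw in keywords}
        for future in as_completed(futures):
            keyword = futures[future]
            try:
                results.append(future.result())
                logger.info("finished %s", keyword)
            except Exception:
                failures.append(keyword)
                logger.exception("failed %s", keyword)
    return results, failures


results, failures = run_batch(["phone", "laptop", "headphones"])
```

<p>Keeping <code>max_workers</code> small matters here, since each worker in a real run would hold a browser tab or a network connection.</p>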
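<h2 id="-备选方案">💡 Alternatives</h2>
<h3 id="方案-a纯-playwright如果团队技术能力强">Option A: Playwright Only (for a technically strong team)</h3>
<ul>
<li>Best performance</li>
<li>Most complete feature set</li>
<li>Takes longer to learn</li>
</ul>
<h3 id="方案-bselenium如果需要最大兼容性">Option B: Selenium (when maximum compatibility is needed)</h3>
<ul>
<li>Largest pool of community resources</li>
<li>Best compatibility</li>
<li>Comparatively slow</li>
</ul>
<h3 id="方案-c混合方案">Option C: Hybrid</h3>
<ul>
<li>DrissionPage handles complex interactions</li>
<li>requests handles simple API calls</li>
<li>mitmproxy handles traffic capture</li>
</ul>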
<h2 id="-具体实现示例">🎪 Implementation Examples</h2>
<h3 id="drissionpage-基础用法">DrissionPage Basic Usage</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> DrissionPage <span style="color:#ff79c6">import</span> ChromiumPage
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Create the page object</span>
</span></span><span style="display:flex;"><span>page <span style="color:#ff79c6">=</span> ChromiumPage()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Open Taobao</span>
</span></span><span style="display:flex;"><span>page<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;https://www.taobao.com&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Search for a product</span>
</span></span><span style="display:flex;"><span>search_box <span style="color:#ff79c6">=</span> page<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;#q&#39;</span>)
</span></span><span style="display:flex;"><span>search_box<span style="color:#ff79c6">.</span>input(<span style="color:#f1fa8c">&#39;手机&#39;</span>)
</span></span><span style="display:flex;"><span>search_box<span style="color:#ff79c6">.</span>after()<span style="color:#ff79c6">.</span>click()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Collect product information</span>
</span></span><span style="display:flex;"><span>products <span style="color:#ff79c6">=</span> page<span style="color:#ff79c6">.</span>eles(<span style="color:#f1fa8c">&#39;.item&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> product <span style="color:#ff79c6">in</span> products:
</span></span><span style="display:flex;"><span>    title <span style="color:#ff79c6">=</span> product<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.title&#39;</span>)<span style="color:#ff79c6">.</span>text
</span></span><span style="display:flex;"><span>    price <span style="color:#ff79c6">=</span> product<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.price&#39;</span>)<span style="color:#ff79c6">.</span>text
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">{</span>title<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{</span>price<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span></code></pre></div><h3 id="与现有架构集成">Integrating with the Existing Architecture</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> DrissionPage <span style="color:#ff79c6">import</span> ChromiumOptions, ChromiumPage
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> crawler.gateway.proxy_server <span style="color:#ff79c6">import</span> ProxyServer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">TaobaoAutomator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">__init__</span>(<span style="font-style:italic">self</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Start the existing proxy server</span>
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>proxy_server <span style="color:#ff79c6">=</span> ProxyServer()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># A browser proxy is applied at launch, so pass it in through ChromiumOptions</span>
</span></span><span style="display:flex;"><span>        options <span style="color:#ff79c6">=</span> ChromiumOptions()<span style="color:#ff79c6">.</span>set_proxy(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;http://127.0.0.1:</span><span style="color:#f1fa8c">{</span><span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>proxy_server<span style="color:#ff79c6">.</span>port<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>page <span style="color:#ff79c6">=</span> ChromiumPage(options)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">search_products</span>(<span style="font-style:italic">self</span>, keyword):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Search logic goes here</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">pass</span>
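</span></span></code></pre></div><h2 id="-技术要点">🔧 Key Technical Points</h2>
<ol>
<li><strong>Proxy integration</strong>: make sure the automation framework routes traffic through the existing proxy server</li>
<li><strong>Data synchronisation</strong>: link the intercepted API data to the data read from the page</li>
<li><strong>Anti-crawler handling</strong>: simulate user behaviour and control the interval between requests</li>
<li><strong>Error handling</strong>: cope with network failures, page changes, and similar situations</li>
</ol>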
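<p>For the data synchronisation point, one simple way to link the two sources is to join them on a shared product ID. The sketch below assumes both sides carry an <code>item_id</code> key; that field name and the sample records are made up for illustration.</p>

```python
def merge_by_item_id(api_items, page_items):
    """Join intercepted API records with page-scraped records on item_id.

    API fields win on conflict; fields that only the page has are kept.
    """
    page_index = {item["item_id"]: item for item in page_items}
    merged = []
    for api_item in api_items:
        page_item = page_index.get(api_item["item_id"], {})
        merged.append({**page_item, **api_item})
    return merged


# Illustrative sample data, not real Taobao output.
api_items = [
    {"item_id": "1001", "price": "1999.00"},
    {"item_id": "1002", "price": "59.90"},
]
page_items = [{"item_id": "1001", "title": "Phone A", "price": "1999"}]

merged = merge_by_item_id(api_items, page_items)
print(merged)
```

<p>Letting the API value override the page value is a design choice: intercepted JSON is usually less affected by page formatting than scraped text.</p>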
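<h2 id="-预期效果">📈 Expected Results</h2>
<ul>
<li><strong>About 50% higher development efficiency</strong>: an estimate relative to building everything from scratch</li>
<li><strong>Better data quality</strong>: API data and page data are combined</li>
<li><strong>Greater stability</strong>: several anti-crawler strategies work together</li>
<li><strong>Lower maintenance cost</strong>: one unified architecture</li>
</ul>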
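]]></content:encoded></item><item><title>I Built a Taobao Image Search Automation System in Python</title><link>https://blog.heyaohua.com/posts/2025/05/taobao-image-search-automation/</link><pubDate>Mon, 26 May 2025 16:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2025/05/taobao-image-search-automation/</guid><description>In the e-commerce era, image search has become an important way for users to discover products. As a developer, I often need to batch-search for similar products on behalf of clients and generate reports. Doing this by hand is slow and error-prone, so I decided to build an automated system to solve the problem.</description><content:encoded><![CDATA[<p>In the e-commerce era, image search has become an important way for users to discover products. As a developer, I often need to batch-search for similar products on behalf of clients and generate reports. Doing this by hand is slow and error-prone, so I decided to build an automated system to solve the problem.</p>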
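<h2 id="项目目标">Project Goals</h2>
<ul>
<li>Run image searches in batches</li>
<li>Extract product data automatically</li>
<li>Generate an Excel report that includes images</li>
<li>Send email notifications automatically</li>
<li>Provide complete error handling and logging</li>
</ul>
<h2 id="技术选型">Technology Choices</h2>
<h3 id="自动化框架drissionpage">Automation Framework: DrissionPage</h3>
<p>After comparing frameworks such as Selenium and Playwright, I chose DrissionPage:</p>
<ul>
<li>Tuned for Chinese websites</li>
<li>Strong against anti-crawler measures</li>
<li>Works well with Taobao and other Chinese e-commerce sites</li>
</ul>
<h3 id="数据拦截mitmproxy">Traffic Interception: mitmproxy</h3>
<ul>
<li>Can intercept HTTPS traffic</li>
<li>Supports custom addons</li>
<li>Well suited to extracting API data</li>
</ul>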
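<p>When deciding which intercepted requests to keep, comparing the parsed hostname is safer than a substring test on the whole URL, which would also match the host name appearing in a query string. A small standard-library sketch follows; <code>TARGET_HOSTS</code> and <code>is_target_api</code> are illustrative names, and the example URL paths are made up.</p>

```python
from urllib.parse import urlsplit

# Hosts whose responses should be captured; the host is the one used later in this post.
TARGET_HOSTS = {"h5api.m.taobao.com"}


def is_target_api(url):
    """Return True only when the URL's hostname is one of the target hosts."""
    return urlsplit(url).hostname in TARGET_HOSTS


print(is_target_api("https://h5api.m.taobao.com/h5/example/1.0/?a=1"))
print(is_target_api("https://example.com/?ref=h5api.m.taobao.com"))
```

<p>Inside a mitmproxy addon the same check could be applied to <code>flow.request.pretty_url</code> before any parsing is done.</p>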
<h3 id="数据处理">Data Processing</h3>
<ul>
<li>Pandas: data processing</li>
<li>openpyxl: Excel operations</li>
<li>Pillow: image processing</li>
</ul>
<h2 id="核心功能实现">Core Features</h2>
<h3 id="1-图片搜索自动化">1. Image Search Automation</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">search_by_image</span>(<span style="font-style:italic">self</span>, image_path: <span style="color:#8be9fd;font-style:italic">str</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Search Taobao by image.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 1. Open the Taobao home page</span>
</span></span><span style="display:flex;"><span>    <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>browser<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;https://www.taobao.com&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 2. Click the image-search (find similar) button</span>
</span></span><span style="display:flex;"><span>    search_button <span style="color:#ff79c6">=</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>browser<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;css:.image-search-icon-wrapper&#39;</span>)
</span></span><span style="display:flex;"><span>    search_button<span style="color:#ff79c6">.</span>click()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 3. Upload the image</span>
</span></span><span style="display:flex;"><span>    file_input <span style="color:#ff79c6">=</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>browser<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;css:#image-search-custom-file-input&#39;</span>)
</span></span><span style="display:flex;"><span>    file_input<span style="color:#ff79c6">.</span>input(image_path)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 4. Wait for the upload to finish, then search</span>
</span></span><span style="display:flex;"><span>    <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>_wait_for_upload_complete()
</span></span><span style="display:flex;"><span>    search_btn <span style="color:#ff79c6">=</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>browser<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;css:#image-search-upload-button&#39;</span>)
</span></span><span style="display:flex;"><span>    search_btn<span style="color:#ff79c6">.</span>click()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># 5. Extract product data</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>_extract_products_from_page()
</span></span></code></pre></div><h3 id="2-数据拦截与提取">2. Traffic Interception and Data Extraction</h3>
<p>mitmproxy intercepts Taobao API responses so the product information can be extracted from them:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> mitmproxy <span style="color:#ff79c6">import</span> http
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># parse_jsonp_response and save_to_file are project helpers defined elsewhere</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">response</span>(flow: http<span style="color:#ff79c6">.</span>HTTPFlow) <span style="color:#ff79c6">-&gt;</span> <span style="color:#ff79c6">None</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Intercept API responses.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#f1fa8c">&#39;h5api.m.taobao.com&#39;</span> <span style="color:#ff79c6">in</span> flow<span style="color:#ff79c6">.</span>request<span style="color:#ff79c6">.</span>pretty_url:
</span></span><span style="display:flex;"><span>        content <span style="color:#ff79c6">=</span> flow<span style="color:#ff79c6">.</span>response<span style="color:#ff79c6">.</span>text
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Parse the JSONP response and extract the product data</span>
</span></span><span style="display:flex;"><span>        data <span style="color:#ff79c6">=</span> parse_jsonp_response(content)
</span></span><span style="display:flex;"><span>        save_to_file(data)
</span></span></code></pre></div><h3 id="3-excel报告生成">3. Excel Report Generation</h3>
<p>The report is a multi-sheet Excel file with compressed images embedded:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> openpyxl
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">generate_excel_report</span>(<span style="font-style:italic">self</span>, products_data):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Generate the Excel report.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    workbook <span style="color:#ff79c6">=</span> openpyxl<span style="color:#ff79c6">.</span>Workbook()
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Drop the empty default sheet that Workbook() creates</span>
</span></span><span style="display:flex;"><span>    workbook<span style="color:#ff79c6">.</span>remove(workbook<span style="color:#ff79c6">.</span>active)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> sheet_data <span style="color:#ff79c6">in</span> products_data:
</span></span><span style="display:flex;"><span>        worksheet <span style="color:#ff79c6">=</span> workbook<span style="color:#ff79c6">.</span>create_sheet(sheet_data[<span style="color:#f1fa8c">&#39;name&#39;</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Write the product rows</span>
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>add_product_data(worksheet, sheet_data[<span style="color:#f1fa8c">&#39;products&#39;</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Download and embed the product images</span>
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>add_product_images(worksheet, sheet_data[<span style="color:#f1fa8c">&#39;products&#39;</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    workbook<span style="color:#ff79c6">.</span>save(<span style="color:#f1fa8c">&#39;report.xlsx&#39;</span>)
</span></span></code></pre></div><h3 id="4-图片压缩优化">4. Image Compression</h3>
<p>Compression keeps the Excel file from growing too large:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> os
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> PIL <span style="color:#ff79c6">import</span> Image
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">compress_image</span>(<span style="font-style:italic">self</span>, image_path: <span style="color:#8be9fd;font-style:italic">str</span>) <span style="color:#ff79c6">-&gt;</span> <span style="color:#8be9fd;font-style:italic">str</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Shrink and recompress an image, returning the new file path.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Output naming is an assumption: write next to the source file</span>
</span></span><span style="display:flex;"><span>    compressed_path <span style="color:#ff79c6">=</span> os<span style="color:#ff79c6">.</span>path<span style="color:#ff79c6">.</span>splitext(image_path)[<span style="color:#bd93f9">0</span>] <span style="color:#ff79c6">+</span> <span style="color:#f1fa8c">&#39;_compressed.jpg&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">with</span> Image<span style="color:#ff79c6">.</span>open(image_path) <span style="color:#ff79c6">as</span> img:
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># JPEG has no alpha channel, so convert to RGB first</span>
</span></span><span style="display:flex;"><span>        img <span style="color:#ff79c6">=</span> img<span style="color:#ff79c6">.</span>convert(<span style="color:#f1fa8c">&#39;RGB&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Fit within 400x400; thumbnail() keeps the aspect ratio and never enlarges</span>
</span></span><span style="display:flex;"><span>        img<span style="color:#ff79c6">.</span>thumbnail((<span style="color:#bd93f9">400</span>, <span style="color:#bd93f9">400</span>), Image<span style="color:#ff79c6">.</span>Resampling<span style="color:#ff79c6">.</span>LANCZOS)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Save at JPEG quality 40</span>
</span></span><span style="display:flex;"><span>        img<span style="color:#ff79c6">.</span>save(compressed_path, <span style="color:#f1fa8c">&#34;JPEG&#34;</span>, quality<span style="color:#ff79c6">=</span><span style="color:#bd93f9">40</span>, optimize<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> compressed_path
</span></span></code></pre></div><h2 id="技术难点与解决方案">Technical Challenges and Solutions</h2>
<h3 id="1-反爬虫对抗">1. Countering Anti-Crawler Measures</h3>
<p><strong>Problem</strong>: Taobao has mature anti-crawler defences.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Use the DrissionPage framework</li>
<li>Add random delays</li>
<li>Simulate real user behaviour</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># Random delays imitate human pacing</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> random
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> time
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>time<span style="color:#ff79c6">.</span>sleep(random<span style="color:#ff79c6">.</span>uniform(<span style="color:#bd93f9">1</span>, <span style="color:#bd93f9">3</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Scroll to the bottom of the page</span>
</span></span><span style="display:flex;"><span><span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>browser<span style="color:#ff79c6">.</span>scroll<span style="color:#ff79c6">.</span>to_bottom()
</span></span></code></pre></div><h3 id="2-图片上传处理">2. Handling Image Upload</h3>
<p><strong>Problem</strong>: Taobao uses a hidden file input.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Pass the file path straight to the file input through DrissionPage (browsers do not let JavaScript set a file input's <code>value</code>)</li>
<li>Listen for upload progress events</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Pass the file path to the hidden file input directly
</span></span><span style="display:flex;"><span>file_input = self.browser.ele(&#39;css:#image-search-custom-file-input&#39;)
</span></span><span style="display:flex;"><span>file_input.input(image_path)
</span></span></code></pre></div><h3 id="3-数据解析复杂性">3. Complex Data Parsing</h3>
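<p><strong>Problem</strong>: the Taobao API returns JSONP with a complicated structure.</p>
<p><strong>Solution</strong>:</p>
<ul>
<li>Walk the JSON structure recursively</li>
<li>Match fields through several aliases</li>
<li>Score the quality of each record</li>
</ul>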
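<p>Before the recursive search, the JSONP wrapper has to be removed and field names normalised. The sketch below shows one way to do that with the standard library. The alias table, the sample payload, and the scoring rule (fraction of fields found) are assumptions for illustration; the real field names depend on the API response.</p>

```python
import json
import re

# Hypothetical alias table; real field names depend on the API response.
FIELD_ALIASES = {
    "title": ("title", "itemTitle", "name"),
    "price": ("price", "priceShow", "salePrice"),
}

# Matches callbackName( ... ) with an optional trailing semicolon.
JSONP_PATTERN = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text):
    """Strip a callback wrapper such as cb({...}) and decode the JSON inside."""
    match = JSONP_PATTERN.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


def pick_field(item, canonical):
    """Return the first non-empty value found under any alias of a field."""
    for alias in FIELD_ALIASES[canonical]:
        value = item.get(alias)
        if value not in (None, ""):
            return value
    return None


def quality_score(item):
    """Fraction of canonical fields that could be filled (0.0 to 1.0)."""
    found = sum(pick_field(item, name) is not None for name in FIELD_ALIASES)
    return found / len(FIELD_ALIASES)


# Made-up sample response.
raw = ' mtopjsonp1({"data": {"itemTitle": "Phone A", "priceShow": "1999"}}) '
item = parse_jsonp(raw)["data"]
print(pick_field(item, "title"), pick_field(item, "price"), quality_score(item))
```

<p>Records with a low score can then be logged or dropped instead of ending up as half-empty rows in the report.</p>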
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">find_items_recursively</span>(<span style="font-style:italic">self</span>, obj):
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Recursively collect every product item in nested JSON.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> <span style="color:#8be9fd;font-style:italic">isinstance</span>(obj, <span style="color:#8be9fd;font-style:italic">dict</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>_is_product_item(obj):
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">return</span> [<span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>_extract_product_info(obj)]
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Not a product itself, so search its values</span>
</span></span><span style="display:flex;"><span>        children <span style="color:#ff79c6">=</span> obj<span style="color:#ff79c6">.</span>values()
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">elif</span> <span style="color:#8be9fd;font-style:italic">isinstance</span>(obj, <span style="color:#8be9fd;font-style:italic">list</span>):
</span></span><span style="display:flex;"><span>        children <span style="color:#ff79c6">=</span> obj
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> []
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    items <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> child <span style="color:#ff79c6">in</span> children:
</span></span><span style="display:flex;"><span>        items<span style="color:#ff79c6">.</span>extend(<span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>find_items_recursively(child))
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> items
</span></span></code></pre></div><h2 id="项目成果">Results</h2>
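<h3 id="功能实现">Features Delivered</h3>
<ul>
<li>✅ Batch image search (15 images per batch)</li>
<li>✅ Automatic data extraction and parsing</li>
<li>✅ Multi-sheet Excel report generation</li>
<li>✅ Automatic email delivery</li>
<li>✅ Automatic data cleanup</li>
</ul>
<h3 id="性能指标">Performance Metrics</h3>
<ul>
<li><strong>Processing speed</strong>: about 3 minutes for 15 images</li>
<li><strong>File size</strong>: compressed from 326 MB to 16 MB</li>
<li><strong>Success rate</strong>: above 95%</li>
<li><strong>Stability</strong>: failed steps are retried</li>
</ul>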
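<p>To put those figures in per-image and percentage terms, the arithmetic can be checked in a few lines. The inputs are the numbers reported above; nothing else is measured here.</p>

```python
images = 15
total_seconds = 3 * 60      # about 3 minutes for the batch
size_before_mb = 326
size_after_mb = 16

seconds_per_image = total_seconds / images       # 12.0
reduction = 1 - size_after_mb / size_before_mb   # about 0.951
ratio = size_before_mb / size_after_mb           # 20.375

print(f"{seconds_per_image:.0f} s per image")
print(f"{reduction:.1%} smaller, {ratio:.1f}x compression")
```

<p>In other words, each image takes roughly 12 seconds end to end, and the report shrinks by about 95%.</p>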
<h3 id="用户体验">User Experience</h3>
<ul>
<li><strong>One-command run</strong>: <code>python run.py</code></li>
<li><strong>Simple configuration</strong>: only the email settings are required</li>
<li><strong>Detailed logs</strong>: a complete execution log</li>
</ul>
<h2 id="项目结构">Project Structure</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>taobao-search/
</span></span><span style="display:flex;"><span>├── run.py                    # Main entry script
</span></span><span style="display:flex;"><span>├── src/                      # Source code
</span></span><span style="display:flex;"><span>│   ├── automation/           # Automation module
</span></span><span style="display:flex;"><span>│   ├── email/                # Email service
</span></span><span style="display:flex;"><span>│   ├── excel/                # Excel handling
</span></span><span style="display:flex;"><span>│   └── workflow/             # Workflow
</span></span><span style="display:flex;"><span>├── config/                   # Configuration files
</span></span><span style="display:flex;"><span>├── IMG_LIST/                 # Input images
</span></span><span style="display:flex;"><span>└── data/                     # Data directory
</span></span></code></pre></div><h2 id="使用方法">Usage</h2>
<ol>
<li><strong>Prepare the images</strong>: put them in the <code>IMG_LIST</code> directory</li>
<li><strong>Configure email</strong>: edit <code>config/email_config.json</code></li>
<li><strong>Run</strong>: <code>python run.py</code></li>
<li><strong>Check the results</strong>: the Excel file is written to the <code>data/exports/</code> directory</li>
</ol>
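<h2 id="技术总结">Technical Summary</h2>
<h3 id="收获">Takeaways</h3>
<ol>
<li><strong>Choice of automation framework</strong>: DrissionPage held up well against anti-crawler measures</li>
<li><strong>Traffic interception</strong>: mitmproxy is an effective way to extract API data</li>
<li><strong>Image processing</strong>: a sensible compression strategy reduces file size significantly</li>
<li><strong>Workflow design</strong>: a modular design makes maintenance and extension easier</li>
</ol>
<h3 id="价值">Value</h3>
<ul>
<li><strong>Efficiency</strong>: moving from manual work to full automation made the process about 10 times faster</li>
<li><strong>Quality</strong>: automated processing reduces human error</li>
<li><strong>Extensibility</strong>: the modular design makes new features easy to add</li>
</ul>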
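<h2 id="未来优化">Future Improvements</h2>
<ol>
<li><strong>Support more platforms</strong>: extend to JD.com, Pinduoduo, and others</li>
<li><strong>Add data analysis</strong>: price-trend analysis and competitor comparison</li>
<li><strong>Improve the user experience</strong>: a web interface and real-time progress display</li>
<li><strong>Improve stability</strong>: more thorough error handling</li>
</ol>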
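<h2 id="结语">Closing Thoughts</h2>
<p>From requirements analysis to final implementation, this project went through a complete product development cycle. Sound technology choices, a modular architecture, and thorough error handling produced a stable, reliable automation system.</p>
<p>The biggest challenges were countering anti-crawler measures and the complexity of data parsing; repeated debugging and tuning eventually led to workable solutions.</p>
<p>The project solved a real business problem and also deepened my understanding of, and hands-on experience with, automated testing, data processing, and system architecture.</p>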
]]></content:encoded></item><item><title>MySQL vs. PostgreSQL: A Comprehensive Comparison and Load-Testing Plan</title><link>https://blog.heyaohua.com/posts/2024/12/mysql-postgresql-comparison/</link><pubDate>Sun, 15 Dec 2024 10:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/12/mysql-postgresql-comparison/</guid><description>A comprehensive comparison of MySQL and PostgreSQL covering core technical differences, performance testing, Python usage examples, and a load-testing plan, to help developers choose the database that fits their workload.</description><content:encoded><![CDATA[<h2 id="一mysql与postgresql对比分析">Part 1: MySQL vs. PostgreSQL Comparison</h2>
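<h3 id="背景与概述">Background and Overview</h3>
<p>MySQL has long dominated the market thanks to its light footprint and high performance, while PostgreSQL has risen quickly in recent years on the strength of its advanced features and stability, and does especially well in cloud-native deployments and workloads with complex business requirements.</p>
<h3 id="核心技术对比">Core Technical Comparison</h3>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>MySQL</th>
          <th>PostgreSQL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Data consistency</td>
          <td>MVCC (InnoDB); default isolation level REPEATABLE READ; asynchronous replication by default</td>
          <td>Full MVCC; default isolation level READ COMMITTED; logical and streaming replication</td>
      </tr>
      <tr>
          <td>SQL standard compliance and extensibility</td>
          <td>Partial standard support; extensibility centred on pluggable storage engines</td>
          <td>Near-complete SQL standard support; rich extension ecosystem</td>
      </tr>
      <tr>
          <td>Performance optimisation</td>
          <td>Read/write splitting; strong at high-concurrency reads</td>
          <td>Parallel query, table partitioning, increasingly mature distributed support</td>
      </tr>
  </tbody>
</table>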
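<h3 id="postgresql使用度超mysql原因">Why PostgreSQL Adoption Is Overtaking MySQL</h3>
<ul>
<li>Business requirements have grown, with more complex transactions and analytics workloads</li>
<li>An active community with a rich set of plugins and extensions</li>
<li>Rapid support from cloud providers and a fast-growing official ecosystem</li>
<li>Growing adoption by large enterprises and specialised domains</li>
</ul>
<h3 id="未来前景">Outlook</h3>
<table>
  <thead>
      <tr>
          <th>Database</th>
          <th>Best-fit scenarios</th>
          <th>Future direction</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>MySQL</td>
          <td>Simple, high-concurrency reads and writes; web applications</td>
          <td>Highly available distributed deployments; cloud-native integration</td>
      </tr>
      <tr>
          <td>PostgreSQL</td>
          <td>Complex transactions, BI reporting, geospatial data</td>
          <td>Native distribution, multi-model extensions, leading SQL standard support</td>
      </tr>
  </tbody>
</table>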
<h2 id="二python数据库操作用例">Part 2: Python Database Examples</h2>
<h3 id="原生驱动">Native Drivers</h3>
<h4 id="mysql-pymysql">MySQL (PyMySQL)</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> pymysql
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>conn <span style="color:#ff79c6">=</span> pymysql<span style="color:#ff79c6">.</span>connect(host<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;localhost&#39;</span>, user<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;user&#39;</span>, password<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;password&#39;</span>, database<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;testdb&#39;</span>)
</span></span><span style="display:flex;"><span>cursor <span style="color:#ff79c6">=</span> conn<span style="color:#ff79c6">.</span>cursor()
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;CREATE TABLE IF NOT EXISTS users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100), email VARCHAR(100) UNIQUE)&#34;</span>)
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;INSERT INTO users (name, email) VALUES (</span><span style="color:#f1fa8c">%s</span><span style="color:#f1fa8c">, </span><span style="color:#f1fa8c">%s</span><span style="color:#f1fa8c">)&#34;</span>, (<span style="color:#f1fa8c">&#39;Alice&#39;</span>, <span style="color:#f1fa8c">&#39;alice@example.com&#39;</span>))
</span></span><span style="display:flex;"><span>conn<span style="color:#ff79c6">.</span>commit()
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;SELECT * FROM users&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">print</span>(cursor<span style="color:#ff79c6">.</span>fetchall())
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>close()
</span></span><span style="display:flex;"><span>conn<span style="color:#ff79c6">.</span>close()
</span></span></code></pre></div><h4 id="postgresql-psycopg2">PostgreSQL (psycopg2)</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> psycopg2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>conn <span style="color:#ff79c6">=</span> psycopg2<span style="color:#ff79c6">.</span>connect(host<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;localhost&#39;</span>, user<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;user&#39;</span>, password<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;password&#39;</span>, dbname<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;testdb&#39;</span>)
</span></span><span style="display:flex;"><span>cursor <span style="color:#ff79c6">=</span> conn<span style="color:#ff79c6">.</span>cursor()
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;CREATE TABLE IF NOT EXISTS users (id SERIAL PRIMARY KEY, name TEXT, email TEXT UNIQUE)&#34;</span>)
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;INSERT INTO users (name, email) VALUES (</span><span style="color:#f1fa8c">%s</span><span style="color:#f1fa8c">, </span><span style="color:#f1fa8c">%s</span><span style="color:#f1fa8c">)&#34;</span>, (<span style="color:#f1fa8c">&#39;Bob&#39;</span>, <span style="color:#f1fa8c">&#39;bob@example.com&#39;</span>))
</span></span><span style="display:flex;"><span>conn<span style="color:#ff79c6">.</span>commit()
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;SELECT * FROM users&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">print</span>(cursor<span style="color:#ff79c6">.</span>fetchall())
</span></span><span style="display:flex;"><span>cursor<span style="color:#ff79c6">.</span>close()
</span></span><span style="display:flex;"><span>conn<span style="color:#ff79c6">.</span>close()
</span></span></code></pre></div><h3 id="orm-示例sqlalchemy">ORM Example: SQLAlchemy</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> sqlalchemy <span style="color:#ff79c6">import</span> create_engine, Column, Integer, String
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> sqlalchemy.orm <span style="color:#ff79c6">import</span> declarative_base, sessionmaker
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Base <span style="color:#ff79c6">=</span> declarative_base()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">User</span>(Base):
</span></span><span style="display:flex;"><span>    __tablename__ <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">&#39;users&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">id</span> <span style="color:#ff79c6">=</span> Column(Integer, primary_key<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>    name <span style="color:#ff79c6">=</span> Column(String(<span style="color:#bd93f9">100</span>))
</span></span><span style="display:flex;"><span>    email <span style="color:#ff79c6">=</span> Column(String(<span style="color:#bd93f9">100</span>), unique<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>DATABASE_URL <span style="color:#ff79c6">=</span> <span style="color:#f1fa8c">&#39;postgresql+psycopg2://user:password@localhost:5432/testdb&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>engine <span style="color:#ff79c6">=</span> create_engine(DATABASE_URL)
</span></span><span style="display:flex;"><span>Session <span style="color:#ff79c6">=</span> sessionmaker(bind<span style="color:#ff79c6">=</span>engine)
</span></span><span style="display:flex;"><span>session <span style="color:#ff79c6">=</span> Session()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Base<span style="color:#ff79c6">.</span>metadata<span style="color:#ff79c6">.</span>create_all(engine)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>new_user <span style="color:#ff79c6">=</span> User(name<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;Carol&#39;</span>, email<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;carol@example.com&#39;</span>)
</span></span><span style="display:flex;"><span>session<span style="color:#ff79c6">.</span>add(new_user)
</span></span><span style="display:flex;"><span>session<span style="color:#ff79c6">.</span>commit()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>users <span style="color:#ff79c6">=</span> session<span style="color:#ff79c6">.</span>query(User)<span style="color:#ff79c6">.</span>all()
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> u <span style="color:#ff79c6">in</span> users:
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(u<span style="color:#ff79c6">.</span>id, u<span style="color:#ff79c6">.</span>name, u<span style="color:#ff79c6">.</span>email)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>session<span style="color:#ff79c6">.</span>close()
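</span></span></code></pre></div><h2 id="三数据库压测方案">III. Database Load-Testing Plan</h2>
<h3 id="压测总体流程">Overall Load-Testing Workflow</h3>
<ol>
<li>Environment setup: deploy the database and the test client.</li>
<li>Scenario design: cover OLTP, high-concurrency reads and writes, complex queries, and similar workloads.</li>
<li>Baseline test: measure performance with the default configuration.</li>
<li>Parameter tuning: adjust the configuration and repeat the tests.</li>
<li>Result analysis: collect throughput, latency, and resource-utilization data.</li>
<li>Automation scripts: script the steps so that runs can be repeated quickly.</li>
</ol>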
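<p>Step 5 of the workflow calls for throughput and latency figures. When the load comes from a custom client rather than sysbench or pgbench, the client has to record them itself. The following is a minimal sketch under stated assumptions, not part of the original scripts: <code>measure</code> and <code>fake_query</code> are hypothetical names, and <code>fake_query</code> only sleeps to stand in for a real database call.</p>

```python
import asyncio
import statistics
import time


async def measure(task, concurrency, duration):
    """Call `task` from `concurrency` workers for `duration` seconds, timing every call."""
    latencies = []
    started_at = time.perf_counter()
    deadline = started_at + duration

    async def worker():
        while deadline > time.perf_counter():
            call_started = time.perf_counter()
            await task()
            latencies.append(time.perf_counter() - call_started)

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    elapsed = time.perf_counter() - started_at
    # quantiles(n=100) returns 99 cut points; index k - 1 holds the k-th percentile
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": len(latencies),
        "throughput_per_s": len(latencies) / elapsed,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
    }


async def fake_query():
    # Hypothetical stand-in for one database round trip
    await asyncio.sleep(0.01)


report = asyncio.run(measure(fake_query, concurrency=5, duration=0.3))
print(report)
```

<p>To measure a real workload, pass a coroutine function that runs one query or transaction in place of <code>fake_query</code>, and repeat the run for each concurrency level you want to compare.</p>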
<h3 id="主要压测工具与命令示例">Main Load-Testing Tools and Example Commands</h3>
<h4 id="sysbenchmysql">Sysbench (MySQL)</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>sysbench oltp_read_write \
</span></span><span style="display:flex;"><span>  --db-driver=mysql \
</span></span><span style="display:flex;"><span>  --mysql-host=DB_HOST \
</span></span><span style="display:flex;"><span>  --mysql-user=testuser \
</span></span><span style="display:flex;"><span>  --mysql-password=secret \
</span></span><span style="display:flex;"><span>  --mysql-db=testdb \
</span></span><span style="display:flex;"><span>  --tables=10 \
</span></span><span style="display:flex;"><span>  --table-size=1000000 \
</span></span><span style="display:flex;"><span>  prepare
</span></span><span style="display:flex;"><span>
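</span></span><span style="display:flex;"><span>sysbench oltp_read_write \
</span></span><span style="display:flex;"><span>  --db-driver=mysql --mysql-host=DB_HOST --mysql-user=testuser \
</span></span><span style="display:flex;"><span>  --mysql-password=secret --mysql-db=testdb \
</span></span><span style="display:flex;"><span>  --tables=10 --table-size=1000000 \
</span></span><span style="display:flex;"><span>  --threads=100 --time=300 run
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>sysbench oltp_read_write \
</span></span><span style="display:flex;"><span>  --db-driver=mysql --mysql-host=DB_HOST --mysql-user=testuser \
</span></span><span style="display:flex;"><span>  --mysql-password=secret --mysql-db=testdb \
</span></span><span style="display:flex;"><span>  --tables=10 cleanup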
</span></span></code></pre></div><h4 id="pgbenchpostgresql">pgbench (PostgreSQL)</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>pgbench -h DB_HOST -U testuser -i -s 1000 testdb
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pgbench -h DB_HOST -U testuser -c 100 -T 300 -j 4 testdb
</span></span></code></pre></div><h3 id="自定义python异步压测示例">Custom Python Async Load-Test Example</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> aiomysql
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>DB_CONFIG <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">dict</span>(host<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;DB_HOST&#39;</span>, user<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;testuser&#39;</span>, password<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;secret&#39;</span>, db<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#39;testdb&#39;</span>, minsize<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>, maxsize<span style="color:#ff79c6">=</span><span style="color:#bd93f9">100</span>, autocommit<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">async</span> <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">task</span>(pool):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">async</span> <span style="color:#ff79c6">with</span> pool<span style="color:#ff79c6">.</span>acquire() <span style="color:#ff79c6">as</span> conn:
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">async</span> <span style="color:#ff79c6">with</span> conn<span style="color:#ff79c6">.</span>cursor() <span style="color:#ff79c6">as</span> cur:
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">await</span> cur<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;SELECT COUNT(*) FROM orders WHERE status=&#39;pending&#39;&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">await</span> cur<span style="color:#ff79c6">.</span>fetchone()
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">await</span> cur<span style="color:#ff79c6">.</span>execute(<span style="color:#f1fa8c">&#34;UPDATE users SET last_login=NOW() WHERE id=</span><span style="color:#f1fa8c">%s</span><span style="color:#f1fa8c">&#34;</span>, (<span style="color:#bd93f9">1</span>,))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">async</span> <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">run_load</span>(concurrency, duration):
</span></span><span style="display:flex;"><span>    pool <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">await</span> aiomysql<span style="color:#ff79c6">.</span>create_pool(<span style="color:#ff79c6">**</span>DB_CONFIG)
</span></span><span style="display:flex;"><span>    end_time <span style="color:#ff79c6">=</span> asyncio<span style="color:#ff79c6">.</span>get_event_loop()<span style="color:#ff79c6">.</span>time() <span style="color:#ff79c6">+</span> duration
</span></span><span style="display:flex;"><span>    sem <span style="color:#ff79c6">=</span> asyncio<span style="color:#ff79c6">.</span>Semaphore(concurrency)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">async</span> <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">worker</span>():
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">async</span> <span style="color:#ff79c6">with</span> sem:
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">while</span> asyncio<span style="color:#ff79c6">.</span>get_event_loop()<span style="color:#ff79c6">.</span>time() <span style="color:#ff79c6">&lt;</span> end_time:
</span></span><span style="display:flex;"><span>                <span style="color:#ff79c6">await</span> task(pool)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">await</span> asyncio<span style="color:#ff79c6">.</span>gather(<span style="color:#ff79c6">*</span>[worker() <span style="color:#ff79c6">for</span> _ <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">range</span>(concurrency)])
</span></span><span style="display:flex;"><span>    pool<span style="color:#ff79c6">.</span>close()
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">await</span> pool<span style="color:#ff79c6">.</span>wait_closed()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>asyncio<span style="color:#ff79c6">.</span>run(run_load(concurrency<span style="color:#ff79c6">=</span><span style="color:#bd93f9">50</span>, duration<span style="color:#ff79c6">=</span><span style="color:#bd93f9">300</span>))
</span></span></code></pre></div><h2 id="四一体化压测脚本模板">IV. All-in-One Load-Test Script Template</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4">#!/bin/bash
</span></span></span><span style="display:flex;"><span>DB_TYPE<span style="color:#ff79c6">=</span>${<span style="color:#bd93f9">1</span><span style="color:#ff79c6">:-</span>mysql}
</span></span><span style="display:flex;"><span>DB_HOST<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;127.0.0.1&#34;</span>
</span></span><span style="display:flex;"><span>DB_PORT_MYSQL<span style="color:#ff79c6">=</span><span style="color:#bd93f9">3306</span>
</span></span><span style="display:flex;"><span>DB_PORT_PG<span style="color:#ff79c6">=</span><span style="color:#bd93f9">5432</span>
</span></span><span style="display:flex;"><span>DB_USER<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;testuser&#34;</span>
</span></span><span style="display:flex;"><span>DB_PASS<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;secret&#34;</span>
</span></span><span style="display:flex;"><span>DB_NAME<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;testdb&#34;</span>
</span></span><span style="display:flex;"><span>CONCURRENCY<span style="color:#ff79c6">=</span>(<span style="color:#bd93f9">10</span> <span style="color:#bd93f9">50</span> <span style="color:#bd93f9">100</span>)
</span></span><span style="display:flex;"><span>DURATION<span style="color:#ff79c6">=</span><span style="color:#bd93f9">120</span>
</span></span><span style="display:flex;"><span>SCALE<span style="color:#ff79c6">=</span><span style="color:#bd93f9">100</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Connection and data-set options shared by the prepare and run steps</span>
</span></span><span style="display:flex;"><span>MYSQL_OPTS<span style="color:#ff79c6">=</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>db<span style="color:#ff79c6">-</span>driver<span style="color:#ff79c6">=</span>mysql
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>mysql<span style="color:#ff79c6">-</span>host<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_HOST</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>mysql<span style="color:#ff79c6">-</span>port<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_PORT_MYSQL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>mysql<span style="color:#ff79c6">-</span>user<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_USER</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>mysql<span style="color:#ff79c6">-</span>password<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_PASS</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>mysql<span style="color:#ff79c6">-</span>db<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_NAME</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>tables<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">--</span>table<span style="color:#ff79c6">-</span>size<span style="color:#ff79c6">=</span><span style="color:#bd93f9">1000000</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">function</span> <span style="color:#50fa7b">bench_mysql</span>() {
</span></span><span style="display:flex;"><span>    sysbench oltp_read_write <span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">${</span><span style="color:#8be9fd;font-style:italic">MYSQL_OPTS[@]</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span> <span style="color:#ff79c6">--</span>threads<span style="color:#ff79c6">=</span>$<span style="color:#bd93f9">1</span> <span style="color:#ff79c6">--</span>time<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DURATION</span> run <span style="color:#ff79c6">|</span> tee mysql_${<span style="color:#bd93f9">1</span>}c<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">function</span> <span style="color:#50fa7b">bench_pgsql</span>() {
</span></span><span style="display:flex;"><span>    pgbench <span style="color:#ff79c6">-</span>h <span style="color:#8be9fd;font-style:italic">$DB_HOST</span> <span style="color:#ff79c6">-</span>p <span style="color:#8be9fd;font-style:italic">$DB_PORT_PG</span> <span style="color:#ff79c6">-</span>U <span style="color:#8be9fd;font-style:italic">$DB_USER</span> <span style="color:#ff79c6">-</span>c $<span style="color:#bd93f9">1</span> <span style="color:#ff79c6">-</span>T <span style="color:#8be9fd;font-style:italic">$DURATION</span> <span style="color:#ff79c6">-</span>j $(nproc) <span style="color:#8be9fd;font-style:italic">$DB_NAME</span> <span style="color:#ff79c6">|</span> tee pg_${<span style="color:#bd93f9">1</span>}c<span style="color:#ff79c6">.</span>log
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Load the test data once before the concurrency sweep</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">if</span> [[ <span style="color:#8be9fd;font-style:italic">$DB_TYPE</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;mysql&#34;</span> ]]; then
</span></span><span style="display:flex;"><span>    sysbench oltp_read_write <span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">${</span><span style="color:#8be9fd;font-style:italic">MYSQL_OPTS[@]</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span> cleanup
</span></span><span style="display:flex;"><span>    sysbench oltp_read_write <span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">${</span><span style="color:#8be9fd;font-style:italic">MYSQL_OPTS[@]</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span> prepare
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>    export PGPASSWORD<span style="color:#ff79c6">=</span><span style="color:#8be9fd;font-style:italic">$DB_PASS</span>
</span></span><span style="display:flex;"><span>    pgbench <span style="color:#ff79c6">-</span>h <span style="color:#8be9fd;font-style:italic">$DB_HOST</span> <span style="color:#ff79c6">-</span>p <span style="color:#8be9fd;font-style:italic">$DB_PORT_PG</span> <span style="color:#ff79c6">-</span>U <span style="color:#8be9fd;font-style:italic">$DB_USER</span> <span style="color:#ff79c6">-</span>i <span style="color:#ff79c6">-</span>s <span style="color:#8be9fd;font-style:italic">$SCALE</span> <span style="color:#8be9fd;font-style:italic">$DB_NAME</span>
</span></span><span style="display:flex;"><span>fi
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> c in <span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">${</span><span style="color:#8be9fd;font-style:italic">CONCURRENCY[@]</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>; <span style="color:#ff79c6">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> [[ <span style="color:#8be9fd;font-style:italic">$DB_TYPE</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;mysql&#34;</span> ]]; then
</span></span><span style="display:flex;"><span>        bench_mysql <span style="color:#8be9fd;font-style:italic">$c</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>
</span></span><span style="display:flex;"><span>        bench_pgsql <span style="color:#8be9fd;font-style:italic">$c</span>
</span></span><span style="display:flex;"><span>    fi
</span></span><span style="display:flex;"><span>done
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">echo</span> <span style="color:#f1fa8c">&#34;Load test complete; see the mysql_*c.log or pg_*c.log files&#34;</span>
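</span></span></code></pre></div><h2 id="五压测结果分析与扩展建议">V. Analyzing Results and Extending the Tests</h2>
<ul>
<li>Extract TPS, latency, and similar figures from the logs and write them to CSV.</li>
<li>Plot performance curves with a charting tool (Excel, Grafana, Matplotlib).</li>
<li>Adjust the read/write ratio, add complex queries, and test distributed architectures according to business needs.</li>
<li>Integrate a monitoring system to sample resource metrics in real time.</li>
</ul>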
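<p>The first bullet, extracting TPS and latency from the logs into CSV, can be sketched in Python as follows. This is a minimal sketch, not the author's tooling: the regular expressions assume the summary lines that sysbench 1.0 (<code>transactions: ... per sec.</code>, <code>avg:</code>) and pgbench (<code>tps = ...</code>, <code>latency average = ... ms</code>) normally print, the sample log text is made up purely to show the shape, and <code>parse_log</code> and <code>logs_to_csv</code> are hypothetical helper names. Check the patterns against your own log files before relying on them.</p>

```python
import csv
import io
import re

# Assumed summary-line formats; adjust the patterns if your tool versions differ
SYSBENCH_TPS = re.compile(r"transactions:\s+\d+\s+\((\d+(?:\.\d+)?) per sec\.\)")
SYSBENCH_AVG = re.compile(r"^\s*avg:\s+(\d+(?:\.\d+)?)", re.MULTILINE)
PGBENCH_TPS = re.compile(r"^tps = (\d+(?:\.\d+)?)", re.MULTILINE)
PGBENCH_AVG = re.compile(r"^latency average = (\d+(?:\.\d+)?) ms", re.MULTILINE)


def parse_log(text):
    """Return (tps, avg_latency_ms) for one log, or None if no summary is found."""
    for tps_re, avg_re in ((SYSBENCH_TPS, SYSBENCH_AVG), (PGBENCH_TPS, PGBENCH_AVG)):
        tps, avg = tps_re.search(text), avg_re.search(text)
        if tps and avg:
            return float(tps.group(1)), float(avg.group(1))
    return None


def logs_to_csv(logs):
    """Turn a {file name: log text} mapping into CSV text with one row per log."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["log", "tps", "avg_latency_ms"])
    for name, text in sorted(logs.items()):
        parsed = parse_log(text)
        if parsed is not None:
            writer.writerow([name, *parsed])
    return out.getvalue()


# Made-up excerpts in the shape of each tool's summary; real logs have many more lines
SAMPLE_LOGS = {
    "mysql_10c.log": (
        "    transactions:                        24000  (200.00 per sec.)\n"
        "Latency (ms):\n"
        "         min:                                    2.00\n"
        "         avg:                                   50.00\n"
    ),
    "pg_10c.log": (
        "latency average = 8.000 ms\n"
        "tps = 1250.000000 (without initial connection time)\n"
    ),
}

csv_text = logs_to_csv(SAMPLE_LOGS)
print(csv_text)
```

<p>With real runs, read each log from disk (for example with <code>pathlib.Path.read_text</code>) and pass the resulting name-to-text mapping to <code>logs_to_csv</code>. The CSV can then be loaded into any of the charting tools listed above.</p>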
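<p>This article has covered a technical comparison of MySQL and PostgreSQL, Python database examples, and a complete, runnable set of load-testing procedures and scripts, so that readers can quickly set up a test environment, evaluate performance, and make informed tuning and selection decisions.</p>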
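]]></content:encoded></item><item><title>Python Development Tips</title><link>https://blog.heyaohua.com/posts/2024/01/python-tips/</link><pubDate>Mon, 15 Jan 2024 14:30:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2024/01/python-tips/</guid><description>In day-to-day development, a handful of frequently used techniques can noticeably improve code quality and productivity. This post collects five common tips, each with sample code, so you can apply them directly in your projects.</description><content:encoded><![CDATA[<p>In day-to-day development, a handful of frequently used techniques can noticeably improve code quality and productivity. This post collects five common tips, each with sample code, so you can apply them directly in your projects.</p>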
<h2 id="1-善用列表推导式">1. Make Good Use of List Comprehensions</h2>
<p>A list comprehension condenses a loop and a condition into a single line that is both concise and readable:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># Traditional approach
</span></span><span style="display:flex;"><span>squares = []
</span></span><span style="display:flex;"><span>for x in range(10):
</span></span><span style="display:flex;"><span>    squares.append(x**2)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># List comprehension
</span></span><span style="display:flex;"><span>squares = [x**2 for x in range(10)]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># With a filter condition
</span></span><span style="display:flex;"><span>even_squares = [x**2 for x in range(10) if x % 2 == 0]
</span></span></code></pre></div><h2 id="2-使用-dictget-提升容错性">2. Use <code>dict.get</code> for Fault Tolerance</h2>
<p>Reading a dictionary through <code>dict.get</code> lets you supply a default value, which avoids a KeyError when the key is missing and simplifies branching logic:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># May raise KeyError
</span></span><span style="display:flex;"><span>user_name = user_dict[&#39;name&#39;]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># More robust approach
</span></span><span style="display:flex;"><span>user_name = user_dict.get(&#39;name&#39;, &#39;Unknown&#39;)
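</span></span></code></pre></div><h2 id="3-搭配-enumerate-获取索引">3. Use <code>enumerate</code> to Get the Index</h2>
<p><code>enumerate</code> yields both the index and the value while you iterate over a sequence, so you do not have to maintain a counter by hand; the <code>start</code> argument sets the first index:</p>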
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>fruits <span style="color:#ff79c6">=</span> [<span style="color:#f1fa8c">&#39;apple&#39;</span>, <span style="color:#f1fa8c">&#39;banana&#39;</span>, <span style="color:#f1fa8c">&#39;orange&#39;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> index, fruit <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">enumerate</span>(fruits, start<span style="color:#ff79c6">=</span><span style="color:#bd93f9">1</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">{</span>index<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">. </span><span style="color:#f1fa8c">{</span>fruit<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
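</span></span></code></pre></div><h2 id="4-用-zip-打包多组数据">4. Use <code>zip</code> to Pair Up Multiple Sequences</h2>
<p>When you need to iterate over several iterables in parallel, <code>zip</code> removes the need for index arithmetic. It stops at the shortest input, so inputs of unequal length are silently truncated:</p>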
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>names <span style="color:#ff79c6">=</span> [<span style="color:#f1fa8c">&#39;Alice&#39;</span>, <span style="color:#f1fa8c">&#39;Bob&#39;</span>, <span style="color:#f1fa8c">&#39;Charlie&#39;</span>]
</span></span><span style="display:flex;"><span>ages <span style="color:#ff79c6">=</span> [<span style="color:#bd93f9">25</span>, <span style="color:#bd93f9">30</span>, <span style="color:#bd93f9">35</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> name, age <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">zip</span>(names, ages):
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">{</span>name<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> is </span><span style="color:#f1fa8c">{</span>age<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c"> years old&#34;</span>)
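</span></span></code></pre></div><h2 id="5-借助-f-string-优雅格式化">5. Format Strings Cleanly with f-strings</h2>
<p>On Python 3.6 and later, f-strings are the recommended way to format strings. They are easier to read and generally faster than <code>%</code> formatting or <code>str.format</code>:</p>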
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>name = &#34;World&#34;
</span></span><span style="display:flex;"><span>age = 25
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>message = f&#34;Hello, {name}! You are {age} years old.&#34;
</span></span></code></pre></div><hr>
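<p>These tips are simple, yet they noticeably improve the coding experience when you write scripts, process data, or build backend services. Feel free to share your own favorite Python tricks in the comments.</p>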
]]></content:encoded></item></channel></rss>