<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Selenium on heyaohua's Blog</title><link>https://blog.heyaohua.com/tags/selenium/</link><description>Recent content in Selenium on heyaohua's Blog</description><image><title>heyaohua's Blog</title><url>https://blog.heyaohua.com/og-image.png</url><link>https://blog.heyaohua.com/og-image.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Fri, 26 Sep 2025 14:00:00 +0800</lastBuildDate><atom:link href="https://blog.heyaohua.com/tags/selenium/index.xml" rel="self" type="application/rss+xml"/><item><title>Choosing an Automation Framework for Taobao</title><link>https://blog.heyaohua.com/posts/2025/09/taobao-automation-framework/</link><pubDate>Fri, 26 Sep 2025 14:00:00 +0800</pubDate><guid>https://blog.heyaohua.com/posts/2025/09/taobao-automation-framework/</guid><description>A Chinese-developed framework with thorough Chinese documentation</description><content:encoded><![CDATA[<h1 id="淘宝自动化框架选择方案">Choosing an Automation Framework for Taobao</h1>
<h2 id="-推荐方案drissionpage--现有架构">🎯 Recommended approach: DrissionPage + the existing architecture</h2>
<h3 id="为什么选择-drissionpage">Why DrissionPage?</h3>
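<ol>
<li><strong>Designed with Chinese websites in mind</strong>
<ul>
<li>Tuned for e-commerce sites such as Taobao and JD.com</li>
<li>Avoids some common automation-detection signals out of the box (it drives Chromium over the DevTools Protocol rather than through WebDriver)</li>
<li>Developed in China, with thorough Chinese documentation</li>
</ul>
</li>
<li><strong>Fits cleanly into the existing architecture</strong>
<ul>
<li>Can reuse the existing <code>requests</code> session</li>
<li>Supports routing traffic through the mitmproxy proxy</li>
<li>Compatible with the existing data-processing pipeline</li>
</ul>
</li>
<li><strong>Combines performance with ease of use</strong>
<ul>
<li>Built on the Chromium engine, so page handling is fast</li>
<li>Simple, intuitive API</li>
<li>Can switch between browser (page) mode and <code>requests</code> (session) mode</li>
</ul>
</li>
</ol>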
<h2 id="-框架对比分析">📊 Framework comparison</h2>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>DrissionPage</th>
          <th>Playwright</th>
          <th>Selenium</th>
          <th>Requests-HTML</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Performance</strong></td>
          <td>Very fast</td>
          <td>Fastest</td>
          <td>Moderate</td>
          <td>Fast</td>
      </tr>
      <tr>
          <td><strong>Anti-bot resilience</strong></td>
          <td>Excellent</td>
          <td>Excellent</td>
          <td>Average</td>
          <td>Weak</td>
      </tr>
      <tr>
          <td><strong>Taobao compatibility</strong></td>
          <td>Excellent</td>
          <td>Good</td>
          <td>Average</td>
          <td>Weak</td>
      </tr>
      <tr>
          <td><strong>Learning curve</strong></td>
          <td>Low</td>
          <td>Medium</td>
          <td>Medium</td>
          <td>Low</td>
      </tr>
      <tr>
          <td><strong>Chinese documentation</strong></td>
          <td>Excellent</td>
          <td>Average</td>
          <td>Good</td>
          <td>Average</td>
      </tr>
      <tr>
          <td><strong>Community support</strong></td>
          <td>Active</td>
          <td>Active</td>
          <td>Largest</td>
          <td>Small</td>
      </tr>
  </tbody>
</table>
<h2 id="-技术实施路线">🛠️ Implementation roadmap</h2>
<h3 id="阶段一环境准备">Phase 1: Environment setup</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Install DrissionPage</span>
</span></span><span style="display:flex;"><span>pip install DrissionPage
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Install the alternatives (optional)</span>
</span></span><span style="display:flex;"><span>pip install playwright
</span></span><span style="display:flex;"><span>playwright install chromium  <span style="color:#6272a4"># Playwright also needs its own browser build</span>
</span></span><span style="display:flex;"><span>pip install selenium
</span></span></code></pre></div><h3 id="阶段二基础集成">Phase 2: Basic integration</h3>
<ol>
<li>Create a <code>TaobaoAutomator</code> class</li>
<li>Integrate the existing proxy server</li>
<li>Implement basic search and data extraction</li>
</ol>
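<p>For the data-extraction step, normalizing raw page text into a typed record early keeps later cleaning simple. The following is a minimal sketch, assuming prices arrive as text such as <code>¥1,299.00</code> or a range like <code>1299.00-1599.00</code>; the <code>Product</code> type and the <code>parse_price</code>/<code>to_product</code> helpers are hypothetical, not part of the existing code. <code>Decimal</code> is used instead of <code>float</code> so prices keep exact cents.</p>

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# Hypothetical record for one search result; field names are illustrative.
@dataclass
class Product:
    title: str
    price: Optional[Decimal]


# Matches the first number in the text, e.g. "1299.00" in "¥1299.00".
_PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Return the first listed price as a Decimal, or None if there is no number."""
    match = _PRICE_RE.search(text.replace(",", ""))  # drop thousands separators
    return Decimal(match.group()) if match else None


def to_product(raw_title, raw_price):
    # Collapse runs of whitespace in the title, then parse the price text.
    title = " ".join(raw_title.split())
    return Product(title=title, price=parse_price(raw_price))


print(to_product("  Phone   X  128GB ", "¥1,299.00"))
# Product(title='Phone X 128GB', price=Decimal('1299.00'))
```

<p>Returning <code>None</code> for an unparseable price (for example, a "price on request" label) lets the pipeline flag the record instead of crashing mid-batch.</p>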
<h3 id="阶段三高级功能">Phase 3: Advanced features</h3>
<ol>
<li>Tune the anti-bot strategy</li>
<li>Data cleaning and storage</li>
<li>Error handling and retries</li>
</ol>
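<p>Error handling with retries and part of the anti-bot pacing can share one helper. Below is a minimal sketch assuming transient failures surface as exceptions; it uses exponential backoff with random ("full") jitter, a general technique rather than anything DrissionPage itself provides. The <code>retry</code> decorator, its parameters, and the <code>flaky_fetch</code> demo function are hypothetical.</p>

```python
import functools
import random
import time


def retry(attempts=3, base_delay=1.0, max_delay=30.0, exceptions=(Exception,)):
    """Hypothetical decorator: retry with exponential backoff plus random jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
                    # sleeping a random fraction of it keeps retries from looking regular.
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    time.sleep(random.uniform(0, delay))
        return wrapper
    return decorator


calls = {"n": 0}


@retry(attempts=3, base_delay=0.01)
def flaky_fetch():
    # Simulates a request that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] != 3:
        raise ConnectionError("simulated network failure")
    return "ok"


print(flaky_fetch(), "after", calls["n"], "calls")  # ok after 3 calls
```

<p>Narrowing <code>exceptions</code> to network errors in real use avoids retrying bugs such as a selector that no longer matches, which would fail the same way every time.</p>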
<h3 id="阶段四性能优化">Phase 4: Performance optimization</h3>
<ol>
<li>Concurrency</li>
<li>Resource management</li>
<li>Monitoring and logging</li>
</ol>
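<p>The three Phase 4 items map fairly directly onto standard-library tools. This is a minimal sketch, not the project's design: a bounded thread pool limits concurrency, the <code>with</code> block handles cleanup, and <code>logging</code> records each outcome. <code>fetch_keyword</code> and <code>run_batch</code> are hypothetical names, and <code>fetch_keyword</code> is only a placeholder; threads suit I/O-bound work, and how many browser tabs can safely run in parallel is something to measure rather than assume.</p>

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("taobao.batch")


def fetch_keyword(keyword):
    """Hypothetical unit of work; a real crawler would search or call an API here."""
    time.sleep(0.01)  # stand-in for network latency
    return {"keyword": keyword}


def run_batch(keywords, max_workers=4):
    """Run fetch_keyword over keywords, at most max_workers at a time, logging each outcome."""
    results, failures = [], []
    # Leaving the with-block shuts the pool down and waits for its threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_keyword, kw): kw for kw in keywords}
        for future in as_completed(futures):
            kw = futures[future]
            try:
                results.append(future.result())
                log.info("done: %s", kw)
            except Exception:
                failures.append(kw)
                log.exception("failed: %s", kw)  # also logs the traceback
    log.info("batch finished: %d ok, %d failed", len(results), len(failures))
    return results, failures


results, failures = run_batch(["phone", "laptop", "earbuds"], max_workers=2)
```

<p>Keeping <code>max_workers</code> small also acts as a crude rate limit, which matters on a site with anti-bot checks.</p>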
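<h2 id="-备选方案">💡 Alternatives</h2>
<h3 id="方案-a纯-playwright如果团队技术能力强">Option A: Playwright only (if the team is technically strong)</h3>
<ul>
<li>Best performance</li>
<li>Most complete feature set</li>
<li>Takes longer to learn</li>
</ul>
<h3 id="方案-bselenium如果需要最大兼容性">Option B: Selenium (if maximum compatibility is required)</h3>
<ul>
<li>Largest pool of community resources</li>
<li>Broadest compatibility</li>
<li>Comparatively slow</li>
</ul>
<h3 id="方案-c混合方案">Option C: Hybrid approach</h3>
<ul>
<li>DrissionPage handles complex page interactions</li>
<li><code>requests</code> handles simple API calls</li>
<li>mitmproxy intercepts and captures the data</li>
</ul>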
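<p>In the hybrid option, the mitmproxy side is typically a small addon script. The sketch below is illustrative, not the project's actual capture code: it assumes the interesting endpoints can be recognized by a URL substring (here <code>mtop</code>, an assumption about Taobao's API paths) and keeps parsed JSON bodies in memory. The <code>response</code> hook and the module-level <code>addons</code> list follow mitmproxy's addon convention; <code>ApiCapture</code> and <code>API_MARKER</code> are hypothetical names.</p>

```python
import json

# Assumption: target API URLs contain this substring; adjust to the real endpoints.
API_MARKER = "mtop"


class ApiCapture:
    """mitmproxy addon that keeps the JSON bodies of matching API responses."""

    def __init__(self):
        self.captured = []

    def response(self, flow):
        # mitmproxy calls this hook once for each completed HTTP response.
        url = flow.request.pretty_url
        if API_MARKER not in url:
            return
        try:
            body = json.loads(flow.response.get_text())
        except (TypeError, ValueError):
            return  # no decodable text, or not JSON: skip it
        self.captured.append({"url": url, "data": body})


# mitmproxy discovers addons through this module-level list.
addons = [ApiCapture()]
```

<p>Saved as, say, <code>capture_addon.py</code> (the file name is arbitrary), it can be loaded with <code>mitmdump -s capture_addon.py</code>. A real pipeline would more likely hand each record to the existing storage than keep a list in memory.</p>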
<h2 id="-具体实现示例">🎪 Implementation examples</h2>
<h3 id="drissionpage-基础用法">DrissionPage basics</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> DrissionPage <span style="color:#ff79c6">import</span> ChromiumPage
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Create a page object (launches Chromium, or attaches to one already running)</span>
</span></span><span style="display:flex;"><span>page <span style="color:#ff79c6">=</span> ChromiumPage()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Open Taobao</span>
</span></span><span style="display:flex;"><span>page<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;https://www.taobao.com&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Search for products: &#39;#q&#39; locates the element whose id is &#34;q&#34;;</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># a trailing \n in input() presses Enter to submit (&#39;手机&#39; means &#34;mobile phone&#34;)</span>
</span></span><span style="display:flex;"><span>search_box <span style="color:#ff79c6">=</span> page<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;#q&#39;</span>)
</span></span><span style="display:flex;"><span>search_box<span style="color:#ff79c6">.</span>input(<span style="color:#f1fa8c">&#39;手机</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Extract product info, waiting up to 10 s for results.</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># The selectors are illustrative: Taobao&#39;s markup changes often</span>
</span></span><span style="display:flex;"><span>products <span style="color:#ff79c6">=</span> page<span style="color:#ff79c6">.</span>eles(<span style="color:#f1fa8c">&#39;.item&#39;</span>, timeout<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">for</span> product <span style="color:#ff79c6">in</span> products:
</span></span><span style="display:flex;"><span>    title <span style="color:#ff79c6">=</span> product<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.title&#39;</span>)<span style="color:#ff79c6">.</span>text
</span></span><span style="display:flex;"><span>    price <span style="color:#ff79c6">=</span> product<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.price&#39;</span>)<span style="color:#ff79c6">.</span>text
</span></span><span style="display:flex;"><span>    <span style="color:#8be9fd;font-style:italic">print</span>(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;</span><span style="color:#f1fa8c">{</span>title<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">: </span><span style="color:#f1fa8c">{</span>price<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#34;</span>)
</span></span></code></pre></div><h3 id="与现有架构集成">Integrating with the existing architecture</h3>
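<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> DrissionPage <span style="color:#ff79c6">import</span> ChromiumOptions, ChromiumPage
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> crawler.gateway.proxy_server <span style="color:#ff79c6">import</span> ProxyServer  <span style="color:#6272a4"># the project&#39;s own proxy server</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">TaobaoAutomator</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">__init__</span>(<span style="font-style:italic">self</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Start the existing proxy server (assumes it exposes its listening port as .port)</span>
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>proxy_server <span style="color:#ff79c6">=</span> ProxyServer()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Route the browser through the proxy. The proxy is a launch option,</span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># so it is set on ChromiumOptions before the browser starts</span>
</span></span><span style="display:flex;"><span>        co <span style="color:#ff79c6">=</span> ChromiumOptions()
</span></span><span style="display:flex;"><span>        co<span style="color:#ff79c6">.</span>set_proxy(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;http://127.0.0.1:</span><span style="color:#f1fa8c">{</span><span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>proxy_server<span style="color:#ff79c6">.</span>port<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>page <span style="color:#ff79c6">=</span> ChromiumPage(co)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">search_products</span>(<span style="font-style:italic">self</span>, keyword):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Same flow as the basic example; the selectors are illustrative</span>
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>page<span style="color:#ff79c6">.</span>get(<span style="color:#f1fa8c">&#39;https://www.taobao.com&#39;</span>)
</span></span><span style="display:flex;"><span>        <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>page<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;#q&#39;</span>)<span style="color:#ff79c6">.</span>input(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#39;</span><span style="color:#f1fa8c">{</span>keyword<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">\n</span><span style="color:#f1fa8c">&#39;</span>)
</span></span><span style="display:flex;"><span>        items <span style="color:#ff79c6">=</span> <span style="font-style:italic">self</span><span style="color:#ff79c6">.</span>page<span style="color:#ff79c6">.</span>eles(<span style="color:#f1fa8c">&#39;.item&#39;</span>, timeout<span style="color:#ff79c6">=</span><span style="color:#bd93f9">10</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> [
</span></span><span style="display:flex;"><span>            {<span style="color:#f1fa8c">&#39;title&#39;</span>: item<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.title&#39;</span>)<span style="color:#ff79c6">.</span>text, <span style="color:#f1fa8c">&#39;price&#39;</span>: item<span style="color:#ff79c6">.</span>ele(<span style="color:#f1fa8c">&#39;.price&#39;</span>)<span style="color:#ff79c6">.</span>text}
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">for</span> item <span style="color:#ff79c6">in</span> items
</span></span><span style="display:flex;"><span>        ]
</span></span></code></pre></div><h2 id="-技术要点">🔧 Key technical points</h2>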
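<ol>
<li><strong>Proxy integration</strong>: make sure the automation framework routes its traffic through the existing proxy server</li>
<li><strong>Data correlation</strong>: link the intercepted API data to the corresponding page data</li>
<li><strong>Anti-bot measures</strong>: simulate user behavior and control the interval between requests</li>
<li><strong>Error handling</strong>: cope with network failures, page layout changes, and similar problems</li>
</ol>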
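<p>Data correlation can be as simple as a join on a shared identifier. This sketch assumes both the intercepted API records and the page-scraped records carry an <code>item_id</code> field (a hypothetical key; the real field name depends on what the pages and APIs expose). It returns unmatched page records separately so they can be logged rather than silently dropped; <code>correlate</code> is a hypothetical helper name.</p>

```python
def correlate(page_items, api_items, key="item_id"):
    """Merge page records with API records that share the same key.

    Both inputs are lists of dicts. Returns (merged, unmatched_page_items).
    """
    # Index API records by the shared key; records without the key are ignored.
    api_by_id = {item[key]: item for item in api_items if key in item}
    merged, unmatched = [], []
    for item in page_items:
        api_item = api_by_id.get(item.get(key))
        if api_item is None:
            unmatched.append(item)
            continue
        # Page fields override API fields on conflict; API-only fields are kept.
        merged.append({**api_item, **item})
    return merged, unmatched


page_items = [{"item_id": "1", "title": "Phone X"}, {"item_id": "2", "title": "Case"}]
api_items = [{"item_id": "1", "title": "phone x", "sales": 320}]
merged, unmatched = correlate(page_items, api_items)
print(merged)     # [{'item_id': '1', 'title': 'Phone X', 'sales': 320}]
print(unmatched)  # [{'item_id': '2', 'title': 'Case'}]
```

<p>Letting page fields win is a design choice here: the page reflects what a user sees, while the API may carry extra fields such as sales counts. Swapping the merge order flips that priority.</p>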
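<h2 id="-预期效果">📈 Expected outcomes</h2>
<ul>
<li><strong>Development efficiency up 50%</strong> compared with building from scratch</li>
<li><strong>Better data quality</strong>: API data combined with page data</li>
<li><strong>Greater stability</strong>: several layered anti-bot strategies</li>
<li><strong>Lower maintenance cost</strong>: a unified architecture</li>
</ul>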
]]></content:encoded></item></channel></rss>