
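Preface
On the road to web scraping there is always some encryption that beginners like us cannot crack and some anti-scraping measure we cannot get around. That is when browser automation tools come in. Used out of the box, though, an automation tool is usually detected by the target site, because it exposes dozens of telltale features. This post shows how to execute JS in the common browser automation tools and how to hide those browser features. It does not cover installation or environment setup; look up a tutorial for that.
selenium
The first automation module I came across.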
```python
# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date:   2023-12-07 19:58:47
# @Last Modified by:   Mehaei
# @Last Modified time: 2023-12-07 21:03:31
import time

from selenium import webdriver


def start():
    driver = webdriver.Chrome()
    # Read the stealth script that patches common automation features
    with open('stealth.min.js', 'r') as f:
        js = f.read()
    # Register the script through CDP so it runs before any page script
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': js})
    driver.get("https://bot.sannysoft.com/")
    # Keep the browser open long enough to inspect the detection results
    time.sleep(60)


if __name__ == '__main__':
    start()
```
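The example above opens `stealth.min.js` with the platform's default encoding and assumes the file is present. A minimal sketch of a more defensive loader follows; the helper name `load_stealth_js` is hypothetical, and UTF-8 is an assumption about how the script file is saved.

```python
from pathlib import Path


def load_stealth_js(path: str = "stealth.min.js") -> str:
    """Return the stealth script source, failing early with a clear error.

    Hypothetical helper, not part of selenium or any other library.
    """
    script_path = Path(path)
    if not script_path.is_file():
        raise FileNotFoundError(f"stealth script not found: {script_path}")
    # Assumption: the script is saved as UTF-8. Reading it explicitly avoids
    # depending on the locale's default encoding.
    source = script_path.read_text(encoding="utf-8")
    if not source.strip():
        raise ValueError(f"stealth script is empty: {script_path}")
    return source
```

With this helper, the CDP call could become `driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': load_stealth_js()})`.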
pyppeteer
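In my tests a small number of features still cannot be hidden this way, but there are other options.
pyppeteer_stealth is the top-tier tool for hiding pyppeteer features.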
```python
# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date:   2023-12-07 19:58:47
# @Last Modified by:   Mehaei
# @Last Modified time: 2023-12-07 21:22:31
import asyncio

from pyppeteer import launch


async def start():
    browser = await launch(headless=False)
    page = await browser.newPage()
    # Read the stealth script that patches common automation features
    with open('stealth.min.js', 'r') as f:
        js = f.read()
    # Run the script in every new document before the page's own scripts
    await page.evaluateOnNewDocument(js)
    await page.goto("https://bot.sannysoft.com/")
    # Keep the browser open long enough to inspect the detection results
    await asyncio.sleep(60)
    await browser.close()


if __name__ == '__main__':
    # asyncio.run replaces the deprecated get_event_loop().run_until_complete()
    asyncio.run(start())
```
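The text mentions pyppeteer_stealth, but the example above injects `stealth.min.js` by hand. A minimal sketch of the pyppeteer_stealth route follows, assuming the `pyppeteer_stealth` package is installed and exposes `stealth(page)` as its README describes. The function name `start_with_stealth` and its parameters are hypothetical.

```python
import asyncio


async def start_with_stealth(url: str = "https://bot.sannysoft.com/",
                             wait_seconds: int = 60) -> None:
    # Imported lazily so this module can still be imported on machines
    # where pyppeteer or pyppeteer_stealth is not installed.
    from pyppeteer import launch
    from pyppeteer_stealth import stealth

    browser = await launch(headless=False)
    try:
        page = await browser.newPage()
        # Apply the library's evasions to this page before navigating.
        await stealth(page)
        await page.goto(url)
        # Keep the browser open long enough to inspect the results.
        await asyncio.sleep(wait_seconds)
    finally:
        await browser.close()
```

Run it with `asyncio.run(start_with_stealth())`. Whether it hides more features than the hand-injected script is something to check on the detection page rather than assume.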

playwright
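A new-generation scraping tool.
It can record manual actions and generate the code automatically, which makes it a great automation tool.
Official site: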
https://playwright.dev/
```python
# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date:   2023-12-07 19:58:47
# @Last Modified by:   Mehaei
# @Last Modified time: 2023-12-07 20:52:55
import time

from playwright.sync_api import sync_playwright


def start():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        # Run the stealth script in every page of this context before page scripts
        context.add_init_script(path='stealth.min.js')
        page = context.new_page()
        page.goto("https://bot.sannysoft.com/", timeout=100000)
        # Keep the browser open long enough to inspect the detection results
        time.sleep(60)


if __name__ == '__main__':
    start()
```
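Besides eyeballing the detection page, a few features can be read back from the browser directly. The sketch below is illustrative only: the probe list is a small assumed sample, not the full set a real site checks, and the names `PROBES`, `collect_fingerprint`, and `looks_automated` are hypothetical.

```python
from typing import Any, Callable, Dict

# A handful of JS expressions that automation commonly gives away.
# Illustrative sample only; detection pages test many more signals.
PROBES: Dict[str, str] = {
    "webdriver": "navigator.webdriver",
    "plugin_count": "navigator.plugins.length",
    "languages": "navigator.languages",
    "user_agent": "navigator.userAgent",
}


def collect_fingerprint(evaluate: Callable[[str], Any]) -> Dict[str, Any]:
    """Evaluate each probe with the given callable and collect the results."""
    return {name: evaluate(expr) for name, expr in PROBES.items()}


def looks_automated(fingerprint: Dict[str, Any]) -> bool:
    """Rough heuristic that only flags the two most obvious giveaways."""
    user_agent = str(fingerprint.get("user_agent") or "")
    return bool(fingerprint.get("webdriver")) or "HeadlessChrome" in user_agent
```

In the Playwright example, `collect_fingerprint(page.evaluate)` after `page.goto(...)` would return the values. With Selenium the expression needs a `return`, for instance `collect_fingerprint(lambda expr: driver.execute_script(f"return {expr}"))`.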
DrissionPage
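A newer automation tool that combines the convenience of requests with the power of browser automation.
It also hides some automation features automatically and does not need a driver installed. If you are interested, see the official site: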
https://g1879.gitee.io/drissionpagedocs/
```python
# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date:   2023-12-07 19:58:47
# @Last Modified by:   Mehaei
# @Last Modified time: 2023-12-07 22:02:58
import time

from DrissionPage import ChromiumPage


def start():
    page = ChromiumPage()
    with open('stealth.min.js', 'r') as f:
        js = f.read()
    # Running the JS is the intended step, but running this stealth script
    # through run_js raises an error, so the call is left commented out.
    # page.run_js(js)
    page.get("https://bot.sannysoft.com/")
    # Keep the browser open long enough to inspect the detection results
    time.sleep(60)


if __name__ == '__main__':
    start()
```
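The example above leaves `page.run_js(js)` commented out because it errors. One thing that may be worth trying is registering the script as an init script, which is what the other three tools do. The sketch below assumes the installed DrissionPage version provides `add_init_js` on `ChromiumPage`; it has not been checked against the error described above, and the function name `start_with_init_script` is hypothetical.

```python
import time


def start_with_init_script(script_path: str = "stealth.min.js",
                           url: str = "https://bot.sannysoft.com/",
                           wait_seconds: int = 60) -> None:
    # Imported lazily so this module can still be imported on machines
    # where DrissionPage is not installed.
    from DrissionPage import ChromiumPage

    with open(script_path, "r", encoding="utf-8") as f:
        js = f.read()

    page = ChromiumPage()
    # Assumption: add_init_js exists in the installed DrissionPage version.
    # It registers the script to run before the page's own scripts load.
    page.add_init_js(js)
    page.get(url)
    # Keep the browser open long enough to inspect the results.
    time.sleep(wait_seconds)
```

Call `start_with_init_script()` and compare the detection page with and without the script to see whether it makes a difference.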
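JS file download
Follow the WeChat official account 【 不止于python 】 and reply 【 stealth_js 】 in its chat to get the file.
Those are the commonly used Python automation tools, and that is all for this post. It does not cover much; it is just a few simple examples written down for the record.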