The random wait time is already set very long, but the crawler still gets blocked at around the same page count #578

Open
EthanNCai opened this issue Apr 28, 2024 · 3 comments
Labels
failed (program runtime error)

Comments

@EthanNCai

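To help resolve the problem, please answer the questions below carefully. Once the problem is solved, please close this issue promptly.

  • Q: Which version fails (GitHub version / PyPI version / both)?

A:
GitHub version

  • Q: Are you using the latest version of the program (yes/no)?

A:
Yes

  • Q: Does the error occur when crawling any user (yes/no)?

A:
No, I have only tried one user.

  • Q: If the error occurs only for specific weibo posts, can you provide the weibo_id or URL of the failing post (optional)?

A:
weiboid -> 1640337222

  • Q: If you have already provided the weibo_id or URL of the failing post, skip this question. Otherwise, can you provide the user_id of the failing account and the since_date you configured, so that we can locate the failing post (optional)?

A:

  • Q: If convenient, please describe the error in detail, preferably with the error message.

A:
"random_wait_pages": [1, 2],
"random_wait_seconds": [70, 110],
With these settings the crawler is still blocked at around the 200th weibo (near page 20).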
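
For readers unfamiliar with the two settings, here is a minimal sketch of how such a random wait is commonly applied. It assumes that the crawler pauses once every x pages, with x drawn from `random_wait_pages`, for a number of seconds drawn from `random_wait_seconds`. The function name `crawl_with_random_waits` and the `fetch_page` callback are hypothetical and are not taken from spider.py.

```python
import random
import time


def crawl_with_random_waits(page_numbers, wait_pages, wait_seconds,
                            fetch_page, sleep=time.sleep):
    """Fetch each page, pausing after a random number of pages.

    wait_pages and wait_seconds are [low, high] pairs, as in the config above.
    fetch_page is a hypothetical callback that downloads one page.
    """
    pages_until_wait = random.randint(*wait_pages)
    pages_since_wait = 0
    for page in page_numbers:
        fetch_page(page)
        pages_since_wait += 1
        if pages_since_wait >= pages_until_wait:
            # Pause for a random duration, then draw a new page threshold.
            sleep(random.randint(*wait_seconds))
            pages_since_wait = 0
            pages_until_wait = random.randint(*wait_pages)
```

Under this assumption, `[1, 2]` and `[70, 110]` give an average pause of about 90 seconds every 1.5 pages, or roughly one minute of waiting per page.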

@EthanNCai EthanNCai added the failed (program runtime error) label Apr 28, 2024
@dataabc
Owner

dataabc commented Apr 28, 2024

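This may be related to the target account; certain types of weibo are restricted more strictly. You can edit spider.py and change range(1, page_num + 1) to range(20, page_num + 1), so that the program starts fetching from page 20.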
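
The effect of that change can be illustrated with a small sketch. The surrounding loop in spider.py is not shown in this thread, so the helper `pages_to_crawl` and its `start_page` parameter are hypothetical; only the two `range(...)` expressions come from the comment above.

```python
def pages_to_crawl(page_num, start_page=1):
    """Return the page numbers to fetch, from start_page through page_num."""
    # range() excludes its upper bound, hence page_num + 1.
    return range(start_page, page_num + 1)


# Default behaviour: every page from 1 to page_num.
all_pages = list(pages_to_crawl(25))

# Suggested change: skip the first 19 pages and resume at page 20.
resumed_pages = list(pages_to_crawl(25, start_page=20))
```

Here `resumed_pages` covers pages 20 through 25 only, so pages already fetched before the block are not requested again.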

@EthanNCai
Author

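Thanks for the answer. Starting from page 20, however, the crawler still gets blocked at around page 40, so perhaps this account really is restricted more strictly. My current workaround is to set "random_wait_pages": [1, 2],
"random_wait_seconds": [120, 180], which lets me keep fetching indefinitely. For efficiency, the only option I can see is to crawl through multiple proxy IPs at the same time.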
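
One way to picture the multi-proxy idea is to give each proxy its own contiguous block of pages, so that each worker can run with the slow wait settings independently. This is only a sketch: the thread does not confirm that blocking is tied to the IP address, and `split_pages` and the proxy addresses below are hypothetical placeholders.

```python
def split_pages(page_num, proxies):
    """Split pages 1..page_num into contiguous chunks, one per proxy."""
    chunk, extra = divmod(page_num, len(proxies))
    plan = {}
    start = 1
    for i, proxy in enumerate(proxies):
        # The first `extra` proxies take one additional page each.
        size = chunk + (1 if i < extra else 0)
        plan[proxy] = range(start, start + size)
        start += size
    return plan


# Hypothetical proxy addresses, for illustration only.
plan = split_pages(10, ["http://proxy-a:8080",
                        "http://proxy-b:8080",
                        "http://proxy-c:8080"])
```

Each worker would then crawl only the pages in its own `plan` entry through its assigned proxy.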

@xiaoyequ04

The same thing happens when I crawl several Weibo accounts; they cannot be crawled.
For example, these target accounts: 2974325495; 1682207150
