Grabbing HTML source after delayed redirect

I am trying to grab the HTML source of a website with Python. However, when you visit any page on the site there is a delayed redirect, almost like a loading screen. Whenever I call requests.get(url) I end up grabbing that loading screen and not what comes after it. I am using the Requests library. Is there a way to make the request wait until after the redirect? The redirect takes about 3 seconds.

Here is the code I use:

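import requests  # the module name is lowercase

page = requests.get(url)  # url holds the address of the page to fetch
print(page.text)  # prints the loading screen's HTML, not the page after the redirect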

Source: https://stackoverflow.com/questions/37450953
Updated: 2019-07-10 22:36

Best answer

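The redirect is probably performed by your browser, not by the server. Requests follows HTTP redirects (3xx responses) automatically, but it does not act on anything inside the page itself, which is why you end up with the loading screen. There are two common ways for a page to redirect on the client side: a "meta refresh" tag or JavaScript.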

For the former, you can parse the HTML response with something like BeautifulSoup, look for a meta refresh tag, extract the destination URL, and then retrieve it with a second request.
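The meta refresh approach could look roughly like the sketch below. To keep it dependency-free it uses the standard library's html.parser rather than BeautifulSoup, so treat it as one possible way to do the parsing step, not the only one. The function name meta_refresh_target and the helper class are made up for illustration, and the sketch assumes the tag has the usual form content="3; url=...".

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "meta" or self.content is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.content = attrs.get("content") or ""


def meta_refresh_target(html, base_url):
    """Return the absolute URL a meta refresh tag points to, or None if there is none."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    if finder.content is None:
        return None
    # Assumed format of the content value: "<seconds>; url=<destination>"
    match = re.search(r"url\s*=\s*['\"]?([^'\"]+)", finder.content, re.IGNORECASE)
    if match is None:
        return None
    # The destination may be relative, so resolve it against the page's own URL.
    return urljoin(base_url, match.group(1).strip())
```

With Requests, you would pass page.text and page.url to meta_refresh_target and, if it returns a URL, fetch that URL with a second requests.get call. There is no need to wait the 3 seconds yourself, since the delay is only enforced by the browser.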

It's more difficult if the redirect is done with JavaScript, as there are many ways in which such a redirect could be written.

Either way is a bit messy, so your best bet would be to use something like Selenium, which basically lets you script a real browser, so that the browser performs the meta refresh or JavaScript redirect for you.
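A minimal sketch of the browser-driven approach follows. It assumes that the redirect changes the address shown by the browser; if the site swaps the content in place without changing the URL, you would need to wait for some element of the final page instead. The names html_after_redirect and grab_with_firefox are hypothetical, and the waiting loop is hand-written for clarity (Selenium also ships its own wait helpers).

```python
import time


def html_after_redirect(driver, url, timeout=10.0, poll=0.25):
    """Open url and return the page source once the browser has moved to another URL.

    driver is expected to be a Selenium WebDriver; anything that exposes
    get(), current_url and page_source will work.
    """
    driver.get(url)
    start_url = driver.current_url  # address of the loading screen
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.current_url != start_url:
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError(f"no redirect away from {start_url} within {timeout} s")


def grab_with_firefox(url):
    """Hypothetical entry point; needs Selenium and a Firefox driver installed."""
    # Imported here so the helper above stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        return html_after_redirect(driver, url)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Calling grab_with_firefox(url) would then stand in for requests.get(url).text in the question's code. Note that the source is read as soon as the address changes, so for slow pages you may want an extra wait before reading page_source.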

2016-05-26
