Grabbing HTML source after delayed redirect

I am trying to grab the HTML source of a website with Python. However, every page on the site first shows a delayed redirect, much like a loading screen. Whenever I call requests.get(url), I get that loading screen rather than the page that comes after it. I am using the Requests library. Is there a way to make the request wait until after the redirect? The redirect takes about 3 seconds.

Here is the code I use:

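import requests  # the module name is lowercase

url = "https://example.com/"  # placeholder; replace with the page you want
page = requests.get(url)
print(page.text)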

Original: https://stackoverflow.com/questions/37450953
Updated: 2019-07-10 22:36

Best answer

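The redirect is probably performed client-side by your browser, not by the server. (A server-side redirect is an HTTP 3xx response, which requests.get follows automatically by default.) There are two common client-side mechanisms: a "meta refresh" tag or JavaScript.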

For the former you can parse the HTML response using something like BeautifulSoup, examine it for a meta refresh tag, extract the destination URL, then retrieve it with a second request.
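
The meta refresh approach could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses the standard library's html.parser in place of BeautifulSoup so that it has no extra dependency, it assumes the tag looks like <meta http-equiv="refresh" content="3; url=/real-page">, and the names MetaRefreshParser, find_meta_refresh and get_after_meta_refresh are made up for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Remembers the content attribute of the first meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.content = attrs.get("content") or ""


def find_meta_refresh(html, base_url):
    """Return (delay_seconds, absolute_url), or None if no target is found."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.content is None:
        return None
    # Assumed format: "<seconds>; url=<target>", target optionally quoted.
    match = re.match(
        r"\s*(\d+(?:\.\d+)?)\s*[;,]\s*url\s*=\s*['\"]?([^'\"]+)",
        parser.content,
        re.IGNORECASE,
    )
    if not match:
        return None
    # Resolve relative targets against the URL of the page that was fetched.
    return float(match.group(1)), urljoin(base_url, match.group(2).strip())


def get_after_meta_refresh(url):
    """Fetch url and, if it contains a meta refresh, fetch the target instead."""
    import requests  # third-party; imported here so the parser works without it

    page = requests.get(url)
    target = find_meta_refresh(page.text, page.url)
    if target is None:
        return page
    _delay, next_url = target
    return requests.get(next_url)
```

Calling get_after_meta_refresh(url).text would then return the HTML of the redirect target, provided the site really uses a meta refresh. The sketch does not wait for the delay, on the assumption that the delay only matters to a browser; if the site checks cookies between the two requests, a requests.Session would probably be needed.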

It's more difficult if the redirect is done with JavaScript, as there are many ways a script can trigger it.

Either way is a bit messy, so your best bet would be to use something like Selenium, which lets you script a real browser so that the browser performs the meta refresh or JavaScript redirect for you.
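
A browser-driven version might look something like the sketch below. It assumes Selenium with a working Chrome driver is installed, and it assumes the redirect changes the page URL; if the loading screen swaps its content without changing the URL, you would need to wait for some element of the real page instead. The function names page_source_after_redirect and fetch_with_chrome are invented for this example.

```python
import time


def page_source_after_redirect(driver, url, timeout=10.0, poll=0.25):
    """Load url, wait until the browser has moved to another URL, return its HTML.

    driver is any Selenium WebDriver. If no redirect happens within
    timeout seconds, the HTML of whatever page is loaded is returned.
    """
    driver.get(url)
    start_url = driver.current_url  # URL of the loading screen
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        moved = driver.current_url != start_url
        # Also require the new document to have finished loading.
        if moved and driver.execute_script("return document.readyState") == "complete":
            break
        time.sleep(poll)
    return driver.page_source


def fetch_with_chrome(url):
    """Run the helper above in a real Chrome window (assumes Chrome is available)."""
    from selenium import webdriver  # third-party; imported only when actually used

    driver = webdriver.Chrome()
    try:
        return page_source_after_redirect(driver, url)
    finally:
        driver.quit()  # always close the browser, even on errors
```

With a 3 second redirect, fetch_with_chrome(url) should return the HTML of the page behind the loading screen. The polling loop is written by hand to keep the sketch short; Selenium's own WebDriverWait serves the same purpose.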

2016-05-26
