I want to download a PDF from an online magazine. To open it, I first have to log in; then I can open the PDF and download it.

Below is my code. It logs in and opens the PDF, but it can't download the PDF because I'm not sure how to simulate clicking Save. I'm using Firefox.

import os, time 
from selenium import webdriver 
from bs4 import BeautifulSoup 
 
# Use the Firefox downloader to save the file
fp = webdriver.FirefoxProfile() 
fp.set_preference("browser.download.folderList",2) 
fp.set_preference("browser.download.manager.showWhenStarting",False) 
fp.set_preference("browser.download.dir", 'D:/eBooks/Stocks_andCommodities/2008/Jul/') 
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf") 
fp.set_preference("pdfjs.disabled", True)  # boolean pref: pass True, not the string "true"
 
# disable Adobe Acrobat PDF preview plugin 
fp.set_preference("plugin.scan.plid.all", False)  # boolean pref: pass False, not the string "false"
fp.set_preference("plugin.scan.Acrobat", "99.0") 
 
browser = webdriver.Firefox(firefox_profile=fp) 
 
# Get the login web page 
web_url = 'http://technical.traders.com/sub/sublogin2.asp' 
browser.get(web_url) 
 
# Simulate the authentication
user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]') 
user_name.send_keys("thomas2003@test.net") 
password = browser.find_element_by_css_selector('#SubName > input[type="text"]') 
password.send_keys("LastName") 
time.sleep(2) 
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]') 
submit.click() 
time.sleep(2) 
 
# Open the PDF for downloading 
url = r'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\131INTR.pdf'  # raw string keeps the backslashes literal
browser.get(url) 
time.sleep(10) 
 
# How do I simulate clicking Save/Download for the PDF here?
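For reference: the profile preferences above (`folderList=2`, `browser.download.dir`, `neverAsk.saveToDisk` for `application/pdf`, PDF.js disabled) are intended to make Firefox save the PDF silently, so there may be no Save dialog to click at all. That only works if the server actually labels the response `application/pdf`, which is an assumption here. If it does, the fixed `time.sleep(10)` can be replaced by polling the download folder. A minimal sketch, assuming Firefox marks an in-progress download with a `.part` file next to it; `wait_for_download` is a hypothetical helper, not a Selenium API:

```python
import os
import time


def wait_for_download(directory, existing, suffix=".pdf", timeout=60.0, poll=0.5):
    """Poll `directory` until a new file ending in `suffix` has finished.

    `existing` is the set of file names present before the download was
    triggered, so older PDFs already in the folder are ignored.
    Assumption: Firefox keeps a `.part` file while a download is running.
    """
    suffix = suffix.lower()
    existing = set(existing)
    deadline = time.monotonic() + timeout
    while True:
        new = set(os.listdir(directory)) - existing
        in_progress = any(name.endswith(".part") for name in new)
        finished = [name for name in new if name.lower().endswith(suffix)]
        if finished and not in_progress:
            # Several candidates: return the most recently modified one.
            paths = [os.path.join(directory, name) for name in finished]
            return max(paths, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new {suffix} file finished in {directory!r} within {timeout}s")
        time.sleep(poll)
```

Usage: take `before = set(os.listdir(download_dir))` before `browser.get(url)`, then call `wait_for_download(download_dir, before)`, where `download_dir` is the same folder passed to `browser.download.dir`.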

Suggested approach:

You shouldn't open the file in the browser. Once you have the file URL, create a requests session that carries all of the browser's cookies:

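def get_request_session(driver): 
    import requests 
    # Copy the logged-in browser's cookies into a requests session,
    # so requests is authenticated the same way Selenium is.
    session = requests.Session() 
    for cookie in driver.get_cookies(): 
        session.cookies.set(cookie['name'], cookie['value']) 
 
    return session 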
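If you would rather not depend on `requests`, the same cookie hand-off can be sketched with the standard library alone: join the dicts returned by `driver.get_cookies()` into one `Cookie` header and attach it to a `urllib.request.Request`. `cookie_header` and `build_request` are illustrative names, not part of Selenium. Unlike a cookie jar, this sends every cookie on every request regardless of domain or path, which is only reasonable because all the cookies here come from the one site.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join cookie dicts (the shape driver.get_cookies() returns, each with
    'name' and 'value' keys) into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(url, selenium_cookies):
    """Build a urllib request that replays the browser's login cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(selenium_cookies)})
```

Usage: `with urllib.request.urlopen(build_request(url, browser.get_cookies())) as resp, open(path, 'wb') as f: shutil.copyfileobj(resp, f)`, with `shutil` imported and `path` the target file.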

Once you have the session, use it to download the file:

url = r'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\131INTR.pdf' 
session = get_request_session(browser)  # `browser` is the logged-in WebDriver from the question
r = session.get(url, stream=True) 
r.raise_for_status()  # fail on HTTP errors instead of saving an error page
chunk_size = 2000 
with open('/tmp/mypdf.pdf', 'wb') as file: 
    for chunk in r.iter_content(chunk_size): 
        file.write(chunk) 
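One failure mode worth guarding against: if the site does not accept the copied cookies, it will likely answer with an HTML login page and a 200 status, and the loop above will save that HTML as `mypdf.pdf`. A minimal sketch that checks for the `%PDF-` header PDF files normally start with, assuming `chunks` is the `r.iter_content(chunk_size)` iterator; `save_pdf` is a hypothetical helper:

```python
import os


def save_pdf(chunks, path):
    """Write an iterable of byte chunks to `path`, then delete the file and
    raise ValueError if its first bytes are not the PDF header."""
    head = b""
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            if len(head) < 5:
                # Collect the first 5 bytes even if they span several chunks.
                head += chunk[: 5 - len(head)]
            f.write(chunk)
    if not head.startswith(b"%PDF-"):
        os.remove(path)
        raise ValueError(f"response is not a PDF (starts with {head!r})")
    return path
```

Usage: replace the `with open(...)` loop with `save_pdf(r.iter_content(chunk_size), '/tmp/mypdf.pdf')`.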

