get_datasets() proxy issue #4

Open
ab2dridi opened this issue Aug 29, 2019 · 3 comments

Comments

@ab2dridi

Hello,
As in the related issue (expersso/OECD#11), could you modify the package to support corporate proxies by using httr?
The fix is to modify the get_datasets() function as shown below:

get_datasets <- function() {
  url <- complete_url("/statistics/full_data_sets.htm")
  # Fetch the page with httr (instead of passing the URL to read_html directly)
  page <- xml2::read_html(httr::GET(url))
  # Select every link whose href contains "zip"
  nodes <- rvest::html_nodes(page, xpath = "//a[contains(@href, 'zip')]")
  dplyr::tibble(name = rvest::html_text(nodes),
                url = complete_url(rvest::html_attr(nodes, "href")))
}

@expersso
Owner

I don't think adding httr as a dependency is the right course of action here. The package should work fine with a corporate proxy as long as you set your https_proxy environment variable.
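
For reference, a minimal sketch of setting that variable from within an R session. The proxy host and port are placeholders, not values from this thread:

```r
# Placeholder proxy address (an assumption; substitute your own host and port).
proxy_url <- "http://proxy.example.com:8080"

# Set the variable for the current R session only.
Sys.setenv(https_proxy = proxy_url)

# Confirm the value now set in this session's environment.
Sys.getenv("https_proxy")
```

To make the setting persistent, the same https_proxy=... line can go in ~/.Renviron.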

@ab2dridi
Author

Thank you for your response. It doesn't work for me with the http_proxy and https_proxy environment variables: I get a 407 error. We are using NTLM authentication.

datasets <- get_datasets()
Error in open.connection(x, "rb") :
Received HTTP code 407 from proxy after CONNECT

The only solution that works for me is to use httr::GET(url).

Thank you :)

@dbradnum

dbradnum commented Feb 18, 2021

Hi,

Apologies for returning to a pretty old issue - but I've just discovered this after a colleague ran into the same problem. As it happens, I'm also the author of the linked issue above in the OECD package.

After testing, I agree with @ab2dridi: get_datasets() doesn't work even when the proxy server address is configured with an environment variable, because that isn't always enough to authenticate with the proxy. So I think the change he suggests, to use httr::GET(), would be very helpful. (Would you be open to a PR?)

(Digging into the details a bit: the key point seems to be that xml2::read_html(url) uses the curl package under the hood, and I don't know of any way to configure that with the proxy server's authentication mode, i.e. NTLM in our case. Sadly, libcurl doesn't appear to have an environment variable to set this; see here.)
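
A rough sketch of how the httr-based approach could pass the proxy's authentication mode explicitly. The helper name read_html_via_proxy, the proxy host and port, and the PROXY_USER / PROXY_PASSWORD environment variable names are all assumptions for illustration, not part of the package; whether NTLM actually works also depends on how the local libcurl was built.

```r
# Hypothetical helper (not part of the package): fetch a page through an
# authenticating proxy with httr, then parse the response with xml2.
read_html_via_proxy <- function(url,
                                proxy_host,
                                proxy_port,
                                username = Sys.getenv("PROXY_USER"),
                                password = Sys.getenv("PROXY_PASSWORD")) {
  # httr::use_proxy() accepts an auth argument; "ntlm" selects NTLM.
  proxy <- httr::use_proxy(
    url = proxy_host, port = proxy_port,
    username = username, password = password,
    auth = "ntlm"
  )
  response <- httr::GET(url, proxy)
  # Fail early on HTTP errors such as 407 rather than parsing an error page.
  httr::stop_for_status(response)
  xml2::read_html(response)
}
```

Inside get_datasets(), this might be called as read_html_via_proxy(url, "proxy.example.com", 8080), with the host and port replaced by real values.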
