gcs_get_object(object, parseObject = TRUE) returns raw object from .csv and will not parse into a data frame #184
Comments
The CSV parse function can be configured, and since CSV is not a strict standard it's possible your file (which you say is big) has some quirks. Have you tested it on a smaller file first? Perhaps also upload a data.frame and then download it back to an object to check. Please also report your sessionInfo().
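For reference, a minimal sketch of that round-trip check, assuming a placeholder bucket and object name (neither is from the original report):

```r
library(googleCloudStorageR)

# placeholder bucket name for illustration
gcs_global_bucket("my-test-bucket")

# upload a small built-in data.frame, then download it again
# and confirm it parses back into a data frame
gcs_upload(mtcars, name = "mtcars_test.csv")
roundtrip <- gcs_get_object("mtcars_test.csv", parseObject = TRUE)
str(roundtrip)
```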
Here's my sessionInfo:
I tried
I haven't gone through all of your suggestions yet; I just wanted to follow up on my start. Thank you!
Update: For context, I'm working on a large machine with 128 vCPUs and 864 GB of RAM. When I try to upload a very small data.frame:
I get the message
I've also tried creating a custom parse function as suggested, but that doesn't seem to be working either. Any thoughts?
The parse function needs an input/output argument; check the help for the exact syntax. It also looks like you don't have access to the bucket you're trying to reach anyhow (a 403 is usually an authentication issue). I recommend getting the examples on the website to work first; my guess is that this is not a bug.
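A minimal sketch of what such a parse function might look like, assuming it receives the downloaded httr response as its input and returns the parsed object as its output (the object name and CSV options are placeholders; see ?gcs_get_object for the documented interface):

```r
library(googleCloudStorageR)

# input: the httr response from the download
# output: the parsed R object (here, a data frame)
my_csv_parser <- function(object) {
  read.csv(text = httr::content(object, as = "text", encoding = "UTF-8"))
}

df <- gcs_get_object("my_big_file.csv", parseFunction = my_csv_parser)
```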
Thank you! I have had some access issues, so I'll check those out. Also, I did see the note about custom functions, but only after the fact, so I'll play around with that.
I suspect a custom function isn't necessary if it's an access issue. Check whether your bucket uses "fine-grained" vs "bucket-level" IAM, and whether the role you have been granted is sufficient for the email you are authenticating with.
Thank you for this. Would you mind explaining what effect fine-grained control vs bucket-level IAM would have on my access and use of this package? To explain my situation a bit: the original owner/creator of my VM is one of my organization's IT departments (I am an analyst). They control the original service account, and they created one for me with wide permissions for this purpose, but I'm still getting the same error. I saw in the documentation that you mentioned needing to be an Owner to authenticate properly. Would you mind going into more detail about this (both fine-grained vs bucket-level control and the ownership role, plus whatever else might be relevant)? That would help us pin down exactly which service account and permissions I need for this to work. Your package really is an excellent solution for limiting persistent storage usage, especially with large data, so I would really love to get this underway.
You need the Cloud Storage Admin role I think, not Owner. And if you're on a VM on GCP you can reuse the authentication of that VM's service key if it is configured, so perhaps that's the issue if the two are clashing. When trying the examples, setting options(googleAuthR.verbose = 2) will give more auth info; if you still have issues, please use that and show the code you are running. With fine-grained buckets you can set a different authentication state on every file uploaded to the bucket. That was the original way buckets were used, but it's a hassle, so bucket-level means you get the same permissions for all objects within. Confusing the two is a common issue.
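A quick sketch of that debugging setup; the key file path and bucket name below are placeholders:

```r
library(googleCloudStorageR)

# more detailed authentication logging while debugging
options(googleAuthR.verbose = 2)

# authenticate with the service account key you were given (placeholder path)
gcs_auth(json_file = "my-service-account-key.json")

# a simple access check: listing the bucket should succeed
# before gcs_get_object() will work
gcs_list_objects(bucket = "my-project-bucket")
```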
Thanks! I'll check it out and report back. |
|
From a virtual machine (Debian) on the same project, I have just finished authenticating and connecting to my project's bucket, where I have several somewhat large .csv files. I am trying to load these files into my R environment as data frames, but I only seem to be able to get a raw object when using gcs_get_object(object, parseObject = TRUE).

Both gcs_parse_download(object, encoding = "UTF-8") called directly on the bucket object, and content(object) called on the output of gcs_get_object(object, parseObject = TRUE), throw Error in content(object) : is.response(x) is not TRUE (which is expected, at least in the latter case). I have also tried gcs_parse_download(object, encoding = "ANSI") with the same results.

Do you know what might be happening here? Have I misunderstood one of these functions? How can I get my data into a data frame?
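For concreteness, a sketch of roughly how the attempts above fit together; the bucket and object names are placeholders, and the exact objects passed in the original attempts may have differed:

```r
library(googleCloudStorageR)

# placeholder bucket and object names
gcs_global_bucket("my-project-bucket")
obj_name <- "somewhat_large_file.csv"

# comes back as a raw object rather than a data frame
parsed <- gcs_get_object(obj_name, parseObject = TRUE)
class(parsed)

# both of these throw: Error in content(object) : is.response(x) is not TRUE
gcs_parse_download(parsed, encoding = "UTF-8")
httr::content(parsed)
```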