
Is this package actively maintained? #433

Open
davidbudzynski opened this issue Sep 13, 2023 · 8 comments

Comments

@davidbudzynski

I can see that this package hasn't received any commits to master since 2020, and there are a lot of unresolved issues without any response from the devs. Is it still actively maintained?

@odysseu commented Nov 10, 2023

Doesn't look like it is... 😞


@davidbudzynski (author)

The entire cloudyr site looks like it's been abandoned, unfortunately. It may make more sense for some people to just use the aws-cli instead of an R-specific layer to interact with AWS.
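For comparison, the aws-cli equivalent of a single-object download is a one-liner (the bucket name and key below are placeholders, and configured AWS credentials are assumed):

```shell
# Download an object from S3 with the AWS CLI.
# "mybucket" and the key are placeholders; requires `aws configure` to have been run.
aws s3 cp s3://mybucket/path/to/my/file.txt file.txt
```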

@odysseu commented Nov 10, 2023

For other S3-compatible solutions like MinIO, you might have more luck with https://github.com/paws-r/paws

@DyfanJones commented Nov 10, 2023

Hi All,

Just to give a little context around paws: paws is designed to be an AWS SDK, aiming to expose the full suite of AWS services from within R.

This means it follows the style of the other AWS SDKs. Compare boto3 (the Python AWS SDK) with paws:

# Python (boto3)
import boto3

client = boto3.client("s3")
client.download_file(Bucket="mybucket", Key="path/to/my/file.txt", Filename="file.txt")

# R (paws)
client <- paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")

This is a different approach from aws.s3 and the other cloudyr project packages, which take a more R-like approach:

aws.s3::save_object(object = "path/to/my/file.txt", file = "file.txt", bucket = "mybucket")

# i.e. helpful wrapper function for more R friendly use
aws.s3::s3read_using(FUN = readLines, object ="path/to/my/file.txt", bucket = "mybucket")

aws.s3::s3read_using is ultimately a wrapper around aws.s3::save_object plus the function passed as the FUN parameter (readLines in the example above).
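The same download-then-apply pattern could be sketched on top of paws. This is a rough illustration, not part of either package; the helper name s3_read_using is hypothetical, and running it requires valid AWS credentials and an existing bucket/object:

```r
# Hypothetical helper illustrating the s3read_using pattern on top of paws:
# download the object to a temporary file, then apply FUN to that file.
s3_read_using <- function(FUN, bucket, key, ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  client <- paws::s3()
  client$download_file(Bucket = bucket, Key = key, Filename = tmp)
  FUN(tmp, ...)
}

# Usage (placeholder bucket/key; assumes configured AWS credentials):
# lines <- s3_read_using(readLines, bucket = "mybucket", key = "path/to/my/file.txt")
```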

paws doesn't aim to replace these helper functions (e.g. aws.s3::s3read_using); however, there are several packages that use paws to give a more R-friendly interface:

  • pins-r: The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.
  • s3fs: s3fs provides a file-system like interface into Amazon Web Services for R.

As paws aims to offer a wide range of AWS services, it can also support packages that need connections to AWS services other than S3.

  • vetiver: The goal of vetiver is to provide fluent tooling to version, share, deploy, and monitor a trained model. (The AWS SageMaker interface is achieved through the use of paws.)
  • noctua: The goal of the noctua package is to provide a DBI-compliant interface to Amazon’s Athena (https://aws.amazon.com/athena/) using paws SDK.
  • etc ....

As the current maintainer of paws I am biased towards paws; however, aws.s3 and the other cloudyr packages are an excellent set of tools to interface with AWS.

As aws.s3 hasn't had any updates of late, it suffers from some of the more recent AWS changes, e.g. redirects:

aws.s3::save_object(object = "path/to/my/file.txt", bucket = "mybucket", 
                    file = "file.txt")
#> List of 6
#>  $ Code     : chr "PermanentRedirect"
#>  $ Message  : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
#>  $ Endpoint : chr "mybucket.s3.amazonaws.com"
#>  $ Bucket   : chr "mybucket"
#>  $ RequestId: chr "RXE9GRFR52YZ31VP"
#>  $ HostId   : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>  - attr(*, "headers")=List of 7
#>   ..$ x-amz-bucket-region: chr "eu-west-1"
#>   ..$ x-amz-request-id   : chr "RXE9GRFR52YZ31VP"
#>   ..$ x-amz-id-2         : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>   ..$ content-type       : chr "application/xml"
#>   ..$ transfer-encoding  : chr "chunked"
#>   ..$ date               : chr "Fri, 10 Nov 2023 11:52:24 GMT"
#>   ..$ server             : chr "AmazonS3"
#>   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
#>  - attr(*, "class")= chr "aws_error"
#>  - attr(*, "request_canonical")= chr "GET\n/mybucket/path/to/my/file.txt\n\nhost:s3.amazonaws.com\nx-amz-date:20231110T115224Z\n\nhost;x-amz-date\ne3b0"| __truncated__
#>  - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20231110T115224Z\n20231110/us-east-1/s3/aws4_request\n544435802593214ecab8eb95fa7623779125eae"| __truncated__
#>  - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=DUMMY/20231110/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-da"| __truncated__
#> NULL
#> Error in parse_aws_s3_response(r, Sig, verbose = verbose): Moved Permanently (HTTP 301).

Created on 2023-11-10 with reprex v2.0.2

Whereas paws can handle these:

client = paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")
#> list()
file.exists("file.txt")
#> [1] TRUE

Created on 2023-11-10 with reprex v2.0.2

Final note, paws is generated from AWS's own API definitions so new services and methods are added within each release.

@tyner commented Mar 4, 2024

Just curious, does any benchmarking exist in terms of the relative speed of the cloudyr package functionalities versus their paws counterparts? For example, if we switch from the former to the latter, might we expect speed improvements when accessing S3 objects?

@DyfanJones

@tyner A consistent benchmark can be difficult due to a number of factors:

  • Internet connection
  • AWS api slows down when hit repeatedly
  • etc...

However, performance can still be measured. Ultimately both packages use the curl package to make the API call to AWS, so the biggest performance difference comes from how fast each package can construct the call and parse the results.

For the PR paws-r/paws#762 there is a benchmark that mocks the response from AWS (to remove the effects of internet connection and API throttling) and zeroes in on the cost of making the AWS API call and parsing the results. It isn't an intensive benchmark, but I hope it helps.
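A rough local comparison along these lines could use the bench package. This is a sketch only: "mybucket" and the key are placeholders, live AWS credentials are required, and network variance will dominate unless responses are mocked as in the PR above:

```r
# Rough benchmark sketch comparing aws.s3 and paws for a single S3 download.
# Placeholder bucket/key; assumes valid AWS credentials.
library(bench)

client <- paws::s3()

bench::mark(
  aws.s3 = aws.s3::save_object(object = "path/to/my/file.txt",
                               bucket = "mybucket", file = tempfile()),
  paws   = client$download_file(Bucket = "mybucket",
                                Key = "path/to/my/file.txt",
                                Filename = tempfile()),
  check = FALSE,   # the two calls return different objects
  iterations = 10
)
```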

Side note: performance has been a big focus in paws of late (https://github.com/paws-r/paws/blob/main/paws.common/NEWS.md), with several functions refactored into C++ to improve the overall speed of the SDK.

@tyner commented Mar 12, 2024

Thanks @DyfanJones, I did my own comparison of aws.s3::put_object versus s3fs::s3_file_upload and the timings were roughly the same. Looking forward to re-running the test under the new version of paws.common!

@philiporlando

Looks like AWS' blog recommends using {paws} as well.
