
Is this package actively maintained? #433

Open
davidbudzynski opened this issue Sep 13, 2023 · 8 comments

Comments

@davidbudzynski

I can see that this package hasn't received any commits to master since 2020, and there are a lot of unresolved issues without any response from the devs. Is it still actively maintained?

@odysseu commented Nov 10, 2023

Doesn't look like it is... 😞


@davidbudzynski (author)

The entire cloudyr site looks like it's been abandoned, unfortunately. It may make more sense for some people to just use the aws-cli instead of an R-specific layer to interact with AWS.
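For comparison, the aws-cli equivalent of a single-object download is a one-liner (the bucket name and key below are placeholders, and configured AWS credentials are assumed):

```shell
# Download an object from S3 with the AWS CLI.
# "mybucket" and the key are placeholders; requires `aws configure` to have been run.
aws s3 cp s3://mybucket/path/to/my/file.txt file.txt
```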

@odysseu commented Nov 10, 2023

For other S3-compatible solutions like MinIO, you might have more luck with https://github.com/paws-r/paws

@DyfanJones commented Nov 10, 2023

Hi All,

Just to give a little context around paws: paws is designed to be an AWS SDK, aiming to expose the full suite of AWS services from within R.

This means it follows the style of the other AWS SDKs. Compare boto3 (the Python AWS SDK) with paws:

# Python (boto3)
import boto3

client = boto3.client("s3")
client.download_file(Bucket="mybucket", Key="path/to/my/file.txt", Filename="file.txt")

# R (paws)
client <- paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")

This is a different approach from aws.s3 and the other cloudyr project packages, which take a more R-like approach:

aws.s3::save_object(object = "path/to/my/file.txt", file = "file.txt", bucket = "mybucket")

# i.e. helpful wrapper function for more R friendly use
aws.s3::s3read_using(FUN = readLines, object ="path/to/my/file.txt", bucket = "mybucket")

aws.s3::s3read_using is ultimately a wrapper around aws.s3::save_object plus the function passed as the FUN parameter (readLines in the example above).
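The same download-then-apply pattern could be sketched on top of paws. This is a rough illustration, not part of either package; the helper name s3_read_using is hypothetical, and running it requires valid AWS credentials and an existing bucket/object:

```r
# Hypothetical helper illustrating the s3read_using pattern on top of paws:
# download the object to a temporary file, then apply FUN to that file.
s3_read_using <- function(FUN, bucket, key, ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  client <- paws::s3()
  client$download_file(Bucket = bucket, Key = key, Filename = tmp)
  FUN(tmp, ...)
}

# Usage (placeholder bucket/key; assumes configured AWS credentials):
# lines <- s3_read_using(readLines, bucket = "mybucket", key = "path/to/my/file.txt")
```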

paws doesn't aim to replace these helper functions (e.g. aws.s3::s3read_using); however, there are several packages that use paws to give a more R-friendly interface:

  • pins-r: The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.
  • s3fs: s3fs provides a file-system like interface into Amazon Web Services for R.

As paws aims to offer a wide range of AWS services, it can also support packages that need connections to AWS services other than S3.

  • vetiver: The goal of vetiver is to provide fluent tooling to version, share, deploy, and monitor a trained model. (The AWS SageMaker interface is achieved through the use of paws.)
  • noctua: The goal of the noctua package is to provide a DBI-compliant interface to Amazon’s Athena (https://aws.amazon.com/athena/) using paws SDK.
  • etc ....

As the current maintainer of paws I am biased towards paws; however, aws.s3 and the other cloudyr packages are an excellent set of tools to interface with AWS.

As aws.s3 hasn't had any updates of late, it suffers from some of the more recent AWS changes, e.g. redirects:

aws.s3::save_object(object = "path/to/my/file.txt", bucket = "mybucket", 
                    file = "file.txt")
#> List of 6
#>  $ Code     : chr "PermanentRedirect"
#>  $ Message  : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
#>  $ Endpoint : chr "mybucket.s3.amazonaws.com"
#>  $ Bucket   : chr "mybucket"
#>  $ RequestId: chr "RXE9GRFR52YZ31VP"
#>  $ HostId   : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>  - attr(*, "headers")=List of 7
#>   ..$ x-amz-bucket-region: chr "eu-west-1"
#>   ..$ x-amz-request-id   : chr "RXE9GRFR52YZ31VP"
#>   ..$ x-amz-id-2         : chr "9ZoyfJpqpfzVZOsxP8r8Xj/93oYPLOuxmkW5BVUGxWYal2WbPli2yyj1CX0TXwJPmxUZgo9Vnqc="
#>   ..$ content-type       : chr "application/xml"
#>   ..$ transfer-encoding  : chr "chunked"
#>   ..$ date               : chr "Fri, 10 Nov 2023 11:52:24 GMT"
#>   ..$ server             : chr "AmazonS3"
#>   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
#>  - attr(*, "class")= chr "aws_error"
#>  - attr(*, "request_canonical")= chr "GET\n/mybucket/path/to/my/file.txt\n\nhost:s3.amazonaws.com\nx-amz-date:20231110T115224Z\n\nhost;x-amz-date\ne3b0"| __truncated__
#>  - attr(*, "request_string_to_sign")= chr "AWS4-HMAC-SHA256\n20231110T115224Z\n20231110/us-east-1/s3/aws4_request\n544435802593214ecab8eb95fa7623779125eae"| __truncated__
#>  - attr(*, "request_signature")= chr "AWS4-HMAC-SHA256 Credential=DUMMY/20231110/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-da"| __truncated__
#> NULL
#> Error in parse_aws_s3_response(r, Sig, verbose = verbose): Moved Permanently (HTTP 301).

Created on 2023-11-10 with reprex v2.0.2

Whereas paws can handle these:

client = paws::s3()
client$download_file(Bucket = "mybucket", Key = "path/to/my/file.txt", Filename = "file.txt")
#> list()
file.exists("file.txt")
#> [1] TRUE

Created on 2023-11-10 with reprex v2.0.2

Final note, paws is generated from AWS's own API definitions so new services and methods are added within each release.

@tyner commented Mar 4, 2024

Just curious, does any benchmarking exist in terms of the relative speed of the cloudyr package functionalities versus their paws counterparts? For example, if we switch from the former to the latter, might we expect speed improvements when accessing S3 objects?

@DyfanJones

@tyner A consistent benchmark can be difficult due to a number of factors:

  • Internet connection
  • AWS api slows down when hit repeatedly
  • etc...

However, performance can still be measured. Ultimately both packages use the curl package to make the API call to AWS, so the biggest performance difference comes from how fast each package can construct the call and parse the results.

For the PR paws-r/paws#762 there is a benchmark that mocks the response from AWS (to remove the effects of internet connection and API throttling) and zeroes in on the cost of making the AWS API call and parsing the results. It isn't an intensive benchmark, but I hope it helps.
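A rough local comparison along these lines could use the bench package. This is a sketch only: "mybucket" and the key are placeholders, live AWS credentials are required, and network variance will dominate unless responses are mocked as in the PR above:

```r
# Rough benchmark sketch comparing aws.s3 and paws for a single S3 download.
# Placeholder bucket/key; assumes valid AWS credentials.
library(bench)

client <- paws::s3()

bench::mark(
  aws.s3 = aws.s3::save_object(object = "path/to/my/file.txt",
                               bucket = "mybucket", file = tempfile()),
  paws   = client$download_file(Bucket = "mybucket",
                                Key = "path/to/my/file.txt",
                                Filename = tempfile()),
  check = FALSE,   # the two calls return different objects
  iterations = 10
)
```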

Side note: performance has been a big focus in paws of late (https://github.com/paws-r/paws/blob/main/paws.common/NEWS.md), with several functions refactored into C++ to improve the overall speed of the SDK.

@tyner commented Mar 12, 2024

Thanks @DyfanJones, I did my own comparison of aws.s3::put_object versus s3fs::s3_file_upload and the timings were roughly the same. Looking forward to re-running the test under the new version of paws.common!

@philiporlando

Looks like AWS' blog recommends using {paws} as well.
