Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialisation seems to maintain cached filesystem #42

Open
benrutter opened this issue Feb 27, 2024 · 3 comments
Open

Initialisation seems to maintain cached filesystem #42

benrutter opened this issue Feb 27, 2024 · 3 comments

Comments

@benrutter
Copy link
Contributor

I'm not 100% sure that the title here, is accurate, as it involves a bit more understanding of what's happening under the hood with asyncssh than I have so far.

I'm also not sure if this is intended behaviour vs actually a bug (sorry!)

The issue is something like this:

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    with fs.open(filepath) as file:
        _ = file.read()

If this runs for a long time, the connection might be shut down from the other side throwing up an asnycssh.sftp.SFTPNoConnection error, so far this is all as expected.

The bit that seems unusual is that something like this:

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    try:
        with fs.open(filepath) as file:
            _ = file.read()
    except SFTPNoConnection:
        new_fs = SSHFileSystem(host, username=username, password=password)
        with new_fs.open(filepath) as file:
            _ = file.read()

The new_fs will always throw up the same SFTPNoConnection error, which seems to be because something behind the scenes is being cached?

Notably, the following works by clearing the cache before reconnecting:

fs = SSHFileSystem(host, username=username, password=password)
for filepath in long_list_of_files:
    try:
        with fs.open(filepath) as file:
            _ = file.read()
    except SFTPNoConnection:
        fs.clear_instance_cache()
        new_fs = SSHFileSystem(host, username=username, password=password)
        with new_fs.open(filepath) as file:
            _ = file.read()

I'd assume the expected behaviour would be that initialising a new SSHFileSystem would create a fully new connection - is this intentional behaviour?

@benrutter
Copy link
Contributor Author

Should also mentioned that there's a stack overflow issue related to this

@benrutter
Copy link
Contributor Author

I've done some playing around and noticed the connection isn't closed during finalize is that intentional? Also, is there any reason not to put a self.clear_instance_cache() in that finalize step? (I feel like there probably is, I'm mainly just interested!)

@fkrauthan
Copy link

We discovered the same issue. Because of the cached file system this library actually leaks connections as every time you initialize an instance of SSHFileSystem it connects to the SFTP server while all method calls use the very first instance only.

Given that the library opens a new connection as part of the constructor it probably should overwrite the cachable attribute of AbstractFileSystem and set it to False.

A workaround until that is done:

class SSHFileSystemNoCache(SSHFileSystem):
    cachable = False

and then use for your code the SSHFileSystemNoCache class instead.

Also it would be great to explicit be able to close the connection. Currently I use a sf.client.close() to enforce that after I am done using my fs instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants