Add cloud file handling proposal #411

Merged
merged 13 commits into from
Jun 22, 2023
197 changes: 197 additions & 0 deletions proposals/CloudIdentifier.md
@@ -0,0 +1,197 @@
# Cloud Identifier

## Authors:

* Alexander Hendrich (hendrich@chromium.org)

## Participate

* [Issue tracker](https://github.com/WICG/file-system-access/issues)

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Introduction
The objective of this API is to allow web applications to detect whether a `FileSystemHandle` they have acquired (obtained via file/directory picker or as a parameter of an opening flow as a registered file handler) belongs to a cloud-synced file/directory. If so, the web application receives a “cloud identifier” so that it can directly interact with the file/directory using the cloud storage provider’s (CSP) web APIs.

A CSP’s local sync client can register with the browser, which in turn may ask that client to provide a unique identifier for a given file/directory when requested.

### Sample usage

```javascript
const [fileHandle] = await window.showOpenFilePicker(pickerOpts);
const cloudIdentifiers = await fileHandle.getCloudIdentifiers();

if (cloudIdentifiers.length === 0) {
  // File is not synced by any CSP
}
for (const cloudIdentifier of cloudIdentifiers) {
  if (cloudIdentifier.providerName === 'drive.google.com') {
    // retrieve/modify the file from Google Drive API using cloudIdentifier.id
  } else if (cloudIdentifier.providerName === 'onedrive.live.com') {
    // retrieve/modify the file from Microsoft OneDrive API using cloudIdentifier.id
  }
}
```

### Use Cases

#### Remote file handling

Web applications offering app streaming or VDI might want to make a locally synced file available to the remote virtual machine where the app/desktop is being streamed from. For example, the user runs their image editor on a remote machine, launches the file picker to open image.png from the local client device, makes some changes and then saves the file with Ctrl+S.

##### Before

![image](./images/cloud-identifier/remote-file-handling-before.png "Before")

1. VDI web app requests and receives the file via `FileSystemFileHandle::getFile()`
2. CSP client app downloads the file from CSP server (if not already on disk)
3. VDI web app transfers the file’s content to VDI server, which creates it as local file on the server and opens the application for that file
4. VDI server sends any changes made to the file back to the VDI web app
5. VDI web app writes updated file contents via `FileSystemFileHandle::createWritable()` and `FileSystemWritableFileStream::write()`, which is picked up by the CSP client app
6. CSP client app synchronizes the file by uploading it to the CSP server

In this scenario, every transfer (each arrow in the diagram) moves the entire file, and the file has to be both downloaded and uploaded over the local device’s network connection.

##### After

![image](./images/cloud-identifier/remote-file-handling-after.png "After")

1. VDI web app requests and receives the file’s cloud identifier via `FileSystemFileHandle::getCloudIdentifiers()`
2. VDI web app sends the file’s cloud identifier to VDI server
3. VDI server requests and receives the file’s content from CSP server using the cloud identifier
4. VDI server sends any changes made to the file back to the CSP server
5. CSP client synchronizes updated file to local device (might be metadata only)

In this scenario, only transfers (3) and (4) move the full file; all other transfers move only the cloud identifier. These file transfers also bypass the local device’s network connection and instead use the VDI server’s connection to the CSP server, which likely has higher bandwidth.

This reduces network traffic and delays (especially with large files) and prevents version drift (file being modified somewhere else while a local client device is uploading).
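A minimal sketch of how a VDI web app might prefer the “After” flow and fall back to the “Before” flow. The `vdiServer` object and its `openByCloudIdentifiers` and `editUploadedFile` methods are hypothetical placeholders for the app’s own backend calls; only `getCloudIdentifiers()` and `getFile()` come from the APIs discussed here.

```javascript
// Sketch only: `vdiServer.openByCloudIdentifiers` and
// `vdiServer.editUploadedFile` are hypothetical backend calls.
async function openRemotely(fileHandle, vdiServer) {
  // The method is only proposed, so treat a missing method like "not synced".
  const cloudIdentifiers =
    typeof fileHandle.getCloudIdentifiers === 'function'
      ? await fileHandle.getCloudIdentifiers()
      : [];

  if (cloudIdentifiers.length > 0) {
    // "After" flow: only the identifiers cross the local network link.
    await vdiServer.openByCloudIdentifiers(cloudIdentifiers);
    return 'cloud-identifier';
  }

  // "Before" flow: the full file contents pass through the web app.
  const file = await fileHandle.getFile();
  await vdiServer.editUploadedFile(file);
  return 'full-transfer';
}
```

The return value is only there to make the chosen path observable; a real app would more likely surface it through its own UI or telemetry.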

#### De-duplication for online document editors

Web-based document editors can already open local files using existing file system APIs. To offer full functionality, including cross-device support and some advanced editing/sharing features, the documents need to be uploaded to a cloud storage service (such as Google Drive or Microsoft OneDrive) before they can be edited. This leads to duplicate files for documents that were already stored in these locations.

With the proposed changes, the web application could check whether the file behind a given file handle is already synced by a cloud storage provider and, if so, obtain its cloud identifier instead of uploading a duplicate.
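As an illustration, an editor backed by one specific CSP could look for that CSP’s entry in the returned list. The helper name `findCloudId` is hypothetical, and the provider name in the usage comment is taken from the sample usage above.

```javascript
// Hypothetical helper: returns the id issued by `providerName`,
// or null when the file is not synced by that provider.
function findCloudId(cloudIdentifiers, providerName) {
  const match = cloudIdentifiers.find(
    (entry) => entry.providerName === providerName
  );
  return match ? match.id : null;
}

// Possible usage inside an editor:
//   const id = findCloudId(await fileHandle.getCloudIdentifiers(),
//                          'drive.google.com');
//   if (id === null) { /* upload the file as before */ }
//   else { /* open the existing cloud document by id */ }
```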

#### Drag & Drop into Mail

Sharing files via mail used to be done using simple file attachments. Especially for large files, it is preferable to instead upload them to a CSP and then share an access link to that file with the recipient. Google Mail’s web application, for example, already does this by prompting the user to upload a file to Google Drive when attaching a large file.

With the proposed changes, the web application could detect that the file is already synced by cloud storage and generate a share link for that file’s cloud identifier, without requiring the user to go through these steps manually and without prompting the user to upload the file to cloud storage again.

### Non-Goals

This proposal does not plan to provide a standardized way for CSPs’ web APIs to
* give access permissions to these files/directories.
* interact (fetch/modify/etc.) with these files/directories.
* provide additional metadata (e.g. sync status).

Instead, it should only generate an opaque identifier for a file/directory so that web apps can then interact with these individual CSPs’ web APIs on the same file/directory. Obtaining the required access permissions on that file is delegated to the web app.
Web apps might only support one specific CSP. When supporting multiple CSPs, each CSP will require its own implementation, although semantically they would likely perform similar steps.

## Design

![image](./images/cloud-identifier/design.png "Design")

1. Locally installed CSP client registers itself as a provider for certain directories with the browser
2. Web app receives `FileSystemHandle` for a file/directory (via File Handling opening flow, File System Access file/directory picker or drag & drop) and requests its cloud identifier(s) via `getCloudIdentifiers()`.
3. Browser checks whether this path has been registered as being synced by a provider
   * If not, resolve the promise with an empty list
4. For each registered provider for that path, the browser will send a request to the registered provider’s executable with the file’s/directory’s path
5. Provider requests a token for that path from the CSP
   * The token can also be a stable, already cached identifier, in which case steps 5 and 6 are skipped
6. CSP can choose to generate a one-time token or stable identifier to later be used by the CSP’s web APIs. Either way, the properties of the token are up to the CSP and completely opaque to the browser.
7. Provider responds to browser with the token
8. Browser gathers all responses from providers. For each incoming response, the browser constructs a cloud identifier consisting of the responding provider’s registered identifier and the provided token. Once all registered providers for that path have responded, the promise is resolved with a list of cloud identifiers. If a provider fails to respond within a reasonable time frame, the promise is resolved with the cloud identifiers received up to that point, or with an empty list if there are none.
9. Web app can freely interact with CSP’s web APIs on that file/directory. The web application is responsible for getting the right access permissions for these web APIs.
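The gathering behaviour in steps 4 to 8 can be sketched as follows. This is an illustration in JavaScript rather than actual browser code: `collectCloudIdentifiers` and the `requestToken(path)` method on each provider object are hypothetical stand-ins for launching the provider’s executable and awaiting its reply, and the timeout value is left to the caller.

```javascript
// Sketch of steps 4-8. `providers` is a list of { name, requestToken }
// objects; `requestToken(path)` is a hypothetical stand-in for querying
// the provider's executable.
function collectCloudIdentifiers(providers, path, timeoutMs) {
  const received = [];

  const queries = providers.map(async (provider) => {
    try {
      const token = await provider.requestToken(path);
      received.push({ providerName: provider.name, id: token });
    } catch {
      // A failing provider simply contributes no identifier.
    }
  });

  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });

  // Settle when every provider has answered or the deadline passes,
  // whichever comes first, with the identifiers received so far.
  return Promise.race([Promise.all(queries), deadline]).then(() => {
    clearTimeout(timer);
    return received.slice();
  });
}
```

Returning a copy of `received` means that replies arriving after the deadline cannot change a result that was already handed to the web app.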

### Web IDL

This section describes the interface the web app would interact with.

```idl
dictionary FileSystemCloudIdentifier {
  DOMString providerName;
  DOMString id;
};

partial interface FileSystemHandle {
  Promise<sequence<FileSystemCloudIdentifier>> getCloudIdentifiers();
};
```

The new method
* extends `FileSystemHandle`, i.e. is available for files and directories.
* returns a list of `FileSystemCloudIdentifier`s since a single file/directory can be synced by multiple CSPs at the same time, although in most cases this list would only contain a single entry.
* returns an empty list if the file/directory is not synced by any CSP or no CSP client responded in time.
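Because the method is only proposed, a web app may want to feature-detect it before calling it and to sanity-check the entries it receives. The helper names below are hypothetical; the checked fields mirror the `FileSystemCloudIdentifier` dictionary.

```javascript
// Returns true when `handle` exposes the proposed method.
function supportsCloudIdentifiers(handle) {
  return handle != null && typeof handle.getCloudIdentifiers === 'function';
}

// Returns true when `value` has the shape of a FileSystemCloudIdentifier.
function isCloudIdentifier(value) {
  return (
    value != null &&
    typeof value.providerName === 'string' &&
    typeof value.id === 'string'
  );
}
```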

### Interaction with CSP client

This web API requires the browser and the CSP’s local sync client to exchange information. This exchange is not part of the official specification and is up to individual browsers to implement, but this section provides guidelines for it.

#### Registration

The browser needs to be aware of
* which CSP clients are available on the user’s computer
* which files/directories are synced by the CSP client
* where the CSP client executable is located
* the CSP’s identifier (e.g. “com.google.drive”)

Therefore, the CSP needs to provide a JSON file containing these details and register it with the browser at a known location (specific directory or registry key), similar to [Chrome extension’s native messaging host registration](https://developer.chrome.com/docs/apps/nativeMessaging/#native-messaging-host-location).

The provided file would look like this:

```json
{
  "name": "com.foo-company.drive",
  "path": "C:\\Program Files\\FooCompany\\cloud_storage_provider_helper.exe",
  "synced_paths": [
    "C:\\path_to_synced_directory\\",
    "C:\\path_to_other_synced_directory\\"
  ]
}
```
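A browser reading such a file would presumably validate it before trusting it. The sketch below shows one possible check; the function name and the exact validation rules are assumptions, and only the field names `name`, `path`, and `synced_paths` come from the example above.

```javascript
// Hypothetical validator for a CSP registration file.
function parseProviderRegistration(jsonText) {
  // JSON.parse throws a SyntaxError on malformed input,
  // including trailing commas.
  const manifest = JSON.parse(jsonText);
  if (manifest === null || typeof manifest !== 'object') {
    throw new TypeError('registration must be a JSON object');
  }

  const isNonEmptyString = (v) => typeof v === 'string' && v.length > 0;

  if (!isNonEmptyString(manifest.name)) {
    throw new TypeError('"name" must be a non-empty string');
  }
  if (!isNonEmptyString(manifest.path)) {
    throw new TypeError('"path" must be a non-empty string');
  }
  if (
    !Array.isArray(manifest.synced_paths) ||
    !manifest.synced_paths.every(isNonEmptyString)
  ) {
    throw new TypeError('"synced_paths" must be a list of non-empty strings');
  }

  return {
    name: manifest.name,
    path: manifest.path,
    syncedPaths: manifest.synced_paths.slice(),
  };
}
```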

#### Requesting a cloud identifier

Whenever a web app calls `getCloudIdentifiers()`, the browser will iterate through all registered CSP clients and filter for ones that cover the file’s/directory’s path as part of their `synced_paths`.
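The filtering step could look roughly like the sketch below. Both helper names are hypothetical, and the comparison is a plain case-sensitive prefix match on whole path segments; a real implementation would need platform-specific path normalization (for example case-insensitive matching on Windows), which is left out here.

```javascript
// Returns true when `targetPath` is `syncedRoot` itself or lies below it.
// Matches whole path segments, so "C:\sync" does not cover "C:\sync-other".
function isPathCovered(syncedRoot, targetPath) {
  const stripTrailing = (p) => p.replace(/[\\/]+$/, '');
  const root = stripTrailing(syncedRoot);
  const target = stripTrailing(targetPath);
  if (target === root) {
    return true;
  }
  const next = target.charAt(root.length);
  return target.startsWith(root) && (next === '\\' || next === '/');
}

// Keeps only the registrations whose `synced_paths` cover `targetPath`.
function providersCoveringPath(registrations, targetPath) {
  return registrations.filter((registration) =>
    registration.synced_paths.some((root) => isPathCovered(root, targetPath))
  );
}
```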

For these, the browser will launch the executable at `path` and transfer the web app’s origin and the requested file/directory path, either via command line argument or via stdin (preceded by a 32-bit message length).

The CSP client will then respond with the token for the given file/directory via stdout (preceded by a 32-bit message length).
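The length-prefixed framing could be sketched as below, for example in a CSP helper written for Node.js (the code relies on Node’s `Buffer`). The proposal does not fix the payload encoding or byte order, so the JSON payload, the little-endian length, and the `origin`/`path` field names used in the usage note are all assumptions made for illustration.

```javascript
// Frames a message as: 4-byte length (little-endian, assumed) + UTF-8 JSON.
function encodeMessage(payload) {
  const body = Buffer.from(JSON.stringify(payload), 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32LE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Decodes one framed message from the start of `buffer`.
// Returns null while the buffer does not yet hold a complete message.
function decodeMessage(buffer) {
  if (buffer.length < 4) {
    return null;
  }
  const length = buffer.readUInt32LE(0);
  if (buffer.length < 4 + length) {
    return null;
  }
  return JSON.parse(buffer.subarray(4, 4 + length).toString('utf8'));
}
```

Under these assumptions, a request might be framed as `encodeMessage({ origin: 'https://app.example', path: 'C:\\sync\\a.txt' })`, and the reply decoded with `decodeMessage` once enough bytes have arrived on stdout.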

Once the browser has received all responses (or some reasonable timeout has expired), the browser will bundle each token with the corresponding CSP’s name into a cloud identifier and return the resulting list.

It is up to each browser’s implementation how to handle dynamic registrations, i.e. whether to re-read all CSP registration files for each request, or whether and for how long to cache them.

## Security and Privacy Considerations

### Fingerprinting

The browser has no control over whether the CSP will provide one-time tokens or stable identifiers as part of their cloud identifier. If the CSP provides stable identifiers, the web application could use these as a [fingerprinting](https://www.w3.org/TR/fingerprinting-guidance/#dfn-active-fingerprinting) mechanism for the files/directories it has access to.

In theory, if a web application already has access to a `FileSystemHandle` the web application could already use other mechanisms to generate fingerprinting identifiers, but these are less stable:
* Hashing the file’s/directory’s contents -> identifier will change if file/directory content changes
* `FileSystemHandle::getUniqueId()` [[explainer](https://github.com/whatwg/fs/pull/46)] -> clearing browsing data will reset unique IDs

The web application would need repeated access to the same `FileSystemHandle` to perform this fingerprinting, though. That means either the user must re-grant access to the same file, or the web app must store the file handle in IndexedDB, which the user can clear by clearing their browsing data.

It would also be up to the CSP whether they actually provide permanent tokens or temporary tokens.

### Modification via read-only permission

If a web application only has `read` level access to a `FileSystemHandle`, but has `write` level access to that file via CSP web APIs, it could still modify the cloud-stored file. The change is then synced to the device, thereby modifying the local file.

In this case, the user has clearly granted write access to the CSP-backed file, and by running a sync client the user also allows the local files to be modified, so the change would not be surprising.

## Contributors

* Austin Sullivan (asully@chromium.org)
* Rob Beard (rbeard@google.com)

## Stakeholder Feedback / Opposition
TBD
Binary file added proposals/images/cloud-identifier/design.png