Skip to content

Commit

Permalink
πŸ“ Populate storage FAQ (#45)
Browse files Browse the repository at this point in the history
* πŸ“ Populate storage FAQ

* πŸ“ Iterate on the storage FAQ

* πŸ“ Finalize
  • Loading branch information
falexwolf authored Aug 11, 2023
1 parent bc51895 commit 2638c99
Showing 1 changed file with 107 additions and 9 deletions.
116 changes: 107 additions & 9 deletions docs/faq/storage.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,117 @@
# Storage FAQ

1. What is default storage? How do I find out about default storage?
## What is the default storage location?

2. Where is my SQLite file? What happens if I move the `.lndb` file around?
It's the directory or cloud bucket that you pass when initializing a LaminDB instance:

3. What is the `.lamindb/` folder? Will there be multiple `.lamindb/` folders if I have multiple storage locations?
```bash
lamin init --storage ./default-storage # or s3://default-bucket or gs://default-bucket
```

4. What happens if I move files around? What should I do if I want to bulk migrate files to another storage (let’s say another s3 bucket)?
It's easiest to see and update default storage in the Python API ({attr}`~lamindb.dev.Settings.storage`):

5. When should I pass `key=` and when should I rely on cryptic ids to register a file? What’s the recommended process to register a file?
```python
import lamindb as ln
ln.settings.storage # set via ln.settings.storage = "s3://other-bucket"
#> s3://default-bucket
```

6. Will I never be able to find my file if I don’t give it a description? (should this even be allowed?)
You can also change it using the CLI via

7. What should I do if I have a local file and want to upload it to S3? (Shall I register a File first and upload it with `.save`, or shall I upload outside of Lamin before registering it?)
```
lamin set --storage s3://other-bucket
```

8. How to update a file in storage? What’s the process to update file records after I moved files around or updated files?
## Where is my SQLite file?

9. How do I version a file? Do I always make a new file record and a new transform if I want to track the parent files?
The SQLite file is in the default storage location of the instance and called: `f"{instance_name}.lndb"`

You can also see it as part of the database connection string:

```
ln.setup.settings.instance.db
#> sqlite:///path-to-sqlite
```

If default storage is in the cloud, the SQLite file is cached in the local cache directory ({attr}`~lamindb.setup.dev.StorageSettings.cache_dir`):

```
ln.setup.settings.storage.cache_dir
#> path-to-cache-dir
```

## What happens if I move the `.lndb` file around?

The SQLite file has to remain in the default storage location of the instance.

You can, however, take the SQLite file and place it in a new location (`./mydir`, `s3://my-bucket`) and create a new LaminDB instance passing `--storage ./mydir` (or `--storage s3://my-bucket`). All your metadata is then present in the new instance.

## What is the `.lamindb/` directory?

It stores files that are merely referenced by metadata ({attr}`~lamindb.File.key` is `None`).

There is only a single `.lamindb/` directory per LaminDB instance.

## What should I do if I want to bulk migrate files to another storage?

Currently, you can only achieve this manually:

1. Copy or move files into the desired new storage location
2. Adapt the corresponding record in the {class}`~lamindb.Storage` registry by setting the `root` field to the new location

## When should I pass `key` and when should I rely purely on metadata to register a file?

The recommended way of making files findable in LaminDB is to link them to labels and use the {attr}`~lamindb.File.description` field.

When you're registering existing data, however, they'll often come with a semantic `key` (the relative path within the storage location).

## Will I never be able to find my file if I don’t give it a description?

You can't create files that have _none_ of `description`, `key` and `run` set.
Hence, you will always be able to find your find through either of these or
through additional metadata.

## What should I do if I have a local file and want to upload it to S3?

You can either create a file object from the local file and auto-upload it to the cloud during `file.save()`:

```
file = ln.File(local_filepath)
file.save() # this will upload to the cloud
```

You can also create a file object from an existing cloud path:

```
file = ln.File("s3://my-bucket/my-file.csv")
file.save() # this will only save metadata as the file is already in registered storage
```

This enables to use any tool to move data into the cloud.

## How to update a file in storage?

You can edit metadata of the file by querying it and then resetting its attributes. For instance,

```
file.description = "My new description"
file.save() # save the change to the database
```

## How do I version a file?

You use the `make_new_version_of` parameter:

```
new_file = ln.File(df, make_new_version_of=old_file)
```

Then, `new_file` automatically has {attr}`~lamindb.File.version` set, incrementing the version number by one.

You can also pass a custom version:

```
new_file = ln.File(df, version="1.1", make_new_version_of=old_file)
```

It doesn't matter which old version of the file you use, any old version is good!

0 comments on commit 2638c99

Please sign in to comment.