
Cache.py


We have moved to https://codeberg.org/KOLANICH-libs/Cache.py; grab new versions there.

Under the guise of "better security", Micro$oft-owned GitHub has discriminated against users of 1FA passwords, while having a commercial interest in the success and wide adoption of the FIDO 1FA specifications and its Windows Hello implementation, which it promotes as a replacement for passwords. This will have dire consequences and is completely unacceptable; read why.

If you don't want to participate in harming yourself, it is recommended to follow our lead and migrate away from GitHub and Micro$oft. Here is a list of alternatives and the rationale for doing so. If they delete the discussion, there are certain well-known places where you can get a copy of it. Read why you should also leave GitHub.


Just a dumb key-value, disk-persistent, compressed cache. The available value types depend on the container type:

  • BlobCache - a cache for dumb binary blobs. The base class; all the other classes just add filters on top of it.
    • StringCache - a cache for UTF-8 strings.
      • JSONCache - a cache for anything JSON-serializable. Uses ujson, which is faster than the built-in json module, if it is available.
    • BSONCache - more space-efficient than JSONCache, but less efficient than BlobCache and StringCache for storing binary blobs and strings. Available if pymongo is installed.
    • MsgPackCache - more space-efficient than BSONCache. You need a package for MsgPack serialization.
    • CBORCache - may be a bit less efficient than MsgPackCache, but supports recursion. You need a package for CBOR serialization: either cbor or cbor2.
    • Cache - selects the most capable container available. It is not portable and is meant for caching stuff on a local machine. If you need compatibility, use a container explicitly (see the sketch after this list).
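
Using a specific container is a matter of instantiating it directly. A minimal sketch, assuming the container classes take the same (path, compressor) arguments as Cache.Cache in the usage example below (an assumption, not a documented signature):

import Cache
c = Cache.JSONCache("./cache.sqlite", True)  # assumed: same ctor signature as Cache.Cache
c["key"] = {"portable": True}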

The keys are strings, bytes, ints, or anything that can be a value: since locating a record is usually done by comparing serialized representations, even dicts can serve as keys, provided they deterministically serialize to the same bytes (they do here, because json.dumps sorts the keys).
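
To illustrate the determinism requirement (stdlib json, not this library's internals): sorting the keys makes two equal dicts serialize to identical bytes, so they address the same record.

import json

k1 = json.dumps({"a": 1, "b": 2}, sort_keys=True)
k2 = json.dumps({"b": 2, "a": 1}, sort_keys=True)
assert k1 == k2  # identical serialized form -> identical cache key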

import Cache

# Pass a compressor explicitly, or pass True to automatically select the best one available.
# The backend is selected automatically from the file extension.
c = Cache.Cache("./cache.sqlite", Cache.compressors.lzma)
c["str"] = "str"
c["int"] = 10
c["float"] = 10.1
c["object"] = {"some": 1, "cached": {"shit": [1, 2, 3]}}
print(c["str"], c["int"], c["float"], c["object"])

print("object" in c)  # True
del c["object"]
print("object" in c)  # False
c.empty()  # drop all records
print("str" in c)  # False

Why?

Because pickle is insecure shit that is often abused for caching data that doesn't require code execution.
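
A standard demonstration of the problem (generic Python, nothing specific to this lib): unpickling attacker-controlled bytes executes arbitrary code via __reduce__.

import pickle

class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))  # called on unpickling

payload = pickle.dumps(Evil())
pickle.loads(payload)  # prints "pwned": arbitrary command execution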

Compression optimization

When you have populated the cache with enough data to derive an effective shared dictionary, call optimizeCompression. This computes a dictionary, writes it into the database, and then recompresses the records using that dictionary. It improves compression greatly (about 6 times) compared to compressing each record independently, and should also speed up further compression. For now it works only for zstd, and you need to rebuild the Python bindings, because the implementation currently used hardcodes the slow method of dictionary creation.
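
A minimal sketch of the intended workflow, assuming optimizeCompression is a method on the cache object (the text above names the call; the exact signature is an assumption):

import Cache

c = Cache.Cache("./cache.sqlite", Cache.compressors.zstd)  # dictionary optimization currently requires zstd
for i in range(10000):
    c[i] = {"id": i, "note": "similar records compress well with a shared dictionary"}
c.optimizeCompression()  # trains a dictionary, stores it, recompresses existing records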

Backends

The library architecture allows plugging in multiple backends. A backend is a wrapper around another database that provides a unified key-value interface. The backends are probably useful even without this lib.
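
A hypothetical sketch of such a wrapper (not the library's actual base class), assuming a MutableMapping-style contract over SQLite:

from collections.abc import MutableMapping
import sqlite3

class SQLiteKV(MutableMapping):
    """Hypothetical minimal key-value wrapper around an SQLite table."""

    def __init__(self, path):
        self.db = sqlite3.connect(path, isolation_level=None)  # autocommit
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)")

    def __getitem__(self, k):
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        if row is None:
            raise KeyError(k)
        return row[0]

    def __setitem__(self, k, v):
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))

    def __delitem__(self, k):
        if self.db.execute("DELETE FROM kv WHERE k = ?", (k,)).rowcount == 0:
            raise KeyError(k)

    def __iter__(self):
        return (row[0] for row in self.db.execute("SELECT k FROM kv"))

    def __len__(self):
        return self.db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]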

The following backends are implemented:

By default an SQLite database is used as the backend (historically, it was the only store available, back when there were no pluggable backends). The bindings to SQLite are in the Python stdlib, so no third-party dependencies need to be installed. If SQLITE_ENABLE_DBSTAT_VTAB was not defined when the SQLite shared library used by Python (sqlite3.dll on Windows, sqlite3.so on Linux) was built, it is strongly recommended to rebuild it with that macro defined and replace the library, so that Python uses the new build. It is OK to build it with MinGW-W64; in fact, in Anaconda it is built with MinGW, not MSVC. The macro enables measuring table sizes, which is useful for recompression. You can get the SQLite library's build options using getSQLiteLibCompileOptions.
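
You can check whether your Python's SQLite build has the option using the stock PRAGMA (presumably what getSQLiteLibCompileOptions wraps):

import sqlite3

opts = [row[0] for row in sqlite3.connect(":memory:").execute("PRAGMA compile_options")]
print("ENABLE_DBSTAT_VTAB" in opts)  # True if the dbstat virtual table is available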

Optionally, an LMDB backend is available.
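
Since the backend is picked from the file extension, switching to LMDB should just be a matter of the path. A sketch, assuming ".lmdb" is the extension the lib maps to this backend (an assumption):

import Cache

c = Cache.Cache("./cache.lmdb", True)  # assumed: ".lmdb" selects the LMDB backend
c["key"] = "value"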

Hacking

See Contributing.md for an architecture overview and the rationale behind design decisions.
