🤖 Hide text within other natural language text 🍅
Tomato is a proof of concept steganography tool that utilises minimum-entropy coupling code provided by ssokota! ⭐
- LLM-Generated Cover Text: The LLM, as normal, generates coherent text based off a prompt.
- Embedding with MEC: MEC is applied to merge the probability distribution of the hidden message (ciphertext) with the distribution of the LLM-generated covertext. This coupling minimizes the joint entropy, ensuring that the stegotext (covertext with the embedded message) retains the statistical properties of natural language, making the hidden message effectively undetectable.
- Decoding Process: During decoding, the LLM assists by providing a context-aware interpretation of the stegotext. MEC is then used in reverse to decouple the hidden message from the covertext. The process leverages the same probability distributions used during embedding, ensuring that the message is accurately extracted without compromising the integrity of the covertext.
This method ensures that the hidden message is seamlessly integrated into the text and can be securely and precisely retrieved later, with minimal risk of detection.
from tomato import Encoder
encoder = Encoder()
plaintext = "hello"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)
Output:
Stegotext: After hours, I like to walk. Sometimes I will travel by train to a station I’ve never been, and walk from there in no particular direction. As I walk, I find the world reveals itself, in small, inexplicable ways. Tonight, a rabbit darted across the track ahead of the train with such urgency I thought, for a moment, it was a fox, or something more menacing. And when I pulled my phone out to
------
Decoded Plaintext: helloAAAAAAAAAA # The A's are padding up to the encryption key length
Tomato required Nvidia CUDA. Follow the steps below:
- Ensure your Nvidia drivers are up to date: https://www.nvidia.com/en-us/geforce/drivers/
- Install the appropriate dependancies from here: https://pytorch.org/get-started/locally/
- Validate CUDA is installed correctly by running the following and being returned a prompt
python -c "import torch; print(torch.rand(2,3).cuda())"
Install the dependencies using:
pip install git+https://github.com/user1342/mec
git clone https://github.com/user1342/Tomato.git
cd tomato
pip install -r requirements.txt
pip install -e .
You can use the Tomato Encoder/Decoder Tool directly from the command line. Here are the available commands:
To encode a plaintext message into stegotext:
tomato-encode.exe "Your secret message here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."
Output:
Stegotext: [Your encoded message here]
To decode a stegotext back into its original plaintext:
tomato-decode.exe "Your stegotext here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."
Output:
Estimated Plaintext: [Your decoded plaintext]
Estimated Bytetext: [Your decoded bytetext]
Checkout the example playbook! For a quick demonstration, you can try encoding and decoding a simple message using the following code snippet:
from tomato import Encoder
encoder = Encoder()
plaintext = "I am a hidden code"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)
print(formatted_stegotext)
print("------")
print(estimated_plaintext)
The Tomato Encoder/Decoder Tool offers several customizable parameters:
- cipher_len: Length of the cipher (default is 15).
- shared_private_key: Shared private key in hex format. If not provided, a random key will be generated.
- prompt: Prompt for the language model (default is "Good evening.").
- max_len: Maximum length of the covertext (default is 100).
- temperature: Sampling temperature for the language model (default is 1.0).
- k: The k parameter for the language model (default is 50).
- model_name: Name of the language model to be used (default is "unsloth/mistral-7b-instruct-v0.3-bnb-4bit").
Tomato is an open-source project and welcomes contributions from the community. If you would like to contribute to Tomoto, please follow these guidelines:
- Fork the repository to your own GitHub account.
- Create a new branch with a descriptive name for your contribution.
- Make your changes and test them thoroughly.
- Submit a pull request to the main repository, including a detailed description of your changes and any relevant documentation.
- Wait for feedback from the maintainers and address any comments or suggestions (if any).
- Once your changes have been reviewed and approved, they will be merged into the main repository.
Tomato follows the Contributor Covenant Code of Conduct. Please make sure to review and adhere to this code of conduct when contributing to Tomato.
If you encounter a bug or have a suggestion for a new feature, please open an issue in the GitHub repository. Please provide as much detail as possible, including steps to reproduce the issue or a clear description of the proposed feature. Your feedback is valuable and will help improve Monocle for everyone.
MIT