Lastly, StorIA will have the option to share your story on an internal social page. This network will host a collection of stories created by kids from all around the world. This way, even if a kid doesn't want to draw, they can read all of the posted stories and connect with children abroad through them. The internal network will also implement a ranking system where kids can rate the stories they have read, and StorIA will award the best stories of the month with a prize that is yet to be decided.
More details on the StorIA project can be found here.
The left image shows an example of a possible sketch, and the right image shows the result generated by our application, conditioned on the text "A horse in a field".
- Generated story: Once upon a time, 100 horses were in a field. They were all happy and healthy. One day, a farmer came and took 99 of the horses away. The last horse was left alone in the field. The horse was sad and lonely.
The repository has the following structure:
- /APP: This directory contains the code of the StorIA application.
- /Code_VM: This directory contains the necessary files, which must be copied into a virtual machine, to use StorIA's AI capabilities.
- /Design APP: This directory contains an overview of the application's initial design.
- /Examples: Contains a folder with the original sketch and the image generated from it.
- /Sketches: Some example sketches that can be used with the ImageGenerator script.
- DrawSketch.py: File with the source code for the Tkinter interface, which enables a user to draw and save a sketch using a simple toolbox.
- ImageGenerator.py: Main file that contains the logic of the application. It defines the code necessary for creating the Gradio application and contains the functions that call the auxiliary models, namely: Sketch2Image, Image2History.
- TextGenerator.ipynb: Testing of BLIP-2 to describe the image.
- environment.yml: The environment required to execute the code of the different files.
- Mistral-7B.py: File that takes an image, runs it through BLIP-2 to describe it, and then uses Mistral-7B to create the story.
The different pretrained models that we use for the generative tasks of StorIA are the following:
- Sketch2Image: StableDiffusionXL with ControlNet and a Variational Auto-Encoder
  - Combination of models used to transform the sketch into an image, taking into account the text provided to condition the generation.
- Image2Text: BLIP-2
  - Model used to provide a description of the generated image.
- Text2Text: Mistral
  - Model used to generate the text of a story page.
A more detailed explanation of the pipeline can be found in the report.
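The way the three stages feed into each other can be sketched as follows. This is only an illustration of the data flow (sketch and text to image, image to description, description to story), not the code used in the repository: the `StoryPipeline` class and the three `fake_*` functions are hypothetical stand-ins so that the example runs without downloading any model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical placeholder: images are treated as opaque objects in this sketch.
Image = object


@dataclass
class StoryPipeline:
    """Chains the three stages. Each callable stands in for a real model
    (StableDiffusionXL + ControlNet, BLIP-2, Mistral)."""
    sketch_to_image: Callable[[Image, str], Image]
    image_to_text: Callable[[Image], str]
    text_to_text: Callable[[str], str]

    def run(self, sketch: Image, prompt: str) -> Tuple[Image, str, str]:
        # 1. Sketch2Image: the sketch plus the conditioning text give an image.
        image = self.sketch_to_image(sketch, prompt)
        # 2. Image2Text: the generated image is described in words.
        caption = self.image_to_text(image)
        # 3. Text2Text: the description is expanded into story text.
        story = self.text_to_text(caption)
        return image, caption, story


# Stub stages (assumptions for illustration only, no model is loaded).
def fake_sketch_to_image(sketch, prompt):
    return {"sketch": sketch, "prompt": prompt}


def fake_image_to_text(image):
    return f"a picture of {image['prompt'].lower()}"


def fake_text_to_text(caption):
    return f"Once upon a time, there was {caption}."


pipeline = StoryPipeline(fake_sketch_to_image, fake_image_to_text, fake_text_to_text)
image, caption, story = pipeline.run("horse_sketch.png", "A horse in a field")
print(story)
```

Keeping each stage behind a plain callable makes it possible to swap a stub for the real model one stage at a time while testing.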
Before starting, clone the repository:
git clone https://github.com/joanlafuente/StorIA.git
And then create a conda environment with the following command:
conda env create -f environment.yml
To use StorIA, you must add a .env file to the /APP folder. This file provides the configuration for connecting over SSH to a Linux virtual machine, which must contain the files and folders from /Code_VM and which executes the generative AI models. The path on the virtual machine that contains the /Code_VM files has to be set in the HOME_CLUSTER variable in /APP/main.py. The virtual machine also requires the conda environment to be installed.
If you do not have a machine available, create the .env file with the placeholder values shown in the structure that follows; the application will still work, but without the generative AI capabilities.
The .env file must have the following structure:
HOSTNAME = '<Host IP>'
PORT = '<Host port>'
USERNAME_CLUSTER = '<Username>'
PASWORD = '<Password>'
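Before launching the application, it can help to check that the .env file defines every key listed above. The snippet below is a minimal sketch using only the standard library; it is not how /APP/main.py reads the file (that is not shown here), and the helper names `parse_env_text`, `load_env_file` and `missing_keys` as well as the example values are assumptions made for illustration. The key spelling `PASWORD` is copied from the structure above.

```python
from pathlib import Path

# Keys expected in /APP/.env, spelled exactly as in the structure above.
REQUIRED_KEYS = ("HOSTNAME", "PORT", "USERNAME_CLUSTER", "PASWORD")


def parse_env_text(text: str) -> dict:
    """Parse lines of the form KEY = 'value' into a dictionary."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and quote characters from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str) -> dict:
    """Read a .env file from disk and parse it."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Example with placeholder values; the password line is left out on purpose.
example = """
HOSTNAME = '192.0.2.10'
PORT = '22'
USERNAME_CLUSTER = 'storia'
"""
config = parse_env_text(example)
print(missing_keys(config))  # prints ['PASWORD']
```

Calling `missing_keys(load_env_file("APP/.env"))` from the repository root would perform the same check on the real file, assuming it sits at that path.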
Also, the Text2Text model (Mistral) requires an access token provided by Hugging Face. This token must be assigned to the variable HG_TOKEN_MISTRAL in Code_VM/img2text.py so that the model can run inference.
Execute /APP/main.py and the application will launch automatically on the main page. There you will have the options to generate a new story or access previously created ones. If you have any problems, do not hesitate to contact us.
The application is developed in a way that allows it to be exported and used on iOS or Android devices; this process has to be done using Kivy.
Execute /ImageGenerator.py, which will launch the Gradio interface, where you can generate an image from a sketch as well as the start of a story from that drawing. The interface will be hosted on your local machine, and its IP address will be printed in the command line.
Joan Lafuente Baeza, joan.lafuente@autonoma.cat
Maria Pilligua Costa, maria.pilligua@autonoma.cat
Nil Biescas Rue, nilbiescas3@gmail.com
Jordi Longaron Carbonell, jordilongaroncarbonell@gmail.com
Xavi Soto Picón, xaviminisoto@gmail.com