A voice-based interactive assistant equipped with a variety of synthetic voices (including J.A.R.V.I.S.'s voice from Iron Man).
Ever dreamed of asking a hyper-intelligent system for tips to improve your armor? Now you can! Well, maybe not the armor part... This project exploits OpenAI Whisper, OpenAI ChatGPT and IBM Watson.
PROJECT MOTIVATION:
Ideas often come at the worst moment and fade away before you have time to explore them. The objective of this project is to develop a system capable of giving tips and opinions in quasi-real time about anything you ask. The ultimate assistant should be accessible from any authorized microphone inside your house or from your phone; it should run constantly in the background and, when summoned, generate meaningful answers (with a badass voice), as well as interface with the PC or a server to save/read/write files that can be accessed later. In addition, it might interface with some external gadgets (IoT), but that's extra.
I managed to build some tools capable of reading and abstracting information from textual files (.txt). This tool will be precious in the future, when voice commands that handle the Assistant's memory are introduced. The idea is to have specific commands like "open the last conversation about topic X" or "I remember something you said about topic Y, can you make a summary of that conversation?". The `LocalSearchEngine` can find and sort the discussions by relevancy (see UpdateHistory.md for further details). Furthermore, a `Translator` class was implemented as an additional tool.
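As a rough idea of what a `LocalSearchEngine`-style relevancy sort could look like, here is an illustrative sketch (function names and the scoring rule are my own, not the project's actual implementation):

```python
def relevancy(text, query):
    """Score a chat transcript by how often the query words appear in it."""
    body = text.lower()
    return sum(body.count(word) for word in query.lower().split())

def rank_chats(chats, query):
    """chats: {filename: transcript}. Return filenames, most relevant first."""
    return sorted(chats, key=lambda name: relevancy(chats[name], query), reverse=True)
```

So "open the last conversation about topic X" boils down to ranking the saved .txt files against "topic X" and opening the top hit.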
Other important updates:
- introduced a `VirtualAssistant` object to allow a more intuitive `main` flow;
- rearranged the directory structure to 'hide' some backend code;
Other minor updates:
- introduced sounds! Now you can have sound feedback on what is happening with the Assistant. Seeing is believing;
- made the overall code slightly more efficient;
- An OpenAI account
- ffmpeg
- A Python virtual environment (my venv runs on Python 3.7; requirements.txt is compatible with this version only)
- Some credit to spend on ChatGPT (you can get three months of free usage by signing up to OpenAI) (strongly suggested)
- An OpenAI API key (strongly suggested)
- An IBM Cloud account to exploit their cloud-based text-to-speech models (tutorial: https://www.youtube.com/watch?v=A9_0OgW1LZU) (optional)
- A mic and a speaker (if you have many microphones, you might be required to specify which audio device you plan to use in `get_audio.py`)
- A CUDA-capable graphics card (my Torch version: 1.12.1+cu113, CUDA v11.2):
  `pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113`
The easiest way to get answers from ChatGPT is to connect to the service via the cloud using an API. To do this you can adopt one of these strategies:
- Using the unofficial chatgpt-wrapper: someone amazing made this wrapper to hold an ongoing conversation with ChatGPT from the command line or from your Python script (https://github.com/mmabrouk/chatgpt-wrapper)
- Using your OpenAI API key: you'll be able to send and receive material from the DaVinci003 model (the one that powers ChatGPT) or from the ChatGPT 3.5 Turbo engine [what we are going to do]
- Using the unofficial SDK: a further option that should be viable (https://github.com/labteral/chatgpt-python)
Option 2 is the most straightforward as of March 2023, since the latest OpenAI API supports chats. However, you need to have some sort of credit on your account (whether paid or obtained for free when subscribing). This option is implemented in the `openai_api_chatbot.py` script.
Option 1 was my earlier choice: it uses a wrapper to connect to your ChatGPT account, so you need to authenticate manually every time and follow the instructions on the author's GitHub. It is a sub-optimal option because you can't have the system integrated at PC startup, since it needs a login. Moreover, you might be exposed to failures due to server traffic limitations unless you are subscribed to a premium plan (see more at ChatGPT Plus). You'll find this option implemented in `openai_wrapper_chatbot.py`, but it's no longer being updated.
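Option 2 boils down to a few lines with the `openai` package (API shape as of March 2023). The sketch below is illustrative, not the project's exact code: the helper names, the history-trimming policy and the prompts are my own assumptions; only `openai.ChatCompletion.create` and the message format come from the OpenAI API.

```python
def build_chat(history, user_prompt, max_messages=20):
    """Append the new user message and keep the history bounded
    (system message plus the most recent turns)."""
    history = history + [{"role": "user", "content": user_prompt}]
    if len(history) > max_messages:
        history = history[:1] + history[-(max_messages - 1):]
    return history

def ask_chatgpt(history, api_key):
    """Send the running conversation; assumes the `openai` package is installed."""
    import openai  # imported here so the sketch loads even without the package
    openai.api_key = api_key
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    return response["choices"][0]["message"]["content"]

chat_history = [{"role": "system", "content": "You are a helpful voice assistant."}]
chat_history = build_chat(chat_history, "Any tips to improve my armor?")
```

A call like `ask_chatgpt(chat_history, 'your-key')` performs the actual network request; everything above it is plain Python.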
The MAIN script you should run is `openai_api_chatbot.py` if you want to use the latest version of the OpenAI API. If you rely on the wrapper, open `openai_wrapper_chatbot.py` instead.
`demos\da_vinci_demo.py` is a simple script that sends single prompts to ChatGPT (no chat possible); you should verify the wrapper works properly with `demos\chatgpt_wrapper_demo.py` if you want to use the wrapper.
`Assistant\` stores all the functions that handle mic interactions and the Assistant status. The remaining scripts are supplementary to the voice generation and should not be edited.
- Verify that your graphics card and CUDA version are compatible with PyTorch by running `torch.cuda.is_available()` and `torch.cuda.get_device_name(0)`;
- Get an OpenAI API key from their official website. This will allow you to send and receive material to Whisper and ChatGPT;
- Authorize yourself by copy-pasting the API key inside `openai.api_key = 'your-key'` (edit these code lines in the MAIN script with your key);
- Get an IBM Cloud account up and running by following the YouTube video (it will require a credit card at some point, but there is a service tier that allows limited usage free of charge);
- Copy-paste the URL and the API key when authorizing and setting up the cloud service inside the principal script;
- [WARNING] If you get errors, try running the demos (`/demos`) to see if the problem is with openai or with the wrapper. In the first case, check the installed package version with `pip show openai`; if the problem is with the wrapper, check that you followed the instructions on the author's GitHub and try running `chatgpt install` with an open Chrome tab; this gave me some trouble at first as well.
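The IBM wiring from the steps above can be sketched as follows. This is an illustrative snippet, not the project's code: the function names and the default voice are my own assumptions; the SDK calls (`IAMAuthenticator`, `TextToSpeechV1`, `set_service_url`, `synthesize`) follow IBM's documented Python SDK, which requires the `ibm-watson` package.

```python
def make_watson_tts(api_key, service_url):
    """Build an authenticated Text-to-Speech client from the IBM Cloud
    credentials obtained in the steps above."""
    from ibm_watson import TextToSpeechV1  # local imports: the sketch
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator  # loads without the package
    authenticator = IAMAuthenticator(api_key)
    tts = TextToSpeechV1(authenticator=authenticator)
    tts.set_service_url(service_url)
    return tts

def speak(tts, text, voice="en-US_AllisonV3Voice"):
    """Synthesize `text` and return the WAV audio as bytes."""
    return tts.synthesize(text, voice=voice, accept="audio/wav").get_result().content
```

The returned bytes can then be written to a file or fed straight to your audio playback of choice.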
- To have answers spoken in your language, first check whether your language is supported by the speech generator at https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices;
- If it's supported, add or change the entries of `self.languages = {'en': "English", 'it': "Italian", # add yours}` inside `Assistant/VirtualAssistant.py`;
- Remember: the loaded Whisper model is the medium one. If it performs badly in your language, upgrade to the larger one in the `__main__()` with `whisper_model = whisper.load_model("large")`; but I hope your GPU memory is large likewise.
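Putting the two points above together, a hypothetical sketch (the function name, the dictionary check and the defaults are mine; only `whisper.load_model` and `model.transcribe` are the actual Whisper API, and the package plus a capable GPU are assumed for the "large" model):

```python
languages = {"en": "English", "it": "Italian"}  # mirrors self.languages; add yours

def transcribe(audio_path, lang_code="en", model_size="medium"):
    """Transcribe a recording in the chosen language with Whisper."""
    import whisper  # local import: the sketch loads even without the package
    assert lang_code in languages, "add your language to the dictionary first"
    model = whisper.load_model(model_size)  # use "large" for better non-English accuracy
    return model.transcribe(audio_path, language=lang_code)["text"]
```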
When running, you'll see a lot of information being displayed. I'm constantly striving to improve the readability of the execution, so forgive slight variations in the screens below. Anyway, this is what generally happens when you hit 'run':
- Preliminary initializations take place: the Assistant is created following default parameters (you can edit them inside the main);
- When `awaiting for triggering words` is displayed, you'll need to say ELEPHANT to summon the assistant. This magic word can be changed, but it needs to be English. At this point a conversation will begin and you can speak whatever language you want (if you followed step 2);
- The word `listening...` should then appear. At this point a conversation has begun and you can ask your question. When you are done, just wait (`RESPONSE_TIME = 3` seconds) for the question to be submitted;
- Some functions (`Assistant/get_audio.py`) will convert the recorded audio to text using Whisper;
- The Assistant will expand its internal `chat_history` with your question, send a request with the OpenAI API, and update the history as soon as it receives a full answer from ChatGPT (this may take up to 5-10 seconds; consider explicitly asking for a short answer if you are in a hurry);
- If 'Hey Jarvis' has been said, the Assistant will 'speak' using the voice-duplicating toolbox to generate a waveform from Jarvis's voice embedding;
- Otherwise, the task is submitted to the IBM Text-to-Speech service or pyttsx3;
- When any of the stop keywords is said, the script will ask ChatGPT to give a title to the conversation and will save the chat in a .txt file with the format 'CurrentDate-Title.txt';
- The assistant will then go back to sleep;
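The save-file naming from the flow above can be sketched like this (the sanitization rule is my own guess; only the 'CurrentDate-Title.txt' format comes from the project):

```python
from datetime import date

def chat_filename(title):
    """Build the 'CurrentDate-Title.txt' name used when saving a finished chat,
    dropping characters that are unsafe in file names."""
    safe = "".join(ch for ch in title if ch.isalnum() or ch in " -_").strip()
    return f"{date.today().isoformat()}-{safe}.txt"
```

For example, a conversation ChatGPT titles "Healthcare tips!" would be saved as something like `2023-04-02-Healthcare tips.txt`.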
I made some other prompts; ignore the title mentioning healthcare.
- To stop or save the chat, just say 'THANKS' at some point;
- To summon the JARVIS voice, just say 'HEY JARVIS' at some point;
Not ideal, I know, but it works for now.
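The keyword handling just described can be sketched as a tiny dispatcher (the action names are illustrative; the keywords themselves are the ones above):

```python
TRIGGER_WORD = "elephant"          # the summoning word (must be English)
STOP_WORDS = ("thanks", "thank you")

def detect_command(transcript):
    """Map a transcribed snippet to an assistant action."""
    text = transcript.lower()
    if "hey jarvis" in text:
        return "jarvis_voice"      # switch to the voice-duplicating toolbox
    if any(stop in text for stop in STOP_WORDS):
        return "save_and_sleep"    # title the chat, save it, go back to sleep
    if TRIGGER_WORD in text:
        return "wake"              # start listening for a question
    return "none"
```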
- [11 - 2022] Deliver chat-like prompts from python from keyboard
- [12 - 2022] Deliver chat-like prompts from python with voice
- [2 - 2023] International language support for prompt and answers
- [3 - 2023] Jarvis voice set up
- [3 - 2023] Save conversation
- [3 - 2023] Background execution & Voice Summoning
- [3 - 2023] Improve output displayed info
- [3 - 2023] Improve JARVIS voice performances through prompt preprocessing
- [4 - 2023] Introducing: Project memory: store chats, events, timelines and other relevant information for a given project to be accessed later by the user or the assistant itself
- [x] [4 - 2023] Create a full-stack `VirtualAssistant` class with memory and local storage access
- Add sound feedback at different stages (chimes, beeps...)
Currently working on:
- International language support for voice commands
- Extend voice commands (make a better active assistant)
- Expand memory
Next steps:
- Include other NLP free models if ChatGPT is unavailable (my credit is about to end)
- Connect the system to internet
- Refine memory and capabilities
- Add multimodal input (e.g. "do you think 'this' [holding a paper plane] could fly?" -> camera -> ChatGPT4 -> "you should improve the tip of the wings")
- Extend project memory to images, pdfs, papers...
Check the UpdateHistory.md of the project for more insights.
Have fun!
If you have questions, contact me at gianmarco.guarnier@hotmail.com
Gianmarco Guarnier