- ONLY if you are using a global Maven repository definition: modify your `settings.xml` to be able to pull from the Spring snapshot repositories.
  - This applies if you are using a globally inclusive `<mirrorOf>*</mirrorOf>`.
  - You must do ONE of the following:
    - OPTION 1: Rename your `settings.xml` to `settings-backup.xml`.
    - OPTION 2: Change the `<mirrorOf>*</mirrorOf>` property in your `settings.xml` to `<mirrorOf>*,!spring-milestones,!spring-snapshots</mirrorOf>`.
  - `settings.xml` is usually defined in the current logged-in user's `.m2` folder:
    - macOS: `~/.m2`
    - Windows: `C:\Users\[username]\.m2\settings.xml`
- Fill in the required credentials to external systems:
  - Most of you will have received a complete `application-local.yml`. Just put this in the resources folder.
  - If you did not receive an `application-local.yml`, you will have to create it yourself. Do the following:
    - Create an `application-local.yml` file in the resources folder.
    - Input the OpenAI API key in the `application-local.yml` property: `spring.ai.openai.api-key`
    - Input the Elasticsearch API key in the `application-local.yml` property: `app.elasticsearch.api-key`
    - Important: This file is excluded from git. Only put the secrets here, NOT in `application.yml`.
- In the root folder, run: `docker-compose up -d`
  - You can stop it when done using `docker-compose down` in the root folder.
  - Some IDEs also support opening the `docker-compose.yml` file and running it from there, if this is easier for you!
- Run the application using `mvn clean spring-boot:run -Dspring-boot.run.profiles=local`
- Set the `app.active-chatbot` property to the implementation that you wish to use! See the `chatbot` package for all chatbot implementations.
  - You set this property in the `application.yml`.
- After running, start prompting directly in the console!
- All test data is loaded into the Docker database automatically on startup using a database script.
- Our OpenAI API keys are capped to a limit. If you use too much, you (and others using the same key) will be cut off.
- All vector store data is pre-built based on this test data. You will connect to a shared (online hosted) vector store. Keep in mind that changes to data in
your database will not be reflected in the vector store, since the changed data is not automatically embedded again.
  - This way we do not all have to create the same embeddings, which can be time-consuming and costly.
- If you want to try and build your own vectors from data, refer to one of the following classes:
  - `FileVectorStoreDataLoader.java`
  - `PublicationVectorStoreDataLoader.java`
  - `TicketVectorStoreRepository.java`
  - PLEASE keep in mind that large datasets will take time to embed and cost more credit.
- All chatbots are set up so that the AI chatbot keeps your conversation context in mind, so you can ask follow-up questions. This is set up in
the `DefaultClientBuilder.java`.
  - The context resets on every restart, so if you want to start clean, just restart the application!
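
The conversation-context behaviour described above can be pictured with a small plain-Java sketch. This is an illustration of the general idea only, not the code in `DefaultClientBuilder.java`: the class name `ConversationMemorySketch` and its methods are hypothetical, and no Spring AI types are used. It assumes the history lives in process memory, which would explain why a restart clears it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: keeps chat history in process memory only. */
final class ConversationMemorySketch {

    /** One chat turn; role is "user" or "assistant". */
    record Message(String role, String text) {}

    // Lives only as long as the JVM does, so a restart starts with an empty history.
    private final List<Message> history = new ArrayList<>();

    /** Records a turn so that later prompts can include it. */
    void add(String role, String text) {
        history.add(new Message(role, text));
    }

    /** Builds the message list to send: all earlier turns plus the new question. */
    List<Message> promptFor(String question) {
        List<Message> prompt = new ArrayList<>(history);
        prompt.add(new Message("user", question));
        return prompt;
    }

    public static void main(String[] args) {
        ConversationMemorySketch memory = new ConversationMemorySketch();
        memory.add("user", "Which publications mention bees?");
        memory.add("assistant", "Two publications mention bees.");
        // The follow-up only makes sense because the earlier turns are sent along.
        System.out.println(memory.promptFor("Summarize the first one.").size());
    }
}
```

Sending the earlier turns along with each new question is what lets a follow-up such as "Summarize the first one." be understood.
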
First of all, in this hackathon you are free to create and do as you please! We have added different domain entities and test data sets to play around with.
If you want a more specific assignment to do during this hackathon, please see the LAB.md and the corresponding Java classes.
Otherwise, you can also just explore and try out the examples!
You can also try out the use-cases listed below.
Below are examples of use-cases. Some of these have already been implemented in the examples!
- Function calls:
  - Data correction (e.g. grammar and spelling) using function calling
  - Classification or summarization of data (e.g. for reviews, publications) using function calling
  - Natural language querying (text-to-SQL) to get aggregations etc. using function calling
- Vector stores:
  - Semantic searching and answering questions about publications using a vector store
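
To make the function-calling use-cases above more concrete, here is a minimal plain-Java sketch of the underlying loop: the application registers named functions, the model replies with the name and argument of the function it wants to run, and the application executes it and feeds the result back. Everything here (`FunctionCallingSketch`, `register`, `handleModelRequest`, the `normalizeWhitespace` function) is hypothetical and only stands in for what Spring AI and OpenAI do in the real examples.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the function-calling loop; not the project's implementation. */
final class FunctionCallingSketch {

    // Functions the application exposes to the model, keyed by the name the model uses.
    private final Map<String, Function<String, String>> functions = new HashMap<>();

    void register(String name, Function<String, String> function) {
        functions.put(name, function);
    }

    /** Runs the function the model asked for and returns the result to feed back to it. */
    String handleModelRequest(String functionName, String argument) {
        Function<String, String> function = functions.get(functionName);
        if (function == null) {
            throw new IllegalArgumentException("Unknown function: " + functionName);
        }
        return function.apply(argument);
    }

    public static void main(String[] args) {
        FunctionCallingSketch app = new FunctionCallingSketch();
        // Stand-in for a real correction or database function.
        app.register("normalizeWhitespace", text -> text.trim().replaceAll("\\s+", " "));
        // Pretend the model answered with: call normalizeWhitespace("  great   product ").
        System.out.println(app.handleModelRequest("normalizeWhitespace", "  great   product "));
    }
}
```

In this picture the model never touches the data itself. It only chooses which function to call, which is why functions that write to a database deserve some care.
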
The following datasets already exist:
- Publications: AI-generated publication dataset. Used for the vector store example.
  - This dataset is loaded into your local database, and is already fully embedded in the Elasticsearch vector store
  - Database table: `publication`
  - Index: `ai-hackathon-publication`
- Users and reviews: AI-generated reviews combined with user data. Used for the function calling example.
  - This dataset is loaded into your local database automatically. It is not embedded in Elasticsearch
  - Database tables: `app_user` and `user_review`
- Users and tickets: AI-generated service desk tickets. Part of the LAB.md lab assignment.
  - This dataset is loaded into your local database, and is already fully embedded in the Elasticsearch vector store
  - Database tables: `app_user` and `ticket`
  - Index: `ai-hackathon-ticket`
- Bee movie: bzzzt...
  - This dataset is only loaded directly into the vector store
  - Index: `ai-hackathon-beemovie`
- Great Gatsby
  - This dataset is only loaded directly into the vector store
  - Index: `ai-hackathon-great-gatsby`
- Dummy guide to the internet
  - This dataset is only loaded directly into the vector store
  - Index: `ai-hackathon-dummy-internet`
You can view this test data by following the 'Connecting to Elasticsearch through separate client' and 'Connecting to PostgresQL through separate client' sections below.
Elasticsearch is our vector store for this hackathon. The vector stores were built ahead of time,
since creating embeddings can be time-consuming and expensive depending on the amount of data.
The provided API keys are read-only, to prevent data modification.
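
As background on what a vector store does with those embeddings, the sketch below ranks documents by cosine similarity to a query vector. It is a toy illustration under stated assumptions: the three-dimensional vectors and titles are made up, the names `SemanticSearchSketch`, `Doc`, `cosine` and `mostSimilar` are hypothetical, and in this project the ranking is done by Elasticsearch on embeddings produced by the embedding model, not by code like this.

```java
import java.util.Comparator;
import java.util.List;

/** Toy illustration of similarity search over embeddings; all data is made up. */
final class SemanticSearchSketch {

    /** A document together with its (here hand-written) embedding vector. */
    record Doc(String title, double[] embedding) {}

    /** Cosine similarity: close to 1.0 for the same direction, 0.0 for orthogonal vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the document whose embedding is most similar to the query embedding. */
    static Doc mostSimilar(double[] query, List<Doc> docs) {
        return docs.stream()
                .max(Comparator.comparingDouble((Doc d) -> cosine(query, d.embedding())))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Beekeeping basics", new double[] {0.9, 0.1, 0.0}),
                new Doc("Tax law overview", new double[] {0.0, 1.0, 0.0}));
        // Pretend this vector is the embedding of the question "How do I keep bees?".
        double[] query = {1.0, 0.0, 0.0};
        System.out.println(mostSimilar(query, docs).title());
    }
}
```

Real embeddings have far more dimensions, but the principle is the same: the question is embedded once and compared against vectors that were computed ahead of time.
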
If you wish to edit data, or to keep using this example after the hackathon, you can use the local Elasticsearch Docker container.
If you wish, you can use the local Docker container as your own Elasticsearch cluster.
You will have to:
- Modify the `docker-compose.yml` and put back the commented-out `elasticsearch` section.
- Change the connection details in the `application.yml`.
- Create the embeddings yourself! See `SpringAIConfiguration.java`, `PublicationVectorStoreDataLoader.java` and `FileVectorStoreDataLoader.java` for examples of how to do so.
- Again, keep in mind: do not start embedding large datasets, as this takes time and costs $$$.
- Install https://elasticvue.com/
  - This is a small but powerful Elasticsearch client
  - Can be installed as a desktop client or as a browser plugin
- Connect to the cluster
  - In the top bar you can `Add cluster`
  - Use `API key` authentication, see `application.yml` for the URI and api-key.
- The most interesting tabs are the `indices` and `search` tabs.
  - `indices` allows you to see all the available vector stores which are already indexed on the cluster
  - `search` allows you to see the data within a specific vector store
- NOTE: Everyone is sharing this cluster during the hackathon, that is why all API keys are read-only!
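
If you are curious what API key authentication looks like on the wire, the sketch below builds (but does not send) a request with the JDK's built-in HTTP client. Elasticsearch expects the encoded key in an `Authorization: ApiKey ...` header. The class and method names are hypothetical, the URI and key are placeholders, and whether the shared read-only key is allowed to call `_cat/indices` is an assumption, not something verified here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: builds an Elasticsearch request that authenticates with an API key. */
final class ElasticsearchApiKeyRequestSketch {

    /** Builds a GET request for the index overview; the key must be the encoded API key. */
    static HttpRequest listIndices(String clusterUri, String encodedApiKey) {
        return HttpRequest.newBuilder(URI.create(clusterUri + "/_cat/indices?v"))
                .header("Authorization", "ApiKey " + encodedApiKey)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values: take the real URI and api-key from your configuration.
        HttpRequest request = listIndices("https://example.invalid:9243", "placeholder-key");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Elasticvue does the equivalent for you once the cluster is added.
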
Everyone is hosting their own PostgreSQL service locally. The database automatically loads a database dump containing the relevant data. This data (if applicable) is already embedded (or indexed) in the shared Elasticsearch cluster.
This allows us to modify the data locally without changing the data for others. This is especially useful when using things like function calling.
We recommend using the database functionality inside IntelliJ IDEA. If this is not an option, pgAdmin is a good alternative.
https://www.pgadmin.org/download/
You can find the connection details in the `docker-compose.yml`.
This application consists of:
- Spring Boot application with Spring AI (OpenAI) & Elasticsearch dependencies
- Different chatbot functions, each of which displays a feature offered by Spring AI (and OpenAI)
  - Spring AI is the 'glue', whilst OpenAI offers most of the core features
- Different data sources and test data generation tools:
  - PostgreSQL with mock data for more traditional structured data
    - User
    - Publications
    - Reviews
  - Pre-built vector-store databases hosted in Elasticsearch, read-only!
    - This is done since embedding data is expensive; doing so 20x for the same mock data would be a waste.
  - Resource text files for creating your own embeddings in a local Elasticsearch cluster
- Docker Compose YAML to run a local Elasticsearch vector store instance and PostgreSQL database
This is an adaptation of the Spring AI demo for Retrieval-Augmented Generation with Large Language Models, by Josh Long: https://github.com/spring-tips/llm-rag-with-spring-ai/
The accompanying video is hosted here: https://www.youtube.com/watch?v=p3aLjH2VPzU
This has been further expanded upon with features such as:
- Function calling
- Structured outputs
- More practical examples using a relational database