As part of Google for Developers' mission to build for the community, this repository shares the complete workflow of the web application presented at the AI booth during Google I/O Extended Hanoi 2024. The application, named Bông, is a real-time VLM (vision-language model) web app with both voice input and voice output.
Speak, See, and Interact with Bông
- Real-time VLM Web App: Supports both voice input and output for interactive experiences.
- Multimodal Model Integration: Utilizes Gemini 1.5 Flash for handling diverse inputs including audio, images, videos, and text.
- Google Ecosystem Utilization: Employs Google Cloud APIs and WaveNet TTS for natural spoken communication in Vietnamese.
- RAG Workflow: Incorporates Retrieval-Augmented Generation to keep the app updated with event information and GDG Hanoi news.
- Natural and Humorous Responses: Designed to engage attendees with real-time, context-aware interactions.
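Context-aware responses depend on carrying conversation history into each request. The sketch below is a minimal, plain-JavaScript illustration of that idea, not the app's actual code (the project uses LangChain for this); the persona text, `buildPrompt` helper, and turn format are all hypothetical.

```javascript
// Sketch: fold recent conversation history into a system prompt so the
// model stays context-aware across turns. Illustrative only — the app
// itself builds its chain with LangChain.

const SYSTEM_INSTRUCTIONS =
  "You are Bông, a friendly assistant at Google I/O Extended Hanoi 2024. " +
  "Answer naturally and with humor.";

function buildPrompt(history, userMessage, maxTurns = 4) {
  // Keep only the most recent turns to bound prompt size.
  const recent = history.slice(-maxTurns);
  const lines = recent.map((t) => `${t.role}: ${t.text}`);
  return [SYSTEM_INSTRUCTIONS, ...lines, `user: ${userMessage}`].join("\n");
}

const history = [
  { role: "user", text: "What is this booth about?" },
  { role: "assistant", text: "This is the AI booth — ask me anything!" },
];

const prompt = buildPrompt(history, "When does the next session start?");
console.log(prompt);
```

Capping the history at a few turns keeps the prompt small enough for real-time latency while still giving the model the immediate conversational context.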
- Gemini 1.5 Flash: A lightweight model optimized for speed and efficiency at scale, supporting a context window of up to 1 million tokens.
- Multimodal Input: Accepts inputs from webcam videos, microphone speech recognition, and other media types.
- Google Cloud's WaveNet TTS: Enhances the app's ability to communicate naturally in Vietnamese.
- Embedding Extraction: Uses the Google Text Embedding API to extract embeddings from text content at the source URLs.
- Chain Construction with LangChain: Builds a chain whose system prompt incorporates the conversational history, acting as a lightweight memory cache.
- Real-time Response: The web application responds in real time, even in noisy environments and with multiple people in the frame.
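The RAG workflow above can be sketched as follows. This is a minimal illustration rather than the app's actual code: it assumes the chunk embeddings have already been obtained from the Google Text Embedding API, and shows only the retrieval step (cosine similarity plus top-k selection) used to ground the prompt with event information. The chunk texts and vectors are toy values.

```javascript
// Minimal RAG retrieval sketch: rank pre-embedded text chunks by cosine
// similarity to a query embedding, then keep the top-k as prompt context.
// The embeddings here are toy 3-d vectors; in the real app they would come
// from the Google Text Embedding API.

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveTopK(queryEmbedding, chunks, k = 2) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical index of event-info chunks.
const chunks = [
  { text: "Google I/O Extended Hanoi 2024 takes place in October.", embedding: [0.9, 0.1, 0.0] },
  { text: "GDG Hanoi organizes monthly community meetups.", embedding: [0.2, 0.8, 0.1] },
  { text: "Bông answers questions about the event schedule.", embedding: [0.7, 0.3, 0.2] },
];

const queryEmbedding = [1.0, 0.0, 0.0]; // e.g. an embedded "When is the event?"
const context = retrieveTopK(queryEmbedding, chunks).map((c) => c.text);
console.log(context);
```

The retrieved texts are then injected into the system prompt so the model can answer with up-to-date event and GDG Hanoi information instead of relying on its training data alone.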
1. Clone the repository:

   ```bash
   git clone https://github.com/tuanlda78202/geminio.git
   ```

2. Navigate to the project directory:

   ```bash
   cd geminio
   ```

3. Install dependencies:

   ```bash
   npm install
   ```

4. Download `.google-cloud-credentials` from Google Cloud and set `VITE_GEMINI_KEY` and `GOOGLE_APPLICATION_CREDENTIALS` in `.env`.

5. Run the application:

   ```bash
   npm run dev
   ```

6. Open port `3001` for Google Cloud TTS.
7. Open your browser and navigate to `http://localhost:3000`.
8. Allow access to your microphone and webcam.
9. Interact with the application using voice commands and visual inputs.
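The setup steps above reference two environment variables in `.env`. A hedged example of what that file might look like (the values are placeholders, and the variable names are taken from the setup step):

```
# .env — placeholder values only
VITE_GEMINI_KEY=your-gemini-api-key
GOOGLE_APPLICATION_CREDENTIALS=./.google-cloud-credentials
```

Note that Vite only exposes variables prefixed with `VITE_` to client-side code, which is why the Gemini key carries that prefix while the server-side credentials path does not.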
We welcome contributions from the community. Please follow these steps to contribute:
1. Fork the repository.
2. Create a new branch for your feature or bug fix:

   ```bash
   git checkout -b feature-name
   ```

3. Commit your changes:

   ```bash
   git commit -m "Description of feature or fix"
   ```

4. Push to the branch:

   ```bash
   git push origin feature-name
   ```

5. Create a pull request on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or feedback, please contact me.