Third Eye

Third Eye is an assistive tool for visually impaired users. It runs object detection on a live video feed and uses text-to-speech to announce the detected objects in real time.

Features

  • Real-time object detection using DETR (DEtection TRansformer) through Hugging Face's transformers library
  • Text-to-speech conversion of detected object names using SpeechT5 through Hugging Face's transformers library
  • Audio playback of the description for each detected object
  • Video display with bounding boxes and labels for detected objects

Installation

Prerequisites

  • Python 3.7+

Install Dependencies

  1. Clone the repository and change into its directory:
git clone https://github.com/Tharanitharan-M/Third-Eye---Hugging-Face.git
cd Third-Eye---Hugging-Face
  2. Install the required Python packages:
pip install -r requirements.txt

Requirements File

Ensure you have a requirements.txt file with the following contents:

torch
transformers
opencv-python
Pillow
soundfile
sounddevice
datasets
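
A quick way to confirm that the packages above are installed is to look up each one's import name. The sketch below is not part of the repository; the function name `missing_packages` and the `REQUIRED` mapping are illustrative. Note that two pip names differ from their import names (opencv-python is imported as cv2, Pillow as PIL).

```python
from importlib.util import find_spec

# pip package name -> module name used in an import statement
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "soundfile": "soundfile",
    "sounddevice": "sounddevice",
    "datasets": "datasets",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None  # locates the module without importing it
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any names it prints can be installed individually with pip or by re-running the install step.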

Usage

  1. Run the main script:
python main.py
  2. The application opens your default camera (usually the built-in webcam) and starts detecting objects in real time.

  3. Detected objects are outlined with bounding boxes and labeled with their names on the video feed. An audio description of each detected object is played.

  4. To stop the application, press the q key.
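
The camera loop described in these steps can be sketched with OpenCV roughly as follows. This is an illustration, not the contents of main.py: the function names, the window title, and the detection format (a dict with a "label" and a "box" holding xmin, ymin, xmax, ymax) are assumptions. OpenCV is imported inside the functions so the small helpers can be used without it.

```python
def should_quit(key_code):
    """True when the code returned by cv2.waitKey corresponds to the 'q' key."""
    return key_code != -1 and (key_code & 0xFF) == ord("q")


def box_corners(box):
    """Convert an assumed {xmin, ymin, xmax, ymax} dict into two integer corner points."""
    return (int(box["xmin"]), int(box["ymin"])), (int(box["xmax"]), int(box["ymax"]))


def draw_detections(frame, detections):
    """Draw a rectangle and a text label on the frame for each detection."""
    import cv2  # lazy import: only needed when actually drawing

    for det in detections:
        top_left, bottom_right = box_corners(det["box"])
        cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)
        cv2.putText(frame, det["label"], (top_left[0], max(top_left[1] - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame


def run_camera_loop(detect=None, camera_index=0, window_name="Third Eye"):
    """Show frames from the camera until 'q' is pressed or the feed ends."""
    import cv2

    capture = cv2.VideoCapture(camera_index)  # index 0 is usually the default camera
    if not capture.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # camera disconnected or no more frames
            if detect is not None:
                draw_detections(frame, detect(frame))
            cv2.imshow(window_name, frame)
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run_camera_loop()` with no arguments only displays the feed; passing a `detect` callable that returns detections in the assumed format adds the boxes and labels.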

How It Works

  • Object Detection: The application uses the DETR (DEtection TRansformer) model, loaded through Hugging Face's transformers library, to detect objects in the video feed. Each frame is passed to the model, which returns the objects it finds; the application draws a bounding box around each one and labels it with the object's name.
  • Text-to-Speech: Once an object is detected, its name is converted to speech using the SpeechT5 model, also loaded through Hugging Face's transformers library. The resulting audio is played back, telling the user what was detected.
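
The two stages above could be wired together along the following lines. This is a hedged sketch rather than the project's actual code: the README does not name specific checkpoints, so `facebook/detr-resnet-50`, `microsoft/speecht5_tts`, `microsoft/speecht5_hifigan`, and the `Matthijs/cmu-arctic-xvectors` speaker embeddings are assumptions taken from the public Hugging Face examples, and the 0.9 score threshold and all function names are illustrative. The DETR checkpoint may also need the timm package, and newer versions of datasets may not load the speaker-embedding dataset.

```python
def confident_labels(detections, threshold=0.9):
    """Return unique labels scoring at or above the threshold, in first-seen order."""
    labels = []
    for det in detections:
        if det["score"] >= threshold and det["label"] not in labels:
            labels.append(det["label"])
    return labels


def load_detector(model_name="facebook/detr-resnet-50"):  # assumed checkpoint
    """Build an object-detection pipeline; downloads the model on first use."""
    from transformers import pipeline

    return pipeline("object-detection", model=model_name)


def load_speaker():
    """Return a speak(text) function backed by SpeechT5 (assumed checkpoints)."""
    import sounddevice as sd
    import torch
    from datasets import load_dataset
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    # SpeechT5 needs a speaker embedding; this one comes from the public example.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

    def speak(text):
        inputs = processor(text=text, return_tensors="pt")
        speech = model.generate_speech(inputs["input_ids"], speaker, vocoder=vocoder)
        sd.play(speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
        sd.wait()  # block until playback finishes

    return speak


def announce(frame_bgr, detector, speak, threshold=0.9):
    """Detect objects in one OpenCV (BGR) frame and speak their names."""
    import cv2
    from PIL import Image

    image = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    text = ", ".join(confident_labels(detector(image), threshold))
    if text:
        speak(text)
    return text
```

Loading the models once with `load_detector()` and `load_speaker()` and reusing them for every frame avoids repeating the slow setup. Because `sd.wait()` blocks, a real-time loop would likely need to announce only newly seen objects or play audio on a separate thread.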

Project Structure

thirdeye/
│
├── main.py               # Main script to run the application
├── requirements.txt      # List of Python dependencies
└── README.md             # Project documentation

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or suggestions, please contact tharanimtharan@gmail.com.
