Computer Vision

This repository contains a collection of links to my repositories showcasing implementations of Computer Vision models in Python. It features several basic models using Convolutional Neural Networks (CNNs). CNNs are a type of neural network designed to process grid-structured data like images and are particularly effective at recognizing visual patterns by capturing local features through convolutions.
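
As a point of reference for how these CNN models are put together, below is a minimal sketch of a small convolutional classifier in PyTorch. The layer sizes, the 32×32 input resolution, and the 10-class output are illustrative assumptions, not taken from any specific notebook in the linked repositories.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Minimal CNN for 3-channel 32x32 images (sizes are illustrative)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # local pattern extraction
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Quick shape check with a dummy batch of 4 images
logits = SmallCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```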

Additionally, it includes models applying Transfer Learning, a technique that reuses models pre-trained on large datasets, allowing for high performance at a lower computational cost. This technique leverages the knowledge acquired by models trained on millions of images or videos and applies it to specific new tasks, enhancing efficiency and accuracy.
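
A common way to apply transfer learning is to freeze a backbone pre-trained on ImageNet and train only a new classification head. The sketch below uses torchvision's ResNet-18 as the backbone; the choice of model and the 5-class head are assumptions made for illustration.

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (requires a recent torchvision)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pre-trained parameters so only the new head is updated
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the new task (5 classes, as an example)
model.fc = nn.Linear(model.fc.in_features, 5)

# During fine-tuning, only the unfrozen parameters are passed to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
```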

Furthermore, several notebooks are implemented where Vision Transformer (ViT) models are fine-tuned. ViTs split an image into patches and apply self-attention across them, matching or outperforming state-of-the-art convolutional networks while often requiring fewer computational resources for pre-training. These models excel at capturing long-range dependencies in the input, resulting in a better understanding of the global structures in images.
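
Fine-tuning a ViT with the Hugging Face transformers library typically follows the pattern sketched below: load a pre-trained checkpoint, attach a classification head sized for the new label set, and preprocess images with the matching processor. The google/vit-base-patch16-224-in21k checkpoint and the two example labels are assumptions, not necessarily those used in the notebooks.

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

checkpoint = "google/vit-base-patch16-224-in21k"  # example checkpoint, assumed
labels = ["cat", "dog"]                           # example label set, assumed

processor = ViTImageProcessor.from_pretrained(checkpoint)
model = ViTForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

# Preprocess an image and run a forward pass; in a real fine-tuning run this
# step sits inside a training loop or the Trainer API.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(-1).item()])
```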

What is Computer Vision?

Computer Vision is a field of artificial intelligence that enables machines to interpret and understand the visual world through images and videos. It employs algorithms and deep learning models to perform tasks such as object recognition, action detection, and image segmentation. Today, it is a crucial technology in various applications, from autonomous vehicles to medical diagnostics and surveillance systems.

Implemented Models

The following are the Computer Vision models I have implemented to date:

  1. Image Classification: This task involves assigning a label or class to an image. Inputs are pixel values that compose an image, either in grayscale or RGB. Key use cases include medical image classification, social media photo categorization, and inventory product detection.

  2. Object Detection: These models identify and locate instances of objects such as cars, people, buildings, and animals in an image or video sequence, returning bounding box coordinates along with class labels (see the detection sketch after this list). Current use cases include security surveillance, autonomous driving, and augmented reality.

  3. Image Segmentation: This task classifies each pixel of an image into a specific category or instance, producing clearly delineated regions for each class or object (see the segmentation sketch after this list). Use cases include medicine (tissue segmentation), agriculture (crop detection), and robotics. It is divided into three main types:

    • Semantic Segmentation: Assigns a class label to each pixel without distinguishing between different instances of the same class.
    • Instance Segmentation: Labels each pixel and differentiates between individual instances of the same class.
    • Panoptic Segmentation: Combines both approaches, labeling every pixel while also separating individual instances.

  4. Video Classification: This task involves assigning a label or class to a video. Models process video frames and generate the probability of each class being represented. Important use cases are activity detection in security, multimedia content classification, and sports analysis.

  5. Image Captioning: This multimodal task combines Computer Vision and Natural Language Processing (NLP) to generate textual descriptions of images (see the captioning sketch after this list). These models are useful for describing images on social media platforms, improving accessibility for visually impaired individuals, and indexing images for search engines.
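
For the object detection task described above, a minimal inference sketch with Ultralytics YOLO (part of the stack listed below) could look like this; the yolov8n.pt checkpoint and the image path are illustrative assumptions.

```python
from ultralytics import YOLO

# Load a small pre-trained detector (illustrative checkpoint)
model = YOLO("yolov8n.pt")

# Run inference; each result carries bounding boxes, class ids, and confidences
results = model("example.jpg")
for box in results[0].boxes:
    cls_id = int(box.cls)
    print(results[0].names[cls_id], box.conf.item(), box.xyxy.tolist())
```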
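
For image segmentation, the Hugging Face pipeline returns one mask per detected class or instance. The SegFormer checkpoint below is an example choice, not necessarily the one used in the notebooks.

```python
from transformers import pipeline

# Semantic segmentation with a pre-trained SegFormer checkpoint (illustrative choice)
segmenter = pipeline("image-segmentation", model="nvidia/segformer-b0-finetuned-ade-512-512")

# Each entry contains a class label, an optional score, and a PIL mask for that region
outputs = segmenter("example.jpg")
for out in outputs:
    print(out["label"], out["mask"].size)
```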
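
For image captioning, the image-to-text pipeline combines a vision encoder with a text decoder; the BLIP checkpoint below is an example choice, assumed for illustration.

```python
from transformers import pipeline

# Image captioning via the image-to-text pipeline (BLIP checkpoint chosen as an example)
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

caption = captioner("example.jpg")[0]["generated_text"]
print(caption)
```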

Contributions

Contributions to this repository are welcome. If you have any questions or suggestions, please do not hesitate to contact me.

Technological Stack

Python · TensorFlow · PyTorch · Hugging Face · Ultralytics · Scikit-learn · OpenCV · Pandas · Plotly

Contact

Gmail · LinkedIn · GitHub