As a proud Malayali, I've always been captivated by the beauty and complexity of the Malayalam script. This project, Malayalam Character Recognition, is inspired by my love for my native language and its unique script. The goal is to build a tool that recognizes handwritten Malayalam characters using deep learning, contributing to the digital accessibility of this culturally rich language.
This project uses a Convolutional Neural Network (CNN) to recognize and classify handwritten Malayalam characters. By identifying individual characters from images, it has potential applications in OCR (Optical Character Recognition) systems, educational tools, and digital libraries. The project aims to help digitize handwritten Malayalam text, promoting accessibility for Malayalam speakers and learners alike.
- Accurate Character Recognition: Uses deep learning to recognize and classify the core Malayalam characters.
- Preprocessing Pipeline: Built-in preprocessing for image-based character data to improve model accuracy.
- User-Friendly Interface: Functions for loading images and obtaining character predictions.
- Scalable Architecture: Can be retrained on additional data or adapted to other languages.
- Flexible Input: Works with scanned or photographed images of handwritten text, making it versatile for a variety of use cases.
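This README does not spell out the preprocessing steps. As a rough illustration, a common pipeline for handwritten-character images converts the image to grayscale, resizes it to a fixed size, inverts it so the ink is bright on a dark background, and scales pixel values to [0, 1]. The NumPy sketch below assumes a hypothetical 32x32 input size, nearest-neighbour resizing, and a hypothetical function name, preprocess_character. The project itself may implement this differently, for example with OpenCV's cv2.resize.

```python
import numpy as np

TARGET_SIZE = 32  # hypothetical input size; the real model may use another


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to H x W grayscale (ITU-R BT.601 weights)."""
    if image.ndim == 2:
        return image.astype(np.float32)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image[..., :3].astype(np.float32) @ weights


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to size x size."""
    h, w = image.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]


def preprocess_character(image: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Hypothetical pipeline: grayscale -> resize -> invert -> scale to [0, 1].

    Returns a (size, size, 1) float32 array, the channels-last layout that
    Keras Conv2D layers expect by default.
    """
    gray = to_grayscale(image)
    small = resize_nearest(gray, size)
    inverted = 255.0 - small                      # dark ink -> bright strokes
    scaled = np.clip(inverted / 255.0, 0.0, 1.0)  # guard against float rounding
    return scaled[..., np.newaxis].astype(np.float32)


if __name__ == "__main__":
    # A fake 64x48 white "scan" with a black vertical stroke.
    page = np.full((64, 48, 3), 255, dtype=np.uint8)
    page[:, 20:28] = 0
    x = preprocess_character(page)
    print(x.shape, x.min(), x.max())
```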
- Python: Core programming language
- TensorFlow/Keras: For deep learning model implementation
- OpenCV: Image processing library
- NumPy: Data manipulation and preprocessing
- Jupyter Notebooks: For experimenting and evaluating model performance
- Python 3.7+: Ensure Python is installed on your system.
- Virtual Environment (recommended): To isolate project dependencies.
- Clone the Repository:
git clone https://github.com/cyriacjohn/malayalam-character-recognition.git
cd malayalam-character-recognition
Create a virtual environment (optional but recommended):
python3 -m venv venv
Activate it on Windows:
.\venv\Scripts\activate
or on macOS/Linux:
source venv/bin/activate
Install required packages:
pip install -r requirements.txt
- Download the dataset from here.
- Place the dataset in the data/raw folder and organize it as your setup requires.
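The expected layout under data/raw is not specified. One common convention, assumed here rather than confirmed by the project, is one subfolder per character class (for example data/raw/ka/0001.png). The sketch below indexes such a tree with pathlib and gives each class a stable integer label by sorting folder names. index_dataset and IMAGE_SUFFIXES are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical set of accepted image extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def index_dataset(root) -> Tuple[List[Tuple[Path, int]], Dict[int, str]]:
    """Map each class subfolder of `root` to an integer label.

    Assumes a hypothetical layout of root/<class_name>/<image files>.
    Class folders are sorted by name so label indices are stable across runs.
    """
    root = Path(root)
    class_dirs = sorted(d for d in root.iterdir() if d.is_dir())
    label_names = {idx: d.name for idx, d in enumerate(class_dirs)}
    samples = []
    for idx, class_dir in enumerate(class_dirs):
        for file in sorted(class_dir.iterdir()):
            # Skip non-image files such as notes or metadata.
            if file.is_file() and file.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((file, idx))
    return samples, label_names


if __name__ == "__main__":
    raw = Path("data/raw")
    if raw.is_dir():
        samples, labels = index_dataset(raw)
        print(f"{len(samples)} images across {len(labels)} classes")
```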
- Training the Model
- To train the model, run:
python src/main.py --mode train --epochs 20
- Evaluating the Model
- To evaluate the trained model on test data, use:
python src/main.py --mode evaluate
- Running Prediction
- To predict on a single image:
python src/main.py --mode predict --image_path path/to/your/image.png
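The commands above suggest that src/main.py dispatches on a --mode flag. Its source is not reproduced here. The argparse sketch below shows one way such an interface could be wired up using only the flags that appear above (--mode, --epochs, --image_path). The default of 20 epochs and the placeholder return strings are assumptions standing in for the project's real training, evaluation and prediction code.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Malayalam character recognition (illustrative CLI sketch)"
    )
    parser.add_argument("--mode", choices=["train", "evaluate", "predict"],
                        required=True)
    parser.add_argument("--epochs", type=int, default=20,
                        help="number of training epochs (train mode only)")
    parser.add_argument("--image_path",
                        help="image to classify (required in predict mode)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse cannot express "required only for one mode", so check by hand.
    if args.mode == "predict" and not args.image_path:
        parser.error("--image_path is required when --mode is predict")
    return args


def main(argv: Optional[List[str]] = None) -> str:
    args = parse_args(argv)
    # Placeholder dispatch: the real project would call its own code here.
    if args.mode == "train":
        return f"train for {args.epochs} epochs"
    if args.mode == "evaluate":
        return "evaluate on test data"
    return f"predict {args.image_path}"
```

For example, main(["--mode", "train", "--epochs", "30"]) returns "train for 30 epochs", while main(["--mode", "predict"]) exits with a usage error because no image path was given.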
The CNN architecture consists of:
- Convolutional Layers: To detect character patterns.
- Pooling Layers: To reduce dimensionality.
- Fully Connected Layers: For classification.
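To make the roles of the first two layer types concrete, the NumPy sketch below implements one 3x3 "valid" convolution by hand (as cross-correlation, which is what deep-learning convolution layers compute), a ReLU activation, and one 2x2 max-pooling step. It illustrates the operations and is not the project's model. The kernel is a hand-picked vertical-edge detector rather than learned weights, and the ReLU activation is an assumption.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' cross-correlation (a Conv2D filter without bias)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = feature_map.shape
    trimmed = feature_map[: h - h % 2, : w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Hand-picked vertical-edge kernel; a trained network learns its own kernels.
VERTICAL_EDGE = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=np.float32)

if __name__ == "__main__":
    img = np.zeros((8, 8), dtype=np.float32)
    img[:, 4:] = 1.0  # left half dark, right half bright: one vertical edge
    features = relu(conv2d_valid(img, VERTICAL_EDGE))
    pooled = max_pool_2x2(features)
    print(features.shape, pooled.shape)  # (6, 6) (3, 3)
```

The convolution responds only where the edge is (each feature row is [0, 0, 3, 3, 0, 0]), and pooling halves each spatial dimension while keeping that response. In Keras, the corresponding layers are Conv2D (whose default padding is "valid") and MaxPooling2D(pool_size=2), followed by Flatten and Dense layers, with a softmax on the last Dense layer to produce class probabilities.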
- Hyperparameters
- Learning Rate: 0.001
- Batch Size: 32
- Epochs: Configurable with --epochs (recommended: 20 to 30)
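With a batch size of 32, each epoch is split into ceil(N / 32) gradient steps, where N is the number of training images. The last batch is smaller when N is not a multiple of 32. The pure-Python sketch below shows that batching. The shuffle seed and function names are hypothetical. In Keras, these values would typically be passed as model.fit(..., batch_size=32, epochs=20), with the learning rate set on the optimizer. This section does not say which optimizer the project uses.

```python
import math
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 32  # from the hyperparameter list above


def steps_per_epoch(num_samples: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def minibatches(samples: Sequence[T], batch_size: int = BATCH_SIZE,
                seed: int = 0) -> Iterator[List[T]]:
    """Yield shuffled batches; the final batch may be smaller than batch_size."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # new order each epoch if seed changes
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

For instance, with 1,000 training images (a made-up number), steps_per_epoch(1000) returns 32: 31 full batches plus one batch of 8. That means 20 epochs amount to 640 gradient updates.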
- Training Accuracy: 92%
- Validation Accuracy: 89%
These metrics are based on the dataset used and may vary with different data.
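Training and validation accuracy are both the fraction of images whose predicted class matches the true label. The first is measured on the images the model was trained on, and the second on a held-out subset. The section does not say how the validation set was formed, so the sketch below assumes a simple random 20% hold-out. The function names and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_val_split(n: int, val_fraction: float = 0.2,
                    seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and hold out val_fraction of them for validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n * val_fraction))
    return indices[n_val:], indices[:n_val]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, accuracy([1, 2, 3, 4], [1, 2, 0, 4]) returns 0.75, because three of the four predictions are correct.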
Contributions are welcome to help improve this project! Feel free to:
- Fork the repository.
- Clone your fork and create a new branch:
git checkout -b feature-branch
- Make your changes and commit them.
- Push your branch and create a Pull Request.
My goal is to contribute to the accessibility of Malayalam and promote its use in the digital world.