This project trains a Siamese model to build a one-shot face recognition network, using FaceNet and MTCNN as the backbone. YOLOv5 and OpenCV are used alongside it to classify faces in the webcam feed. The aim is to implement face recognition with very little memory usage by switching between YOLOv5 and MTCNN only when necessary. The goal of this repository is to run facial recognition in the background to protect the user's privacy while leaving more than enough memory for the user to keep using the device for personal tasks.
We use the Olivetti dataset, which contains face images of 40 different people with 10 images per person. Each face is resized to a 128×128 image, normalized, and fed into a pre-trained InceptionResnetV1 (vggface2 dataset weights), which returns an embedding of shape 1×512. These embeddings represent the encoded features of an image. We then take the embeddings of any two images and feed them into the Siamese model, which returns a value between 0 and 1 that represents how similar the two images are. Only the Siamese model is trained, since InceptionResnetV1 is already trained.
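The pairing step described above can be sketched as follows. This is a minimal illustration and not the repository's training code: the helper `make_pairs`, its `negatives_per_positive` parameter, and the balanced sampling strategy are all assumptions. It only builds index pairs with a target of 1.0 (same person) or 0.0 (different people); the embeddings at those indices would then be fed to the Siamese model.

```python
import itertools
import random


def make_pairs(labels, negatives_per_positive=1, seed=0):
    """Build (i, j, target) index pairs for Siamese training.

    target is 1.0 when both images show the same person, 0.0 otherwise.
    Hypothetical helper: the sampling strategy is an assumption.
    """
    rng = random.Random(seed)

    # Group image indices by person.
    by_person = {}
    for idx, person in enumerate(labels):
        by_person.setdefault(person, []).append(idx)

    pairs = []
    for person, indices in by_person.items():
        # Indices of every image that shows somebody else.
        others = [i for i, p in enumerate(labels) if p != person]
        for i, j in itertools.combinations(indices, 2):
            pairs.append((i, j, 1.0))  # positive pair
            for _ in range(negatives_per_positive):
                pairs.append((i, rng.choice(others), 0.0))  # negative pair
    rng.shuffle(pairs)
    return pairs


# Olivetti layout: 40 people with 10 images each.
labels = [person for person in range(40) for _ in range(10)]
pairs = make_pairs(labels)
```

With 10 images per person there are 45 positive pairs per person, so sampling one negative per positive keeps the two classes balanced.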
MTCNN takes up most of the memory if it is run directly on every frame of the webcam feed. To solve this problem, we use the YOLOv5 model to detect whether there are any people in the webcam feed and whether the number of people has changed. Only when there is a change do we run the MTCNN network to detect faces and produce face boxes. These boxes are used to crop the faces, which are resized to 128×128, normalized, and fed into InceptionResnetV1 to give the target embeddings.

Embeddings for the reference images of each class in the database are computed the same way when the models are initialized, and are saved so that they can be loaded directly on later runs. The target and reference embeddings are fed into the Siamese model, which computes the similarity between the target and each reference image. The reference image with the highest similarity that also crosses the minimum threshold gives the predicted class.

The YOLOv5 model is not perfectly accurate and may occasionally produce extra boxes with a low person confidence. These can trigger MTCNN too frequently, causing a drop in fps and increased memory usage. To solve this problem, we use a cooldown timer to check whether a change is real or just a detection error from YOLOv5. We also use IoU (intersection over union) to match the boxes generated by YOLOv5 and MTCNN, so that each predicted class is attached to the correct bounding box.
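The cooldown check and the IoU matching described above might look roughly like this. It is a sketch under assumptions, not the code in this repository: `CooldownGate`, `iou`, and `match_faces_to_people` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and timestamps are passed in explicitly so the logic is easy to test.

```python
class CooldownGate:
    """Accept a change in person count only after it has lasted `limit` seconds."""

    def __init__(self, limit=0.5):
        self.limit = limit
        self.confirmed = 0       # last person count that was accepted
        self.pending = None      # candidate count waiting out the cooldown
        self.pending_since = 0.0

    def update(self, count, now):
        """Return True when MTCNN should be run for a confirmed change."""
        if count == self.confirmed:
            self.pending = None  # the spurious change went away
            return False
        if count != self.pending:
            # A new candidate count: start the cooldown timer.
            self.pending, self.pending_since = count, now
            return False
        if now - self.pending_since >= self.limit:
            self.confirmed, self.pending = count, None
            return True
        return False


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_faces_to_people(face_boxes, person_boxes):
    """For each face box, return the index of the best overlapping person box."""
    matches = []
    for face in face_boxes:
        scores = [iou(face, person) for person in person_boxes]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        # A face with no overlap at all stays unmatched.
        matches.append(best if best is not None and scores[best] > 0 else None)
    return matches
```

A face box is much smaller than the person box that contains it, so the IoU values are low in absolute terms. Taking the person box with the highest IoU still picks the right one, which is why this sketch uses an argmax instead of a fixed IoU threshold.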
This program is only meant to detect and classify people in the webcam feed so that it can be used to protect the user's privacy, trading some accuracy for lower memory usage. This makes the program weak at tracking which label belongs to which person. For example, if there are 3 people in the webcam frame and they start moving around while staying inside the frame, the program might get confused about which class name belongs to which person. The exact position does not actually matter for protecting the user's privacy. Still, this problem can be solved by keeping a short-term memory of previous object locations and using it to predict the closest possible location. Here I have solved the problem in a slightly inefficient way by running the MTCNN network every few seconds.
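The short-term memory idea mentioned above is not implemented in this repository. One possible sketch is a greedy nearest-center assignment between the previous frame's labelled boxes and the current boxes. The function names and the `max_distance` cutoff (in pixels) are assumptions made for illustration.

```python
import math


def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def carry_over_names(previous, boxes, max_distance=100.0):
    """Assign names from the previous frame to the closest current boxes.

    previous: dict mapping class name -> box from the last frame.
    boxes: list of boxes detected in the current frame.
    Returns a list aligned with `boxes`; entries are a name or None.
    """
    # Every (distance, name, box index) combination, closest first.
    candidates = []
    for name, old_box in previous.items():
        ox, oy = center(old_box)
        for idx, box in enumerate(boxes):
            cx, cy = center(box)
            candidates.append((math.hypot(cx - ox, cy - oy), name, idx))
    candidates.sort()

    names = [None] * len(boxes)
    used = set()
    for dist, name, idx in candidates:
        if dist > max_distance:
            break  # everything after this is even farther away
        if name in used or names[idx] is not None:
            continue
        names[idx] = name
        used.add(name)
    return names
```

Boxes that end up with `None` would still need a full MTCNN and Siamese pass, so a tracker like this would reduce how often the expensive path runs without replacing it.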
pip install -r requirements.txt
Database path : database/
Siamese model path : saved_models/siamese_model
Yolov5 type : Medium (yolov5m)
Cooldown limit : 0.5s
Regular check limit : 3s
Screen size : 800×600
Screen Scale (x,y): (1,1)
usage: start.py [-h] [-db DB_PATH] [-smp SIAMESE_MODEL_PATH] [-load LOAD_FROM_FILE] [-yolov5 YOLOV5_TYPE]
[-cdl COOLDOWN_LIMIT] [-rcl REGULAR_CHECK_LIMIT] [-size SCREEN_SIZE] [-scale SCALE]
optional arguments:
-h, --help show this help message and exit
-db DB_PATH, --db_path DB_PATH
Database path . Use relative path . Default path : database/
-smp SIAMESE_MODEL_PATH, --siamese_model_path SIAMESE_MODEL_PATH
Siamese Model path . Use relative path . Default path : saved_models/siamese_model
-load LOAD_FROM_FILE, --load_from_file LOAD_FROM_FILE
[TRUE] if you want to load reference embeddings from previously generated file , [FALSE] if
you want to recompile or create new embeddings for the reference images . Default is set to
TRUE
-yolov5 YOLOV5_TYPE, --yolov5_type YOLOV5_TYPE
Enter which yolov5 model you want to use : [yolov5s] ,[yolov5m] ,[yolov5l] ,[yolov5x] .
Default type : yolov5m
-cdl COOLDOWN_LIMIT, --cooldown_limit COOLDOWN_LIMIT
Lower the cooldown higher the precision higher the memory usage . Default value : 0.5s
-rcl REGULAR_CHECK_LIMIT, --regular_check_limit REGULAR_CHECK_LIMIT
Helps in correcting previous errors by either the camera or the program . Default value : 3s
-size SCREEN_SIZE, --screen_size SCREEN_SIZE
Set Default screen size for the webcam feed : [SCREEN_W*SCREEN_H] . Default size : 800*600
-scale SCALE, --scale SCALE
Set Default scale for the webcam feed : [SCALE_X*SCALE_Y] . Default size : 1*1
python start.py -db ../../Documents/database/ -smp ../../Documents/saved_models/siamese_model -load TRUE -yolov5 yolov5m -cdl 0.25 -rcl 3 -size 775*580 -scale 1*1
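The `W*H` style values and the `TRUE`/`FALSE` flag in the command above need to be converted before they can be used. The snippet below is a minimal sketch of one way to do that with `argparse`. It is not taken from `start.py`: the helpers `parse_pair` and `parse_flag` are hypothetical, and only a few of the options are reproduced.

```python
import argparse


def parse_pair(text, cast=int):
    """Turn a 'W*H' style string such as '775*580' into a (W, H) tuple."""
    left, sep, right = text.partition("*")
    if not sep:
        raise argparse.ArgumentTypeError(f"expected A*B, got {text!r}")
    try:
        return cast(left), cast(right)
    except ValueError:
        raise argparse.ArgumentTypeError(f"non-numeric value in {text!r}")


def parse_flag(text):
    """Turn 'TRUE' or 'FALSE' (any letter case) into a bool."""
    value = text.strip().upper()
    if value not in ("TRUE", "FALSE"):
        raise argparse.ArgumentTypeError(f"expected TRUE or FALSE, got {text!r}")
    return value == "TRUE"


parser = argparse.ArgumentParser(prog="start.py")
parser.add_argument("-size", "--screen_size", type=parse_pair, default=(800, 600))
parser.add_argument("-scale", "--scale",
                    type=lambda s: parse_pair(s, float), default=(1.0, 1.0))
parser.add_argument("-load", "--load_from_file", type=parse_flag, default=True)
parser.add_argument("-cdl", "--cooldown_limit", type=float, default=0.5)

# Parse an explicit argument list instead of sys.argv for demonstration.
args = parser.parse_args(
    ["-size", "775*580", "-scale", "1*1", "-load", "TRUE", "-cdl", "0.25"]
)
```

On some shells an unquoted `*` may be expanded as a glob pattern, so quoting values such as `"775*580"` on the command line is the safer choice.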
For the demo, I replaced the webcam feed with a sample image. When testing directly with my webcam, I was still able to maintain a high fps while consuming very little memory.