smart-zoneminder

smart-zoneminder enables fast and accurate object detection, face recognition and upload of ZoneMinder alarm images to an S3 archive where they are made accessible by voice via Alexa.

The use of object detection, remotely via Rekognition or locally via a TensorFlow-based CNN, dramatically reduces the number of false alarms and provides for robust scene and object detection. Face recognition via ageitgey's Python API to dlib can be used to identify people detected in the alarm images; alternatively, people can be recognized by another TensorFlow-based CNN. Alexa allows a user to ask to see an image or a video corresponding to an alarm and to get information on what caused the alarm and when it occurred.

The local processing of the machine learning workloads employed by this project can be configured to run on GPU or TPU hardware.

smart-zoneminder in its default configuration stores about three weeks of continuous video at the edge and one year of alarm images in the cloud. It costs as little as $8 per year per camera to operate.

Usage Examples

Here are a few of the things you can do with smart-zoneminder.

Note that in all the examples below if the user makes the request to an Alexa device without a screen then the skill will make an attempt to verbalize the response to the user as clearly as possible.

Ask Alexa to show the last alarm from a camera due to a person or thing

General form:

"Alexa, ask zone minder to show {Location} alarm of {PersonOrThing}"

If the user does not provide a location then the most recent alarm will be shown from any camera and if a specific person or thing is not given then an alarm caused by any person will be shown. Location can be any camera name defined in the configuration and PersonOrThing can be the name of any person defined in the configuration, 'stranger' for any person not defined or any label given in the COCO dataset. The user can see a video corresponding to the alarm by asking Alexa to "show video clip."

Specific example 1:

User: "Alexa, ask zone minder to show front porch alarm"

Alexa: "Showing last alarm from front porch camera on 2018-10-30 18:25"

Specific example 2:

User: "Alexa ask zone minder for back garage alarm of stranger"

Alexa: "Alarm from back garage caused by stranger on 2018-10-29 13:10"

Ask Alexa to show alarms from a camera due to a person or thing starting from some time ago

General form:

"Alexa, ask zone minder to show {Location} alarms of {PersonOrThing} from {SomeTimeAgo} ago"

If the user does not provide a location then the last alarm will be shown from all cameras (this can also be triggered by simply asking Alexa to "show all"). If a specific person or thing is not given then an alarm caused by any person will be shown. Location can be any camera name defined in the configuration and PersonOrThing can be the name of any person defined in the configuration, 'stranger' for any person not defined or any label given in the COCO dataset. If a duration was not given by {SomeTimeAgo} then the last alarms will be shown starting from three days ago. In all cases the number of alarms shown on the screen will not exceed 64 due to an Alexa service limitation. The user can scroll through the alarms by either touch or voice and can see a video clip corresponding to the alarm by asking Alexa to "show video clip".

Specific example 1:

User: "Alexa, ask zone minder to show front porch alarms"

Alexa: "Showing oldest alarms first from front porch camera"

Specific example 2:

User: "Alexa, ask zone minder to show alarms of Lindo"

Alexa: "Showing latest alarms from all cameras caused by Lindo"

Specific example 3:

User: "Alexa, ask zone minder to show backyard alarms of Polly"

Alexa: "Showing oldest alarms first from backyard for Polly"
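The defaults described in the general form above (a three-day lookback when no duration is given and a cap of 64 alarms on screen) can be sketched as follows. This is only an illustration in Python, not the skill handler's code: the function name, the 'camera' and 'time' keys, and the oldest-first ordering are assumptions made for the example.

```python
from datetime import datetime, timedelta

MAX_ALARMS_ON_SCREEN = 64               # display cap stated in the text
DEFAULT_LOOKBACK = timedelta(days=3)    # used when no duration is spoken


def select_alarms(alarms, now, lookback=None, location=None):
    """Return alarms newer than now - lookback, oldest first, capped at 64.

    `alarms` is a list of dicts with hypothetical keys 'camera' and 'time'.
    A location of None means "all cameras".
    """
    window_start = now - (lookback or DEFAULT_LOOKBACK)
    matches = [
        a for a in alarms
        if a["time"] >= window_start
        and (location is None or a["camera"] == location)
    ]
    matches.sort(key=lambda a: a["time"])
    return matches[:MAX_ALARMS_ON_SCREEN]
```

Calling select_alarms(alarms, datetime.now(), location="front porch") would mirror the "show front porch alarms" request with the default three-day window.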

Ask Alexa to play a video of a last alarm from a camera

Note: smart-zoneminder currently does not support live streaming of camera feeds. I recommend that you use alexa-ip-cam for streaming your camera feeds live on Echo devices.

General form:

"Alexa, show {Location} video clip"

If the user does not provide a camera location then the last video clip of any alarm will be displayed.

Specific example:

User: "Alexa, ask zone minder to show front porch video clip"

Alexa: "Showing most recent video clip from front porch alarm."

Result: Video of last alarm clip from this camera will play on an Echo device with a screen.

Ask Alexa a series of commands to view alarms and videos

The skill can handle a series of commands that, for example, allows the user to view an alarm and then view a video clip containing that alarm. Here are some videos of these examples.

Specific example 1:

User commands: (1) Ask Alexa to show all events; (2) view a particular alarm; (3) view a video containing that alarm; (4) go back and select another alarm.

Click on image below to see video of Alexa response:

Specific example 2:

User commands: (1) Ask Alexa to show last alarm; (2) show last alarm from back garage; (3) show a video clip of that alarm.

Click on image below to see video of Alexa response:

Specific example 3:

User commands: (1) Ask Alexa to show front porch alarms of Lindo; (2) scroll to find an alarm; (3) select an alarm; (4) view video clip of alarm; (5) go back to list of alarms; (6) select another alarm; (7) view video clip of alarm; (8) go back; (9) select another alarm; (10) view video clip of alarm; (11) exit.

Click on image below to see video of Alexa response:

Send Emails of Alarms

smart-zoneminder can email alarms based on the face detected in the image. Below are examples of alarm emails sent to a mobile device with filter criterion set to any of my family members.

Alexa Notifications

As soon as the Alexa Skills Kit supports notifications they will be added.

Project Requirements

My high level goals and associated requirements for this project are shown below.

Quickly archive ZoneMinder alarm frames to the cloud in order to safeguard against malicious removal of the on-site server. This led to the requirement of a ten second or less upload time to a secure AWS S3 bucket. Although ZoneMinder has a built-in ftp-based filter it was sub-optimal for this application as explained below.
Significantly reduce false positives from ZoneMinder's pixel-based motion detection. This led to the requirement to use a higher-level object and person detection algorithm based on Amazon Rekognition remotely or TensorFlow locally (this is configurable).
Determine if a person detected in an alarm image is familiar or not. This led to the requirement to perform real-time face recognition on people detected in ZoneMinder images.
Make it easy and intuitive to access ZoneMinder information. This led to the requirement to use voice to interact with ZoneMinder, implemented by an Amazon Alexa Skill. This includes proactive notifications, e.g., the Alexa service telling you that an alarm has occurred and why. For example, when an unknown person was seen by a camera or when a known person was seen. Another example is time-, object- and person-based voice search.
Have low implementation and operating costs. This led to the requirement to leverage existing components where possible and make economical use of the AWS services. This also led to the option of using local TensorFlow-based object detection since using Rekognition at scale is not inexpensive with respect to the goals of this project. An operating cost of less than $10 per year is the goal.
Be competitive with smart camera systems out in the market from Nest, Amazon, and others that use image recognition and Alexa.
Learn about, and show others how to use, TensorFlow, Face Recognition, ZoneMinder, Alexa, AWS and leveraging both edge and cloud compute.

System Architecture

The figure below shows the smart-zoneminder system architecture.

The figure below shows a high-level view of the edge compute architecture.

Image Processing Pipeline

The figure below shows the smart-zoneminder image processing pipeline.

Edge Setup and Configuration

A Linux server and a Google Coral Dev Board are the hardware used for local compute and storage in this project. Object and face/person recognition can be run on either the server or the Coral dev board. See tpu-servers for installation and configuration instructions associated with the Google Coral dev board software components. Some details regarding the server hardware used in this project can be found in the appendix. The rest of this section describes the Linux server components and how to install and configure them.

ZoneMinder

You need to have ZoneMinder installed on a local Linux machine to use smart-zoneminder. I'm using version 1.30.4, which is installed on a machine running Ubuntu 18.04. I followed Ubuntu Server 18.04 64-bit with Zoneminder 1.30.4 the easy way to install ZoneMinder.

I have the monitor function set to Mocord which means that the camera streams will be continuously recorded, with motion being marked as an alarm within an event (which is a 600 second block of continuously recorded video). ZoneMinder stores the camera streams as JPEGs for each video frame in the event. I chose this mode because I wanted to have a record of all the video as well as the alarms. ZoneMinder does provide a means ("filters") to upload an event to an external server when certain conditions are met, such as an alarm occurring. It's possible to use such a filter instead of the uploader I created, but I didn't want to upload 600 s worth of images every time an alarm occurred and the filter would have been slow, the worst case being a delay of almost 600 s if an alarm happened at the start of an event.

It's very important to configure ZoneMinder's motion detection properly to limit the number of false positives in order to minimize cloud costs, most critically AWS Rekognition. Even though the Rekognition Image API has a free tier that allows 5,000 images per month to be analyzed, it's very easy for a single camera to see many thousands of alarm frames per month in a high traffic area, and every alarm frame is a JPEG that is sent to the cloud to be processed via the Rekognition Image API. There are many guides on the Internet to help configure ZoneMinder motion detection. I found Understanding ZoneMinder's Zoning system for Dummies to be very useful but it takes some trial and error to get it right given each situation is so different. ZoneMinder is configured to analyze the feeds for motion at 5 FPS which also helps to limit Rekognition costs but it comes at the expense of possibly missing a high speed object moving through the camera's FOV (however unlikely in my situation). Since I was still concerned about Rekognition costs I also included the option to run local TensorFlow-based object detection instead. This comes at the expense of slightly higher detection times (with my current HW which uses an Nvidia GeForce GTX 1080Ti GPU for TensorFlow) but completely avoids Rekognition costs.
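To get a feel for how quickly the free tier can be used up, here is a rough back-of-the-envelope calculation in Python using only the two figures quoted above (5,000 free images per month and 5 FPS analysis). It assumes every analyzed frame during an alarm is sent to Rekognition; the frame_skip argument is a hypothetical stand-in for sending only every Nth frame.

```python
FREE_TIER_IMAGES_PER_MONTH = 5000   # Rekognition Image API free tier cited above
ANALYSIS_FPS = 5                    # ZoneMinder motion analysis rate cited above


def free_tier_alarm_seconds(frame_skip=1):
    """Seconds of alarm video per month that fit inside the free tier.

    frame_skip is an assumption for illustration: 1 sends every alarm frame,
    2 sends every other frame, and so on.
    """
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    frames_sent_per_second = ANALYSIS_FPS / frame_skip
    return FREE_TIER_IMAGES_PER_MONTH / frames_sent_per_second
```

With the defaults this works out to 1000 seconds, or roughly 17 minutes of alarm activity per month across all cameras, which is why tuning motion detection matters so much when Rekognition is used.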

If set to use remote object detection via Rekognition smart-zoneminder can be configured to either send all or some alarm frames (as specified by the frameSkip parameter in the uploader's config file) detected by ZoneMinder's motion detector to the cloud. This is expensive. Clearly there are more optimal ways to process the alarms locally in terms of more advanced motion detection algorithms and exploiting the temporal coherence between alarm frames that would limit cloud costs without some of the current restrictions. This is an area for future study by the project.

I have seven 1080p PoE cameras being served by my ZoneMinder setup. The cameras are sending MJPEG over RTSP to ZoneMinder at 5 FPS. I've configured the cameras' shutter to minimize motion blur at the expense of noise in low light situations since I found Rekognition's accuracy is more affected by the former. The object detection in TensorFlow seems more robust in this regard.

Some of the components interface with ZoneMinder's MySQL database and image store and make assumptions about where those are in the filesystem. I've tried to pull these dependencies out into configuration files where feasible, but if you heavily customize ZoneMinder it's likely that some path in the component code that's not in a configuration file will need to be modified.

TensorFlow

This project uses TensorFlow (with GPU support) for local object detection. I followed Installing TensorFlow on Ubuntu as a guide to install it on my local machine and I used a Python Virtual environment. After I installed TensorFlow I installed the object detection API using Step by Step TensorFlow Object Detection API Tutorial and this as guides. I'm currently using the rfcn_resnet101_coco model which can be found in the TensorFlow detection model zoo. See the Appendix for model benchmarking and selection.

dlib, face_recognition, scikit-learn and OpenCV

ageitgey's face_recognition API is used for face detection and for knn-based recognition. I followed the Linux installation guide to install the API and dlib with GPU support on my local machine in a Python virtual environment. scikit-learn is used to train an SVM for more robust face recognition from the face encodings generated by dlib. I installed scikit-learn via pip per these instructions. OpenCV is used to preprocess the image for face recognition; I used OpenCV 3 Tutorials, Resources, and Guides to install OpenCV 3.4.2 with GPU support on my local machine. A high-level overview of how the face recognition works can be found here and here.

Apache

If you installed ZoneMinder successfully then Apache should be up and running but a few modifications are required for this project. The Alexa VideoApp Interface that is used to display clips of alarm videos requires the video file to be hosted at an Internet-accessible HTTPS endpoint. HTTPS is required, and the domain hosting the files must present a valid, trusted SSL certificate. Self-signed certificates cannot be used. Since the video clip is generated on the local server, Apache needs to serve the video file in this manner. This means that you need to set up an HTTPS virtual host with a publicly accessible directory on your local machine. Note that you can also leverage this to access the ZoneMinder web interface in a secure manner externally. Here are the steps I followed to configure Apache to use HTTPS and serve the alarm video clip.

Get a hostname via a DDNS or DNS provider. I used noip.
Get an SSL cert from a CA. I used Let's Encrypt and ran the following command on my local machine: certbot -d [hostname] --rsa-key-size 4096 --manual --preferred-challenges dns certonly. It will ask you to verify domain ownership by creating a special DNS record at your provider.
Follow How To Create a SSL Certificate on Apache for Debian 8 except instead of using self-signed certs use the certs generated above.
Create a directory to hold the generated alarm clip and make the permissions for g and o equal to rx. I created this directory at /var/www/loginto.me/public. In that directory, touch a file called alarm-video.mp4 and give it rx permissions for u, g, and o. This will allow the generator to write a video by that name to this directory.
Configure Apache to allow the public directory to be accessed and configure Apache to allow the CGI to be used. You should allow the CGI script only to be accessed externally via HTTPS and only with a password. You can copy the configuration file in apache/smart-zoneminder.conf to your Apache config-available directory, modify it to your needs and enable it in Apache.
Restart Apache.
Allow external access to Apache by opening the right port on your firewall.

MongoDB

I use a local mongo database to store how every alarm frame was processed by the system. It's important to record the information locally since, depending on what options are set, not all alarm frames and their associated metadata will be uploaded to AWS S3. The mongo logging can be toggled on or off by a configuration setting. See How to Install MongoDB on Ubuntu 18.04 for instructions on how to install mongo on your system.

Alarm Uploader (zm-s3-upload)

The Alarm Uploader, zm-s3-upload, is a node.js application running on the local server that continually monitors ZoneMinder's database for new alarm frame images. When it finds new images it either sends them directly to an S3 bucket or first runs local object detection and/or face recognition on them, and then marks them as having been uploaded.

There are several important configuration parameters associated with object and face recognition that are set at runtime by the values in zm-s3-upload-config.json. Local object detection is enabled by setting the runLocalObjDet flag to "true" and face recognition is enabled by setting the runFaceDetRec flag to "true". Additionally, object and face detection can be run on the Google Coral dev board instead of the server; this is configured by the objDetZerorpcPipe and faceDetZerorpcPipe settings, respectively. Note you can run any server-Coral combination of local object and face detection.

The Alarm Uploader attaches metadata to the alarm frame image such as alarm score, event ID, frame number, date, and others. The metadata is used later on by the cloud services to process the image. The Alarm Uploader will concurrently upload alarm frames to optimize overall upload time. The default value is ten concurrent uploads. Upload speed will vary depending on your Internet bandwidth, image size and other factors but typically frames will be uploaded to S3 in less than a few hundred milliseconds.
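The bounded-concurrency idea described above can be sketched in a few lines. The real uploader is a node.js application, so this Python version is only an illustration of the pattern; upload_frames, fake_upload and the object key layout are hypothetical names made up for the example, and only the default of ten concurrent uploads comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_UPLOADS = 10  # default concurrency stated in the text


def upload_frames(frames, upload_one, max_workers=MAX_CONCURRENT_UPLOADS):
    """Upload frames with at most max_workers in flight.

    Results are returned in the same order as the input frames.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_one, frames))


def fake_upload(frame):
    # Stand-in for a real S3 put; returns the (hypothetical) key it would write.
    return f"upload/{frame['event_id']}-{frame['frame_num']}.jpg"
```

Passing the upload function in as an argument keeps the concurrency logic separate from the S3 client, which makes it easy to try different worker counts against your own bandwidth.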

The Alarm Uploader can be configured to skip alarm frames to minimize processing time, upload bandwidth and cloud storage. This is controlled by the frameSkip parameter in the configuration json.
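As a minimal sketch of frame skipping, assume a frameSkip value of N means "process every Nth alarm frame, starting with the first". That interpretation is an assumption for illustration; check the uploader's README for the actual semantics.

```python
def frames_to_process(frames, frame_skip):
    """Keep every frame_skip-th frame, starting with the first.

    Assumed semantics: frame_skip=1 keeps everything, 2 keeps every other frame.
    """
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    return frames[::frame_skip]
```

Under this assumption a ten-frame alarm with a skip of 3 would send frames 0, 3, 6 and 9, cutting upload and detection work to roughly a third.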

The Alarm Uploader is run as a Linux service using systemd.

Please see the Alarm Uploader's README for installation instructions.

Local Object Detection (obj-detect)

The Object Detection Server, obj_det_server, runs the TensorFlow object detection inference engine using Python APIs and employs zerorpc to communicate with the Alarm Uploader. One of the benefits of using zerorpc is that the object detection server can easily be run on another machine, apart from the machine running ZoneMinder. Another benefit is that the server loads the model into memory and initializes it at startup, thus saving time when an inference is actually run. The server can optionally skip inferences on consecutive ZoneMinder alarm frames to minimize processing time, which assumes the same object is in every frame. The Object Detection Server is run as a Linux service using systemd.

I benchmarked a few TensorFlow object detection models on the machine running smart-zoneminder in order to pick the best model in terms of performance and accuracy. See the Appendix for this analysis.
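The two ideas in this paragraph, loading the model once at startup and reusing a result for consecutive frames, can be sketched as below. This is not the project's obj_det_server code: the class, its arguments and the skip rule are assumptions for illustration, and the model is any callable so the sketch runs without TensorFlow or zerorpc installed.

```python
class ObjectDetectionServer:
    """Loads a model once, then reuses it for every request."""

    def __init__(self, load_model, consecutive_skip=0):
        self.model = load_model()            # cost is paid once, at startup
        self.consecutive_skip = consecutive_skip
        self._last = None                    # (frame_num, labels) of last real inference

    def detect(self, frame_num, image):
        # Reuse the previous result if this frame falls inside the skip window,
        # on the assumption that the same object is still in view.
        if self._last is not None:
            last_num, last_labels = self._last
            if 0 < frame_num - last_num <= self.consecutive_skip:
                return last_labels
        labels = self.model(image)
        self._last = (frame_num, labels)
        return labels

# With the zerorpc package installed, an instance could be exposed roughly as:
#   server = zerorpc.Server(ObjectDetectionServer(load_model))
#   server.bind("tcp://0.0.0.0:4242")   # address is a placeholder
#   server.run()
```

With consecutive_skip=2, frames 1 and 4 would trigger real inferences while frames 2 and 3 reuse the result from frame 1.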

Please see the Object Detection Server's README for installation instructions.

Face Recognition (face-det-rec)

The Face Detection and Recognition Server, face_detect_server.py, runs the dlib face detection and recognition engine using Python APIs and employs zerorpc to communicate with the Alarm Uploader. One of the benefits of using zerorpc is that the server can easily be run on another machine, apart from the machine running ZoneMinder (e.g. when using the TPU version of this program). The Face Detection and Recognition Server is run as a Linux service using systemd.

There are a number of parameters in this module that can be adjusted to optimize face detection and recognition accuracy and attendant compute. You may need to adjust these parameters to suit your configuration. These are summarized below.

Parameter	Default Value	Note
MIN_SVM_PROBA	0.8	Minimum probability for a valid face returned by the SVM classifier.
NUMBER_OF_TIMES_TO_UPSAMPLE	1	Factor to scale image when looking for faces.
FACE_DET_MODEL	cnn	Can be either 'cnn' or 'hog'. cnn works much better but uses more memory and is slower.
NUM_JITTERS	100	How many times to re-sample when calculating face encoding
FOCUS_MEASURE_THRESHOLD	200	Images with Variance of Laplacian less than this are declared blurry.

MIN_SVM_PROBA sets the minimum probability from the SVM-based classifier for a face to be declared valid. FOCUS_MEASURE_THRESHOLD sets the threshold for a Variance of Laplacian measurement of the image; if the measurement is below this threshold the image is declared too blurry for face recognition to take place.
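To make the two thresholds concrete, here is a dependency-free sketch of a Variance of Laplacian blur check followed by a probability gate. It is not the project's implementation, which uses OpenCV and dlib: the functions below, the 4-neighbour kernel applied to interior pixels only, and the "Unknown" label are assumptions for illustration, so the numbers will not match OpenCV exactly.

```python
from statistics import pvariance

MIN_SVM_PROBA = 0.8            # default from the table above
FOCUS_MEASURE_THRESHOLD = 200  # default from the table above


def variance_of_laplacian(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2-D list of grayscale values at least 3x3 in size.
    """
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def accept_face(gray, name, proba):
    """Return the name only if the crop is sharp and the classifier is confident."""
    if variance_of_laplacian(gray) < FOCUS_MEASURE_THRESHOLD:
        return None          # too blurry to attempt recognition
    if proba < MIN_SVM_PROBA:
        return "Unknown"     # hypothetical label for a low-confidence match
    return name
```

A flat, featureless crop has a Laplacian variance of zero and is rejected as blurry, while a crop with strong edges passes the focus check and is then judged on the classifier probability.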

Please see the Face Recognition's README for installation instructions.

Person Classification (person-class)

The Person Classification Server, person_classifier_server.py, runs a TensorFlow deep convolutional neural network (CNN)-based person classifier using Python APIs and employs zerorpc to communicate with the Alarm Uploader. One of the benefits of using zerorpc is that the server can easily be run on another machine, apart from the machine running ZoneMinder. The Person Classification Server is run as a Linux service using systemd.

This server uses a fine-tuned CNN to classify whether a person object detected by obj-detect is a member of my family or a stranger. It is an alternative to face-det-rec, so run one or the other but not both.

Note that face-det-rec includes shallow learning methods (SVM or XGBoost classifiers) in the final stage of a pipeline to recognize faces in alarm images. CNNs have been shown to outperform shallow learning methods for many computer vision tasks given sufficient training data; this was the main motivation for developing person-class.

Please see the Person Classification Server's README for installation instructions.

Alarm Clip Generator (gen-vid)

The Alarm Clip Generator, gen-vid, is a python script run in Apache's CGI on the local server that generates an MP4 video of an alarm event given its Event ID, starting Frame ID and ending Frame ID. The script is initiated via the CGI by the Alexa skill handler and the resulting video is played back on an Echo device with a screen upon a user's request.

ZoneMinder does offer a streaming video API that can be used to view the event with the alarm frames via a web browser. However, the Alexa VideoApp Interface that's used to play back the alarm clip requires very specific formats which are not supported by the ZoneMinder streaming API. Additionally, I wanted to show only the alarm frames and not the entire event, which also isn't supported by the ZoneMinder API. It's also possible to create the video clip completely in the cloud from the alarm images stored in DynamoDB; however, gaps would likely exist in videos created this way because there's no guarantee that ZoneMinder's motion detection would pick up all frames. So I decided to create gen-vid, but it does come at the expense of complexity and user-perceived latency since a long alarm clip takes some time to generate on my local machine. I'll be working to reduce this latency.

Please see the Alarm Clip Generator's README for installation instructions. Apache must be set up to enable the CGI, see above.

Cloud Setup and Configuration

The sections below describe the cloud-side components and how to install and configure them. You'll need an Amazon Developers account to use the Alexa skills I developed for this project since I haven't published them. You'll also need an Amazon AWS account to run the skill's handler, the other lambda functions required for this project and DynamoDB and S3.

DynamoDB

smart-zoneminder uses a DynamoDB table to store information about the alarm frame images uploaded to S3. This table needs to be created either through the AWS cli or the console. Here's how to do it via the console.

Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/. Make sure you are using the AWS Region in which you will later create smart-zoneminder's lambda functions.
Choose Create Table.
In the Create DynamoDB table screen, do the following:
- In the Table name field, type ZmAlarmFrames.
- For the Primary key, in the Partition key field, type ZmCameraName. Set the data type to String.
- Choose Add sort key.
- In the Sort Key field type ZmEventDateTime. Set the data type to String.

When the settings are as you want them, choose Create.
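For those who prefer scripting to the console, the same table definition can be expressed with boto3, the AWS SDK for Python. This is a sketch rather than part of the project: the helper names are made up, and the on-demand billing mode is an assumption since the console steps above do not specify capacity settings.

```python
def zm_alarm_frames_table_spec():
    """Keyword arguments for boto3's DynamoDB create_table call."""
    return {
        "TableName": "ZmAlarmFrames",
        "AttributeDefinitions": [
            {"AttributeName": "ZmCameraName", "AttributeType": "S"},
            {"AttributeName": "ZmEventDateTime", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "ZmCameraName", "KeyType": "HASH"},      # partition key
            {"AttributeName": "ZmEventDateTime", "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: on-demand billing; adjust to match your account's needs.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_zm_alarm_frames_table(region_name):
    # boto3 is third-party; it is imported here so the spec above can be
    # inspected without it installed.
    import boto3
    client = boto3.client("dynamodb", region_name=region_name)
    return client.create_table(**zm_alarm_frames_table_spec())
```

Use the same region here that you plan to use for the lambda functions and the S3 bucket.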

S3

You'll need an S3 bucket where your images can be uploaded for processing and archived. You can create the bucket either through the AWS cli or the console; here's how to do it via the console.

Thanks to Paul Branston for great suggestions to set secure S3 permissions. Also see this blog for additional related information.

Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
Choose Create bucket.
In the Bucket name field type zm-alarm-frames.
For Region, choose the region where you want the bucket to reside. This should be the same as the DynamoDB and lambda functions region.
Choose Create.
The bucket will need two root directories, /upload and /archive. Choose Create folder to make these.
Directly under the /archive directory, create the /alerts and /falsepositives subdirectories, again by choosing Create folder.
Now you need to limit access to the bucket, so start by logging in to the AWS IAM console.
Create a new user.
Set a password for the new user. Your user will also have an AWS access key and secret key created. API clients (e.g., zm-s3-upload) need to use these keys and will have the same permissions as the user would in the AWS console. Save the AWS access key and the secret key, which will be used in a step below.
Add permissions so that only this user has access to the bucket. My permissions to do that are shown below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::zm-alarm-frames",
                "arn:aws:s3:::zm-alarm-frames/*"
            ]
        }
    ]
}

Note that the Alexa devices require a public URI for all images and videos that these devices display. You can either point the URI to the S3 bucket or to the server on the local network containing the ZoneMinder image store. The lambda handler for the Alexa skill can be configured to point to either the S3 bucket or the local network for image access by changing the USE_LOCAL_PATH constant.

In the case of pointing the URI to the S3 bucket (USE_LOCAL_PATH = false) the Alexa skill handler will use signed s3 urls with an expiration time. This is the recommended approach.
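As an illustration of what a signed URL with an expiration time looks like in code, here is a short Python sketch using boto3's generate_presigned_url. The skill handler itself may be written differently; the function name and the one-hour expiry are assumptions for the example, and only the bucket name comes from the setup steps above.

```python
def signed_image_url(s3_client, key, bucket="zm-alarm-frames", expires_in=3600):
    """Return a time-limited GET URL for one archived alarm image.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    expires_in is in seconds; 3600 is an illustrative value, not the project's.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Because the URL carries its own signature and expiry, the bucket can stay private while an Echo device still fetches the image for a short time.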

Alternatively, you can serve the ZoneMinder event files locally (USE_LOCAL_PATH = true) and point a public URI to the files that the Alexa devices on your local network can access. The latency of this approach is slightly lower but comes at the expense of configuring the Apache server for this purpose and creates the potential for the DynamoDB database to be out of sync with the images stored locally since the database is only guaranteed to reflect the S3 store. If you want to enable local access, follow the steps below.

Set up a DNS entry for the Apache server's private IP address on your LAN. I used GoDaddy but any DNS host should work; just create an A record for the Apache server's IP address and give it a hostname. Putting private IPs into public DNS is discouraged, but since this is for personal use it's fine.
Get an SSL cert and use Domain Name Validation to secure the domain. I used LetsEncrypt.
Create a site configuration file for an Apache Virtual Host for the domain and create a Directory entry to serve the ZoneMinder event files. Here's mine.

Alias /nvr /nvr
<Directory "/nvr">
  DirectoryIndex disabled
  Options Indexes FollowSymLinks
  AuthType None
  Require all granted
</Directory>

Trigger Image Processing (s3-trigger-image-processing)

The Trigger Image Processing component (s3-trigger-image-processing) is an AWS Lambda Function that monitors the S3 bucket "upload" directory for new ZoneMinder alarm image files uploaded by the Edge Compute and triggers their processing by calling the step function. There are several AWS Lambda Functions that process the alarm frames. These are described below and are in the aws-lambda folder.
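The trigger pattern described here can be sketched in Python as follows. This is not the project's Lambda code: the function names, the state machine ARN argument and the shape of the execution input are assumptions for illustration. The event layout follows the standard S3 notification format, and start_execution is the boto3 Step Functions call.

```python
import json
from urllib.parse import unquote_plus

UPLOAD_PREFIX = "upload/"   # the bucket's upload directory described earlier


def new_upload_objects(event):
    """Extract (bucket, key) pairs for objects created under upload/."""
    found = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(UPLOAD_PREFIX):
            found.append((bucket, key))
    return found


def start_processing(event, sfn_client, state_machine_arn):
    """Start one Step Functions execution per newly uploaded image.

    sfn_client is expected to be boto3.client("stepfunctions"); the input
    payload shape is hypothetical.
    """
    started = 0
    for bucket, key in new_upload_objects(event):
        sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"bucket": bucket, "key": key}),
        )
        started += 1
    return started
```

Filtering on the upload/ prefix in code is a defensive extra; the S3 notification itself can also be configured with a prefix filter so the function only fires for new uploads.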