Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Docker healthcheck for /10.0 #115

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Added Docker healthcheck for /10.0 #115

wants to merge 1 commit into from

Conversation

sylnsr
Copy link

@sylnsr sylnsr commented Jul 17, 2017

Explanation:
Added Docker HEALTHCHECK command. Pointing to /web/database/selector for the health check because it is the first functional thing that should work and be available for a typical install. For custom deployments where admins wish to disable access to /web/database/selector they could use a different URL, or they could just prevent access to that path at the proxy server, in which case nothing changes in within the Docker environment for this to work. (See Docker's HEALTHCHECK documentation for more)

Purpose:
Provide rudimentary Odoo server state info to Docker service so that Docker knows if the service in the container is healthy or not. Contributes to monitoring and swarm management.

@linkit-odoo
Copy link

That URL (as well as /) cause Odoo to create a session, which is stored in the sessions store in data_dir. There are two problems in continuosly checking the URL (and originating sessions):

  1. sessions keep accumulating -- one or more per minute, in a typical case
  2. worse, those open sessions interfere with operations like new database creation, where the operation oftern stops with "cuncurrent update" error. I double cheched and, in my case, the healtchecks were indeed the culprit

We need to indentify (or create) a good target for healthcheck interrogations.

@sylnsr
Copy link
Author

sylnsr commented Jul 17, 2018

Good catch. In hind-sight, I also think it needs a URL with a much smaller return body. This is one option:
/web_editor/static/src/xml/ace.xml because:

  1. it does not generate a session and
  2. the return body it smaller than most.

Also, its static so I don't expect it to grow. I understand that in certain health checks (AWS ELB for example), only the status code is checked and the body is essentially thrown away, however, I think the request should still be for a small static asset, in order to be as efficient as possible. I only mention AWS ELB, because I think if someone happens to put their containers behind a load balancer that checks the return status of the health check endpoint .. they might use the same endpoint used for the Docker health check.

So how about that one?

@thomas15v
Copy link

thomas15v commented Jul 18, 2018

Doesn't work either. The static files do not reflect odoo's status. You can shutdown the database it will still think that odoo is fine. I tried this command it worked fine:

HEALTHCHECK CMD curl --fail -b /tmp/cookies --cookie-jar /tmp/cookies -s -H "Content-Type: application/json" -X POST -d '{"jsonrpc":"2.0","method":"call","params":{"method":"list","service":"db","args":{}}}' 0.0.0.0:8069/jsonrpc || exit 1

This command will execute a jsonrpc call to list available databases. This forces odoo to check the database connection and call the connection as well. What also means that if the disk is full this status should go to unhealthy as well. Sadly I could not avoid odoo from creating a session, however I gave curl a cookie-jar so odoo doesn't keep creating sessions.

@sylnsr
Copy link
Author

sylnsr commented Jul 19, 2018

That's a great idea. Perhaps at this point its important to define what the objective is of the health-check. Should it be comprehensive or simple? Is it just check the basic health of the container or the system as a whole.

For example, you could have another situation where the instance can connect to the DB and create cookies, however, due to running out of RAM, less assets don't compile so the UI is hosed. Is that something that should be accounted for as well?

@sylnsr
Copy link
Author

sylnsr commented Jul 19, 2018

Another thing to consider is compliance with deployment recommendations. So for example, @ https://www.odoo.com/documentation/10.0/setup/deploy.html#security its recommended to "set list_db configuration option to False, to prevent listing databases entirely". I know that there are some people who follow this recommendation religiously ... so in that case, listing the DBs would not work.

@SalahAdDin
Copy link

Yes, it is true.

@mlaitinen
Copy link

Camptocamp seems to have found a solution to this problem by expiring the session immediately: https://github.com/camptocamp/odoo-cloud-platform/blob/11.0/monitoring_status/controllers/main.py

@sylnsr
Copy link
Author

sylnsr commented Dec 31, 2018

I am current using my proposal from above to get a static file, namely:
/web_editor/static/src/xml/ace.xml

Features:

  • does not create a cookie
  • uses stock code (no extra modules to install)
  • the file is really small and overhead of getting it is negligible
  • is very generic and can easily be changed by admins they need something more custom

An implementation: https://github.com/idazco/odoo-docker-ez/blob/master/10/Dockerfile#L115
For professional deployments we use something like Prometheus to check many parts of each deployment. 👍

@thomas15v , regarding The static files do not reflect odoo's status, I do not agree that the Docker health check should encompass the status of Odoo itself. Can we concretely define what "Odoo's status" even means? Does it encompass all unit tests passing with a stock install?

What we should define first, is what status are we trying to check .. or at least .. what is reasonable to check for a stock deployment:

  • Odoo is alive and can serve data (basic/simple; leaves responsibility of deep checking to other tools)

or

  • My Odoo implementation with all my chosen modules is OK. (impractical since each implementation is unique, and may include unreliable modules)

From the Docker documentation

The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running.

So the basic advocated objective of the HEALTHCHECK is very high level, just to see if the container is working or if its "unable to handle new connections". The Docker authors never advocated it to be a thorough, deep check to see if the application is functioning according to design. Since they mention checking a "web server", we should be thinking in terms of simply checking Werkzeug, that it is listening and can respond.

@ysinjab
Copy link

ysinjab commented Dec 13, 2019

Doesn't work either. The static files do not reflect odoo's status. You can shutdown the database it will still think that odoo is fine. I tried this command it worked fine:

HEALTHCHECK CMD curl --fail -b /tmp/cookies --cookie-jar /tmp/cookies -s -H "Content-Type: application/json" -X POST -d '{"jsonrpc":"2.0","method":"call","params":{"method":"list","service":"db","args":{}}}' 0.0.0.0:8069/jsonrpc || exit 1

This command will execute a jsonrpc call to list available databases. This forces odoo to check the database connection and call the connection as well. What also means that if the disk is full this status should go to unhealthy as well. Sadly I could not avoid odoo from creating a session, however I gave curl a cookie-jar so odoo doesn't keep creating sessions.

Great Idea, but it has performance issue when you have to list hundreds of databases.

@thomas15v
Copy link

@ysinjab why do you have 100 databases on a single Odoo container? Wouldn't it be better to scale those databases to different odoo containers and use an HTTP proxy to sort these out with domain names?

@ysinjab
Copy link

ysinjab commented Dec 13, 2019

@ysinjab why do you have 100 databases on a single Odoo container? Wouldn't it be better to scale those databases to different odoo containers and use an HTTP proxy to sort these out with domain names?

The system we have requires separate tenant for every user. so I have one postgresql server and a database for every user. I have RESTful api as a proxy for odoo where every user is connected to his own database.

@sylnsr
Copy link
Author

sylnsr commented Dec 14, 2019

@ysinjab Container health and Odoo system status / database status are different things. The point of the container health check is just to give you the basic running health status of the container itself ... not all parts of the application in that container. The basic functionality behind Odoo is the Werkzeug web server. If thats down then you have a fundamental problem with the health of the container. Everything beyond that .. such as database access ... file system access etc, is the subject of a separate instrumention solution. So based on what the actual purpose of a health check is, it works perfectly.

@sylnsr
Copy link
Author

sylnsr commented Dec 14, 2019

Folks you need to keep one thing in mind, the container starts a Python process with a PID attached to PID 1. The point of the container health check is to make sure that the app attached to PID 1 in that container is not hung. That's it. Don't make this more complicated based on your own use cases.

@d-fence
Copy link
Contributor

d-fence commented Jan 22, 2020

what if the server is started with --no-http for one reason or another ?

@mamiu
Copy link

mamiu commented Nov 28, 2020

We have our own Dockerfile that wraps the default odoo docker image and adds some scripts. One of the scripts is for health checks (or liveness probe as it's called in kubernetes terms). It prevents the continuously growing number of sessions quite well. (Of course it won't work with --no-http as mentioned by @d-fence in the comment above.)

Maybe some of you will benefit from this:

#!/bin/bash

RESTORE_LOCK_FILE="$HOME/.restore.lock"
LIVENESS_PROBE_URL="localhost:8069"
LIVENESS_PROBE_FILE="$HOME/.liveness-probe-session"

# Succeed if a restore process is in progress to prevent interruption
[ -f "$RESTORE_LOCK_FILE" ] && exit 0

ensure_odoo_is_running()
{
    if [ $? -gt 0 ]; then
        # Odoo is down or does not respond
        echo "ERROR: No connection to \"http://${LIVENESS_PROBE_URL}\"!"
        exit 1
    fi
}

# Use this method to reliably get a session cookie
initialize_session()
{
    SESSION_COOKIE="$1"

    [ -n "$SESSION_COOKIE" ] && CURL_SESSION_ARG=(--header "cookie: session_id=$SESSION_COOKIE")

    SESSION_COOKIE="$(
        curl "http://${LIVENESS_PROBE_URL}/web/database/selector" \
            "${CURL_SESSION_ARG[@]}" \
            --silent \
            --cookie-jar - \
            --output /dev/null \
            | grep 'session_id' \
            | sed -e 's/^.*session_id.//g' \
            | sed -e 's/[ \t].*$//g'
    )"

    ensure_odoo_is_running

    if [ -z "$SESSION_COOKIE" ]; then
        echo "ERROR: Could not get session cookie!"
        exit 1
    else
        echo -n "$SESSION_COOKIE" > "$LIVENESS_PROBE_FILE"
        exit 0
    fi
}

# If there is already a liveness probe session use that one
if [ -f "$LIVENESS_PROBE_FILE" ]; then
    # Get existing session
    SESSION_COOKIE="$( cat $LIVENESS_PROBE_FILE )"

    RESPONSE="$(
        curl "http://${LIVENESS_PROBE_URL}/web/login" \
            --silent \
            --head \
            --header "cookie: session_id=$SESSION_COOKIE"
    )"

    ensure_odoo_is_running

    HTTP_STATUS_CODE="$( echo "${RESPONSE[@]}" | head -1 | awk '{print $2}' )"
    RESPONSE_SESSION_COOKIE="$(
        echo "${RESPONSE[@]}" \
        | grep "session_id=" \
        | sed -e 's/^.*session_id=//g' \
        | sed -e 's/;.*$//g'
    )"

    if [ -n "$RESPONSE_SESSION_COOKIE" ] && [ "$RESPONSE_SESSION_COOKIE" != "$SESSION_COOKIE" ]; then
        echo "$RESPONSE_SESSION_COOKIE" > "$LIVENESS_PROBE_FILE"
        export SESSION_COOKIE="$RESPONSE_SESSION_COOKIE"
    fi

    if [ $HTTP_STATUS_CODE -ge 200 ] && [ $HTTP_STATUS_CODE -le 299 ]; then
        # That's what we wanna see!
        exit 0
    elif [ $HTTP_STATUS_CODE -ge 300 ] && [ $HTTP_STATUS_CODE -le 399 ]; then
        # Get the URL where we are redirected to
        REDIRECTION_URL="$(
            echo "${RESPONSE[@]}" \
            | grep "Location:" \
            | sed -e "s/^.*${LIVENESS_PROBE_URL}//g"
        )"

        # If there's no database in the system yet, we're getting forwarded to the database selector
        if [ "$REDIRECTION_URL" = "/web/database/selector" ]; then
            initialize_session "$SESSION_COOKIE"
        fi
    elif [ $HTTP_STATUS_CODE -ge 400 ] && [ $HTTP_STATUS_CODE -le 499 ]; then
        initialize_session "$SESSION_COOKIE"
    else
        # It seems there is an internal server error (5XX)
        echo "ERROR: Liveness probe failed!"
        echo "HTTP status code: $HTTP_STATUS_CODE"
        echo "HTTP session cookie: $SESSION_COOKIE"
        echo "HTTP headers:"
        echo "${RESPONSE[@]}"

        exit 1
    fi
else
    initialize_session
fi

@llacroix
Copy link

llacroix commented Feb 4, 2021

I think you're all over thinking it. Healthcheck is pretty much an implementation detail.
If you use 0 workers, the healthcheck will need to be different as it's not using the same architecture as with workers > 1
With workers > 1 you're effectively spawning multiple forks of the preforkserver and you will also have multiple port to listen to. So you could have the longpolling up but xmlrcp_port down...
If you have cron workers they can't be exactly tested much...
The prefork server by itself would be the one managing its children anyway so you shouldn't even have to worry about it...

The other thing is that an healthy worker can be different things for different people. If I have a worker for which the response time increase to more than let say 50% slower than the average request time... I'd say the worker is unhealthy. Or may be the response time of the average request is increasing or maybe the amount of memory is way too high...

The issue is that a health check doesn't mean that the server is down, may be it's just slow and would tell you to scale the service up by adding more replicas.

@lemniskett
Copy link

lemniskett commented Aug 23, 2022

Is it a good idea to use XML-RPC instead? I uses this in my own customized Dockerfile.

HEALTHCHECK CMD curl -H 'Content-Type: text/xml' \
    -d "<?xml version='1.0'?><methodCall><methodName>version</methodName></methodCall>" \
    -X POST \
    --fail \
    http://localhost:${ODOO_PORT}/xmlrpc/2/common

EDIT: Apparently this doesn't work with odoo systems with multiple databases

@mamiu
Copy link

mamiu commented Aug 23, 2022

@lemniskett That is indeed much more elegant! 👍 Thanks for sharing!

The only problem would be if the Odoo API would response but not the web application.
But I think that's an edge case most people can ignore.

@mlaitinen
Copy link

Odoo now has (since version 15.0) an endpoint /web/health that can be used for health checks. See: https://github.com/odoo/odoo/blob/05abba7e58d5a49a42ace9338d785b90fd5f7671/addons/web/controllers/main.py#L911

@lemniskett
Copy link

Odoo now has (since version 15.0) an endpoint /web/health that can be used for health checks. See: https://github.com/odoo/odoo/blob/05abba7e58d5a49a42ace9338d785b90fd5f7671/addons/web/controllers/main.py#L911

Is it really a health check? It doesn't seem to check anything.

Also in Odoo 15, /web/database/selector doesn't even return internal error when it's not connected to PostgreSQL.

@llacroix
Copy link

@lemniskett It's still better than nothing. It will at least not create useless sessions and fill your session store folder after each seconds.

The downside is that it's not backward compatible with earlier versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants