-
Let’s run a test container. It runs an application that listens on a given port, but that’s not important for now:
podman run -d --rm --name reversewords-test quay.io/mavazque/reversewords:latest
-
We can always get capabilities for a process by querying the /proc filesystem:
# Get container's PID
CONTAINER_PID=$(podman inspect reversewords-test --format {{.State.Pid}})
# Get caps for a given PID
grep Cap /proc/${CONTAINER_PID}/status
-
We get the capability sets in hex format; we can decode them using the capsh tool:
capsh --decode=00000000800405fb
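If capsh is not installed, the same decoding can be sketched in plain shell, since each bit of the mask corresponds to a capability number from linux/capability.h. This is a minimal sketch rather than a replacement for capsh: the `decode_caps` function and the `CAP_NAMES` variable are hypothetical helpers, and the list only covers capabilities 0 to 40.

```shell
# Capability names ordered by bit number (bit 0 = cap_chown, bit 40 = cap_checkpoint_restore).
CAP_NAMES="cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid
cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner
cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice
cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap
cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore"

# decode_caps HEX: print the comma-separated capabilities set in a hex mask
# (HEX is written without the 0x prefix, as in /proc/<pid>/status).
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        # Test the bit that corresponds to this capability number.
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( $bit + 1 ))
    done
    printf '%s\n' "$out"
}

decode_caps 00000000800405fb
```

For the mask above this prints the eleven capabilities whose bits are set (0, 1, 3 to 8, 10, 18 and 31), from cap_chown through cap_setfcap.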
-
We can use podman inspect as well:
podman inspect reversewords-test --format {{.EffectiveCaps}}
-
Stop the container:
podman stop reversewords-test
-
Run our test container with a root UID and get its capabilities:
podman run --rm -it --user 0 --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8
# grep Cap /proc/1/status
-
We can see that the thread's permitted, effective and bounding capability sets are populated:
CapInh: 0000000000000000
CapPrm: 00000000800405fb
CapEff: 00000000800405fb
CapBnd: 00000000800405fb
CapAmb: 0000000000000000
Let's decode them:
capsh --decode=00000000800405fb
-
Exit the container:
$ exit
-
Same test but running the container with a nonroot UID:
podman run --rm -it --user 1024 --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8
$ grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000800405fb
CapAmb: 0000000000000000
-
We can see that the thread's permitted and effective capability sets are cleared, while the bounding set is unchanged. We can exit our container now:
exit
-
We can request extra capabilities, and those will be assigned to the corresponding sets:
podman run --rm -it --user 1024 --cap-add=cap_net_bind_service --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8
$ grep Cap /proc/1/status
-
Since Podman supports ambient capabilities, you can see how the NET_BIND_SERVICE capability was added to the inheritable, permitted, effective and ambient sets:
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 00000000800405fb
CapAmb: 0000000000000400
$ capsh --decode=0000000000000400
0x0000000000000400=cap_net_bind_service
-
We can exit the container now:
$ exit
-
We can control which port our application listens on by using the APP_PORT environment variable. Let’s try to run our application on a non-privileged port with a non-privileged user:
podman run --rm --user 1024 -e APP_PORT=8080 --name reversewords-test quay.io/mavazque/reversewords:ubi8
-
Stop the container with Ctrl+C and try to bind to port 80 this time:
podman run --rm --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords:ubi8
2022/07/06 14:28:52 Starting Reverse Api v v0.0.21 Release: NotSet
2022/07/06 14:28:52 Listening on port 80
2022/07/06 14:28:52 listen tcp :80: bind: permission denied
-
This time it fails. Remember that since we're running as nonroot, the permitted and effective capability sets were cleared, so the NET_BIND_SERVICE capability present in Podman's default capability set is not available to the process.
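The failure can be read straight off the capability sets: binding to a port below 1024 needs bit 10 (NET_BIND_SERVICE) in the effective set, and having it only in the bounding set is not enough. As a minimal sketch, the hypothetical helper `status_has_cap` below reads text in /proc/PID/status format from standard input and tests one bit of one set. The sample values are the ones shown earlier for the nonroot container.

```shell
# status_has_cap FIELD BIT: succeed if bit BIT is set in the FIELD line
# (for example CapEff) of /proc/<pid>/status-formatted text read from stdin.
status_has_cap() {
    hex=$(awk -v key="$1:" '$1 == key { print $2 }')
    [ -n "$hex" ] || return 2          # field not found in the input
    [ $(( (0x$hex >> $2) & 1 )) -eq 1 ]
}

NET_BIND_SERVICE=10   # capability number from linux/capability.h

# Sample values copied from the nonroot container output shown earlier.
STATUS="CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000800405fb
CapAmb: 0000000000000000"

for field in CapEff CapBnd; do
    if printf '%s\n' "$STATUS" | status_has_cap "$field" "$NET_BIND_SERVICE"; then
        echo "$field: NET_BIND_SERVICE present"
    else
        echo "$field: NET_BIND_SERVICE absent"
    fi
done
```

Inside a running container the same helper can be pointed at the live process, for example `status_has_cap CapEff 10 < /proc/1/status`.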
-
We know that the NET_BIND_SERVICE capability allows unprivileged processes to bind to ports under 1024. Let’s assign this capability to the container and see what happens:
podman run --rm --user 1024 -e APP_PORT=80 --cap-add=cap_net_bind_service --name reversewords-test quay.io/mavazque/reversewords:ubi8
2022/07/06 14:29:30 Starting Reverse Api v0.0.21 Release: NotSet
2022/07/06 14:29:30 Listening on port 80
-
This time it worked because the NET_BIND_SERVICE cap was added to the ambient, permitted and effective sets.
-
You can stop the container using Ctrl+C.
-
We added the NET_BIND_SERVICE capability to our binary when we built the image:
setcap 'cap_net_bind_service+ep' /usr/bin/reverse-words
-
Let's take a look inside the container:
podman run --rm -it --entrypoint /bin/bash --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
getcap /usr/bin/reverse-words
-
The capability is added to the effective and permitted file capability sets.
-
Let's review the thread capabilities:
grep Cap /proc/1/status
-
As you can see, the effective and permitted sets are cleared. Only the bounding capability set has NET_BIND_SERVICE.
-
Let's run our app:
/usr/bin/reverse-words &
-
We were able to bind to port 80 because the binary has the required file capabilities (effective and permitted) and NET_BIND_SERVICE is present in the bounding set, so the thread acquired the capability in its effective and permitted sets when the binary was executed. We can check those sets:
$ grep Cap /proc/$!/status
CapInh: 0000000000000000
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 00000000800405fb
CapAmb: 0000000000000000
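These values follow the rules that capabilities(7) gives for how a thread's sets are recomputed during execve: the new permitted set is (thread inheritable AND file inheritable) OR (file permitted AND thread bounding) OR the new ambient set, and the new effective set equals the new permitted set when the file effective bit is set, otherwise the new ambient set. A minimal sketch of that arithmetic, where `exec_caps` is a hypothetical helper that ignores the special handling of UID 0 and of set-user-ID binaries:

```shell
# exec_caps P_INH P_BND P_AMB F_INH F_PRM F_EFF
# P_* are the thread sets and F_* the file sets, as hex without the 0x prefix.
# F_EFF is the file effective bit (0 or 1).
exec_caps() {
    p_inh=$(( 0x$1 )); p_bnd=$(( 0x$2 )); p_amb=$(( 0x$3 ))
    f_inh=$(( 0x$4 )); f_prm=$(( 0x$5 )); f_eff=$6

    # Approximation: a file that carries capabilities counts as privileged,
    # and executing a privileged file clears the ambient set.
    if [ $(( $f_inh | $f_prm )) -ne 0 ] || [ "$f_eff" -ne 0 ]; then
        p_amb=0
    fi

    new_prm=$(( ($p_inh & $f_inh) | ($f_prm & $p_bnd) | $p_amb ))
    if [ "$f_eff" -ne 0 ]; then
        new_eff=$new_prm
    else
        new_eff=$p_amb
    fi

    printf 'CapPrm: %016x\nCapEff: %016x\n' "$new_prm" "$new_eff"
}

# reverse-words with cap_net_bind_service+ep, started from the nonroot shell:
# empty inheritable and ambient sets, default bounding set.
exec_caps 0 800405fb 0 0 400 1
```

The same helper reproduces the earlier `--cap-add` run, where the binary has no file capabilities and the capability arrives through the ambient set instead: `exec_caps 400 800405fb 400 0 0 0`.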
-
We can exit the container now.
exit
-
Does this mean that we can bypass thread capabilities? Let's see:
podman run --rm -it --entrypoint /bin/bash --user 1024 --cap-drop=all -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
-
Check the container thread capabilities:
$ grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000
-
All sets are zeroed. Let's try to run our app:
/usr/bin/reverse-words
bash: /usr/bin/reverse-words: Operation not permitted
-
The kernel blocked the execution because the NET_BIND_SERVICE capability requested by the binary is not in the bounding set, so the thread cannot acquire it.
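The `Operation not permitted` error matches the safety check that capabilities(7) describes for binaries whose file effective bit is set: if the process cannot obtain every capability in the file permitted set, execve fails with EPERM. A minimal sketch of that check, assuming empty inheritable sets on both the file and the thread as in this lab; `exec_allowed` is a hypothetical helper name:

```shell
# exec_allowed P_BND F_PRM F_EFF: succeed if execve would pass the check.
# P_BND is the thread bounding set and F_PRM the file permitted set, as hex
# without the 0x prefix. F_EFF is the file effective bit (0 or 1).
exec_allowed() {
    p_bnd=$(( 0x$1 )); f_prm=$(( 0x$2 )); f_eff=$3
    # Capabilities the process would actually obtain from the file.
    granted=$(( $f_prm & $p_bnd ))
    # The check only applies when the file effective bit is set.
    [ "$f_eff" -eq 0 ] || [ "$granted" -eq "$f_prm" ]
}

# Default bounding set versus the one left by --cap-drop=all.
for bnd in 00000000800405fb 0000000000000000; do
    if exec_allowed "$bnd" 400 1; then
        echo "CapBnd $bnd: exec allowed"
    else
        echo "CapBnd $bnd: Operation not permitted"
    fi
done
```

With the default bounding set the binary starts and gains the capability; with an empty bounding set the check fails, which is the behaviour seen above.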
-
That answers the question: no, file capabilities cannot bypass the bounding set. Now we can exit the container:
exit