
Proposal: scale new functions which have never been invoked down to zero #979

cedricvidal opened this issue Nov 26, 2018 · 13 comments

@cedricvidal

Expected Behaviour

When deploying a function for the first time with com.openfaas.scale.zero: true and faas-idler's -dry-run set to false, I expect the function to be idled, i.e. scaled down to zero after deployment, but it isn't.

Current Behaviour

Currently, the function's replica count is set on deployment to com.openfaas.scale.min, which defaults to 1.

Even though com.openfaas.scale.zero: true is set, faas-idler is not in dry-run mode, and the function is never used, the replica count stays at that minimum and the function is never idled.

faas-idler only kicks in when the invocation metric is present in Prometheus; otherwise the function is ignored. When a function is deployed for the first time, its metrics are not yet in Prometheus, because they are only recorded once the function has been used for the first time.

Possible Solutions

  1. consider the invocation count to be 0 for functions labelled for scale-to-zero but without any Prometheus metrics
  2. using an init container in the function, register a 0 metric in Prometheus if it isn't set yet
  3. in the gateway, register a 0 metric in Prometheus for deployed functions which have no metric yet

Solution 3 looks best to me, since it has the fewest moving parts.

Note from @alexellis :

The 0 metric would be recorded in the deployment handler, or alternatively in a similar way to gateway_service_count, which is polled periodically.
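
For illustration, a rough sketch of what solution 3 could look like in the gateway's deployment path (not actual gateway code; the variable names and the choice of the code label value are assumptions):

package main

import "github.com/prometheus/client_golang/prometheus"

// Sketch only: a counter vec mirroring gateway_function_invocation_total
// with its ("code", "function_name") labels.
var gatewayFunctionInvocation = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_function_invocation_total",
        Help: "Total function invocations.",
    },
    []string{"code", "function_name"},
)

// seedInvocationMetric creates a zero-valued series for a freshly deployed
// function, so that anything querying this metric sees the function at all.
func seedInvocationMetric(functionName string) {
    // With creates the labelled child at 0 without incrementing it; the
    // series becomes visible on the next Prometheus scrape.
    gatewayFunctionInvocation.With(prometheus.Labels{
        "code":          "200", // assumption: which status code to seed is an open question
        "function_name": functionName,
    })
}

func main() {
    prometheus.MustRegister(gatewayFunctionInvocation)
    seedInvocationMetric("echo") // would be called from the deployment handler
}

One open question with this approach is which value the code label of a never-invoked function should carry.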

Another consideration: however this is implemented, I think an unhealthy function should not be idled. If a function is unhealthy or fails to start for some reason and is then idled by faas-idler, one might never find out that it is broken until it is used for the first time.

Context

For context, I would like to use OpenFaaS to deploy GPU deep learning models to multiple demo environments. Since they are very heavy and costly to run, I would like them to be started only when actually used.

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
  ___                   _____           ____
 / _ \ _ __   ___ _ __ |  ___|_ _  __ _/ ___|
| | | | '_ \ / _ \ '_ \| |_ / _` |/ _` \___ \
| |_| | |_) |  __/ | | |  _| (_| | (_| |___) |
 \___/| .__/ \___|_| |_|_|  \__,_|\__,_|____/
      |_|

CLI:
 commit:  b24c5763d9b61e0c04018a722f8f2f765498f18a
 version: 0.7.8

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.9.10
 sha:     b4c12f824bcea6b3038f5c878001f72e6a57de1e
 commit:  Make use of cache in scaling


Provider
 name:          faas-netes
 orchestration: kubernetes
 version:       0.6.3 
 sha:           62766ad0c4b2ce713df26172faa51f56b1a955ce
  • Docker version docker version (e.g. Docker 17.0.05 ):
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:31 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     true
 Kubernetes:
  Version:          v1.10.3
  StackAPI:         v1beta2
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes

  • Operating System and version (e.g. Linux, Windows, MacOS):
    OSX

  • Link to your project or a code example to reproduce issue:
    Verbatim java8 sample

@kenfdev
Member

kenfdev commented Nov 26, 2018

Thanks for your detailed proposal @cedricvidal !

I like solution 3 as well (would like to hear others' thoughts, too). I'm surprised a replica count of 0 isn't collected when a function is deployed for the first time; I thought all services were reported with their replica count.

@alexellis
Member

I'm hoping for some more direction from @kenfdev, then we'll try to find someone to help with this work.

@kenfdev
Member

kenfdev commented Nov 27, 2018

Okay, having taken a more detailed look, I realize that I hadn't understood the situation well and that solution 3 probably won't be an easy change.

The metrics for replicas are collected, so that isn't the problem; the issue is about the invocation count. And indeed, that isn't collected until the function has been invoked. This is because the gateway_function_invocation_total metric only records facts about invocation results: the function name and the HTTP response code it returned. Given that, I don't think registering 0 in gateway_function_invocation_total for a function that hasn't been invoked yet makes much sense.

e.g.

gateway_function_invocation_total{code="???",function_name="echo"} 0

IMHO, this is faas-idler's responsibility. It already collects all the functions by requesting the gateway here. But unfortunately, it gets no results because of the PromQL query targeting gateway_function_invocation_total here.

I haven't looked deeply into faas-idler yet, but maybe we could tweak that part of the code to also target functions without any gateway_function_invocation_total series. That way, functions which have never been invoked would also be scaled down by faas-idler.
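
For example, one way to express that at the query level (purely illustrative, not faas-idler's actual query; it assumes the gateway exports a per-function replica gauge such as gateway_service_count{function_name="..."}):

package main

import "fmt"

// Illustrative only: the "or (... * 0)" clause adds a zero-valued sample for
// every function_name that has a replica gauge but no invocation series yet,
// so never-invoked functions show up with a rate of 0.
const invocationRateQuery = `sum by (function_name) (rate(gateway_function_invocation_total[1h]))
  or
(gateway_service_count * 0)`

func main() {
    fmt.Println(invocationRateQuery)
}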

WDYT?

/cc @alexellis @martindekov

@alexellis
Member

I would agree with your analysis @kenfdev. Implementing this in the idler is possible, but it will be orthogonal, since it only covers an edge case: functions which have never been invoked but have more than 0 replicas.

At the moment it's probably the best option we have, and it should be a reasonably small fix here:

https://github.com/openfaas-incubator/faas-idler/blob/master/main.go#L114

If there is no data, or the invocation count is zero, we would still have to enter the checking code and then find out whether there are replicas. If there are replicas and we either have no stats or the stats are zero, then we scale down.
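
Roughly, the check might look something like this (just a sketch, not the current faas-idler code; the type and field names are placeholders):

package main

import "fmt"

// functionStatus is an assumed shape combining what the gateway reports
// (replicas) with what the Prometheus query returned for the idle window.
type functionStatus struct {
    Name            string
    Replicas        uint64  // current replica count reported by the provider
    HasMetrics      bool    // false when Prometheus returned no series at all
    InvocationTotal float64 // invocations observed over the idle window
}

// shouldScaleDown returns true when a function still has replicas but shows
// no activity: either its metrics are missing entirely (never invoked) or the
// invocation count over the window is zero.
func shouldScaleDown(fn functionStatus) bool {
    if fn.Replicas == 0 {
        return false // already scaled to zero, nothing to do
    }
    return !fn.HasMetrics || fn.InvocationTotal == 0
}

func main() {
    fresh := functionStatus{Name: "echo", Replicas: 1, HasMetrics: false}
    fmt.Println(shouldScaleDown(fresh)) // true: deployed but never invoked
}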

Alex

@cedricvidal
Author

@kenfdev You're welcome!
@alexellis I agree: orthogonal but straightforward; it solves the problem now and is easy to change later if a better solution is found.

Hmm, that being said, is it possible to take the healthiness of the function into account? If a function has no metrics, scale it down only if it is healthy? Otherwise this could hide startup problems until the function is used for the first time, which goes against the principle of detecting problems as early as possible.
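
Something along these lines, perhaps (a sketch only, assuming the provider can report both desired and ready replica counts for a function; none of these names come from actual code):

package main

import "fmt"

// replicaStatus is an assumed shape: desired vs. ready replica counts for a
// function, e.g. taken from the provider's function listing.
type replicaStatus struct {
    Desired   uint64 // replicas requested for the function
    Available uint64 // replicas that are ready / passing health checks
}

// safeToIdle only allows scaling a never-invoked function down to zero when
// every desired replica is ready, so a crash-looping or unschedulable
// function stays visible instead of being hidden at zero replicas.
func safeToIdle(rs replicaStatus, neverInvoked bool) bool {
    healthy := rs.Desired > 0 && rs.Available == rs.Desired
    return neverInvoked && healthy
}

func main() {
    broken := replicaStatus{Desired: 1, Available: 0} // e.g. startup or image pull failure
    fmt.Println(safeToIdle(broken, true))             // false: keep it running so the failure is noticed
}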

@alexellis
Member

FYI @rgee0 started looking at this.

@hotjunfeng

hotjunfeng commented Jun 17, 2019

Is there any update for this issue? Thanks. @alexellis

@alexellis
Member

@hotjunfeng there's a new version of faas-idler which you can get from @rgee0.

@hotjunfeng

@alexellis I found that @rgee0's new version has been merged into faas-idler. I have run the latest version, but it still does not work; in other words, my function does not scale to zero when it receives no requests for some time.

@alexellis
Member

Which specific version did you use? Richard tests all his changes thoroughly, so I would be surprised if that were the case.

@hotjunfeng

@alexellis The specific version is:

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
CLI:
 commit:  25cada08609e00bed526790a6bdd19e49ca9aa63
 version: 0.8.14
  • Docker version docker version (e.g. Docker 17.0.05 ):
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version (e.g. Linux, Windows, MacOS):
    Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-150-generic x86_64)

  • Code example or link to GitHub repo or gist to reproduce problem:
    hello-python

@alexellis
Member

@rgee0 PTAL

@kevin-lindsay-1

Is this still an issue? It's been open for quite some time, and in my experience functions are idled all the time in a manner I would consider mostly intuitive.

That, and scale-to-zero is an OpenFaaS Pro feature now, so I don't think the idler really lives in this specific project anymore.
