
Proposal: scale new functions which have never been invoked down to zero #979

cedricvidal opened this issue Nov 26, 2018 · 13 comments

@cedricvidal

Expected Behaviour

When deploying a function for the first time with com.openfaas.scale.zero: true and faas-idler's -dry-run set to false, I expect the function to be idled, i.e. scaled down to zero after deployment, but it isn't.

Current Behaviour

Currently, the function's replica count is set on deployment to com.openfaas.scale.min, which defaults to 1.

Even though com.openfaas.scale.zero: true is set, faas-idler is not in dry-run mode, and the function is never used, the replica count stays at that minimum and the function is never idled.

faas-idler only kicks in when the invocation metric is present in Prometheus; otherwise the function is ignored. When a function is deployed for the first time, its metrics are not yet in Prometheus, because they are only recorded once the function has been used for the first time.

Possible Solutions

  1. consider the invocation count to be 0 for functions labelled for scale-to-zero but without any Prometheus metrics
  2. using an init container in the function, register a 0 metric in Prometheus if it isn't set yet
  3. in the gateway, register a 0 metric in Prometheus for deployed functions which have no metric yet

Solution 3 looks best to me, since it has the fewest moving parts.

Note from @alexellis :

The 0 metric would be recorded in the deployment handler, or alternatively in a similar way to gateway_service_count, which is polled periodically.
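
For illustration, a rough sketch of what solution 3 could look like in the gateway's deployment path (not actual gateway code; the variable names and the choice of the code label value are assumptions):

package main

import "github.com/prometheus/client_golang/prometheus"

// Sketch only: a counter vec mirroring gateway_function_invocation_total
// with its ("code", "function_name") labels.
var gatewayFunctionInvocation = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "gateway_function_invocation_total",
        Help: "Total function invocations.",
    },
    []string{"code", "function_name"},
)

// seedInvocationMetric creates a zero-valued series for a freshly deployed
// function, so that anything querying this metric sees the function at all.
func seedInvocationMetric(functionName string) {
    // With creates the labelled child at 0 without incrementing it; the
    // series becomes visible on the next Prometheus scrape.
    gatewayFunctionInvocation.With(prometheus.Labels{
        "code":          "200", // assumption: which status code to seed is an open question
        "function_name": functionName,
    })
}

func main() {
    prometheus.MustRegister(gatewayFunctionInvocation)
    seedInvocationMetric("echo") // would be called from the deployment handler
}

One open question with this approach is which value the code label of a never-invoked function should carry.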

Another consideration: however this is implemented, I think an unhealthy function should not be idled. If a function is unhealthy or fails to start for some reason and is then idled by faas-idler, one might never find out that it is broken until it is used for the first time.

Context

For context, I would like to use OpenFaaS to deploy GPU deep learning models to multiple demo environments. Since they are very heavy and costly to run, I would like them to be started only when actually used.

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
  ___                   _____           ____
 / _ \ _ __   ___ _ __ |  ___|_ _  __ _/ ___|
| | | | '_ \ / _ \ '_ \| |_ / _` |/ _` \___ \
| |_| | |_) |  __/ | | |  _| (_| | (_| |___) |
 \___/| .__/ \___|_| |_|_|  \__,_|\__,_|____/
      |_|

CLI:
 commit:  b24c5763d9b61e0c04018a722f8f2f765498f18a
 version: 0.7.8

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.9.10
 sha:     b4c12f824bcea6b3038f5c878001f72e6a57de1e
 commit:  Make use of cache in scaling


Provider
 name:          faas-netes
 orchestration: kubernetes
 version:       0.6.3 
 sha:           62766ad0c4b2ce713df26172faa51f56b1a955ce
  • Docker version docker version (e.g. Docker 17.0.05 ):
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:31 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     true
 Kubernetes:
  Version:          v1.10.3
  StackAPI:         v1beta2
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes

  • Operating System and version (e.g. Linux, Windows, MacOS):
    OSX

  • Link to your project or a code example to reproduce issue:
    Verbatim java8 sample

@kenfdev
Member

kenfdev commented Nov 26, 2018

Thanks for your detailed proposal @cedricvidal !

I like solution 3 as well (would like to hear others' thoughts, too). I'm surprised a replica count of 0 isn't collected when a function is deployed for the first time; I thought all services were reported with their replica count.

@alexellis
Member

I'm hoping for some more direction from @kenfdev, then we'll try to find someone to help with this work.

@kenfdev
Member

kenfdev commented Nov 27, 2018

Okay, having taken a more detailed look, I realize that I hadn't understood the situation well and that solution 3 probably won't be an easy change.

The metrics for replicas are collected, so that isn't the problem; the issue is about the invocation count. And indeed, that isn't collected until the function has been invoked. This is because the gateway_function_invocation_total metric only records facts about invocation results: the function name and the HTTP response code it returned. Given that, I don't think registering 0 in gateway_function_invocation_total for a function that hasn't been invoked yet makes much sense.

e.g.

gateway_function_invocation_total{code="???",function_name="echo"} 0

IMHO, this is faas-idler's responsibility. It already collects all the functions by requesting the gateway here. But unfortunately, it gets no results because of the PromQL query targeting gateway_function_invocation_total here.

I haven't looked deeply into faas-idler yet, but maybe we could tweak that part of the code to also target functions without any gateway_function_invocation_total series. That way, functions which have never been invoked would also be scaled down by faas-idler.
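
For example, one way to express that at the query level (purely illustrative, not faas-idler's actual query; it assumes the gateway exports a per-function replica gauge such as gateway_service_count{function_name="..."}):

package main

import "fmt"

// Illustrative only: the "or (... * 0)" clause adds a zero-valued sample for
// every function_name that has a replica gauge but no invocation series yet,
// so never-invoked functions show up with a rate of 0.
const invocationRateQuery = `sum by (function_name) (rate(gateway_function_invocation_total[1h]))
  or
(gateway_service_count * 0)`

func main() {
    fmt.Println(invocationRateQuery)
}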

WDYT?

/cc @alexellis @martindekov

@alexellis
Member

I would agree with your analysis @kenfdev. Implementing this in the idler is possible, but it will be orthogonal, since it only covers an edge case: functions which have never been invoked but have more than 0 replicas.

At the moment it's probably the best option we have, and it should be a reasonably small fix here:

https://github.com/openfaas-incubator/faas-idler/blob/master/main.go#L114

If there is no data, or the invocation count is zero, we would still have to enter the checking code and then find out whether there are replicas. If there are replicas and we either have no stats or the stats are zero, then we scale down.
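
Roughly, the check might look something like this (just a sketch, not the current faas-idler code; the type and field names are placeholders):

package main

import "fmt"

// functionStatus is an assumed shape combining what the gateway reports
// (replicas) with what the Prometheus query returned for the idle window.
type functionStatus struct {
    Name            string
    Replicas        uint64  // current replica count reported by the provider
    HasMetrics      bool    // false when Prometheus returned no series at all
    InvocationTotal float64 // invocations observed over the idle window
}

// shouldScaleDown returns true when a function still has replicas but shows
// no activity: either its metrics are missing entirely (never invoked) or the
// invocation count over the window is zero.
func shouldScaleDown(fn functionStatus) bool {
    if fn.Replicas == 0 {
        return false // already scaled to zero, nothing to do
    }
    return !fn.HasMetrics || fn.InvocationTotal == 0
}

func main() {
    fresh := functionStatus{Name: "echo", Replicas: 1, HasMetrics: false}
    fmt.Println(shouldScaleDown(fresh)) // true: deployed but never invoked
}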

Alex

@cedricvidal
Author

@kenfdev You're welcome!
@alexellis I agree: orthogonal but straightforward; it solves the problem now and is easy to change later if a better solution is found.

Hmm, that being said, is it possible to take the healthiness of the function into account? If a function has no metrics, scale it down only if it is healthy? Otherwise this could hide startup problems until the function is used for the first time, which goes against the principle of detecting problems as early as possible.
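
Something along these lines, perhaps (a sketch only, assuming the provider can report both desired and ready replica counts for a function; none of these names come from actual code):

package main

import "fmt"

// replicaStatus is an assumed shape: desired vs. ready replica counts for a
// function, e.g. taken from the provider's function listing.
type replicaStatus struct {
    Desired   uint64 // replicas requested for the function
    Available uint64 // replicas that are ready / passing health checks
}

// safeToIdle only allows scaling a never-invoked function down to zero when
// every desired replica is ready, so a crash-looping or unschedulable
// function stays visible instead of being hidden at zero replicas.
func safeToIdle(rs replicaStatus, neverInvoked bool) bool {
    healthy := rs.Desired > 0 && rs.Available == rs.Desired
    return neverInvoked && healthy
}

func main() {
    broken := replicaStatus{Desired: 1, Available: 0} // e.g. startup or image pull failure
    fmt.Println(safeToIdle(broken, true))             // false: keep it running so the failure is noticed
}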

@alexellis
Member

FYI @rgee0 started looking at this.

@hotjunfeng

hotjunfeng commented Jun 17, 2019

Is there any update for this issue? Thanks. @alexellis

@alexellis
Member

@hotjunfeng there's a new version of faas-idler which you can get from @rgee0.

@hotjunfeng

@alexellis I found that @rgee0's new version has been merged into faas-idler. I have run the latest version, but it still does not work; in other words, my function does not scale to zero when it receives no requests for some time.

@alexellis
Member

Which specific version did you use? Richard tests all his changes thoroughly, so I would be surprised if that were the case.

@hotjunfeng

@alexellis The specific version is:

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
CLI:
 commit:  25cada08609e00bed526790a6bdd19e49ca9aa63
 version: 0.8.14
  • Docker version docker version (e.g. Docker 17.0.05 ):
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.7", GitCommit:"4683545293d792934a7a7e12f2cc47d20b2dd01b", GitTreeState:"clean", BuildDate:"2019-06-06T01:39:30Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version (e.g. Linux, Windows, MacOS):
    Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-150-generic x86_64)

  • Code example or link to GitHub repo or gist to reproduce problem:
    hello-python

@alexellis
Member

@rgee0 PTAL

@kevin-lindsay-1

Is this still an issue? It's been open for quite some time, and in my experience functions are idled all the time in a manner I would consider mostly intuitive.

That, and scale-to-zero is an OpenFaaS Pro feature now, so I don't think the idler really lives in this specific project anymore.
