API to list all nodes in all projects #1054
Comments
So would this just be a variant on list-all-nodes where we also get a field telling us what project they're in?

Have you tried parallelizing the individual list_project_nodes calls? That should help some, maybe enough for a stopgap while we think about the longer-term picture.

I worry about cases like this, where we have an obvious way to do something, but we're talking about adding a shortcut because composing existing operations is too slow. Minimalism was a big goal, but if we're not able to express the things we need in terms of the minimal operations, I'd rather step back for a moment and ask whether there's some structural change we can make to avoid this sort of scenario. I'd rather not get into a situation where we have calls for x, y, and z, and then also shortcuts for x-then-y, x-then-z, y-then-z, x-then-y-then-z... We should instead see if we can't find a way to provide primitives that actually compose nicely. |
Pretty much.
Good point. I tried to parallelize my requests using Python's multiprocessing module, but the results were the same (it took 11-12 seconds to return the nodes in 8 projects). Maybe I am doing it wrong. Relevant code snippet:

import requests
import os
import ast
from pprint import pprint
from multiprocessing import Pool

user = os.environ.get('HIL_USERNAME')
password = os.environ.get('HIL_PASSWORD')
url = os.environ.get('HIL_ENDPOINT') + '/v0/'

# fetch the list of projects up front
r = requests.get(url + 'projects', auth=(user, password))
projects = ast.literal_eval(r.content)

def get_nodes(project):
    """Fetch the nodes in one project; returns a {project: nodes} dict."""
    api = 'project/' + project + '/nodes'
    nodes = requests.get(url + api, auth=(user, password))
    key = {}
    key[project] = nodes.content
    return key

p = Pool(processes=5)
pprint(p.map(get_nodes, projects))

I'll get back to it later I guess, and for now just use the slower sequential version.
Good point. Let's see if I can do it with parallel calls, otherwise we'll think about the bigger picture first before adding yet another (similar) API. |
How many threads is the server using? The example Apache config in INSTALL.rst specifies two, which is going to limit concurrency. |
I specified four threads there:

WSGIDaemonProcess hil user=hil group=hil threads=4
|
Another thought: the performance problem we hit with CI way back could be rearing its head when we're legitimately making a ton of api calls; password auth is going to be slow. If the parallelization doesn't help enough it might be interesting to see if that's a bottleneck. |
Ok -- see if cranking that number up makes it any better.
|
I increased the number there to 16, and also upped the number of vCPUs allocated to the HIL server to 16, and there was no change. |
A couple more ideas:
|
Tried it and that didn't make a difference. @apoorvemohan suggested that we try increasing the number of processes rather than threads, and that sped things up. Concurrent requests were served in 5 seconds as opposed to 25 seconds earlier. |
The Python interpreter doesn't actually support multiple threads executing Python code at once; there's a single mutex that covers the entire interpreter state (the "Global Interpreter Lock", or GIL). Threads can still help if the workload is IO-bound, since blocking IO calls will typically release the lock, allowing other threads to run while one is blocked on IO. But if it's a CPU-bound workload, extra threads will do nothing -- if you want Python CPU parallelism, you need processes. Just having the server configured with more processes sounds perfectly reasonable.

This throws a little more evidence towards the idea that the bottleneck is the auth backend -- that's all CPU-bound, whereas the queries themselves should just be hitting the database. I'd still like to know if swapping in the null auth backend fixes it, though obviously you need to reproduce the problem somewhere other than prod first. If so, maybe we should be looking into more performant alternatives. |
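To make the threads-versus-processes distinction concrete, here is a small self-contained sketch (illustrative only, not code from this thread; the repeated hashing is a stand-in for the password-auth work):

# Illustrative sketch: a CPU-bound task (repeated hashing, standing in
# for password auth) gains nothing from a thread pool because of the
# GIL, but scales across cores with a process pool.
import hashlib
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, threads

def stretch(password):
    """CPU-bound: hash the input many times over, like a KDF would."""
    digest = password
    for _ in range(200000):
        digest = hashlib.sha256(digest).digest()
    return digest

if __name__ == '__main__':
    passwords = [b'hunter2'] * 8
    for label, pool in (('threads', ThreadPool(8)), ('processes', Pool(8))):
        start = time.time()
        pool.map(stretch, passwords)
        pool.close()
        pool.join()
        print('%s: %.2fs' % (label, time.time() - start))
    # Expected: the thread pool takes about as long as doing the work
    # sequentially, while the process pool divides it across cores.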
Thanks for the explanation, Ian!!
Yep, I'll try that. |
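For reference, if memory serves, HIL selects its auth backend via the [extensions] section of hil.cfg; swapping in the null backend would look something along these lines, with the exact module names being an assumption to verify against INSTALL.rst:

# hil.cfg (sketch; module names are assumptions -- check INSTALL.rst)
[extensions]
# normally: hil.ext.auth.database =
# for a no-auth benchmark run only:
hil.ext.auth.null =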
So now that I know how to do multiple requests, I tried adding this as a CLI feature to get all nodes in all projects:

def _get_nodes(project):
    """Helper function to use with p.map."""
    key = {}
    nodes = client.project.nodes_in(project)
    key[project] = nodes
    return key

@project_node.command(name='listall')
@click.option('--json', 'jsonout', is_flag=True)
def project_node_list(jsonout):
    """List all nodes in every project."""
    projects = client.project.list()
    p = Pool(processes=len(projects))
    pprint(p.map(_get_nodes, projects))

but it fails with connection errors.
If I sleep for like 5 seconds after I get the list of projects, it works fine. Any insights on why this is happening? The other code in this thread which just uses requests directly works fine. |
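One guess (mine, not something established in the thread): multiprocessing.Pool forks, and the forked workers can inherit the parent's open keep-alive connection from the client library, so several processes end up sharing one socket; sleeping long enough for that connection to close would hide the problem, which matches the workaround. Since the work is IO-bound anyway, a thread pool avoids forking entirely; a minimal sketch reusing the client API from the snippet above:

# Sketch: the same listall logic with threads instead of forked
# processes. Requests are IO-bound (the GIL is released while blocked),
# and nothing is forked, so no connection state is shared by accident.
# Assumes the `client` object from the CLI snippet above is in scope.
from multiprocessing.dummy import Pool  # thread-backed, same Pool API
from pprint import pprint

def _get_nodes(project):
    return {project: client.project.nodes_in(project)}

def list_all_nodes():
    projects = client.project.list()
    pool = Pool(processes=len(projects))
    try:
        pprint(pool.map(_get_nodes, projects))
    finally:
        pool.close()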
Unrelated to your immediate question, but it got me looking more closely at the code so I'll bring it up:

Don't use ast.literal_eval to parse json; use json.loads. The Python syntax isn't a strict superset of json, so certain inputs will fail that shouldn't, chiefly those involving booleans or null, which don't share the same representation. There are also inputs that will parse when they should fail.

Also, every time I see ast.literal_eval I almost have a heart attack until I realize that literal_eval isn't actually unsafe like eval is -- but you really have to know the API to be able to look at that and be sure it's not introducing an RCE; json.loads is more obviously tame, and we use it everywhere so folks know what it does.

Also, a more minor point: instead of pprint, I'd suggest using something like json.dumps(..., indent='  ', sort_keys=True); it's a little gross that we're showing python literal syntax to the end user.

Sorry for the lecture. To address your actual question, I'm not sure why there would be a difference in behavior between the client lib and the manual http code above. Is there anything interesting in the server logs about why it's refusing connections? Do we have a max concurrent connections limit set somewhere that you're exceeding? |
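A quick illustration of the mismatch literal_eval trips on (an illustrative example, not code from the thread):

# json uses true/false/null; Python literals use True/False/None.
import ast
import json

doc = '{"free": true, "owner": null}'

print(json.loads(doc))  # {'free': True, 'owner': None}

try:
    ast.literal_eval(doc)
except ValueError as e:
    print('literal_eval choked: %s' % e)  # true/null aren't Python literals

# For output, json.dumps shows the user real json instead of a Python repr:
print(json.dumps({'proj': ['node-1', 'node-2']}, indent='  ', sort_keys=True))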
Sorry for the lecture
I always appreciate your insights! I have learned so much from you.
|
I think we should have an admin-only API to list all nodes in every project. This is something that we need to know.
Currently we can achieve this by running list_project_nodes on all projects, but it's quite slow (depending on the number of projects we have).
@zenhack Wanted to get your thoughts before I implement this.
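For concreteness, a hypothetical shape for the data such a call could return, following the "variant on list-all-nodes with a project field" idea floated in the first reply (nothing here is settled API; names and fields are illustrative only):

# Hypothetical response for an admin-only listall: one entry per node,
# with the owning project (or None for free nodes). Illustrative only.
all_nodes = [
    {'node': 'node-1', 'project': 'proj-a'},
    {'node': 'node-2', 'project': 'proj-a'},
    {'node': 'node-3', 'project': None},  # free node, not allocated
]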