Issue 516 speed up search api #530

Open
finist wants to merge 4 commits into base: master

@finist finist commented Aug 28, 2017

Speed up the search API by more than 3x

Initially, a search API request with limit=1000 took 3200ms on my local machine.

  • First, I fixed the N+1 queries on the 'trait/yield' links. This reduced the request time from 3200ms to 2700ms at limit=1000.

  • Second, I replaced the rabl JSON template with jbuilder, which reduced the request time from 2700ms to 2200ms.

  • Third, I changed how the 'trait/yield' links are built, from Rails routes to plain strings. It looks a little dirty, but it reduced the request time from 2200ms to 1900ms.

  • Finally, I switched the search API's serializer from yajl to oj, which reduced the request time from 1900ms to 1000ms.

Now, on my local machine, a search API request with limit=1000 takes 1000ms.
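
For readers unfamiliar with the N+1 pattern mentioned in the first step, here is a minimal plain-Ruby sketch of the idea. It is not the code from this PR: `FakeStore`, its methods, and the row layout are hypothetical stand-ins for a database, used only to count how many lookups each approach issues.

```ruby
# Hypothetical in-memory stand-in for a database table; it counts "queries".
class FakeStore
  attr_reader :query_count

  def initialize(traits)
    @traits = traits # { id => { id: ..., name: ... } }
    @query_count = 0
  end

  # One "query" per id, as happens when every row loads its association lazily.
  def find_trait(id)
    @query_count += 1
    @traits[id]
  end

  # One "query" for all ids, analogous to eager loading.
  def find_traits(ids)
    @query_count += 1
    @traits.values_at(*ids).compact.to_h { |trait| [trait[:id], trait] }
  end
end

# N+1 pattern: one lookup per row.
def names_one_by_one(store, rows)
  rows.map { |row| store.find_trait(row[:trait_id])[:name] }
end

# Batched pattern: a single lookup for all rows, then an in-memory join.
def names_batched(store, rows)
  by_id = store.find_traits(rows.map { |row| row[:trait_id] }.uniq)
  rows.map { |row| by_id[row[:trait_id]][:name] }
end

traits = (1..1000).to_h { |i| [i, { id: i, name: "trait-#{i}" }] }
rows = (1..1000).map { |i| { trait_id: i } }

slow = FakeStore.new(traits)
fast = FakeStore.new(traits)
names_one_by_one(slow, rows)
names_batched(fast, rows)
puts "one by one: #{slow.query_count} queries, batched: #{fast.query_count} query"
```

Both methods return the same names; only the number of round trips differs, which is where the saving in the first step would come from.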

@dlebauer
Member

Before merging please check the performance with 100,000 and 1,000,000 records.

@dlebauer dlebauer requested a review from gsrohde August 30, 2017 03:26
@gsrohde
Contributor

gsrohde commented Aug 31, 2017

I'll try to get to looking at this more closely. But I do notice one thing so far, which may or may not be an issue: This change would eliminate support for results in XML format. If we do make this change, at the very least I would have to update the documentation to say that only JSON is supported.

@dlebauer
Member

@finist is there a reason to drop support for XML?

@gsrohde
Contributor

gsrohde commented Aug 31, 2017

I think this is a matter of jbuilder not supporting XML as rabl does (see step 2).

By the way, even before this PR, I had been considering exploring what speed-ups could be achieved by using PostgreSQL's built-in support for outputting query results in JSON and XML. I still think it's worth doing, I just didn't get to it.
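
As a rough illustration of what PostgreSQL-side JSON output could look like, the sketch below only builds a query string using PostgreSQL's `row_to_json` and `json_agg` functions. It is an assumption about the general approach, not something from this PR; the helper name `json_export_sql` is hypothetical, and the query is not run here.

```ruby
# Builds (but does not run) a query asking PostgreSQL to return rows as one JSON array.
# The helper name and the identifier check are illustrative assumptions.
def json_export_sql(relation, limit)
  # Accept only simple lowercase identifiers so the name can be interpolated safely.
  unless relation.match?(/\A[a-z_][a-z0-9_]*\z/)
    raise ArgumentError, "unexpected relation name: #{relation.inspect}"
  end

  <<~SQL
    SELECT json_agg(row_to_json(t))
    FROM (SELECT * FROM #{relation} LIMIT #{Integer(limit)}) AS t
  SQL
end

puts json_export_sql("trait_and_yield_view", 1000)
```

With a query of this shape the database would produce the JSON text itself, so the application would not need to serialize each row.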

@dlebauer
Member

@gsrohde using postgres json/xml export sounds like a major overhaul ... is that correct?

@gsrohde
Contributor

gsrohde commented Aug 31, 2017

Yes, probably. If we want to use JBuilder but continue to support XML, I think we could, for now, simply make the app continue to use Rabl for XML requests and use JBuilder for JSON. Then we'd at least get significant speed-up for what I take it is the more popular format (JSON).

@finist
Author

finist commented Sep 1, 2017

My mistake, I forgot about XML support. I restored rabl for the index view; it is now used only for XML requests, but it is very slow: around 12 seconds at limit=1000 on my local machine. JSON requests use jbuilder, which is fast: about 1 second at limit=1000.

@finist
Author

finist commented Sep 1, 2017

I'm still working on the benchmark for a limit of 1 million. I took the 'TERRA Ref' database, which has 1.5 million trait_and_yield_view rows, but it runs extremely slowly. I found a problem with the SQL query against the view and am trying to fix it.

@finist
Author

finist commented Sep 4, 2017

Some benchmarks for a database with 1.5 million trait_and_yield_view rows, on my local machine:

current master:
1000 - 3876.1ms
10 000 - 42483.5ms
100 000 - 795403.8ms
1 000 000 - around 1 hour

speed up search api:
1000 - 950ms
10 000 - 8493.5ms
100 000 - 89241.5ms
1 000 000 - 884313.1ms
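
To make the comparison easier to read, the speedup factors can be computed from the timings above. This is a small sketch: the constant and method names are illustrative, and the 1 000 000 row for master is left out because only "around 1 hour" was reported for it.

```ruby
# Timings in milliseconds, copied from the benchmark above (keys are the limit).
MASTER_MS = { 1_000 => 3876.1, 10_000 => 42_483.5, 100_000 => 795_403.8 }.freeze
BRANCH_MS = { 1_000 => 950.0, 10_000 => 8493.5, 100_000 => 89_241.5 }.freeze

# Speedup factor: how many times faster the branch is than master.
def speedup(before_ms, after_ms)
  (before_ms / after_ms).round(2)
end

MASTER_MS.each do |limit, master_ms|
  puts "limit=#{limit}: #{speedup(master_ms, BRANCH_MS.fetch(limit))}x faster"
end
```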

@finist
Author

finist commented Sep 4, 2017

I need to discuss something. As I said, I took the 'TERRA Ref' database with 1.5 million trait rows, and it runs extremely slowly. The main problem is here: https://github.com/PecanProject/bety/blob/master/lib/data_access.rb#L55. The query has to process every row in the view. To prevent that, we need an index on the access_level field, but we can't put an index on a view, so we need to change the view into a materialized view. With that change requests become very fast, but a materialized view has one problem: it does not refresh automatically. There are two solutions: 1) add a callback on trait and yield that refreshes the view after save; 2) set up a cron task that refreshes the view every minute. I need to know how data is usually added to projects. If it arrives in small chunks (say, 1 row added every minute), the first solution is better. If a lot of data is added at the same time, the second solution is better.
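
A rough sketch of the statements involved, assuming PostgreSQL. The materialized view name, the index name, and the helper methods are hypothetical and not taken from this PR. `RecordingConnection` is a stand-in that only records SQL, so the sketch runs without a database.

```ruby
MAT_VIEW = "trait_and_yield_mat_view" # hypothetical name

# One-time setup: materialize the existing view, then index access_level.
SETUP_SQL = [
  "CREATE MATERIALIZED VIEW #{MAT_VIEW} AS SELECT * FROM trait_and_yield_view",
  "CREATE INDEX index_#{MAT_VIEW}_on_access_level ON #{MAT_VIEW} (access_level)"
].freeze

# Re-runs the view's query and replaces the stored rows.
REFRESH_SQL = "REFRESH MATERIALIZED VIEW #{MAT_VIEW}"

# Works with any object that responds to #execute(sql), e.g. a database connection.
def setup_materialized_view(connection)
  SETUP_SQL.each { |sql| connection.execute(sql) }
end

def refresh_materialized_view(connection)
  connection.execute(REFRESH_SQL)
end

# Stand-in connection that records statements instead of running them.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

conn = RecordingConnection.new
setup_materialized_view(conn)
refresh_materialized_view(conn)
puts conn.statements
```

Either solution would end up issuing the same refresh statement: an after-save callback calls it once per change, while a cron task calls it on a schedule. A plain refresh blocks reads of the materialized view while it runs. PostgreSQL also offers `REFRESH MATERIALIZED VIEW CONCURRENTLY`, which needs a unique index on the materialized view.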

Benchmark:

speed up search api with materialized view:
1000 - 419.3ms
10 000 - 3733.9ms
100 000 - 42946.7ms
1 000 000 - 461925.2ms


3 participants