Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Responding to webhook causes emqx to drop connections #131

Open
Schweig opened this issue May 31, 2019 · 9 comments
Open

Responding to webhook causes emqx to drop connections #131

Schweig opened this issue May 31, 2019 · 9 comments
Assignees

Comments

@Schweig
Copy link

Schweig commented May 31, 2019

I have an emqx broker set up and have the webhook plugin pointed at another server running node .js with express. I have about 630 devices connected to the emqx broker with a throughput of ~60 messages a second ( each device sending a message every 10 seconds). When I have have my node server send back a status with res.sendStatus(200) but if i send the I see that my connections drop from 630 to around 550. Would this be a timeout issue where my server is taking too long to respond? I'm unsure why the webhook plugin is making my emqx broker unstable

@gilbertwong96 gilbertwong96 self-assigned this Jun 1, 2019
@gilbertwong96
Copy link
Contributor

In your case, there are 80 connections lost, it is very weird, could you please give me more details? For example, the emqx version and the log file.

@Schweig
Copy link
Author

Schweig commented Jun 3, 2019

EMQX version is v3.0-rc.5. Which log file do you need?

@lcofre
Copy link

lcofre commented Jun 25, 2019

I am experiencing a similar problem with the webhook plugin running.

Currently I cannot trigger the behaviour at will, but the client is connected and at some point the emqx server disconnects the client. The webhook body of the device disconnecting is:

{
    "action": "client_disconnected",
    "client_id": "the_client_id",
    "username": "undefined",
    "reason": "keepalive_timeout"
}

and /opt/emqx/log/emqx.log.1 shows

2019-06-25 21:50:03.150 [error] 10.244.2.16:60802 ** State machine <0.14975.39> terminating
** Last event = {timeout,15000}
** When server state  = {idle,
                            {state,esockd_transport,#Port<0.628528>,
                                {{10,244,2,16},60802},
                                undefined,running,100,
                                {pstate,external,
                                    #Fun<emqx_connection.0.73284863>,
                                    {{10,244,2,16},60802},
                                    nossl,4,<<"MQTT">>,<<>>,false,
                                    <0.14975.39>,undefined,undefined,
                                    undefined,undefined,false,#{},1048576,
                                    undefined,undefined,undefined,false,true,
                                    true,false,ignore,
                                    #{msg => 0,pkt => 0},
                                    #{msg => 0,pkt => 0},
                                    false,undefined,false,
                                    #{from_client => 0,to_client => 0},
                                    emqx_connection,#{},undefined},
                                {none,
                                    #{max_packet_size => 1048576,
                                      version => 4}},
                                {emqx_gc,
                                    #{cnt => {1000,1000},
                                      oct => {1048576,1048576}}},
                                undefined,true,undefined,undefined,undefined,
                                undefined,15000}}
** Reason for termination = exit:idle_timeout
** Callback mode = [state_functions,state_enter]
** Stacktrace =
**  [{gen_statem,loop_event_result,9,[{file,"gen_statem.erl"},{line,1158}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]
2019-06-25 21:50:03.154 [error] 10.244.2.16:60802 crasher:
    initial call: emqx_connection:init/1
    pid: <0.14975.39>
    registered_name: []
    exception exit: idle_timeout
      in function  gen_statem:loop_event_result/9 (gen_statem.erl, line 1158)
    ancestors: [<0.2116.0>,<0.2115.0>,esockd_sup,<0.1968.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2116.0>]
    dictionary: [{force_shutdown_policy,
                      #{max_heap_size => 838860800,message_queue_len => 8000}},
                  {'$logger_metadata$',#{peername => "10.244.2.16:60802"}}]
    trap_exit: false
    status: running
    heap_size: 10958
    stack_size: 27
    reductions: 169299
  neighbours:
<here new error>

The client is written in python and uses the paho-mqtt==1.4.0 package.

After that I see this webhook every 3 to 4 seconds of the device connecting

{
    "action": "client_connected",
    "client_id": "the_client_id",
    "username": "undefined",
    "keepalive": 60,
    "ipaddress": "10.244.2.16",
    "proto_ver": 4,
    "connected_at": 1561499917,
    "conn_ack": 0
}

and emqx server showing the same two errors on the log
emqx.log

I am using emqx/emqx:v3.1.2 on kubernetes, a few instances on a cluster created with the kubernetes apiserver.

Thanks very much in advance for your help and hard work.

@lukasportal
Copy link

We are also having this issue. It seems like this plugin is making the requests synchronously. Maybe the author (or someone that knows Erlang :)) could have this plugin to send the http request asynchronously instead, so it doens't block any processing.
We are experiencing clients being disconnected and/or any action like connecting/subscribing taking a good while to complete.
All of these problems go away when we disable the plugin.

@notjulian
Copy link

notjulian commented Jun 16, 2020

We are also having this issue. It seems like this plugin is making the requests synchronously. Maybe the author (or someone that knows Erlang :)) could have this plugin to send the http request asynchronously instead, so it doens't block any processing.
We are experiencing clients being disconnected and/or any action like connecting/subscribing taking a good while to complete.
All of these problems go away when we disable the plugin.

Hi @gilbertwong96
any news on this? we have clusters (docker 4.1.0) and a webhook rule
there is a delay sending/receiving messages (QOS2) with rule enabled

thanks

@lcofre
Copy link

lcofre commented Jun 16, 2020

It never really worked for us @gilbertwong96,

In the end, we disabled the webhook altogether and created a specific topic where the device itself would report to be connected, and set an LW&T to change that value to connected: false when disconnected.

Another option could be to listen to the connected and disconnected topics that EMQX provides. Although it's not a broker-agnostic implementation, the webhook was not either.

Thanks for any feedback you can give us on this one!

@HJianBo
Copy link
Member

HJianBo commented Jun 18, 2020

Hi guys, sorry for the late reply.

In the current implementation, The way it does in synchronous sending HTTP requests, But no matter how it won't have so many stability problems.

@lcofre the keepalive_timeout or idle_timeout should be caused by the client, that is, the client does not send a packet to Broker in a timely manner. It is not caused by webhook.

But, the asynchronously feature is a more preferable way, We plan to implement it in the next version.

Thanks for your feedback there.

@HJianBo HJianBo self-assigned this Jun 18, 2020
@HJianBo HJianBo added this to the 4.2-alpha.3 milestone Jun 18, 2020
@HJianBo HJianBo modified the milestones: 4.2-alpha.3, 4.2-beta.1 Aug 3, 2020
@HJianBo HJianBo removed this from the 4.2-beta.1 milestone Aug 19, 2020
@nguyenduybinh
Copy link

nguyenduybinh commented Dec 6, 2020

Hi guys, sorry for the late reply.

In the current implementation, The way it does in synchronous sending HTTP requests, But no matter how it won't have so many stability problems.

@lcofre the keepalive_timeout or idle_timeout should be caused by the client, that is, the client does not send a packet to Broker in a timely manner. It is not caused by webhook.

But, the asynchronously feature is a more preferable way, We plan to implement it in the next version.

Thanks for your feedback there.

@HJianBo
Hi HJianBo, if emqx using synchronous to sending HTTP request, the performance of HTTP auth plugin and webhook is so bad. Have you benchmarked the EMQX HTTP auth and webhook with high concurrent connections yet? When i used emqx with HTTP auth and webhook, the performance is so bad, too many timeout, High CPU usage..

@HJianBo
Copy link
Member

HJianBo commented Dec 7, 2020

Indeed. We may release it in version 4.3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants