
Re: httpchk failures

Benjamin Smith <lists@...> writes:

>
> Igor,
>
> Thanks for the response; I didn't see this email until just now as it
> didn't go through the mailing list and so wasn't filtered as expected.
>
> I spent my morning trying everything I could think of to get haproxy's
> agent-check to work consistently. The main symptom is that haproxy would
> mark hosts with the status of "DRAIN" and provide no clues as to why,
> even with log-health-checks on. After a *lot* of trial and error, I've
> found the following that seem to be bugs, on the latest 1.5.11 release,
> running on CentOS 6.
>
> 1) agent-check output words are sometimes handled inconsistently,
> ignored, or misunderstood if " " was used instead of "," as a separator.
>
> This is understood:
> echo "ready,78%\r\n"
>
> This line often causes a DRAIN state. A restart of haproxy was
> insufficient to clear the DRAIN state (see #3):
> echo "ready 78%\r\n"
>
> 2) Inconsistent logging of DRAIN status changes when health logging was
> on. (The server would turn blue in the stats page without any logging as
> to why.) Logging would sometimes say "Server $service/$name is UP
> (leaving forced drain)" even as the stats page continued to report the
> DRAIN state!
>
> 3) Even when the agent output was amended as above, hosts that had been
> set to the DRAIN state by issue #1 were not brought back to the ready/up
> state until "enable health $service/$host" and/or "enable agent
> $service/$host" was sent to the stats port.
>
> 4) Setting the server weight to 10 seems to help a significant amount.
> If, in fact, haproxy can't handle 35% of a weight of 1, it should throw
> an error on startup IMHO.
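
For anyone else hitting #3: those two commands can be scripted against
the stats socket with socat, assuming the socket is configured at admin
level. The socket path and backend/server names below are placeholders,
not from the original config:

    # in haproxy.cfg, global section:
    #   stats socket /var/run/haproxy.sock level admin
    echo "enable health www_backend/web1" | socat stdio /var/run/haproxy.sock
    echo "enable agent www_backend/web1" | socat stdio /var/run/haproxy.sock
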
>
> See also my comments interspersed below:
>
> Thanks,
>
> Benjamin Smith
>
> On Tuesday, April 14, 2015 10:50:31 AM you wrote:
> > On Tue, Apr 14, 2015 at 10:11 AM, Igor Cicimov <igorc@...> wrote:
> > > On Tue, Apr 14, 2015 at 5:00 AM, Benjamin Smith <lists@...> wrote:
> > >> We have 5 Apache servers behind haproxy and we're trying to use the
> > >> "httpchk" option along with some performance monitoring. For some
> > >> reason, haproxy keeps thinking that 3/5 apache servers are "down"
> > >> even though it's obvious that haproxy is both asking the questions
> > >> and the servers are answering.
> > >>
> > >> Is there a way to log httpchk failures? How can I ask haproxy why
> > >> it seems to think that several apache servers are down?
> > >>
> > >> Our config:
> > >> CentOS 6.x recently updated, 64 bit.
> > >>
> > >> Performing an agent-check manually seems to give good results.
> > >> The below result is immediate:
> > >> [root@xr1 ~]# telnet 10.1.1.12 9333
> > >> Trying 10.1.1.12...
> > >> Connected to 10.1.1.12.
> > >> Escape character is '^]'.
> > >> up 78%
> > >> Connection closed by foreign host.
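
(Side note: the same check can be done non-interactively; nc here is an
assumption, not part of the original thread, and behavior varies between
netcat variants:

    nc -w 1 10.1.1.12 9333 < /dev/null
    # prints the agent's line, e.g. "up 78%", then exits when the
    # server closes the connection
)
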
> > >>
> > >>
> > >> I can see that xinetd on the logic server got the response:
> > >> Apr 13 18:45:02 curie xinetd[21890]: EXIT: calcload333 status=0 pid=25693 duration=0(sec)
> > >> Apr 13 18:45:06 curie xinetd[21890]: START: calcload333 pid=26590 from=::ffff:10.1.1.1
> > >>
> > >>
> > >> I can see that apache is serving happy replies to the load balancer:
> > >> [root@curie ~]# tail -f /var/log/httpd/access_log | grep -i "10.1.1.1 "
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:15 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:17 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> 10.1.1.1 - - [13/Apr/2015:18:47:19 +0000] "OPTIONS / HTTP/1.0" 302 - "-" "-"
> > >> ^C
> > >
> > > I have a feeling you might have been a little bit confused here.
> > > Per my understanding, and your configuration:
> > >
> > > server server10 10.1.1.10:20333 maxconn 256 *check agent-check agent-port 9333 agent-inter 4000*
> > >
> > > the HAP is doing a health check on the agent you are using and not
> > > on the Apache, so the apache response in this case looks irrelevant
> > > to me. I don't know how you set up the agent since you haven't
> > > posted that part, but this is an excellent article by Malcolm
> > > Turnbull, the inventor of agent-check, that might help:
> > >
> > >
> > > http://blog.loadbalancer.org/open-source-windows-service-for-reporting-server-load-back-to-haproxy-load-balancer-feedback-agent/
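
(Side note: with both "check" and "agent-check" on the server line,
haproxy 1.5 runs two independent probes: the regular health check, here
the OPTIONS httpchk against the service port 20333, and the agent check
against agent-port 9333. A minimal sketch of the relevant backend, names
hypothetical:

    backend www_backend
        option httpchk OPTIONS / HTTP/1.0
        server web1 10.1.1.12:20333 maxconn 256 weight 10 check agent-check agent-port 9333 agent-inter 4000
)
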
>
> We used this exact blog entry as our starting point. In our case, the
> xinetd script compares load average, apache process count, cpu info and
> a little salt to come up with a number ranging from 0% to 500%.
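
The thread doesn't include that script, but a minimal sketch of an
xinetd-run agent responder in the same spirit, with an entirely made-up
weighting formula, might look like:

    #!/bin/bash
    # Hypothetical agent responder launched by xinetd; xinetd wires the
    # TCP socket to stdin/stdout, so printing one line is all it takes.
    cores=$(nproc)
    load1=$(awk '{print $1}' /proc/loadavg)
    httpd=$(pgrep -c httpd)
    # Example formula only: scale per-core load and process count into
    # a 1-500 "capacity" percentage.
    pct=$(awk -v l="$load1" -v c="$cores" -v h="$httpd" \
        'BEGIN { p = int(500 - (l / c) * 100 - h);
                 if (p < 1) p = 1; if (p > 500) p = 500; print p }')
    # Use "," as the separator; space-separated output triggered issue #1.
    printf "ready,%d%%\r\n" "$pct"
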
>
> > and press enter twice and check the output. Another option is using
> > curl:
> >
> > $ curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333/
>
> [root@xr1 ~]# curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333
> HTTP/1.1 302 Found
> Date: Tue, 14 Apr 2015 23:39:40 GMT
> Server: Apache/2.2.15 (CentOS)
> X-Powered-By: PHP/5.3.3
> Set-Cookie: PHPSESSID=3ph0dvg4quebl1b2e711d8i5p1; path=/; secure
> Cache-Control: public, must-revalidate, max-age=0
> X-Served-By: curie.-SNIP-
> Location: /mod.php/index.php
> Vary: Accept-Encoding
> Content-Length: 0
> Connection: close
> Content-Type: text/html; charset=UTF-8
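
On the 302 above: haproxy's httpchk treats 2xx and 3xx responses as
passing by default, so this reply on its own shouldn't mark the server
down. To make the expectation explicit, the status could be pinned in
the backend; the expect line is an assumption, not from the original
config:

    option httpchk OPTIONS / HTTP/1.0
    http-check expect status 302
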
>
> > and some variations of the above that I often use to check the
> > headers only:
> >
> > $ curl -s -S -I --http1.0 -X OPTIONS http://10.1.1.12:20333/
> > $ curl -s -S -D - --http1.0 -X OPTIONS http://10.1.1.12:20333/
> >
> > You can also try the health check with HTTP/1.1, which provides
> > keepalive, but you need to specify the Host header in that case.
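
(With curl that would be something along these lines; curl speaks
HTTP/1.1 by default, and the Host value is a placeholder:

    curl -s -S -I -X OPTIONS -H 'Host: www.example.com' http://10.1.1.12:20333/
)
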
> >
> > By the way, any errors in the haproxy logs? Maybe set the log mode
> > to debug?
>
> Originally there was very little useful data in the log files at all.
> Adding log-health-checks helped, but it's still frustratingly
> incomplete.
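
For reference, health-check logging is enabled per defaults/backend
section; a minimal sketch:

    defaults
        log global
        option log-health-checks
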
>
>

I ran into a similar scenario yesterday on 'HA-Proxy version 1.5.4
2014/09/02'. We had a situation where a radosgw on one of the backend
servers was not releasing memory, apache was occasionally logging 500
errors, and rados had stopped logging, so we wanted to stop new
connections going to the server and then restart radosgw.

Initially we issued the 'server disable' command to the haproxy socket
and saw in the health status that the server was in MAINT mode, but new
connections did not stop arriving at the server. We waited and watched,
and about 30 minutes later issued the 'status drain' command to the
socket, at which point connections immediately stopped arriving. We
restarted the radosgw service at that point.

We then issued the 'status ready' command to the socket and saw
connections start arriving again, but the server was flip-flopping
between the MAINT and DRAIN states in the health status output. Only
after I re-issued the 'server enable' command twice to the socket did
the server appear in the health status output as consistently green in
its row and showing UP status. However, I still saw the following in the
logs after issuing the last 'server enable' command.

From haproxy-status.log:
+++++
Server swift_cluster/css-host1-036 is UP/READY (leaving forced maintenance).
Server swift_nonssl_cluster/css-host1-036 is UP (leaving forced drain).
+++++

The host still appeared to be receiving new connections even before I
re-issued the 'server enable' command to the socket, while the health
status output was showing it flip-flopping between the MAINT and DRAIN
states. Everything seems ok/UP now and has been stable for the last 12
hours, with no apparent change in the health status output.
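
When a server seems stuck between states like this, the stats socket can
report what haproxy itself thinks, independent of the stats page colors.
A quick way to pull just the status column (socket path hypothetical):

    echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,18
    # CSV fields 1, 2 and 18: pxname, svname, status (UP/DOWN/DRAIN/MAINT)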

....hope this helps...

Lucky
