
A Performance Problem with Golang's HTTP Client Connection Pool

A while back I was building a microservice in Golang that talked to its upstream dependencies over HTTP. During testing I ran into a performance problem, and I'm writing it down here for future reference.

Problem Model

Server code:

package main

import (
	"io/ioutil"
	"net/http"
)

const (
	BACKEND = "http://www.baidu.com"
)

func Handle(w http.ResponseWriter, r *http.Request) {
	resp, err := http.Get(BACKEND)
	if err != nil {
		w.WriteHeader(500)
		w.Write([]byte(err.Error()))
		return
	}
	defer resp.Body.Close()
	buf, _ := ioutil.ReadAll(resp.Body)
	w.WriteHeader(200)
	w.Write(buf)
}

func main() {
	http.HandleFunc("/", Handle)
	http.ListenAndServe(":8080", nil)
}

Client code:

package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	url := "http://localhost:8080"
	dur := 1000 * time.Millisecond
	var wg sync.WaitGroup
	var count uint64
	for i := 0; i < 10; i++ {
		wg.Add(1) // must happen before the goroutine starts, not inside it
		go func(url string, i int) {
			defer wg.Done()
			for {
				time.Sleep(dur)
				resp, err := http.Get(url)
				if err != nil {
					continue
				}
				ioutil.ReadAll(resp.Body)
				resp.Body.Close()
				atomic.AddUint64(&count, 1)
				//log.Printf("routine %d read successfully\n", i)
			}
		}(url, i)
	}
	go func() {
		var last uint64
		for {
			last = atomic.LoadUint64(&count)
			time.Sleep(time.Second)
			log.Printf("current count delta: %d\n", atomic.LoadUint64(&count)-last)
		}
	}()
	wg.Wait()
}

What I observed on macOS:

➜  playground lsof -nP -iTCP@61.135.169.121:80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
___go_bui 40920 zhaoxingli 6u IPv4 0xe8bf53cced43833f 0t0 TCP 192.168.199.102:59014->61.135.169.121:80 (ESTABLISHED)
___go_bui 40920 zhaoxingli 10u IPv4 0xe8bf53cce7d15d3f 0t0 TCP 192.168.199.102:59005->61.135.169.121:80 (ESTABLISHED)
➜ playground lsof -nP -iTCP@61.135.169.121:80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
___go_bui 40920 zhaoxingli 6u IPv4 0xe8bf53cced43833f 0t0 TCP 192.168.199.102:59014->61.135.169.121:80 (ESTABLISHED)
___go_bui 40920 zhaoxingli 8u IPv4 0xe8bf53cce35fa33f 0t0 TCP 192.168.199.102:59022->61.135.169.121:80 (ESTABLISHED)
___go_bui 40920 zhaoxingli 10u IPv4 0xe8bf53cce7d15d3f 0t0 TCP 192.168.199.102:59005->61.135.169.121:80 (ESTABLISHED)
➜ playground lsof -nP -iTCP@61.135.169.121:80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
___go_bui 40920 zhaoxingli 6u IPv4 0xe8bf53cced43833f 0t0 TCP 192.168.199.102:59014->61.135.169.121:80 (ESTABLISHED)
___go_bui 40920 zhaoxingli 8u IPv4 0xe8bf53cce35fa33f 0t0 TCP 192.168.199.102:59022->61.135.169.121:80 (ESTABLISHED)
➜ playground

The above is an experiment I ran on my local Mac; the server and test-client code are both listed. As the lsof output shows, the server receives 10 concurrent requests and proxies them to the baidu backend. The problem is that the connections between the server and baidu are not stable long-lived connections: some have only just been established, while others are torn down a moment later, as if keep-alive were not actually taking effect here. In the real production environment we could also see the number of TIME_WAIT connections on the host climbing steadily; I won't paste the actual code and details for that.

Finding the Cause

A bit of Googling shows that plenty of people have already complained about this connection-pool behavior of the Golang HTTP Client, and the cause is simple: the HTTP library's Transport struct provides the underlying connection pool, and it has a field that caps the number of idle connections kept per remote host, defaulting to 2. Idle connections beyond that cap have to be released, which is how long-lived connections degrade into short-lived ones.

// DefaultTransport is the default implementation of Transport and is
// used by DefaultClient. It establishes network connections as needed
// and caches them for reuse by subsequent calls. It uses HTTP proxies
// as directed by the $HTTP_PROXY and $NO_PROXY (or $http_proxy and
// $no_proxy) environment variables.
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
		DualStack: true,
	}).DialContext,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

// DefaultMaxIdleConnsPerHost is the default value of Transport's
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2
...
type Transport struct {
	...
	// MaxIdleConnsPerHost, if non-zero, controls the maximum idle
	// (keep-alive) connections to keep per-host. If zero,
	// DefaultMaxIdleConnsPerHost is used.
	MaxIdleConnsPerHost int
	...
}
...
func (t *Transport) maxIdleConnsPerHost() int {
	if v := t.MaxIdleConnsPerHost; v != 0 {
		return v
	}
	return DefaultMaxIdleConnsPerHost
}

When we call http.Get, the request goes through the default HTTP Client and therefore through the default Transport, so the number of pooled connections is limited by DefaultMaxIdleConnsPerHost.

So how, during an actual HTTP request, does MaxIdleConnsPerHost cause long-lived connections to degrade into short-lived ones? Look at the code:

// tryPutIdleConn adds pconn to the list of idle persistent connections awaiting
// a new request.
// If pconn is no longer needed or not in a good state, tryPutIdleConn returns
// an error explaining why it wasn't registered.
// tryPutIdleConn does not close pconn. Use putOrCloseIdleConn instead for that.
func (t *Transport) tryPutIdleConn(pconn *persistConn) error {
	...
	key := pconn.cacheKey
	...
	idles := t.idleConn[key]
	if len(idles) >= t.maxIdleConnsPerHost() {
		return errTooManyIdleHost
	}
	...
	return nil
}

Under high concurrency, the pool creates a fairly large number of TCP connections, and once each request completes the pool tries to return the now-idle connection via tryPutIdleConn. As the code shows, any idle keep-alive connection beyond maxIdleConnsPerHost cannot be put back into the pool, so it has no choice but to be closed.

Solution

Once the root cause is known, the fix is simple: create a new HTTP Client and set its MaxIdleConnsPerHost.

defaultRoundTripper := http.DefaultTransport
defaultTransportPointer, ok := defaultRoundTripper.(*http.Transport)
if !ok {
	panic("defaultRoundTripper not an *http.Transport")
}
defaultTransport := *defaultTransportPointer // dereference it to get a copy of the struct that the pointer points to
defaultTransport.MaxIdleConns = 100
defaultTransport.MaxIdleConnsPerHost = 100

myClient := &http.Client{Transport: &defaultTransport}

Conclusion

When you run into a problem, apply your fundamentals flexibly and don't be afraid to experiment.