之前有写过一篇博文:
PHP SOCKET_READ函数 PHP_NORMAL_READ和PHP_BINARY_READ两种读取模式的区别
在现实生活当中,为了方便客户端交互,选择二进制安全的读包可以避免留坑,读到什么就是什么。不过,这也就带来了另一个问题,就是经典的粘包问题。因为TCP是流式传输的,上层的应用程序只能从缓冲里面读,但是缓冲里面塞了多少条数据包,应用程序并不知道。TCP协议栈也没有义务来处理这种边界问题,协议层面首先要保证的是传输效率和安全收发的问题。
同样以上文的代码为例,我将SOCKET_READ函数的读取模式改为PHP_BINARY_READ模式。
客户端模拟连续发包:
for ($i = 0; $i < 10; $i++) { $bodyLen = 100; $data = json_encode([ "id" => $i, "data" => range(0, 10) ]); $data =json_encode($data); $message = pack("Na" . $bodyLen, $bodyLen, $data); $len = socket_write($socket, $message); printf("[%d]send len %d \n", $i, $len); waitReply($socket); }
我们可以抓包来看:
03:01:39.649358 IP (tos 0x0, ttl 64, id 61846, offset 0, flags [DF], proto TCP (6), length 156) localhost.59686 > localhost.3333: Flags [P.], cksum 0xfe90 (incorrect -> 0xf41b), seq 1:105, ack 1, win 342, options [nop,nop,TS val 3745655655 ecr 3745655655], length 104 0x0000: 4500 009c f196 4000 4006 4ac3 7f00 0001 E.....@[email protected]..... 0x0010: 7f00 0001 e926 0d05 2f25 8d71 1063 351c .....&../%.q.c5. 0x0020: 8018 0156 fe90 0000 0101 080a df42 2b67 ...V.........B+g 0x0030: df42 2b67 0000 0064 227b 5c22 6964 5c22 .B+g...d"{\"id\" 0x0040: 3a30 2c5c 2264 6174 615c 223a 5b30 2c31 :0,\"data\":[0,1 0x0050: 2c32 2c33 2c34 2c35 2c36 2c37 2c38 2c39 ,2,3,4,5,6,7,8,9 0x0060: 2c31 305d 7d22 0000 0000 0000 0000 0000 ,10]}".......... 0x0070: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x0080: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x0090: 0000 0000 0000 0000 0000 0000 ............ 03:01:39.649364 IP (tos 0x0, ttl 64, id 18936, offset 0, flags [DF], proto TCP (6), length 52) localhost.3333 > localhost.59686: Flags [.], cksum 0xfe28 (incorrect -> 0x6967), seq 1, ack 105, win 342, options [nop,nop,TS val 3745655655 ecr 3745655655], length 0 0x0000: 4500 0034 49f8 4000 4006 f2c9 7f00 0001 E..4I.@.@....... 0x0010: 7f00 0001 0d05 e926 1063 351c 2f25 8dd9 .......&.c5./%.. 0x0020: 8010 0156 fe28 0000 0101 080a df42 2b67 ...V.(.......B+g 0x0030: df42 2b67 .B+g 03:01:39.649385 IP (tos 0x0, ttl 64, id 61847, offset 0, flags [DF], proto TCP (6), length 156) localhost.59686 > localhost.3333: Flags [P.], cksum 0xfe90 (incorrect -> 0xf3b2), seq 105:209, ack 1, win 342, options [nop,nop,TS val 3745655655 ecr 3745655655], length 104 0x0000: 4500 009c f197 4000 4006 4ac2 7f00 0001 E.....@[email protected]..... 0x0010: 7f00 0001 e926 0d05 2f25 8dd9 1063 351c .....&../%...c5. 0x0020: 8018 0156 fe90 0000 0101 080a df42 2b67 ...V.........B+g 0x0030: df42 2b67 0000 0064 227b 5c22 6964 5c22 .B+g...d"{\"id\" 0x0040: 3a31 2c5c 2264 6174 615c 223a 5b30 2c31 :1,\"data\":[0,1 0x0050: 2c32 2c33 2c34 2c35 2c36 2c37 2c38 2c39 ,2,3,4,5,6,7,8,9 0x0060: 2c31 305d 7d22 0000 0000 0000 0000 0000 ,10]}".......... 0x0070: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x0080: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x0090: 0000 0000 0000 0000 0000 0000 ............
客户端发包没有走缓冲,直接一条条发过去了。那么服务端是不是也这样呢?
我们直接把收到的包打印出来:
buffer"{\"id\":2,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":3,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":4,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":5,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":6,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":7,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":8,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":9,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}" buffer"{\"id\":3,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":4,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":5,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":6,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":7,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":8,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":9,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}" buffer"{\"id\":6,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":7,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":8,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"d"{\"id\":9,\"data\":[0,1,2,3,4,5,6,7,8,9,10]}"
可见由于buffer的存在,所收到的(逻辑)包的个数完全找不到规律。
如此一来,我们只有在业务层自行把包分开。通常来说,包长是双方可以约定的,当然也不一定写死,我们可以把包的某一部分的固定长度定义为包长的存储位置,server端读到后以此为依据拆包。
下面我写了一段简单示例代码:
$data = socket_read($socket, 2048, PHP_BINARY_READ);//PHP_BINARY_READ PHP_NORMAL_READ $readCnt++; $packageLen = strlen($data); $isEnd = ($packageLen === 0);//没有收到下一个包了 处理完剩下的就可以退出了 if ($data === false) { $err = socket_last_error($socket); } $buf .= $data; $header = substr($buf, 0, 4);// 前4个字节为包体长 $bodyLen = unpack("Nlen", $header)['len']; $messageLen = 4 + $bodyLen; if ($packageLen >= $messageLen) { // 接收到了1-N个包 $remainLen = $packageLen; while ($remainLen >= $messageLen) { $body = substr($buf, 4, $bodyLen);// 数据段 $message = unpack(sprintf("a%dbody", $bodyLen), $body)['body'];//解包body handlePackage(trim($message)); $buf = substr($buf, 4 + $bodyLen);//读完的可以删掉了 $remainLen -= $messageLen; } }
如示例代码所示,每个包分两个部分,前面四个字节存储包体长度,然后就根据这个长度可以提取出包体内容。
这样一来就可以愉快的进行下一步的数据处理了。