Implementing WebSocket Protocol in Go
The goal of this post is to write a simple WebSocket echo server on top of Go’s net/http library, and along the way to understand HTTP hijacking and binary encoding/decoding in Go. WebSocket is a relatively simple protocol to implement. It uses HTTP for the initial handshake. After the handshake, it reads and writes framed data directly over the underlying TCP connection. We’ll be using the WebSocket Protocol Specification as a reference for the implementation.
Full source code is available here
Overview
The implementation can be divided into 4 parts:
- Opening handshake
- Receive data frames from client
- Send data frames to client
- Closing handshake
Limitations of this implementation:
- Doesn’t validate UTF-8 encoded fragments
- Doesn’t handle compression
Handshaking
First we set up an HTTP server using Go’s net/http package. Then we attach a handler to listen for incoming HTTP requests.
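As a rough sketch of that setup, assuming hypothetical names `wsHandler` and `newServer` (neither is taken from the post’s source code), the server could be wired up like this. The handler body is only a placeholder for the handshake logic described in the following sections.

```go
package main

import "net/http"

// wsHandler is a placeholder (hypothetical name): the real handler would
// inspect the request and perform the WebSocket handshake.
func wsHandler(w http.ResponseWriter, r *http.Request) {
	http.Error(w, "not a websocket handshake", http.StatusBadRequest)
}

// newServer (hypothetical helper) builds an HTTP server that routes every
// request to wsHandler.
func newServer(addr string) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/", wsHandler)
	return &http.Server{Addr: addr, Handler: mux}
}
```

Calling `newServer(":8080").ListenAndServe()` from `main` would start listening. The port is an arbitrary choice.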
The initial handshake has to be started by the client, so we need to inspect the client request to find out whether it’s a WebSocket request or a normal HTTP request. A client handshake is an HTTP GET request that carries the headers Upgrade: websocket and Connection: Upgrade, a base64-encoded Sec-WebSocket-Key, and Sec-WebSocket-Version: 13.
All the requirements for a valid opening handshake request are described in the spec here
Hijacking HTTP Request
Once we know that it’s a WebSocket request, the server needs to reply with a handshake response. But we can’t write the response using the http.ResponseWriter, because net/http still owns the underlying TCP connection: it frames whatever we write as an ordinary HTTP response and takes the connection back as soon as the handler returns. What we need is called HTTP hijacking. Hijacking allows us to take over the underlying TCP connection (a net.Conn) together with its buffered reader/writer (a *bufio.ReadWriter). This gives us the freedom to read and write data at will, and it also makes us responsible for closing the connection.
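Hijacking goes through the standard `http.Hijacker` interface. The sketch below wraps it in a helper named `hijack`, which is a hypothetical name chosen for illustration. It assumes we want an error rather than a panic when the ResponseWriter doesn’t support hijacking.

```go
package main

import (
	"bufio"
	"errors"
	"net"
	"net/http"
)

// hijack (hypothetical helper) takes over the TCP connection behind w.
// After a successful call the HTTP server no longer touches the connection,
// so the caller must close it.
func hijack(w http.ResponseWriter) (net.Conn, *bufio.ReadWriter, error) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		return nil, nil, errors.New("websocket: response writer does not support hijacking")
	}
	return hj.Hijack()
}
```

The returned `*bufio.ReadWriter` may already hold bytes the server read from the client, so it is safer to read through it than directly from the `net.Conn`.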
Server Handshake Response
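Now, to complete the handshake, the server must respond with the appropriate headers. The handshake response consists of the status line HTTP/1.1 101 Switching Protocols followed by the headers Upgrade: websocket, Connection: Upgrade and Sec-WebSocket-Accept.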
The value of Sec-WebSocket-Accept is calculated as follows:
For this header field (Sec-WebSocket-Key), the server has to take the value (as present in the header field, e.g., the base64-encoded [RFC4648] version minus any leading and trailing whitespace) and concatenate this with the Globally Unique Identifier (GUID, [RFC4122]) “258EAFA5-E914-47DA-95CA-C5AB0DC85B11” in string form, which is unlikely to be used by network endpoints that do not understand the WebSocket Protocol. A SHA-1 hash (160 bits) [FIPS.180-3], base64-encoded (see Section 4 of [RFC4648]), of this concatenation is then returned in the server’s handshake.
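The calculation quoted above fits in a few lines of Go. The function name `acceptKey` and the constant name `websocketGUID` are assumptions made for this sketch.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"strings"
)

// websocketGUID is the fixed GUID defined by the WebSocket specification.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey (hypothetical name) derives the Sec-WebSocket-Accept value from
// the client's Sec-WebSocket-Key: base64(SHA-1(trimmed key + GUID)).
func acceptKey(clientKey string) string {
	h := sha1.New()
	h.Write([]byte(strings.TrimSpace(clientKey)))
	h.Write([]byte(websocketGUID))
	return base64.StdEncoding.EncodeToString(h.Sum(nil))
}
```

With the sample key from the specification, `acceptKey("dGhlIHNhbXBsZSBub25jZQ==")` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.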
We then write these headers back to the client. Note the \r\n after each header line and the empty line that terminates the header block.
Data Frame Transfer
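After completing the handshake without any error, we are ready to read/write data from/to the client. The WebSocket spec defines a specific frame format to be used between clients and servers. The first byte of a frame carries the FIN bit, three reserved bits and a 4-bit opcode. The second byte carries the MASK bit and a 7-bit payload length. These are followed by an optional extended payload length, an optional 4-byte masking key, and the payload data.
The spec also defines how to decode the client payload using the masking key here. Based on this information, it’s pretty easy to define the decoder & encoder functions: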
Decoding Steps:
- Read the first two bytes
  a. find whether the frame is the final fragment (FIN bit)
  b. find the opcode
  c. find whether the payload is masked
  d. find the payload length
- If length is less than 126, go to step 5 (the masking key)
- If length equals 126, read the next two bytes in network byte order. This is the new payload length value
- If length equals 127, read the next eight bytes in network byte order. This is the new payload length value
- Read the next 4 bytes as the masking key
- Read the next length bytes as the masked payload data
- Decode the masked payload with the masking key
Encoding Steps:
- Make a slice of bytes of length 2
- Save the fragmentation & opcode information in the first byte
- If the payload length is less than 126, store the length in the second byte
- If the payload length is greater than 125 and less than 2^16:
  a. store 126 in the second byte
  b. convert the payload length into a 2-byte slice in network byte order
  c. append the length bytes to the header bytes
- If the payload length is 2^16 or greater:
  a. store 127 in the second byte
  b. convert the payload length into an 8-byte slice in network byte order
  c. append the length bytes to the header bytes
- Finally append the payload data without masking
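The encoding steps above translate fairly directly into Go. The function name `encodeFrame` and its parameters are assumptions made for this sketch, and it only produces unmasked server-to-client frames.

```go
package main

import "encoding/binary"

// encodeFrame (hypothetical name) builds an unmasked server-to-client frame.
func encodeFrame(fin bool, opcode byte, payload []byte) []byte {
	header := make([]byte, 2, 10) // 2 fixed bytes + up to 8 length bytes
	if fin {
		header[0] |= 0x80 // FIN bit
	}
	header[0] |= opcode & 0x0F

	n := len(payload)
	switch {
	case n <= 125:
		header[1] = byte(n)
	case n < 1<<16:
		header[1] = 126
		ext := make([]byte, 2)
		binary.BigEndian.PutUint16(ext, uint16(n))
		header = append(header, ext...)
	default:
		header[1] = 127
		ext := make([]byte, 8)
		binary.BigEndian.PutUint64(ext, uint64(n))
		header = append(header, ext...)
	}
	// The MASK bit stays 0: servers send payloads unmasked.
	return append(header, payload...)
}
```

For example, `encodeFrame(true, 0x1, []byte("Hello"))` produces the bytes `0x81 0x05` followed by the five payload bytes, which matches the unmasked sample frame in the specification.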
Closing Handshake
Closing is done by sending a close frame with the close status as payload. An optional close reason can also be sent in the payload. If the client initiates the closing sequence, then the server should also send a corresponding close frame in response. Finally, the underlying TCP connection is closed.
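The close payload is a 2-byte status code in network byte order, optionally followed by a UTF-8 reason. A small sketch of building and parsing it follows. The function names `closePayload` and `parseClosePayload` are assumptions made for illustration.

```go
package main

import "encoding/binary"

// closePayload (hypothetical name) builds the body of a close frame: the
// status code in network byte order followed by an optional reason.
// Control frame payloads are limited to 125 bytes, so the reason should
// stay within 123 bytes.
func closePayload(code uint16, reason string) []byte {
	buf := make([]byte, 2, 2+len(reason))
	binary.BigEndian.PutUint16(buf, code)
	return append(buf, reason...)
}

// parseClosePayload (hypothetical name) splits a received close payload.
// ok is false when the payload is a single byte, which is malformed.
func parseClosePayload(p []byte) (code uint16, reason string, ok bool) {
	if len(p) == 0 {
		return 1005, "", true // 1005: no status code was present
	}
	if len(p) < 2 {
		return 0, "", false
	}
	return binary.BigEndian.Uint16(p), string(p[2:]), true
}
```

The resulting payload would be sent in a frame with opcode 0x8. Status code 1000 indicates a normal closure.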
Testing our implementation
The Autobahn Testsuite provides comprehensive test suites for checking a WebSocket implementation against the specification. The full report for our WebSocket implementation is available here: https://hassansin.github.io/go-websocket-echo-server/reports/
Resources:
- https://tools.ietf.org/html/rfc6455
- https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers