Skelanimals Blog

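Understanding HTTP is the foundation for understanding how a browser works. In this post I build an HTTP request and parse the response by hand.

Reference: RFC 2616, the IETF HTTP/1.1 specification (since superseded by RFC 7230 to 7235, and later by RFC 9110 and RFC 9112)

HTTP protocol basics

Before writing any code, let's review the basic structure of an HTTP message, since everything that follows depends on it.

The header of a request message is structured as follows

Request line
- Request method
- Path (request target)
- HTTP version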
```
GET /path HTTP/1.1
```
Request header fields
General header fields
Entity header fields
Others
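
To make the layout above concrete, here is a minimal sketch that assembles a request message from its parts. `buildRequest` is a hypothetical helper written for this illustration (it is not a Node.js API), and the `Host` value is just an example.

```javascript
// Hypothetical helper: assemble a raw HTTP/1.1 request message from its parts.
function buildRequest({ method, target, headers = {}, body = '' }) {
  const requestLine = `${method} ${target} HTTP/1.1`
  const headerLines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`)
  // Every line ends with CRLF; an empty line separates the header from the body.
  return [requestLine, ...headerLines, '', body].join('\r\n')
}

const raw = buildRequest({
  method: 'GET',
  target: '/path',
  headers: { Host: 'localhost:8088' },
})

// JSON.stringify makes the CR and LF characters visible.
console.log(JSON.stringify(raw))
```

The request line comes first, the header fields follow one per line, and the empty line marks where the header ends even when there is no body.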

Response message header

Status line
- HTTP version
- Status code
- Reason phrase
```
HTTP/1.1 200 OK
```
Response header fields
General header fields
Entity header fields
Others
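
As a small illustration of the status line format, the sketch below splits one into its three parts. `parseStatusLine` is a hypothetical helper for this post, and it assumes the single-space separators shown in the example above.

```javascript
// Hypothetical helper: split a status line into version, status code and reason phrase.
function parseStatusLine(line) {
  // The reason phrase may contain spaces, so only the first two spaces delimit fields.
  const match = /^(HTTP\/\d\.\d) (\d{3}) (.*)$/.exec(line)
  if (!match) {
    throw new Error(`Malformed status line: ${line}`)
  }
  const [, version, statusCode, reasonPhrase] = match
  return { version, statusCode: Number(statusCode), reasonPhrase }
}

console.log(parseStatusLine('HTTP/1.1 200 OK'))
console.log(parseStatusLine('HTTP/1.1 404 Not Found'))
```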

Setting up a test environment

Before implementing the parser, we need a simple HTTP server to test against. Here is one I put together quickly with Node.js:

const http = require('node:http')

const server = http.createServer((req, res) => {
  // Headers set with setHeader() are merged with those passed to writeHead()
  res.setHeader('Content-Type', 'text/html')
  res.setHeader('X-Foo', 'bar')
  // writeHead() takes precedence, so the response is sent as text/plain
  res.writeHead(200, { 'Content-Type': 'text/plain' })
  console.log('request received')
  res.end('ok')
})

server.listen(8088)
console.log('Server running at http://localhost:8088/')

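The server is minimal, but it is enough to exercise the request and response handling.

Core implementation: building an HTTP client by hand

Step 1: the Request class

I use Node.js's net module rather than the http module, which gives a closer look at what happens underneath:

I am using Node.js 16.14.0; other versions may behave differently.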

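const net = require('node:net')

class Request {
  constructor(options) {
    // Default values
    const defaultOptions = {
      method: 'GET',
      body: {},
      host: 'localhost',
      path: '/',
      port: 80,
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
      },
    }

    // Merge user options over the defaults (shallow: user headers replace the default headers)
    options = {
      ...defaultOptions,
      ...options,
    }

    // Copy the options onto the instance
    Object.keys(defaultOptions).forEach((key) => {
      this[key] = options[key]
    })

    // Copy the headers so the caller's object is not mutated
    this.headers = { ...this.headers }
    // HTTP/1.1 requires a Host header; some servers may reject requests without it
    if (!this.headers.Host) {
      this.headers.Host = `${this.host}:${this.port}`
    }

    // Serialize the body according to Content-Type
    this.processBody()
  }

  processBody () {
    this.bodyText = ''
    if (this.headers['Content-Type'] === 'application/json') {
      // JSON body
      this.bodyText = JSON.stringify(this.body)
    }
    else if (this.headers['Content-Type'] === 'application/x-www-form-urlencoded') {
      // URL-encoded form body
      this.bodyText = Object.keys(this.body)
        .map(key => `${encodeURIComponent(key)}=${encodeURIComponent(this.body[key])}`)
        .join('&')
    }
    // Content-Length counts bytes, not characters
    this.headers['Content-Length'] = Buffer.byteLength(this.bodyText)
  }

  // Format the HTTP request message
  toString () {
    const headerLines = Object.keys(this.headers)
      .map(key => `${key}: ${this.headers[key]}`)
    // Lines are joined with CRLF; the empty string yields the blank line before the body
    return [
      `${this.method} ${this.path} HTTP/1.1`,
      ...headerLines,
      '',
      this.bodyText,
    ].join('\r\n')
  }

  // Send the HTTP request
  send () {
    // Open a new TCP connection
    const connection = net.createConnection(
      {
        host: this.host,
        port: this.port,
      },
      () => {
        console.log('Connected to server!')
        connection.write(this.toString())
      }
    )

    // Print the raw response data
    connection.on('data', (data) => {
      console.log('data', data.toString())
      connection.end()
    })

    // Error handling
    connection.on('error', (err) => {
      console.log('error', err)
      connection.end()
    })

    // Connection ended
    connection.on('end', () => {
      console.log('Disconnected from server')
    })
  }
}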
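
One detail worth a quick check is that Content-Length counts bytes, not characters. The two only agree for single-byte text. The sketch below compares both for a JSON body and for a URL-encoded body; the sample values are made up for illustration.

```javascript
// A JSON body with non-ASCII text: character count and byte count differ.
const jsonBody = JSON.stringify({ name: '你好' })
const charCount = jsonBody.length             // UTF-16 code units
const byteCount = Buffer.byteLength(jsonBody) // bytes when encoded as UTF-8 (the default)
console.log(charCount, byteCount) // prints: 13 17

// A URL-encoded body is plain ASCII after encodeURIComponent, so both counts agree.
const formBody = `name=${encodeURIComponent('你好')}`
console.log(formBody.length === Buffer.byteLength(formBody)) // prints: true
```

This is why a JSON body should be measured with `Buffer.byteLength` rather than with the string's `length` property.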

Testing the Request class

Let's try the Request class out:

(function () {
  let request = new Request({
    method: 'POST',
    host: '127.0.0.1',
    port: 8088,
    body: {
      a: '1',
    },
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    path: '/',
  })

  request.send()
})()

Running it produces this HTTP response:

HTTP/1.1 200 OK
Content-Type: text/plain
X-Foo: bar
Date: Mon, 18 May 2020 03:45:46 GMT
Connection: keep-alive
Transfer-Encoding: chunked

2
ok
0

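Notes on the response format:

The first line is the status line: HTTP/1.1 200 OK
The lines after it are the response header fields
An empty line separates the header from the body
2 and 0 are chunk sizes from the chunked transfer encoding, written in hexadecimal:
- 2 means the next chunk holds 2 bytes of data
- ok is the actual content
- 0 marks the last chunk, which ends the body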
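
To see where those size lines come from, here is a minimal sketch that produces the same wire format. `encodeChunked` is a hypothetical helper for illustration; it ignores chunk extensions and trailer fields, which the chunked encoding also allows.

```javascript
// Hypothetical helper: encode string pieces as a chunked message body.
function encodeChunked(pieces) {
  const chunks = pieces
    .filter(piece => piece.length > 0) // a zero-size chunk would end the body early
    // Each chunk: size in hexadecimal (counted in bytes), CRLF, data, CRLF.
    .map(piece => `${Buffer.byteLength(piece).toString(16)}\r\n${piece}\r\n`)
  // The last chunk has size 0 and is followed by an empty line.
  return `${chunks.join('')}0\r\n\r\n`
}

// JSON.stringify makes the CR and LF characters visible.
console.log(JSON.stringify(encodeChunked(['ok'])))
// prints: "2\r\nok\r\n0\r\n\r\n"
```

A chunk of 26 bytes would be announced as `1a`, not `26`, because the size is hexadecimal.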

Core: parsing the HTTP response with a state machine

We parse this response with a state machine. Mine has 8 states:

State machine design

class ResponseParser {
  constructor() {
    // Eight states, one for each part of the HTTP response
    this.WAITING_STATUS_LINE = 0 // reading the status line
    this.WAITING_STATUS_LINE_END = 1 // waiting for the end of the status line
    this.WAITING_HEADER_NAME = 2 // reading a header field name
    this.WAITING_HEADER_SPACE = 3 // waiting for the space after the colon
    this.WAITING_HEADER_VALUE = 4 // reading a header field value
    this.WAITING_HEADER_LINE_END = 5 // waiting for the end of a header line
    this.WAITING_HEADER_BLOCK_END = 6 // waiting for the end of the header block
    this.WAITING_BODY = 7 // reading the body

    this.current = this.WAITING_STATUS_LINE // initial state

    // Parse results
    this.headers = {}
    this.headerName = ''
    this.headerValue = ''
    this.statusLine = ''
    this.bodyParser = null
  }

  // Whether parsing is complete
  get isFinished() {
    return Boolean(this.bodyParser && this.bodyParser.isFinished)
  }

  // The parsed response in a structured form
  get response() {
    // Use the match result directly instead of the deprecated RegExp.$1 statics
    const match = this.statusLine.match(/HTTP\/1\.1 (\d+) ([\s\S]*)/) || []
    return {
      statusCode: match[1],
      statusText: match[2],
      headers: this.headers,
      body: this.bodyParser ? this.bodyParser.content.join('') : '',
    }
  }

  // Main entry point for incoming data
  receive(string) {
    for (let c of string) {
      this.receiveChar(c)
    }
  }

  // Core of the state machine: process one character at a time
  receiveChar(char) {
    if (this.current === this.WAITING_STATUS_LINE) {
      if (char === '\r') {
        // Carriage return: wait for the end of the status line
        this.current = this.WAITING_STATUS_LINE_END
      }
      else if (char === '\n') {
        // Bare line feed: go straight to parsing headers
        this.current = this.WAITING_HEADER_NAME
      }
      else {
        // Any other character belongs to the status line
        this.statusLine += char
      }
    }
    else if (this.current === this.WAITING_STATUS_LINE_END) {
      if (char === '\n') {
        // Status line complete, start parsing headers
        this.current = this.WAITING_HEADER_NAME
      }
    }
    else if (this.current === this.WAITING_HEADER_NAME) {
      if (char === ':') {
        // Colon: the field name is complete, wait for the space
        this.current = this.WAITING_HEADER_SPACE
      }
      else if (char === '\r') {
        // Carriage return at the start of a line: the header block is ending
        this.current = this.WAITING_HEADER_BLOCK_END
        // For chunked encoding, create the matching body parser
        // (assumes the server sends this exact header name and value casing)
        if (this.headers['Transfer-Encoding'] === 'chunked') {
          this.bodyParser = new TrunkedBodyParser()
        }
      }
      else {
        // Any other character belongs to the field name
        this.headerName += char
      }
    }
    else if (this.current === this.WAITING_HEADER_BLOCK_END) {
      if (char === '\n') {
        // Header block complete, start reading the body
        this.current = this.WAITING_BODY
      }
    }
    else if (this.current === this.WAITING_HEADER_SPACE) {
      if (char === ' ') {
        // Space: start reading the field value
        this.current = this.WAITING_HEADER_VALUE
      }
    }
    else if (this.current === this.WAITING_HEADER_VALUE) {
      if (char === '\r') {
        // Carriage return: the current header line is ending
        this.current = this.WAITING_HEADER_LINE_END
        // Store the current header field
        this.headers[this.headerName] = this.headerValue
        // Reset the temporary variables
        this.headerName = this.headerValue = ''
      }
      else {
        // Any other character belongs to the field value
        this.headerValue += char
      }
    }
    else if (this.current === this.WAITING_HEADER_LINE_END) {
      if (char === '\n') {
        // Header line complete, parse the next header
        this.current = this.WAITING_HEADER_NAME
      }
    }
    else if (this.current === this.WAITING_BODY) {
      // Body parsing is delegated to a dedicated parser
      // (only chunked bodies are supported, so other bodies are skipped)
      if (this.bodyParser) {
        this.bodyParser.receiveChar(char)
      }
    }
  }
}
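
One practical reason to process input character by character is that a TCP socket may deliver the response in pieces that are split at arbitrary points, so a single 'data' event is not guaranteed to hold a whole line or a whole header. A state machine keeps its position between calls, so the split points do not matter. The sketch below shows this with a much smaller machine; `LineSplitter` is a hypothetical class for illustration and is not part of the parser above.

```javascript
// Hypothetical sketch: a two-state machine that splits a stream into CRLF-terminated lines.
class LineSplitter {
  constructor() {
    this.READING = 0  // collecting characters of the current line
    this.AFTER_CR = 1 // saw '\r', expecting '\n'
    this.state = this.READING
    this.line = ''
    this.lines = []
  }

  receive(text) {
    for (const char of text) {
      if (this.state === this.READING) {
        if (char === '\r') this.state = this.AFTER_CR
        else this.line += char
      }
      else if (char === '\n') {
        // CRLF complete: emit the line and start a new one
        this.lines.push(this.line)
        this.line = ''
        this.state = this.READING
      }
      else {
        // A lone CR is treated as ordinary data in this sketch
        this.line += '\r'
        if (char !== '\r') {
          this.line += char
          this.state = this.READING
        }
      }
    }
  }
}

// Feed the same text in one piece and in awkwardly split pieces.
const whole = new LineSplitter()
whole.receive('HTTP/1.1 200 OK\r\nX-Foo: bar\r\n')

const pieces = new LineSplitter()
for (const part of ['HTTP/1.1 20', '0 OK\r', '\nX-Foo: b', 'ar\r\n']) {
  pieces.receive(part)
}

console.log(whole.lines)
console.log(pieces.lines)
```

Both instances end up with the same two lines, even though the second one saw a CRLF pair split across two calls.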

Body parser: handling chunked encoding

A chunked response body needs a state machine of its own:

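class TrunkedBodyParser {
  constructor() {
    // Five states, one for each part of the chunked encoding
    this.WAITING_LENGTH = 0 // reading the chunk size
    this.WAITING_LENGTH_LINE_END = 1 // waiting for the end of the size line
    this.READING_TRUNK = 2 // reading chunk data
    this.WAITING_NEW_LINE = 3 // waiting for the CR after the chunk data
    this.WAITING_NEW_LINE_END = 4 // waiting for the LF after the chunk data

    this.length = 0 // remaining length of the current chunk
    this.content = [] // parsed content
    this.isFinished = false // whether parsing is complete
    this.current = this.WAITING_LENGTH // initial state
  }

  // Process the body one character at a time.
  // Chunk sizes count bytes, so counting characters here assumes a single-byte (ASCII) body.
  receiveChar(char) {
    // Ignore everything after the last (zero-size) chunk
    if (this.isFinished) {
      return
    }
    if (this.current === this.WAITING_LENGTH) {
      if (char === '\r') {
        // The size line is ending
        if (this.length === 0) {
          // A size of 0 marks the end of the body
          this.isFinished = true
        }
        this.current = this.WAITING_LENGTH_LINE_END
      }
      else {
        // Accumulate the hexadecimal size digit by digit
        this.length *= 16
        this.length += Number.parseInt(char, 16)
      }
    }
    else if (this.current === this.WAITING_LENGTH_LINE_END) {
      if (char === '\n') {
        // Size line complete, start reading chunk data
        this.current = this.READING_TRUNK
      }
    }
    else if (this.current === this.READING_TRUNK) {
      // Keep every character of the chunk, including line breaks inside the data
      this.content.push(char)
      this.length--
      if (this.length === 0) {
        // Current chunk fully read, wait for the line break that follows it
        this.current = this.WAITING_NEW_LINE
      }
    }
    else if (this.current === this.WAITING_NEW_LINE) {
      if (char === '\r') {
        // Wait for the line to end
        this.current = this.WAITING_NEW_LINE_END
      }
    }
    else if (this.current === this.WAITING_NEW_LINE_END) {
      if (char === '\n') {
        // Line ended, go back to reading the size of the next chunk
        this.current = this.WAITING_LENGTH
      }
    }
  }
}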
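
The parser above counts characters, while chunk sizes are defined in bytes, so the two drift apart as soon as the body contains multi-byte text. As a point of comparison, here is a sketch of a byte-level decoder that works on a Buffer. `decodeChunked` is a hypothetical helper, and it assumes the complete body has already been received, which the streaming state machine does not require.

```javascript
// Hypothetical helper: decode a complete chunked body held in a Buffer.
function decodeChunked(buf) {
  const parts = []
  let offset = 0
  while (true) {
    const lineEnd = buf.indexOf('\r\n', offset)
    if (lineEnd === -1) throw new Error('Incomplete chunk-size line')
    // Ignore any chunk extension after ';' and parse the hexadecimal size.
    const sizeText = buf.toString('latin1', offset, lineEnd).split(';')[0].trim()
    const size = Number.parseInt(sizeText, 16)
    if (Number.isNaN(size)) throw new Error(`Invalid chunk size: ${sizeText}`)
    offset = lineEnd + 2
    if (size === 0) break // last chunk
    if (offset + size + 2 > buf.length) throw new Error('Incomplete chunk data')
    parts.push(buf.subarray(offset, offset + size))
    offset += size + 2 // skip the data and the CRLF that follows it
  }
  return Buffer.concat(parts)
}

// '你好' is 2 characters but 6 bytes in UTF-8, so its chunk size is 6.
const wire = Buffer.from('6\r\n你好\r\n0\r\n\r\n')
console.log(decodeChunked(wire).toString()) // prints: 你好
```

Decoding to a string only after the chunks are reassembled also avoids splitting a multi-byte character across two pieces.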

Putting it together: a complete HTTP client

Now let's combine the Request class with ResponseParser:

  // Updated send method of the Request class
  send () {
    return new Promise((resolve, reject) => {
      const parser = new ResponseParser()

      // Open a new TCP connection
      const connection = net.createConnection(
        {
          host: this.host,
          port: this.port,
        },
        () => {
          console.log('Connected to server!')
          connection.write(this.toString())
        }
      )

      // Feed response data to the parser as it arrives
      connection.on('data', (data) => {
        console.log('Response data:', data.toString())
        parser.receive(data.toString())
        console.log('isFinished:', parser.isFinished)
        if (parser.isFinished) {
          console.log('Parsed response:', parser.response)
          resolve(parser.response)
          connection.end()
        }
      })

      // Error handling
      connection.on('error', (err) => {
        console.log('Connection error:', err)
        reject(err)
        connection.end()
      })

      // Connection ended
      connection.on('end', () => {
        console.log('Disconnected from server')
        // Has no effect if the promise was already resolved above
        reject(new Error('Disconnected from server'))
      })
    })
  }

Running the test gives the parsed result:

{
  statusCode: '200',
  statusText: 'OK',
  headers: {
    'Content-Type': 'text/plain',
    'X-Foo': 'bar',
    Date: 'Mon, 18 May 2020 04:52:08 GMT',
    Connection: 'keep-alive',
    'Transfer-Encoding': 'chunked'
  },
  body: 'ok'
}
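
The parsed headers keep whatever casing the server sent, but HTTP header field names are case-insensitive, so another server could send `transfer-encoding` instead. A small lookup helper avoids depending on the casing. `getHeader` below is a hypothetical helper, and the sample object is made up for illustration.

```javascript
// Hypothetical helper: look up a header regardless of how the server cased its name.
function getHeader(headers, name) {
  const wanted = name.toLowerCase()
  const key = Object.keys(headers).find(k => k.toLowerCase() === wanted)
  return key === undefined ? undefined : headers[key]
}

const sampleHeaders = { 'Content-Type': 'text/plain', 'transfer-encoding': 'chunked' }
console.log(getHeader(sampleHeaders, 'Transfer-Encoding')) // prints: chunked
console.log(getHeader(sampleHeaders, 'X-Missing'))         // prints: undefined
```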

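Summary

1. Applying state machines

A state machine is a powerful tool for parsing structured text
Each state has a clear responsibility and explicit transition conditions
State transitions let us handle complex parsing logic cleanly

2. Details of the HTTP protocol

HTTP messages follow a strict format
Every character has a specific meaning
Understanding these details helps with debugging and optimization

What's next

We have now parsed an HTTP response. Next time I will show how to use a finite state machine for lexical and syntactic analysis of HTML in order to build a DOM tree, which will help us understand how a browser parses HTML.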