Go Panic


Why It Matters

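When a panic occurs and the current goroutine does not recover it, the whole Go process exits.
The application log does not record the panic from that goroutine (the runtime only prints the panic value and stack trace to standard error), which makes tracing and troubleshooting harder.
The service goes down, and an automatic restart can fail as well.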

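In Java, an exception can be thrown outward layer by layer until it reaches the outermost caller. Go is designed differently: a panic can only be recovered inside the goroutine in which it occurs.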

Examples

Calling recover() across goroutines

// Recover outside a goroutine
defer func() {
   log.Println(recover())
}()
...
go func(){
   // Panic occurs in a goroutine
   panic("A bad boy stole a server")
}()

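Result: the panic is not recovered, because the deferred function runs in a different goroutine from the one that panics.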
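Since a panic can only be stopped in the goroutine where it occurs, one common approach is to place the deferred recover inside the goroutine and hand the outcome back to the caller. The following is a minimal sketch under that assumption; `runSafely` and its error channel are hypothetical names that do not appear in the original example.

```go
package main

import (
	"fmt"
	"log"
)

// runSafely is a hypothetical helper: it runs task in a new goroutine and
// recovers any panic inside that same goroutine, reporting it as an error.
func runSafely(task func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		// This deferred function runs in the panicking goroutine and calls
		// recover() directly, so it is able to stop the panic.
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("goroutine panicked: %v", r)
				return
			}
			done <- nil
		}()
		task()
	}()
	return done
}

func main() {
	err := <-runSafely(func() {
		panic("A bad boy stole a server")
	})
	log.Println("result:", err)
}
```

The buffered channel lets the goroutine finish even if nobody reads the result; whether to log, return, or re-raise the error is a design choice left to the caller.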

Calling a helper function that wraps recover() inside defer

func getItem() {
   go func() {
      // Call Recover() Tool Function.
      defer func() {
         Recover("Panic in goroutine")
      }()
      // Panic here.
      panic("A bad boy stole the server")
      // Test the panic result.
      fmt.Println("Will NOT Reach Here")
   }()
}

func Recover(funcName string) {
   if err := recover(); err != nil {
      // If the panic is caught, log it.
      log.Printf("panic para: %v, panic info: %v\n", funcName, err)
   }
}

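Result: the panic is not recovered, because recover() is called by Recover, one level below the deferred function.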
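For contrast, the same helper does work when it is itself the deferred function, since recover() is then called directly by a deferred function. The sketch below keeps the article's `Recover` helper but, as an assumption made to keep the outcome easy to observe, runs `getItem` synchronously instead of in a goroutine.

```go
package main

import (
	"fmt"
	"log"
)

// Recover wraps recover(). It only stops a panic when it is itself the
// deferred function, not when it is called from inside another one.
func Recover(funcName string) {
	if err := recover(); err != nil {
		log.Printf("panic para: %v, panic info: %v\n", funcName, err)
	}
}

// getItem defers Recover directly instead of wrapping it in a closure.
func getItem() {
	defer Recover("getItem")
	panic("A bad boy stole the server")
}

func main() {
	getItem()
	fmt.Println("still running after getItem")
}
```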

Calling recover() directly inside a deferred function

go func() {
   defer func() { // f1
      recover()
   }()
   panic("A bad boy stole a server")
}()

Result: the panic is recovered.

Deferring recover() itself

// Call the recover() directly.
go func() { // f2
   defer recover() 
   panic("A bad boy stole a server, again!")
}()

Result: the panic is not recovered.


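The official documentation states the following (this wording predates Go 1.21, where panic(nil) became a run-time panic of type *runtime.PanicNilError). The return value of recover is nil if any of the following conditions holds: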

panic’s argument was nil;
the goroutine is not panicking;
recover was not called directly by a deferred function.

The Go source code carries the following comment on recover():

// ...... If recover is called outside the deferred function it will
// not stop a panicking sequence. In this case, or when the goroutine is not
// panicking, or if the argument supplied to panic was nil, recover returns
// nil. Thus the return value from recover reports whether the goroutine is
// panicking.
func recover() any

In short, recover() returns a non-nil value only when it is called directly by a deferred function. In every other case it returns nil and the panic is not recovered.
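The rule can be checked without crashing the process by running each variant under an outer safety net. This is a small sketch; `stopsPanic`, `helper`, and the three variant functions are hypothetical names introduced for the comparison.

```go
package main

import "fmt"

// helper calls recover() one level below the deferred function.
func helper() any { return recover() }

// directCall: recover() is called directly by the deferred function.
func directCall() {
	defer func() { recover() }()
	panic("boom")
}

// nestedCall: recover() is called by helper, not by the deferred function.
func nestedCall() {
	defer func() { helper() }()
	panic("boom")
}

// deferredRecover: recover() itself is the deferred call.
func deferredRecover() {
	defer recover()
	panic("boom")
}

// stopsPanic runs f and reports whether f stopped its own panic. The outer
// deferred function is only a safety net that keeps the program running.
func stopsPanic(f func()) (stopped bool) {
	defer func() {
		if r := recover(); r != nil {
			stopped = false // the panic escaped f
		}
	}()
	f()
	return true
}

func main() {
	fmt.Println("direct call:     ", stopsPanic(directCall))      // true
	fmt.Println("nested call:     ", stopsPanic(nestedCall))      // false
	fmt.Println("deferred recover:", stopsPanic(deferredRecover)) // false
	fmt.Println("not panicking:   ", recover() == nil)            // true
}
```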

How panic Works

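When a program calls panic, the compiler rewrites the call into a call to the runtime function runtime.gopanic(). (panic is a built-in function rather than a keyword.) In the runtime version excerpted here, runtime.gopanic() first creates a _panic struct to record the current panic and pushes it onto the current goroutine's _panic list. It then loops over the goroutine's _defer list, taking one runtime._defer at a time and running the deferred function through runtime.reflectcall(). If the _defer list is exhausted without the panic being recovered, it calls runtime.fatalpanic() to abort the whole program.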
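The visible effect of that loop is that deferred functions run in last-in, first-out order while the goroutine is panicking. The sketch below illustrates this from user code; `work` and the strings it records are hypothetical.

```go
package main

import "fmt"

// work registers three deferred functions, then panics. The named result
// trace records the order in which the deferred functions actually run.
func work() (trace []string) {
	defer func() { trace = append(trace, "first registered, runs last") }()
	defer func() {
		// Called directly by a deferred function, so the panic stops here.
		if r := recover(); r != nil {
			trace = append(trace, fmt.Sprintf("recovered: %v", r))
		}
	}()
	defer func() { trace = append(trace, "last registered, runs first") }()
	panic("boom")
}

func main() {
	for i, step := range work() {
		fmt.Println(i, step)
	}
}
```

The first-registered deferred function still runs after the recovery, because the function then returns normally and executes its remaining deferred calls.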

The _panic struct:

type _panic struct {
    argp unsafe.Pointer
    arg interface{}
    link *_panic
    recovered bool
    aborted bool

    pc uintptr
    sp unsafe.Pointer
    goexit bool
}

The runtime.gopanic() function:

// The compiler rewrites calls to panic() into calls to gopanic().
func gopanic(e interface{}) {
    gp := getg()
    ...
    // Build a _panic struct to describe this panic
    var p _panic
    p.arg = e
    p.link = gp._panic
    gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
    // Use a loop to retrieve the _defer linked list.
    for {
        d := gp._defer
        // If the linked list is empty, quit the loop.
        if d == nil {
            break
        }

        d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
        // Run the deferred function.
        reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))

        d._panic = nil
        d.fn = nil
        gp._defer = d.link

        freedefer(d)
        if p.recovered { // Guess the role of this code block :)
            ...
        }
    }
    // No deferred call recovered the panic: abort the whole program.
    fatalpanic(gp._panic) // exit(2); should not return
    *(*int)(nil) = 0      // not reached
}

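recover works along the same lines: the compiler rewrites a call to recover into a call to the runtime function runtime.gorecover(). That function first checks whether a panic is in progress. If so, it takes the _panic struct for that panic, sets its recovered field to true to mark the panic as recovered, and returns the value that was passed to panic. It also compares its argp argument with the start address of the argument list of the deferred function being run (this check can be hard to follow at first; it is explained later).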

func gorecover(argp uintptr) interface{} {
    gp := getg()
    p := gp._panic // Get the _panic struct
    if p != nil && !p.recovered && argp == uintptr(p.argp) {
        p.recovered = true // Set the recovered field
        return p.arg
    }
    return nil
}
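To make the interplay between the two runtime functions easier to follow, here is a toy model. It is not the runtime's code: every type and function name (`toyPanic`, `toyDefer`, `toyG`, `toyRecover`, `toyGopanic`) is hypothetical, and the argp check is deliberately left out. It only mirrors the idea of a deferred-call list, a recovered flag, and a loop that stops once the flag is set.

```go
package main

import "fmt"

// toyPanic mirrors the role of _panic: the panic value and a recovered flag.
type toyPanic struct {
	arg       any
	recovered bool
}

// toyDefer mirrors the role of _defer: a function plus a link to the next record.
type toyDefer struct {
	fn   func(p *toyPanic)
	link *toyDefer
}

// toyG stands in for a goroutine that owns a list of deferred calls.
type toyG struct {
	deferred *toyDefer
}

// pushDefer adds fn to the front of the list, so the latest defer runs first.
func (g *toyG) pushDefer(fn func(p *toyPanic)) {
	g.deferred = &toyDefer{fn: fn, link: g.deferred}
}

// toyRecover plays the role of gorecover: mark the panic and return its value.
func toyRecover(p *toyPanic) any {
	if p != nil && !p.recovered {
		p.recovered = true
		return p.arg
	}
	return nil
}

// toyGopanic plays the role of gopanic: pop and run deferred calls until one
// of them recovers. It returns false where the runtime would call fatalpanic.
func (g *toyG) toyGopanic(arg any) bool {
	p := &toyPanic{arg: arg}
	for g.deferred != nil {
		d := g.deferred
		g.deferred = d.link
		d.fn(p)
		if p.recovered {
			return true
		}
	}
	return false
}

func main() {
	g := &toyG{}
	g.pushDefer(func(p *toyPanic) { fmt.Println("recovered:", toyRecover(p)) })
	g.pushDefer(func(p *toyPanic) { fmt.Println("cleanup only") })
	fmt.Println("survived:", g.toyGopanic("boom"))
}
```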

Resuming the program is also the job of runtime.gopanic(). In the gopanic() code above, one block runs only when the recovered field of the _panic struct is set.

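That block first reads the program counter pc and the stack pointer sp from the _defer struct (the record created when execution reached the defer statement). After runtime.gorecover() returns, gopanic enters the block and calls runtime.recovery(), which in turn calls runtime.gogo(). Using pc and sp, execution jumps back to the point where the defer statement was registered, as if runtime.deferproc() had returned 1. The function then goes to its return sequence and calls runtime.deferreturn(), so the goroutine continues with the instructions that run just before the function returns instead of going back to runtime.fatalpanic(). At that point the panic has been recovered.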
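Seen from user code, this means a recovered function does not resume after the panicking statement; it returns to its caller, and the deferred function may set named results on the way out. A minimal sketch, with `safeDivide` as a hypothetical example function:

```go
package main

import "fmt"

// safeDivide divides a by b. Integer division by zero panics at run time;
// the deferred function recovers and fills in the named result err.
func safeDivide(a, b int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	result = a / b
	// Skipped when the division panics: execution does not come back here.
	fmt.Println("reached only when b != 0")
	return result, nil
}

func main() {
	fmt.Println(safeDivide(6, 3))
	fmt.Println(safeDivide(1, 0))
}
```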

// Get the pc and sp from _defer struct.
pc := d.pc
sp := unsafe.Pointer(d.sp)

...
if p.recovered { // Has handled before by runtime.gorecover()
    gp._panic = p.link
    for gp._panic != nil && gp._panic.aborted {
        gp._panic = gp._panic.link
    }
    if gp._panic == nil {
        gp.sig = 0
    }
    gp.sigcode0 = uintptr(sp)
    gp.sigcode1 = pc
    mcall(recovery) // Call runtime.recovery()
    throw("recovery failed")
}

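Note the condition argp == uintptr(p.argp) in runtime.gorecover(). Reading the runtime.gopanic() source shows that p.argp is set to the start address of the argument list of the deferred function about to be called, while the argp parameter of runtime.gorecover() is the start address of the argument list of the function that called recover. If recover is called from a nested function, or is itself the deferred call (defer recover()), the two addresses do not match, so the recovered field is never set and none of the recovery steps described above take place.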