Case Study: Troubleshooting APs That Frequently Go Offline and Back Online at a Carrier Site

  I. Symptom

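  At a carrier's school site, APs repeatedly went offline and came back online. This severely degraded the user experience and led to complaints.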

  II. Analysis

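  1. We selected one of the affected APs (yy-zyybfl1f_4) and captured debug output on both the AC and the AP. The AP's debug output contained no record of the AP going offline, while the AC's debug output showed AP Name: yy-zyybfl1f_4 going offline repeatedly, each time with the reason Neighbor Dead Timer Expire.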

  2. Analysis of the AC-side debug output:

  *Nov 13 08:48:23:043 2012 SDZIB-WLAN-AC79-WX6108-AC1 LWPS/7/Timer: [APID: 64] Neighbor-Dead timer refreshed

  *Nov 13 08:48:23:043 2012 SDZIB-WLAN-AC79-WX6108-AC1 LWPS/7/Pkt_Send:

  Sent Echo Response to 172.18.95.141 (Length: 14) // The AC answered the AP with Echo Response messages twice in a row, but the AP apparently never received them, so the AP went offline.
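  The Neighbor-Dead timer in the AC log behaves like a watchdog: each heartbeat received from the AP restarts it ("Neighbor-Dead timer refreshed"), and if nothing arrives before it runs out, the AC takes the AP down with the reason Neighbor Dead Timer Expire. The following is a minimal sketch of that behavior. The `NeighborDeadTimer` class and the 30-second timeout are hypothetical illustrations; the real interval is configured on the AC and is not shown in this case.

```python
from dataclasses import dataclass


@dataclass
class NeighborDeadTimer:
    """Hypothetical watchdog modeled on the AC's per-AP Neighbor-Dead timer."""
    timeout: float            # seconds without a heartbeat before the AP is declared dead (illustrative)
    last_refresh: float = 0.0

    def refresh(self, now: float) -> None:
        # Called whenever a heartbeat (Echo Request) arrives from the AP.
        self.last_refresh = now

    def expired(self, now: float) -> bool:
        # True once `timeout` seconds have passed since the last heartbeat.
        return now - self.last_refresh >= self.timeout


timer = NeighborDeadTimer(timeout=30.0)
timer.refresh(now=0.0)       # Echo Request received; the AC replies with Echo Response
print(timer.expired(10.0))   # False: AP still considered alive
print(timer.expired(30.0))   # True: the AC would report Neighbor Dead Timer Expire
```

  The sketch highlights a key point for this case. The AC only sees its own side of the exchange: it keeps receiving Echo Requests and replying, so from its viewpoint the tunnel looks healthy. Presumably the timer expires only after the AP, which never sees those replies, gives up and stops sending.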

  3. Analysis of the AP-side debug output:

  *Nov 9 17:52:58:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:

  Sent Echo Request to 111.17.235.189 (Length: 14)

  04 00 00 08 00 00 16 21 00 00 0c 51 20 37

  *Nov 9 17:52:58:613 2012 WA2610E-GNP LWPC/7/Pkt_Rcvd:

  Received Echo Response from 111.17.235.189 (Length: 14) // The heartbeat exchange between the AP and the AC at 189 is normal.

  04 00 00 08 00 00 17 21 00 00 0c 51 20 37

  *Nov 9 17:52:58:613 2012 WA2610E-GNP LWPC/7/Timer:

  Deleted Nbr-Dead Timer

  *Nov 9 17:52:59:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:

  Sent WTP Event Request to 111.17.235.190 (Length: 39)

  04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00

  16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06

  01 00 02 00 02 00 1b

  *Nov 9 17:53:01:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:

  Sent WTP Event Request to 111.17.235.190 (Length: 39)

  04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00

  16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06

  01 00 02 00 02 00 1b

  *Nov 9 17:53:03:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:

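  Sent WTP Event Request to 111.17.235.190 (Length: 39) // The AP sent three consecutive WTP heartbeat requests to the AC at 190 and received no response to any of them, so the AP dropped the connection.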

  04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00

  16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06

  01 00 02 00 02 00 1b

  %Nov 9 17:53:05:609 2012 WA2610E-GNP LWPC/6/LWPC_AP_DOWN:

  Connection with AC 111.17.235.190 goes down by reason of Response Timer Expire. // The AP's connection with the AC at 190 is down.

  *Nov 9 17:53:05:609 2012 WA2610E-GNP LWPC/7/FSM :

  [Tunnel : Slave State : Run] AP LWAPP FSM machine TimeOut, result Timed out

  *Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Timer:

  Created Response-TO Timer

  *Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Pkt_Send:

  Sent WTP Event Request to 111.17.235.189 (Length: 30)

  04 00 00 18 00 00 0e 21 00 10 0c 51 20 37 68 00

  0d 00 00 63 a2 01 30 00 00 00 00 00 00 00

  *Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Event:

  [State : Run] Clear Context

  *Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Timer:

  Deleted Session backup Timer

  *Nov 9 17:53:05:611 2012 WA2610E-GNP LWPC/7/Event:

  Slave tunnel down event notified to all registered modules // The link between the AP and the backup AC is down.
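  The hex lines under each Pkt_Send/Pkt_Rcvd entry are the raw frames, and they can be checked by hand. Their byte values are consistent with the header layout in the LWAPP Internet-Draft. That layout has a 6-byte transport header whose third and fourth bytes give the number of bytes that follow it, followed by an 8-byte control header that holds the message type, sequence number, message-element length and session ID. The case does not document the wire format, so whether this AP uses exactly that layout is an assumption. The sketch below is therefore a reading aid, not a protocol reference. The `decode_frame` helper is hypothetical, and its `MSG_TYPES` table lists only the three types seen here.

```python
from typing import NamedTuple

# Message type numbers as listed in the LWAPP draft; assumed, not confirmed,
# to apply to this AP's frames. Only the types seen in this case are included.
MSG_TYPES = {14: "WTP Event Request", 22: "Echo Request", 23: "Echo Response"}


class Frame(NamedTuple):
    payload_len: int  # transport-header length field: bytes after the 6-byte header
    msg_type: int     # control-header message type
    seq: int          # sequence number (the Echo Response here repeats the request's 0x21)
    elem_len: int     # bytes of message elements after the 8-byte control header
    session_id: int   # differs between the .189 and .190 tunnels in these dumps


def decode_frame(dump: str) -> Frame:
    """Decode a hex dump copied from the debug output (whitespace is ignored)."""
    data = bytes.fromhex(dump)
    if len(data) < 14:
        raise ValueError("frame shorter than the 6 + 8 byte headers")
    payload_len = int.from_bytes(data[2:4], "big")
    elem_len = int.from_bytes(data[8:10], "big")
    # Both length fields must agree with the dump size for the assumed layout to fit.
    if payload_len != len(data) - 6 or elem_len != payload_len - 8:
        raise ValueError("length fields do not match the dump size")
    return Frame(payload_len, data[6], data[7], elem_len,
                 int.from_bytes(data[10:14], "big"))


echo_189 = decode_frame("04 00 00 08 00 00 16 21 00 00 0c 51 20 37")
event_190 = decode_frame(
    "04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00 "
    "16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06 "
    "01 00 02 00 02 00 1b"
)
print(MSG_TYPES[echo_189.msg_type], hex(echo_189.session_id))    # Echo Request 0xc512037
print(MSG_TYPES[event_190.msg_type], hex(event_190.session_id))  # WTP Event Request 0xc512025
```

  Under this reading, the three WTP Event Requests to .190 are identical retransmissions with the same sequence number (0x30). The two tunnels also use different session IDs: 0x0c512037 towards .189, which the WTP Event Request sent to .189 after the slave tunnel dropped reuses, and 0x0c512025 towards .190.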

  III. Resolution

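  The debug output collected on site shows that, during registration, the AP had already sent a Join Request to the AC, but the AC never answered with a Join Response, so the AP's Join Request to the AC timed out.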
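  The request/response bookkeeping in this analysis can be automated. Each `Sent ... Request to <ip>` line is paired with a `Received ... Response from <ip>` line from the same peer. The following is a minimal sketch; the `unanswered` helper is hypothetical. It assumes that Join and Discovery messages are logged with the same wording as the Echo and WTP Event lines above, which this case does not show. It counts per (peer, message) pair and ignores timing, so a request at the very end of a capture also appears as unanswered.

```python
import re
from collections import Counter

# Patterns follow the wording of the AP debug lines in this case.
SENT = re.compile(r"Sent (?P<msg>[\w ]+?) Request to (?P<peer>[\d.]+)")
RCVD = re.compile(r"Received (?P<msg>[\w ]+?) Response from (?P<peer>[\d.]+)")


def unanswered(lines):
    """Return {(peer, message): requests_sent} for exchanges with no response at all."""
    sent, rcvd = Counter(), Counter()
    for line in lines:
        if m := SENT.search(line):
            sent[(m["peer"], m["msg"])] += 1
        elif m := RCVD.search(line):
            rcvd[(m["peer"], m["msg"])] += 1
    return {key: n for key, n in sent.items() if rcvd[key] == 0}


log = [
    "Sent Echo Request to 111.17.235.189 (Length: 14)",
    "Received Echo Response from 111.17.235.189 (Length: 14)",
    "Sent WTP Event Request to 111.17.235.190 (Length: 39)",
    "Sent WTP Event Request to 111.17.235.190 (Length: 39)",
    "Sent WTP Event Request to 111.17.235.190 (Length: 39)",
]
print(unanswered(log))  # {('111.17.235.190', 'WTP Event'): 3}
```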

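  At the same time, we found that the AP was sending Discovery Request messages to the AC and the AC was answering with Discovery Response messages. Discovery messages are broadcast, whereas Join messages are unicast. We therefore advised the site to check whether traffic between the AP and the AC follows the same path in both directions.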
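  The path check recommended here amounts to comparing the hop list from the AP to the AC with the hop list from the AC back to the AP. In practice, those lists could be collected by tracing the route from each end. A stateful firewall that sees only one direction of a unicast session will typically drop it, so any device that appears in only one direction is the first suspect. The following is a minimal sketch; the helper names and the device names in the example are hypothetical.

```python
def is_symmetric(forward, reverse):
    """True when the return path retraces the forward path hop by hop."""
    return list(reverse) == list(reversed(forward))


def one_way_hops(forward, reverse):
    """List the hops that appear in only one direction of the AP<->AC path."""
    fwd, rev = set(forward), set(reverse)
    return {
        "forward_only": [h for h in forward if h not in rev],
        "reverse_only": [h for h in reverse if h not in fwd],
    }


# Hypothetical hop lists: AP -> AC crosses fw-a, but AC -> AP returns via fw-b.
fwd = ["sw-access", "fw-a", "router-core"]
rev = ["router-core", "fw-b", "sw-access"]
print(is_symmetric(fwd, rev))   # False
print(one_way_hops(fwd, rev))   # {'forward_only': ['fw-a'], 'reverse_only': ['fw-b']}
```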

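  The customer's own investigation found that an earlier firewall cutover had left the routing misconfigured, so packets took different forward and return paths. After the routes were corrected, the problem was resolved.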


July 2015