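Case Study: Troubleshooting APs That Frequently Go Offline and Online at a Carrier Site
I. Symptom
At a carrier's school site, APs repeatedly went offline and came back online, which degraded the user experience and led to complaints.
II. Fault Analysis
1. One AP (yy-zyybfl1f_4) was selected, and debug output was captured on both the AC side and the AP side. The AP-side debug output contained no record of the AP going offline, whereas the AC-side debug output showed AP Name: yy-zyybfl1f_4 going offline repeatedly, each time with the reason: Neighbor Dead Timer Expire.
2. AC-side debug output: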
*Nov 13 08:48:23:043 2012 SDZIB-WLAN-AC79-WX6108-AC1 LWPS/7/Timer: [APID: 64] Neighbor-Dead timer refreshed
*Nov 13 08:48:23:043 2012 SDZIB-WLAN-AC79-WX6108-AC1 LWPS/7/Pkt_Send:
Sent Echo Response to 172.18.95.141 (Length: 14) // The AC replied to the AP with an Echo Response twice in a row, but the AP apparently did not receive the replies, so the AP went offline.
3. AP-side debug output:
*Nov 9 17:52:58:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent Echo Request to 111.17.235.189 (Length: 14)
04 00 00 08 00 00 16 21 00 00 0c 51 20 37
*Nov 9 17:52:58:613 2012 WA2610E-GNP LWPC/7/Pkt_Rcvd:
Received Echo Response from 111.17.235.189 (Length: 14) // The heartbeat exchange between the AP and the AC at 111.17.235.189 is normal.
04 00 00 08 00 00 17 21 00 00 0c 51 20 37
*Nov 9 17:52:58:613 2012 WA2610E-GNP LWPC/7/Timer:
Deleted Nbr-Dead Timer
*Nov 9 17:52:59:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.190 (Length: 39)
04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00
16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06
01 00 02 00 02 00 1b
*Nov 9 17:53:01:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.190 (Length: 39)
04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00
16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06
01 00 02 00 02 00 1b
*Nov 9 17:53:03:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
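Sent WTP Event Request to 111.17.235.190 (Length: 39) // The AP sent its WTP heartbeat request to the AC at 111.17.235.190 three times in a row and received no response from the AC, so the AP went offline.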
04 00 00 21 00 00 0e 30 00 19 0c 51 20 25 68 00
16 00 00 63 a2 01 2d 00 02 01 00 01 00 02 00 06
01 00 02 00 02 00 1b
%Nov 9 17:53:05:609 2012 WA2610E-GNP LWPC/6/LWPC_AP_DOWN:
Connection with AC 111.17.235.190 goes down by reason of Response Timer Expire. // The AP's connection to the AC at 111.17.235.190 is torn down.
*Nov 9 17:53:05:609 2012 WA2610E-GNP LWPC/7/FSM :
[Tunnel : Slave State : Run] AP LWAPP FSM machine TimeOut, result Timed out
*Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Timer:
Created Response-TO Timer
*Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.189 (Length: 30)
04 00 00 18 00 00 0e 21 00 10 0c 51 20 37 68 00
0d 00 00 63 a2 01 30 00 00 00 00 00 00 00
*Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Event:
[State : Run] Clear Context
*Nov 9 17:53:05:610 2012 WA2610E-GNP LWPC/7/Timer:
Deleted Session backup Timer
*Nov 9 17:53:05:611 2012 WA2610E-GNP LWPC/7/Event:
Slave tunnel down event notified to all registered modules // The link between the AP and the backup AC is down.
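The manual reading of the AP-side capture above can be approximated with a small script. The following is a minimal sketch, assuming the debug output keeps the two-line layout shown in this case (a timestamped header line followed by a "Sent ... Request to" or "Received ... Response from" line). The function names and the threshold of three unanswered requests are illustrative assumptions, not part of any vendor tool. As a simplification, any response from a peer clears all of that peer's outstanding requests.

```python
import re
from datetime import datetime

# Header line of a debug record as it appears in the capture, e.g.
# "*Nov 9 17:52:58:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:"
HEADER = re.compile(
    r"^[*%](?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{3})\s+(?P<year>\d{4})\s+"
    r"(?P<host>\S+)\s+(?P<module>\w+)/(?P<level>\d)/(?P<tag>\w+)\s*:"
)
IP = r"\d+\.\d+\.\d+\.\d+"
SENT = re.compile(rf"^Sent (?P<msg>.+?) Request to (?P<peer>{IP})")
RCVD = re.compile(rf"^Received (?P<msg>.+?) Response from (?P<peer>{IP})")

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def to_datetime(head):
    """Build a datetime from a matched header line (milliseconds kept)."""
    return datetime(int(head["year"]), MONTHS[head["mon"]], int(head["day"]),
                    int(head["h"]), int(head["m"]), int(head["s"]),
                    int(head["ms"]) * 1000)


def unanswered_requests(lines):
    """Return {peer_ip: [timestamps of requests with no later response]}."""
    pending = {}
    current_ts = None
    for raw in lines:
        line = raw.strip()
        head = HEADER.match(line)
        if head:
            current_ts = to_datetime(head)
            continue
        if current_ts is None:
            continue  # body line without a preceding header; skip it
        sent = SENT.match(line)
        if sent:
            pending.setdefault(sent["peer"], []).append(current_ts)
            continue
        rcvd = RCVD.match(line)
        if rcvd:
            # Simplification: one response clears everything owed by this peer.
            pending.pop(rcvd["peer"], None)
    return pending


def retry_gaps(timestamps):
    """Seconds between consecutive requests."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def suspects(lines, threshold=3):
    """Peers with at least `threshold` unanswered requests (assumed limit)."""
    return {peer: len(ts) for peer, ts in unanswered_requests(lines).items()
            if len(ts) >= threshold}


# Excerpt of the AP-side capture from this case (hex dumps omitted).
SAMPLE = """\
*Nov 9 17:52:58:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent Echo Request to 111.17.235.189 (Length: 14)
*Nov 9 17:52:58:613 2012 WA2610E-GNP LWPC/7/Pkt_Rcvd:
Received Echo Response from 111.17.235.189 (Length: 14)
*Nov 9 17:52:59:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.190 (Length: 39)
*Nov 9 17:53:01:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.190 (Length: 39)
*Nov 9 17:53:03:609 2012 WA2610E-GNP LWPC/7/Pkt_Send:
Sent WTP Event Request to 111.17.235.190 (Length: 39)
""".splitlines()

for peer, stamps in unanswered_requests(SAMPLE).items():
    print(peer, len(stamps), retry_gaps(stamps))
```

On the excerpt, the exchange with 111.17.235.189 is cleared by its Echo Response, while 111.17.235.190 is left with three unanswered requests sent 2 seconds apart. This matches the retransmission pattern that precedes the Response Timer Expire message in the capture.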
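III. Resolution
Analysis of the debug output sent from the site showed that, during registration, the AP had sent a Join Request to the AC but the AC never replied with a Join Response, so the AP's Join Request timed out.
At the same time, the AP's Discovery Request did receive a Discovery Response from the AC. The difference between the two exchanges is that Discovery messages are broadcast while Join messages are unicast, so the site was advised to check whether the forward and return paths between the AP and the AC were the same.
The customer's own investigation found that, after an earlier firewall cutover, the routing configuration was incorrect, which caused packets to take different forward and return paths. After the routes were corrected, the problem was resolved.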
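The path consistency check recommended above can be expressed as a simple comparison once the hop list for each direction has been collected, for example with traceroute run from both ends. The sketch below is illustrative only: the device names are hypothetical placeholders, not taken from the site. It assumes hops are identified by device name rather than by interface address, because each direction normally reports a different interface address of the same device.

```python
def is_symmetric(forward_hops, return_hops):
    """True when the return path visits the same devices in reverse order."""
    return list(forward_hops) == list(reversed(return_hops))


def one_way_hops(forward_hops, return_hops):
    """Devices seen in only one direction: (forward only, return only)."""
    fwd, ret = set(forward_hops), set(return_hops)
    return sorted(fwd - ret), sorted(ret - fwd)


# Hypothetical example: AP -> AC traffic crosses the firewall,
# but AC -> AP traffic is routed around it.
ap_to_ac = ["access-sw", "firewall", "core-rtr"]
ac_to_ap = ["core-rtr", "bypass-rtr", "access-sw"]

print(is_symmetric(ap_to_ac, ac_to_ap))
print(one_way_hops(ap_to_ac, ac_to_ap))
```

In this hypothetical layout the check reports an asymmetric path, with the firewall seen only on the forward path and the bypass router only on the return path. A device that appears in just one direction is the natural place to start reviewing the routing configuration.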
July 2015