What is Ping Packet Loss
Ping packet loss means that some ping packets are discarded in transit, for example because of overly long lines or network congestion. When ping packet loss occurs, first determine where in the network the packets are lost, then determine why they are lost, and finally fix the problem based on that cause.
To find the location, ping each segment of the path in turn. This narrows the fault down to a pair of directly connected network segments. To find the cause, use traffic statistics, which show exactly where packets are discarded and so point to the cause of the fault.
Ping packet loss has many possible causes, and they can be complicated, so several factors usually have to be weighed during fault location. This document analyzes common ping packet loss faults and groups them into the following categories:
Physical environment failure; network loop; ARP problem; ICMP problem.
Note that ping packet loss does not necessarily mean that network quality is poor. In some cases services run normally even though ping packets are lost. Keep the following two points in mind when analyzing ping packet loss:
Packets that the device forwards in hardware are forwarded very quickly and are normally not lost, for example when you ping a computer connected to a device port. Packets that must be processed by the CPU, for example when you ping an IP address on the device itself, can be lost when the CPU is busy.
To prevent network attacks from affecting the device, the device has a CPU protection function that discards ARP, ICMP, and other packets sent to the CPU in excess of the CPCAR (Control Plane Committed Access Rate) value. This can cause ping packet loss, but it does not affect the normal operation of services.
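The effect of a committed rate on pings sent to the CPU can be illustrated with a simple token bucket. This is only a sketch of the general idea, not the device's actual CPCAR implementation: the function name, the packets-per-second unit, and the burst size are all assumptions made for illustration.

```python
def simulate_rate_limit(arrival_times, rate_pps, burst):
    """Count packets dropped by a simple token bucket.

    arrival_times: sorted packet arrival times in seconds.
    rate_pps: tokens (packets) added per second (hypothetical committed rate).
    burst: bucket capacity in packets (hypothetical).
    """
    tokens = float(burst)
    last = arrival_times[0] if arrival_times else 0.0
    dropped = 0
    for t in arrival_times:
        # Refill the bucket for the time elapsed since the previous packet.
        tokens = min(float(burst), tokens + (t - last) * rate_pps)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0   # packet is accepted
        else:
            dropped += 1    # packet exceeds the committed rate
    return dropped


# 100 pings at 10 per second stay under a 20 pps limit.
slow = [i * 0.1 for i in range(100)]
# 100 pings at 100 per second exceed the same limit.
fast = [i * 0.01 for i in range(100)]

dropped_slow = simulate_rate_limit(slow, rate_pps=20, burst=5)
dropped_fast = simulate_rate_limit(fast, rate_pps=20, burst=5)
print(dropped_slow, dropped_fast)
```

In this model the slow ping run loses nothing while the fast run loses most of its packets, even though the device and the network are healthy in both cases.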
Identify the Causes of Ping Packet Loss
This article uses the example network shown in Figure 1 above to show how to locate a ping packet loss fault.
Ping Packet Loss Fault Phenomenon
C:\Users> ping -n 100 192.168.4.41
Pinging 192.168.4.41 with 32 bytes of data:
Request timed out.
Request timed out.
Reply from 192.168.4.41: bytes=32 time<1ms TTL=128
…
Reply from 192.168.4.41: bytes=32 time<1ms TTL=128
Ping statistics for 192.168.4.41:
    Packets: Sent = 100, Received = 80, Lost = 20 (20% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
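When many ping runs have to be compared, the statistics line can be extracted automatically. The following is a minimal sketch that assumes the English Windows format shown above; the function name and the returned dictionary keys are illustrative choices.

```python
import re

# Matches the "Packets:" summary line of the Windows ping output.
STATS_RE = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_stats(output):
    """Return sent/received/lost counts and the loss percentage, or None."""
    match = STATS_RE.search(output)
    if match is None:
        return None
    sent, received, lost = (int(group) for group in match.groups())
    loss_pct = 100.0 * lost / sent if sent else 0.0
    return {"sent": sent, "received": received, "lost": lost, "loss_pct": loss_pct}


sample = "Packets: Sent = 100, Received = 80, Lost = 20 (20% loss),"
stats = parse_ping_stats(sample)
print(stats)
```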
Ping Packet Loss Fault Location
Locate the fault based on the possible cause of the fault. The fault location method is as follows:
1. Configure Ping multi-packets.
To reproduce the packet loss and make troubleshooting easier, send ping packets continuously. Use the -c count parameter of the ping command to send multiple ping packets. The Windows ping command used in the example above takes -n count instead.
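The count option differs between platforms: the Windows ping command in the example above uses -n, while ping on Linux and macOS uses -c. A small helper can hide this difference when the test is scripted from a PC. This is a sketch only; the function name is hypothetical.

```python
import platform


def build_ping_command(host, count, system=None):
    """Build a ping command line that sends `count` packets to `host`.

    `system` defaults to the local platform name reported by Python.
    """
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


print(build_ping_command("192.168.4.41", 100, system="Windows"))
print(build_ping_command("192.168.4.41", 100, system="Linux"))
```

The returned list can be passed to subprocess.run(..., capture_output=True, text=True) to execute the ping and collect its output.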
2. Reduce the scope of the fault.
When packets are lost while you ping 192.168.4.41 directly from the PC, it is very difficult to determine the cause of the fault right away. First narrow down the scope of the fault by pinging SwitchA, SwitchB, SwitchC, and SwitchD from the PC one by one. The results show which network segment has the fault. In this example, if pinging SwitchA from the PC shows no loss but pinging SwitchB does, you can preliminarily conclude that the packet loss occurs on the directly connected segment between SwitchA and SwitchB.
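The narrowing step can be sketched as a search for the first device on the path that shows loss. The path order (PC, SwitchA, SwitchB, SwitchC, SwitchD) and the loss percentages below are assumptions for illustration, since Figure 1 is not reproduced here.

```python
def first_lossy_segment(results, source="PC"):
    """Find the first segment on the path where loss appears.

    results: list of (device_name, loss_percent), ordered from the device
    nearest the source to the farthest one.
    Returns (previous_device, device) or None if no device shows loss.
    """
    previous = source
    for name, loss_percent in results:
        if loss_percent > 0:
            return (previous, name)
        previous = name
    return None


# Hypothetical per-device ping results gathered from the PC.
results = [("SwitchA", 0), ("SwitchB", 20), ("SwitchC", 20), ("SwitchD", 20)]
segment = first_lossy_segment(results)
print(segment)
```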
3. Configure traffic statistics.
Narrowing the fault scope locates the fault between SwitchA and SwitchB. To pinpoint the fault further, configure traffic statistics on SwitchA and SwitchB and observe where packets are lost. For the specific traffic statistics configuration method, refer to the manual of each device.
4. Analyze the statistical results.
Ping SwitchB continuously from SwitchA.
If the number of packets leaving SwitchA exceeds the number of packets entering SwitchB, packets are lost on the transmission link. Handle the problem as described in the analysis of ping packet loss caused by physical link failure later in this document.
If the number of packets leaving SwitchA equals the number of packets entering SwitchB, but the number of packets leaving SwitchB is less than the number entering it, packets are lost on SwitchB. Possible causes of packet loss on SwitchB include network loops and ICMP problems.
Log in to the device and check whether CPU and interface utilization is high and whether MAC address drift occurs. If either is found, handle the problem as described in the analysis of ping packet loss caused by network loops later in this document.
Log in to the device and check whether ICMP packets are discarded and whether the ICMP packet rate limit is configured too low. If either is found, handle the problem as described in the analysis of ping packet loss caused by ICMP problems later in this document.
If the number of packets leaving SwitchA is less than the number of packets sent by the ping command, packets are lost on SwitchA. Possible causes of packet loss on SwitchA include network loops and ARP problems.
Log in to the device and check whether CPU and interface utilization is high and whether MAC address drift occurs. If either is found, handle the problem as described in the analysis of ping packet loss caused by network loops later in this document.
Log in to the device and check whether ARP packets are discarded. If they are, handle the problem as described in the analysis of ping packet loss caused by ARP problems later in this document.
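The three counter comparisons above can be written as a small decision function. This is a sketch: the function and parameter names are hypothetical, and the counter values would come from whatever traffic statistics feature the devices provide.

```python
def locate_loss(sent_by_ping, out_a, in_b, out_b):
    """Apply the counter comparisons and return every location that matches.

    sent_by_ping: packets sent by the ping command on SwitchA.
    out_a: packets counted leaving SwitchA.
    in_b: packets counted entering SwitchB.
    out_b: packets counted leaving SwitchB.
    """
    findings = []
    if out_a < sent_by_ping:
        findings.append("SwitchA")   # check network loops and ARP problems
    if out_a > in_b:
        findings.append("link")      # check the physical link
    if out_b < in_b:
        findings.append("SwitchB")   # check network loops and ICMP problems
    return findings


print(locate_loss(100, 100, 80, 80))    # loss on the link
print(locate_loss(100, 100, 100, 80))   # loss on SwitchB
print(locate_loss(100, 80, 80, 80))     # loss on SwitchA
```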
5. Analysis of ping packet loss caused by physical link failure
Use the fault location method described above to determine whether the packet loss is caused by a physical link failure. Common causes of physical link failures are:
A faulty computer network card; an abnormal device interface; a cable connector that is loose or in poor contact; a network cable that is too long or damaged; an optical fiber that is bent too sharply; optical power received or sent by the optical module that is too low; and inconsistent negotiation on electrical ports, for example one end using auto-negotiation and the other end not.
In real environments, physical environment problems can also cause ping packet loss, for example a device that is not grounded so that static electricity cannot discharge, or a damaged fan that causes the device to overheat.
Many physical link failures can be found by inspection, for example excessive fiber bending, overly long physical connection lines, or abnormal indicator lights on devices or computer network cards. The usual solution is to replace the faulty physical component, after which the fault is cleared.
6. Analysis of ping packet loss caused by network loop failure
In Ethernet switching networks, redundant links are usually used to perform link backup and improve network reliability. However, using redundant links can cause loops on the switching network, leading to broadcast storms and MAC address table instability, which can cause poor user communication quality or even communication interruption. Network loops can cause high CPU and port utilization on devices, and Ping packets can be discarded.
A device in a network that contains a loop responds relatively slowly. To determine whether a loop exists, proceed as follows:
1. Use the display interface brief | include up command to view the traffic on all interfaces that are up. Run the command several times and compare the results. On an interface affected by a loop, the InUti and OutUti values increase steadily, possibly to nearly 100%, far exceeding the service traffic.
First query:
<SwitchA> display interface brief | include up
…
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/2 up up 0.56% 0.56% 0 0
…
Second query:
<SwitchA> display interface brief | include up
…
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/1 up up 76% 76% 0 0
…
2. Determine whether MAC address drift (also called MAC address flapping) occurs on the switch. Any of the following methods can be used:
Run the display trapbuffer command and check for MAC address flapping log entries.
Run the mac-address flapping detection command to enable MAC address flapping detection, then run the display mac-address flapping record command to check whether MAC address flapping has occurred.
Run the display mac-address command several times and compare the results. If the same MAC address is learned on different interfaces of the switch, MAC address drift exists.
3. Check CPU utilization.
Use the display cpu-usage command to check the CPU usage. A network loop can cause CPU usage to be high all the time, and Ping packets are discarded before they can be processed.
The solution to this ping packet loss problem is to eliminate network loops. You can deploy RRPP, SEP, Smart Link, STP/RSTP/MSTP, and other protocols on the device to handle loops.
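The comparison of successive display interface brief results in step 1 can be automated once the output has been captured as text. The sketch below assumes the column layout shown above; the 50% threshold, the function names, and the sample values are illustrative assumptions, not recommended settings.

```python
def parse_utilization(output):
    """Map interface name -> (InUti, OutUti) in percent."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have percentages in the 4th and 5th columns.
        if len(fields) >= 5 and fields[3].endswith("%") and fields[4].endswith("%"):
            table[fields[0]] = (float(fields[3][:-1]), float(fields[4][:-1]))
    return table


def suspect_interfaces(first, second, threshold=50.0):
    """Interfaces whose utilization rose between two queries and is now high."""
    suspects = []
    for name, (in_now, out_now) in second.items():
        in_before, out_before = first.get(name, (0.0, 0.0))
        rising = in_now > in_before or out_now > out_before
        if rising and max(in_now, out_now) >= threshold:
            suspects.append(name)
    return suspects


# Hypothetical captures of two successive queries.
first_query = """Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/1 up up 0.56% 0.56% 0 0
GigabitEthernet0/0/2 up up 0.56% 0.56% 0 0"""
second_query = """Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/1 up up 76% 76% 0 0
GigabitEthernet0/0/2 up up 0.56% 0.56% 0 0"""

suspects = suspect_interfaces(parse_utilization(first_query),
                              parse_utilization(second_query))
print(suspects)
```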
7. Analysis of ping packet loss caused by ARP problem
Use the fault location method described above to determine whether the ping packet loss is caused by an ARP problem. ARP problems typically show the following pattern: packets are lost at first because ARP learning fails; once the ARP entry is learned, no packets are lost for a period of time (the ARP entry aging time); when the entry ages out and ARP learning fails again, packets are lost again.
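One way to check a long ping run for this pattern is to group consecutive timeouts into bursts and look at the spacing between them. The following is a rough sketch, assuming the ping results have already been reduced to a list of booleans (True for a reply, False for a timeout); the function name and the sample data are hypothetical.

```python
def loss_bursts(replies):
    """Group consecutive timeouts into bursts.

    replies: list of booleans in send order, True = reply, False = timeout.
    Returns a list of (start_index, length) tuples, one per burst.
    """
    bursts = []
    start = None
    for index, ok in enumerate(replies):
        if not ok and start is None:
            start = index                      # a new burst begins
        elif ok and start is not None:
            bursts.append((start, index - start))
            start = None                       # the burst has ended
    if start is not None:
        bursts.append((start, len(replies) - start))
    return bursts


# Hypothetical run: two timeouts, eight replies, repeated twice.
replies = [False, False] + [True] * 8 + [False, False] + [True] * 8
bursts = loss_bursts(replies)
print(bursts)
```

If the bursts start at roughly regular intervals, and that interval is comparable to the ARP aging time configured on the device, the pattern is consistent with the ARP symptom described above. It is not proof on its own and should be confirmed with the ARP statistics on the device.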
Common ARP problems include the following:
The device is configured with ARP security functions, such as ARP Miss source suppression and ARP rate suppression, which slow down ARP learning and cause ping packet loss.
The device is attacked by ARP packets, and the number of ARP packets sent to the CPU exceeds the CPCAR value, so some ARP packets are discarded and ping packets are lost.
Common problems and solutions are as follows:
Run the display arp packet statistics command to check whether ARP packets are discarded, then analyze the ARP security configuration on the device to identify the cause. If ARP security is the cause, reconfigure it so that the device can process ARP packets normally.
Run the display cpu-defend statistics command to view the statistics on ARP packets sent to the CPU and check whether any are discarded.
If packets are discarded, check whether the device is under ARP attack, configure ARP security correctly to prevent attacks, and increase the CPCAR value for ARP packets. A configuration sample follows:
<SwitchA> system-view
[SwitchA] cpu-defend policy arp
[SwitchA-cpu-defend-policy-arp] car packet-type arp-reply cir 32
Warning: Improper parameter settings may affect the stable operation of the system. Use this command with the assistance of Huawei engineers. Continue? [Y/N]:y
[SwitchA-cpu-defend-policy-arp] car packet-type arp-request cir 32
Warning: Improper parameter settings may affect the stable operation of the system. Use this command with the assistance of Huawei engineers. Continue? [Y/N]:y
[SwitchA-cpu-defend-policy-arp] quit
[SwitchA] cpu-defend-policy arp global
8. Analysis of ping packet loss caused by ICMP problem
Common symptoms of ICMP problems:
When you ping a device, packets are lost if pings are sent at a high rate, but not if they are sent slowly.
Packets are lost at regular intervals when you ping with large packets.
When you ping a device, a few packets succeed and then the ping fails; about two minutes later the ping succeeds again for a few packets and then fails again.
There are three common ICMP problems:
The device is attacked by ICMP packets. The number of ICMP packets sent to the CPU exceeds the CPCAR value, so some ICMP packets are discarded and ping packets are lost.
The device is configured with ICMP attack prevention. ICMP packets that exceed the rate limit are discarded and ping packets are lost.
The device is configured with an ICMP rate limit. ICMP packets that exceed the rate limit are discarded and ping packets are lost.
Common problems and solutions are as follows:
1. Run the display icmp statistics and display anti-attack statistics icmp-flood commands to check whether any ICMP packets are discarded.
If packets are discarded, reconfigure ICMP security so that the device can process ICMP packets normally.
2. Check the configuration of the icmp rate-limit total threshold threshold-value command to find the current threshold of the ICMP rate limit.
If the threshold is too small, increase it with the icmp rate-limit total threshold threshold-value command so that more ICMP packets are allowed through. A configuration sample follows:
<SwitchA> system-view
[SwitchA] icmp rate-limit enable
[SwitchA] icmp rate-limit total threshold 500
3. Run the display cpu-defend statistics packet-type icmp command to view the statistics on ICMP packets sent to the CPU and check whether any are discarded.
If packets are discarded, check whether the device is under ICMP attack, configure ICMP security correctly to prevent attacks, and increase the CPCAR value for ICMP packets. A sample CPCAR configuration for ICMP packets follows:
<SwitchA> system-view
[SwitchA] cpu-defend policy icmp
[SwitchA-cpu-defend-policy-icmp] car packet-type icmp cir 256
Warning: Improper parameter settings may affect the stable operation of the system. Use this command with the assistance of Huawei engineers. Continue? [Y/N]:y
[SwitchA-cpu-defend-policy-icmp] quit
[SwitchA] cpu-defend-policy icmp global
You can also run the icmp-reply fast command to enable the fast ping reply function, which resolves the problem of the CPU discarding ICMP packets.