5 Key TCP Metrics for Performance Monitoring

October 12, 2015 by baoz

Network and Application Quality Assessment: 5 Key Elements

Application and network performance monitoring tools offer several metrics and counters intended to give you an indication of the health of your infrastructure and applications. With the wealth of data and options available in these tools, it can be difficult to know what’s important and what can safely be ignored. In this post, we’re going to identify some of the TCP metrics that will provide a baseline for your monitoring strategy using on-the-wire network data.

THE METRICS

  1. Connection Setup Time
  2. Server Reset Rate
  3. Application Response Time
  4. Retransmission Rate
  5. Network Round Trip Time

Why were these metrics chosen? First, they are accessible: most modern packet-based performance management tools already track them. Second, they offer a broad indication of the health of the server, network, and application infrastructure.

That’s not to say that this is an exhaustive list. Depending on your architecture and application, there will be several other metrics you should be watching. These items are intended to be the building blocks on which you can develop a fully matured monitoring infrastructure.

Connection Setup Time

Indicates:

  • Server Health
  • Network Health

Definition:  Connection setup time is the amount of time it takes for the TCP three-way handshake to complete.
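
To make the definition concrete, here is a minimal Python sketch that computes connection setup time from already-decoded packets. The `Packet` record, its fields, and `connection_setup_times` are hypothetical; a real tool would fill them in from a capture library. It assumes the clock starts at the client's first SYN and stops at the client's ACK that completes the handshake, and it ignores sequence numbers and port reuse.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical decoded-packet record; flags holds TCP flag letters,
# e.g. "S" (SYN), "SA" (SYN-ACK), "A" (ACK), "PA" (PSH-ACK).
@dataclass
class Packet:
    ts: float   # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    flags: str

# (client IP, client port, server IP, server port)
FlowKey = Tuple[str, int, str, int]

def connection_setup_times(packets: Iterable[Packet]) -> Dict[FlowKey, float]:
    """Seconds from the client's first SYN to the ACK that completes the handshake."""
    syn_ts: Dict[FlowKey, float] = {}
    synack_seen: Set[FlowKey] = set()
    result: Dict[FlowKey, float] = {}
    for p in packets:
        if p.flags == "S":
            # Keep the first SYN so SYN retransmissions count toward setup time.
            syn_ts.setdefault((p.src, p.sport, p.dst, p.dport), p.ts)
        elif p.flags == "SA":
            # The SYN-ACK travels server -> client, so reverse the key.
            key = (p.dst, p.dport, p.src, p.sport)
            if key in syn_ts:
                synack_seen.add(key)
        elif p.flags in ("A", "PA"):
            # Client's ACK (possibly carrying first data) completes the handshake.
            key = (p.src, p.sport, p.dst, p.dport)
            if key in synack_seen and key not in result:
                result[key] = p.ts - syn_ts[key]
    return result
```

Flows whose handshake never completes are simply absent from the result. Note that the measured value depends on where the capture is taken, since every timestamp is relative to the capture point.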

When this metric spikes, it can indicate network slowness or an increase in processing time within a server’s TCP stack. It is best monitored as a deviation from normal: we’re most interested in servers whose connection setup time is higher than that of their peers, and we also want to be notified when a server’s connection setup time rises dramatically above its own normal level.
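
One way to apply this deviation-from-normal idea is sketched below in Python. It is an illustration under stated assumptions, not a prescribed method: the hypothetical `above_own_baseline` flags a server whose current setup time exceeds its own historical mean by more than `z` standard deviations, and the hypothetical `above_peers` flags servers far above the median of all servers. The default thresholds (`z=3.0`, `factor=2.0`) are arbitrary placeholders that would need tuning.

```python
import statistics
from typing import Dict, List

def above_own_baseline(history: Dict[str, List[float]],
                       current: Dict[str, float],
                       z: float = 3.0) -> List[str]:
    """Servers whose current value is more than z standard deviations above their own mean."""
    flagged = []
    for server, value in current.items():
        samples = history.get(server, [])
        if len(samples) < 2:
            continue  # not enough history to form a baseline
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)
        # max() guards against a perfectly flat history (stdev == 0).
        if value > mean + z * max(stdev, 1e-9):
            flagged.append(server)
    return flagged

def above_peers(current: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Servers whose value exceeds `factor` times the median across all servers."""
    if not current:
        return []
    median = statistics.median(current.values())
    return [server for server, value in current.items() if value > factor * median]
```

The two checks answer different questions: the first catches a single server degrading relative to itself, the second catches a server that has always been slower than the rest of the pool.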

Server Reset Rate

Indicates:

  • Server Health
  • Application Health

Definition:  Server reset rate is the number of TCP resets (RST packets) sent by the server per second.

A TCP reset (RST) is the mechanism a TCP/IP stack uses to close a connection immediately. In principle, an RST indicates some kind of error condition in the TCP stack, such as the application aborting a connection, or the server receiving traffic for a connection it doesn’t know has been established. Unfortunately, this metric can be a bit of a red herring: a server also sends a TCP reset when traffic arrives on a port that isn’t listening, so a port scan against the server, for example, can drive up the reset count.

This is a metric to watch over the long term. When there is a dramatic increase in TCP resets, investigate both the cause and the impact.
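
A minimal Python sketch of the per-second counter is shown below. The `Packet` record and `server_resets_per_second` are hypothetical. The optional `skip_unopened` flag is an assumed heuristic for the port-scan problem described above: it ignores resets on connections for which the server was never seen sending a SYN-ACK, which is what a SYN to a closed port typically looks like.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical decoded-packet record; flags holds TCP flag letters, e.g. "SA", "R", "RA".
@dataclass
class Packet:
    ts: float   # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    flags: str

def server_resets_per_second(packets: Iterable[Packet], server: str,
                             skip_unopened: bool = False) -> Dict[int, int]:
    """Count RSTs sent by `server`, bucketed by whole second of capture time."""
    opened: Set[Tuple[str, int, int]] = set()  # (client IP, client port, server port)
    counts: Counter = Counter()
    for p in packets:
        if p.src != server:
            continue  # only resets *sent by* the server are counted
        flow = (p.dst, p.dport, p.sport)
        if "S" in p.flags and "A" in p.flags:
            opened.add(flow)  # server accepted this connection
        elif "R" in p.flags:
            if skip_unopened and flow not in opened:
                continue  # likely a closed port, e.g. a scan
            counts[int(p.ts)] += 1
    return dict(counts)
```

The heuristic has a blind spot: connections that were already open when the capture started also look "unopened", so it is worth comparing both counts rather than relying on the filtered one alone.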

Application Response Time

Indicates:

  • Application Health

Definition:  Application response time is the time between a client’s data request and the server’s first response packet that carries a non-zero payload.

The idea behind this metric is to measure how long a server takes to answer a data request with application data. It gives us an idea of how quickly the application is responding to requests; when application response time increases, the implication is that the application is slowing down.

It’s key to ensure that the initial zero-payload TCP ACK is not counted in this calculation. We want to measure the time it takes the application to return the payload the client requested, so we are only interested in the first server response that carries a payload.
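
The rule above can be sketched in Python for a single TCP flow. The `Packet` record and `app_response_times` are hypothetical, and the sketch assumes packets have already been grouped into one flow and tagged by direction. It measures from the most recent client data packet, on the assumption that the server cannot answer until the whole request has arrived; some tools measure from the first request packet instead.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical per-flow packet record.
@dataclass
class Packet:
    ts: float          # capture timestamp in seconds
    from_client: bool  # direction within one already-isolated TCP flow
    payload_len: int   # TCP payload bytes (0 for a bare ACK)

def app_response_times(packets: Iterable[Packet]) -> List[float]:
    """Time from a client request to the server's first packet that carries payload."""
    times: List[float] = []
    last_request_ts: Optional[float] = None
    for p in packets:
        if p.from_client:
            if p.payload_len > 0:
                last_request_ts = p.ts  # latest packet of the pending request
        elif p.payload_len > 0 and last_request_ts is not None:
            times.append(p.ts - last_request_ts)
            last_request_ts = None  # wait for the next request
        # Server packets with zero payload (bare ACKs) are deliberately ignored.
    return times
```

Because the pending request is cleared after the first payload-carrying server packet, the remaining packets of a multi-packet response do not produce extra samples.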

Retransmission Rate

Indicates:

  • Network Health

Definition:  A retransmission is the resending of packets that have been damaged or lost. Retransmission rate is the number of retransmissions over a period of time, typically expressed as retransmissions per second.

Packet retransmissions are a necessary and healthy part of modern TCP networks. They occur when the sender does not receive an acknowledgement for a packet it sent, for example because the packet or its ACK was lost. Some retransmissions are expected even on a healthy network, but a dramatic increase in their volume indicates delays and packet loss. A sustained increase in retransmissions should be investigated: it could indicate a saturated link or packet discards in a switch. A large number of retransmissions will hurt the performance of your applications.
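
As a rough illustration, retransmissions can be counted by noticing a data segment whose flow and sequence number have already been seen. This Python sketch uses a hypothetical `Packet` record and `retransmissions_per_second`, and it is deliberately simplistic: it ignores sequence-number wraparound and retransmissions that are repacketized with a different starting sequence number. Real analyzers use more thorough logic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical decoded-packet record.
@dataclass
class Packet:
    ts: float          # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    seq: int           # TCP sequence number
    payload_len: int   # TCP payload bytes

def retransmissions_per_second(packets: Iterable[Packet]) -> Dict[int, int]:
    """Count data segments re-sent with an already-seen sequence number, per second."""
    seen: Set[Tuple[str, int, str, int, int]] = set()
    counts: Counter = Counter()
    for p in packets:
        if p.payload_len == 0:
            continue  # pure ACKs carry no data, so they are never "retransmitted" here
        key = (p.src, p.sport, p.dst, p.dport, p.seq)
        if key in seen:
            counts[int(p.ts)] += 1
        else:
            seen.add(key)
    return dict(counts)
```

Dividing the total by the capture duration gives an average rate, while the per-second buckets make a sustained increase easier to spot.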

Network Round Trip Time

Indicates:

  • Network Health
  • Server Health

Definition:  The amount of time taken by a server to respond to a packet sent by the client.

Network RTT (Round Trip Time) is a great indicator of the health of the network, as well as of the health and responsiveness of your server’s TCP/IP stack. It is the time, typically in milliseconds, that it takes a server to respond to a packet sent by a client. It can be measured either during the TCP handshake, or continually throughout the capture by timing the server’s empty (zero-payload) ACK responses.
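
Both measurement points can be sketched in Python for a single flow. The `Packet` record and `rtt_samples` are hypothetical. The sketch assumes the capture is taken near the client; captured near the server, the SYN to SYN-ACK gap shrinks to little more than the server’s own processing time. It also ignores sequence-number wraparound and the ambiguity of ACKs that follow a retransmission.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-flow packet record.
@dataclass
class Packet:
    ts: float          # capture timestamp in seconds
    from_client: bool
    flags: str         # TCP flag letters, e.g. "S", "SA", "A", "PA"
    seq: int
    ack: int
    payload_len: int

def rtt_samples(packets: Iterable[Packet]) -> List[float]:
    """RTT samples from the handshake (SYN -> SYN-ACK) and from empty server ACKs."""
    samples: List[float] = []
    syn_ts: Optional[float] = None
    pending: List[Tuple[int, float]] = []  # (ack number that covers it, send time)
    for p in packets:
        if p.from_client:
            if p.flags == "S":
                syn_ts = p.ts
            elif p.payload_len > 0:
                pending.append((p.seq + p.payload_len, p.ts))
        elif p.flags == "SA":
            if syn_ts is not None:
                samples.append(p.ts - syn_ts)  # handshake RTT
                syn_ts = None
        elif "A" in p.flags and p.payload_len == 0:
            # Empty ACK: sample against the newest client segment it fully covers.
            covered = [ts for end, ts in pending if end <= p.ack]
            if covered:
                samples.append(p.ts - covered[-1])
                pending = [(end, ts) for end, ts in pending if end > p.ack]
    return samples
```

Taking the newest covered segment means a delayed ACK that covers two segments yields one sample rather than two, although the ACK delay itself still inflates that sample.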

Jasper Bongertz has written a great article on measuring RTT with Wireshark. He takes a fantastic deep dive into how RTT is calculated, why the capture location matters, and how RTT affects your network.

Tying it All Together

Network traffic monitoring provides great insight into the performance of your applications. While these metrics provide a good starting point, each application should be reviewed individually and a customized monitoring strategy developed for it. On-the-wire monitoring is a great tool, and should be used in conjunction with other monitoring strategies such as synthetic monitoring, SNMP, up/down checks, and agent-based tools. The combination of these strategies, tailored to your infrastructure and business needs, is the foundation of a solid and mature monitoring and performance assurance architecture.
