What are the effects of network latency and jitter on an H.323 video call?
Network Latency
An H.323 call uses four separate data streams: two half-duplex video streams and two half-duplex audio streams. In other words, each endpoint sends one set of video and audio data streams to the other. When discussing network latency as it relates to the operation of H.323, there are three general categories to consider:
- End-to-end latency in a given direction. This category addresses the total transit time for the data of a given stream to arrive at the remote endpoint. Ideally this transit time should not exceed 300 milliseconds. The average packet size of a video stream is very different from that of an audio stream: video packets are usually large (800-1500 bytes), while audio packets are generally small (480 bytes or less). This means that the average transit time for an audio stream can be less than that for a video stream if an intervening router or bridge prioritizes smaller packets over larger ones when it encounters network congestion. One can use "ping", with its packet-size and flood options, and "tracert" as simple, convenient tools to estimate average network transit times. Note that ping reports round-trip time rather than one-way transit time.
- Intra-stream latency. This category addresses latency variation (jitter) within a given data stream: inter-packet delays that exceed the normal transit time by more than an additional 30-35 milliseconds in a 30 FPS stream, or 60 milliseconds in a 15 FPS stream. For example, consider a data stream in a 30 FPS H.323 session with an average transit time of 115 milliseconds. If a single packet in this stream takes 145 milliseconds or more (relative to a prior packet), it can cause a receive underrun at the receiving endpoint, potentially producing blocky, jittery video or undesirable audio artifacts. Intra-stream latencies can also cause inter-stream latencies, which are discussed next.
- Inter-stream latency. This category addresses the relative latency between the audio and video data streams, that is, the case where the average transit times of the two streams differ from each other at a given point in time. The tolerance for this difference is not symmetrical, because the human brain already compensates for audio latency relative to video. As a result, an audio stream that arrives at an endpoint 30 or more milliseconds ahead of its video stream counterpart(s) will produce detectable lip-synchronization problems for most participants. An audio stream that arrives later than its associated video stream has a slightly higher tolerance: 40 or more milliseconds before the loss of audio and video synchronization becomes generally detectable.
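
The three categories above can be turned into simple checks against measured transit times. The following is a minimal sketch in Python, not part of any H.323 stack: the function names, the constants, and the choice to compare each packet against a caller-supplied baseline (the stream's normal transit time) are all assumptions made for illustration. The thresholds are the figures quoted above, using 30 milliseconds as the lower bound of the 30-35 millisecond range.

```python
# Illustrative thresholds taken from the text above (all in milliseconds).
# Names and structure are hypothetical; nothing here comes from an H.323 API.
END_TO_END_LIMIT_MS = 300.0
INTRA_STREAM_LIMIT_MS = {30: 30.0, 15: 60.0}  # keyed by frames per second
AUDIO_EARLY_LIMIT_MS = 30.0   # audio ahead of video
AUDIO_LATE_LIMIT_MS = 40.0    # audio behind video


def end_to_end_ok(transit_ms: float) -> bool:
    """True if one-way transit time is within the preferred 300 ms."""
    return transit_ms <= END_TO_END_LIMIT_MS


def late_packets(transit_ms: list[float], baseline_ms: float, fps: int) -> list[int]:
    """Return indices of packets whose transit time exceeds the stream's
    normal (baseline) transit time by the limit for the given frame rate.
    Only the 30 and 15 FPS limits from the text are defined."""
    limit = INTRA_STREAM_LIMIT_MS[fps]
    return [i for i, t in enumerate(transit_ms) if t - baseline_ms >= limit]


def lip_sync_status(audio_transit_ms: float, video_transit_ms: float) -> str:
    """Classify audio/video skew using the asymmetric tolerances."""
    skew = video_transit_ms - audio_transit_ms  # positive: audio arrives first
    if skew >= AUDIO_EARLY_LIMIT_MS:
        return "audio_early"
    if -skew >= AUDIO_LATE_LIMIT_MS:
        return "audio_late"
    return "in_sync"


if __name__ == "__main__":
    # The 30 FPS example from the text: 115 ms average, one packet at 145 ms.
    samples = [115.0, 118.0, 145.0, 112.0]
    print(end_to_end_ok(115.0))
    print(late_packets(samples, baseline_ms=115.0, fps=30))
    print(lip_sync_status(audio_transit_ms=100.0, video_transit_ms=130.0))
```

In practice the baseline would come from a running average of recent packets, and one-way transit times require synchronized clocks or an estimate derived from round-trip measurements; both are left out of this sketch.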