QOS, an acronym for Quality of Service, is used to allocate portions of the available bandwidth to selected applications or types of traffic. It is an unfortunate acronym because the word Quality seems to imply a high caliber of service, when in practice QOS comes into play at times of network congestion, when network performance is degraded.
QOS is a broad term that covers many different types of network traffic manipulation. Most commonly it is used to allocate a portion of the bandwidth to selected applications, thus providing those applications with some degree of connectivity when congestion is experienced. This is achieved by queuing packets during congestion and releasing (or dropping) packets from each queue based on a predetermined policy.
QOS Models
QOS can be broadly categorized into two models: IntServ (Integrated Services, RFC 1633) and DiffServ (Differentiated Services, RFC 2475). Whilst both models attempt to solve the same problem, they are implemented in different ways.
IntServ was designed with real-time applications such as voice over IP and video in mind. It works by reserving network resources using RSVP (Resource Reservation Protocol). Every switch and router between the source and destination needs to be RSVP-enabled in order for this model to be effective. The biggest problems with IntServ are that every device on the path must keep reservation state for each flow, and that bandwidth reserved for a flow is no longer available for other reservations, even if the reserved flow is not currently transmitting anything. This makes the IntServ model severely restrictive in terms of scalability.
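The reservation idea can be illustrated with a small admission-control sketch. This is not RSVP itself (there are no PATH or RESV messages here); the `Link` class, the `reserve_path` helper, the flow names and the capacities are all hypothetical and only show why every hop has to hold per-flow state.

```python
class Link:
    """One hop on the path, holding per-flow reservation state."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.reservations = {}  # flow id -> reserved kb/s

    def reserved_kbps(self):
        return sum(self.reservations.values())

    def reserve(self, flow_id, kbps):
        # Admission control: reserved bandwidth counts against capacity
        # whether or not the flow is currently sending anything.
        if self.reserved_kbps() + kbps > self.capacity_kbps:
            return False
        self.reservations[flow_id] = kbps
        return True


def reserve_path(path, flow_id, kbps):
    """Reserve on every hop, or on none if any hop refuses."""
    accepted = []
    for link in path:
        if not link.reserve(flow_id, kbps):
            # Roll back the hops that had already accepted.
            for done in accepted:
                del done.reservations[flow_id]
            return False
        accepted.append(link)
    return True


# Hypothetical two-hop path: a 1000 kb/s link followed by a 512 kb/s link.
path = [Link(1000), Link(512)]
first = reserve_path(path, "call-1", 300)   # fits on both hops
second = reserve_path(path, "call-2", 300)  # refused by the 512 kb/s hop
```

In this sketch the second call is refused end-to-end because a single hop cannot admit it, even if the first call happens to be silent at that moment.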
DiffServ, in contrast, is widely used because its per-hop behavior is highly scalable. DiffServ works by allowing each device to apply an action such as policing, queuing or marking based on that individual switch's or router's policy. QOS markings in the IP header allow classifications to be propagated end-to-end, but the action taken based on these markings is left up to each device.
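A minimal sketch of per-hop behavior follows. The two policies, their action names and the `per_hop_action` and `trace` helpers are hypothetical; the point is only that the same marking travels end-to-end while each device looks it up in its own local policy.

```python
# Hypothetical per-device policies: DSCP value -> local action.
EDGE_ROUTER = {46: "priority-queue", 0: "best-effort"}
CORE_SWITCH = {46: "priority-queue", 26: "guaranteed-bandwidth"}


def per_hop_action(policy, dscp):
    """Return this device's action; unknown markings get best effort."""
    return policy.get(dscp, "best-effort")


def trace(path, dscp):
    """List the action each device on the path applies to one marking."""
    return [per_hop_action(policy, dscp) for policy in path]


path = [EDGE_ROUTER, CORE_SWITCH]
voice_actions = trace(path, 46)  # same treatment on both devices
af31_actions = trace(path, 26)   # treated differently at each hop
```

No device needs to know about individual flows, only about the small set of markings it chooses to act on.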
Quality of Service (QOS) - Queuing
Figure 1. An overview of QOS. As packets arrive at an interface they are identified (classified) and queued accordingly. The scheduler then releases packets from each queue based on a predefined policy. The number of packets released from a queue is dependent on how many other queues hold packets and what bandwidth each has been assigned.
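The classify, queue and schedule steps in the figure can be sketched in a few lines. The class names, the per-round allowances in `POLICY` and the packet `app` field are assumptions made for illustration; a real router classifies on header fields and schedules by bandwidth rather than by packet count.

```python
from collections import deque

# Hypothetical policy: queue name -> packets released per scheduler round.
POLICY = {"voice": 2, "web": 1, "default": 1}


def classify(packet):
    """Pick a queue from the packet's (hypothetical) 'app' field."""
    return packet["app"] if packet["app"] in POLICY else "default"


def enqueue(queues, packet):
    queues[classify(packet)].append(packet)


def schedule_round(queues):
    """Release up to POLICY[name] packets from each queue, in policy order."""
    released = []
    for name, allowance in POLICY.items():
        for _ in range(allowance):
            if queues[name]:
                released.append(queues[name].popleft())
    return released


queues = {name: deque() for name in POLICY}
for i, app in enumerate(["web", "voice", "ftp", "voice", "voice", "web"]):
    enqueue(queues, {"id": i, "app": app})

first_round = [p["id"] for p in schedule_round(queues)]
second_round = [p["id"] for p in schedule_round(queues)]
```

When a queue is empty its allowance simply goes unused in that round, so the other queues are served sooner.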
Congestion Management (Queuing Strategies)
First In First Out (FIFO) - This strategy is simple: packets are released from the queue in the same order that they arrived. This is the default queuing strategy on most Cisco router interfaces.
Weighted Fair Queuing (WFQ) - This strategy prioritizes smaller flows over very bandwidth-intensive flows. It is the default queuing strategy for serial interfaces running at E1 speed (2048 kb/s) or less.
Class Based Weighted Fair Queuing (CBWFQ) - This strategy extends WFQ with administrator-defined traffic classes, which can be matched on criteria such as IP QOS packet markings. This gives the administrator some control over which traffic is given precedence when it is queued.
Low Latency Queuing (LLQ) - Low latency queuing is CBWFQ with the addition of one priority queue. The priority queue gives administrators the ability to have one queue with delay guarantees. This queue is usually used for real-time traffic such as Voice over IP (VoIP).
Custom Queuing (CQ) - Custom queuing enables the network administrator to service different queues at different ratios. Because the ratios are fixed, the minimum bandwidth available to each queue can be calculated.
Priority Queuing (PQ) - Priority queuing enables the network administrator to place traffic into a predetermined number of queues such as high, medium, normal and low. All packets in the high-priority queue must be sent before the medium queue is serviced. All packets in the high queue and the medium queue must be sent before the normal queue is serviced, and so on.
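Three of the strategies above are easy to compare side by side. The sketch below is only illustrative: the packet tuples, the queue numbers and the byte counts are made up, and `cq_min_share` assumes the custom queuing ratios are expressed as a byte count per queue per round.

```python
from collections import deque

LEVELS = ["high", "medium", "normal", "low"]


def fifo_release(packets):
    """FIFO: packets leave in exactly the order they arrived."""
    queue = deque(packets)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


def priority_release(packets):
    """PQ: always serve the highest-priority queue that is not empty."""
    queues = {level: deque() for level in LEVELS}
    for packet in packets:
        queues[packet[1]].append(packet)
    out = []
    while any(queues.values()):
        for level in LEVELS:
            if queues[level]:
                out.append(queues[level].popleft())
                break  # start again from the high queue
    return out


def cq_min_share(byte_counts):
    """CQ: each queue's minimum share is its byte count over the total."""
    total = sum(byte_counts.values())
    return {queue: count / total for queue, count in byte_counts.items()}


# Hypothetical arrivals as (name, priority level) pairs.
arrivals = [("a", "low"), ("b", "high"), ("c", "normal"),
            ("d", "high"), ("e", "medium")]
fifo_names = [name for name, _ in fifo_release(arrivals)]
pq_names = [name for name, _ in priority_release(arrivals)]
shares = cq_min_share({1: 3000, 2: 1500, 3: 1500})
```

With priority queuing the low queue is only served once every other queue is empty, which is why a busy high queue can starve the rest; custom queuing avoids this by giving every queue a fixed share of each round.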
Congestion Avoidance
When queues begin to fill, there are a number of strategies that can be implemented on a Cisco router to prevent a queue from filling completely. The main purpose of congestion avoidance is to spread TCP retransmissions out over time, rather than have many flows back off and retransmit at the same moment, a problem known as TCP Global Synchronization.
Tail Drop - Drops arriving packets that no longer fit into the buffers because the queue is full.
Random Early Detection (RED) - Drops packets randomly before the queue is full. The drop rate increases as the queue fills. This technique is used to avoid TCP Global Synchronization.
Weighted Random Early Detection (WRED) - Same as the RED strategy but also takes a weighting value (such as the packet's DSCP) into account, giving the network administrator more control over which types of packets are dropped first.
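The way the drop rate rises with queue depth can be written as a small function. This is a sketch of the commonly described RED drop curve, assuming a minimum threshold, a maximum threshold and a maximum drop probability; the profile values in `WRED_PROFILES` are invented, and the averaging of the queue depth that a real implementation performs is left out.

```python
def red_drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability for a given average queue depth (in packets)."""
    if avg_depth < min_th:
        return 0.0  # queue is short: never drop
    if avg_depth >= max_th:
        return 1.0  # queue is too long: drop everything (tail drop)
    # In between, the probability rises linearly from 0 up to max_p.
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical WRED profiles: DSCP -> (min_th, max_th, max_p).
WRED_PROFILES = {
    46: (35, 40, 0.1),  # voice: starts dropping late
    0: (20, 40, 0.1),   # best effort: starts dropping early
}


def wred_drop_probability(avg_depth, dscp):
    """WRED: same curve as RED, but the thresholds depend on the marking."""
    min_th, max_th, max_p = WRED_PROFILES.get(dscp, WRED_PROFILES[0])
    return red_drop_probability(avg_depth, min_th, max_th, max_p)


best_effort_at_30 = wred_drop_probability(30, 0)
voice_at_30 = wred_drop_probability(30, 46)
```

At the same queue depth the best-effort packets already face a small drop probability while the voice packets face none, which is the extra control that the weighting provides.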
Policing and Shaping
Policing and shaping are used to throttle the output speed of an interface or class of traffic. Policing simply drops packets once the configured threshold is reached. Shaping buffers excess packets so that they can be sent later, during unused router timeslots. If there is too much traffic to buffer, shaping will also result in dropped packets, but it is generally considered the gentler of the two traffic throttling methods.
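The difference between the two can be seen in a simplified per-interval model. This is not a token bucket, which is what real policers and shapers are usually built on; the `police` and `shape` functions, the packet counts and the buffer sizes are assumptions chosen to keep the arithmetic easy to follow.

```python
def police(arrivals, rate):
    """Forward at most `rate` packets per interval and drop the excess."""
    sent, dropped = [], 0
    for count in arrivals:
        forwarded = min(count, rate)
        sent.append(forwarded)
        dropped += count - forwarded
    return sent, dropped


def shape(arrivals, rate, buffer_size):
    """Forward at most `rate` packets per interval and buffer the excess."""
    sent, dropped, backlog = [], 0, 0
    for count in arrivals:
        backlog += count
        forwarded = min(backlog, rate)
        backlog -= forwarded
        if backlog > buffer_size:
            # The buffer is full, so the overflow is dropped after all.
            dropped += backlog - buffer_size
            backlog = buffer_size
        sent.append(forwarded)
    return sent, dropped


# Hypothetical bursty arrivals, in packets per interval.
arrivals = [5, 0, 0, 3, 0]
policed = police(arrivals, rate=2)
shaped = shape(arrivals, rate=2, buffer_size=4)
shaped_small_buffer = shape(arrivals, rate=2, buffer_size=2)
```

In this model the policer loses half of the burst, the shaper with a large enough buffer delivers everything a little later, and the shaper with a small buffer still drops a packet.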
QOS Markings
Layer 2
COS (Class of Service) markings are Layer 2 markings. In Ethernet the COS marking is a three-bit field in the 802.1Q VLAN tag, allowing a total of 8 classes (values 0 to 7).
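As a small illustration, the COS value can be pulled out of the 16-bit Tag Control Information field of an 802.1Q tag, where the top three bits carry the priority, the next bit is the drop eligible indicator and the low twelve bits are the VLAN ID. The function names and the sample value are hypothetical.

```python
def cos_from_tci(tci):
    """Return the 3-bit priority (COS) from a 16-bit 802.1Q TCI value."""
    return (tci >> 13) & 0x7


def vlan_from_tci(tci):
    """Return the 12-bit VLAN ID from a 16-bit 802.1Q TCI value."""
    return tci & 0x0FFF


# Hypothetical tag: priority 5 on VLAN 100.
tci = (5 << 13) | 100
cos = cos_from_tci(tci)
vlan = vlan_from_tci(tci)
possible_classes = 2 ** 3  # three bits give eight distinct values
```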
Layer 3
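TOS (Type of Service) markings are Layer 3 markings carried in the TOS byte of the IP header. The three most significant bits of this byte hold the IP Precedence value, and DiffServ redefines the six most significant bits as the DSCP.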
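The bit layout of the TOS byte can be shown with a few shifts, assuming the standard layout in which the top three bits are the IP Precedence and the top six bits are the DSCP. The helper names are hypothetical; DSCP 46 is the value commonly used for Expedited Forwarding (voice).

```python
def ip_precedence(tos):
    """Top three bits of the TOS byte."""
    return (tos >> 5) & 0x7


def dscp(tos):
    """Top six bits of the TOS byte."""
    return (tos >> 2) & 0x3F


def tos_from_dscp(value):
    """Build a TOS byte from a DSCP value, leaving the low two bits at 0."""
    return (value & 0x3F) << 2


ef_tos = tos_from_dscp(46)
ef_precedence = ip_precedence(ef_tos)
```

Because the DSCP overlaps the old precedence bits, a device that only understands IP Precedence still sees a sensible priority (5 in this case) for a packet marked with DSCP 46.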