Ethernet switches make their forwarding decision based on what field of the Ethernet frame?

High-Performance Computing Networks

Gary Lee, in Cloud Networking, 2014

Switch

Ethernet switches used in HPC applications employ cut-through operation in order to reduce latency. In this mode of operation, incoming frame headers are inspected and forwarding decisions can be made in a few hundred nanoseconds; transmission of the frame can then start before the entire frame has been received. Although this eliminates the possibility of checking the frame for errors against the frame check sequence before transmission, most fabrics of this type expect the fabric interface adapters on the receiving compute nodes to perform this check when the packet is received and to flag any errors.

InfiniBand switches have less functional overhead than Ethernet switches in that they simply forward packets based on header address information. Forwarding tables consist simply of destination-address-to-output-port pairs that are populated during initialization and may be modified if something changes in the fabric. Unlike Ethernet, no routing or address-learning features are required, which simplifies the design and allows InfiniBand switches to achieve very low cut-through latencies on the order of 100-200 ns. InfiniBand switches support multiple virtual channels per port, which allow traffic types to be treated differently based on class-of-service indicators. Both Ethernet and InfiniBand switches support link-level flow control, with InfiniBand using a credit-based mechanism compared to Ethernet's priority-based flow control described earlier. Today, InfiniBand switches are available with up to 32 ports of 54 Gbps each, allowing the construction of large, low-latency fat-tree fabrics for massively parallel computing installations.
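
To see how little work such a switch does per packet, here is a minimal Python sketch (hypothetical names, not from the chapter) of a static forwarding table: destination-to-port pairs installed at initialization, with the per-packet decision reduced to a single lookup.

```python
class StaticForwardingTable:
    """Sketch of an InfiniBand-style forwarding table: no learning,
    no routing, just destination-address-to-output-port pairs."""

    def __init__(self):
        self.table = {}  # destination address -> output port

    def populate(self, entries):
        # Installed during fabric initialization; modified only if
        # something changes in the fabric.
        self.table.update(entries)

    def lookup(self, destination):
        # The entire per-packet forwarding decision: one lookup.
        return self.table[destination]

fabric = StaticForwardingTable()
fabric.populate({0x0001: 3, 0x0002: 7})
assert fabric.lookup(0x0002) == 7
```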


URL: https://www.sciencedirect.com/science/article/pii/B9780128007280000102

Ethernet

William Buchanan BSc (Hons), CEng, PhD, in Computer Busses, 2000

26.11 Switches and switching hubs

A switch is a very fast, low-latency, multiport bridge that is used to segment LANs. Switches are typically also used to increase communication rates between segments carrying multiple parallel conversations, and to allow communication between technologies (such as between FDDI and 100BASE-TX).

A 4-port switching hub is a repeater that contains four distinct network segments (as if there were four hubs in one device). Through software, any of the ports on the hub can directly connect to any of the four segments at any time. This allows for a maximum aggregate capacity of 40 Mbps in a single hub (four simultaneous 10 Mbps segments).

Ethernet switches overcome the contention problem of normal CSMA/CD networks. They segment traffic by giving each connection a guaranteed bandwidth allocation. Figure 26.14 and Figure 26.15 show the two types of switch; their main features are:


Figure 26.14. Desktop switch


Figure 26.15. Segment switch

Desktop switch (or workgroup switch) – These connect directly to nodes. They are economical with fixed configurations for end-node connections and are designed for standalone networks or distributed workgroups in a larger network.

Segment switch – These connect both 10 Mbps workgroup switches and 100 Mbps interconnect (backbone) switches that are used to interconnect hubs and desktop switches. They are modular, high-performance switches for interconnecting workgroups in mid- to large-size networks.

26.11.1 Segment switch

A segment switch allows simultaneous communication between any client and any server. A segment switch can simply replace existing Ethernet hubs. Figure 26.15 shows a switch with five ports, each transmitting at 10 Mbps; this allows up to five simultaneous connections, giving a maximum aggregate bandwidth of 50 Mbps. If the nodes support 100 Mbps communication, then the maximum aggregate bandwidth becomes 500 Mbps. To optimise the network, nodes should be connected to the switch that connects to the server with which they most often communicate. This allows for a direct connection to that server.

26.11.2 Desktop switch

A desktop switch can simply replace an existing 10BASE-T/100BASE-T hub. It has the advantage that any of the ports can connect directly to any other. In the network in Figure 26.14, any of the computers in the local workgroup can connect directly to any other, or to the printer, or to the local disk drive. This type of switch works well if there is a lot of local traffic, typically between a local server and local peripherals.

26.11.3 Store-and-forward switching

Store-and-forward techniques have been used extensively in bridges and routers, and are now used in switches. The switch reads the entire Ethernet frame before forwarding it, with the required protocol and at the correct speed, to the destination port. This has the advantages of:

Improved error checking – Bad frames are blocked from entering a network segment.

Protocol filtering – Allows the switch to convert from one protocol to another.

Speed matching – Typically, for Ethernet, reading at 10 Mbps or 100 Mbps and transmitting at 100 Mbps or 10 Mbps. This can also be used for matching between ATM (155 Mbps), FDDI (100 Mbps), Token Ring (4/16 Mbps) and Ethernet (10/100 Mbps).

The main disadvantage is:

System delay – As the frame must be read in its entirety before it is transmitted, there is a delay in the transmission. The improvement in error checking normally outweighs this disadvantage.
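
The mechanism is simple enough to sketch in code. The Python fragment below (a sketch under assumptions, not the book's implementation) buffers a whole frame, checks a trailing CRC-32 before forwarding, and drops the frame on failure; real Ethernet also uses CRC-32 for its frame check sequence, though the on-wire bit ordering differs.

```python
import zlib

def store_and_forward(frame: bytes, transmit) -> bool:
    """Buffer the entire frame, verify its check sequence, then forward."""
    body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(body) != int.from_bytes(fcs, "little"):
        return False          # bad frame is blocked from the next segment
    transmit(frame)           # may re-encode at a different speed/protocol
    return True

body = b"dst|src|type|payload"
good = body + zlib.crc32(body).to_bytes(4, "little")
assert store_and_forward(good, lambda f: None)                 # forwarded
assert not store_and_forward(b"X" + good[1:], lambda f: None)  # dropped
```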


URL: https://www.sciencedirect.com/science/article/pii/B9780340740767500262

Packet-Switched Networks

Jean Walrand, Pravin Varaiya, in High-Performance Communication Networks (Second Edition), 2000

Ethernet Switches

An Ethernet switch is a multiport bridge that selectively forwards packets from one LAN port to another port. The bit rate on different ports may be different. Like hubs and bridges, switches may be interconnected to form larger networks. A switch's forwarding decision is based solely on layer 2 information. Switches do not modify the received packet. (Routers, by contrast, base their forwarding decision on layer 3 or network layer information, and also modify the received packets.)

Packets destined for different ports may be simultaneously forwarded by the switch, so a switch can increase the overall bit rate many times compared with a single shared LAN. But packets destined for the same port must be buffered by the switch. Thus a switch consists of a switching fabric, buffers, and the forwarding control mechanism. The switch fabric may be blocking or nonblocking, and the buffers may be segregated by either input port or output port or ports may share buffers. The fabric design and buffer management affect switch performance as discussed in Chapter 12.

The forwarding mechanism builds a table relating the MAC address of a computer on a LAN to the port number to which the LAN is connected. Such a table is built by associating the MAC source address of an incoming packet with the number of the incoming port. If a destination address cannot be resolved by the table, the switch, similarly to a bridge, sends the packet out on all the other ports. This can happen if the destination computer is beyond the LANs directly connected to the switch. Broadcast packets are sent to all the ports, except in the case of VLANs.
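
As a concrete illustration, here is a minimal learning-switch sketch in Python (hypothetical helper names, not taken from the book) implementing exactly this behavior: learn the source address on ingress, forward known unicast out a single port, and flood unknown destinations and broadcasts.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port number

    def handle_frame(self, src_mac: str, dst_mac: str, in_port: int):
        # Learn: associate the source address with the port it arrived on.
        self.mac_table[src_mac] = in_port
        # Known unicast goes out exactly one port ...
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # ... unknown destinations and broadcasts are flooded out every
        # port except the one the frame arrived on.
        return [p for p in range(self.num_ports) if p != in_port]

sw = LearningSwitch(4)
print(sw.handle_frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", 0))  # flood
print(sw.handle_frame("aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01", 2))  # [0]
```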

Performance depends on other features of switch design. A switch may forward a packet after it has been fully received (store-and-forward), or it may start forwarding as soon as the output port has been determined (cut-through). Although cut-through forwarding can clearly reduce latency, the switch cannot carry out a CRC check and so corrupted packets are forwarded.

Switches are deployed to improve LAN performance because they can increase the bit rate available to alleviate congestion and to match the bit rate to the traffic on different LANs. Thus, for example, in client/server networks, the clients would be connected to a switch's 10 Mbps ports and the servers would be connected to its 100 Mbps ports. However, if the traffic on switched LANs does not match port speeds, congestion can occur and degrade performance, as we discuss next.

The switch buffers can temporarily store packets contending for the same port. But if contention persists, the buffers will fill up and the switch will drop packets unless there is a flow control mechanism that sends a signal stopping the source from sending additional packets. (In the OSI model flow control is a function of the transport layer, layer 4, and not the link layer.) However, link-based flow control stops traffic on an entire link rather than stopping only the particular source responsible for the congestion. Link-based flow control can thereby interfere with an uncongested path. In the switched LAN of Figure 3.18, A is sending packets to X and B to Y. Both paths share link L. If the port at Y is congested (possibly because another source is also sending packets to Y), switch 2 will send a flow control signal to switch 1, disrupting the flow on the uncongested path from A to X. Virtual LANs provide a more flexible way to manage switched LANs.


FIGURE 3.18. Congestion at Z causes flow control on link L, which interferes with uncongested flow from A to X.


URL: https://www.sciencedirect.com/science/article/pii/B9780080508030500083

Gary Lee, in Cloud Networking, 2014

Frame overhead

High-bandwidth Ethernet switch fabrics are now the leading networking solution used to interconnect servers and storage within large data center networks. With the movement toward modular rack scale architectures, some limitations of Ethernet are becoming apparent. As we discussed in Chapter 9, there are various methods for sending storage traffic across Ethernet networks, such as iSCSI and FCoE. But for applications such as processor-to-processor transactions in high-performance compute clustering, or for communication between the CPU and modular rack scale memory resources, Ethernet is not very efficient.

Ethernet has a fairly large frame overhead when transporting small segments of data. Consider transporting 64 bits of data between two processors in a cluster, or between a processor and memory. Ethernet has a minimum frame size of 64 bytes. When you include the frame preamble and the minimum interframe gap, it can take 80 bytes to transport 64 bits of data, which is an efficiency of 10%. In other words, a 10GbE link is carrying only 1 Gbps of data. One could argue that data can be combined from multiple transactions to improve the frame payload utilization, but this just adds to the communication latency. In contrast, a CPU communication protocol, such as CPI, can transport 64 bits of data within 10 bytes, which is an efficiency of 80%.
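
The arithmetic is worth reproducing. The short Python sketch below recomputes the efficiency figures quoted above (the function name is mine, the numbers are the text's):

```python
def link_efficiency(payload_bits: int, bytes_on_wire: int) -> float:
    """Fraction of the bytes on the wire that carry actual payload."""
    return (payload_bits / 8) / bytes_on_wire

# Figures from the text: 64 bits of data in ~80 bytes on the wire
# (minimum 64-byte frame plus preamble and minimum interframe gap).
print(link_efficiency(64, 80))  # 0.10 -> a 10GbE link carries ~1 Gbps
# A CPU protocol such as CPI: 64 bits of data within 10 bytes.
print(link_efficiency(64, 10))  # 0.80
```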

One way to solve this problem is to develop a more efficient communication protocol within the data center rack. This protocol should have low frame overhead to improve link bandwidth utilization along with high bandwidth and low latency. In addition, it could take advantage of link technologies used in high-volume products such as 100GbE in order to reduce cable costs. It is expected that, as rack scale architectures evolve, new fabric technologies like this will be employed in order to allow memory disaggregation from the CPUs and improve overall rack performance.


URL: https://www.sciencedirect.com/science/article/pii/B9780128007280000114

Metro and Carrier Class Networks

Chris Janson, in Handbook of Fiber Optic Data Communication (Fourth Edition), 2013

10.2 Ethernet virtual LANs

A LAN constructed with simple Ethernet switches is limited in terms of its bandwidth capacity and tendency toward congestion as the number of stations within a single network increases. To address this, LANs may be partitioned from each other, in many cases across overlapping physical switches. This partitioning creates virtual LANs (VLANs). Early implementations of VLANs broadcast certain packets to assigned ports, with dedicated cabling constructed to support connection to simple switch or router devices. Also, achieving high reliability in a heavily switched network requires the use of redundant cabling paths among switches in a spanning tree configuration, which becomes very difficult to manage as network size increases. To address these constraints, IEEE developed the 802.1Q standard, which defines a VLAN tag that is added to the basic Ethernet frame between the source address and the user protocol type information. This tag comprises a 2-byte tag protocol identifier (TPID) followed by 2 bytes of tag control information that include a priority level, a canonical form indicator, and a unique VLAN identifier between 0 and 4095. The tag allows a user to set up various VLAN configurations, such as port-, MAC-, ATM-, or protocol-based VLANs, as determined by the needs of the end network operator and the equipment used.
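
The tag layout described above is compact enough to show directly. Here is a hedged Python sketch (helper name and sample frame are mine; the TPID value 0x8100 and field widths are from 802.1Q) that inserts a VLAN tag after the destination and source addresses:

```python
import struct

TPID = 0x8100  # 802.1Q tag protocol identifier

def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0,
                 cfi: int = 0) -> bytes:
    """Insert an 802.1Q tag after the 6-byte destination and 6-byte
    source addresses: 2-byte TPID, then 2 bytes of tag control
    information (3-bit priority, 1-bit CFI, 12-bit VLAN identifier)."""
    assert 0 <= vlan_id <= 4095 and 0 <= priority <= 7
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return frame[:12] + struct.pack("!HH", TPID, tci) + frame[12:]

untagged = bytes(12) + b"\x08\x00" + b"payload"  # dst + src + type + data
tagged = add_vlan_tag(untagged, vlan_id=42, priority=5)
assert len(tagged) == len(untagged) + 4
```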


Once established, a VLAN allows a relatively large number of devices to communicate as if they were connected to the same physical network at an optimum speed with better management of wiring and bandwidth resources. However, as the number of devices present on any one network increased and the aggregate bandwidth needs also increased, basic 802.1Q VLANs became challenged in their ability to scale.


URL: https://www.sciencedirect.com/science/article/pii/B9780124016736000106

Client Layers of the Optical Layer

Rajiv Ramaswami, ... Galen H. Sasaki, in Optical Networks (Third Edition), 2010

Spanning Tree

If we have a network of Ethernet switches, the forwarding mechanisms of the Ethernet star topology can still be used if the network topology is a spanning tree. A spanning tree is a connected network topology that does not have any loops or cycles; that is, it is acyclic. An acyclic topology has the property that between any pair of switches X and Y there is a unique path. A consequence of this property is that a switch X will forward and receive frames to and from switch Y through only one port. This will let the Ethernet switches maintain their forwarding tables.

If the physical topology of the network is an arbitrary mesh and not a tree, then links are blocked so that the unblocked network forms a spanning tree. Links that are blocked do not forward data frames. Figure 6.16(a) shows a spanning tree of switches in a mesh topology network.


Figure 6.16. (a) An Ethernet spanning tree and (b) a tree showing the root, root ports, and designated ports.

The spanning tree protocol (STP) is a distributed algorithm run by the switches to form the spanning tree. The Ethernet links have weights assigned to them. The protocol creates the tree by first having the switches elect one of the switches as the root switch (see Figure 6.16(b)). If there are multiple candidate root switches, then ties are broken by comparing the candidates' Ethernet addresses. After the election, each of the other switches determines a shortest path toward the root based on link weights. For each nonroot switch, its port that leads to the shortest path toward the root is its root port as shown in Figure 6.16(b). The corresponding link is part of the tree, and the port at the other end of the link is called a designated port. All other ports are blocked. The unblocked links form a spanning tree as shown in Figure 6.16(b). Note that root ports are used to forward packets toward the root switch, while designated ports are used to forward packets away from the root switch to the outlying switches.

To determine a root and to compute shortest paths, the switches periodically exchange control messages called bridge protocol data units (BPDUs). These messages carry at a minimum the Ethernet address that the transmitting switch believes to be the root, and the weight of the shortest path to the root from the transmitting switch.
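
A rough Python sketch of the comparison at the heart of the election (simplified relative to the full 802.1D priority vector, which also carries designated bridge and port identifiers) shows how the exchanged values decide the outcome: the lowest root identifier wins, with path cost and sender identifier as tie-breakers.

```python
# Each switch keeps the best BPDU it has seen; "best" means lowest root
# identifier, then lowest path cost to that root, then lowest sender
# identifier. Python tuple comparison matches this ordering directly.

def better_bpdu(a, b):
    """a, b are (root_id, path_cost_to_root, sender_id) triples."""
    return a if a < b else b

mine = (5, 0, 5)      # switch 5 initially believes it is the root
heard = (2, 10, 7)    # a neighbor advertises root 2 at cost 10
print(better_bpdu(mine, heard))  # (2, 10, 7): switch 5 gives up its claim
```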

There have been a number of improvements to STP, so that the original STP is now obsolete. One of the improvements, rapid spanning tree protocol (RSTP) (see Section 9.3.2) reduces the convergence time to compute a spanning tree when there is a topological change. RSTP precomputes backup paths to the root, so that they can be switched to when necessary. An extension of RSTP to VLANs is the multiple spanning tree protocol (MSTP). Each VLAN has its own spanning tree and blocks the links for its VLAN group. Links that are blocked by some VLANs may be part of the spanning trees of other VLANs. Unlike STP, all links can be utilized as long as each link is covered by some VLAN.


URL: https://www.sciencedirect.com/science/article/pii/B978012374092250014X

Hypervisors, Virtualization, and Networking

Bhanu Prakash Reddy Tholeti, in Handbook of Fiber Optic Data Communication (Fourth Edition), 2013

This chapter discusses the role of hypervisors and virtual Ethernet switches in modern data center networks. Topics include Type 1 and Type 2 hypervisors, and discussion of the major hypervisors in use today (including PowerVM, KVM, VMware, Xen, and zVM). This chapter also discusses virtual local area networks and other types of network virtualization and encapsulation; virtual Ethernet adapters (VNICs) including VLAG and IPv6 considerations; shared I/O adapters including SR-IOV and MR-IOV; ESX Server virtualization; sockets and VDE industry standards such as IEEE 802.1Q, EVB, VDP, VEB, and VEPA for virtual Ethernet switches; and examples such as the Open vSwitch, Cisco Nexus 5000V, and IBM 5000V virtual switches. Enterprise applications such as the Open System Adapter and HiperSockets are also discussed as they apply to mainframe logical partitions.


URL: https://www.sciencedirect.com/science/article/pii/B9780124016736000167

Industrial Network Design and Architecture

Eric D. Knapp, Joel Thomas Langill, in Industrial Network Security (Second Edition), 2015

Introduction to industrial networking

In this book, an “industrial network” is any network that supports the interconnectivity of and communication between devices that make up or support an ICS. These types of ICS networks may be local-area switched networks, as is common with distributed control system (DCS) architectures, or wide-area routed networks more typical of supervisory control and data acquisition (SCADA) architectures. Everyone should be familiar with networking to some degree (if not, this book should probably not be read before reading several others on basic network technology and design). The vast majority of information on the subject is relevant to business networks: primarily Ethernet and IP-based networks using the TCP transport that are designed (with some departmental separation and access control) around information sharing and collaborative workflow. The business network is highly interconnected, with ubiquitous wireless connectivity options, and is extremely dynamic in nature due to an abundance of host-, server-, and cloud-based applications and services, all of which are used by a large number of staff supporting a diversified number of business functions. There is typically a network interface in every cubicle (or access to a wireless infrastructure), and often a high degree of remote access via virtual private networks (VPN), collaboration with both internal and external parties, and Internet-facing web, e-mail, and business-to-business (B2B) services. Internet connectivity from a business network is a necessity, as is serving information from the business to the Internet. In terms of cyber security, the business network is concerned with protecting the confidentiality, integrity, and availability (in that order) of information as it is transmitted from source generation to central storage and back to destination usage.

An industrial network is not much different technologically—most are Ethernet and IP based, and consist of both wired and wireless connectivity (there are certainly still areas of legacy serial connectivity using RS-232/422/485 as well). The similarities end there. In an industrial network the availability of data is often prioritized over data integrity and confidentiality. As a result, there is a greater use of real-time protocols, UDP transport, and fault-tolerant networks interconnecting endpoints and servers. Bandwidth and latency in industrial networks are extremely important, because the applications and protocols in use support real-time operations that depend on deterministic communication often with precise timing requirements. Unfortunately, as more industrial systems migrate to Ethernet and IP, ubiquitous connectivity can become an unwanted side effect that introduces significant security risk unless proper design considerations are taken.

Table 5.1 addresses some of the many differences between typical business and industrial networks.

Table 5.1. Differences in Industrial Network Architectures by Function

| Function | Industrial Network (control and process areas) | Industrial Network (supervisory areas) | Business Network |
| --- | --- | --- | --- |
| Real-time operation | Critical | High | Best effort |
| Reliability/Resiliency | Critical | High | Best effort |
| Bandwidth | Low | Medium | High |
| Sessions | Few, explicitly defined | Few | Many |
| Latency | Low, consistent | Low, consistent | N/A, retransmissions are acceptable |
| Network | Serial, Ethernet | Ethernet | Ethernet |
| Protocols | Real-time, Proprietary | Near real-time, Open | Non real-time, Open |

Note that these differences dictate network design in many cases. The requirement for high reliability and resiliency dictates the use of ring or mesh network topologies, while the need for real-time operation and low latency requires a design that minimizes switching and routing hops or may dictate purpose-built network appliances. Both of these requirements may result in a vendor requiring the use of specific networking equipment to support the configuration and customization necessary to accomplish the required functionality. The use of specific protocols also drives design, where systems dependent solely upon a given protocol must support that protocol (e.g. serial network buses).

The network shown in Figure 5.1 illustrates how the needs of a control system can influence design (redundancy will not be shown on most drawings, for simplicity and clarity). While on the surface the connectivity seems straightforward (many devices connected to Layer 2 or Layer 3 Ethernet devices in a star topology), when taking into account the five primary communication flows that are required, represented as TCP Sessions 1 through 5 in Figure 5.1, it becomes obvious how logical information flow maps to physical design. In Figure 5.2, we see how these five sessions require a total of 20 paths that must be traversed. It is therefore necessary to minimize latency wherever possible to maintain real-time and deterministic communication. This means that Ethernet “switching” should be used where possible, reserving Ethernet “routing” for instances where the communication must traverse a functional boundary. This concept, represented in Figures 5.1 and 5.2 as subnets, is important when thinking about network segmentation and the establishment of security zones (see Chapter 9, “Establishing Zones and Conduits”). It becomes even more obvious that Ethernet “firewalls” deployed low in the architectural hierarchy must be designed for industrial networks in order not to impact network performance. One common method of accomplishing this is the use of “transparent” or “bridged” mode configurations that do not require any IP routing to occur as the data traverses the firewall.


Figure 5.1. Communication flow represented as sessions.


Figure 5.2. Communication flow represented as connections.

Figures 5.1 and 5.2 illustrate a common design utilizing Ethernet switches for low-latency connectivity of real-time systems, such as data concentrators and controllers, and a separate router (typically implemented as a Layer 3 switch) to provide connectivity between the multiple subnets. Note that in this design, the total end-to-end latency from the HMI client to the controller would be relatively high, consisting of 11 total switch hops and 3 router hops. An optimized design, represented in Figure 5.3, would replace the router with a Layer 3 switch (an Ethernet switch capable of performing routing functions). Layer 3 switches provide significantly improved performance, and by replacing separate Layer 2 and Layer 3 devices with a single device, several hops are eliminated.
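
The effect of reducing hop counts is easy to quantify with a back-of-the-envelope sketch. The per-hop latencies below are assumed values of my own, not figures from the text, but they illustrate why collapsing router hops into a Layer 3 switch pays off.

```python
SWITCH_HOP_US = 5.0    # assumed latency per switch hop (microseconds)
ROUTER_HOP_US = 50.0   # assumed latency per (software) router hop

def path_latency_us(switch_hops: int, router_hops: int) -> float:
    return switch_hops * SWITCH_HOP_US + router_hops * ROUTER_HOP_US

print(path_latency_us(11, 3))  # original design: 11 switch + 3 router hops
print(path_latency_us(9, 0))   # hypothetical optimized design, fewer hops
```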


Figure 5.3. Optimized Ethernet network design.

In Figure 5.4, a design typical of one vendor’s systems has been provided. Redundancy is provided here by connecting systems to two separate Ethernet connections. While Figure 5.4 shows a very simple redundant network, more sophisticated networks can be deployed in this manner as well. The use of spanning tree protocol will eliminate loops (in a switched environment) and dynamic routing protocols will enable multipath designs in a routed environment. In more sophisticated designs, redundant switching and routing protocols, such as VSRP and VRRP, enable the use of multiple switches in high-availability, redundant configurations.


Figure 5.4. Redundant Ethernet in a vendor reference architecture.

As we get lower into the control environment, functionality becomes more specialized, utilizing a variety of open and/or proprietary protocols, in either their native form or adapted to operate over Ethernet. Figure 5.5 illustrates a common fieldbus network based on FOUNDATION Fieldbus using serial two-wire connectivity, and reliant upon taps (known as couplers) and bus terminations. Many fieldbus networks are similar, including PROFIBUS-PA, ControlNet, and DeviceNet.


Figure 5.5. FOUNDATION Fieldbus H1 network topology.

It should be evident by now that specific areas of an industrial network have unique design requirements, and utilize specific topologies. It may be helpful at this point to fully understand some of the topologies that are used before looking at how this affects network segmentation.


URL: https://www.sciencedirect.com/science/article/pii/B9780124201149000058

The Ethernet Landscape

Sachidananda Kangovi, in Peering Carrier Ethernet Networks, 2017

3.2 Evolution of Ethernet

The 10BASE-T standard in 1990 spurred the adoption of Ethernet in LANs because it allowed the use of the widely available UTP wire pair with an RJ45 connector, using hubs. Since the wire pair provided separate paths for transmitting and receiving signals, collision was no longer detected as an elevated voltage in the medium; instead, the MAU had to indicate to the MAC if it detected simultaneous activity on both the Tx and Rx paths, because the backplane of the hub acted as a shared medium, and the MAC layer therefore still had to implement the backoff algorithm in case of collision. The use of hubs allowed a star topology with a distance of 100 m for each LAN link and a collision diameter of 2500 m. This standard was based on a physical layer implementation with a 10-MHz clock speed and a 1-bit data path in the PLS layer, as shown in Fig. 3.3. It should be noted that the master clock speed for Manchester encoding, which was used for encoding at 10 Mbps, always matches the data speed, and this determines the carrier signal frequency; so for 10-Mbps Ethernet, the carrier frequency is 10 MHz.

With the rapid growth of Ethernet LANs and evolution of Ethernet switch, it became clear that higher bandwidth would be required to support aggregation of 10-Mbps links. The IEEE 802.3u task group was tasked with developing 100 Mbps Ethernet specification. After deliberations, they set the following goals that have served well in the Ethernet evolution:

1. ease of migration and seamless integration with the installed base;

2. implementation over widely available UTP wire pair;

3. a 10-fold increase in performance should not increase price by more than a factor of 2;

4. leverage existing technology from fiber distributed data interface (FDDI); and

5. analyze market research to ensure adoption by various industries and research organizations.

The goal of 100 Mbps bandwidth could have been achieved by increasing the clock speed from 10 to 100 MHz, but that would have reduced the slot time from 51.2 μs to 5.12 μs and, as a consequence, the collision diameter from 2500 to 250 m. Neither was acceptable. Therefore, the 100 Mbps standard changed the clock speed to 25 MHz and increased the data path to 4 bits. Increasing the clock speed from 10 to 25 MHz was an increase of 2.5 times, and combining this with the 4-bit-wide data path gave a 2.5 × 4 = 10-fold increase in bandwidth without substantially reducing the collision diameter. In addition, the encoding was changed from Manchester to first non-return-to-zero (NRZ) and later to non-return-to-zero, inverted (NRZI) to meet the needs of the increased bandwidth. This encoding/decoding function was moved from the PLS layer to a physical coding sublayer (PCS). Also, the MAC layer was decoupled from the physical layer (PHY) by replacing the PLS layer with a Medium-Independent Interface (MII) layer, which is equivalent to the AUI layer as far as functionality is concerned, except that MII functions at the higher clock speed of 25 MHz and has a 4-bit data path. The MII layer allowed the MAC and LLC layers to be kept unchanged in the evolution from 10 Mbps to 100 Mbps. Instead of MII, one could also use Reduced MII (RMII), which uses a 50-MHz clock instead of a 25-MHz clock and a 2-bit data path instead of a 4-bit data path; RMII allows the use of an eight-pin interface instead of the 16-pin interface needed for MII. For discussion purposes, however, we will refer to MII only. To allow for ease of migration, the IEEE 802.3u standard made the MII layer support both 10 and 100 Mbps through an autonegotiation (AN) sublayer, which detects the attached end station or device to determine the speed the device supports and whether it supports FDX or HDX operation. To keep the MAC layer unchanged by the introduction of the 4-bit data path, the standard introduced a reconciliation sublayer to account for this 4-bit data path. Although with the use of an FDX wire pair and a switch, CSMA/CD collision handling became redundant, there was still a need for flow control, which thus far had been handled indirectly by the CSMA/CD function. For this flow control, the standard provided a MAC control sublayer in the MAC layer; this sublayer detected and processed a special MAC control frame, or pause frame, to effect flow control. Functionally, the MAU, which is the combination of the PMA and MDI sublayers, remained the same and provided an MDI. In addition, a physical medium-dependent (PMD) sublayer was added below the PMA sublayer, so that the PMA receives unserialized data from the PCS on the transmit path and passes serialized data to the PMD, and receives serialized data from the PMD on the receive path and passes unserialized data to the PCS. Fig. 3.9 compares the 10 and 100 Mbps data link and physical layers and also maps them to the OSI seven-layer model. This figure also shows other standards, including 1, 10, 40, and 100 Gbps.
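
The recurring recipe here is that bandwidth equals clock speed times data path width. A two-line Python sketch (function name mine) reproduces the figures from the text:

```python
def bandwidth_mbps(clock_mhz: float, path_bits: int) -> float:
    return clock_mhz * path_bits

print(bandwidth_mbps(10, 1))   # 10 Mbps: 10 MHz clock, 1-bit PLS path
print(bandwidth_mbps(25, 4))   # 100 Mbps: 25 MHz clock, 4-bit MII path
print(bandwidth_mbps(50, 2))   # 100 Mbps again via RMII: 50 MHz, 2 bits
```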


Figure 3.9. Comparison of various IEEE 802.3 standards for Ethernet.

By 1995, 100 Mbps Ethernet, also known as Fast Ethernet, was widely adopted and implemented. By that time, the emergence of high-bandwidth, time-critical applications required a further increase in bandwidth. This time, the IEEE 802.3z task group was tasked with developing the 1000 Mbps, or 1 Gbps, Ethernet standard, keeping in mind goals similar to those the earlier task group had set while developing the 100-Mbps standard. The proposal became a standard in 1998 to meet these demands. With support for VLANs adopted in 1998, the 1-Gbps standard also included VLAN support, and it was based on an 8-bit data path and a clock speed of 125 MHz; this combination gave a bandwidth of 1000 Mbps. However, increasing the clock speed to 125 MHz reduced the slot time to 0.512 μs and the collision diameter to 25 m, neither of which was acceptable. So this time, the standard increased the Tx time from 512-bit times to 4096-bit times by adding a carrier extension while keeping the frame size unchanged. This effectively increased the slot time by a factor of 8 and the collision diameter to 200 m. It should be noted that the slot-time issue and the frame extension are really only an issue for HDX transmission, which is exceedingly rare at 1000 Mbps and somewhat rare today even at 100 Mbps.

The MII sublayer was modified to a gigabit MII (GMII) sublayer to handle the higher clock speed and 8-bit data path. This standard also introduced frame bursting to improve performance: if an end station had multiple Ethernet frames waiting for transmission, it would keep the channel occupied by transmitting a control character rather than initiating an IFG. This control character was followed by an IFG of 0.096 μs, which is the IFG at 1000 Mbps, and then the next frame was transmitted. The control character prevented another end station from capturing the channel. This process is followed up to a maximum of 8000 bytes of data before the channel is relinquished. The encoding in the PCS sublayer was also changed from NRZI to 8B/10B, and the AN functionality was combined within the PCS layer. The 1-Gbps standard also allowed for asymmetric flow control, in the sense that the switch controlled the flow from an attached device, but the device was not allowed to control traffic from the switch. This mechanism placed the flow-control buffering on the device, so the switch did not have to add buffers for flow control. The physical layer used fiber channel technology for the encoding in the PCS layer and the connectors in the MDI layer. The PMD layer defined the standard for converting the electrical signal to an optical signal for transmission over fiber optic cable, including multimode fiber (MMF) or single-mode fiber (SMF). The distance the signal could travel depended on the wavelength of the light source, the fiber diameter, and the type of fiber. One-Gbps Ethernet used Gigabit Ethernet Interface Converter (GBIC) transceivers, which supported short- and long-wavelength lasers over MMF (62.5- and 50-μm diameter) and SMF (10-μm diameter) optical fibers, as well as short copper wire-pair physical interfaces. This also was based on fiber channel technology.

With the success of the 1 Gbps standard and the increasing need for higher bandwidths to aggregate 1-Gbps links, there was a need to further increase bandwidth. The IEEE 802.3ae task group started work, and in 2002 the standard for 10 Gbps was published. Ten Gbps offered considerable savings over Ethernet over synchronous optical network (SONET) deployments. Ten Gbps was based on 64B/66B encoding in the PCS sublayer; in the 10-gigabit MII sublayer, the clock speed was increased to 156.25 MHz and the data path was increased to 64 bits, with 32 bits for the Tx data path and 32 bits for the receive data path. The increase in clock speed from 125 to 156.25 MHz was a 1.25-times increase; coupled with the eight-times increase in data path width, this increased the bandwidth by 1.25 × 8 = 10 times, from 1 to 10 Gbps. This standard operated only on a fiber optic medium in FDX mode, with 1310- and 1550-nm lasers giving a range of 40 km, which extended the reach of Ethernet from the LAN to the MAN. For the larger distances covered by a WAN, the standard also provided a WAN Interface Sublayer (WIS) that formatted frames to be SONET compatible, so that the SONET infrastructure could be used for transport over the WAN. This standard also needed a new PMD layer, because the fiber channel technology-based PMD sublayer used in the 1-Gbps standard could not be reused. Later, the 10-Gbps standard was modified to support copper wire pair for up to 100 m for data center applications, and also to support transport over DWDM with Reconfigurable Optical Add-Drop Multiplexers (ROADMs), so that frames could be sent over the RAN and WAN without using SONET technology, providing considerable cost savings. The GBIC and XFP (10-gigabit small form factor pluggable; X is the Roman numeral for 10) transceivers were replaced with small form factor pluggable (SFP) and SFP+ transceivers.

With the spread of 10-Gbps Ethernet, the stage was set for another 10-fold increase in bandwidth, both to aggregate 10-Gbps links and to support the data center applications proliferating with the onset of cloud-based applications. The IEEE 802.3ba task group started work on the 40 and 100 Gbps standard in 2008 and published it in 2010; deployment of these 40 and 100 Gbps switches was just commencing in 2015. The layers for these are also shown in Fig. 3.9. The increase in bandwidth was achieved by retaining the 64B/66B encoding and 156.25-MHz clock speed in the 40- and 100-gigabit MII sublayers from the 10 Gbps standard, but increasing the number of lanes or wavelengths in the PMA layer to 4 for 40-Gbps and 10 for 100-Gbps bandwidths. This standard also provides for the use of SFP, SFP+, and QSFP (quad SFP, i.e., four SFP transceivers in one) transceivers. It removed the WIS layer (the layer that provided SONET support) so that Ethernet could use DWDM for WAN applications.

By tweaking clock speeds, data path widths, encoding/decoding methods, slot times, and the number of lanes or wavelengths for serialization/deserialization, and by leveraging improved ASIC-based hardware and enhancements to software, the IEEE 802.3 task groups were able to increase bandwidth from 10 Mbps to 100 Gbps while retaining the original Ethernet frame format shown in Fig. 3.7. Table 3.3 below lists the evolution of Ethernet, and it is very interesting to track all the important developments related to a technology that is so profoundly changing data networks. It also shows that work has started on 400 Gbps, and that a standard using multiple 25/50 Gbps lanes is expected by 2017. There are already discussions about starting work on a 1-Tbps (terabit per second) Ethernet standard. So far, the basic Ethernet frame format shown in Fig. 3.7 has remained unchanged, and it will be interesting to see whether that remains so in the case of 1 Tbps as well.

Table 3.3. Evolution of Ethernet

Serial Number | Year | Name | IEEE Standard | Description
1 1973 Experimental Ethernet 2.94 Mbit/s over coaxial cable (coax) bus
2 1982 Ethernet II (DIX v2.0) 10 Mbit/s over thick coax. Frames have a type field. This frame format is used on all forms of Ethernet by protocols in the Internet protocol suite.
3 1983 10BASE5 802.3 10 Mbit/s over thick coax with a maximum distance of 500 m. Same as Ethernet II (above) except type field is replaced by length, and an 802.2 LLC header follows the 802.3 header. Based on the CSMA/CD process.
4 1985 10BASE2 802.3a 10 Mbit/s over thin coax (a.k.a. thinnet or cheapernet) with a maximum distance of 185 m
5 1985 10BROAD36 802.3b 10-Mbps baseband Ethernet over three channels of a cable television system with a maximum cable length of 3600 m
6 1985 Repeater 802.3c 10-Mbit/s (1.25-MB/s) repeater specs
7 1987 FOIRL 802.3d Fiber-optic inter-repeater link
8 1987 1BASE5 802.3e Star LAN (use of hubs)
9 1990 10BASE-T 802.3i 10 Mbit/s over twisted pair
10 1993 10BASE-F 802.3j 10 Mbit/s (1.25 MB/s) over fiber-optic
11 1995 100BASE-TX, 100BASE-T4, 100BASE-FX 802.3u Fast Ethernet at 100 Mbit/s w/autonegotiation. TX is two or four pair category 3 or higher unshielded twisted-pair cable. FX is over two multimode optical fibers. T4 is over four pairs of category 3 or higher unshielded twisted-pair cable.
12 1997 802.3x Full duplex and flow control; also incorporates DIX framing, so there is no longer a DIX/802.3 split.
13 1998 100BASE-T2 802.3y 100-Mbps baseband Ethernet over two pairs of category 3 or higher unshielded twisted-pair cable
14 1998 1000BASE-X 802.3z 1-Gbit/s Ethernet over fiber-optic. X is a generic name for 1000-Mbps Ethernet systems
15 1998 802.3-1998 A revision of base standard incorporating the above amendments and errata
16 1999 1000BASE-T 802.3ab 1-Gbit/s (1000 Mbps) baseband Ethernet over four pairs of category 5 unshielded twisted-pair cable
17 1998 Includes 802.1Q VLAN Tags 802.3ac Max frame size extended to 1522 bytes (to allow “Q-tag”) the Q-tag includes 802.1Q VLAN information and 802.1p priority information.
18 2000 Includes LAG 802.3ad Link aggregation for parallel links, since moved to IEEE 802.1AX
19 2002 802.3-2002 A Revision of base standard incorporating the three prior amendments and errata
20 2002 10GBASE-SR, 10GBASE-LR, 10GBASE-ER, 10GBASE-SW, 10GBASE-LW, 10GBASE-EW 802.3ae 10-gigabit Ethernet over fiber
21 2003 802.3af Power over Ethernet (15.4 W)
22 2004 802.3ah Ethernet in the first mile
23 2004 10GBASE-CX4 802.3ak 10 Gbit/s (1250 MB/s) Ethernet over twin-axial cables.
24 2005 802.3-2005 A Revision of base standard incorporating the four prior amendments and errata.
25 2006 10GBASE-T 802.3an 10 Gbit/s Ethernet over unshielded twisted pair
26 2007 Backplane Ethernet 802.3ap Backplane Ethernet (1 and 10 Gbit/s over printed circuit boards)
27 2006 10GBASE-LRM 802.3aq 10 Gbit/s Ethernet over multimode fiber (MMF)
28 2006 802.3as Frame expansion
29 2009 802.3 at Power over Ethernet enhancements (25.5 W)
30 2006 802.3au Isolation requirements for power over Ethernet (802.3-2005/Cor 1)
31 2009 EPON 802.3av 10 Gbit/s EPON
32 2007 802.3aw Fixed an equation in the publication of 10GBASE-T (released as 802.3–2005/Cor 2)
33 2008 802.3-2008 A revision of base standard incorporating the 802.3an/ap/aq/as amendments, two corrigenda and errata. Link aggregation was moved to 802.1AX.
34 2010 802.3az Energy-efficient Ethernet
35 2010 40 & 100 Gbps 802.3ba 40 and 100 Gbit/s Ethernet. Forty Gbit/s over 1-m backplane, 10-m Cu cable assembly (4 × 25 Gbit or 10 × 10 Gbit lanes) and 100 m of MMF, and 100 Gbit/s up to 10 m of Cu cable assembly, 100 m of MMF, or 40 km of SMF, respectively
36 2009 802.3-2008/Cor 1 Increase pause reaction delay timings which are insufficient for 10 Gbit/s (workgroup name was 802.3bb)
37 2009 802.3bc Move and update Ethernet-related TLVs (type, length, values), previously specified in Annex F of IEEE 802.1AB (LLDP) to 802.3.
38 2010 802.3bd Priority-based flow control. An amendment by the IEEE 802.1 data center bridging task group (802.1Qbb) to develop an amendment to IEEE standard 802.3 to add a MAC control frame to support IEEE 802.1Qbb priority-based flow control.
39 2011 Ethernet MIB consolidation 802.3.1 MIB definitions for Ethernet. It consolidates the Ethernet-related MIBs present in Annex 30A&B, various IETF RFCs, and 802.1AB annex F into one master document with a machine readable extract (workgroup name was P802.3be).
40 2011 802.3bf Provide an accurate indication of the transmission and reception initiation times of certain packets as required to support IEEE P802.1AS.
41 2011 802.3bg Provide a 40 Gbit/s PMD which is optically compatible with existing carrier SMF 40 Gbit/s client interfaces (OTU3/STM-256/OC-768/40G POS).
42 2012 802.3-2012 A revision of base standard incorporating the 802.3at/av/az/ba/bc/bd/bf/bg amendments, a corrigenda, and errata.
43 2014 802.3bj Define a 4-lane 100 Gbit/s backplane PHY for operation over links consistent with copper traces on “improved FR-4” (as defined by IEEE P802.3ap or better materials to be defined by the task force) with lengths up to at least 1 m and a 4-lane 100-Gbit/s PHY for operation over links consistent with copper twin-axial cables with lengths up to at least 5 m.
44 2013 802.3bk This amendment to IEEE Standard 802.3 defines the physical layer specifications and management parameters for EPON operation on point-to-multipoint passive optical networks supporting extended power budget classes of PX30, PX40, PRX40, and PR40 PMDs.
45 2015 802.3bm 100G/40G Ethernet for optical fiber
46 2014 1000BASE-T1 802.3bp Gigabit Ethernet over a single twisted pair, automotive and industrial environments
47 2016 40GBASE-T 802.3bq For 4-pair balanced twisted-pair cabling with 2 connectors over 30 m distances
48 2017 400 Gbps over optical fiber 802.3bs 400 Gbit/s Ethernet over optical fiber using multiple 25G/50G lanes
49 2017 802.3bt Power over Ethernet enhancements up to 100 W using all 4-pairs balanced twisted-pair cabling, lower standby power and specific enhancements to support IoT applications (e.g. lighting, sensors, building automation).
50 100BASE-T1 802.3bw 100 Mbit/s Ethernet over a single twisted pair for automotive applications
51 2015 802.3-2015 802.3bx—a new consolidated revision of the 802.3 standard including amendments 802.3bk/bj/bm
52 2015 802.3by 25G Ethernet
53 2017 802.3bz 2.5 gigabit and 5 gigabit Ethernet over Cat-5/Cat-6 twisted pair—2.5GBASE-T and 5GBASE-T
54 TBD 1 Tbps and beyond TBD 1 Tbps and beyond

EPON, Ethernet passive optical network; IoT, internet of things; LLDP, link layer discovery protocol; MIB, management information base; RFC, request for comments; TBD, to be determined.

Now that we have explored the definition of Ethernet and its evolution, it will be helpful to briefly examine the hardware components of a typical Ethernet switch. It is the hardware that forms the data network, it is the network that moves bits from source to destination, and on this movement of bits ride the services needed by customers.


URL: https://www.sciencedirect.com/science/article/pii/B9780128053195000034

SDN in the Data Center

Paul Goransson, Chuck Black, in Software Defined Networks, 2014

7.5 Ethernet Fabrics in the Data Center

Traffic engineering in the data center is challenging with traditional Ethernet switches arranged in the typical hierarchy of increasingly powerful switches as one moves up toward the core. Although the interconnecting links increase in bandwidth as we move closer to the core, these links are normally heavily oversubscribed, and blocking can easily occur. By oversubscription we mean that the aggregate potential bandwidth entering one tier of the hierarchy is greater than the aggregate bandwidth going to the next tier.
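
To make the definition concrete, here is a small Python sketch (the port counts are hypothetical examples of mine, not from the text) that computes the oversubscription ratio of a tier:

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Aggregate potential bandwidth entering a tier divided by the
    aggregate bandwidth toward the next tier; > 1.0 means oversubscribed."""
    return downlink_gbps / uplink_gbps

# Hypothetical access switch: 48 x 10 Gbps server-facing ports,
# 4 x 40 Gbps uplinks toward the core -> 3:1 oversubscription.
print(oversubscription_ratio(48 * 10, 4 * 40))  # 3.0
# A fat-tree (nonblocking) tier keeps this ratio at 1.0 or below.
print(oversubscription_ratio(16 * 10, 4 * 40))  # 1.0
```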

An alternative to a network of these traditional switches is the Ethernet fabric [7]. Ethernet fabrics are based on a nonblocking architecture whereby every tier is connected to the next tier with equal or higher aggregate bandwidth. This is facilitated by a topology referred to as fat-tree topology. We depict such a topology in Figure 7.8. In a fat-tree topology, the number of links entering one switch in a given tier of the topology is equal to the number of links leaving that switch toward the next tier. This implies that as we approach the core, the number of links entering the switches is much greater than those toward the leaves of the tree. The root of the tree will have more links than any switch lower than it. This topology is also often referred to as a Clos architecture. Clos switching architecture dates from the early days of crossbar telephony and is an effective nonblocking switching architecture. Note that the Clos architecture is not only used in constructing the network topology but may also be used in the internal architecture of the Ethernet switches themselves.


Figure 7.8. Fat-tree topology.

For Ethernet fabrics, the key characteristic is that the aggregate bandwidth not decrease from one tier to the next as we approach the core. Whether this is achieved by a larger number of low-bandwidth links or a lower number of high-bandwidth links does not fundamentally alter the fat-tree premise.

As we approach the core of the data center network, these interconnecting links are increasingly built from 10 Gbps links, and it is costly to simply try to overprovision the network with extra links. Thus, part of the Ethernet fabric solution is to combine it with ECMP technology so that all of the potential bandwidth connecting one tier to another is available to the fabric.


URL: https://www.sciencedirect.com/science/article/pii/B9780124166752000073

What field does a switch use to forward a frame?

To forward the frame, the switch examines the destination MAC address and compares it to addresses found in the MAC address table. If the address is in the table, the frame is forwarded out the port associated with the MAC address in the table.

How do switches make forwarding decisions?

Since the switch makes forwarding decisions based on the destination address, which is in the header of the frame, it can make the forwarding decision before receiving the complete frame. This process is called cut-through: the switch begins forwarding the frame before it has been received in full.

How does an Ethernet switch forward data?

It takes in packets sent by devices that are connected to its physical ports and forwards them to the devices the packets are intended to reach. Switches can also operate at the network layer (Layer 3), where routing occurs.

How does Ethernet make its decisions as to where frames go in the network?

The frame is sent onto the network, where an Ethernet switch checks the destination address of the frame against a MAC lookup table in its memory. The lookup table tells the switch which physical port (i.e., RJ45 port) is associated with the device whose MAC address matches the destination address of the frame.