BGP design options for EVPN in Data Center Fabrics

Ivan Pepelnjak described his views on BGP design options for EVPN-based Data Center fabrics in this article. In the comments to the following blog post we briefly discussed the sanity of the eBGP underlay + iBGP overlay design option and came to the conclusion that we disagree on this subject. In this blog post I will try to summarize my thoughts about these design options.

Let’s start with the basics – what’s the idea behind an underlay/overlay design? It’s quite simple – a logical separation of duties between the “underlying infrastructure” (providing simple IP transport in the case of DC fabrics) and some “service” overlaid on top of it (be it L3VPN, EVPN or any hypervisor-based SDN solution). The key word in the previous sentence is “separation”. In a good design, overlay and underlay should be as separate and independent from each other as possible. This provides a lot of benefits, the most important of which is the ability to implement a “smart edge – simple core” network design, where your core devices (= spines in the DC fabric case) don’t need to understand all the complex VPN-related protocols or hold customer-related state.

We have been using this design for a long time – OSPF for the underlay and iBGP for the overlay is the de facto standard for the majority of networks all over the world. This design provides a very clear separation of underlay and overlay – they are simply two completely different protocols.
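For illustration, here is a minimal sketch of what that traditional combination could look like on a leaf switch in JunOS – the hostname, interface names, the loopback address 11.11.11.11 and the RR address 1.1.1.1 are placeholders borrowed from the example later in this post, not a recipe for any particular platform:

alex@leaf1# show protocols
ospf {                             ### underlay: plain IP reachability for the loopbacks
    area 0.0.0.0 {
        interface xe-0/0/0.0;      ### uplink to SPINE1 (placeholder interface)
        interface xe-0/0/1.0;      ### uplink to SPINE2 (placeholder interface)
        interface lo0.0 {
            passive;
        }
    }
}
bgp {
    group overlay {                ### overlay: iBGP EVPN sessions to the RRs
        type internal;
        local-address 11.11.11.11; ### loopback
        family evpn {
            signaling;
        }
        neighbor 1.1.1.1;          ### EVPN RR
    }
}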

But quite a while ago Petr Lapukhov introduced a new way of building large DC fabrics – using BGP as the only routing protocol. This approach provides a lot of benefits, which are described in RFC 7938. This routing design allows you to build a robust and scalable IP fabric that provides plain L3 connectivity between connected endpoints.

But what do you do if you need not only plain IP connectivity, but more complex stuff like multi-tenancy or L2 services? You have a lot of options to solve this problem, but let’s examine a single one here – a network-based EVPN overlay.

To implement this you have two design options:

  • implement the EVPN overlay using the existing eBGP sessions between leaf and spine switches;
  • use the existing eBGP for the underlay only and establish new iBGP EVPN sessions on top.

Let’s examine each option in greater detail.

 

1. eBGP-only design

In this design option we need to add an additional address family to the existing eBGP sessions between leaf and spine switches (EVPN signalling – AFI/SAFI 25/70). But unfortunately that’s not all – for proper forwarding of EVPN traffic, the BGP next hop in an EVPN update must not be changed on the path between the ingress and egress switch – the EVPN BGP next hop should always point to the egress fabric edge switch. So your spine switch, in addition to being able to understand BGP EVPN routes, needs to be able to selectively change (or not change) the next hop for different types of BGP routes (and of course it will not do this by default, so you need to configure the respective policy or option).
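As an illustration only, a spine in this design could carry both address families on the same eBGP group and keep the EVPN next hop untouched. The hostname, group name, neighbor address and the use of “multihop no-nexthop-change” as the next-hop-preservation knob are assumptions for this sketch (the exact option differs between vendors and releases); the AS numbers match the leaf example shown later in this post:

alex@spine1# show protocols bgp
group fabric {                       ### single eBGP group carrying underlay and overlay (hypothetical name)
    type external;
    multihop {
        no-nexthop-change;           ### keep the EVPN next hop pointing at the egress leaf (assumed knob)
    }
    local-as 101;                    ### SPINE1 underlay AS
    family inet {
        unicast;
    }
    family evpn {
        signaling;                   ### EVPN added to the same session (AFI/SAFI 25/70)
    }
    neighbor 192.168.0.1 {           ### LEAF1 (assumed other end of the /31 toward leaf1)
        peer-as 201;
    }
}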

So I see a few disadvantages in this approach:

  • spine switches need to be able to understand and process all EVPN BGP routes from all connected leaf switches, which entails additional control plane complexity on the spines and the related CPU/memory load (not to mention the additional problems you encounter if, for example, you plan to use spine switches from different vendors or are thinking about different lifecycles for leaf and spine switch gear);
  • there is no clear separation of underlay and overlay routing in this design – there is only one BGP protocol, which uses a single session to transport both address families;
  • there is additional configuration (or internal BGP implementation) complexity related to the need to change (or not change) next hops for different types of BGP routes.

 

2. eBGP underlay + iBGP overlay

In this design option, leaf switches exchange EVPN routes over dedicated iBGP sessions established between loopback addresses. Although in this picture the spine switches act as route reflectors for the iBGP topology, this is just to simplify the picture; in real life, dedicated out-of-forwarding-path RRs should be used (virtual routers seem a perfect fit for this role).

The main complexity of this design is the fact that every leaf switch needs “to be” in two different BGP autonomous systems simultaneously (or, better said, to present different AS numbers to different BGP peers) – a unique-per-switch underlay AS number to the eBGP underlay peers and a common overlay AS number to the iBGP overlay peers.

In JunOS this is pretty easy to achieve – you just need to use the “local-as” option:

alex@leaf1# show protocols bgp
group underlay {
    type external;
    export direct;
    local-as 201; ### unique underlay AS number of this leaf
    family inet {
        unicast;
    }
    multipath multiple-as;
    neighbor 192.168.0.0 { ### SPINE1
        peer-as 101;
    }
    neighbor 192.168.0.4 { ### SPINE2
        peer-as 102;
    }
}
group overlay {
    type internal;
    local-address 11.11.11.11; ### loopback
    local-as 64512; ### common overlay AS shared by all leaves and RRs
    family evpn {
        signaling;
    }
    multipath;
    neighbor 2.2.2.2; ### EVPN RR1
    neighbor 1.1.1.1; ### EVPN RR2
}
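For completeness, here is a minimal sketch of what one of the dedicated out-of-path route reflectors could look like under the same addressing scheme – the hostname and the second leaf loopback 12.12.12.12 are hypothetical placeholders:

alex@rr1# show protocols bgp
group overlay {
    type internal;
    local-address 2.2.2.2;       ### loopback of EVPN RR1
    local-as 64512;
    family evpn {
        signaling;
    }
    cluster 2.2.2.2;             ### reflect EVPN routes between the leaf switches
    neighbor 11.11.11.11;        ### LEAF1
    neighbor 12.12.12.12;        ### LEAF2 (hypothetical)
}

Since such an RR sits outside the forwarding path, it only needs to hold EVPN control plane state and never touches transit traffic.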

So I see only one disadvantage in this design option – the additional configuration (or internal BGP implementation) complexity related to the need to present two different AS numbers to different BGP peers. This approach may be hard to grasp when you see it for the first time.

But this design provides a couple of valuable benefits (compared to the eBGP-only design):

  • first of all, your spine switches don’t know anything at all about the EVPN overlay ( = simple network core);
  • you can add/change/delete your overlay as you wish at any time, because it uses dedicated iBGP sessions (try adding another AFI/SAFI to an already running eBGP session) – see the small example after this list;
  • you have some logical separation between underlay and overlay routing – although it is still one routing protocol, the underlay and overlay use separate and easily distinguishable BGP sessions, separate address families ( = separate internal routing tables) and separate AS numbers. This provides a lot of benefits in real-life network operations and troubleshooting.
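As a small illustration of the flexibility mentioned above: because the overlay lives in its own BGP group, taking it out of service (or bringing it back) is a self-contained operation in JunOS, for example:

alex@leaf1# deactivate protocols bgp group overlay
alex@leaf1# commit

alex@leaf1# activate protocols bgp group overlay
alex@leaf1# commit

The underlay eBGP sessions and the routes they carry are not affected.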

 

In summary:

  • if you need a small, simple and easy EVPN DC fabric – use the OSPF underlay + iBGP overlay design option;
  • if you need to use eBGP for underlay routing (for whatever reason – be it scale, the need for traffic engineering or anything else) and you need to implement network-based EVPN on top of it – use an iBGP EVPN overlay;
  • don’t use the eBGP-only design – it has some fundamental flaws, such as spine switch control plane complexity/scalability and the inability (or at least high complexity) to gracefully add/change/delete the EVPN overlay in an already running network. If you have no other option but to use this design, at least make sure that EVPN routes don’t blow up the forwarding tables on your spine switches (don’t install EVPN routes into the forwarding tables).

One thought on “BGP design options for EVPN in Data Center Fabrics”

  1. I have come to the same conclusion you have in my own testing. eBGP underlay + iBGP overlay appears to provide the most scale and flexibility.

    My only other thoughts around the idea are:

    1. In everything short of hyperscale networks, why limit the spines to just underlay functionality? Yes, this gets you away from the ‘Clos’ or 2-hops-anywhere design. eBGP+EVPN to the spines allows them to serve as the traditional distribution switch. Haven’t decided on the good, bad, or ugly in regards to this, but it is flexible.

    2. Even in mid to larger enterprise data centers – let’s say 72 TOR switches (the limit of the QFX10002) – OSPF or ISIS plus the Xeon processors in the QFX10k have no problem processing the updates.

    3. Something I have put zero thought into. What about spanning multiple data centers?
