Sample transit BGP core configuration

I will be doing a series of labs exploring BGP and MPLS: how they can and cannot work together, what problems MPLS solves and causes, and how its configuration and maintenance are done. I will start with a simulated segment of a sample ISP running BGP on all routers, and will then move to an environment running MPLS on the core devices and BGP on the provider edges. I will further explore the different options available after the MPLS migration, such as VPNs, Traffic Engineering and VPLS.

This post covers the first lab, which brings the sample BGP core up and gives me a chance to practice some BGP-related topics, including hierarchical route reflectors with (just a bit of 😉) redundancy, route filtering with prefix-lists, AS-paths and route-maps, and some very basic use of BGP communities.

Contents

Topology
Addressing
Configuration
Internal routing within ISP1
iBGP sessions
eBGP sessions and route filters
Inbound route filters
Outbound route filters
Route tagging with BGP community strings
BGP configuration on PE devices
Complete configurations

This is the simple topology we are starting with:

Sample ISP BGP core

We have a sample ISP environment, significantly simplified for lab purposes (this was the total number of routers I was able to run in dynamips in my current configuration). The provider runs BGP on all devices in AS65001. The physical links in the topology may change as we go along; at the start they exactly follow the BGP sessions. We are running a pair of redundant core route reflectors in the backbone (Core_RR1 and Core_RR2) to reflect routes between the POPs. There are a total of three POPs – two for customer connectivity and one, called IXP, representing an external peering point, similar to a real-world Internet Exchange Point.

To limit the number of devices we need to run in dynamips, the BGP design is significantly simplified. Each client POP consists of a single POP route reflector (POP_RRCS), fully meshed over iBGP with the core route reflectors and running iBGP sessions to the PE routers in the POP. The POP route reflectors are clients of the core route reflectors and servers for the PE routers. The PE routers run eBGP to the customers – there are two customers, each with two locations, each of which is connected to a different POP. We will use this later to run MPLS VPNs between the individual customer locations. For the same reason, the IXP POP consists of a single edge router only, running iBGP as a client of the core route reflectors and eBGP to the external peer EXT_PE in AS65002.
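
To give an idea of what this hierarchy looks like in IOS, here is a minimal sketch of the route-reflector side. The loopback peering addresses are hypothetical stand-ins for the real ones in the topology, and update-source/authentication details are omitted:

! Core_RR1 - non-client iBGP to Core_RR2, the POP RRs are clients
router bgp 65001
 neighbor 10.0.0.2 remote-as 65001
 neighbor 10.0.1.1 remote-as 65001
 neighbor 10.0.1.1 route-reflector-client

! POP RR (e.g. POP_RRCS) - client of both core RRs, server for its PEs
router bgp 65001
 neighbor 10.0.0.1 remote-as 65001
 neighbor 10.0.0.2 remote-as 65001
 neighbor 10.0.1.11 remote-as 65001
 neighbor 10.0.1.11 route-reflector-client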

In the real world, and depending on the scale of the provider network, such a design could be much more redundant, with a pair of route reflectors in each POP, including the IXP POPs, or running confederations in some places instead of hierarchical route reflectors. However, the presented limited topology should be enough to illustrate the main things MPLS is used for in an ISP core environment.

The addressing scheme is documented in the topology, including the addresses for the internal core interconnections, the address space delegated to each POP, and the address space used by the customers internally. Loopbacks on the customer edge routers are used to inject some customer routes from both locations into the network, simulating some internal customer network connectivity. (more…)
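
As a small preview of the route filtering and community tagging this lab will practice, here is a hedged sketch of an inbound policy a PE could apply to a customer eBGP session – the prefix, community value and neighbor addresses are invented for illustration:

ip prefix-list CUST1-IN seq 5 permit 192.168.100.0/24 le 32
!
route-map CUST1-IN permit 10
 match ip address prefix-list CUST1-IN
 set community 65001:100
!
router bgp 65001
 neighbor 192.168.100.2 route-map CUST1-IN in
 neighbor 10.0.1.1 send-community

Anything not matched by the prefix-list is dropped by the route-map's implicit deny, and the community tag travels with the accepted routes as long as the iBGP neighbors are configured with send-community.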

Getting started with MPLS, getting rid of BGP in the core

In this lab, we will build a very simple MPLS setup to illustrate how we can run a network without BGP on the core transit routers in an AS. Here's what the simple network looks like:

MPLS start-up topology

The complete configurations are listed at the end of the post.

We will start with just 3 routers in our AS 65001:

RLeft:

Loopback0: 192.168.255.10/32 -> Management, routed in EIGRP
Loopback1: 192.168.10.1/24 -> Simulated LAN, routed in iBGP
Fa0/0: 172.16.0.1/30 -> Interconnection to RCentre0, routed in EIGRP

RCentre0:

Loopback0: 192.168.255.11/32 -> Management, routed in EIGRP
Fa0/0: 172.16.0.2/30 -> Interconnection to RLeft, routed in EIGRP
Fa0/1: 172.16.0.5/30 -> Interconnection to RRight, routed in EIGRP

RRight:

Loopback0: 192.168.255.12/32 -> Management, routed in EIGRP
Loopback1: 192.168.20.1/24 -> Simulated LAN, routed in iBGP
Fa0/0: 172.16.0.6/30 -> Interconnection to RCentre0, routed in EIGRP

We have already enabled EIGRP across our network, routing only the loopbacks and interconnection subnets (to make sure we have our neighborships up). (more…)
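
As a rough outline of where this lab is heading (not the lab's exact configuration – interface names are taken from the list above), enabling LDP on the core-facing interfaces lets RCentre0 forward traffic between the two simulated LANs purely on labels, while iBGP runs only between the two edge routers:

! RCentre0 - label switching only, no BGP at all
mpls label protocol ldp
interface FastEthernet0/0
 mpls ip
interface FastEthernet0/1
 mpls ip

! RLeft - LDP towards the core, iBGP only to RRight's loopback
mpls label protocol ldp
interface FastEthernet0/0
 mpls ip
router bgp 65001
 neighbor 192.168.255.12 remote-as 65001
 neighbor 192.168.255.12 update-source Loopback0
 network 192.168.10.0 mask 255.255.255.0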

Terminating a GRE/IPSEC tunnel behind NAT

Suppose I need (for whatever reason ;)) a site-to-site VPN, but also need to terminate the GRE/IPSEC tunnel on a device which is behind NAT. The following diagram illustrates the scenario:

GRE/IPSEC terminating behind NAT

We need an IPSEC SA between RLeft and RRight, and a GRE VTI between the two routers running over this SA. The SA will secure and encapsulate the GRE traffic.

Some initial notes

If we want RRight to be behind NAT, there are some challenges to the normal GRE/IPSEC operation. The source address of the packets coming from RRight is changed to the public IP that NATRouter assigns it after it does the address translation. Being completely unaware of any NATs in the way, RLeft would only see packets coming from the NAT public IP 172.16.20.3 and not from the actual interface of RRight. This means RRight will have to authenticate as 172.16.20.3, so we need to bring up a loopback on RRight holding this IP address. This is why this could not work with NAT interface overloading on NATRouter, as in that case the IP of the loopback on RRight would be the same as the IP address of the fa0/0 interface on NATRouter, and there would be issues with routing the packets back to RRight.

NAT traversal

A note worth mentioning is that the NAT-transparency feature, also known in some sources as "UDP wrapper" or "UDP encapsulation", has been enabled by default since 12.2(13)T. This feature allows the IPSEC endpoints to detect whether a NAT is present somewhere along the way, by exchanging hashes of the source and destination IP address and port at each end of the IPSEC SA. By recalculating the hashes locally and then comparing the values, the endpoints can detect whether a NAT is present along the path. Then, if both endpoints support NAT-T (which in our case they do), they negotiate whether to use it. The final (and most important) step is to encapsulate every IPSEC SA and ISAKMP packet within a new UDP header. IPSEC NAT-T uses UDP port 4500.
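
For orientation, a generic VTI-style configuration for RLeft could look like the sketch below – this is not the lab's exact configuration, and the key and tunnel addressing are invented. Note that the peer address is the NAT public IP 172.16.20.3, matching the loopback trick described above:

crypto isakmp policy 10
 encryption aes
 authentication pre-share
 group 2
crypto isakmp key VPNKEY address 172.16.20.3
!
crypto ipsec transform-set TS esp-aes esp-sha-hmac
!
crypto ipsec profile GRE-PROT
 set transform-set TS
!
interface Tunnel0
 ip address 10.255.0.1 255.255.255.252
 tunnel source FastEthernet0/0
 tunnel destination 172.16.20.3
 tunnel protection ipsec profile GRE-PROT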

Configurations

(more…)

The original TCP specification

This is an interesting read that I found recently – the original proposed specification for the TCP protocol. It's a paper by Vint Cerf and Robert Kahn from 1974, published in the IEEE Transactions on Communications, in which they propose a design for a new protocol for packet-switched communication. It's called "A Protocol for Packet Network Intercommunication" and is interesting in terms of the initial plans for the protocol. Initially it describes only a single TCP protocol, with the separation of functionality between TCP and IP to follow later. It's quite interesting to see some of the forecasted problems and quite a few differences from the current state of the protocol, including, for example, the proposed address space size.

Cisco IOS banner tokens

Since Cisco IOS Release 12.0(3)T there is an option to use tokens when configuring different types of banner messages on routers.  The tokens are replaced with the respective parameter value when the banner is displayed, and they can reference different parameters of the device and/or configuration.

The table below lists the possible tokens and their support in different banner messages:

Token          motd  login  exec  incoming  slip-ppp  Description
$(hostname)    YES   YES    YES   YES       YES       Hostname of the device
$(domain)      YES   YES    YES   YES       YES       Domain name of the device
$(peer-ip)     NO    NO     NO    NO        YES       IP address of the peer device
$(gate-ip)     NO    NO     NO    NO        YES       IP address of the gateway
$(encap)       NO    NO     NO    NO        YES       Encapsulation type
$(encap-alt)   NO    NO     NO    NO        YES       Encapsulation shown as SL/IP instead of SLIP
$(mtu)         NO    NO     NO    NO        YES       MTU
$(line)        YES   YES    YES   YES       NO        Number of the line hosting the incoming connection
$(line-desc)   YES   YES    YES   YES       NO        Description of the line hosting the incoming connection

Here’s an example of using the $(hostname) token when configuring a login banner:

R1(config)#banner login # You are connected to $(hostname). Authorized users only! #

This option comes in quite handy for global rollouts of banner configuration across multiple devices (using a configuration-management/automated-rollout tool such as Expect, or a commercial one such as CiscoWorks).

BGP RIB-Failure

BGP RIB-Failure is a situation where some routes from the BGP process cannot be installed in the main routing table, for a number of reasons:

  • A route with a better administrative distance is already installed in the routing table (e.g. a static route, or a route from an IGP with a better administrative distance);
  • The route in question is a directly connected network (a special case of the previous point, with a lower administrative distance);
  • The route is for a host IP address configured on the receiving BGP speaker;
  • Installing the route in the routing table would cause the configured route limit for a particular VRF to be exceeded (when using VRFs);
  • A memory problem on the BGP speaker prevents the route from getting installed in the RIB;

In such cases, the failed routes are marked with an r status code (shown as r> when they are also the best path) in the BGP table:

R1#sh ip bgp
BGP table version is 9, local router ID is 192.168.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
r>i192.168.0.1/32   192.168.1.1              0    100      0 i
r>i192.168.1.0/30   192.168.1.1              0    100      0 i
*>i192.168.10.0     192.168.1.1              0    100      0 i
r>i192.168.30.0     192.168.1.1              0    100      0 i
R1#

We can list all RIB-Failure routes on our BGP router with show ip bgp rib-failure:

R1#sh ip bgp rib-failure
Network Next Hop RIB-failure RIB-NH Matches
192.168.0.1/32 192.168.1.1 Own IP address n/a
192.168.1.0/30 192.168.1.1 Higher admin distance n/a
192.168.30.0 192.168.1.1 Higher admin distance n/a
R1#

If we look at the show ip bgp information for each of these networks, we will see the individual RIB-Failure code, but not much else:

R1#sh ip bgp 192.168.0.1/32
BGP routing table entry for 192.168.0.1/32, version 6
Paths: (1 available, best #1, table Default-IP-Routing-Table, RIB-failure(4) - next-hop mismatch)
Advertised to update-groups:
1
Local, (received & used)
192.168.1.1 from 192.168.1.1 (192.168.10.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
R1#sh ip bgp 192.168.1.0/30
BGP routing table entry for 192.168.1.0/30, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table, RIB-failure(17))
Advertised to update-groups:
1
Local, (received & used)
192.168.1.1 from 192.168.1.1 (192.168.10.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
R1#sh ip bgp 192.168.30.0
BGP routing table entry for 192.168.30.0/24, version 9
Paths: (1 available, best #1, table Default-IP-Routing-Table, RIB-failure(17))
Advertised to update-groups:
1
Local, (received & used)
192.168.1.1 from 192.168.1.1 (192.168.10.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
R1#

If we look at the routing table entries for each of these networks, we can find the reason for the RIB-Failure:

R1#sh ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

D    192.168.30.0/24 [90/156160] via 192.168.1.9, 01:40:21, FastEthernet2/0
B    192.168.10.0/24 [200/0] via 192.168.1.1, 01:46:46
C    192.168.255.0/24 is directly connected, FastEthernet1/0
192.168.0.0/32 is subnetted, 1 subnets
C       192.168.0.1 is directly connected, Loopback0
192.168.1.0/30 is subnetted, 3 subnets
C       192.168.1.8 is directly connected, FastEthernet2/0
C       192.168.1.0 is directly connected, FastEthernet0/0
C       192.168.1.4 is directly connected, FastEthernet0/1
R1#

Each of the three routes is present in the routing table, but one of them is R1’s own IP address on Loopback0, another is learnt via EIGRP and thus has a better administrative distance, and the third one is a directly connected network via Fa0/0. Even though the routes are set to RIB-Failure status, they are actually being advertised to R1’s eBGP peer, which differs a bit from what is mentioned in Cisco’s BGP FAQ document. The lab topology is shown below for illustration.

Sample lab topology for BGP RIB-Failure

In our test scenario R0 announces the three relevant networks into the iBGP process towards R1, but R1 already has better routes for all of them – either from EIGRP (.30.0/24), a directly connected one (.1.0/30), or a route to its own IP address (.0.1/32). R1 itself has an eBGP peering session with R2. Let’s have a look at what R2 is receiving over this eBGP session and whether the RIB-Failure routes are being advertised and accepted over it:

R2#sh ip bgp
BGP table version is 5, local router ID is 192.168.20.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
*> 192.168.0.1/32   192.168.1.5                            0 100 i
*> 192.168.1.0/30   192.168.1.5                            0 100 i
*> 192.168.10.0     192.168.1.5                            0 100 i
*> 192.168.30.0     192.168.1.5                            0 100 i
R2#

As we can see, all the RIB-Failure routes are being advertised and accepted normally over the eBGP peering regardless of their RIB-Failure state. They are not flagged as RIB-Failure routes on R2 and they are installed in the routing table at R2:

R2#sh ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

B    192.168.30.0/24 [20/0] via 192.168.1.5, 02:27:02
B    192.168.10.0/24 [20/0] via 192.168.1.5, 02:27:02
C    192.168.20.0/24 is directly connected, Loopback0
192.168.0.0/32 is subnetted, 1 subnets
B       192.168.0.1 [20/0] via 192.168.1.5, 02:27:02
192.168.1.0/30 is subnetted, 2 subnets
B       192.168.1.0 [20/0] via 192.168.1.5, 02:27:02
C       192.168.1.4 is directly connected, FastEthernet0/0
R2#
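
As a side note, if advertising such routes is not desired, later IOS releases provide the bgp suppress-inactive command, which stops BGP from advertising routes that are not installed in the RIB. A minimal sketch (AS number hypothetical):

router bgp 65001
 address-family ipv4
  bgp suppress-inactive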

VRF Lite – VRFs without MPLS

MPLS VRFs are a popular and widely advertised solution for provider VPNs and for segregating traffic through a backbone network, and they offer a few features that make them sell well 😉

  • Segmentation of the network traffic from different client devices on both sides of a cloud, usually a provider cloud;
  • The option of keeping the client’s routing information intact while carrying it through the cloud to another client device;
  • The option for the provider to carry several client routing tables through its cloud in parallel, even if they contain overlapping address spaces, including overlapping RFC1918 addressing;

The traditional solution for such a situation, based on MPLS VPNs with separate VRF instances for each client, provides all of the features mentioned above, on the basis of a few technical items (a condensed configuration sketch follows the list):

  • A routing relationship between the client (Customer Edge, aka CE) router and the border router of the provider – the Provider Edge, aka PE, router – using the client’s routing protocol of choice;
  • The advertisement of all client networks over this routing relationship, just as if the PE router were part of the client network itself;
  • The existence of at least one VRF instance per client;
  • The PE device marking all client routes with a specific Route Distinguisher, aka RD tag, thus forming the so-called VPNv4 addresses;
  • Redistribution, done by the PE device, of all client routes from the client routing protocol of choice into MP-iBGP, running only between the PE devices on both sides of the provider cloud;
  • The existence of a stable IGP, sustaining the internal routing in the provider cloud, and consequently the MPLS core and MP-iBGP;
  • The existence of a stable and scalable MPLS core;
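
A condensed PE-side configuration tying these items together could look like the following sketch – all names, numbers and addresses are hypothetical, and the PE on the other side of the cloud would mirror it:

ip vrf CUST_A
 rd 65001:100
 route-target both 65001:100
!
interface FastEthernet0/0
 ip vrf forwarding CUST_A
 ip address 10.100.0.1 255.255.255.252
!
router ospf 100 vrf CUST_A
 network 10.100.0.0 0.0.0.3 area 0
 redistribute bgp 65001 subnets
!
router bgp 65001
 neighbor 192.168.255.2 remote-as 65001
 neighbor 192.168.255.2 update-source Loopback0
 address-family vpnv4
  neighbor 192.168.255.2 activate
  neighbor 192.168.255.2 send-community extended
 address-family ipv4 vrf CUST_A
  redistribute ospf 100 vrf CUST_A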

All this is great if you are an ISP with a ready and stable MPLS core. However, if you are trying to develop a custom solution for your specific corporate network needs, there are a few challenges you will face with MPLS VPNs. Most of these challenges are related to device resource planning:

  • You need 3 routing protocols on every PE device in your network (the protocol the client is running, MP-iBGP, and a separate internal routing protocol to hold your own core network together);
  • The PE devices need to be powerful and reliable enough to run and scale the 3 routing protocols needed, as well as the required number of VRFs, or parallel routing tables;
  • The core MPLS routers need to be powerful and scalable enough themselves;

What the VRF Lite feature offers is VRF functionality without the need for MPLS, with its inherent complexity and resource usage, and without the need for multiple routing protocols on the PE devices. This way we can successfully carry several routing tables in parallel through a cloud without running MPLS inside it.

Using VRFs with MPLS implies that the segregation of the different client routing tables is achieved through a separate protocol between the PE devices, carrying these client routing tables tagged with the right RD. To achieve the same segregation of routing tables with VRF Lite, that is without MPLS, we need separate connectivity over which to run multiple routing neighborships between our devices, with each neighborship carrying only a single VRF’s routing table. A minimal sketch follows below.
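
For comparison, a minimal VRF Lite sketch (names, VLANs and addresses hypothetical): each VRF gets its own 802.1Q subinterface across the cloud and its own routing process, with no MP-iBGP or MPLS involved:

ip vrf CUST_A
 rd 65001:100
!
ip vrf CUST_B
 rd 65001:200
!
interface FastEthernet0/0.100
 encapsulation dot1Q 100
 ip vrf forwarding CUST_A
 ip address 10.100.0.1 255.255.255.252
!
interface FastEthernet0/0.200
 encapsulation dot1Q 200
 ip vrf forwarding CUST_B
 ip address 10.200.0.1 255.255.255.252
!
router ospf 100 vrf CUST_A
 network 10.100.0.0 0.0.0.3 area 0
!
router ospf 200 vrf CUST_B
 network 10.200.0.0 0.0.0.3 area 0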

We can use the following two lab scenarios to illustrate the differences between the two VRF configurations.

(more…)

weird commands on CISCO3825 boot

I was quite disturbed at first when I recently saw the following commands being logged after a router boot-up:

idx   sess        user@line         Logged command
1     1           unknown@console   | access-list 199 permit icmp host 10.10.10.10 host 20.20.20.20
2     1           unknown@console   | crypto map NiStTeSt1 10 ipsec-manual
3     1           unknown@console   | match address 199
4     1           unknown@console   | set peer 20.20.20.20
5     1           unknown@console   | exit
6     1           unknown@console   | no access-list 199
7     1           unknown@console   | no crypto map NiStTeSt1

In this case I am fully certain that nothing was connected to the console of the device during the aforementioned boot process… A brief Google search later, it turned out the crypto map in question is part of the self-test of the crypto accelerator when the router boots up. 🙂

http://www.securityfocus.com/archive/75/474377/30/180/threaded

cable testing on a cisco 3750

Catalyst 2960, 2970, 3560/3560-E, and 3750/3750-E switches have a built-in Time-Domain Reflectometer (TDR) that can be used to test cabling directly from the switchports. TDR is not supported on 10-Gigabit, FastEthernet, or SFP interfaces.

Example:

SW01#test cable-diagnostics tdr interface gigabitethernet2/1
TDR test started on interface Gi2/1

A TDR test can take a few seconds to run on an interface

Use 'show cable-diagnostics tdr' to read the TDR results.
SW01#

SW01#sh cable-diagnostics tdr interface gig2/1
TDR test last run on: March 02 00:31:15

Interface Speed Local pair Pair length        Remote pair Pair status
--------- ----- ---------- ------------------ ----------- --------------------
Gi2/1     100M  Pair A     47 +/- 4 meters    Pair A      Normal
                Pair B     48 +/- 4 meters    Pair B      Normal
                Pair C     48 +/- 4 meters    Pair C      Open
                Pair D     48 +/- 4 meters    Pair D      Open
SW01#

The output shows the outgoing interface, the operating speed of the interface, which local pair corresponds to which remote one, the approximate length of each cable pair, and the pair status:

  • Normal;
  • Open – not connected;
  • Not Completed – TDR testing was unsuccessful;
  • Not supported – TDR testing not supported on the interface;
  • Shorted – cable short ;);

TCP congestion control and queueing

TCP mechanisms for traffic regulation and congestion-control

Standard TCP implementations have several mechanisms for congestion-avoidance:

  • TCP Slow-start – the standard method for increasing the amount of data in transport – starting exponentially at one segment and, after a certain point, growing linearly (congestion-avoidance behaviour) – and for throttling this amount when congestion occurs downstream of the traffic source (a short worked example follows this list);
  • TCP Congestion-avoidance – works together with TCP Slow-start and defines a means of regulating the volume of data to be sent in TCP – de facto providing something like flow control at the source;
  • TCP Fast retransmit – defines an accelerated retransmission of part of the data upon receipt of a minimum of 3 duplicate acknowledgements, without waiting for the TCP retransmission timeout to expire;
  • TCP Fast recovery – after resending the missing segment via the fast-retransmit mechanism, TCP continues with linear growth of the amount of data in transport (congestion avoidance), instead of starting a new slow-start from the beginning;
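
As a rough worked example (assuming an initial window of one segment and an ssthresh of 8 segments), cwnd could evolve like this, one line per round-trip time:

RTT 1: cwnd = 1 segment    (slow-start: the window doubles each RTT)
RTT 2: cwnd = 2 segments
RTT 3: cwnd = 4 segments
RTT 4: cwnd = 8 segments   (ssthresh reached)
RTT 5: cwnd = 9 segments   (congestion avoidance: +1 segment per RTT)
RTT 6: cwnd = 10 segments  (...and so on, until a loss is detected)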

TCP Slow-start

During TCP session establishment, one of the variables initialized at the end nodes is cwnd – the congestion window. (more…)