Converting an MX Virtual Chassis to MX MC-LAG

The MX960 is one of Juniper’s flagship platforms, flexible enough for deployments across service providers and enterprises alike. Recently I had the experience of converting an MX960 Virtual Chassis (VC) to an MX960 Multi-Chassis Link Aggregation Group (MC-LAG). This post discusses how the migration was performed.

Before diving into the actual migration, we first need to understand the MC-LAG configuration on the MXs. I chose an Active/Active topology with MAC Synchronization for Layer 3 routing for this deployment because we had the correct hardware components and JUNOS software. The Juniper MX Series book from O’Reilly is an excellent resource for the requirements, configuration, and maintenance of MC-LAG, as well as an overall resource for the MX Series. I cannot recommend it enough!

An Active/Active MC-LAG topology on the MX requires a link for the Inter-Chassis Control Protocol (ICCP), an Inter-Chassis Link (ICL), and the LAG configuration itself.

The ICCP configuration is straightforward; configure the ICCP protocol and switch options:

MX-01:

interfaces {
    ge-0/1/1 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/1/2 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/2/1 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/2/2 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ae20 {
        aggregated-ether-options {
            lacp {
                active;
                periodic fast;
            }
        }
        unit 0 {
            family inet {
                address 10.19.211.1/30;
            }
        }
    }
}
protocols {
    iccp {
        local-ip-addr 10.19.211.1;
        peer 10.19.211.2 {
            redundancy-group-id-list 1;
            liveness-detection {
                minimum-interval 150;
                minimum-receive-interval 60;
                multiplier 3;
            }
        }
    }
}
switch-options {
    service-id 1;
}

MX-02:

interfaces {
    ge-0/1/1 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/1/2 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/2/1 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ge-0/2/2 {
        gigether-options {
            802.3ad ae20;
        }
    }
    ae20 {
        aggregated-ether-options {
            lacp {
                active;
                periodic fast;
            }
        }
        unit 0 {
            family inet {
                address 10.19.211.2/30;
            }
        }
    }
}
protocols {
    iccp {
        local-ip-addr 10.19.211.2;
        peer 10.19.211.1 {
            redundancy-group-id-list 1;
            liveness-detection {
                minimum-interval 150;
                minimum-receive-interval 60;
                multiplier 3;
            }
        }
    }
}
switch-options {
    service-id 1;
}

The ICL is then configured by adding links into a LAG, similar to the configuration below. Since the ICL carries traffic between the two MXs, we also need to allow VLAN tags to be passed between the two members:

MX-01 and MX-02:

xe-1/0/0 {
    gigether-options {
        802.3ad ae21;
    }
}
xe-2/0/0 {
    gigether-options {
        802.3ad ae21;
    }
}
xe-7/0/0 {
    gigether-options {
        802.3ad ae21;
    }
}
xe-8/0/0 {
    gigether-options {
        802.3ad ae21;
    }
}
ae21 {
    flexible-vlan-tagging;
    encapsulation flexible-ethernet-services;
    aggregated-ether-options {
        lacp {
            active;
            periodic fast;              
        }
    }
    unit 0 {
        family bridge {
            interface-mode trunk;
            vlan-id-list 1-4094;
        }
    }
}
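Once the ICL configuration is committed on both routers, the bundle can be sanity-checked with the standard LAG show commands (output omitted here; the interface name matches the ae21 bundle above):

root@MX-01> show lacp interfaces ae21
root@MX-01> show interfaces ae21 terse

The LACP output should show the member links collecting and distributing on both chassis before proceeding.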

Finally, the actual LAG configuration is shown below and I will highlight some of the key points:

MX-01:

interfaces {
    xe-1/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-2/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-7/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-8/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    ae1 {
        vlan-tagging;                   
        multi-chassis-protection 10.19.211.2 {
            interface ae21;
        }
        encapsulation flexible-ethernet-services;
        aggregated-ether-options {
            link-speed 10g;
            lacp {
                active;
                periodic fast;
                system-id 00:00:00:00:00:02;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 1;
                chassis-id 0;
                mode active-active;
                status-control active;
            }
        }
        unit 0 {
            family bridge {
                interface-mode trunk;
                vlan-id-list 1-4094;
            }
        }
    }
}

MX-02:

interfaces {
    xe-1/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-2/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-7/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    xe-8/1/1 {
        gigether-options {
            802.3ad ae1;
        }
    }
    ae1 {
        vlan-tagging;                   
        multi-chassis-protection 10.19.211.1 {
            interface ae21;
        }
        encapsulation flexible-ethernet-services;
        aggregated-ether-options {
            link-speed 10g;
            lacp {
                active;
                periodic fast;
                system-id 00:00:00:00:00:02;
                admin-key 1;
            }
            mc-ae {
                mc-ae-id 2;
                redundancy-group 1;
                chassis-id 1;
                mode active-active;
                status-control standby;
            }
        }
        unit 0 {
            family bridge {
                interface-mode trunk;
                vlan-id-list 1-4094;
            }
        }
    }
}

In the configuration above, I chose to set multi-chassis-protection on each interface; in each case the peer address is the same as that of the ICCP peer, and the ICL (configured on ae21) carries the cross-chassis traffic. Under the LACP configuration we need to set a System ID that is unique per LACP bundle but identical on both MXs. The mc-ae-id must likewise be unique among the configured LAGs and must match on both MXs. Finally, the chassis-id must be unique to each MX (in this case MX-01 is chassis-id 0 and MX-02 is chassis-id 1), and status-control must be set to either active or standby (in this case MX-01 is active and MX-02 is standby).
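To illustrate those uniqueness rules, a hypothetical second bundle (ae2 — not part of this deployment) would get its own system-id and mc-ae-id while reusing the same redundancy-group and per-chassis chassis-id. A sketch of what that would look like on MX-01:

ae2 {
    aggregated-ether-options {
        lacp {
            active;
            periodic fast;
            system-id 00:00:00:00:00:03;    /* unique per bundle, identical on both MXs */
            admin-key 2;
        }
        mc-ae {
            mc-ae-id 3;                     /* unique per LAG, identical on both MXs */
            redundancy-group 1;
            chassis-id 0;                   /* chassis-id 1 on MX-02 */
            mode active-active;
            status-control active;          /* standby on MX-02 */
        }
    }
}

MX-02 would carry the same system-id and mc-ae-id for ae2, differing only in chassis-id and status-control.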

For routing on the MX chassis I configured MAC Synchronization on each Integrated Routing and Bridging (IRB) interface. This is done by configuring the same IRB interface and IP on each MX, and then configuring mcae-mac-synchronize inside each bridge domain:

interfaces {
    irb {
        unit 1254 {
            family inet {
                filter {
                    input ri-filter;
                }
                address 1.1.1.1/24;
            }
        }
    }
}
bridge-domains {
    vlan_1254 {
        vlan-id 1254;
        mcae-mac-synchronize;
        routing-interface irb.1254;
    }
}
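After committing, the bridge domain and its routing interface can be verified on each MX with the commands below (output omitted):

root@MX-01> show bridge domain vlan_1254
root@MX-01> show interfaces irb.1254 terse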

With that out of the way, consider the diagram below. This is a simplified version of the topology that was reconfigured:

MC-LAG Topology


Since the MXs were originally configured in a Virtual Chassis topology, the first objective was to prepare MX-02 to be separated from the chassis. I began by removing all but one of the VC ports from both MXs using the following commands:

request virtual-chassis vc-port delete fpc-slot 2 pic-slot 0 port 0
request virtual-chassis vc-port delete fpc-slot 7 pic-slot 0 port 0
request virtual-chassis vc-port delete fpc-slot 8 pic-slot 0 port 0
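The remaining VC ports can be confirmed before and after each deletion (output omitted):

root@MX-VC> show virtual-chassis vc-port

After the three deletions above, only the xe-1/0/0 port should remain in the list.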

These VC ports will be used later in the migration to build the ICL. This left xe-1/0/0 as the remaining VC port. From there, all physical ports on MX-02 (at this point VC member 1) were disabled using the commands below:

set interfaces ge-12/0/0 disable
<..snip..>
set interfaces xe-20/3/3 disable

This prevents MX-02 from forwarding traffic on the network, which is a key point in preparing for the migration: once MX-02 was removed from the Virtual Chassis and the MC-LAG configuration applied, any enabled interfaces would become active and cause severe problems on the network. I created an apply-group to ensure that the interfaces would not come up until later:

groups {
    disable-interfaces {
        interfaces {
            <*> {
                disable;
            }
        }
    }
}
interfaces {
    apply-groups disable-interfaces;
}
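Because the disable statements are inherited from the group rather than set directly on each interface, they only appear when the configuration is displayed with inheritance expanded:

root@MX-02> show configuration interfaces | display inheritance

Each interface should show a disable statement annotated as inherited from group 'disable-interfaces'.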

With the interfaces disabled, I removed the last VC port from both MXs to sever the VC completely:

request virtual-chassis vc-port delete fpc-slot 1 pic-slot 0 port 0

Once the VC was severed, I could make changes to MX-02 without affecting the configuration on MX-01. I then loaded the MC-LAG configuration on MX-02 (similar to the LAG configuration shown above) and committed it. From there I deleted the VC configuration using the command below and rebooted MX-02:

request virtual-chassis member-id delete

Once MX-02 came back up, and after confirming that its interfaces were still disabled, I staged the MC-LAG configuration on MX-01 and set the apply-group to disable all of its interfaces. Once this was complete I deleted the apply-group on MX-02 to enable its interfaces, then committed the changes to both MX-01 and MX-02. At this point MX-02 became active on the network and MX-01 became inactive. Because the IRB MAC address changed, I then had to clear ARP on all of the switches and hosts that were originally connected, or simply wait for the entries to expire on hosts I could not access. Lastly, I deleted the VC configuration and rebooted MX-01 using the command below:
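On the Junos devices I could reach, flushing the stale entries is a one-liner; the equivalent on other platforms varies, so the second command is only an example for a Junos access switch:

root@MX-01> clear arp

root@access-switch> clear arp
root@access-switch> clear ethernet-switching table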

request virtual-chassis member-id delete

From there, I removed the apply-groups configuration on MX-01, confirmed the MC-LAG state, and verified that traffic was flowing properly:

On MX-01:

root@MX-01> show iccp  

Redundancy Group Information for peer 10.19.211.2
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    1                           Up   

Client Application: lacpd
  Redundancy Group IDs Joined: 1 

Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: 1 

root@MX-01> show interfaces mc-ae id 2 
 Member Link                  : ae1
 Current State Machine's State: mcae active state
 Local Status                 : active
 Local State                  : up
 Peer Status                  : active
 Peer State                   : up
     Logical Interface        : ae1.0
     Topology Type            : bridge
     Local State              : up
     Peer State               : up
     Peer Ip/MCP/State        : 10.19.211.2 ae21.0 up

On MX-02:

root@MX-02> show iccp  

Redundancy Group Information for peer 10.19.211.1
  TCP Connection       : Established
  Liveliness Detection : Up
  Redundancy Group ID          Status
    1                           Up   

Client Application: lacpd
  Redundancy Group IDs Joined: 1 

Client Application: l2ald_iccpd_client
  Redundancy Group IDs Joined: 1 


root@MX-02> show interfaces mc-ae id 2 
 Member Link                  : ae1
 Current State Machine's State: mcae active state
 Local Status                 : active
 Local State                  : up
 Peer Status                  : active
 Peer State                   : up
     Logical Interface        : ae1.0
     Topology Type            : bridge
     Local State              : up
     Peer State               : up
     Peer Ip/MCP/State        : 10.19.211.1 ae21.0 up

And that’s all she wrote! Please feel free to leave comments and questions below.

Comments

    • The biggest driver to split the Virtual Chassis was stability of the network. While VC provides easier management, it has the disadvantage of being a single point of failure on the network. Between software bugs and hardware failures there are plenty of opportunities for a single failure to bring down the entire core of the network. MC-LAG allowed us to provide many of the benefits of VC (minimizing spanning tree on the network, active/active flow capabilities) while keeping each core switch independent from the other.

      • Just out of curiosity, have you had that many software bugs that caused problems when either upgrading or hardware failed (specifically on the MX series – since we have a pair of new MX104’s – I’m curious)? I would still consider the MC-LAG implementation itself as a single point of failure. If a bug exists on one unit, it will exist on the other. We have had this situation present itself on a SAN, where independent controllers in an active/passive configuration failed the active controller, so the passive controller became active, then the newly active controller crashed due to the same bug.

      • Hey Eric,
        Good question! While I can’t get into all the details, we ran into 4 individual PRs that affected the stability of the MX Virtual Chassis, all due to replication tasks failing for the VC or the RE not handing the RE role to the backup switch. There was also one odd issue where the configuration failed to validate upon rebooting the RE. In an active/active MC-LAG both switches are actively forwarding traffic, and in those instances a hardware failure would only bring down one switch instead of the brains of the entire VC. Bad traffic would certainly bring down both switches, but in our case it was due to partial hardware failures and buggy implementations of the VC. Moreover, while certainly not best practice, we can run two separate JUNOS images on each MX to address those ‘bad traffic’ issues and/or fix problematic code. I hope that answers your question!

      • Thank you! Great information. We actually were recommended by Juniper to implement MC-LAG versus VC. Adding the VC license would have cost $36k+, but it sounds like it would have added instability, which obviously isn’t good. So, we have implemented MC-LAG this week and it is working perfectly. After thinking about how we can potentially have a different version of JUNOS on each router, test failures easier, have different routing tables (if necessary), and not have to deal with any VC fail-over issues, it simply made more sense to use MC-LAG. We could even replace the MX104’s with MX240’s with a similar configuration, one at a time, without downtime, whereas we couldn’t do this with a VC config. So far, we’re happy with the decision. Thanks for the reply!
