Come Work With Me! We're Hiring a SysAdmin!

Date January 28, 2014

So, one of my coworkers has left, and we're currently looking for someone to take over as the primary Linux admin on the team. There are currently three of us, including David Blank-Edelman, who you might know as the author of a few Perl and SysAdmin books.

Here's the job posting. Submit a resume and come work with me at Northeastern University's College of Computer and Information Science!

Who Do We Want to Hire?

We're looking for a well-rounded Linux administrator to join our team in a full-time position.

This person's primary responsibility will be to administer and evolve our Linux infrastructure and the services it provides using sysadmin/devops best practices. We're currently an Ubuntu/Puppet shop. Our Linux infrastructure runs the standard set of services you would expect (DNS, mail, web, LDAP, etc.) and is back-ended by NetApp and Nimble served storage connected over a largely Cisco network.

We are currently exploring being able to burst some of our workloads up to public cloud providers (AWS, etc). The ideal candidate will have experience with all of the above so they can immediately start contributing to our group.

Similarly, we take on new challenges as needed by the College, so having a good generalist "I can learn anything" mindset is key to long-term success in the job. Our group, and this position, also provide direct support to our user population (in person, email, etc.).

What Do You Have to Know/Be?

Advanced knowledge of and experience in Linux infrastructure administration (Ubuntu), general system administration knowledge and experience, general computer security knowledge, programming skills related to system administration automation, excellent communication skills (for user support), architectural thinking, and professional planning are all required.

A minimum of 3-5 years of experience with Linux systems administration in medium to large computing infrastructures is required, preferably with some experience in a research-intensive environment. A bachelor's degree in a technology-related major is required.

About Us

This position is part of the Northeastern University College of Computer and Information Science's Systems Group, a team of individuals that take great pride in creating and administering the state-of-the-art computing and networking infrastructure crucial to our College's success. Our College's infrastructure supports a 2300+ user population including faculty, staff, and students. Our computing environment is optimized for the teaching and research needs of the College and as a result we run many of our own core services that must integrate well with the resources provided and supported by the central IS organization. It is common for us to be in the vanguard of new technology at Northeastern University.

Why You Would Want To Work Here

It's an academic environment with all of the significant perks that entails (e.g. tuition reimbursement, casual dress and happy work environment, working with students, training/conferences, etc.) For a list of benefits, see http://www.northeastern.edu/hrm/benefits/index.html.

We're also really into seeing what we can do to further the field of system administration (do research, present papers at conferences, write books, serve on the USENIX and LOPSA boards, mentor student sysadmins, etc.). We work in a lovely building on Huntington Ave in Boston, MA accessible via two different MBTA lines.

It's a good gig.

Sounds Great, How Do I Apply?

Candidates only, please: apply at https://neu.peopleadmin.com/postings/28381.

If you have questions about the job, please feel free to email linuxadm-hire at ccs.neu.edu.

We are not looking for contact from recruiters at this time (that address will go away after this candidate is hired, so please don't add it to your contact list even if your horoscope says that some day you'll find the perfect candidate for us).

Thanks for taking the time to read this ad and consider working at CCIS.

Help me debug a switching issue?

Date January 24, 2014

I've mentioned before here that I am moving from our legacy Catalyst 6500-based switching infrastructure to a new Nexus 5548-based infrastructure, and I'm in the early stages of the actual migration. Before I can actually migrate things physically from the old switches to the new, I need to make sure that things work as I think they should, and that has been a voyage of discovery for me, let me tell you.

The most recent thing I've run into that I didn't understand is this: When running a pair of CAT-6 cables from the Cat6500 to a shared FEX (using a vPC), I had massive packet loss, in the ballpark of 50%. Depending on whether all of the links were up, or one of the links was down, the packet loss might be from all sources, or possibly just traffic crossing a subnet boundary. Here's the diagram for how it was set up:

[Diagram: packet-loss topology]

Po3, in this case, was a trunk carrying almost all of the VLANs. The behavior was such that if both links were up, and the server and the laptop were on the same subnet, then there would be no packet loss, but if they were on separate subnets, then the packet loss might be 50-60%. And the dropped packets didn't follow any recognizable pattern; the loss was "clumpy": four or five misses, then one or two hits, then a miss, then four or five hits, and so on.

The layer 3 switching between VLANs in those cases was being done by the Catalyst (in fact, all of the L3 switching right now is being done by the Catalyst).

I talked with Cisco, and they suggested that I move the Cat from the FEX and directly attach it to the Nexuses (Nexii?). The reason they gave me was STP-related, but I'd already found that out the hard way and enabled BPDU filtering on the Catalyst's port-channel members.
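For reference, the BPDU filtering I'd enabled looked roughly like this on the Catalyst side (the interface numbers here are illustrative, not my actual ports):

```
! Cat6500: suppress BPDUs on the FEX-facing port-channel members
! (interface range is an example)
interface range GigabitEthernet1/1 - 2
 spanning-tree bpdufilter enable
```

Note that bpdufilter silently discards BPDUs in both directions, which hides the STP topology from the FEX rather than fixing it, so it's a workaround at best.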

The ports on a Nexus 5548 are SFP+, and I was worried that I didn't have any transceivers that would work, but luckily I found some, so I then ran two fiber connections from the Cat, one to each Nexus. As soon as I did that, my packet loss stopped across the board. Here's how it's wired now:
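For the curious, the Nexus side of a dual-homed uplink like this is typically a vPC port-channel, mirrored on both peers; a sketch looks something like the following (the domain number, channel numbers, peer-keepalive address, and interface names are all assumptions, not my actual config):

```
! On each Nexus 5548 (mirror on the vPC peer)
feature vpc

vpc domain 10
  peer-keepalive destination 192.0.2.2

! Fiber uplink toward the Cat6500
interface Ethernet1/31
  description Uplink to Cat6500
  switchport mode trunk
  channel-group 30 mode active

! Matching vpc numbers on both peers bundle the two
! physical links into one logical port-channel
interface port-channel30
  switchport mode trunk
  vpc 30
```

On the Catalyst side, the two fiber ports go into a single LACP port-channel, so the Cat sees one logical link while each physical member lands on a different Nexus.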

[Diagram: working topology]

So, the question that I don't know the answer to is, "why did I see such a strange pattern of packet loss before?". What was I doing wrong? I asked for clarification from the ticket holder at Cisco, and here's what I got back:

The main issue is that connecting a 6500 to a FEX is not a supported topology so any unexpected behavior may have no explanation because it may or may not work correctly. Checking the show tech I could not find anything indicating any issues, everything looks correct but that is the issue when having a not supported topology.

I refuse to believe that there is magic here. Yes, it's an unsupported topology, but why is it unsupported - what produces the packet loss here?

The closest thing I've come to an answer is the vPC Gotchas You Need to Know blog entry by Peter Triple (a.k.a. Routing over a Nexus 7000 VPC Peer Link), although I'm not doing any routing adjacencies over the peer link. It's all static, and mostly resembles Diagram 4, but without the OSPF.

So basically, can anyone shed light on why the response I got was what it was? Thanks, I appreciate it!

Nagios-Plugins Brouhaha

Date January 21, 2014

I'm really not a fan of political infighting. It's bad for an organization, and it's bad for the people who rely on the organization. But it happens, because we're people, and people have ideas and egos and goals that are mutually exclusive of each other.

Such as it is with Nagios at the moment. Although there's been some strife over the IP and trademarks surrounding Nagios for a while, the most recent thing is that the plugins site was...reassigned, I suppose you would say.

For a brief timeline, Nagios began as NetSaint, back in 1999. It was renamed Nagios in 2001 according to the WayBack Machine, apparently because of potential trademark issues. The plugins were apparently spun off from the main NetSaint project around this time as well, although the domain creation date is 2008-05-23 for nagios-plugins.org and 2007-01-15 for nagiosplugins.org.

So what's going on now? Well, according to a news entry on the Nagios.org site, the Nagios Plugin Team had some changes:

The Nagios Plugin team is undergoing some changes, including the introduction of a new maintainer. The www.nagios-plugins.org website will remain the official location of the Nagios Plugins, and development of the plugins will continue on github at https://github.com/nagios-plugins.

Changes are being made to the team as the result of unethical behavior of the previous maintainer Holger Weiss. Weiss had repeatedly ignored our requests to make minor changes to the plugins website to reflect their relation to Nagios, rather than unrelated projects and companies. After failing to acknowledge our reasonable requests, we updated the website to reflect the changes we had requested. Rather than contacting us regarding the change, Weiss decided to embark on a vitriolic path of attacking Nagios and spreading mistruths about what had happened.

We believe that this type of unethical behavior is not beneficial for the Nagios community nor is it in keeping with the high standards people have come to rely on from Nagios. Thus, we have decided to find a new maintainer for the plugins. A new maintainer has already stepped forward and will be announced shortly.

We would like to thank all current and past plugin developers for their contributions and welcome anyone new who is interested in contributing to the project moving forward.

So that's what Nagios has to say.

The reason that they specify the official location for the plugins is because the original team is continuing development on the (their?) project, at https://www.monitoring-plugins.org. According to a news post there:

In the past, the domain nagios-plugins.org pointed to a server independently maintained by us, the Nagios Plugins Development Team. Today, the DNS records were modified to point to web space controlled by Nagios Enterprises instead. This change was done without prior notice.

This means the project can no longer use the name "Nagios Plugins". We, the Nagios Plugins Development Team, therefore renamed the Nagios Plugins to Monitoring Plugins.

We're not too happy having to make this move. Renaming the project will lead to some confusion, and to quite a bit of work for others and for ourselves. We would've preferred to save everyone this trouble.

However, we do like how the new name indicates that our plugins are also used with various other monitoring applications these days. While the Nagios folks created the original implementation of the core plugins bundle, an independent team has taken over development more than a decade ago, and the product is intended to be useful for all users, including, but not limited to, the customers of Nagios Enterprises.

It'll probably take us a few days to sort out various issues caused by the new project name, but we're confident that we can resume our development work towards the next stable releases very soon.

We'd like to take the chance to thank you, our community, for your countless contributions, which made the plugins what they are today. You guys are awesome. We're looking forward to the next chapter of Monitoring Plugins development, and we hope you are, too!

Throwing gasoline onto the fire is Michael Friedrich, Lead Developer of Icinga, a Nagios Fork, who submitted a RedHat bug claiming:

The nagios-plugins.org website has been compromised, and the project team therefore moved to https://www.monitoring-plugins.org including the tarball releases. They also renamed their project from 'nagios-plugins' to 'monitoring-plugins' and it's most likely that the tarball release names will be changed to the new name in future releases.

https://www.monitoring-plugins.org/archive/help/2014-January/006503.html

Additional info:

While I wouldn't suggest to rename the package unless there's an immediate requirement, the source and URL location should be updated in order to use official releases provided by the Monitoring Plugins Development Team. Further, users should use official online references and stay safe.

then later in the thread

Actually the old Nagios Plugins Development Team was required to rename their project due to the fact that Nagios Enterprises hijacked the DNS and website and kicked them out.

Whilst the original members are now known as 'Monitoring Plugins Development Team', the newly formed 'Nagios Core Plugin Development Team' is actually providing a fork of their work under the old name.

For any clarifications required, you should follow the discussion here: https://www.monitoring-plugins.org/archive/devel/2014-January/009417.html

Imho the official former nagios plugins are now provided by the same developers under a new URL and that should be reflected for any future updates.

Though, I'm leaving that to you who to trust here. I'm just a community member appreciating the work done by the original Nagios Plugins Development team, accepting their origin and the fact that there's no censorship of Icinga, Shinken or Naemon (as Nagios forks) needed.

Clearly, the nagios-plugins.org site was not compromised - neither Nagios nor the Monitoring Plugins team is claiming that. I'll kindly assume that Mr. Friedrich was mistaken when he posted the original bug. Hanlon's Razor and all.

So what's my opinion? Glad you asked.

The Nagios news post states that there had been continued requests for small content changes. When those didn't happen, they pulled the rug out from under half a dozen community contributors who have collectively done a great deal of good for the project. That's not the way you show appreciation where I'm from, but hey, I don't know the particulars. I only know what I see on the web, just like you.

What does this all mean for us? Well, if you run anything that uses Nagios plugins, it means you've got a choice - go with the official package, or go with the community-maintained version. Which will be better? Which will the distros use? Probably the official plugins, though I expect the more rapidly-moving distros to offer a package from the monitoring-plugins team as soon as there's any noticeable difference.

But on the bigger-picture scale, Nagios's previously solid position as the principal Open Source monitoring solution isn't as unassailable as it seems they think it is. Cutting off a volunteer team that produces a big part of your product isn't really a good way to advertise stability and unification. There are a lot of options for monitoring today, and a lot more of them are way more viable than was the case 4 or 5 years ago. Instead of this political crap that does nothing to advance the project, I think Nagios should focus on improving the core product. But what do I know? I'm just some blogger on the internet.

Strangely, as I write this, the Nagios Exchange is down. I don't know what that means.

CentOS joins forces with RedHat

Date January 8, 2014

This might sound a little bit gossipy, but I think it's important all the same, because it's going to have technical ramifications.

As announced on the CentOS list, CentOS and RedHat have joined forces. What that means is that the CentOS Board now has several RedHat employees on it, and RedHat now has several CentOS developers on staff, including Karanbir Singh.

There are a lot of people scratching their heads on this, because there's a widely perceived notion that CentOS is a direct competitor to RHEL, but that's actually not the case. While I'm sure RedHat would love to make paying customers out of every one of the CentOS users out there, their REAL competitor is Oracle.

Because of the licensing of the software in question, RedHat needs to make all of the changes that it makes public. This includes all of the original releases, plus things like patches and updates.

CentOS Linux became a "thing" by taking the software that RedHat released, stripping the RedHat branding, applying their own, and releasing it for free as CentOS.

Oracle Linux became a "thing" by taking the software that RedHat released, stripping the RedHat branding, applying their own, and charging a buttload for support on it. Which means it's basically doing exactly what RedHat does, and pointing at exactly the same market.

A while back, RedHat stopped releasing discrete patches and updates, opting instead to issue bulk updates which would still follow the letter of the law in the software license, but be much harder for third parties like Oracle to pick apart when they needed to backport features from those patches. Unfortunately for the community of CentOS users, this hit them too, and the effect was dramatic.

Now, however, I would have to imagine that this puts CentOS in a bit of a better position. I don't know the legal ramifications (if you do, then comment and share), but they may be able to take advantage of privileged information - not needing to use the public blobs of patches, but instead having access to the developers who are issuing discrete changes internally.

So, did RedHat do this out of the goodness of their hearts? I'm going to guess probably not. Not that I'm sure they're not all great (and the RedHat employees that I know of are, in fact, all really nice people), but I imagine that this is a way to poke Oracle in the eye. Every potential Oracle customer who chooses CentOS instead is a win for RedHat - The Enemy of My Enemy, and all that.

So what changes is CentOS going to get to encourage that kind of thing? Time will tell, but the new CentOS website suggests things of interest:

Over the coming year, the CentOS Project will expand its mission to establish CentOS Linux as a leading community platform for emerging open source technologies coming from other projects such as OpenStack. These technologies will be at the center of multiple variations of CentOS, as individual downloads or accessed from a custom installer. Read more about the variants and Special Interest Groups that produce them.

So, exciting times! I'd previously been a bit bearish on CentOS, just because they had made a commitment to match RHEL as closely as possible, and that's hard to do when someone's shooting at the guy behind you like RedHat was with Oracle. At this point, I'd feel much, much better about an infrastructure I had that ran CentOS.

What does this mean for Scientific Linux, the other widely-used RHEL clone? I don't imagine much immediately, because they're not going to be the beneficiaries of the increased RHEL/CentOS communication, but over time, they may find that people have a harder time justifying choosing their release over CentOS, since RedHat has clearly given its blessing to the latter.

What do you think? Comment below!

Effecting Change or Knowledge Workers as a Political Class?

Date December 31, 2013

In my heart of hearts, I'm apolitical. In an ideal world, I just wouldn't care about the political dealings on a national scale, and I used to look at shows like Meet the Press as the equivalent of E!TV - Sunday morning gossip shows talking about things that I don't care about. It's with a sense of irony that I realize that I treat politics in rather the same way that bad companies treat IT - as long as it's working well enough that it doesn't cause me major problems, I can pretend it's not there.

But lately, politics has stopped being quite so "set it and forget it" for me. Not only are there the normal ever-present threats to my online world (like net neutrality), but in the past year or so, it has become apparent to everyone that conspiracy theorists were onto the right idea despite themselves, and that the US government is, in fact, into everything, as deeply and often as possible. (As an aside, if you haven't heard about TAO yet, make sure to check it out and then re-consider your thoughts on the possibility of BadBIOS).

Anyway, political crap is leaking over into my world, and it's annoying.

The part that makes it interesting to me is that the biggest revelation in all of this was from Ed Snowden, a system administrator. It's a little bit like Hobbits in Lord of the Rings...no one knows we exist for years until suddenly the fate of the world hinges on the activities of one of us.

Other people are recognizing that we work in an important intersection of knowledge and responsibility, too. I came across a presentation from this year's Chaos Communication Congress in Germany. It was a talk by Jacob Appelbaum and Julian Assange, who were introduced by Sarah Harrison. The name of the talk was SysAdmins of the World Unite.

Most of the talk is on YouTube, but there are some cutouts. The audio recording, in its entirety (at least as far as I can tell) is available on SoundCloud. To save you the time, I've transcribed some of the more interesting parts:

Sarah Harrison: Why are sysadmins playing an important role in this fight for freedom of information?
Jacob Appelbaum: All of us have agency, but some of us have more agency than others, in the sense that you have access to systems that give you access to information that help to found knowledge that you have in your own head. So someone like Manning or someone like Snowden, who has access to these documents in the course of their work, they will simply have a better understanding of what is actually happening. They have access to the primary source documents as part of their job.

This, I think, fundamentally, is a really critical, I would say a formative thing. When you start to read these original source documents, you start to understand the way that organizations actually think internally. This is one of the things that Julian Assange has said quite a lot. It's that when you read the internal documents of an organization, that's how they really think about a thing. This is different than a press release.

And people who have grown up on the internet, and they are essentially natives on the internet, and that's all of us, I think, for the most part. It's definitely me. That essentially forms a way of thinking about organizations where the official thing that they say is not interesting. You know that there is an agenda behind that, and you don't necessarily know what that true agenda is.

And so people who grow up in this and see these documents, they realize the agency that they have. They understand it, they see that power, and they want to do something about it, in some cases. Some people do it in small starts and fits. So there are lots of sources for lots of newspapers that are inside of defense organizations or really really large companies, and they share this information.

In the case of Chelsea Manning and in the case of Snowden, they went big. And I presume that this is because of the scale of the wrongdoing they saw, in addition to the amount of agency that was provided by their access and by their understanding of the actual information that they were able to have in their possession.

Sarah Harrison: And do you think that it's something to do with, being technical, they have a potential ability to find a way to do this safer than other people, perhaps? Or…

Jacob Appelbaum:
I mean, it's clearly the case that this helps. There's no question that understanding how to use those computer systems and being able to navigate them, that that is going to be a helpful skill. But what it really is, is that these are people who grew up in an era, and I myself am one of these people, where we were overloaded by information but we still were able to absorb a great deal of it. And we really are constantly going through this.

And if we look to the past, we see that it's not just technical people, it's actually people who have an analytical mind. So for example, Daniel Ellsberg, who is famous for the Ellsberg paradox. He was of course a very seriously embedded person in the US military, he was in the RAND corporation, he worked with McNamara, and during the Vietnam War, he had access to a huge amount of information.

And it was the ability to analyze this information and to understand, in this case how the US government during the Vietnam War, was lying to the entire world. And it was the magnitude of those lies combined with the ability to prove that they were lies, that I believe, combined with this analytical skill, it was clear what the action might be, but it wasn't clear what the outcome would be.

And with Ellsberg, the outcome was a very positive one. In fact, it's the most positive outcome for any whistleblower so far that I know of in the history of the United States, and maybe even in the world.

What we see now with Snowden and what we've now seen with Chelsea Manning, is unfortunately a very different outcome, at least for Manning. So, this is also a hugely important point, which is that Ellsberg did this in the context of resistance against the Vietnam War. And when Ellsberg did this, there were huge support networks. There were gigantic things that split across all political spectrums of society.

And so it is the analytical framework that we find ourselves with still, but additionally with the internet. And so, every single person here, who works as a sysadmin, could you raise your hand?

Right.

You represent, and I'm sorry to steal Julian's thunder, but he was using Skype, and um, well… [applause] … we all know Skype has interception and man in the middle problems, so I'm going to take advantage of that fact. You see, it's not just the NSA.

Everyone that raised their hand, you should raise your hand again. If you work at a company where you think they might be involved in something that is a little bit scary, keep your hand up.

Right.

So here's the deal. Everybody else in the room lacks the information that you probably have access to. And if you were to make a moral judgement; if you were to make an ethical consideration, about these things, it would be the case that, as a political class, you would be able to inform all of the other political classes in this room, all of the other people in this room, in a way that only you have the agency to do. And those who benefit from you never doing that are the other people that have that.

Those people are also members of other classes as well, and so the question is, if you were to unite as a political class, and we are to unite with you in that political class, we can see that there is a contextual way to view this through, uh, an historical lens, essentially. Which is to say that when the industrialized workers of the world decided that race and gender were not lines that we should split on, but that instead we should look at workers and owners, then we started to see real change in the way workers were treated and in the way that the world itself was organizing labor.

And this is a hugely important change during the industrial revolution. And we are going through a very similar time now with regard to information politics and with regard to the value of information in our information age.

I'm not sure I'm a big fan of framing our role as that of a political class. That being said, there really are some distinct parallels between today's information worker and yesterday's factory worker, and I suppose you could look at this through a Marxist lens and see us as a modern-day proletariat, but I'm vaguely uninterested in framing the discussion in that light. Someone else can hold that banner if that's what they care about.

What I'm much more concerned with is how we make decisions about what we should do with the responsibilities we have, and the knowledge we have, and the information that we have access to. The idea that we have a unique responsibility to society at large because of the privileged role we play in the modern workforce is an intriguing one. It's not over-romanticizing to see that people in our positions, in the right circumstances, can make huge impacts - Edward Snowden is prime evidence of that. But every bit of leverage that we have which could be used for good can also be used for ill. And it's not just black and white, either.

All of us only know what we know, and sometimes, we know what we don't know. We never know what we don't know we don't know - the unknown unknowns, in other words.

If you are involved with military planning, or if you just play RTS games, you're familiar with the concept of the fog of war, a lack of situational awareness because your information is incomplete. And so, decisions that you make because you believe them to be right may wind up being hamartia because your knowledge was incomplete.

That being said, every decision that we make, in our jobs and in our lives, can only be made with the cognizant knowledge that our information is lacking and incomplete. When it comes to making decisions that can dramatically affect others, though, that places an extra onus of correctness. Not only are we affecting ourselves, we're affecting others, and in potentially unknown ways.

If someone were in a position to leak information that was extraordinarily damaging to an organization that was doing a perceived ill, is the wrongdoing enough justification by itself to publicly release that information? How much wrongdoing is enough? To employ some reductio ad absurdum, how much intentional littering does a company need to do before it turns into industrial waste?

And all of this presupposes that we're essentially incidental non-combatants in this war on the public. If you answered the wrong job posting and found yourself as a sysadmin for Hank Scorpio, that's one thing, but what if you suspected that a company were guilty of wrongdoing? If you go "undercover", so to speak, as a sysadmin in that organization, in order to uncover wrongdoing, how is that inherently different than vigilantism?

And yet, that's exactly what Julian Assange was promoting in the talk. Here's the transcription:

The system that exists globally now is created by the interconnection of many individual systems, and we are all, or many of us, are part of administering that system, and have extraordinary power, in a way that is really an order of magnitude different to the power that industrial workers had back in the 20th century. And we can see that in the cases of the famous leaks, the wikileaks was done, or the recent Edward Snowden revelations, that it is possible now for even a single system administrator to have a very significant change to, or rather, apply a rather significant constraint, a constructive constraint, to the behavior of these organizations.

Not merely wrecking or disabling them, not merely going out on strikes to change policy, but rather shifting information from an information apartheid system which we are developing, from those with extraordinary power and extraordinary information into the knowledge commons, where it can be used to, not only as a disciplining force, but it can be used to construct and understand the new world that we're entering into.

Now, Hayden, the former director of the CIA and NSA, is terrified of this. In Cypherpunks, we called for this directly last year, but to give you an interesting quote from Hayden, possibly following up on those words of mine and others:

"We need to recruit from Snowden's generation", says Hayden. "We need to recruit from this group because they have the skills that we require. So the challenge is how to recruit this talent while also protecting ourselves from the small fraction of the population that has this romantic attachment to absolute transparency at all costs". And that's us, right?

So, what we need to do is, spread that message and go into all of those organizations. In fact, deal with them. I'm not saying don't join the CIA. No. Go and join the CIA. Go in there. Go in the ballpark and get the ball and bring it out. With the understanding, with the paranoia, that all those organizations will be infiltrated by this generation, by an ideology that is spread across the internet. And every young person is educated on the internet.

There will be no person that has not been exposed to this ideology of transparency, and to the desire to keep the internet, which we were born into, free. This is the last free generation. The coming together of systems of government, the new information apartheid, the linking together, is such that none of us will be able to escape it in just a decade. Our identities will be coupled to it, the information sharing such that none of us will be able to escape it.

We are all becoming part of the state, whether we like it or not, so our only hope is to determine what sort of state it is that we are going to become part of. And we can do that by looking at, and being inspired by, some of the actions that produced human rights and free education, and so on: by people recognizing that they were part of the state, recognizing their own power, and taking concrete and robust action to make sure they lived in the sort of society that they wanted to, and not in a hellhole dystopia.

While I can't agree with a lot of what he says, there are some good points. The continuing digitization of our lives is inevitable; there's no going back from that. And I believe that we absolutely do need to work to determine the future of the world that we live in. It's a gradual realization of mine, one that's finally dawning on me: I'm just not able to blindly ignore the politics of the situation with my head in the sand. Things do seem to be flying off the rails.

So here's where I put it to you, my readers. What do you make of this? Do you give any credence to the view that information workers are, in essence, their own political class (or do you see value in that view)? I don't want to live in a "hellhole dystopia" either, but I'm not yet convinced that becoming an active combatant in the battle for the "knowledge commons" is the right way to go about it. Do you think it's possible to effect change from inside the system, as I hope it is?

Please, comment below and share your thoughts.

Cisco Switch-Profile Issues

Date December 27, 2013

So, whenever you've got a Fabric Extender attached to more than one switch, you need to configure the shared ports in both places. To make life "easier", Cisco has the concept of "switch profiles", where the two switches share what amounts to a template that gets applied to both. That way you don't have to worry about doing the same thing in each place. Theoretically, it's an awesome idea. In practice, though, I'm not even running the things in production yet and they're already causing me all kinds of problems.
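For context, a switch profile gets set up from config-sync mode on both vPC peers and then pushed with a commit. Roughly like this (the profile name, peer IP, and interface here are just examples, and the exact syntax may vary by NX-OS version):

```
! On each of the two peers (names/IPs illustrative only)
core01# config sync
core01(config-sync)# switch-profile core-shared
core01(config-sync-sp)# sync-peers destination 10.0.0.2    ! the other switch's mgmt IP
core01(config-sync-sp)# interface ethernet101/1/1          ! a port on a dual-homed FEX
core01(config-sync-sp-if)# switchport access vlan 100
core01(config-sync-sp-if)# exit
core01(config-sync-sp)# commit                             ! verify and push to the peer
```

Once both sides have the profile and point `sync-peers` at each other, a commit on either switch is supposed to verify and apply the change on both.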

Here's an example:


core01# config sync
Enter configuration commands, one per line. End with CNTL/Z.
core01(config-sync)# switch-profile core-shared
Switch-Profile started, Profile ID is 1
core01(config-sync-sp)# int po1
core01(config-sync-sp-if)# switchport trunk allowed vlan 1-314,1050-1065
Error: This PC has already been configured outside switch-profile. Please configure further commands outside switch-profile or import the port-channel within switch-profile

So, let's check on it.

core01# sh run int po1

!Command: show running-config interface port-channel1
!Time: Fri Dec 27 19:55:43 2013

version 6.0(2)N1(2)

interface port-channel1
switchport mode trunk
switchport trunk allowed vlan 1050-1065
spanning-tree port type network
vpc peer-link

Alright. And in the switch-profile?


core01# sh run switch-profile | section .*interface.port-channel1$
interface port-channel1
switchport mode trunk
switchport trunk allowed vlan 1050-1065
vpc peer-link

Alright, yes. As you can see, the 'spanning-tree port type network' line exists in the running-config, but not in the switch-profile. So now we have to get it out of there. The obvious answer seems to be to go into 'configure terminal' mode (now just 'configure' in NX-OS) and take it out. Let's do that:


core01(config-sync-sp-if)# buffer-delete all
core01(config-sync-sp)# config t
core01(config)# int po1
core01(config-if)# switchport trunk allowed vlan 1-314,1050-1065
Error: Command is not mutually exclusive
core01(config-if)#

So we can't update the interface inside the switch-profile because it's got configuration outside of the profile, and we can't update it outside of the profile because it's managed by the profile? Ooooookay. Maybe we can import it.


core01# config sync
Enter configuration commands, one per line. End with CNTL/Z.
core01(config-sync)# switch-profile core-shared
Switch-Profile started, Profile ID is 1
core01(config-sync-sp)# import interface po1
core01(config-sync-sp-import)# verify
Failed: Verify Failed

Well, huh.


core01(config-sync-sp-import)# sh switch-profile status

switch-profile : core-shared
----------------------------------------------------------

Start-time: 840920 usecs after Fri Dec 27 20:06:57 2013
End-time: 59573 usecs after Fri Dec 27 20:06:59 2013

Profile-Revision: 86
Session-type: Import-Verify
Session-subtype: -
Peer-triggered: No
Profile-status: Verify Failed

Local information:
----------------
Status: Verify Success
Error(s):

Peer information:
----------------
IP-address: 129.10.108.61
Sync-status: In sync
Status: Verify Failure
Error(s):
Following commands failed mutual-exclusion checks:
interface port-channel1
spanning-tree port type network

Alright, so the local switch verified successfully, but the remote switch failed. Let's check over there.


core02# sh run int po1

!Command: show running-config interface port-channel1
!Time: Fri Dec 27 20:09:10 2013

version 6.0(2)N1(2)

interface port-channel1
switchport mode trunk
switchport trunk allowed vlan 1050-1065
spanning-tree port type network
vpc peer-link

core02# sh run switch-profile | section .*interface.port-channel1$
interface port-channel1
switchport mode trunk
switchport trunk allowed vlan 1050-1065
vpc peer-link

Yeah, that's pretty much exactly what core01 said, too.

Things like this are why I want to drink.

sigh. I'm going to be calling the TAC. Again, for the second time this week.

UPDATE

Alright, after talking with Cisco's Technical Assistance Center (TAC), here's what I've found out.

  • The answer to almost all of the inconsistency problems I've seen regarding switch profiles is to break the peer-sync (using no sync-peers destination <destination>), make each of the changes locally so that each switch is locally coherent between its local config and the switch-profile config, make sure that the two switches' switch-profiles are 100% completely and absolutely identical, and then set up the peer-sync again just as it was before.
  • Although the guy on the phone said that there's no document stating this, they have found that most of the problems related to switch-profile syncing are on non-FEX interfaces. That is, interfaces that are 'local' to the switch itself, or a port-channel. He recommended that I only use switch-profiles for those interfaces that are actually shared between the switches because they live on a dual-homed FEX.
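Based on the TAC guidance above, the first bullet amounts to something like the following on each switch (commands from memory; double-check against the config-sync docs for your NX-OS version before running any of this):

```
! 1. Break the peer sync (on both switches)
core01# config sync
core01(config-sync)# switch-profile core-shared
core01(config-sync-sp)# no sync-peers destination 129.10.108.61
core01(config-sync-sp)# commit

! 2. Fix each switch locally so its running-config and its
!    switch-profile agree (e.g. reconcile the stray
!    'spanning-tree port type network' line on port-channel1)

! 3. Confirm the two switches' profiles are absolutely identical
core01# sh run switch-profile

! 4. Re-establish the sync exactly as it was before
core01(config-sync-sp)# sync-peers destination 129.10.108.61
core01(config-sync-sp)# commit
```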

There's no way to remove an interface from a switch-profile, either. The only solution is to blow away the profile and re-create it. As much...uhh...fun as that sounds, it's not something I'd like to be doing a lot of. Anyway, hopefully this can help someone else out. Good luck, and let me know if you come across anything weird with switch-profiles, too. I'm interested to hear how many people are using them versus manually configuring each Nexus for each change you make. I'm SUPER interested if you use another tool to automate shared port configs across multiple switches. Comment below!
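For the record, blowing away a profile looks something like this. NX-OS lets you choose what happens to the configuration the profile was managing, so pick the keyword carefully (and, as always, verify the options against your version's docs first):

```
core01# config sync
core01(config-sync)# no switch-profile core-shared profile-only
! 'profile-only'  - deletes the profile but leaves its commands in the
!                   running-config (usually what you want here)
! 'local-config'  - deletes the profile AND removes its commands from
!                   the running-config
```

After that, you'd re-create the profile from scratch with only the dual-homed FEX interfaces in it, per the TAC recommendation.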