September 19, 2013
There are times I feel like I'm still a fish out of water here. I'm getting better, but every so often something throws me for a loop, because the error patterns here don't always match the ones I'm used to from small business.
Today, for instance, a student came in and reported problems connecting to a Windows-based SFTP server. In my year at Northeastern, I'd never had to deal with this particular service; it apparently "just worked". So when the Windows administrator diagnosed it as a problem with the student's username and resolved it as a one-off, I just filed it away in my mind.
Half an hour later, another student walked in, and she couldn't connect to the SFTP server either.
What is your natural instinct, at this point?
Mine too. One-off be damned, something is almost certainly wrong with the SFTP server. Not a single problem in a year, then two in half an hour? That trips every sensor I've got.
As it turns out, it was her password. Because of some esoteric problems in our infrastructure (not problems so much as design decisions made two decades ago, but ones that are fairly integral to the infrastructure at this point), certain passwords don't sync correctly from NIS to LDAP to Active Directory (don't ask). The SFTP server in question ran on Windows, and testing showed that she could authenticate to Linux systems but not to Windows ones. Changing her password to one the authentication scheme supported solved the problem.
The underlying cause of the pattern, in which two people independently reported unrelated errors with a properly functioning service that hadn't shown problems in a year, turned out to be innocuous. One of the classes, apparently for the first time in a year, required the use of the SFTP server. When the teacher told the students to connect, a subset of them ran into account problems that surfaced only then, thanks to the combination of a service that hardly anyone uses and the fact that those users didn't authenticate against our Active Directory controllers.
I found this interesting in an academic sense but frustrating from a troubleshooting standpoint, because it so clearly triggered my "service is down" instinct.
Do you have that same kind of instinct, where the alarm bells start ringing?