Mosh Pit, or Why Terminal Emulators Suck

TELNET had some good things going for it — a local-echo mode and a well-defined network virtual terminal. Then SSH came along and added minor enhancements like confidentiality and authentication, at the cost of losing the local-echo mode and the well-defined terminal semantics.

— from the homepage of Mosh

Terminal emulators are a hard sell. I mean, yeah, we want that functionality, but at the cost of selling our souls? OK, if not our souls, how about our sanity?

Ever had a misbehaving program freak out and corrupt your display? Or maybe you’d cat the wrong binary, then suddenly f?Kfjd!ܾ!j?sRc?CH?U^, it looks like your mom picked up the phone during your BBS session, and you have to type ‘reset’ to fix it. It’s not world-ending, but it is irritating.

As amusing as the telnet quote is, it’s right. Local echo is nothing if not immediate. We get feedback. Granted, in the case of telnet, that feedback doesn’t mean a whole lot, but still, it’s nice to be able to type and not see the letters show up several seconds later just because you’re typing into a terminal from 35,000ft using an internet connection that cost you the price of a month of dial-up back in the days when mom could interrupt your session of Legend of the Red Dragon.

Enter Mosh, a new take on terminal emulators.

Mosh uses a client/server model. You run it almost exactly like you would ssh:

mosh [email protected]

It actually connects over ssh, then runs mosh-server on the other end. That’s when things get a little strange.

Mosh actually performs its transmissions via encrypted UDP datagrams. In the spirit of the heathen connectionless-UDP-protocols, it also doesn’t try to deliver each and every individual packet that the server spits out – so when you screw up and run find /, you don’t have to wait 10 minutes for your futile ctrl-c keystrokes to get to the server. Plus, and maybe most damning, it allows you to close your laptop, get off the plane, go into the terminal, pay ANOTHER $20, and connect back in immediately, all without running a screen session. Clearly, this is the work of Satan.

In all seriousness, though, this is pretty handy. What they actually do is, instead of treating the terminal like a stream (which is what every command line terminal emulator ever has done), they treat it like a state machine, and the network transmissions are all about bringing the client state into sync with the server state. That’s a cool idea.
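To make the stream-versus-state idea concrete, here is a minimal sketch in Python. It is not Mosh's actual protocol or code: ScreenState, make_diff and apply_diff are names I made up, and a real terminal state is far richer than three rows of text. The point is only that the sender diffs against the last state the receiver confirmed, so intermediate screens never need to cross the wire.

```python
from dataclasses import dataclass, field


@dataclass
class ScreenState:
    """Hypothetical snapshot of a terminal: a version number plus rows of text."""
    num: int = 0
    rows: list = field(default_factory=lambda: [""] * 3)


def make_diff(old, new):
    """Describe only the rows that differ between two snapshots."""
    changed = {i: r for i, r in enumerate(new.rows) if old.rows[i] != r}
    return {"from": old.num, "to": new.num, "rows": changed}


def apply_diff(state, diff):
    """Apply a diff if it starts from the state we hold; otherwise ignore it."""
    if diff["from"] != state.num:
        return state
    rows = list(state.rows)
    for i, r in diff["rows"].items():
        rows[i] = r
    return ScreenState(diff["to"], rows)


# The server screen changes a thousand times while the link is slow or down.
acked = ScreenState()  # last state the client confirmed
server = acked
for n in range(1, 1001):
    server = ScreenState(n, [f"line {n}", f"line {n + 1}", "$ "])

# One diff carries the client from state 0 straight to state 1000;
# the 999 intermediate screens are never sent.
client = apply_diff(ScreenState(), make_diff(acked, server))
print(client.num, client.rows)
```

A stream-based terminal would have to deliver all thousand updates in order, which is why ctrl-c feels stuck behind a flood of output.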

It uses AES-128 in OCB mode. Of course, the Layer-4 headers are still unencrypted, but IP Over Unicorn doesn’t exist yet. *ahem*

Check it out. It’s free, available for a lot of systems, and the formal paper will be presented at this year’s USENIX ATC’12 conference during Federated Conferences Week. Cool, right?

So, after I wrote this, I spun up an Ubuntu server in the HP Cloud, and started playing with it. Very interesting.

The first thing I noticed was how Mosh shows its confidence that the server has received what I typed: it underlines the characters as I type them. For instance, when I type

echo “Four score and seven years ago our fathers brought forth on this continent”

as I type that, the underline “chases” my cursor. When the underline disappears, the client is 100% confident that the server and the client are in the same state.
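Here is a toy model of that chasing underline, in Python. It is my guess at the general shape, not how Mosh implements prediction: the PredictiveEcho class and its methods are hypothetical, and the "server" is just a number saying how many characters have been confirmed.

```python
class PredictiveEcho:
    """Toy model: show typed characters at once, underline the unconfirmed ones."""

    def __init__(self):
        self.typed = ""      # everything the user has typed locally
        self.confirmed = 0   # how many characters the server has confirmed

    def key(self, ch):
        # Echo locally right away instead of waiting for a round trip.
        self.typed += ch

    def server_ack(self, count):
        # The server reports how many characters it has processed so far.
        self.confirmed = max(self.confirmed, min(count, len(self.typed)))

    def pending_count(self):
        return len(self.typed) - self.confirmed

    def render(self):
        # Mark unconfirmed characters with a combining low line (U+0332).
        done = self.typed[:self.confirmed]
        pending = "".join(c + "\u0332" for c in self.typed[self.confirmed:])
        return done + pending


echo = PredictiveEcho()
for ch in "echo hi":
    echo.key(ch)
echo.server_ack(4)       # the server has caught up with "echo"
print(echo.render())     # "echo" plain, " hi" still underlined
echo.server_ack(7)       # fully in sync: the underline is gone
print(echo.render())
```

The underlined tail is exactly the part the client has guessed at but the server has not yet confirmed, so it shrinks as acknowledgements arrive.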

Interestingly, though, when I paste in that same string, the underline is nonexistent. At least, I don’t see it at all. I’m on a Mac, and I’m using (yeah yeah, I know). I’m not sure if there’s an intrinsic difference in how my terminal handles pastes vs. keyboard input, but the server seems to become aware immediately when I paste, and not when I type. Odd.

Also interesting is the output of ps:

10512 ? S 0:00 mosh-server new -s -c 256
10513 pts/1 Ss+ 0:00 \_ -/bin/bash
10614 pts/1 S+ 0:00 \_ screen -T screen-256color ...
10633 ? Ss 0:00 \_ SCREEN -T screen-256color ...
10708 pts/2 Ss 0:00 \_ /bin/bash
10870 pts/2 R+ 0:00 \_ ps axf

All of the screen stuff is normal Ubuntu 10.10 cruft, but you can see that even though Mosh uses the original ssh connection to start up, it drops that connection immediately: mosh-server is left running on its own, with no controlling terminal (the ? in the TTY column).
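The handoff from ssh to UDP can be sketched like this. As far as I can tell, mosh-server announces a UDP port and a session key on its output with a line of the form MOSH CONNECT port key before ssh goes away; treat that line format as my assumption rather than a documented interface. The parse_connect function is hypothetical and the key in the sample is a made-up placeholder.

```python
import re

# Assumed format of the line mosh-server prints over ssh at startup.
CONNECT_RE = re.compile(r"^MOSH CONNECT (\d+) (\S+)[ \t]*$", re.MULTILINE)


def parse_connect(server_output):
    """Return (udp_port, session_key) from the startup output, or None."""
    m = CONNECT_RE.search(server_output)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


# Placeholder output; the key here is not a real session key.
sample = "\nMOSH CONNECT 60001 AAAAAAAAAAAAAAAAAAAAAA\n\n"
print(parse_connect(sample))
```

Once the client has the port and key, it no longer needs the ssh connection, which fits the ps output above.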

I’ll run packet captures if anyone is curious, but if you’re that curious, you should just play with it yourself! Just remember to open UDP ports 60000-61000 to the machine on your firewall (which I almost forgot, natch). Let me know what you think of it in the comments below!

  • eldorel

    I’m not sure how I feel about having to open a thousand incoming ports for a terminal service. Feels a bit like the active/passive ftp days all over again.

    As for the instant response to pasted data, my guess is that the entire string is being sent as a single packet as compared to one every keystroke or so.

  • eldorel:

    I agree about the port block. It may be a side effect of the OCB-mode encryption, but you would think that they could encapsulate session information in the payload.

    I know it’s not production ready, but the new approach to terminal access is really interesting.

  • @jason wellband

    As someone who has experienced all of the above, and who had the experience of using minicom today to access a device via serial port, this sounds interesting. I am also not a big fan of the ports issue – especially if it’s server talking to client. At $dayjob, our servers generally aren’t allowed to talk back to the “client” networks.

  • John McGrath

    I was glancing at the documentation, but did not see the reason for so many UDP ports.

    Anyone have any ideas as to why Mosh requires it?

  • @John McGrath: Remember that each mosh user runs his/her own mosh server. All these UDP ports are presumably to allow multiple users on the same system to use mosh at the same time. 1k simultaneous users should be enough for anybody, right? ;-)

  • John McGrath

    @Mike: Absolutely!!
    OK, who’s got the Port Stretcher (TM)?!?

    I was reading that Mosh uses the SSH port to negotiate the connection and uses the random UDP port for its own services, piping it through the SSH tunnel in the firewall.

    I don’t know if this is correct or not, but it sounds more secure than opening up 1k UDP ports in a firewall.

  • For the record: Mosh will not do local echo unless the ‘epoch’ it is currently in (epochs being delimited by things like line breaks and esc hits) is known to be printing typed characters to the screen. This way, there is no way a password could start to appear, for instance. The video on the Mosh website covers this.
