25 June 2012

Perlish Curses

So let's write silly little test programs. First thing to do is write a program that reads keystrokes and reports back what the keycode is.


For a terminal program, anything beyond the standard alphanumeric keys can get exceedingly tricky, since there are many different keyboards and terminal programs available. Keystrokes like F1 and the left arrow key arrive as escape sequences: ^[OP and ^[[D in my case, but other terminal software may use different sequences. Happily, calling Curses::keypad(1); tells ncurses to do its best at interpreting these sequences for me, so that calls to Curses::getch() return a special numeric value for those special keys. Less happily, as one Perl Monk discovered, the function returns ordinary characters and ctrl-keys as-is alongside these numeric values, so it can be a bit awkward to differentiate between them. Never mind, at least the rest of the library protects us from some of the C cruft inherited from curses.h, so addstr, waddstr, mvaddstr and mvwaddstr are all thankfully rolled into one.
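To make the character-versus-keycode awkwardness concrete, here is a minimal sketch of how the return value of getch() might be sorted into categories. The helper name classify_key is made up for illustration, and the rule it applies is an assumption about how the bindings behave (ordinary keys as one-character strings, special keys as integers above 255, -1 when there is no input), so check it against your own version of Curses.pm before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a value as returned by Curses::getch().
# Assumption: ordinary keys arrive as one-character strings, special keys
# as integers above 255, and "no input" as -1 (ERR).
sub classify_key {
    my ($ch) = @_;
    return 'none' if !defined $ch || $ch eq '-1';
    if (length($ch) == 1) {
        # A lone digit such as "5" is still a typed character, not a keycode.
        return ord($ch) < 32 || ord($ch) == 127 ? 'ctrl' : 'char';
    }
    return 'keycode' if $ch =~ /^\d+$/ && $ch > 255;
    return 'unknown';
}

# 260 stands in for a special key; the real numbers come from the curses headers.
print classify_key($_), "\n" for 'a', '5', "\cK", 260, -1;
```

The trick is that a one-character string can never be mistaken for a keycode, because every keycode needs at least three digits to write down.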

Since we're attempting some semblance of Unicode support this time around, there are also some really !!Fun!! things to consider: what happens if I copy-paste some Unicode characters (e.g. Chinese) into my terminal? And how the hell are we going to handle extended character input methods?

After playing with my keycode-reading program for a little while, it seems like it won't be such a terrible problem after all. There are some special key sequences that feed a bunch of bytes into the program, and the byte sequence is the UTF-8 encoding of the characters I pasted or entered. SCIM (or I guess it's ibus that I'm using nowadays?) seems to fire up just fine and lets me type Chinese into gnome-terminal. Curses doesn't interpret these byte sequences specially, so I'll have to do some parsing and reading of my own, but it's not the end of the world.
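The "parsing and reading of my own" could look something like the sketch below. It is not the code from my program: the names utf8_seq_len and read_utf8_char are invented for illustration, and the byte source is a plain callback so the idea can be tried without a terminal. The lead byte says how many bytes the character occupies, the rest are collected, and the core Encode module does the decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Length of the UTF-8 sequence introduced by a lead byte (0 if it cannot start one).
sub utf8_seq_len {
    my ($lead) = @_;
    my $b = ord $lead;
    return 1 if $b < 0x80;
    return 2 if ($b & 0xE0) == 0xC0;
    return 3 if ($b & 0xF0) == 0xE0;
    return 4 if ($b & 0xF8) == 0xF0;
    return 0;    # continuation byte or invalid lead byte
}

# Read one character from $next_byte, a code ref that returns a single byte
# per call, or undef when there is nothing more to read.
sub read_utf8_char {
    my ($next_byte) = @_;
    my $bytes = $next_byte->();
    return undef unless defined $bytes;
    my $len = utf8_seq_len($bytes) or return "\x{FFFD}";
    for (2 .. $len) {
        my $more = $next_byte->();
        last unless defined $more;
        $bytes .= $more;
    }
    # Malformed or truncated input decodes to U+FFFD rather than dying.
    return decode('UTF-8', $bytes);
}

my @pasted = map { chr } 0xE4, 0xBA, 0xBA;    # the UTF-8 bytes of U+4EBA
my $char = read_utf8_char(sub { shift @pasted });
printf "U+%04X\n", ord $char;                 # prints U+4EBA
```

In the real program the callback would wrap Curses::getch(), and any numeric keycodes would have to be filtered out before they reach the decoder.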


As for "printing" characters to the screen (fun terminology that dates back to the early teletype machines, where your output was literally printed onto a roll of paper), it seems fairly easy to do in perl. There's significantly less arcane invocation required to configure things for UTF-8 than there is in C. Here's the start of my program:-


use warnings;
use strict;
use utf8;
use Carp;
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
use Text::CharWidth qw(mbwidth mbswidth mblen);
use Curses;

So far, so good. A little bit of standard Curses initialisation and we can print Unicode text, no problem. It seems that I was worrying too much: if you give Curses some multi-byte UTF-8 character sequence, it'll happily pass it on to your UTF-8 enabled terminal without looking too hard at it.

That's not to say things will be easy. Oh no.


Here's what I've got so far:-

[Screenshot of curses test program]

The top line is perl warning me about a potential bug. I forget what triggered it exactly, possibly the odd numeric/character duality of what getch() gives us. The nice thing, however, is that I have a way to show such warnings without messing up the rest of the screen.

Ordinarily, messages from warn and die would just get crapped onto the screen wherever curses last left the cursor, and possibly get overwritten by more output. By installing __WARN__ and __DIE__ handlers and shoving the warning message into a variable, I can print it out and ensure that future screen refreshes will still show the last warning:-

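my $topline = "Perl-based keycode reading program! Press q to quit.";
# Show warnings on the top line and remember them for later redraws.
$SIG{__WARN__} = sub { Curses::addstr(0, 0, $_[0]); $topline = $_[0]; };
# Restore the terminal before perl prints the die message.
$SIG{__DIE__} = sub { Curses::endwin(); print STDERR "We crumble!\n"; };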


I also print out whatever useful information I can find; the termname will be handy later when we attempt to test our programs on terminals other than gnome-terminal (and start tearing our hair out because they all have subtle differences). I've also made it so that I can toggle curses' keypad mode with ctrl-K, letting me see either the raw or interpreted character sequences.

Then we've got some boxes. This is me experimenting with how wide characters affect the coordinate system curses uses; all coordinates are given as (row, column) but the interaction between single-width and full-width characters in screen coordinates is not specified.

As it turns out, the coordinates presume single-width character cells for the screen but still let you render a full-width character - it just spills into the adjacent cell. At least, this is true for my version of Curses.pm, my version of libncursesw.so, and my version of gnome-terminal. With all those levels of indirection, it becomes difficult to control the output with any degree of certainty.
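To keep track of where things land in those single-width cells, something along these lines could work out the starting column of each character in a string. This is only a sketch: char_cells and column_starts are names invented here, and it leans on perl's built-in East_Asian_Width property instead of the mbwidth/mbswidth functions from Text::CharWidth that my program imports, so the two may disagree on some characters.

```perl
use strict;
use warnings;

# Cells occupied by one character: 2 for East Asian Wide or Fullwidth, else 1.
# Simplification: combining marks and control characters are not handled.
sub char_cells {
    my ($c) = @_;
    return $c =~ /\A[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]\z/ ? 2 : 1;
}

# Return the starting column of each character, plus the total width in cells.
sub column_starts {
    my ($str) = @_;
    my $col = 0;
    my @starts;
    for my $c (split //, $str) {
        push @starts, $col;
        $col += char_cells($c);
    }
    return (\@starts, $col);
}

# "\x{4EBA}" is the Chinese character for "person", written as an escape
# so the example does not depend on the source file's encoding.
my ($starts, $total) = column_starts("a\x{4EBA}b");
print join(',', @$starts), " total=$total\n";    # prints 0,1,3 total=4
```

Characters that Unicode classes as Ambiguous rather than Wide, which includes the box-drawing block, count as one cell in this sketch.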

[Animation of some garbage text being left behind in curses]

One flaw I've found so far is that when you attempt to draw a wide character in between two other wide characters, you get garbage hanging around on the screen. Here's a .gif of me moving a little 人 character around the rest of the screen. The garbage characters you see being left behind aren't really 'there' (forcing curses to redraw the entire screen fixes it), but it's still a nuisance. It might not really be curses', or the terminal's, or anyone's fault; asking to draw a character in between two others is a pretty odd request and you should expect odd results.

I started to think that I might not be able to use perl's curses after all, but in hindsight it's not so bad. You cannot always control what characters you're going to have to draw in what positions, especially with user input being a factor, but a bug like this can be mitigated:-
  • For a text editor, we're likely going to be drawing each line in one pass anyway, rather than flitting around the screen and repositioning the cursor a lot.
  • Likewise, if I were to make a silly little roguelike game - which I'm considering - I'd have fairly tight control over the cells used to display the game world, and could reasonably expect that I wouldn't accidentally hit this overdrawing problem.
  • In all other cases, such as needing to draw some sort of "dialog box" atop previous content, a quick call to Curses::clearok(1); Curses::refresh(); Curses::clearok(0); should fix things.

Note that I don't want to just set clearok(1) and be done with it. In this mode, curses makes no assumptions about existing screen content and redraws everything from scratch. Even with today's modern computers, if I set that and then hold the arrow keys down I'm going to notice some flicker as the entire screen updates. I don't want that. I certainly don't want it over a potentially slow remote shell!

There was one final problem I ran into but I'm hoping it won't be so important - I noticed that when attempting to use some (allegedly) full-width line drawing characters for the boxes, they were being treated as though they were single-width. So I tried using the Chinese characters you see above, and things worked fine - possibly somewhere along the line some part of the chain got the wrong idea about the widths.

Otherwise, I'm pretty happy with how this turned out and will continue experimenting with perl and curses.

09 June 2012


So, to get back into the programming groove I am going to be remaking my clunky old text editor. Because I'm such a Qt fanboy, the logical choice would be to make a quick Qt GUI around QTextDocument and call it a day. It really would be pretty damn easy.

But! There are many instances in which a terminal-based text editor is the superior choice. When SSHing between machines, you frequently don't want to bother with X forwarding and loading a heavy UI over the network. And of course, there are those rare instances when X isn't working and you need to fix it...

This means I'm going to have to get to grips with the NCurses library again. Only this time, my goal is to do things "properly" and handle Unicode and so-called "wide" characters. This is where things get Interesting, since NCurses (or at least, the System V curses library it's based on) predates Unicode and the wide character stuff was bolted on later. The documentation makes little reference to the wide versions of the various functions, and there are special things to consider like ensuring the locale is set to UTF-8 and #define-ing the _XOPEN_SOURCE_EXTENDED macro at the top of all your source files. It's not going to be easy.

Decisions, Decisions...

I have a couple of options open to me to get started on this. One is, I could get my head around the C API for libncursesw5-dev, and write a nice big C++ wrapper around the bits I need so that I never have to worry about it again. It could even use Qt - not the GUI parts, but Qt Core, which has nice things like QString in it. Damn, I love QString. I love a lot of the Qt library, since it actually covers a lot more than just painting pretty widgets on the screen. Making an interface to NCurses that takes QStrings and the like may seem a little bizarre, but if bizarre keeps me entertained, then let's do that.

The other option is my old friend Perl. I'd love to try out some perl6 - I wish it were ready but it's not quite there yet, we'll leave that for another day. But perl5 is solid as a rock, has Curses bindings, and I could also take the opportunity to learn some Moose at the same time. Moose is a nice high-level object system for perl, which I'll need if I'm making more than the single-file swiss-army-sledgehammer perl scripts that I usually come up with.

The only questions are: Will wrestling with wchar_t and ncurses in C++ drive me crazy? Do the perl Curses bindings even support unicode perl strings? Why did my fresh 12.04 install fail to render Chinese in my C++ test program, when the same program works fine on my 11.04 laptop? And why do I need setlocale, and why is wcwidth not in ncurses, and... which language do I pick?

Lower your expectations

The answer to the language conundrum is simply "try both". Let's aim lower. Let's make silly little test programs that barf a few characters on screen, and maybe listen for some input. Programs to print out what keycode the user just entered are always handy for debugging. And by aiming lower, we reduce that awful overhead of planning out the entire project in one lump.

It is no good to have grand plans if you can't even get started due to your planning process spinning away in the background obsessing over details that aren't implemented yet. Just... just go write something. Something bad. Something poorly thought out. Something which you can get done and then learn from the mistakes in the next iteration.