18 November 2013

Naming Names, Part 2

Remember how, in the last post, I invited you all to join me "next week" for part 2? What a funny joke that was! In the meantime I've been busy being sick, getting sucked back into WoW thanks to a "gift" from a "friend", and dealing with hilariously bad hardware failures. But I'm finally back to finish what I started, because I owe you all that much.

In Part 1, we converted a crufty old shell script of mine to its Perl 5 equivalent, and then built it up a little to be more smart about how it goes about renaming files.

In Part 2, we go nuts with feature creep. Follow along for the ride.


30 October 2013

The iGoogles, they do nothing!

So the threat of iGoogle shutting down has been hanging over our heads for most of the year. One of the larger programming projects I wanted to do in this time would address that, and come up with something really nice that people could use as a replacement. Sadly, I haven't been working nearly as fast as I thought I could, and that project is still on the horizon. But November is almost upon us and iGoogle is going away now. What's to be done?

Why, it's time to throw something together quickly in Perl, of course.

Grabbing the Data

The support page linked above has details on how to download a backup of your iGoogle data to your computer. The iGoogle-settings.xml file you end up with looks a bit like this:-
<?xml version="1.0" encoding="utf-8"?>
<GadgetTabML version="1.0" xmlns:iGoogle="http://www.google.com/ig" xmlns="http://schemas.google.com/GadgetTabML/2008">
  <SkinPreferences zipcode="Sydney, Australia" country="au" language="en" />
  <Tab title="Webcomics" skinUrl="http://www.elitedesigns.us/goothemes/MistedForest/doc_theme.xml">
    <Layout iGoogle:spec="THREE_COL_LAYOUT_1" />
    <Section>
      <Module type="RSS" xmlns="http://www.google.com/ig">
        <UserPref name="numItems" value="6" />
        <ModulePrefs xmlUrl="http://feed43.com/order_of_the_stick_comic.xml" />
      </Module>
      <Module type="RSS" xmlns="http://www.google.com/ig">
        <UserPref name="numItems" value="6" />
        <ModulePrefs xmlUrl="http://www.girlgeniusonline.com/ggmain.rss" />
      </Module>
      <Module type="RSS" xmlns="http://www.google.com/ig">
        <UserPref name="numItems" value="8" />
        <ModulePrefs xmlUrl="http://drmcninja.com/feed/" />
      </Module>
While there are many other Module types to consider, I'm only interested in extracting the URLs of the RSS feeds that I follow, and constructing an HTML page with the links from those feeds. It'll be slow to fetch each individual feed, and the script will need to be run manually, but it should serve as a decent replacement.

The eXtra Messy Language

The first thing you'll notice is that the iGoogle backup is in XML. XML is great in some ways but it is dishearteningly verbose and cryptic a lot of the time. We could look up the schema Google has made and create a proper model to represent the entire dataset in our program, or we could just rip out the bits we want with XML::Twig. XML::Twig is a nice Perl 5 library that makes parsing XML relatively painless.

It looks like we'd mostly be interested in identifying the <Tab title="Blah"> elements, and then getting the xmlUrl attribute out of the <ModulePrefs xmlUrl="http://blah"> element for each <Module> (if it possesses the "RSS" type) within the tab. I could also pull out the numItems value, but for the purposes of a quick hack to replace iGoogle, I could happily see myself hard-coding the same number of items for all the feeds.

XML::Twig supports two styles of operation. The first loads the entire document into memory in one go, and you use traditional operations on elements to traverse the XML tree however you want. The second lets you pre-define which parts of the tree interest you and provide XML::Twig with callbacks to your own code that get run on those portions. This second mode is great for large XML documents where loading the entire thing isn't feasible. Our iGoogle settings file isn't exactly large, but the callback mode is still a nice way to deal with things. Let's try it out here.

The first cut of the script looks like this:-
#!/usr/bin/perl

use warnings;
use strict;
use utf8;
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
use XML::Twig;    # Package Name: libxml-twig-perl
use Data::Dumper;
use FindBin qw/$RealBin/;

my $igoogle_settings_filename = "$RealBin/iGoogle-settings.xml";
print STDERR "Opening iGoogle settings file: $igoogle_settings_filename\n";

my $tabs = {}; # Tab Name => [ urls ];
# Called by XML::Twig on each ModulePrefs part.
sub collect_feed_urls
{
   my ($twig, $element) = @_;
   # Extract the URL from this element, and the tab name from its ancestor.
   my $url = $element->att('xmlUrl');
   my $tab_name = $element->parent('Tab')->att('title');
   # Put feed urls in an array ref, grouped by tab name.
   $tabs->{$tab_name} = [] unless defined $tabs->{$tab_name};
   push @{$tabs->{$tab_name}}, $url;
}

# Create our Twig object and tell it what bits we're interested in.
my $twig = XML::Twig->new(
      twig_handlers => {
         'Tab//Module[@type="RSS"]/ModulePrefs' => \&collect_feed_urls,
      }
   );
$twig->parsefile($igoogle_settings_filename); # XML::Twig will just die if there's a problem. Fine by me.

# Print everything out to check it worked so far.
print STDERR Dumper($tabs);
There's not much to say, really - the preamble just loads the modules we want to use, and determines where the XML file is (in the same directory as the script). Then we create a variable called $tabs, a hash reference to a bunch of array references. This is where we'll store the feed URLs. The subroutine collect_feed_urls is the callback we give to XML::Twig, and it closes over that $tabs variable. This means that whenever it runs, the $tabs that it references is the one we declared in the main script, and it can shove all the data in there. I'm essentially using it as a global variable, except it's way more elegant than that really, honest.

One keyword you might be unfamiliar with in the subroutine is unless - a beautiful part of Perl which is effectively just if not, but without necessitating parentheses around the whole expression. And of course, to be Perlish, I'm using the condition at the end of the statement, since it's only one statement I want to affect. If we were writing in a more C-ish style, that one line might turn into:-
if ( ! defined $tabs->{$tab_name})
{
    $tabs->{$tab_name} = [];
}
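Incidentally, on Perl 5.10 and later, the defined-or assignment operator //= collapses that whole check down to a single statement (and, to be fair, push would autovivify the array ref for us anyway). A quick self-contained sketch, with a made-up feed URL:-

```perl
use strict;
use warnings;

my $tabs = {};
my $tab_name = "Webcomics";

# Defined-or assignment: only create the empty array ref if the slot is undefined.
$tabs->{$tab_name} //= [];
push @{ $tabs->{$tab_name} }, "http://example.com/feed.xml";

print scalar @{ $tabs->{$tab_name} }, "\n";   # 1
```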

Moving on. Creating our Twig object with XML::Twig->new() allows us to set some global options for the module, as well as defining the handlers that should be triggered on certain parts of the document. The right hand side of the handler is just a subroutine reference, and could be written in-line if you've got a series of small transformations you wish to do. In our case, it's cleaner to refer to the subroutine we defined earlier using \&.

The left hand side specifies what XML::Twig should be looking for to run our handler. It can be as simple as the XML element name, or (in this case) an XPath-like expression. The expression Tab//Module[@type="RSS"]/ModulePrefs means we are looking for a <ModulePrefs> element, and that it should be looking for one contained in a <Module> element with a 'type' attribute set to "RSS", and there can be any number of other enclosing elements as long as there's a <Tab> as an ancestor somewhere.
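To make that expression concrete, here's a minimal, self-contained sketch; the XML is a trimmed-down stand-in for the real settings file, and the URLs are made up:-

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<GadgetTabML>
  <Tab title="Webcomics">
    <Section>
      <Module type="RSS">
        <ModulePrefs xmlUrl="http://example.com/comic.rss" />
      </Module>
      <Module type="IFRAME">
        <ModulePrefs xmlUrl="http://example.com/not-a-feed" />
      </Module>
    </Section>
  </Tab>
</GadgetTabML>
XML

my @urls;
XML::Twig->new(
   twig_handlers => {
      # Only ModulePrefs inside an RSS-typed Module, somewhere under a Tab.
      'Tab//Module[@type="RSS"]/ModulePrefs' => sub {
         my ($twig, $element) = @_;
         push @urls, $element->att('xmlUrl');
      },
   },
)->parse($xml);

print "@urls\n";   # http://example.com/comic.rss
```

Note that the IFRAME module is skipped entirely; the handler never fires for it.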

Finally, $twig->parsefile() tells it to go, and I use Data::Dumper to check that what we've got so far worked.

Really Simple except when it's not

We've got two main steps to deal with next. Firstly, we need to take each feed URL and fetch that from the Interwebs. Secondly, the document we get back from that URL should be in RSS (or "Really Simple Syndication") format. We'll want to parse out the individual items from that, and extract the title and link.

Grabbing a file from the internet over HTTP is easy enough in Perl; there's a plethora of libraries to do it for us. LWP::Simple should be more than sufficient - it exports a get($url) subroutine that returns the document text, or undef on failure.

For the RSS parsing, we have a few options as well. RSS uses XML, so we could just use XML::Twig on it. However, while the iGoogle-settings.xml file's structure is known to us and won't be changing anytime soon, the kinds of RSS we might get from random webservers could be weird dialects of it or malformed or just completely bizarre. There's no way to know ahead of time, so it's better to use a library specifically made to deal with RSS content. XML::RSS is one such module. It can produce RSS files as well as consume them, but for now we're only interested in using its parse() method and then going through all the items.
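As a quick illustration of the XML::RSS side of things (the feed here is hand-written for the example, so no network fetch is needed):-

```perl
use strict;
use warnings;
use XML::RSS;

# A tiny hand-written RSS 2.0 document; the feed content is made up.
my $rss_text = <<'RSS';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Comic</title>
    <link>http://example.com/</link>
    <description>A made-up feed.</description>
    <item>
      <title>Page 1</title>
      <link>http://example.com/1</link>
    </item>
  </channel>
</rss>
RSS

my $rss = XML::RSS->new;
eval { $rss->parse($rss_text) };   # parse() dies on malformed input, so wrap it.
die "parse failed: $@" if $@;

# channel() acts as a getter when passed a single name.
print $rss->channel('title'), "\n";
foreach my $item (@{ $rss->{items} }) {
   print "$item->{title} -> $item->{link}\n";
}
```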

The script is starting to get a little big, so it's time to think about putting more things into subroutines. I make a 'main' sub, plus 'get_feeds' and 'get_feed' subs to do the job of downloading and parsing the RSS (these need use LWP::Simple; and use XML::RSS; added to the preamble):-
sub get_feeds
{
   my ($tabs) = @_;
   # Fetch the feeds and build a hashref of feed url => XML::RSS object (or undef)
   my $rsses = {};
   foreach my $url_list (values %$tabs) {
      foreach my $url (@$url_list) {
         $rsses->{$url} = get_feed($url);
      }
   }
   return $rsses;
}

sub get_feed
{
   my ($url) = @_;
   # Use LWP::Simple to fetch the page for us - returns 'undef' on any failure.
   print STDERR "Fetching $url ...";
   my $feed_text = get($url);
   if ( ! defined $feed_text) {
      print STDERR " Failed!\n";
      return;
   }

   # Obtained a bit of text over the interwebs, but is it valid RSS?
   my $rss = XML::RSS->new;
   eval {
      $rss->parse($feed_text);
   };
   if ($@) {
      print STDERR " Bad RSS feed!\n";
      return;
   }

   print STDERR " OK\n";
   return $rss;
}
Okay, we've got the feeds downloaded and made some sense out of them. What's next? I could print the RSS items to the terminal to check, but I have plenty of confidence that XML::RSS has done its thing successfully. And there's only a few days left before iGoogle shuts down, so let's hurry things along, shall we?

Spitting it all out

My plan is to have the script print out an HTML document. This means we'll need a bit of HTML boilerplate, wrapped around the main body, which will be divided into sections based on each old iGoogle tab, which in turn will feature a number of short lists of the top N items from each RSS feed. To keep things from getting too messy, this suggests a subroutine for each visual level in the document hierarchy. The output will be pretty plain to begin with, but we can always dress it up with CSS later.

There are a few tricks that are handy when you want to embed something strange like HTML strings inside Perl source. The most obvious trick is just to get a module to write the HTML tags for you, but I don't want to go hunting for one just yet. One simple way to put in a large block of text is a "here document", where you choose some special string like "EOF" to terminate a multi-line bit of text:-
sub create_html
{
   my ($tabs, $rsses) = @_;
   my $output = "";
   $output .= <<EOF;
      <html>
         <head>
            <title>maiGoogle</title>
         </head>
         <body>
EOF

   $output .= html_body($tabs, $rsses);

   $output .= <<EOF;
         </body>
      </html>
EOF
   return $output;
}
The other alternative is to use the magic quote operators Perl gives us. Normally, you'd use double quotes (") to delimit a string literal that you want variables to be interpolated into, but that can be problematic when your HTML also has quote marks in it. Escaping everything with backslashes gets ugly fast. So instead, Perl lets you choose your own quote character with qq:-
sub html_rss_feed
{
   my ($url, $rss) = @_;
   my $output = "";

   # Not all the feeds might have been fetched OK, or some might not have parsed properly.
   if ( ! defined $rss) {
      # Ideally, we'd remember why, but for now just say sorry.
      $output .= qq!<div class="feed">\n!;
      $output .= qq!  <span class="title"><a href="$url">$url</a></span>\n!;
      $output .= qq!  <p class="fail">This feed failed to load, sorry.</p>\n!;
      $output .= qq!</div>\n!;
      return $output;
   }
Here, I've chosen the bang (!) to indicate the start and end of my string, but you can use pretty much anything.
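A couple of quick examples with different delimiters; note that the bracketing pairs ({}, [], (), <>) match up, which makes them handy when the string contains the occasional stray delimiter character:-

```perl
use strict;
use warnings;

my $name = "maiGoogle";

# Bang-delimited: fine as long as the string itself contains no '!'.
my $bang  = qq!<p>Hello from $name</p>!;
# Brace-delimited: balanced inner braces are allowed, e.g. qq{ {nested} is fine }.
my $brace = qq{<div class="feed">$name</div>};

print "$bang\n$brace\n";
```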

Anyway, now we've got some output, does it work?


It does! Fantastic! There's just one last issue to clear up...

Enforcing Sanity

It worked right up until the point where a feed I was getting items from gave me a title with more HTML in it. Naturally, that confused the hell out of the browser and it decided everything past that item was commented out thanks to the feed's malformed HTML. What we absolutely must do when using unknown data from the web is sanity-check it a little - in this case, I'm happy to just rip out anything that looks HTML-like and leave the titles as plain as possible. We could use some regexps to do this but again you never know what crazy things are possible with a spec as large as HTML - let's use a Perl module to do it for us. HTML::Strip looks like a good candidate.

We can write a small sub to strip out any of the bad HTML:-
sub sanify
{
   my ($str) = @_;
   my $hs = HTML::Strip->new();
   $str = $hs->parse($str);
   $hs->eof();
   return $str;
}
We do this rather than make one instance of HTML::Strip and use its ->parse() method each time, because we want to give each fragment of RSS a clean slate.
Let's give this a shot, what could possibly go wrong?

Things went wrong

The good news is that HTML::Strip fixed the broken HTML in the title of that one feed. The bad news is that, now that I'm looking at things more carefully, there are a few remaining problems. Firstly, there's an encoding issue somewhere - a few characters are clearly encoded wrong. It happens when you're sourcing data from all over the place and smooshing it together into one HTML page. iGoogle did a pretty good job of that.

Secondly, some feeds just aren't ever getting loaded - some seem to time out waiting for the feed XML to arrive. We can fix this by abandoning LWP::Simple and using its bigger brother, LWP::UserAgent. In fact, after switching to LWP::UserAgent, I find that the reason some feeds were failing was that we were getting a "403 Forbidden" error. Spoofing the user-agent string and pretending to be Firefox fixes this. In an ideal internet it shouldn't, but the internet is far from ideal, and it is a necessary fix.

The character encoding thing might require more time to debug, so for now I'll add a link to the feed itself after the title, and store any error messages I get in a second hashref that we can include in any feeds that fail.


Edit: It was HTML::Strip. It either didn't like the unicode characters, or was over-zealous in stripping things. I don't know. Changing sanify() to a simpler implementation fixed the problem.
sub sanify
{
   my ($str) = @_;
   $str //= "";
   $str =~ s/&/&amp;/g;   # Ampersands first, or the entities added below would get mangled.
   $str =~ s/</&lt;/g;
   $str =~ s/>/&gt;/g;
   return $str;
}
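(For the record, CPAN's HTML::Entities module does this same escaping via its encode_entities() function, which would sidestep any worries about doing the substitutions in the right order; I just didn't feel like pulling in yet another module for three regexps. A sketch:-)

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# By default, encode_entities escapes the characters unsafe inside HTML text.
my $safe = encode_entities('<b>AT&T</b>');
print "$safe\n";   # &lt;b&gt;AT&amp;T&lt;/b&gt;
```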

Here's the final script. It's kludgy in places, I still haven't figured out why some odd characters are present in some feeds, and the output is completely unstyled. But it works, and with iGoogle shutting down I want to get this done and out there now.
#!/usr/bin/perl

use warnings;
use strict;
use utf8::all;    # Package Name: libutf8-all-perl
use XML::Twig;    # Package Name: libxml-twig-perl
use XML::RSS;     # Package Name: libxml-rss-perl
use LWP::UserAgent;
use FindBin qw/$RealBin/;

my $igoogle_settings_filename = "$RealBin/iGoogle-settings.xml";
my $number_of_rss_items_per_feed = 8;
print STDERR "Opening iGoogle settings file: $igoogle_settings_filename\n";

my $tabs = {}; # Tab Name => [ urls ];
# Called by XML::Twig on each ModulePrefs part.
sub collect_feed_urls
{
   my ($twig, $element) = @_;
   # Extract the URL from this element, and the tab name from its ancestor.
   my $url = $element->att('xmlUrl');
   my $tab_name = $element->parent('Tab')->att('title');
   # Put feed urls in an array ref, grouped by tab name.
   $tabs->{$tab_name} = [] unless defined $tabs->{$tab_name};
   push @{$tabs->{$tab_name}}, $url;
}

# Keep a quick record of any error messages that might help debug why a feed failed, for inclusion in the HTML.
my $error_messages = {};   # feed url -> string

sub main
{
   # Create our Twig object and tell it what bits we're interested in.
   my $twig = XML::Twig->new(
         twig_handlers => {
            'Tab//Module[@type="RSS"]/ModulePrefs' => \&collect_feed_urls,
         }
      );
   $twig->parsefile($igoogle_settings_filename); # XML::Twig will just die if there's a problem. Fine by me.

   # Fetch the feeds.
   my $rsses = get_feeds($tabs);

   # Print the HTML.
   print create_html($tabs, $rsses);
}


sub get_feeds
{
   my ($tabs) = @_;
   # Fetch the feeds and build a hashref of feed url => XML::RSS object (or undef)
   my $rsses = {};
   foreach my $url_list (values %$tabs) {
      foreach my $url (@$url_list) {
         $rsses->{$url} = get_feed($url);
      }
   }
   return $rsses;
}

sub get_feed
{
   my ($url) = @_;
   # Use LWP::UserAgent to fetch the page for us, and decode it from whatever text encoding it uses.
   print STDERR "Fetching $url ...";
   my $ua = LWP::UserAgent->new;
   $ua->timeout(30);    # A timeout of 30 seconds seems reasonable.
   $ua->env_proxy;      # Read proxy settings from environment variables.
   # One step that became necessary: LIE TO THE WEBSITES. Without this, Questionable Content
   # and some other sites instantly give us a 403 Forbidden when we try to get their RSS.
   # I know that Jeph Jacques, author of QC, did have some problem a while ago with some
   # Android app that was "Stealing" his content by grabbing the image directly and not the
   # advertising that supports his site. That's fine, but user-agent filters harm the web.
   # I'm just trying to get a feed of comic updates, here - I don't care if there's no in-line
   # image right there in the feed, all I need is a simple list of links to QC's archive
   # pages. That way Jeph gets his ad revenue, and his site is easy to check for new updates.
   # Sigh. Stuff like this, and sites that require Javascript to display simple images and text
   # are a blight on the internet.
   $ua->agent("Mozilla/5.0 Firefox/25.0");

   my $response = $ua->get($url);
   unless ($response->is_success) {
      print STDERR " Failed! ", $response->status_line, "\n";
      $error_messages->{$url} = "Feed download failed: " . $response->status_line;
      return;
   }
   my $feed_text = $response->decoded_content;

   # Obtained a bit of text over the interwebs, but is it valid RSS?
   my $rss = XML::RSS->new;
   eval {
      $rss->parse($feed_text);
   };
   if ($@) {
      print STDERR " Bad RSS feed!\n";
      $error_messages->{$url} = "RSS parsing failed: $@";
      return;
   }

   print STDERR " OK\n";
   return $rss;
}


sub create_html
{
   my ($tabs, $rsses) = @_;
   my $output = "";
   # My HTML syntax-highlighter hates the <meta> line for some reason, but only in the full script.
   # It's getting confused about just what it's being asked to highlight, perhaps.
   # Oh well! TMTOWTDI.
   my $content = "text/html;charset=utf-8";
   $output .= qq{<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n};
   $output .= qq!<html>\n!;
   $output .= qq!  <head>\n!;
   $output .= qq!    <meta http-equiv="content-type" content="$content" />\n!;
   $output .= qq!    <title>maiGoogle</title>\n!;
   $output .= qq!  </head>\n!;
   $output .= qq!  <body>\n!;

   $output .= html_body($tabs, $rsses);

   $output .= qq!  </body>\n!;
   $output .= qq!</html>\n!;
   return $output;
}

sub html_body
{
   my ($tabs, $rsses) = @_;
   my $output = "";
   # For each Tab, print a header and the feeds. Tabs are sorted in alphabetical order, why not.
   foreach my $tabname (sort keys %$tabs) {
      $output .= html_tab($tabname, $tabs->{$tabname}, $rsses);
   }
   return $output;
}

sub html_tab
{
   my ($tabname, $url_list, $rsses) = @_;
   my $output = "";
   $output .= qq!\n<h1>$tabname</h1>\n!;
   foreach my $url (@$url_list) {
      my $rss = $rsses->{$url};
      $output .= html_rss_feed($url, $rss);
   }
   return $output;
}

sub sanify
{
   my ($str) = @_;
   $str //= "";
   $str =~ s/&/&amp;/g;   # Ampersands first, or the entities added below would get mangled.
   $str =~ s/</&lt;/g;
   $str =~ s/>/&gt;/g;
   return $str;
}

sub html_rss_feed
{
   my ($url, $rss) = @_;
   my $output = "";

   # Not all the feeds might have been fetched OK, or some might not have parsed properly.
   if ( ! defined $rss) {
      # Do we know what went wrong?
      my $error = $error_messages->{$url} // "The feed failed to load for mysterious reasons, sorry.";
      $error = sanify($error);
      $output .= qq!<div class="feed">\n!;
      $output .= qq!  <span class="title"><a href="$url">$url</a></span>\n!;
      $output .= qq!  <p class="fail">$error</p>\n!;
      $output .= qq!</div>\n!;
      return $output;
   }

   # Feed seems to have loaded OK.

   # Figure out what the title of this feed is; default to the URL.
   my $title = $rss->channel('title') // $url;
   $title = sanify($title);
   # Where should clicking the title of the feed box link to?
   my $title_link = $rss->channel('link') // $url;
   $title_link = sanify($title_link);

   # Show them the top few items.

   $output .= qq!<div class="feed">\n!;
   $output .= qq!  <span class="title"><a href="$title_link">$title</a></span>\n!;
   $output .= qq!  <span class="rsslink"><a href="$url">rss</a></span>\n!;
   $output .= qq!  <ul>\n!;

   my @items = @{$rss->{items}};
   # Remove elements from the list, starting at N, to the end. This leaves elements 0..(N-1).
   splice @items, $number_of_rss_items_per_feed;

   # Show them as list items.
   foreach my $item (@items) {
      $output .= qq!    <li> !;
      $output .= html_rss_item($item);
      $output .= qq! </li>\n!;
   }
   $output .= qq!  </ul>\n!;
   $output .= qq!</div>\n!;

   return $output;
}

sub html_rss_item
{
   my ($item) = @_;
   # Return an html link to this RSS feed item.
   my $title = sanify($item->{title});
   my $link = sanify($item->{link});
   return qq!<a href="$link">$title</a>!;
}

main();
You can run this as-is by invoking e.g.
./maiGoogle.pl >maiGoogle.html
or put it in your crontab to run hourly or whatever. It wouldn't work well as a CGI script because it's going to have to check all of those feeds before it gets around to returning a page to you, every time.
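For instance, a crontab entry along these lines would regenerate the page hourly (the paths here are placeholders for wherever you keep the script and your web root):-

```
# m  h  dom mon dow   command
7  *  *   *   *   $HOME/maiGoogle/maiGoogle.pl > $HOME/public_html/maiGoogle.html 2>/dev/null
```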

I'm not going to promise further refinements in weekly installments, because we all know how that goes. If future posts happen, they happen; if not, tweak it yourself!

07 September 2013

Constantly correcting const correctness

I made a comment on Shamus Young's blog, a blog far nicer than mine, which you should visit right now. He actually posts frequently! In this case, the subject of Software Engineering came up, and how it differs from the kind of programming that goes on in the trenches. Anyway, my comment ended up so lengthy that it ought to be a post unto itself. Here it is.

It can be hard to pin down these nebulous software development roles. As a subject, Software Engineering bored the hell out of me at Uni. It was all flowcharts and entity relationship diagrams and that jazz. Databases were thrown in there at one point because hey, databases.

The actual Software Engineering as a thing we need to do before putting code to screen is, well, something I just figured was part of the Good Programmer role. But I must admit, I didn't really appreciate the benefits of e.g. const-correctness until working on a very large project with other people, much later.

For the still-sane:-

const is a keyword in C++ and friends that (amongst other things) lets the programmer declare that this thing here won't be modified. While this doesn't really matter for the simple stuff like integers, which are passed by value, a lot of the really interesting stuff happens by passing references to objects around. If I declared my function as
Frobbable *
frobnicate(int a, string b, Frobbable *thing_to_frob);
then it might not be immediately obvious what happens to the Frobbable thing I pass in as the third argument; does it get modified, and the return value of the function is merely for convenience? Or does the function just make a new copy and return that? Changing the signature to
Frobbable *
frobnicate(int a, string b, const Frobbable *thing_to_frob);
is a guarantee from the author of the frobnicate function that no harm will come to the original thing_to_frob, leading us to believe that any changed versions will come back via the return value.

(Note, I wouldn't be passing raw pointers around, this is just an example, most likely it'd be better to use smart pointers declared as Frobbable::ptr_type and Frobbable::const_ptr_type, or turn the whole class into a pImpl-idiom thing, possibly with copy-on-write semantics... and then you see just how far the rabbit-hole goes)

The other thing adding const does for us is ensure that when we're writing the frobnicate function, we really don't accidentally modify the object when we know we shouldn't be. If the Frobbable class declares some methods, some of which change the internal state of a Frobbable object and some of which don't, we can (and should) add const to the methods that don't, to indicate that fact:-
class Frobbable
{
public:

   string
   get_name() const; // doesn't change our internal state.

   void
   set_name(string new_name); // might.

private:
   string d_name;
};
Now if we accidentally were to use the set_name method inside our frobnicate function, the compiler will throw a fit at us, yelling that we promised we weren't going to change thing_to_frob, we promised, why would you do that. When we have a const Frobbable object, the only methods we can call on it are those that have themselves been declared const.

This is where the real madness sets in. If you haven't been designing with const in mind from the beginning, it is an absolute nightmare to go back to all those methods and check if they should be const or not, just to gain this benefit of maybe having the compiler catch a few bugs for you. It's not possible to merely make a few objects const, because then the methods you use on those objects need to be const too, and anything they interface with will also need to be const, and so on. In a small project, I must admit I don't bother with it. Let everything be mutable, we're probably going to rip this code up tomorrow anyway. But in anything large, it can be very useful to prevent you from introducing side effects to your code. Suddenly you realise that yes, you really shouldn't be modifying this object at this time, this needs to be a wholly separate code path, this is why you've been getting that strange glitch.

15 May 2013

Naming Names, Part 1

I frequently go off on tangents. While watching some youtube videos I'd downloaded, I got the notion that it'd be neater to see them all grouped into the teams they are on. I was about to use my trusty, crusty, old bash function to rename them when it occurred to me that it was past due for an update. For reference, here is the function I've been using in my profile since Forever:-
function filenamesed () {
   if [ "$1" = "" -o "$2" = "" ]; then
      echo "Usage: filenamesed 'sedcmds' filename(s)"
      return
   fi
   SEDCMD="$1"
   shift
   while [ "$1" != "" ]; do
      F="$1"
      NF=`echo "$F" | sed "$SEDCMD"`
      if [ "$F" != "$NF" ]; then
         echo "$F -> $NF"
         mv -i "$F" "$NF"
      fi
      shift
   done
}
I won't explain it in depth. Suffice to say that it takes a GNU sed command as its first argument, and a bunch of files as other arguments, and then applies that sed command to the filenames. For example, when I want to change all the underscores in filenames into spaces, I would run:-
filenamesed 's/_/ /g;' videos/youtube/*.mp4
So if the original goal was "Watch some MindCrack UHC" and Tangent #1 was "rename those files", Tangent #2 is "update your crusty old shell script", and we finally come to the maximally-tangential task of bringing you lovely people along for the ride. Let's do this in Perl 5, piecemeal, and explain our thought processes along the way.


23 April 2013

Waking Up Is Difficult

I previously posted about my new Samsung N220 that I'm using for development on the go. One problem that's plagued me for quite a while is suspend, and more specifically, resume. And it's hardly a consolation prize that the laptop can go to sleep so easily if it can't wake up again!

The trouble was that it only periodically failed to resume, which makes it a nightmare to debug, as you can never 100% trust whether your latest tweaks have actually fixed the problem.

Well, I'm posting this update to say that I'm almost maybe probably possibly definitely mostly 99% certain that I have a fix. It's not really addressing the base cause of the problem, which I suspect is a buggy ACPI BIOS, but it serves to prevent the strange failure state that it kept getting stuck in.

C State. C State run. Run, state, run!

Intel's C-states describe how active or sleepy the processor is at any given moment. C0 means the processor is busy working at full capacity, and higher numbers refer to increasing amounts of laziness in the name of power efficiency. A lot of the time while you are using your laptop, it really isn't doing anything special, and it will sit in the lowest power state - the highest-numbered C-state. This isn't a problem, because under normal circumstances it takes mere microseconds to snap back to life when needed.

Unless, maybe, it goes into this deep sleep mode just as the laptop is also going into an ACPI sleep mode and maybe it's just too sleepy to wake up and just five more minutes, please.

To look at how the C-states are being used, we can use powertop, a lovely little utility made by Intel (but entirely usable on non-Intel machines) to track power usage. It's great for finding ways to increase your battery life, but also handy for illustrating what your CPU is doing and when. Here's what it looks like on startup:-


Pushing the right arrow key takes us to the Idle stats page, which shows the percentage of time spent in each state.


Unfortunately, it seems that the maximal C4 state is interacting badly with whatever ACPI bug I also have. So the fix is to simply tell Linux never to let the processor daydream that hard, and to give it a good poke if it tries to doze off.

Editing the /etc/default/grub file allows us to change what kernel parameters are used at boot. The line to change is the GRUB_CMDLINE_LINUX one, and we want to add the intel_idle.max_cstate=3 parameter to prevent us from ever going into the C4 state. My /etc/default/grub now looks like this:-
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="acpi_osi=Linux acpi_backlight=vendor intel_idle.max_cstate=3"

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

Save the file, run sudo update-grub afterwards as directed, and reboot. There won't be any noticeable difference, but checking PowerTOP will show:-


We're no longer spending any time in the C4 state. And months and months of cautious use have shown that the fix actually works: it hasn't failed to resume once!
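If you'd rather double-check from the command line than eyeball PowerTOP, the intel_idle driver exposes its settings through sysfs. This is just a quick sanity-check sketch; the paths are the standard intel_idle/cpuidle locations and only exist on machines actually using that driver:-

```shell
#!/bin/sh
# Confirm the cap the kernel actually applied (file only exists when the
# intel_idle driver is loaded):
param=/sys/module/intel_idle/parameters/max_cstate
if [ -r "$param" ]; then
    echo "intel_idle max_cstate: $(cat "$param")"
else
    echo "intel_idle driver not loaded on this machine"
fi

# List the idle states still available to the governor on cpu0:
for d in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
    [ -d "$d" ] || continue
    echo "$(basename "$d"): $(cat "$d/name")"
done
```

With the max_cstate=3 parameter in effect, the deeper C4 entry should no longer appear among the usable states.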

Obviously, not entering the deepest power-saving state means slightly increased power draw on average and slightly reduced battery life. I haven't noticed a huge difference, personally; the power use tends to hover around 10W±1, and I still get more battery life from a full charge than I spend in a typical outing.

I hope someone out there with similar problems finds this information useful. It's always so hard to find a good solution to these kinds of problems online, because they're always very specific to individual hardware and software configurations. What worked fine for one person may be completely useless for another; I tried blacklisting modules for various bits of possibly-buggy hardware, and tried different ACPI-related kernel parameters. Just as I decided that I had finally found the solution, it would lock up again.

But I've finally got it sorted out, and there's just one more problem to fix: the 3G modem!

16 March 2013

Mixing up Minecraft

So I've put the Curses / Perl 5 stuff on hold for now while I work on a new secret project. I've been busy doing C++ and Qt stuff, making a nice GUI app to help people make remixes of their favourite Minecraft texture packs.

I started this around the new year, and was racing to get something done before the Minecraft 1.5 release. Well, so much for that! The thing with 1.5 is that it splits the terrain and item sheets up into individual files, which are generally easier for people to work with. In fact, pretty much as soon as I had the idea to make this tool, Mojang announced they'd be changing how texture packs work to make things easier for users. Damn their black hearts!

Anyway, I think my tool still has a lot of potential to reduce the headache of merging bits of texture packs together and generally tweaking things. Certainly there's a few extra features on my to-do list now: the new version supports animations for everything, so that'll be a fun feature to work on.

Here's a screenshot of my not-yet-ready alpha version. Dokucraft is on the left and the default Minecraft on the right, but you could load any number of packs and arrange them how you please. The central area is where you'll be able to drag and drop tiles to create your own custom pack, and the program will remember the steps you took, to make things simpler when Minecraft and the texture packs inevitably get updated.



There's still much to do, of course, but I've finally got most of the "infrastructure" code out of the way, and I'm happy enough with it to show people. If you've got some special workflow or tweak that you like to apply to your Minecraft texture packs, leave a comment!

04 February 2013

Neko's New Netbook

Around the end of last year, I decided I could really use a small portable laptop to do work on. I still have and love my old macbook, but while it is certainly "portable", it is not the sort of machine I would want to just grab and go to the park with. It has a glossy glass screen, which is ideal for crystal-clear viewing indoors but completely useless outdoors.

The Project

My n220 with its default matte display.
I started to wonder: e-ink displays are ideal for the outdoors, and writing doesn't really require a high refresh rate. Were there any notebooks with that sort of screen and a go-anywhere attitude? Surely there were plenty of writers out there who would love such a thing. Well, my initial searches concluded that yes, there were people who also wanted something like that - people who posed the question only to get shot down on forums with "why would you want that?". It's true that e-ink displays have a refresh rate problem, and perhaps that would have some implications even for typing text. But I was undaunted and continued searching.

Then I found out about Pixel Qi displays. These are the same displays that were designed for the One Laptop Per Child project, and the designer is also making the small 10-inch displays available to commercial partners or interested hackers. Indoors, they work as a traditional LCD - colour and everything. The cool thing is, direct sunlight doesn't obliterate the image. Instead, it gracefully 'degrades' to a greyscale e-paper-like display, which looks great.

I learned I could shell out $800 for a "Sunbook" made by one of these partners, but the price was a bit steep for the specs, and it was basically just a Samsung n110 fitted with the screen.

Testing the screen just after replacing it.
Perhaps I could do better if I bought a compatible netbook and fitted the display myself! And this is exactly what I did. I trawled the Make forum for the Pixel Qi display, gathered as much info as I could, and bought a second-hand Samsung n220 from eBay. The screen, obtained via MakerShed, ended up costing me more than the laptop itself! But all up, it was still cheaper and more powerful than the Sunbook, and far more interesting to buy the parts myself and assemble them.

Up until then, I had never modified a laptop more than switching hard drives or RAM. Prising the plastic covers off the body and screen involved many scary snapping noises. Have I broken it?!? I've broken it, haven't I?! The LVDS connector was fiddly, as expected. However, all went well and I've levelled up my Hardware skill.

My New Toy

So, how happy am I with the Pixel Qi screen and the Samsung n220? Pretty damn happy! It enables me to just casually grab the netbook and go out for a walk - to one of the local parks, the library, wherever - set up and do some programming. It means that if home is too noisy to get any work done, it doesn't matter as much; the whole world is my office!

There are a few minor quibbles, certainly. There's a tiny bit of light bleed around the edge of the display (when it's dark enough that the backlight is even necessary, of course!). It doesn't have quite the same colour range or viewing angles that the original Samsung LCD had. And the netbook itself is from 2010; while capable, it isn't the fastest machine around.

But those things don't really matter. I'm not playing games on this thing, I'm working. I don't mind if code takes a little longer to compile. Most of my time is spent writing code, and that's where this netbook shines. Quite literally, in direct sunlight, because the display becomes easier to read. I can't emphasise enough how good it is to be able to use it outside. Especially since (what with the health problems) I've been super deficient in vitamin D and need to get lots more of that. But also simply because it's nice to get out into the fresh air.

Battery life was already pretty good to begin with, at around 5 hours. Now, with xubuntu, the new screen, and a small SSD installed, the battery indicator says I have 8 hours of charge to play with! I've never managed to use up all that charge in one go, so I'd say it's perfect.

System of Operations

Out in the park with the Pixel Qi display.
I tried out a bunch of distro combinations and settled on xubuntu 12.10, the 64-bit version. There wasn't a huge difference between 32-bit and 64-bit; each has a slight performance edge in different areas, but 64-bit is the future, and if the CPU supports it, that's really what I should be using. Memory usage isn't a terrible problem; I don't plan to keep the same Firefox habits as I do on the macbook and leave 30+ tabs open from months ago. That said, the XFCE desktop environment was much lighter on memory and much snappier than anything else I tried, and suits this machine ideally. I can customise the panels to maximise screen real estate, and I installed Synapse as a lightning-fast little program launcher.

The other cool thing about this laptop is that it has a built-in 3G modem. It shows up as an internal USB device, and you can (in theory) configure Network Manager to build a Mobile Broadband connection using it. I haven't got this working yet; I don't think it's Linux's fault, as the only SIM card I have is for my phone, and perhaps there's some subtle difference I'm missing in the setup details. Perhaps when I'm earning some money from selling software here on the interwebs, I can justify spending on little extras like this. If I get it working, I'll be sure to edit this post.

Alas, I haven't yet pinned down the problem I have with pretty much every Linux machine I've ever used:- suspend and resume. Suspending this machine is a game of Russian Roulette: you have no idea whether it'll wake up properly or not. Happily, shutdown and boot are pretty fast, so you don't absolutely need it. Still, it'd be nice to preserve state between walkabouts. Again, I'm doing some research and suspect a misbehaving module is to blame, but the infuriating thing about debugging these problems is that they're intermittent, so you never know for sure whether you've fixed it or not (spoiler: you haven't).

Update: I've found a workaround for the suspend/resume problem! See my latest post here.

Technical Details

To enable the display brightness controls for this laptop in Linux, you need to pass the acpi_osi=Linux acpi_backlight=vendor options to the kernel. You can put these in /etc/default/grub to make them permanent. Fn+Up/Down will then control the backlight brightness under X, but it won't let you go all the way down to "off", which is what you need to switch the Pixel Qi into pure reflective mode and maybe save a bit of power.

To fix this, I found that you can also control what the Intel graphics driver is telling the panel for its brightness level. These are two independent brightness controls, so what I've done for now is bind a key (I used Fn-F5, XF86Launch1) to toggle the brightness completely on or off via the Intel driver. Here's the script I used:-


#!/bin/bash

# To run this from a keybind without sudo wanting a password, you should make the
# file /etc/sudoers.d/qi_backlight with the contents:-
#
# james ALL=(root) NOPASSWD: /usr/bin/tee /sys/class/backlight/intel_backlight/brightness
#
# and make the file mode 440. Maybe create the file with a ~ on the end of the
# name first and mv it into place once it's right, or keep a root shell around;
# sudo likes to fall over and die if even the slightest thing is out of place.


# ---- Must be using correct hardware ----
[[ -f /sys/class/backlight/intel_backlight/max_brightness ]] || exit

MAX_BRIGHTNESS=$(cat /sys/class/backlight/intel_backlight/max_brightness)
CURR_BRIGHTNESS=$(cat /sys/class/backlight/intel_backlight/brightness)

if (( CURR_BRIGHTNESS > 0 )); then
    NEW_BRIGHTNESS=0
else
    NEW_BRIGHTNESS="$MAX_BRIGHTNESS"
fi
echo "Changing brightness from $CURR_BRIGHTNESS to $NEW_BRIGHTNESS."

echo "$NEW_BRIGHTNESS" | sudo tee /sys/class/backlight/intel_backlight/brightness

As the comments in the script say, be careful if you're setting this up to let you sudo it without a password! A mangled sudoers file will lock you out of all root activities until you fix it (via booting into single-user mode, a live-USB rescue, etc.).
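A belt-and-braces way to avoid the lock-out scenario entirely is to stage the fragment somewhere harmless and have visudo check the syntax before it ever lands in /etc/sudoers.d. This is just a sketch; the fragment contents are the ones from the script's comments, and the /tmp staging path is simply my choice:-

```shell
#!/bin/sh
# Stage the sudoers fragment somewhere it can't do any damage:
cat > /tmp/qi_backlight <<'EOF'
james ALL=(root) NOPASSWD: /usr/bin/tee /sys/class/backlight/intel_backlight/brightness
EOF

# visudo -cf only parses the file and never installs anything, so a typo here
# costs you nothing. Only move the fragment into place once it passes:
#   sudo visudo -cf /tmp/qi_backlight \
#     && sudo install -m 440 /tmp/qi_backlight /etc/sudoers.d/qi_backlight
echo "staged: $(wc -l < /tmp/qi_backlight) line(s)"
```

Because the install step is chained on visudo's exit status, a bad fragment simply never reaches /etc/sudoers.d.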

I'm really loving this laptop, and I can't understand why Pixel Qi aren't pushing to get their display into more devices. It's great, but if it's not easy to get hold of, its popularity will remain limited.

Yes, there's glare in the photo, it looks much nicer in real life.