October 18th, 2014

In this post we look at how text data can be transposed in a shell script. Suppose you have a comma-delimited text file (csv) which looks like this:


2014-10-01,Reading1,20.3
2014-10-01,Reading2,21.5
2014-10-01,Reading3,24.0
2014-10-01,Reading4,22.2
2014-10-02,Reading1,20.5
2014-10-02,Reading2,21.5
2014-10-02,Reading3,24.1
2014-10-02,Reading4,22.4
2014-10-03,Reading1,20.5
2014-10-03,Reading2,21.7
2014-10-03,Reading3,24.2
2014-10-03,Reading4,22.5

…and so on. Perhaps this is a set of sensor readings over a period of time, and in this case there are four readings per day. For further analysis it might be more suitable to store each date on a single line with the four readings as columns. In other words we want to transpose rows to columns, i.e. pivot the values on date. The file should look like this:


2014-10-01,20.3,21.5,24.0,22.2
2014-10-02,20.5,21.5,24.1,22.4
2014-10-03,20.5,21.7,24.2,22.5

Since this needs to process multiple input rows to produce one output row, sed will not be suitable. Instead we need to use awk. The following tiny script will do the trick.


awk -F, '{val = val "," $3; if( NR % 4 == 0 ) { print $1 val; val = "" } }'

Now, what exactly is this doing? First, we need to specify that the file is comma-delimited, which is what -F, does. Next, the main principle is that the code between the curly brackets is executed once for each row, but state (including variables) is maintained throughout the processing of the entire input. So val is a variable to which we append the third field of each row ($3), preceded by a comma. The if statement checks whether the row number (NR is a special built-in variable which holds the number of the row currently being processed) is divisible by four (% is the modulo operator, as in most languages). If it is, we print the date from the first column ($1) followed by the val variable, which now holds the values from the previous three rows as well as the current one, separated by commas. The variable is then reset.

Obviously, we are making an assumption here that the data is uniform, i.e. that there are exactly four readings available for each day; otherwise the script would be a little more complex.
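To sketch what that more complex script might look like (this is my own illustration, not from the original post, and readings.csv is a placeholder filename): collect the readings per date in an awk array and print everything at the end, so each date may have any number of readings.

```shell
# Sketch of a more robust variant: values are grouped by date in an
# array and printed at the end, with the first-seen order of dates
# preserved.
awk -F, '
    !($1 in vals) { order[++n] = $1 }   # note first appearance of each date
    { vals[$1] = vals[$1] "," $3 }      # append this reading to its date
    END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
' readings.csv
```

The trade-off is that the whole file is held in memory before anything is printed, whereas the one-liner above streams its output.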

October 18th, 2014

Suppose you have a monthly process to archive some data such as log files etc. Each month a separate archive file is created, and so after a few months you will have several archive files – for example as shown below:


archive.2014-08.tar.gz
archive.2014-09.tar.gz
archive.2014-10.tar.gz

Now if you wish to extract your data from all three files, you could run individual commands such as:


tar -zxvf archive.2014-08.tar.gz
tar -zxvf archive.2014-09.tar.gz
tar -zxvf archive.2014-10.tar.gz

This works fine. However, if you want a one-liner which does them all in one shot, the following won’t work:


tar -xzvf *.tar.gz
tar: archive.2014-09.tar.gz: Not found in archive
tar: archive.2014-10.tar.gz: Not found in archive

This is down to the way the tar command works: if more than one filename (or a glob expansion) is passed as an argument, it looks for the second, third, etc. files inside the first file. So the error is saying it cannot find the second and third archives inside the first one (archive.2014-08.tar.gz). In essence, tar really does have to be invoked once per file. The way to keep this a one-liner is the xargs command, which does exactly that:


ls *.tar.gz | xargs -n1 tar -xzvf
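As an alternative (my own addition, not from the post), a plain shell loop achieves the same thing without piping the output of ls, which also keeps filenames containing spaces safe:

```shell
# Let the shell expand the glob itself and call tar once per archive;
# quoting "$f" keeps filenames with spaces intact.
for f in *.tar.gz; do
    tar -xzf "$f"
done
```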

October 25th, 2013

FPrint is a PPA with packages for fingerprint-based authentication. The website includes good documentation on how to install and set it up.

October 16th, 2013

This is a great one-liner which removes old kernel images and frees up space in your boot partition:


sudo apt-get purge $(dpkg -l linux-{image,headers}-"[0-9]*" | awk '/ii/{print $2}' | grep -ve "$(uname -r | sed -r 's/-[a-z]+//')")

This comes from the top answer to a question on ask ubuntu.
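The exclusion at the end is the part that protects the running kernel: uname -r prints the current release, and the sed strips the flavour suffix so that both the image and headers packages for that version are excluded by grep -v. A quick illustration (the release string here is just a made-up example):

```shell
# Strip the flavour suffix (e.g. "-generic") from a kernel release
# string, leaving the version prefix that grep -v then excludes.
echo "3.11.0-15-generic" | sed -r 's/-[a-z]+//'
```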

September 20th, 2013

Here’s an article on Lego’s Land on how to reorder accounts in the left pane of Thunderbird.

August 16th, 2012

The Nokia N900 is getting a little old now, but it is still an amazing piece of kit. This post has a few pointers for making the most of the command line usage it enables.

  • Command Line Execution Widget: This widget lets you run commands from the desktop and outputs the results.
  • Cmd Shortcuts: Allows you to quickly run your own defined commands.
  • gPodder: This excellent podcast catcher comes with a command line interface. Run gpo from the terminal to get the full list of options.
  • FeedingIt: This RSS aggregator can be run from the command line. The known options are /usr/bin/FeedingIt update and /usr/bin/FeedingIt status.
  • Alarmed: This is a graphical interface to the cron scheduler. Apart from neat access to phone functionality (e.g. switching profiles, networking, and yes, alarms) it also allows arbitrary shell commands to be scheduled. This is particularly useful with gPodder and FeedingIt mentioned above. To use this tool you currently need to enable the testing repository.
August 11th, 2012

Most digital cameras store Exif data in the JPEG photo files. This includes things like date and time, camera model and camera settings and in some cases even GPS coordinates. jhead is a very useful command line utility which can read and edit the Exif data. For example, you may wish to remove the data from photos published online.

Another useful thing is to rename photos using the date and time information stored in Exif (note: read the documentation before running):

jhead -n%Y%m%d-%H%M%S *.jpg
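The codes in the -n pattern are the usual strftime-style date format specifiers. If you are unsure what a pattern will produce, one way to preview it (assuming GNU date; the timestamp below is just an example) is:

```shell
# Render the same %Y%m%d-%H%M%S pattern for a sample timestamp, to
# see the kind of filename the rename would generate.
date -d "2014-10-18 09:30:00" +%Y%m%d-%H%M%S
```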

And best of all jhead is in the Ubuntu universe repository, so installing it is as simple as:

sudo apt-get install jhead

April 20th, 2012

If you are evaluating reporting or analytics tools, or just like to mess about with them, it’s always good to get your hands on some “real world” data sets. It sure beats using the Steel Wheels, Classic Models, eFashion etc. sample databases typically shipped with the products.

For a truly comprehensive list of free data sets, check out this page on Quora. Below are some of my favourites:

The following sites are also worth following on a more ongoing basis:

February 4th, 2012

Steps to install Oracle Express Edition (XE) database 10g on Ubuntu 11.10 (Oneiric).

  1. Download the Oracle XE deb package (free registration is required).
  2. Double click the downloaded file and select to install it.
  3. In terminal run sudo /etc/init.d/oracle-xe configure.
  4. You will be prompted to enter the following parameters: HTTP port number, database listener port number, a password for the SYSTEM and SYS database accounts, and whether the service should be started upon boot.
  5. Thereafter the configuration might take a few minutes. That’s it. To start the service in the future run sudo /etc/init.d/oracle-xe start and to stop sudo /etc/init.d/oracle-xe stop.
October 17th, 2011

GNU/Linux includes many utilities for working with text files through the shell. In this post we take a quick look at accessing and manipulating text files in a “column-wise” mode.

Suppose you have the following two files, each with two columns separated by the TAB character.

$cat file1
Alice   Paris
Bob     Tokyo
Mary    London
John    New York

$cat file2
13 May    Orange
19 Oct    Blue
11 Nov    Black
29 Feb    Red

The data in the two files are in fact related, i.e. file2 contains the date of birth and favourite colour of the people mentioned in file1 (assuming also that the files are sorted correctly). It would make sense to combine the two files together so that each row has the full data for each person. The paste command does just that.

$paste file1 file2 > file3
$cat file3
Alice   Paris     13 May    Orange
Bob     Tokyo     19 Oct    Blue
Mary    London    11 Nov    Black
John    New York  29 Feb    Red

Suppose that we are only interested in the name and date of birth of each person, and we can discard the hometown and favourite colour information. The cut command is what we shall use:

$cut -f 1,3 file3 > file4
$cat file4
Alice   13 May
Bob     19 Oct
Mary    11 Nov
John    29 Feb
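As an aside, cut splits on TAB by default, which is why no delimiter had to be specified above; for other separators the -d option applies, e.g. for comma-separated data:

```shell
# Select fields 1 and 3 from comma-delimited input.
echo "Alice,Paris,13 May" | cut -d, -f1,3
```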

Our next and final requirement is to reorder the columns. Instead of having the name followed by the date of birth, suppose we want the columns the other way round. Unfortunately cut -f 3,1 produces exactly the same output as cut -f 1,3, so the cut command will not be sufficient. We have to use sed instead.

$sed -e 's/\([^\t]*\)\t\([^\t]*\)/\2\t\1/' file4 > file5
$cat file5
13 May    Alice
19 Oct    Bob
11 Nov    Mary
29 Feb    John

How does that work? Well \([^\t]*\) is a bracketed group which matches a run of characters other than TAB. The search pattern looks for two of them, separated by a TAB (\t). In the replacement they are referred back to as \2 and \1, again separated by \t.

Of course if file5 was what we ultimately wanted from the beginning as our output, we could have simply piped commands together:

$paste file1 file2 | cut -f 1,3 | sed -e 's/\([^\t]*\)\t\([^\t]*\)/\2\t\1/' > file5

or alternatively

$paste file1 file2 | sed -e 's/\([^\t]*\)\t\([^\t]*\)\t\([^\t]*\)\t\([^\t]*\)/\3\t\1/' > file5
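For what it’s worth (my own aside), awk offers yet another way to reorder columns, which some find easier to read than sed backreferences; a sketch equivalent to the pipeline above:

```shell
# Read TAB-separated fields and print the third and first, with the
# output also TAB-separated.
paste file1 file2 | awk -F'\t' -v OFS='\t' '{print $3, $1}' > file5
```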