


Timing a CSV to TSV operation

I regularly convert CSVs to TSVs on the command line for data auditing, and my favourite converter is csvformat -T from the csvkit bundle of CSV utilities.

A recent conversion took a remarkably long time to finish, so I wondered whether the ancient c2t function from the Cookbook could do big conversions faster. To find out I made up a 100,000-record CSV named "demo.csv". This was a well-behaved CSV with no over-quoting, no superfluous commas, no embedded newlines, no included tabs etc etc. I then timed the two commands. I used the BASH keyword time as a timer, where the "real" result is the actual elapsed time for the process.
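A minimal sketch of this kind of test, for anyone who wants to try it. The record contents and the "naive_c2t" function are made up for illustration: "naive_c2t" is a hypothetical stand-in, not the Cookbook's "c2t", and it is only safe on a CSV with no quoted fields. The csvformat timing runs only if csvkit is installed.

```shell
#!/usr/bin/env bash
# Build a well-behaved 100,000-record CSV: 3 fields, no quoting,
# no embedded commas, tabs or newlines. Field contents are invented.
demo=demo.csv
awk 'BEGIN { for (i = 1; i <= 100000; i++) printf "%d,name%d,%d\n", i, i, i * 2 }' > "$demo"

# Hypothetical stand-in converter: swaps every comma for a tab.
naive_c2t() { tr ',' '\t' < "$1"; }

# The keyword "time" can time a shell function directly.
# Its real/user/sys report is written to the shell's stderr.
time naive_c2t "$demo" > demo_naive.tsv

# Time csvformat the same way, but only if csvkit is available.
if command -v csvformat > /dev/null; then
    time csvformat -T "$demo" > demo_csvkit.tsv
fi
```

Redirecting the converted output to a file keeps 100,000 lines from scrolling past, so only the timing report appears in the terminal.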

[Screenshot: timer1]

OK, that's a pretty clear answer.

Now I had another question: are there other ways to quickly and easily compare process times?


That other "time". Different from the BASH keyword time is GNU time. I can launch it by entering its full path (/usr/bin/time), or by quoting it, or by escaping any one of its characters, as shown below. Using the -p option gives an output like that of the keyword time.

[Screenshot: timer2]
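The different ways of reaching GNU time can be sketched as follows. This assumes the program lives at /usr/bin/time, and "sleep 0.2" is only a placeholder for whatever command is being timed.

```shell
#!/usr/bin/env bash
# List everything bash knows as "time": the keyword comes first,
# followed by any "time" program found on PATH.
type -a time

if [ -x /usr/bin/time ]; then
    /usr/bin/time -p sleep 0.2    # full path
    'time' -p sleep 0.2           # quoted, so bash no longer sees the keyword
    \time -p sleep 0.2            # first character escaped
    ti\me -p sleep 0.2            # any other single character escaped
else
    echo "No /usr/bin/time found; GNU time may need to be installed" >&2
fi
```

Quoting or escaping works because bash only recognises a keyword when the word is written literally; anything else is looked up as an ordinary command.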

Timing of aliases and shell functions like "c2t" needs additional fiddling, because GNU time doesn't understand them:

[Screenshot: timer3]

One solution is to first export -f the function as a shell environment variable. The function will appear in the list of environment variables generated by printenv, but will disappear when the shell session is closed:

[Screenshot: timer4]
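A small sketch of what exporting does, using a hypothetical function "naive_c2t" as a stand-in for "c2t". The exact name of the environment variable depends on the bash version; recent versions store it as BASH_FUNC_naive_c2t%%.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Cookbook's c2t function.
naive_c2t() { tr ',' '\t' < "$1"; }

# Before exporting, a child shell knows nothing about the function.
bash -c 'type -t naive_c2t' || echo "not visible to child shells yet"

# Export the function into the environment of this shell session.
export -f naive_c2t

# Now a child shell reports it as a function...
bash -c 'type -t naive_c2t'

# ...and it shows up in the environment listing
# (grep prints the first line of the stored definition).
printenv | grep 'naive_c2t'
```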

GNU time can now time the function's work if it's entered as a BASH command. The -c option tells bash to read its commands from the string that follows, so the whole of "c2t demo.csv" is handed to the new shell as a single command line and isn't split into command and argument by GNU time:

[Screenshot: timer5]
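The difference between the direct call and the bash -c call can be sketched like this. The function "naive_c2t" and the file "demo_small.csv" are hypothetical stand-ins, and GNU time is assumed to be at /usr/bin/time.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for c2t, plus a tiny CSV to run it on.
naive_c2t() { tr ',' '\t' < "$1"; }
printf 'a,b,c\n1,2,3\n' > demo_small.csv

export -f naive_c2t

if [ -x /usr/bin/time ]; then
    # Direct call: GNU time looks for a program called naive_c2t and fails.
    /usr/bin/time -p naive_c2t demo_small.csv \
        || echo "GNU time could not run the function directly" >&2

    # Wrapped call: GNU time runs bash, and bash finds the exported function.
    /usr/bin/time -p bash -c 'naive_c2t demo_small.csv' > demo_small.tsv
else
    # Without GNU time, the child shell can still run the exported function.
    bash -c 'naive_c2t demo_small.csv' > demo_small.tsv
fi
```

The reported time includes the cost of starting the extra bash process, which is small but not zero.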

A job for "pv". I use the pipe viewer pv command in some of my scripts to show a progress bar for an operation. With its -t option, pv displays elapsed time as a seconds counter that stops counting when the process is finished. In the screenshot below, I've also used the -p option to show a progress bar, and -w 50 to restrict the bar to 50 characters wide:

[Screenshot: timer6]

Note that I used pv by feeding it the argument to my command ("demo.csv") and piping the output to the command I wanted to time.
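A sketch of that arrangement, assuming pv is installed. The CSV contents and the "naive_c2t" filter are invented for illustration; here the filter reads from standard input so that pv can feed it.

```shell
#!/usr/bin/env bash
# Build a simple 100,000-record CSV to push through the pipe.
demo=demo_pv.csv
awk 'BEGIN { for (i = 1; i <= 100000; i++) printf "%d,name%d,%d\n", i, i, i * 2 }' > "$demo"

# Hypothetical stand-in converter that reads stdin and writes stdout.
naive_c2t() { tr ',' '\t'; }

if command -v pv > /dev/null; then
    # pv reads the file, shows a timer (-t) and a progress bar (-p)
    # in a display 50 characters wide (-w 50), and passes the data on.
    # The display goes to stderr and is only drawn in a terminal.
    pv -t -p -w 50 "$demo" | naive_c2t > demo_pv.tsv
else
    # Fallback without pv: same conversion, no timer or progress bar.
    naive_c2t < "$demo" > demo_pv.tsv
fi
```

Because pv sits at the head of the pipe, its counter stops when the last of the input has been passed along, which is close to when the downstream command finishes.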


...and in conclusion, I'll stick with the BASH keyword time!




Last update: 2024-11-01
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License