Timing a CSV to TSV operation
I regularly convert CSVs to TSVs on the command line for data auditing, and my favourite converter is csvformat -T from the csvkit bundle of CSV utilities.
A recent conversion took a remarkably long time to finish, so I wondered whether the ancient c2t function from the Cookbook could do big conversions faster. To find out I made up a 100,000-record CSV named "demo.csv". This was a well-behaved CSV with no over-quoting, no superfluous commas, no embedded newlines, no included tabs and so on. I then timed the two commands with the BASH keyword time, where the "real" result is the actual elapsed time for the process.
OK, that's a pretty clear answer.
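A timing comparison like the one described above could be set up roughly as follows. This is only a sketch under stated assumptions: the CSV is a small made-up file, and naive_c2t is a hypothetical stand-in (a plain tr swap of commas for tabs), not the Cookbook's c2t. It is only safe for a well-behaved CSV with no quoted or embedded commas.

```shell
#!/usr/bin/env bash
# Build a small, well-behaved demo CSV (made-up data: no quotes, no embedded commas)
workdir=$(mktemp -d)
demo="$workdir/demo.csv"
{
  echo "id,name,value"
  for i in $(seq 1 1000); do echo "$i,item$i,$((i * 2))"; done
} > "$demo"

# Hypothetical stand-in converter, NOT the Cookbook's c2t
naive_c2t() { tr ',' '\t' < "$1"; }

# The keyword "time" writes its real/user/sys report to stderr
time naive_c2t "$demo" > "$workdir/demo.tsv"

# Time csvformat -T the same way, but only if csvkit is installed
if command -v csvformat > /dev/null; then
  time csvformat -T "$demo" > /dev/null
fi

# TIMEFORMAT controls the keyword's report; %R is elapsed ("real") seconds
TIMEFORMAT='%R'
elapsed=$( { time naive_c2t "$demo" > /dev/null; } 2>&1 )
echo "elapsed seconds: $elapsed"
```

Capturing %R into a variable is handy when several runs need to be compared in a script rather than read off the screen.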
Now I had another question: are there other ways to quickly and easily compare process times?
That other "time". Different from the BASH keyword time is GNU time. I can launch it by entering its full path (/usr/bin/time), by quoting it ("time"), or by escaping any one of its characters (for example \time). Using the -p option gives an output like that of the keyword time.
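The three ways of reaching the external time program can be sketched like this. The example assumes the program lives at /usr/bin/time (it is skipped otherwise) and uses wc -l on a throwaway file as the command being timed, which is a placeholder rather than anything from the original test.

```shell
#!/usr/bin/env bash
# On its own, "time" is the BASH keyword
type -t time          # prints: keyword

# Throwaway input file for the command being timed
demo=$(mktemp)
printf 'a,b,c\n1,2,3\n' > "$demo"

if [ -x /usr/bin/time ]; then
  /usr/bin/time -p wc -l "$demo"    # full path
  "time" -p wc -l "$demo"           # quoted, so BASH looks it up on PATH
  \time -p wc -l "$demo"            # one character escaped, same effect
else
  echo "no external time program found at /usr/bin/time" >&2
fi
```

Quoting or escaping works because BASH only recognises the keyword when the word is written literally and unquoted.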
Timing of aliases and shell functions like "c2t" needs additional fiddling, because GNU time doesn't understand them:
One solution is to first export -f the function as a shell environment variable. The function will appear in the list of environment variables generated by printenv, but will disappear when the shell session is closed:
GNU time can now time the function's work if it's entered as a BASH command. The -c option tells bash to read its commands from the string that follows, so "c2t demo.csv" is passed to bash as a single quoted string and bash itself splits it into command and argument:
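The export-then-time steps might look like the sketch below. As before, naive_c2t is a hypothetical stand-in for c2t (a simple tr call), the input file is made up, and the external time program is assumed to be at /usr/bin/time.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in function, NOT the Cookbook's c2t
naive_c2t() { tr ',' '\t' < "$1"; }

demo=$(mktemp)
printf 'id,name\n1,alpha\n2,beta\n' > "$demo"

# Export the function so that child BASH processes inherit it
export -f naive_c2t

# The exported function now shows up in the environment listing
printenv | grep naive_c2t

# A child bash started with -c can run the function...
out=$(bash -c 'naive_c2t "$1"' _ "$demo")
printf '%s\n' "$out"

# ...which means an external time program can time that child bash
if [ -x /usr/bin/time ]; then
  /usr/bin/time -p bash -c 'naive_c2t "$1"' _ "$demo" > /dev/null
fi
```

Passing the filename as "$1" (with _ filling the $0 slot) avoids pasting a path into the command string, which matters if the path contains spaces. The timing includes the start-up cost of the extra bash process.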
A job for "pv". I use the pipe viewer pv command in some of my scripts to show a progress bar for an operation. With its -t option, pv displays elapsed time as a seconds counter that stops counting when the process is finished. In my test I also used the -p option to show a progress bar, and -w 50 to restrict the bar to 50 characters wide:
Note that I used pv by feeding it the argument to my command ("demo.csv") and piping the output to the command I wanted to time.
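A pipeline of that shape can be sketched as follows, assuming pv is installed (the sketch falls back to a plain redirect if it isn't). The tr command is a placeholder for whichever converter is being timed, and the input file is made up.

```shell
#!/usr/bin/env bash
# Made-up, well-behaved CSV input
demo=$(mktemp)
for i in $(seq 1 1000); do echo "$i,item$i,$((i * 2))"; done > "$demo"
out=$(mktemp)

if command -v pv > /dev/null; then
  # pv reads the file, reports timer and progress on stderr,
  # and passes the data through unchanged on stdout
  pv -t -p -w 50 "$demo" | tr ',' '\t' > "$out"
else
  # Without pv: the same conversion, minus the report
  tr ',' '\t' < "$demo" > "$out"
fi
```

Because pv is given the file directly, it knows the total size and can draw a meaningful progress bar. The command being timed has to accept its input on stdin for this arrangement to work.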
...and in conclusion, I'll stick with the BASH keyword time!
Last update: 2024-11-01
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License