A text full of nulls - what happened?
You have a simple ASCII CSV, "orig".
You send it to a friend who makes a minor edit in line 5 ("9.4" > "9.3") and sends it back to you. The edited version ("new") looks fine in your Geany text editor, but a couple of uninterpreted characters seem to have been added at the beginning of the file, and the file has mysteriously doubled in size.
The -c %s option for stat shows the file size in bytes.
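As a rough sketch of that size check, the commands below build two small stand-in files in a scratch directory and compare their sizes. The file contents and the "workdir" variable are made up for illustration, and GNU stat and iconv are assumed.

```shell
# Hypothetical stand-ins for "orig" and "new", built in a scratch directory
workdir=$(mktemp -d)
printf 'id,value\n1,9.4\n' > "$workdir/orig"

# Mimic the edited file: a little-endian byte order mark (octal 377 376)
# followed by the edited text re-encoded as UTF-16LE
{ printf '\377\376'; printf 'id,value\n1,9.3\n' | iconv -f ASCII -t UTF-16LE; } > "$workdir/new"

# stat -c %s prints the file size in bytes (GNU stat)
size_orig=$(stat -c %s "$workdir/orig")
size_new=$(stat -c %s "$workdir/new")
echo "orig: $size_orig bytes, new: $size_new bytes"
```

With these stand-ins, "new" comes out at twice the size of "orig" plus 2 bytes for the byte order mark.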
Well, the bad news is that your friend is probably running Microsoft Windows. The good news is that you can fix the file problem.
Your friend evidently did the editing in a Windows program which uses UTF-16 encoding and sent you the result as-is.
In UTF-16 encoding, each of the ASCII characters had a null byte added, which explains the doubling of file size. The strange characters at the start of "new" are a byte order mark that identifies the encoding as UTF-16, little-endian.
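One way to see the byte order mark and the added nulls for yourself is to look at the raw bytes. This is only a sketch on a tiny made-up file (the "a,b" content and the "workdir" variable are assumptions), using GNU od, head, tr and wc.

```shell
# Hypothetical UTF-16LE file with a byte order mark, built for illustration
workdir=$(mktemp -d)
{ printf '\377\376'; printf 'a,b\n' | iconv -f ASCII -t UTF-16LE; } > "$workdir/new"

# The first two bytes in hex: "fffe" is the little-endian UTF-16 byte order mark
bom=$(head -c 2 "$workdir/new" | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $bom"

# Delete everything except null bytes, then count what is left
nulls=$(tr -cd '\000' < "$workdir/new" | wc -c)
echo "null bytes: $nulls"

# Character-by-character dump: each ASCII character is followed by \0
od -c "$workdir/new"
```

The null count equals the number of ASCII characters in the text, one extra byte per character.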
Just convert "new" back to ASCII and all will be well.
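A conversion along these lines could be done with iconv, assuming it is installed (it usually is on Linux). The stand-in files and the names "expected" and "new_ascii" below are hypothetical.

```shell
# Hypothetical stand-ins: the plain ASCII text and its UTF-16LE copy with a byte order mark
workdir=$(mktemp -d)
printf 'id,value\n1,9.3\n' > "$workdir/expected"
{ printf '\377\376'; iconv -f ASCII -t UTF-16LE "$workdir/expected"; } > "$workdir/new"

# With -f UTF-16, glibc iconv reads the byte order mark to pick the byte order
# and leaves the mark out of the output; -t ASCII fails on any non-ASCII character
iconv -f UTF-16 -t ASCII "$workdir/new" > "$workdir/new_ascii"

# The converted file should be byte-for-byte identical to the plain ASCII text
cmp "$workdir/expected" "$workdir/new_ascii" && echo "converted file matches plain ASCII"
```

Writing to a separate output file rather than redirecting onto "new" itself avoids truncating the input before iconv has read it.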
Last update: 2024-06-07
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License