banner

For a full list of BASHing data blog posts see the index page.     RSS


Spreadsheet annoyance no. 3: quotes have priority

In an earlier post I complained about spreadsheet programs: Excel, LibreOffice Calc and Gnumeric. All of them confuse non-dates with dates, and automatically interpret certain number strings with 2 colons as [h]:mm:ss. Grrr.

I've now found another example of spreadsheets doing unexpected things with plain-text data. By default, spreadsheet programs see quotes (") as string delimiters. If there's a quote at the start of a data item but none at the end, the results are... interesting. To demonstrate, I'll use these 7 lines:

No quotes
"Quotes start and end"
"Quote at start
Filler line
Filler line
Quote at end"
No quotes

If I copy-paste the 7 lines into LibreOffice Calc, Gnumeric or Google Sheets, the spreadsheet puts everything between the 2 isolated quotes into one cell. The 7 lines get reduced to 4 rows:

annoy1

In other words, the default row-building policy is that string-delimiting quotes are more important than newlines. That's sort of logical in the example I've given, but not in other situations, as shown below.


Next I'll try a case where the "closing" isolated quote isn't at the end of a line:

No quotes
"Quotes start and end"
"Quote at start
Filler line
Filler line
Quote " in middle
No quotes

annoy2

Calc puts the last 5 lines into a single cell, as though it's looking for a closing quote at the end of a line and if it can't find it, the entire rest-of-file is treated as part of the quote-delimited string.

Gnumeric finds what it thinks is a closing quote, then deletes the end of the line in which that quote appears. The " in middle" string has vanished — if I save the sheet as a Gnumeric file and read it with zcat, that string just ain't there:

annoy3

Sheets finds a closing quote and doesn't display it, but unlike Gnumeric it retains the remainder of the line.


What happens if 2 lines have starting quotes but no closing quotes?

No quotes
"Quotes start and end"
"Quote at start
Filler line
Filler line
"Quote at start
No quotes

annoy4

Calc again puts the last 5 lines into a single cell, not recognising the second isolated quote as close-of-string.

Gnumeric accepts the second isolated quote as close-of-string and discards everything after it in the line.

Sheets accepts the second isolated quote as close-of-string but not close-of-row.


Finally, what happens if the lines with isolated quotes don't start or end with one?

No quotes
"Quotes start and end"
Quote " in middle
Filler line
Filler line
Quote " in middle
No quotes

annoy5

No problem: all 3 spreadsheet programs return the copy-pasted 7 lines in 7 rows, and display the isolated quotes in the middles of lines.


So what? If you want to look at a plain-text file in a spreadsheet, one line to a row, make sure there aren't any unmatched quotes at the beginning of lines, otherwise Strange Things will happen. With Calc and Gnumeric, you can avoid Strange Things by changing the string delimiter in the text import dialog from a quote (") to something else; I don't know how to change the default in Google Sheets.


Last update: 2021-01-20
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License