
Beware these characters in a terminal

Back in 2018, I showed in a BASHing data post that printing a Single Character Introducer (SCI) in a terminal emulator could unexpectedly generate a mass of "c62;" strings followed by a hang (you didn't return to a prompt). This strange result appeared in gnome-terminal and xfce4-terminal, but not in xterm. An IT security professional, Timothy Bolton, wrote to me in 2018 that

printf "\x1b\x5b\x35\x63"

likewise pumped out "c62;", and although I could confirm it at the time, I can't now. Instead, in the terminal emulators I've tried in 2025, the SCI simply hides the next character, and Bolton's string does nothing. Here's how that looks in my usual emulator, sakura (3.8.6); I get the same result with xfce4-terminal (1.1.3):

I wrote "hides" because the next character isn't really deleted, as can be seen in a text editor:
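The same check can be made without leaving the terminal by dumping the bytes instead of displaying them. This is a minimal sketch, assuming a UTF-8 locale, in which SCI (U+009A) is encoded as the two bytes c2 9a. The function name dump_sci is made up for illustration, and od is my choice here, not the tool behind the screenshot. The octal escapes \302\232 are the same bytes as \xc2\x9a, written that way so the command also works in shells whose printf lacks \x.

```shell
# Dump the bytes of "AB<SCI>CD" rather than letting the terminal interpret them.
# dump_sci is a hypothetical helper name; \302\232 = c2 9a = U+009A in UTF-8.
dump_sci() {
    printf 'AB\302\232CD\n' | od -An -tx1
}

dump_sci
```

In the dump, the byte 43 (the letter C) still follows c2 9a, so the character is only hidden on screen, not removed from the data.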

I have no idea why the 2018 and 2025 results are different. In both years my machines were running Debian stable as a base with the en_AU.UTF-8 locale.

The following table shows what happens when I print each of the C0 and C1 control characters within the string AB[control character]CD\n in sakura.
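For anyone wanting to repeat the test in another emulator, here is one possible way to generate the C1 test strings. It is a sketch, not necessarily how the table was built: the function name c1_test_strings is hypothetical, and it assumes a UTF-8 locale, where U+0080 to U+009F are encoded as the byte c2 followed by the bytes 80 to 9f.

```shell
# Print one labelled test string per C1 control character (U+0080..U+009F).
# c1_test_strings is a hypothetical helper name; UTF-8 encoding is assumed.
c1_test_strings() {
    i=128                          # 0x80, the first C1 code point
    while [ "$i" -le 159 ]; do     # 0x9f, the last C1 code point
        printf 'U+%04X\t' "$i"     # label, e.g. U+009A
        # \302 is the UTF-8 lead byte c2; the inner printf supplies the
        # second byte as a three-digit octal escape
        printf "AB\\302\\$(printf '%03o' "$i")CD\\n"
        i=$((i + 1))
    done
}

# Safe preview: show the first few lines as bytes, not as terminal output
c1_test_strings | od -c | head -n 3
```

Running c1_test_strings on its own sends the raw characters to the terminal, which is what shows the effects; piping it through od keeps the terminal out of it.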

Acronym	Name	Unicode	Effect
C0
NUL	Null	U+0000	none
SOH	Start of heading	U+0001	none
STX	Start of text	U+0002	none
ETX	End of text	U+0003	none
EOT	End of transmission	U+0004	none
ENQ	Enquiry	U+0005	none
ACK	Acknowledge	U+0006	none
BEL	Bell	U+0007	none
BS	Backspace	U+0008	hides previous character
HT	Horizontal tabulation	U+0009	adds tab character
LF	Line feed	U+000A	starts new line with next character
VT	Vertical tabulation	U+000B	starts new line with next character at its previous horizontal position
FF	Form feed	U+000C	starts new line with next character at its previous horizontal position
CR	Carriage return	U+000D	adds newline, hides preceding characters
SO	Shift out	U+000E	none
SI	Shift in	U+000F	none
DLE	Data link escape	U+0010	none
DC1	Device control 1	U+0011	none
DC2	Device control 2	U+0012	none
DC3	Device control 3	U+0013	none
DC4	Device control 4	U+0014	none
NAK	Negative acknowledge	U+0015	none
SYN	Synchronous idle	U+0016	none
ETB	End transmission block	U+0017	none
CAN	Cancel	U+0018	none
EM	End of medium	U+0019	none
SUB	Substitute	U+001A	appears as �
ESC	Escape	U+001B	hides next character
FS	File separator	U+001C	none
GS	Group separator	U+001D	none
RS	Record separator	U+001E	none
US	Unit separator	U+001F	none

C1
PAD	Padding character	U+0080	none
HOP	High octet preset	U+0081	none
BPH	Break permitted here	U+0082	none
NBH	No break here	U+0083	none
IND	Index	U+0084	starts new line with next character at its previous horizontal position
NEL	Next line	U+0085	starts new line with next character
SSA	Start of selected area	U+0086	none
ESA	End of selected area	U+0087	none
HTS	Horizontal tabulation set	U+0088	none
HTJ	Horizontal tabulation with justification	U+0089	adds tab character
VTS	Vertical tabulation set	U+008A	none
PLD	Partial line down	U+008B	none
PLU	Partial line up	U+008C	none
RI	Reverse index	U+008D	rewrites entry (see below)
SS2	Single shift 2	U+008E	none
SS3	Single shift 3	U+008F	none
DCS	Device control string	U+0090	hides next characters, including newline
PU1	Private use 1	U+0091	none
PU2	Private use 2	U+0092	none
STS	Set transmit state	U+0093	none
CCH	Cancel character	U+0094	none
MW	Message waiting	U+0095	none
SPA	Start protected area	U+0096	none
EPA	End protected area	U+0097	none
SOS	Start of string	U+0098	hides next characters, including newline
SGCI	Single graphic character introducer	U+0099	none
SCI	Single character introducer	U+009A	hides next character
CSI	Control sequence introducer	U+009B	inserts itself invisibly, hides next character
ST	String terminator	U+009C	none
OSC	Operating system command	U+009D	hides next characters, including newline
PM	Privacy message	U+009E	hides next characters, including newline
APC	Application program command	U+009F	hides next characters, including newline

I worry about the characters with something other than "none" for an effect, because I do data auditing in a terminal, and I've found almost all the C0 and C1 characters lurking in text files at one time or another. Most of the time they're part of mojibake arising from a failed encoding conversion. I can't simply delete the control characters — I need to work out what the original characters were before the mojibake-ing.
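A first step in that kind of audit is locating the lines that contain C1 characters without letting the terminal act on them. The following is a sketch under stated assumptions: the file is UTF-8 (so a C1 character is the byte c2 followed by a byte from 80 to 9f), grep accepts raw bytes in a pattern under the C locale (GNU grep does), and the function name find_c1 is made up for illustration. In a file that is not UTF-8, the same byte pair could be something else entirely.

```shell
# List lines containing a UTF-8-encoded C1 control character, with line numbers.
# find_c1 is a hypothetical helper name. The matched lines are passed through
# cat -v so the control bytes are shown in visible notation instead of being
# interpreted by the terminal.
find_c1() {
    LC_ALL=C grep -n "$(printf '\302[\200-\237]')" "$@" | cat -v
}
```

Called as find_c1 somefile, this prints each affected line prefixed by its line number. It only locates the characters; working out what the original text was before the mojibake still has to be done by hand.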

Below I show the "effective" C1 characters at work, except RI:

This is, of course, only how the strings look in a terminal. The NEL control character, for example, doesn't really generate a newline, as shown in a text editor:

RI is a special and very weird case. The two screenshots below show what my terminal looks like before and after pressing Enter:

Once again, the output looks fine in a text editor if saved as "ri-demo":

Summing up. I haven't tried every modern terminal emulator, but the ones I've looked at all interpret C1 control characters as though they were ANSI-standard teletype machines. This is annoying behaviour when working with text files in a terminal. In contrast, neither a TTY nor the venerable xterm pays attention to the meaning of the C1 control characters:


Last update: 2025-06-20
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License