
For a list of BASHing data 2 blog posts see the index page.


Beware these characters in a terminal

Back in 2018, I showed in a BASHing data post that printing a Single Character Introducer (SCI) in a terminal emulator could unexpectedly generate a mass of "c62;" strings followed by a hang (you didn't return to a prompt). This strange result appeared in gnome-terminal and xfce4-terminal, but not in xterm. An IT security professional, Timothy Bolton, wrote to me in 2018 that

printf "\x1b\x5b\x35\x63"

likewise pumped out "c62;", and although I could confirm it at the time, I can't now. Instead, in the terminal emulators I've tried in 2025, the SCI simply hides the next character, and Bolton's string does nothing. Here's how that looks in my usual emulator, sakura (3.8.6); I get the same result with xfce4-terminal (1.1.3):

[Screenshot: SCI]

I wrote "hides" because the next character isn't really deleted, as can be seen in a text editor:

[Screenshot: sci-text]
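A byte dump makes the same point from the command line. This is only a minimal sketch: it assumes UTF-8 text, where SCI (U+009A) is the byte pair 0xC2 0x9A, and sci_demo is a made-up name for illustration.

```shell
# SCI (U+009A) is encoded in UTF-8 as the two bytes 0xC2 0x9A,
# written here as the octal escapes \302 \232.
# sci_demo is a hypothetical name used only for this illustration.
sci_demo() {
    printf 'AB\302\232CD\n'
}

# Dump the bytes in hex instead of letting the terminal interpret them
sci_demo | od -An -tx1
```

The dump lists the bytes 41 42 c2 9a 43 44 0a, so the C (43) that the emulator hides is still in the stream.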

I have no idea why the 2018 and 2025 results are different. In both years my machines were running Debian stable as a base with the en_AU.UTF-8 locale.

The following table shows what happens when I print each of the C0 and C1 control characters within the string AB[control character]CD\n in sakura.

Acronym | Name | Unicode | Effect
C0
   NUL | Null | U+0000 | none
   SOH | Start of heading | U+0001 | none
   STX | Start of text | U+0002 | none
   ETX | End of text | U+0003 | none
   EOT | End of transmission | U+0004 | none
   ENQ | Enquiry | U+0005 | none
   ACK | Acknowledge | U+0006 | none
   BEL | Bell | U+0007 | none
   BS | Backspace | U+0008 | hides previous character
   HT | Horizontal tabulation | U+0009 | adds tab character
   LF | Line feed | U+000A | starts new line with next character
   VT | Vertical tabulation | U+000B | starts new line with next character at its previous horizontal position
   FF | Form feed | U+000C | starts new line with next character at its previous horizontal position
   CR | Carriage return | U+000D | adds newline, hides preceding characters
   SO | Shift out | U+000E | none
   SI | Shift in | U+000F | none
   DLE | Data link escape | U+0010 | none
   DC1 | Device control 1 | U+0011 | none
   DC2 | Device control 2 | U+0012 | none
   DC3 | Device control 3 | U+0013 | none
   DC4 | Device control 4 | U+0014 | none
   NAK | Negative acknowledge | U+0015 | none
   SYN | Synchronous idle | U+0016 | none
   ETB | End transmission block | U+0017 | none
   CAN | Cancel | U+0018 | none
   EM | End of medium | U+0019 | none
   SUB | Substitute | U+001A | appears as �
   ESC | Escape | U+001B | hides next character
   FS | File separator | U+001C | none
   GS | Group separator | U+001D | none
   RS | Record separator | U+001E | none
   US | Unit separator | U+001F | none
C1
   PAD | Padding character | U+0080 | none
   HOP | High octet preset | U+0081 | none
   BPH | Break permitted here | U+0082 | none
   NBH | No break here | U+0083 | none
   IND | Index | U+0084 | starts new line with next character at its previous horizontal position
   NEL | Next line | U+0085 | starts new line with next character
   SSA | Start of selected area | U+0086 | none
   ESA | End of selected area | U+0087 | none
   HTS | Horizontal tabulation set | U+0088 | none
   HTJ | Horizontal tabulation with justification | U+0089 | adds tab character
   VTS | Vertical tabulation set | U+008A | none
   PLD | Partial line down | U+008B | none
   PLU | Partial line up | U+008C | none
   RI | Reverse index | U+008D | rewrites entry (see below)
   SS2 | Single shift 2 | U+008E | none
   SS3 | Single shift 3 | U+008F | none
   DCS | Device control string | U+0090 | hides next characters, including newline
   PU1 | Private use 1 | U+0091 | none
   PU2 | Private use 2 | U+0092 | none
   STS | Set transmit state | U+0093 | none
   CCH | Cancel character | U+0094 | none
   MW | Message waiting | U+0095 | none
   SPA | Start protected area | U+0096 | none
   EPA | End protected area | U+0097 | none
   SOS | Start of string | U+0098 | hides next characters, including newline
   SGCI | Single graphic character introducer | U+0099 | none
   SCI | Single character introducer | U+009A | hides next character
   CSI | Control sequence introducer | U+009B | inserts itself invisibly, hides next character
   ST | String terminator | U+009C | none
   OSC | Operating system command | U+009D | hides next characters, including newline
   PM | Privacy message | U+009E | hides next characters, including newline
   APC | Application program command | U+009F | hides next characters, including newline
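The test strings for a table like this one can be generated in a loop. The sketch below is one possible way to do it, not necessarily how the strings above were produced. It assumes UTF-8 output, and the function names ctrl_demo and all_ctrl_demos and the file name ctrl-demo.txt are invented for illustration.

```shell
# Hypothetical helper: print "AB<control character>CD\n" for one code point.
# Only meant for U+0000-U+001F (one byte) and U+0080-U+009F (0xC2 + one byte).
ctrl_demo() (
    cp=$(( $1 ))
    # %b expands a \0ooo octal escape in its argument into the raw byte
    byte="\\0$(printf '%o' "$cp")"
    if [ "$cp" -lt 128 ]; then
        printf 'AB%bCD\n' "$byte"
    else
        printf 'AB\302%bCD\n' "$byte"
    fi
)

# Hypothetical driver: emit the test string for every C0 and C1 code point
all_ctrl_demos() {
    cp=0
    while [ "$cp" -le 159 ]; do
        ctrl_demo "$cp"
        cp=$((cp + 1))
        # Jump from U+001F straight to U+0080
        if [ "$cp" -eq 32 ]; then cp=128; fi
    done
}

# Usage (assumed file name): save the strings, then view them a line at a time
#   all_ctrl_demos > ctrl-demo.txt
#   ctrl_demo 0x9a
```

Writing the strings to a file first seems safer than printing all 64 at once, because a character such as DCS or OSC would swallow the output that follows it.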

I worry about the characters with something other than "none" for an effect, because I do data auditing in a terminal, and I've found almost all the C0 and C1 characters lurking in text files at one time or another. Most of the time they're part of mojibake arising from a failed encoding conversion. I can't simply delete the control characters — I need to work out what the original characters were before the mojibake-ing.
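One way to find C1 characters before they reach the screen is to search for their UTF-8 byte pattern. The following is a rough sketch, assuming a UTF-8 file and a grep that accepts the -a option (GNU and BSD grep do). The function name find_c1 is invented for illustration.

```shell
# Hypothetical helper: report the lines of a UTF-8 text file that contain a
# C1 control character (U+0080-U+009F, i.e. byte 0xC2 followed by 0x80-0x9F).
find_c1() (
    # Build the byte pattern with printf so the script itself stays plain ASCII
    pat=$(printf '\302[\200-\237]')
    # LC_ALL=C makes grep compare bytes; -a stops it treating the file as binary.
    # cat -v turns the matched bytes into visible notation, so the terminal
    # never gets to interpret them.
    LC_ALL=C grep -a -n "$pat" "$1" | cat -v
)

# Usage (assumed file name):
#   find_c1 somefile.txt
```

In valid UTF-8 the byte 0xC2 only ever starts a two-byte sequence for U+0080 to U+00BF, so the pattern should not match ordinary accented letters or other multibyte characters.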

Below I show the "effective" C1 characters at work, except RI:

[Screenshot: C1 effects]

This is, of course, only how the strings look in a terminal. The NEL control character, for example, doesn't really generate a newline, as shown in a text editor:

[Screenshot: nel-demo]
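Counting lines gives another check that NEL is not a real line break for line-based tools. A small sketch, again assuming UTF-8 (NEL is 0xC2 0x85), with nel_demo as a made-up name:

```shell
# NEL (U+0085) is the bytes 0xC2 0x85 in UTF-8 (octal \302 \205).
# nel_demo is a hypothetical name used only for this illustration.
nel_demo() {
    printf 'AB\302\205CD\n'
}

# wc -l counts line feed bytes, so the NEL in the middle adds nothing
nel_demo | wc -l
```

The count is 1, even though the terminal draws the string on two lines.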

RI is a special and very weird case. The two screenshots below show what my terminal looks like before and after pressing Enter:

[Screenshots: RI, before and after pressing Enter]

Once again, the output looks fine in a text editor if saved as "ri-demo":

[Screenshot: ri-demo]

Summing up. I haven't tried every modern terminal emulator, but the ones I've looked at all interpret C1 control characters as though they were ANSI-standard teletype machines. This is annoying behaviour when working with text files in a terminal. In contrast, neither a TTY nor the venerable xterm pays attention to the meaning of the C1 control characters:

[Screenshot: tty]
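If the emulator can't be changed, one possible workaround is to make the C1 characters visible before the text reaches the screen. The filter below is a sketch under assumptions rather than a tested recipe: it assumes UTF-8 input and a sed that handles raw bytes in the C locale, and the name show_c1 and the tag format are invented for illustration.

```shell
# Hypothetical filter: rewrite each C1 control character (bytes 0xC2 0x80-0x9F
# in UTF-8) as a visible tag such as <U+0085>, leaving everything else alone.
show_c1() (
    script=''
    cp=128
    while [ "$cp" -le 159 ]; do
        # Raw second byte of the UTF-8 pair, produced from an octal escape
        byte=$(printf "\\$(printf '%o' "$cp")")
        # Visible tag for this code point, e.g. <U+0085>
        tag=$(printf '<U+%04X>' "$cp")
        # One sed substitution per code point, one per line of the script
        script="${script}s/$(printf '\302')${byte}/${tag}/g
"
        cp=$((cp + 1))
    done
    # LC_ALL=C makes sed work on bytes rather than characters
    LC_ALL=C sed "$script"
)

# Usage (assumed file name):
#   show_c1 < somefile.txt
```

Unlike deleting the characters, this keeps a record of which control character was where, which helps when working out what the original text was before a failed encoding conversion.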



Last update: 2025-06-20
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License