The trouble with Windows CRLF
In 2018 there was hope for Windows end-of-line (CRLF) sufferers. Microsoft announced that the latest version of its text editor "Notepad" would correctly open text files with plain linefeed (LF) line endings:
For many years, Windows Notepad only supported text documents containing Windows End of Line (EOL) characters - Carriage Return (CR) & Line Feed (LF). This means that Notepad was unable to correctly display the contents of text files created in Unix, Linux and macOS...
...This has been a major annoyance for developers, IT Pros, administrators, and end users throughout the community.
Today, we’re excited to announce that we have fixed this issue!
Then Microsoft acquired GitHub, and people began to wonder if 2018 was the beginning of the end for CRLF in Windows programs, and an end to the Great Newline Schism. I doubt it, just as I doubt that Microsoft will join the rest of us in the 21st century and move to UTF-8 encoding anytime soon.
A Data Cleaner's Cookbook has a page devoted to finding and removing CR on the command line. Here I give examples of how a Windows carriage return at the end of a line can stuff up some basic command-line operations.
An operation that targets the last character on a line will give the wrong result. Because CR is non-printing, the line looks OK, but shell tools will treat CR (\r) as the last character on the line, just before the linefeed (LF, \n):
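A minimal sketch of this, using made-up sample data (the file names and contents are my assumptions, not the original examples). It counts the lines ending in "e" in a CRLF file and in its LF-only twin, then uses od to make the true last character of the first CRLF line visible:

```shell
# Hypothetical sample files: same words, different line endings
tmp=$(mktemp -d)
printf 'apple\r\nbanana\r\n' > "$tmp/crlf.txt"
printf 'apple\nbanana\n' > "$tmp/lf.txt"

# Count lines ending in "e"; "|| true" keeps grep's exit status 1 (no match)
# from aborting a script that runs under "set -e"
crlf_hits=$(grep -c 'e$' "$tmp/crlf.txt" || true)
lf_hits=$(grep -c 'e$' "$tmp/lf.txt" || true)
printf '%s\n' "lines ending in e: CRLF=$crlf_hits LF=$lf_hits"

# Take the last character of line 1, drop the linefeed that sed adds,
# and print what is left as a visible escape with od
last_char=$(head -n 1 "$tmp/crlf.txt" | sed 's/.*\(.\)$/\1/' | tr -d '\n' | od -An -c | tr -d ' \n')
printf '%s\n' "last character before the linefeed: $last_char"

rm -rf "$tmp"
```

In the CRLF file the "e" in "apple" is no longer at the end of the line, so the anchored pattern finds nothing.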
The rev command will put a CR at the beginning:
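Here is a small illustration with an invented three-letter line. The output of rev is piped through od so the CR shows up as an escape instead of silently moving the cursor:

```shell
# Reverse one CRLF-terminated line and one LF-terminated line (sample data
# is hypothetical), then show the resulting bytes as visible escapes
rev_crlf=$(printf 'abc\r\n' | rev | od -An -c | tr -d ' \n')
rev_lf=$(printf 'abc\n' | rev | od -An -c | tr -d ' \n')

printf '%s\n' "CRLF input reversed: $rev_crlf"
printf '%s\n' "LF input reversed:   $rev_lf"
```

The linefeed stays at the end in both cases, but the CR is treated as an ordinary character and moves to the front of the reversed line.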
echo with the -n option to omit a newline will fail and hang:
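One way this can show up, sketched with an invented line (I am assuming the line was read from a CRLF file with command substitution, which may differ from the original example). Command substitution strips the trailing linefeed but not the CR, so echo -n still writes a CR as its last byte:

```shell
# Command substitution removes the trailing LF but keeps the CR
line=$(printf 'abc\r\n')
echo_bytes=$(echo -n "$line" | od -An -c | tr -d ' \n')
printf '%s\n' "echo -n wrote: $echo_bytes"

# Trim one trailing CR with parameter expansion before echoing
cr=$(printf '\r')
clean_bytes=$(echo -n "${line%"$cr"}" | od -An -c | tr -d ' \n')
printf '%s\n' "after trimming CR: $clean_bytes"
```

On a terminal that trailing CR sends the cursor back to the start of the line, so whatever is printed next overwrites the echoed text.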
tr deleting newlines will fail and hang:
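A short sketch with two invented one-letter lines. Deleting only the linefeed leaves every CR in place, which is easy to miss unless the bytes are made visible:

```shell
# Delete only LF from CRLF input (hypothetical sample) and inspect the bytes
joined=$(printf 'a\r\nb\r\n' | tr -d '\n' | od -An -c | tr -d ' \n')
printf '%s\n' "tr -d '\\n' leaves: $joined"

# Deleting both CR and LF gives the string that was probably wanted
joined_clean=$(printf 'a\r\nb\r\n' | tr -d '\r\n' | od -An -c | tr -d ' \n')
printf '%s\n' "tr -d '\\r\\n' leaves: $joined_clean"
```

When the first result is printed straight to a terminal, each CR moves the cursor back to column one and the pieces overwrite each other.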
Grepping to match a full line will fail:
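A minimal example with a made-up two-line file. The -x option asks grep for whole-line matches, and the hidden CR means the line is really "apple" plus one more character:

```shell
# Hypothetical CRLF sample file
tmp=$(mktemp -d)
printf 'apple\r\nbanana\r\n' > "$tmp/crlf.txt"

# Whole-line match against the raw file; "|| true" absorbs grep's
# exit status 1 when nothing matches
full_crlf=$(grep -cx 'apple' "$tmp/crlf.txt" || true)

# Same search after stripping every CR
full_clean=$(tr -d '\r' < "$tmp/crlf.txt" | grep -cx 'apple' || true)

printf '%s\n' "whole-line matches: raw=$full_crlf cleaned=$full_clean"
rm -rf "$tmp"
```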
Reading lines in a while loop will fail:
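The sketch below uses an invented file of three-letter words and records the length of each value that read hands to the loop. It also shows one possible workaround, trimming a trailing CR with parameter expansion (my suggestion, not necessarily the author's method):

```shell
# Hypothetical CRLF sample file
tmp=$(mktemp -d)
printf 'cat\r\ndog\r\n' > "$tmp/crlf.txt"
cr=$(printf '\r')

raw_lengths=""
trimmed_lengths=""
while IFS= read -r word; do
    # read removes the LF but leaves the CR attached to the value
    raw_lengths="$raw_lengths${#word} "
    # Strip one trailing CR, if present
    word=${word%"$cr"}
    trimmed_lengths="$trimmed_lengths${#word} "
done < "$tmp/crlf.txt"

printf '%s\n' "lengths as read:    $raw_lengths"
printf '%s\n' "lengths after trim: $trimmed_lengths"
rm -rf "$tmp"
```

Any string built from the untrimmed variable, such as a file name or a comparison value, carries the CR along with it.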
Concatenation of successive lines with paste or sed will fail:
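To illustrate with two invented one-letter lines: both paste and sed replace the linefeed between the lines with a comma, but the CR from each line stays embedded in the result:

```shell
# Join two CRLF lines (hypothetical sample) with a comma, two ways
pasted=$(printf 'a\r\nb\r\n' | paste -s -d ',' - | od -An -c | tr -d ' \n')
sedded=$(printf 'a\r\nb\r\n' | sed 'N;s/\n/,/' | od -An -c | tr -d ' \n')
printf '%s\n' "paste: $pasted"
printf '%s\n' "sed:   $sedded"

# Stripping CR first gives a clean joined line
pasted_clean=$(printf 'a\r\nb\r\n' | tr -d '\r' | paste -s -d ',' - | od -An -c | tr -d ' \n')
printf '%s\n' "clean: $pasted_clean"
```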
AWK will fail on records and fields:
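A sketch with a made-up two-record CSV. With the default record separator (LF), the CR stays attached to the last field of every record, so a string test on that field fails and the record is one character longer than it looks:

```shell
# Hypothetical CRLF sample file
tmp=$(mktemp -d)
printf 'apple,red\r\nplum,blue\r\n' > "$tmp/crlf.csv"

# Field 2 of record 1 is "red" plus CR, so the comparison never succeeds
awk_raw=$(awk -F, '$2 == "red" {n++} END {print n+0}' "$tmp/crlf.csv")

# The record length counts the CR as well
awk_len=$(awk 'NR == 1 {print length($0)}' "$tmp/crlf.csv")

# Removing a trailing CR from each record re-splits the fields cleanly
awk_fixed=$(awk -F, '{sub(/\r$/, "")} $2 == "red" {n++} END {print n+0}' "$tmp/crlf.csv")

printf '%s\n' "matches raw=$awk_raw, length of record 1=$awk_len, matches fixed=$awk_fixed"
rm -rf "$tmp"
```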
diff and comm will respect CR and give unexpected results:
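As a rough illustration, here are two invented files that look identical on screen but differ in their line endings. LC_ALL=C is set for comm so that lines are compared byte by byte:

```shell
# Hypothetical sample files: same words, different line endings
tmp=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmp/lf.txt"
printf 'apple\r\nbanana\r\n' > "$tmp/crlf.txt"

# diff exits non-zero when it finds any difference
if diff "$tmp/lf.txt" "$tmp/crlf.txt" > /dev/null; then
    diff_result="same"
else
    diff_result="different"
fi

# Count the lines common to both files (grep -c '' counts lines)
common=$(LC_ALL=C comm -12 "$tmp/lf.txt" "$tmp/crlf.txt" | grep -c '' || true)

# Strip CR and compare again
tr -d '\r' < "$tmp/crlf.txt" > "$tmp/clean.txt"
common_clean=$(LC_ALL=C comm -12 "$tmp/lf.txt" "$tmp/clean.txt" | grep -c '' || true)

printf '%s\n' "diff says: $diff_result; common lines: raw=$common cleaned=$common_clean"
rm -rf "$tmp"
```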
join will give unexpected results:
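One case where this bites, sketched with invented files: the join key is the last field in the CRLF file, so every key carries a hidden CR and never equals the key in the LF file. The file names and contents are assumptions for illustration only:

```shell
# Hypothetical sample files; the key is field 1 in names.txt (LF)
# and the last field in colours.txt (CRLF)
tmp=$(mktemp -d)
printf '1 apple\n2 banana\n' > "$tmp/names.txt"
printf 'red 1\r\nyellow 2\r\n' > "$tmp/colours.txt"

# Keys "1" and "1" plus CR do not match, so join prints nothing
joined_raw=$(LC_ALL=C join -1 1 -2 2 "$tmp/names.txt" "$tmp/colours.txt" | grep -c '' || true)

# After stripping CR, both keys pair up
tr -d '\r' < "$tmp/colours.txt" > "$tmp/colours_clean.txt"
joined_clean=$(LC_ALL=C join -1 1 -2 2 "$tmp/names.txt" "$tmp/colours_clean.txt" | grep -c '' || true)

printf '%s\n' "joined lines: raw=$joined_raw cleaned=$joined_clean"
rm -rf "$tmp"
```

When the key is not the last field, the join succeeds but the CR ends up in the middle or at the end of each output line, which causes the same problems further down the pipeline.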
Last update: 2019-03-17