Bash alternative to trim newline \n and extra white-space
Mia Lopez
I am trying to parse a multiline sentence:
You have to go tomorrow by
    car.
As you can see, there is a newline plus some spaces before "car."
I used this regex:
You.have.to.go.tomorrow.by.\n.+
It worked great when I tested it on regex101, but in bash it matched only the first line:
parser='You.have.to.go.tomorrow.by.\n.+'
Result:
You have to go tomorrow by
I am using bash, and I want the full sentence:
"You have to go tomorrow by car."
I am using:
sed -e 's/<[^>]\+>/ /g' | grep -oP "$parser"
to delete all HTML tags and then grep for the parser.
Answers
grep's -z (--null-data) option separates input records with the NUL character instead of a newline, which makes it possible to match newlines inside a pattern.
grep -Pzo 'You have to go tomorrow by\n\s+car.' text | tr -s '\n ' ' '
If you want to do it in pure bash, you will probably need ANSI-C quoting ($'...') so that the pattern contains a real newline.
#!/bin/bash
# $'...' turns \n into a real newline; \s is passed through to the regex engine.
pattern=$'You have to go tomorrow by\n\s+car.'
# ${BASH_REMATCH} is left unquoted so word splitting collapses the newline and spaces.
[[ $(<text) =~ ($pattern) ]] && echo ${BASH_REMATCH}
If you only intend to clean up the line in question, you can combine several substitutions. After matching the line that contains 'You have to go tomorrow by', group multiple commands in braces {...}, separated by semicolons, and they run only on that match:
sed -rn '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g;p}' text
N: read the next line and append it to the pattern space.
s: substitute text.
g: global, substitute all occurrences in the line.
p: print.
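To try the sed approach without touching a real file, here is a minimal sketch. The temporary file and the <p> tags around the sentence are assumptions made up for illustration, and -E is used as the more portable spelling of -r.

```shell
#!/bin/bash
# Hypothetical sample input: the wrapped sentence from the question inside <p> tags.
sample=$(mktemp)
printf '<p>You have to go tomorrow by\n    car.</p>\n' > "$sample"

# On the matching line: append the next line (N), drop the embedded newline,
# squeeze the first run of 2+ spaces to one, strip tags, then print.
result=$(sed -En '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g; p;}' "$sample")

echo "$result"
rm -f "$sample"
```

Only the matched line and the one after it are joined; every other line is suppressed by -n, so this suits extracting one known sentence rather than flattening a whole file.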
With tr
<FileName tr -s '\n' ' '
With xargs
<FileName xargs
Notice
Replace FileName with the name of the file containing the two lines.
<FileName makes the shell read the file and feed it to the command's STDIN, i.e. something like cat FileName |.
tr -s '\n' ' ' joins the two lines into one and squeezes runs of spaces down to single spaces.
xargs, by default, trims newlines and extra white space as part of its job of converting input from STDIN into arguments to a command, i.e. this is simply how it does its job.
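The notes above can be checked with a small sketch. The temporary file stands in for FileName and is an assumption for illustration only. It compares the redirect with the cat pipeline and shows one difference between the two tools: tr turns the final newline into a trailing space, while xargs (which runs echo when no command is given) does not leave one.

```shell
#!/bin/bash
# Hypothetical sample file standing in for FileName.
file=$(mktemp)
printf 'You have to go tomorrow by\n    car.\n' > "$file"

# Input redirection and a cat pipeline feed the same bytes to tr.
via_redirect=$(<"$file" tr -s '\n' ' ')
via_cat=$(cat "$file" | tr -s '\n' ' ')

# With no command given, xargs runs echo on the whitespace-split words.
via_xargs=$(<"$file" xargs)

# Brackets make the trailing space left by tr visible.
printf '[%s]\n' "$via_redirect" "$via_cat" "$via_xargs"
rm -f "$file"
```

Keep in mind that xargs also interprets quotes and backslashes in its input, so text containing those characters may need tr or sed instead.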
You can also pipe the output from sed to tr like so:
<FileName sed -e 's/<[^>]\+>//g' | tr -s '\n' ' '
or from sed to xargs like so:
<FileName sed -e 's/<[^>]\+>//g' | xargs
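Putting the pieces together, here is a rough end-to-end sketch. The HTML fragment and temporary file are made up for illustration. Once the text has been flattened to one line, an ordinary single-line grep pattern should be enough, so the \n in the original parser is no longer needed.

```shell
#!/bin/bash
# Hypothetical HTML fragment; the markup and the file are illustrative only.
file=$(mktemp)
printf '<div><b>You have to go tomorrow by</b>\n    car.</div>\n' > "$file"

# Strip the tags first, then let xargs collapse the newline and extra spaces.
# Note: \+ in a basic regular expression is a GNU sed extension.
clean=$(<"$file" sed -e 's/<[^>]\+>//g' | xargs)

# The sentence now sits on one line, so a plain pattern matches it.
match=$(printf '%s\n' "$clean" | grep -o 'You have to go tomorrow by car\.')

echo "$match"
rm -f "$file"
```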