Velvet Star Monitor

grep count multiple occurrences

Writer Matthew Martinez

Is it possible to do a grep count of multiple occurrences in a file in one single command? For example:

$ cat > file
blah alfa
beta blah
blah blahgamma
gamma

I can do:

grep -c 'alfa' file
1
grep -c 'beta' file
1
grep -c 'gamma' file
2

But is it possible to do something like:

grep -c -e 'alfa' -e 'beta' -e 'gamma' -somemoreblackmagic file

and get the counts for each of them?

alfa 1
beta 1
gamma 2
8 Answers

I don't think grep is capable of what you want to do.

Just use awk instead:-)

This solution is not optimized, so it may be slow on large files, and it works for plain words only, not regexps. But it's easy to add some features if so desired.

Low end version with restrictions outlined in comments below:

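awk '
# Count every whitespace-separated word of the input.
{ split($0, b); for (i in b) ++A[b[i]] }
# Print the count for each word given as a script argument.
END { split("'"$*"'", a); for (i in a) print sprintf("%s %d", a[i], A[a[i]]) }
'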

Just give the search strings directly to the script.

[EDIT]
Fixed version with regex support (see comment below). Please tell me if there are still any open issues.

# ---- my favorite ----
awk -F' ?-c ' '
BEGIN { split("'"$*"'", a) }
{ for (i = 2; a[i]; ++i) if (match($0, a[i])) ++A[i] }
END { for (i = 2; a[i]; ++i) if (A[i]) print a[i] " " A[i] }
'
# ---- my favorite ----

sample usage:

script_name -c alfa -c beta -c gamma << !
alfa
beta
gamma
gamma
!

gives:

alfa 1
beta 1
gamma 2

regex usage:

script_name -c "^al" -c "beta" -c gamma -c "m.$" << !
alfa
beta
gamma
gamma
!

gives:

^al 1
beta 1
gamma 2
m.$ 2

[/EDIT]

You can get what you need just by using grep, sort and uniq.

grep -EIho 'alfa|beta|gamma' *|sort|uniq -c
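
The pipeline above can be tried against the sample file from the question. The following is only a sketch: the function name count_keywords and the temporary directory are assumptions made for illustration, and it assumes a grep that supports -o, -h and -I (GNU grep does). Note that keywords with no match are simply absent from the output.

```shell
# Recreate the sample file from the question in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'blah alfa' 'beta blah' 'blah blahgamma' 'gamma' > "$dir/file"

# count_keywords FILE... (hypothetical helper name)
# -E: extended regex, needed for the | alternation
# -I: skip binary files
# -h: do not prefix matches with the file name
# -o: print only the matched text, one match per line
count_keywords() {
    grep -EIho 'alfa|beta|gamma' "$@" | sort | uniq -c
}

# Prints one "COUNT KEYWORD" line per keyword that was found.
count_keywords "$dir/file"
```

Because -o emits every match, a line containing a keyword twice contributes two to that keyword's count, which differs from what grep -c reports.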

Another awk solution, with shell script wrapper thrown in:

#!/bin/sh -
awk '
BEGIN {
    split("alfa beta gamma", keyword)
    for (i in keyword) count[keyword[i]] = 0
}
/alfa/ { count["alfa"]++ }
/beta/ { count["beta"]++ }
/gamma/ { count["gamma"]++ }
END { for (i in keyword) print keyword[i], count[keyword[i]] }
'

If you want to be able to choose the search keywords at runtime (and provide them as arguments, as in sparkie’s answer), this script can be adapted to build the awk script dynamically.
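
One possible way to choose the keywords at runtime is sketched below. It does not generate awk source; instead it hands the keywords to awk as command-line arguments and removes them from ARGV in the BEGIN block, which is a different technique from the one suggested above. The function name count_lines_matching and its argument order are assumptions made for this example.

```shell
# count_lines_matching FILE KEYWORD... (hypothetical helper name)
# Prints "KEYWORD COUNT" for each keyword, where COUNT is the number
# of lines matching the keyword as a regular expression.
count_lines_matching() {
    file=$1
    shift
    awk '
    BEGIN {
        # ARGV[1..n] hold the keywords; the last argument is the input file.
        n = ARGC - 2
        for (i = 1; i <= n; i++) {
            keyword[i] = ARGV[i]
            count[i] = 0
            ARGV[i] = ""    # empty entries are skipped as input files
        }
    }
    { for (i = 1; i <= n; i++) if ($0 ~ keyword[i]) count[i]++ }
    END { for (i = 1; i <= n; i++) print keyword[i], count[i] }
    ' "$@" "$file"
}
```

It could be called as count_lines_matching file alfa beta gamma. The counts are printed in the order the keywords were given, and keywords that never match are reported with 0.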

Perl solution:

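perl -lne '$s{$_}++ if /alfa|beta|gamma/ }{ print "$_ $s{$_}" for keys %s' file

Note that this counts matching lines keyed by their full text, so it gives per-keyword totals only when each line consists of a single keyword.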

No, grep cannot do this in one pass. I would suggest using awk:

awk -v pat='alfa beta gamma' '
BEGIN { split(pat, p) }
{ for (k in p) if ($0 ~ p[k]) c[k]++ }
END { for (k in p) print p[k], c[k] ? c[k] : 0 }
'

Or as a rather long one-liner:

awk -v pat='alfa beta gamma' 'BEGIN { split(pat, p) } { for(k in p) if($0 ~ p[k]) c[k]++ } END { for(k in p) print p[k], c[k]?c[k]:0 }'

Explanation

pat is split into the p array, which is then used to search for matches on each line ($0 ~ p[k]). The counters are held in the c array. The c[k]?c[k]:0 bit uses the ternary operator to print 0 when c[k] is unset, that is, when no line matched the pattern.

Note: if a pattern contains a space, you need to use a different delimiter between the patterns in pat and update the split command accordingly.
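
As a rough illustration of that note, the sketch below separates the patterns with a colon and passes the same character as the third argument of split. The colon and the function name count_patterns are arbitrary choices for this example, so a pattern may contain spaces here but not a colon.

```shell
# count_patterns 'PAT1:PAT2:...' FILE (hypothetical helper name)
# Patterns are separated by ":" so that they may contain spaces.
count_patterns() {
    awk -v pat="$1" '
    BEGIN { n = split(pat, p, ":") }
    { for (k = 1; k <= n; k++) if ($0 ~ p[k]) c[k]++ }
    END { for (k = 1; k <= n; k++) print p[k], c[k] ? c[k] : 0 }
    ' "$2"
}
```

For instance, count_patterns 'blah alfa:gamma' file counts the lines matching "blah alfa" and the lines matching "gamma". Iterating with a numeric index instead of for (k in p) also keeps the output in the order the patterns were given.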

Testing

Input:

cat << EOF > file
alfa
beta
gamma
gamma
EOF

Output with pat='alfa beta gamma':

alfa 1
beta 1
gamma 2

Input:

cat << EOF > file
alfa beta
beta
gamma gamma
gamma alfa
alfalfa
alfa alfa
EOF

Output with pat='^a a$ alfa beta gamma':

beta 2
gamma 2
^a 3
a$ 6
alfa 4

In both cases the output matches what grep -c reports when run with each pattern individually.

I'd suggest using uniq (with sort).

$ sort file | uniq -c
1 alfa
1 beta
2 gamma

You need sort if the file might not be sorted (in fact, only if the multiple occurrences might not be on consecutive lines).

UPDATE:

Assuming that you have predefined patterns and they don't contain space:

$ PATTERNS='alfa beta gamma'
$ for P in $PATTERNS; do echo "$P $(grep -c -- "$P" file)"; done
alfa 1
beta 1
gamma 2

Here is a sample from my daily work:

For all files ending in FlowBase.java, count the occurrences of each string "Input*" and list the files in which one occurs more than once.

Example: a file will be listed if it contains

"inputABD"

"inputABD"

$ for i in $(find . | grep FlowBase.java); do echo $i $(egrep "input." $i | sed 's/^."input//' | sed 's/";.*//' | uniq -c | awk '($1 > 1) { print $2}' | wc -l); done | awk '($2 > 0) {print $1}'

Just use -o to collect occurrences, then count the number of occurrences with wc -l.

For a single pattern:

grep -o 'alfa' file | wc -l
1
grep -o 'beta' file | wc -l
1
grep -o 'gamma' file | wc -l
2

Group counts:

grep -o -e 'alfa' -e 'beta' -e 'gamma' file | wc -l
4

Individual counts (sorted by frequency):

grep -o -e 'alfa' -e 'beta' -e 'gamma' file | sort | uniq -c | sort
1 alfa
1 beta
2 gamma

This is far more straightforward (and memorable) than the other answers.
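
One thing worth keeping in mind is that grep -o piped to wc -l counts individual occurrences, whereas grep -c counts matching lines, so the two can disagree when a pattern appears more than once on a line. A small sketch of the difference follows; the sample data and the function names matching_lines and occurrences are made up for this illustration.

```shell
# Sample data in which "gamma" appears twice on one line.
dir=$(mktemp -d)
printf '%s\n' 'alfa beta' 'gamma gamma' 'gamma alfa' > "$dir/sample"

# Number of lines containing the pattern (what grep -c reports).
matching_lines() { grep -c -- "$1" "$2"; }

# Number of individual occurrences: grep -o prints one match per line
# and wc -l counts them. $(( )) strips the padding some wc versions add.
occurrences() { echo $(( $(grep -o -- "$1" "$2" | wc -l) )); }

matching_lines gamma "$dir/sample"   # prints 2
occurrences gamma "$dir/sample"      # prints 3
```

On the file from the question both approaches agree, because no keyword occurs twice on the same line.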
