grep count multiple occurrences
Matthew Martinez
Is it possible to do a grep count of multiple occurrences in a file in one single command? For example:
$ cat > file
blah alfa
beta blah
blah blahgamma
gamma
I can do:
grep -c 'alfa' file
1
grep -c 'beta' file
1
grep -c 'gamma' file
2
But is it possible to do something like:
grep -c -e 'alfa' -e 'beta' -e 'gamma' -somemoreblackmagic file
and get the counts for each of them?
alfa 1
beta 1
gamma 2
8 Answers
I don't think grep is capable of what you want to do.
Just use awk instead:-)
This solution is not optimized, so it may not work well for large files, and it handles plain words only, not regexps. It is easy to add features if so desired.
Low end version with restrictions outlined in comments below:
awk '
{
    split($0, b)
    for (i in b) ++A[b[i]]
}
END {
    split("'"$*"'", a)
    for (i in a) print sprintf("%s %d", a[i], A[a[i]])
}
'
Just give the search strings directly to the script.
[EDIT]
fixed version with regex support (see comment below).
Please tell me if there still are any open issues.
# ---- my favorite ----
awk -F' ?-c ' '
BEGIN { split("'"$*"'", a) }
{ for (i = 2; a[i]; ++i) if (match($0, a[i])) ++A[i] }
END { for (i = 2; a[i]; ++i) if (A[i]) print a[i] " " A[i] }
'
# ---- my favorite ----
Sample usage:
script_name -c alfa -c beta -c gamma << !
alfa
beta
gamma
gamma
!
gives:
alfa 1
beta 1
gamma 2
Regex usage:
script_name -c "^al" -c "beta" -c gamma -c "m.$" << !
alfa
beta
gamma
gamma
!
gives:
^al 1
beta 1
gamma 2
m.$ 2
[/EDIT]
You can get what you need just by using grep, sort and uniq.
grep -EIho 'alfa|beta|gamma' *|sort|uniq -c
Another awk solution, with shell script wrapper thrown in:
#!/bin/sh -
awk '
BEGIN {
    split("alfa beta gamma", keyword)
    for (i in keyword) count[keyword[i]]=0
}
/alfa/ { count["alfa"]++ }
/beta/ { count["beta"]++ }
/gamma/ { count["gamma"]++ }
END {
    for (i in keyword) print keyword[i], count[keyword[i]]
}'
If you want to be able to choose the search keywords at runtime (and provide them as arguments, as in sparkie’s answer), this script can be adapted to build the awk script dynamically.
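A minimal sketch of the runtime-keyword idea, under a few assumptions. Rather than generating awk source text, it passes the keywords into a fixed awk program with -v, which is a simpler substitute for building the script dynamically. The function name count_keywords and its argument order (file first, then keywords) are made up for illustration. Keywords must not contain spaces, and backslashes in them are subject to awk's escape processing.

```shell
#!/bin/sh
# Hypothetical helper: count_keywords FILE KEYWORD...
# Prints "KEYWORD COUNT" for each keyword, where COUNT is the number of
# lines in FILE that match the keyword (treated as a regex, like grep -c).
count_keywords() {
    file=$1
    shift
    awk -v words="$*" '
        BEGIN { n = split(words, keyword, " ") }
        {
            # Test the current line against every keyword.
            for (i = 1; i <= n; i++)
                if ($0 ~ keyword[i]) count[keyword[i]]++
        }
        END {
            # "+ 0" turns an unset counter into 0.
            for (i = 1; i <= n; i++)
                print keyword[i], count[keyword[i]] + 0
        }
    ' "$file"
}
```

Called as count_keywords file alfa beta gamma on the sample file from the question, this prints alfa 1, beta 1 and gamma 2, one per line, in the order the keywords were given.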
Perl solution:
perl -lne 'for $p (qw(alfa beta gamma)) { $s{$p}++ if /$p/ } }{ print "$_ $s{$_}" for keys %s' file
No, grep cannot do this in one pass. I would suggest using awk:
awk -v pat='alfa beta gamma' '
  BEGIN { split(pat, p) }
  { for(k in p) if($0 ~ p[k]) c[k]++ }
  END { for(k in p) print p[k], c[k]?c[k]:0 }
' file
Or as a rather long one-liner:
awk -v pat='alfa beta gamma' 'BEGIN { split(pat, p) } { for(k in p) if($0 ~ p[k]) c[k]++ } END { for(k in p) print p[k], c[k]?c[k]:0 }' file
Explanation
pat is split into the p array, and each line is then tested against every pattern ($0 ~ p[k]). The counters are held in the c array. The c[k]?c[k]:0 bit uses the ternary operator to print 0 when c[k] is unset, i.e. when no line matched that pattern.
Note that if a pattern contains a space, you need to use a different delimiter between the patterns in pat and pass that delimiter to the split call accordingly.
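As a sketch of that last note, here is one way it could look with ";" as the delimiter. The function name count_patterns and the choice of ";" are illustrative assumptions, and it only works if no pattern itself contains ";". It also loops by index instead of using for (k in p), so the output keeps the order in which the patterns were given.

```shell
# Hypothetical helper: count_patterns PATTERNS FILE
# PATTERNS is a ";"-separated list, so a single pattern may contain spaces.
count_patterns() {
    awk -v pat="$1" '
        # Third argument to split() sets the separator explicitly.
        BEGIN { n = split(pat, p, ";") }
        { for (k = 1; k <= n; k++) if ($0 ~ p[k]) c[k]++ }
        END { for (k = 1; k <= n; k++) print p[k] ": " (c[k] ? c[k] : 0) }
    ' "$2"
}
```

For example, count_patterns 'alfa beta;gamma' file counts lines containing the two-word string "alfa beta" separately from lines containing "gamma".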
Testing
Input:
cat << EOF > file
alfa
beta
gamma
gamma
EOF
Output with pat='alfa beta gamma':
alfa 1
beta 1
gamma 2
Input:
cat << EOF > file
alfa beta
beta
gamma gamma
gamma alfa
alfalfa
alfa alfa
EOF
Output with pat='^a a$ alfa beta gamma':
beta 2
gamma 2
^a 3
a$ 6
alfa 4
In both cases the output matches that of running grep -c with each pattern individually.
I'd suggest using uniq (with sort).
$ sort file | uniq -c
1 alfa
1 beta
2 gamma
You need sort if the file might not be sorted (in fact, only if the multiple occurrences might not be on consecutive lines).
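A small illustration of why the sort matters, using made-up input in which "gamma" appears twice but not on adjacent lines. The helper name sample is only for this sketch.

```shell
# Hypothetical sample input: two "gamma" lines separated by "alfa".
sample() { printf 'gamma\nalfa\ngamma\n'; }

# Without sort, uniq only merges adjacent duplicates:
# three groups, each with a count of 1.
sample | uniq -c

# With sort, the two "gamma" lines become adjacent and are counted together.
sample | sort | uniq -c
```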
UPDATE:
Assuming that you have predefined patterns and they don't contain space:
$ PATTERNS='alfa beta gamma'
$ for P in $PATTERNS; do echo $P `grep -c $P file`; done
alfa 1
beta 1
gamma 2
Here is one sample from my daily work:
For all files ending in FlowBase.java, find those in which a string "input*" occurs more than once.
Example: a file containing these lines will be listed:
"inputABD"
"inputABD"
$ for i in $(find . | grep FlowBase.java); do echo $i $(egrep "input.*" $i | sed 's/^.*"input//' | sed 's/";.*//' | uniq -c | awk '($1 > 1) { print $2}' | wc -l); done | awk '($2 > 0) {print $1}'
Just use grep -o to collect the occurrences, then count them with wc -l.
For a single pattern:
grep -o 'alfa' file | wc -l
1
grep -o 'beta' file | wc -l
1
grep -o 'gamma' file | wc -l
2
Group counts:
grep -o -e 'alfa' -e 'beta' -e 'gamma' file | wc -l
4
Individual counts (sorted by frequency):
grep -o -e 'alfa' -e 'beta' -e 'gamma' file | sort | uniq -c | sort
1 alfa
1 beta
2 gamma
This is far more straightforward (and memorable) than the other answers.
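One difference from grep -c is worth keeping in mind: grep -c counts matching lines, while grep -o piped to wc -l counts individual matches. The sketch below shows this on made-up input where one line contains the pattern twice; the helper name sample is only for illustration.

```shell
# Hypothetical sample input: "gamma" occurs twice on the same line.
sample() { printf 'gamma gamma\nalfa\n'; }

# Counts lines that contain a match: one line matches.
sample | grep -c 'gamma'

# Counts every match: -o prints each occurrence on its own line.
sample | grep -o 'gamma' | wc -l
```

On the sample files used in this thread the two approaches agree, because no line contains the same pattern more than once.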