author | Sawood Alam <ibnesayeed@gmail.com> | 2020-12-17 09:38:50 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2020-12-17 09:38:50 -0500 |
commit | 3fb57fa0a6664f03ef48308ed5eba3a1423bd9ff (patch) | |
tree | d6bb64ca7c8753dbcf77b8a24d4b13825567f4f9 /extra | |
parent | 5afde4690a4653db53fe4962af5da3eb9188d9a2 (diff) | |
Improve status counting efficiency
When the input is large but contains only a small number of unique items, counting as we go runs in linear time and is more efficient than sorting the input and then counting unique items.
Diffstat (limited to 'extra')
-rw-r--r-- | extra/bulk_download/README.md | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/extra/bulk_download/README.md b/extra/bulk_download/README.md
index 83b92fd9..19aac432 100644
--- a/extra/bulk_download/README.md
+++ b/extra/bulk_download/README.md
@@ -36,5 +36,5 @@ SHA1, and attempted URL. You can check for errors (and potentially try) with:
 Or, count status codes:
- cut -f1 fetch_status.log | sort | uniq -c | sort -nr
+ awk '{s[$1]++} END {for(k in s){print k, s[k]}}' fetch_status.log | sort -nrk2
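For comparison, the two approaches in this change can be sketched as small shell functions. This is only an illustration: the function names `count_status_awk` and `count_status_sort` are hypothetical, and the sketch assumes the status code is the first tab-separated field of each log line, as the `cut -f1` in the old command suggests. Note that the two commands print their columns in a different order: `uniq -c` emits the count first, while the awk version emits the status first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the repository. Both take the path
# to a log file whose first field is assumed to be the status code.

# New approach: a single pass that keeps one counter per distinct status
# in an awk associative array, so only the short summary gets sorted.
count_status_awk() {
    awk '{s[$1]++} END {for (k in s) {print k, s[k]}}' "$1" | sort -nrk2
}

# Old approach: sorts every input line before counting runs of
# identical values, which costs O(n log n) on the full input.
count_status_sort() {
    cut -f1 "$1" | sort | uniq -c | sort -nr
}
```

Either function can be called as `count_status_awk fetch_status.log`. One difference to keep in mind: awk splits on any whitespace by default, whereas `cut -f1` splits on tabs only, so the two would disagree if a status field ever contained a space.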