author: Sawood Alam <ibnesayeed@gmail.com> 2020-12-17 09:38:50 -0500
committer: GitHub <noreply@github.com> 2020-12-17 09:38:50 -0500
commit: 3fb57fa0a6664f03ef48308ed5eba3a1423bd9ff
tree: d6bb64ca7c8753dbcf77b8a24d4b13825567f4f9 /extra/bulk_download
parent: 5afde4690a4653db53fe4962af5da3eb9188d9a2
Improve status counting efficiency
When the input is large but contains only a small number of unique items, counting as we go is a linear-time and more efficient approach than sorting everything and then counting unique items.
Diffstat (limited to 'extra/bulk_download')
-rw-r--r-- extra/bulk_download/README.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/extra/bulk_download/README.md b/extra/bulk_download/README.md
index 83b92fd9..19aac432 100644
--- a/extra/bulk_download/README.md
+++ b/extra/bulk_download/README.md
@@ -36,5 +36,5 @@ SHA1, and attempted URL. You can check for errors (and potentially try) with:
Or, count status codes:
- cut -f1 fetch_status.log | sort | uniq -c | sort -nr
+ awk '{s[$1]++} END {for(k in s){print k, s[k]}}' fetch_status.log | sort -nrk2
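
To illustrate the difference described in the commit message, here is a minimal sketch that runs both pipelines against a small sample log. The sample file and its contents (status, SHA1 placeholder, URL) are invented for illustration, and the function names `count_sorted` and `count_hashed` are hypothetical wrappers around the old and new commands from the diff.

```shell
#!/bin/sh
# Hypothetical sample log: tab-separated status, SHA1 placeholder, URL.
log=$(mktemp)
printf '200\taaa\thttp://example.com/1\n' >> "$log"
printf '404\tbbb\thttp://example.com/2\n' >> "$log"
printf '200\tccc\thttp://example.com/3\n' >> "$log"
printf '200\tddd\thttp://example.com/4\n' >> "$log"

# Old approach: sort every line first, then count adjacent duplicates.
# The first sort touches all n lines, so the cost is O(n log n).
count_sorted() {
    cut -f1 "$1" | sort | uniq -c | sort -nr
}

# New approach: a single pass that increments one counter per status.
# Only the (small) set of unique statuses is sorted afterwards.
count_hashed() {
    awk '{s[$1]++} END {for(k in s){print k, s[k]}}' "$1" | sort -nrk2
}

count_sorted "$log"   # prints "count status" pairs (count column is padded)
count_hashed "$log"   # prints "status count" pairs
```

Two differences are worth keeping in mind when switching commands: the column order is reversed (`uniq -c` prints the count first, the `awk` version prints it second), and `cut -f1` splits on tabs while `awk` splits on any whitespace by default. If the first field could ever contain spaces, passing `-F'\t'` to `awk` would keep the two in agreement.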