From 3fb57fa0a6664f03ef48308ed5eba3a1423bd9ff Mon Sep 17 00:00:00 2001
From: Sawood Alam
Date: Thu, 17 Dec 2020 09:38:50 -0500
Subject: Improve status counting efficiency

When the input is large with a small number of unique items to be counted,
then counting as we go would be a linear and more efficient approach than
sorting and unique counting.
---
 extra/bulk_download/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'extra')

diff --git a/extra/bulk_download/README.md b/extra/bulk_download/README.md
index 83b92fd9..19aac432 100644
--- a/extra/bulk_download/README.md
+++ b/extra/bulk_download/README.md
@@ -36,5 +36,5 @@ SHA1, and attempted URL. You can check for errors (and potentially try) with:
 
 Or, count status codes:
 
-    cut -f1 fetch_status.log | sort | uniq -c | sort -nr
+    awk '{s[$1]++} END {for(k in s){print k, s[k]}}' fetch_status.log | sort -nrk2
 
-- 
cgit v1.2.3