From 3fb57fa0a6664f03ef48308ed5eba3a1423bd9ff Mon Sep 17 00:00:00 2001
From: Sawood Alam <ibnesayeed@gmail.com>
Date: Thu, 17 Dec 2020 09:38:50 -0500
Subject: Improve status counting efficiency

When the input is large but contains only a small number of unique items, counting as we go runs in linear time and is more efficient than sorting the whole input and then counting unique lines.
---
 extra/bulk_download/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
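
Note on the approach: the old pipeline sorts every input line (O(n log n)), while the awk version keeps one counter per distinct key in a single pass and only sorts the short summary afterwards. A minimal sketch of that idea follows. The `count_status` wrapper and the sample records are hypothetical and not part of this patch; the sample assumes tab-separated lines with a numeric status code in the first field.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): count occurrences of the first
# whitespace-separated field in one pass, then sort only the per-key totals.
count_status() {
    # s[$1]++ keeps one counter per distinct key; the END block prints "key count".
    # sort orders by count (descending), breaking ties by key for stable output.
    awk '{s[$1]++} END {for (k in s) print k, s[k]}' "$@" | sort -k2,2nr -k1,1
}

# Made-up sample standing in for fetch_status.log (assumed layout, see above).
sample=$(mktemp)
printf '200\tsha1-a\thttps://example.com/a\n' >  "$sample"
printf '404\tsha1-b\thttps://example.com/b\n' >> "$sample"
printf '200\tsha1-c\thttps://example.com/c\n' >> "$sample"
printf '500\tsha1-d\thttps://example.com/d\n' >> "$sample"
printf '200\tsha1-e\thttps://example.com/e\n' >> "$sample"
printf '404\tsha1-f\thttps://example.com/f\n' >> "$sample"

count_status "$sample"
rm -f "$sample"
```

For the sample above this prints `200 3`, `404 2`, and `500 1`, one pair per line. Unlike `uniq -c`, the key comes first and the count second, which is why the sort key is field 2.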

diff --git a/extra/bulk_download/README.md b/extra/bulk_download/README.md
index 83b92fd9..19aac432 100644
--- a/extra/bulk_download/README.md
+++ b/extra/bulk_download/README.md
@@ -36,5 +36,5 @@ SHA1, and attempted URL. You can check for errors (and potentially try) with:
 
 Or, count status codes:
 
-    cut -f1 fetch_status.log | sort | uniq -c | sort -nr
+    awk '{s[$1]++} END {for(k in s){print k, s[k]}}' fetch_status.log | sort -nrk2
 
-- 
cgit v1.2.3