From 9d518593633fac490b47f67544787454dc69f1bf Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 9 May 2019 17:47:58 -0700
Subject: clearer CDX munge notes

---
 notes/crawl_cdx_merge.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notes/crawl_cdx_merge.md b/notes/crawl_cdx_merge.md
index d2cffee..d330e9b 100644
--- a/notes/crawl_cdx_merge.md
+++ b/notes/crawl_cdx_merge.md
@@ -11,7 +11,7 @@ Run script from scratch repo:
 
 Assuming we're just looking at PDFs:
 
-    zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u | gzip > CRAWL-2000.sorted.cdx.gz
+    zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u > CRAWL-2000.sorted.cdx
 
 ## Old Way
 
-- 
cgit v1.2.3