author Bryan Newbold <bnewbold@archive.org> 2019-05-09 17:47:58 -0700
committer Bryan Newbold <bnewbold@archive.org> 2019-05-09 17:47:58 -0700
commit 9d518593633fac490b47f67544787454dc69f1bf
tree 24fdaeb9086331b2020a67c3c66bf16c8212090e /notes
parent 27d149734439ee68738957df76cfb6f687b3f19b
clearer CDX munge notes
Diffstat (limited to 'notes')
-rw-r--r-- notes/crawl_cdx_merge.md | 2
1 file changed, 1 insertion, 1 deletion
diff --git a/notes/crawl_cdx_merge.md b/notes/crawl_cdx_merge.md
index d2cffee..d330e9b 100644
--- a/notes/crawl_cdx_merge.md
+++ b/notes/crawl_cdx_merge.md
@@ -11,7 +11,7 @@ Run script from scratch repo:
Assuming we're just looking at PDFs:
- zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u | gzip > CRAWL-2000.sorted.cdx.gz
+ zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u > CRAWL-2000.sorted.cdx
## Old Way
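
Review note: the changed command filters the CDX for lines mentioning "pdf", sorts and de-duplicates them, and now writes plain text instead of re-compressing. Below is a minimal sketch of that pipeline shape on a tiny throwaway sample. Everything about the sample is hypothetical (the `SAMPLE` file names, the CDX-like lines, the temporary directory). To keep the sketch runnable on a bare system, `grep -i` stands in for `rg -i` and `gzip -dc` stands in for `zcat`. `LC_ALL=C` is an assumption added here for reproducible byte-order sorting; the command in the notes does not set it. The `-S 4G` buffer hint is omitted because the sample is only a few lines.

```shell
# Hypothetical sample input: four CDX-like lines, one non-PDF, one exact duplicate.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'org,example)/b.pdf 20190101000000 http://example.org/b.pdf application/pdf 200' \
  'org,example)/a.pdf 20190101000000 http://example.org/a.pdf application/pdf 200' \
  'org,example)/index.html 20190101000000 http://example.org/ text/html 200' \
  'org,example)/a.pdf 20190101000000 http://example.org/a.pdf application/pdf 200' \
  | gzip > "$tmpdir/SAMPLE.cdx.gz"

# Same shape as the updated command: decompress, keep lines matching "pdf"
# (case-insensitive), sort with duplicates dropped, write uncompressed output.
gzip -dc "$tmpdir/SAMPLE.cdx.gz" \
  | grep -i pdf \
  | LC_ALL=C sort -u > "$tmpdir/SAMPLE.sorted.cdx"

# Inspect the result: the HTML line and the duplicate PDF line should be gone.
cat "$tmpdir/SAMPLE.sorted.cdx"
```

Note that the filter matches "pdf" anywhere in the line (URL, MIME type, or any other field), so it is a coarse pre-filter and not a strict MIME-type check.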