-rw-r--r-- | notes/crawl_cdx_merge.md | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/notes/crawl_cdx_merge.md b/notes/crawl_cdx_merge.md
index d2cffee..d330e9b 100644
--- a/notes/crawl_cdx_merge.md
+++ b/notes/crawl_cdx_merge.md
@@ -11,7 +11,7 @@ Run script from scratch repo:
 
 Assuming we're just looking at PDFs:
 
-    zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u | gzip > CRAWL-2000.sorted.cdx.gz
+    zcat CRAWL-2000.cdx.gz | rg -i pdf | sort -S 4G -u > CRAWL-2000.sorted.cdx
 
 ## Old Way
 