author | Bryan Newbold <bnewbold@archive.org> | 2022-07-15 13:52:05 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2022-07-15 13:52:07 -0700 |
commit | b10105cbf74f87acf417dfea9e324b1dbff3b8ec (patch) | |
tree | d77f51038a6ae39c88fe687c3b3f46b45d501071 /pig/filter-cdx-paper-pdfs.pig | |
parent | 3f8fed325a3dd8d51652dffab89880c1cf25656b (diff) | |
html: fulltext URL prefixes to skip; also fix broken pattern matching
Due to both the 'continue-in-a-for-loop' and 'missing-trailing-commas',
the existing pattern matching was not working.
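The two bugs named in the message can be illustrated with a minimal Python sketch. This is not the sandcrawler code: the prefix values and the function names `filter_broken` and `filter_fixed` are hypothetical, chosen only to show how a missing trailing comma and a `continue` inside an inner `for` loop can each silently disable prefix matching.

```python
# Hypothetical prefixes for illustration; not taken from the sandcrawler source.
BROKEN_SKIP_PREFIXES = [
    "https://example.com/login/"  # missing trailing comma here...
    "https://example.org/cart/",  # ...so Python fuses both literals into one string
]

FIXED_SKIP_PREFIXES = [
    "https://example.com/login/",
    "https://example.org/cart/",
]


def filter_broken(urls, prefixes):
    """Intended to drop URLs matching a prefix, but never drops anything."""
    kept = []
    for url in urls:
        for prefix in prefixes:
            if url.startswith(prefix):
                # Bug: this only advances the inner loop to the next prefix;
                # it does not skip the URL in the outer loop.
                continue
        kept.append(url)
    return kept


def filter_fixed(urls, prefixes):
    """Keep only URLs that match none of the skip prefixes."""
    return [u for u in urls if not any(u.startswith(p) for p in prefixes)]


urls = ["https://example.com/login/x", "https://example.org/paper.html"]
print(len(BROKEN_SKIP_PREFIXES))                 # one fused string, not two
print(filter_broken(urls, FIXED_SKIP_PREFIXES))  # nothing is skipped
print(filter_fixed(urls, FIXED_SKIP_PREFIXES))   # the login URL is dropped
```

Either bug alone is enough to break the filter, which may be why the message mentions both: fixing the list without fixing the loop (or the reverse) would still leave the matching ineffective.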
Diffstat (limited to 'pig/filter-cdx-paper-pdfs.pig')
0 files changed, 0 insertions, 0 deletions