path: root/pig/filter-cdx-paper-pdfs.pig
author: Bryan Newbold <bnewbold@archive.org> 2022-07-15 13:52:05 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2022-07-15 13:52:07 -0700
commit: b10105cbf74f87acf417dfea9e324b1dbff3b8ec (patch)
tree: d77f51038a6ae39c88fe687c3b3f46b45d501071 /pig/filter-cdx-paper-pdfs.pig
parent: 3f8fed325a3dd8d51652dffab89880c1cf25656b (diff)
html: fulltext URL prefixes to skip; also fix broken pattern matching
Due to two bugs, 'continue-in-a-for-loop' and 'missing-trailing-commas', the existing pattern matching was not working.
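
The changed code is not shown on this page, so the following is only a minimal Python sketch of how these two kinds of bug typically break prefix matching. It assumes the skip list is a Python list of string literals; the names `BROKEN_SKIP_PREFIXES`, `FIXED_SKIP_PREFIXES`, `filter_broken`, and `filter_fixed`, and the example URLs, are hypothetical and are not taken from sandcrawler.

```python
# Hypothetical skip list. The comma after the first entry is missing, so
# Python concatenates the two adjacent string literals into a single string
# and the list ends up with one (useless) prefix instead of two.
BROKEN_SKIP_PREFIXES = [
    "https://example.com/login"
    "https://example.org/static/",
]

# Same list with the comma restored: two separate prefixes.
FIXED_SKIP_PREFIXES = [
    "https://example.com/login",
    "https://example.org/static/",
]


def filter_broken(urls, prefixes):
    """Illustrates the 'continue in a for loop' bug."""
    kept = []
    for url in urls:
        for prefix in prefixes:
            if url.startswith(prefix):
                # Bug: this only moves on to the next prefix in the inner
                # loop; it does not skip the current url in the outer loop.
                continue
        kept.append(url)
    return kept


def filter_fixed(urls, prefixes):
    """One possible fix: test all prefixes at once, with no nested loop."""
    # str.startswith accepts a tuple of candidate prefixes.
    prefix_tuple = tuple(prefixes)
    return [url for url in urls if not url.startswith(prefix_tuple)]
```

In this sketch either bug alone is enough to let unwanted URLs through: the concatenated string matches nothing, and the inner `continue` never prevents the `append`.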
Diffstat (limited to 'pig/filter-cdx-paper-pdfs.pig')
0 files changed, 0 insertions, 0 deletions