path: root/pig/filter-cdx-join-urls.pig
author: Ellen Spertus <ellen.spertus@gmail.com> 2018-07-30 11:55:19 -0700
committer: Ellen Spertus <ellen.spertus@gmail.com> 2018-07-30 11:55:19 -0700
commit: 81dbd0e05653682dccb8bc74b39067b4ee7ac1f2
tree: 657118763cc81f1f1ae5538bed1b18c8d82f8f6f /pig/filter-cdx-join-urls.pig
parent: dd0df0fe3574352011d6a0fe3c12e59b0a4b8259
Changed scoring, including adding code to compute string differences. Turned off line length checking.
New scores:
['(583,sha1:K2DKSSVTXWPRMFDTWSTCQW3RVWRIOV3Q,DOI-0,'title 1','title 1: tng')']
['(500,sha1:K2DKSSVTXWPRMFDTWSTCQW3RVWRIOV3Q,DOI-0.5,'title 1','title 1: tng 2')']
['(500,sha1:K2DKSSVTXWPRMFDTWSTCQW3RVWRIOV3Q,DOI-0.75,'title 1','title 1: tng 3')']
['(588,sha1:C3YNNEGH5WAG5ZAAXWAEBNXJWT6CZ3WU,DOI-1,'title 2: tng','title 2: rebooted')']
Diffstat (limited to 'pig/filter-cdx-join-urls.pig')
0 files changed, 0 insertions, 0 deletions