path: root/mapreduce/README.md
author: Bryan Newbold <bnewbold@archive.org> 2018-08-08 12:14:16 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-08-08 12:14:21 -0700
commit: 71be2e685848a31888811e2e398e769f7e0486c2
tree: 58026a7a473a7e301db8ba2293970f4d294cd2a0 /mapreduce/README.md
parent: c4db53036eac90841eb4f970b77db8c1677ef75b
row-count: require f:c, not file:size

I tried using the empty List() and got a test failure, so it seems we do need to specify *some* field here.

file:size gets populated by the extraction job, not the backfill job, so I had been miscounting table sizes: the count covered only GROBID-extracted items, not all rows in the table.

TODO: count on key or no column, not f:c
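
To illustrate the miscount described above, here is a minimal Python sketch. It is not the job's actual code and does not use any HBase API: the table is a plain in-memory dict, the row keys and cell values are made up, and `count_rows_with` is a hypothetical helper. It also assumes that f:c is written by the backfill job for every row, which the message implies but does not state outright.

```python
# Hypothetical in-memory stand-in for the table: row key -> {column: value}.
# Row keys and values are invented for illustration only.
rows = {
    "sha1:AAA": {"f:c": "{}", "file:size": "1024"},  # backfilled and extracted
    "sha1:BBB": {"f:c": "{}"},                       # backfilled only
    "sha1:CCC": {"f:c": "{}"},                       # backfilled only
}


def count_rows_with(table, required_columns):
    """Count rows that contain every column named in required_columns."""
    return sum(
        1
        for columns in table.values()
        if all(col in columns for col in required_columns)
    )


# Requiring file:size only sees rows the extraction job has touched.
extracted_only = count_rows_with(rows, ["file:size"])

# Requiring f:c sees every row the backfill job wrote (assumed: all rows).
all_backfilled = count_rows_with(rows, ["f:c"])

print(extracted_only, all_backfilled)
```

In this toy table, requiring file:size undercounts because two of the three rows have not been through extraction. Requiring f:c is still a proxy for "row exists", which is why the TODO above suggests counting on the key instead.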
Diffstat (limited to 'mapreduce/README.md')
0 files changed, 0 insertions, 0 deletions