author | Bryan Newbold <bnewbold@archive.org> | 2018-08-08 12:14:16 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2018-08-08 12:14:21 -0700 |
commit | 71be2e685848a31888811e2e398e769f7e0486c2 (patch) | |
tree | 58026a7a473a7e301db8ba2293970f4d294cd2a0 /mapreduce/README.md | |
parent | c4db53036eac90841eb4f970b77db8c1677ef75b (diff) | |
row-count: require f:c, not file:size
I tried using the empty List() and got a test failure, so it seems like
we do need to specify *some* field here.
file:size gets populated by the extraction job, not the backfill job, so
I had been miscounting table sizes (counting only the number of GROBID
extracted items, not rows in the table).
TODO: count on key or no column, not f:c
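
The miscount described above can be illustrated with a small sketch. This is not the actual MapReduce job: the table is a hypothetical in-memory dict standing in for HBase rows, and the row keys, cell values, and helper names (`count_rows_with`, `count_keys`) are assumptions made for illustration. Only the column names `f:c` and `file:size` come from the commit message.

```python
# Hypothetical stand-in for the HBase table: row key -> {column: value}.
# Assumption: every row gets f:c from the backfill job, while file:size
# is only present on rows the extraction job has processed.
rows = {
    "sha1:AAA": {"f:c": "{...}", "file:size": 1234},
    "sha1:BBB": {"f:c": "{...}"},
    "sha1:CCC": {"f:c": "{...}", "file:size": 99},
    "sha1:DDD": {"f:c": "{...}"},
}


def count_rows_with(table, column):
    """Count only the rows in which `column` is populated."""
    return sum(1 for cells in table.values() if column in cells)


def count_keys(table):
    """Count rows by key alone, independent of any column (the TODO)."""
    return len(table)


extracted = count_rows_with(rows, "file:size")  # rows seen by extraction only
backfilled = count_rows_with(rows, "f:c")       # rows written by backfill
total = count_keys(rows)                        # every row in the table
```

Requiring `file:size` undercounts because rows without that column never reach the counter. Requiring `f:c` matches the table size only as long as every row carries that column, which is why counting on the key alone would be the more robust choice.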
Diffstat (limited to 'mapreduce/README.md')
0 files changed, 0 insertions, 0 deletions