author: Bryan Newbold <bnewbold@archive.org> 2020-04-16 16:52:59 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2020-04-16 16:53:01 -0700
commit: 622c5bb1f9b6f4d773a31ead2fd9b14413a6fb00
tree: fbfef8a75d410735c1b22d57ac49bbaae4f07ba7 /python_hadoop/tests
parent: 83ca181637dfc34804649e1d342e3cb3ee59b5df
persist: only GROBID updates file_meta, not file-result
The hope here is to reduce deadlocks in production (on aitio). For context, we are only doing "updates" until the entire file_meta table is filled in with full metadata anyway. Updates are wasteful of resources, and for most inserts we have already seen the file, so we should "DO NOTHING" if the SHA1 is already in the table.
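The difference between the two conflict behaviors can be sketched as below. This is only an illustration, not the code from this commit: the table columns, the `persist_file_meta` helper, and its `on_conflict` parameter are all hypothetical, and an in-memory SQLite database stands in for the production database purely so the sketch runs standalone (it needs SQLite 3.24 or newer for `ON CONFLICT` support).

```python
import sqlite3

# Hypothetical, simplified schema; the real file_meta table may differ.
SCHEMA = """
CREATE TABLE file_meta (
    sha1hex TEXT PRIMARY KEY,
    size_bytes INTEGER,
    mimetype TEXT
)
"""

# Cheap path: if the SHA1 is already present, leave the existing row alone.
INSERT_DO_NOTHING = """
INSERT INTO file_meta (sha1hex, size_bytes, mimetype)
VALUES (?, ?, ?)
ON CONFLICT (sha1hex) DO NOTHING
"""

# Expensive path: overwrite the existing row with the new metadata.
INSERT_DO_UPDATE = """
INSERT INTO file_meta (sha1hex, size_bytes, mimetype)
VALUES (?, ?, ?)
ON CONFLICT (sha1hex) DO UPDATE SET
    size_bytes = excluded.size_bytes,
    mimetype = excluded.mimetype
"""


def persist_file_meta(conn, row, on_conflict="nothing"):
    """Insert one (sha1hex, size_bytes, mimetype) row.

    on_conflict is an assumed parameter for this sketch: "update"
    overwrites an existing row, anything else leaves it untouched.
    """
    sql = INSERT_DO_UPDATE if on_conflict == "update" else INSERT_DO_NOTHING
    conn.execute(sql, row)


def get_row(conn, sha1hex):
    """Fetch the stored row for a SHA1, or None if it is absent."""
    cur = conn.execute(
        "SELECT sha1hex, size_bytes, mimetype FROM file_meta WHERE sha1hex = ?",
        (sha1hex,),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

# First sighting of the file: partial metadata is inserted.
persist_file_meta(conn, (sha1, 100, None))

# A file-result style insert for a known SHA1: the row is not touched.
persist_file_meta(conn, (sha1, 100, "application/pdf"))
row_after_nothing = get_row(conn, sha1)

# A GROBID style insert: the row is overwritten with full metadata.
persist_file_meta(conn, (sha1, 100, "application/pdf"), on_conflict="update")
row_after_update = get_row(conn, sha1)
```

In this sketch only the caller that passes `on_conflict="update"` ever rewrites an existing row. Every other insert for a known SHA1 avoids modifying the row, which is the behavior the commit message hopes will reduce deadlocks.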
Diffstat (limited to 'python_hadoop/tests')
0 files changed, 0 insertions, 0 deletions