author | Bryan Newbold <bnewbold@archive.org> | 2020-04-16 16:52:59 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-04-16 16:53:01 -0700 |
commit | 622c5bb1f9b6f4d773a31ead2fd9b14413a6fb00 (patch) | |
tree | fbfef8a75d410735c1b22d57ac49bbaae4f07ba7 /python_hadoop | |
parent | 83ca181637dfc34804649e1d342e3cb3ee59b5df (diff) | |
persist: only GROBID updates file_meta, not file-result
The hope here is to reduce deadlocks in production (on aitio).
As context, we are only doing "updates" until the entire file_meta table
is filled in with full metadata anyway; updates are wasteful of
resources, and for most inserts we have seen the file before, so we should
be doing "DO NOTHING" if the SHA1 is already in the table.
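
As a rough illustration of the difference between the two conflict modes described above, here is a minimal sketch. It is not the code from this commit: the helper names (`file_meta_insert_sql`, `persist_file_meta`, `demo`), the `on_conflict` parameter, and the reduced three-column `file_meta` layout are all assumptions made for the example, and an in-memory SQLite database stands in for the production PostgreSQL instance so the snippet can run on its own.

```python
import sqlite3


def file_meta_insert_sql(on_conflict="nothing"):
    """Build an INSERT for a (hypothetical, simplified) file_meta table.

    on_conflict="nothing": leave an existing row for this SHA1 untouched.
    on_conflict="update":  overwrite the existing row's metadata columns.
    """
    sql = "INSERT INTO file_meta (sha1hex, size_bytes, mimetype) VALUES (?, ?, ?)"
    if on_conflict == "nothing":
        sql += " ON CONFLICT (sha1hex) DO NOTHING"
    elif on_conflict == "update":
        sql += (
            " ON CONFLICT (sha1hex) DO UPDATE SET"
            " size_bytes = excluded.size_bytes,"
            " mimetype = excluded.mimetype"
        )
    else:
        raise ValueError("unknown on_conflict mode: %r" % (on_conflict,))
    return sql


def persist_file_meta(conn, rows, on_conflict="nothing"):
    """Insert a batch of (sha1hex, size_bytes, mimetype) tuples."""
    conn.executemany(file_meta_insert_sql(on_conflict), rows)
    conn.commit()


def demo():
    # In-memory SQLite is only a stand-in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_meta ("
        " sha1hex TEXT PRIMARY KEY, size_bytes INTEGER, mimetype TEXT)"
    )
    lookup = "SELECT mimetype FROM file_meta WHERE sha1hex = ?"

    # First sighting of the file: partial metadata, no mimetype yet.
    persist_file_meta(conn, [("abc", 100, None)])

    # A worker that has seen the file before: the row is left alone.
    persist_file_meta(conn, [("abc", 100, "application/pdf")], on_conflict="nothing")
    after_nothing = conn.execute(lookup, ("abc",)).fetchone()[0]

    # A worker with full metadata: the existing row is overwritten.
    persist_file_meta(conn, [("abc", 100, "application/pdf")], on_conflict="update")
    after_update = conn.execute(lookup, ("abc",)).fetchone()[0]

    conn.close()
    return after_nothing, after_update
```

Under these assumptions, only a caller that passes `on_conflict="update"` ever rewrites an existing row; every other caller issues a plain insert that is skipped when the SHA1 is already present, which is the behavior the message above is aiming for.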
Diffstat (limited to 'python_hadoop')
0 files changed, 0 insertions, 0 deletions