| field | value | date |
|---|---|---|
| author | Bryan Newbold <bnewbold@archive.org> | 2021-02-15 21:53:38 -0800 |
| committer | Bryan Newbold <bnewbold@archive.org> | 2021-02-15 21:53:40 -0800 |
| commit | cd3e05ac8dc98d87d50c67c28968fa228ea2d016 | |
| tree | f48df3ee35b9b4f1c5ced5517722c558ea54c67d /fatcat_scholar/sandcrawler.py | |
| parent | 55bd186a21f5e3703e8f3ba3b0a14f1387ed0ccc | |
| download | fatcat-scholar-cd3e05ac8dc98d87d50c67c28968fa228ea2d016.tar.gz, fatcat-scholar-cd3e05ac8dc98d87d50c67c28968fa228ea2d016.zip | |
truncate indexed fulltext body at 1 MByte
A large (~4 MByte) document was being indexed
(work_lumgqw4vqbgvha2ejbsbaepedq) with several megabytes of text, and
it was causing Elasticsearch indexing timeouts.
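The commit only states that the indexed fulltext body is now truncated at 1 MByte. A minimal sketch of that kind of truncation in Python (the language of fatcat-scholar) might look like the following. The constant `MAX_BODY_BYTES`, the function `truncate_body`, and the choices to measure the limit in UTF-8 bytes and to read "1 MByte" as 1024 × 1024 bytes are all hypothetical and for illustration only. This is not the code from this commit.

```python
# Hypothetical limit: the commit says "1 MByte"; 1024 * 1024 bytes is an assumption.
MAX_BODY_BYTES = 1024 * 1024


def truncate_body(body: str, max_bytes: int = MAX_BODY_BYTES) -> str:
    """Hypothetical helper: return `body` cut so its UTF-8 encoding fits in max_bytes."""
    encoded = body.encode("utf-8")
    if len(encoded) <= max_bytes:
        # Small enough already; index the full text unchanged.
        return body
    # Cut the byte string at the limit. Decoding with errors="ignore" drops a
    # trailing multi-byte character that the cut may have split in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    big = "a" * (4 * 1024 * 1024)  # roughly the size of the problem document
    print(len(truncate_body(big)))
```

Measuring the limit on encoded bytes ties it to the size of the payload sent to Elasticsearch rather than to a character count, which can differ by up to 4× for non-ASCII text.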
Diffstat (limited to 'fatcat_scholar/sandcrawler.py')
0 files changed, 0 insertions, 0 deletions