author: Bryan Newbold <bnewbold@archive.org> 2020-06-03 19:30:15 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2020-06-03 19:32:50 -0700
commit: f9035c7ca9637668911afa7e9345138563aad33e
tree: f6bd0f817190e315d9e8b0016ab1a7e0d5c73c7f /fatcat_scholar/sandcrawler.py
parent: 9722f39e38a45d3201c836f0c2805ae9f6c1f581
improve text scrubbing
Was going to use textpipe, but the dependency was too large and failed to install with a halfway-modern GCC (due to a CLD2 issue): https://github.com/GregBowyer/cld2-cffi/issues/12. So instead, basically pulled out the clean_text function, which is quite short.
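For illustration, a textpipe-style `clean_text` helper can be written with the Python standard library alone. This is a hypothetical sketch, not the code from this commit or from textpipe: the flag names (`remove_html`, `clean_dots`, `clean_quotes`, `clean_whitespace`) and the specific character substitutions are assumptions, and HTML is stripped with `html.parser` as a stand-in for whatever parser textpipe itself uses.

```python
import html
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode entities like &amp;
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def clean_text(
    raw: str,
    remove_html: bool = True,
    clean_dots: bool = True,
    clean_quotes: bool = True,
    clean_whitespace: bool = True,
) -> str:
    """Hypothetical scrubber; flags and substitutions are assumptions."""
    text = raw
    if remove_html:
        # Tag removal also decodes entities (see convert_charrefs above).
        text = strip_html(text)
    else:
        text = html.unescape(text)
    if clean_dots:
        # Unicode ellipsis -> three ASCII dots
        text = text.replace("\u2026", "...")
    if clean_quotes:
        # Curly/backtick single quotes -> ASCII apostrophe
        text = re.sub(r"[`\u2018\u2019\u201b]", "'", text)
        # Curly/low double quotes -> ASCII double quote
        text = re.sub(r"[\u201c\u201d\u201e]", '"', text)
    if clean_whitespace:
        # Collapse all Unicode whitespace (incl. non-breaking space) to one space
        text = re.sub(r"\s+", " ", text).strip()
    return text
```

For example, `clean_text("<p>Hello&nbsp;&amp;   <b>world</b>\u2026</p>")` returns `"Hello & world..."`. Keeping the helper stdlib-only follows the same reasoning as the commit: a few lines of regex and parsing avoid pulling in a heavy dependency.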
Diffstat (limited to 'fatcat_scholar/sandcrawler.py')
0 files changed, 0 insertions, 0 deletions