The domain isiarticles.com hosts many partial spam PDFs.
As a first pass, we can remove these by matching on the domain itself.
A "blocklist" for this domain has been added to sandcrawler, so these PDFs
should not get auto-ingested in the future.
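
As a rough illustration of what domain-based matching can look like, here is a
minimal Python sketch. It is not sandcrawler's actual blocklist code: the names
`BLOCKED_DOMAINS` and `is_blocked` are hypothetical, and only the domain itself
comes from these notes. It compares the parsed hostname instead of doing a
substring search, so that look-alike hosts and URLs that merely mention the
domain in their path are not caught.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration; only this domain comes from the notes.
BLOCKED_DOMAINS = {"isiarticles.com"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # hostname is None for URLs without a scheme/netloc; treat those as not blocked
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked)
```

Note that a bare string without a scheme (for example `isiarticles.com/foo`)
has no parsed hostname and is therefore not matched by this sketch.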
# 2022-04-20

    fatcat-cli search file domain:isiarticles.com --count
    25067

## Prod Cleanup
See bulk edits log.