aboutsummaryrefslogtreecommitdiffstats
path: root/extra/cleanups/file_isiarticles.md
blob: cb3785afcf83fb597dcd073e8d345384eaac5403 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

The domain isiarticles.com hosts a bunch of partial spam PDFs.

As a first pass, we can remove these via the domain itself.

A "blocklist" for this domain has been added to sandcrawler, so they should not
get auto-ingested in the future.

    # 2022-04-20
    fatcat-cli search file domain:isiarticles.com --count
    25067

## Prod Cleanup

See bulk edits log.