path: root/extra/bulk_edits/2022-04-20_isiarticles.md

See metadata cleanups for context. In short: tens of thousands of sample/spam articles are hosted on the domain isiarticles.com. The edits below un-link these files from their releases and mark them as `content_scope=sample`.

## Prod Updates

Start small:

    export FATCAT_API_HOST=https://api.fatcat.wiki
    export FATCAT_AUTH_WORKER_CLEANUP=[...]
    export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_WORKER_CLEANUP

    fatcat-cli search file domain:isiarticles.com --entity-json -n0 \
        | rg -v '"content_scope"' \
        | rg 'isiarticles.com/' \
        | head -n50 \
        | pv -l \
        | fatcat-cli batch update file release_ids= content_scope=sample --description 'Un-link and mark isiarticles PDFs as content_scope=sample' --auto-accept
    # editgroup_ihx75kzsebgzfisgjrv67zew5e

The full batch:

    fatcat-cli search file domain:isiarticles.com --entity-json -n0 \
        | rg -v '"content_scope"' \
        | rg 'isiarticles.com/' \
        | pv -l \
        | fatcat-cli batch update file release_ids= content_scope=sample --description 'Un-link and mark isiarticles PDFs as content_scope=sample' --auto-accept

Then the remaining files, whose URLs include an explicit ':80' port:

    fatcat-cli search file domain:isiarticles.com '!content_scope:*' --entity-json -n0 \
        | rg -v '"content_scope"' \
        | rg 'isiarticles.com:80/' \
        | pv -l \
        | fatcat-cli batch update file release_ids= content_scope=sample --description 'Un-link and mark isiarticles PDFs as content_scope=sample' --auto-accept
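
The `:80` batch is needed because the earlier `isiarticles.com/` pattern does not match URLs with an explicit port. As a rough illustration, the two-stage filter can be tried offline against a few hand-written JSON lines before anything is piped into `fatcat-cli batch update`. Everything here is an assumption made for the sketch: the sample records are invented (not real fatcat entities), `sample_records` and `filter_unscoped` are hypothetical helper names, and `grep` stands in for `rg` (with `-F`, so the dot is matched literally rather than as a regex wildcard).

```shell
# Hypothetical sample records for a dry run; idents and paths are made up.
sample_records() {
    cat <<'EOF'
{"ident":"aaa","urls":[{"url":"http://isiarticles.com/sample/1.pdf","rel":"web"}]}
{"ident":"bbb","content_scope":"sample","urls":[{"url":"http://isiarticles.com/sample/2.pdf","rel":"web"}]}
{"ident":"ccc","urls":[{"url":"http://isiarticles.com:80/sample/3.pdf","rel":"web"}]}
EOF
}

# Same shape as the filters used above: drop entities that already carry a
# content_scope field, then keep only lines containing the URL substring $1.
filter_unscoped() {
    grep -v '"content_scope"' | grep -F "$1"
}

# Keeps only "aaa": "bbb" already has a content_scope, "ccc" has the :80 port.
sample_records | filter_unscoped 'isiarticles.com/'

# Keeps only "ccc", the record the first pattern missed.
sample_records | filter_unscoped 'isiarticles.com:80/'
```

The two patterns select disjoint sets of lines, so running both batches does not update any file twice.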

Verify:

    fatcat-cli search file domain:isiarticles.com '!content_scope:*' --count
    0