blob: 423c7aa3d653e13e5a2a16758c32f3606fe66140 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
|
Ran a cleanup of ~24k file entities from the domain
`page-one.live.cf.public.springer.com`, which are not entire journal articles
but just "samples" (one or two pages).
See `file_single_page` cleanup notes for prep and background.
## Prod
Configure CLI:
export FATCAT_API_HOST=https://api.fatcat.wiki
export FATCAT_AUTH_WORKER_CLEANUP=[...]
export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_WORKER_CLEANUP
fatcat-cli --version
fatcat-cli 0.1.6
fatcat-cli status
API Version: 0.5.0 (local)
API host: https://api.fatcat.wiki [successfully connected]
Last changelog: 5634988
API auth token: [configured]
Account: cleanup-bot [bot] [admin] [active]
editor_vvnmtzskhngxnicockn4iavyxq
Start small:
zcat /srv/fatcat/datasets/files_pageone.json.gz \
| jq '"file_" + .ident' -r \
| head -n50 \
| parallel -j1 fatcat-cli get {} --json \
| jq . -c \
| rg -v '"content_scope"' \
| rg 'page-one.live.cf.public.springer.com' \
| pv -l \
| fatcat-cli batch update file release_ids= content_scope=sample --description 'Un-link and mark Springer "page-one" preview PDF files as content_scope=sample'
# editgroup_hcumfatcvjg3fheycnm2uay5aq
Looks good, accepted that editgroup.
Run entire batch, in auto-accept mode:
zcat /srv/fatcat/datasets/files_pageone.json.gz \
| jq '"file_" + .ident' -r \
| parallel -j1 fatcat-cli get {} --json \
| jq . -c \
| rg -v '"content_scope"' \
| rg 'page-one.live.cf.public.springer.com' \
| pv -l \
| fatcat-cli batch update file release_ids= content_scope=sample --description 'Un-link and mark Springer "page-one" preview PDF files as content_scope=sample' --auto-accept
# 24.4k 0:20:06 [20.2 /s]
|