path: root/extra/bulk_edits/2022-02-09_plos_non_articles.md

PLOS publishes a number of non-article items under its DOI prefix (`10.1371`),
and many of them are not correctly typed in release metadata.

## Issue Images

    fatcat-cli search releases doi_prefix:10.1371 title:image --index-json -n0 | rg '10.1371/image.' | wc -l
    # Got 1142 hits in 92ms
    # 348

    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" --count
    # 348

    export FATCAT_AUTH_WORKER_CLEANUP=[...]
    export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_WORKER_CLEANUP

    # start small
    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" release_type:article-journal --entity-json -n 400 \
        | jq 'select(.release_type == "article-journal")' -c \
        | rg '10.1371/image.' \
        | head -n50 \
        | fatcat-cli batch update release release_type=graphic --description "PLoS Issue Images as type 'graphic'"
    # Got 348 hits in 121ms
    # editgroup_cq5cch7pmjglpehojhmza5hvxq

    # the rest
    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" release_type:article-journal --entity-json -n 400 \
        | jq 'select(.release_type == "article-journal")' -c \
        | rg '10.1371/image.' \
        | fatcat-cli batch update release release_type=graphic --description "PLoS Issue Images as type 'graphic'" --auto-accept
    # Got 298 hits in 105ms
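
The `jq` and `rg` stages above select `article-journal` releases whose DOI looks like an issue image. Note that `rg '10.1371/image.'` treats each `.` as a wildcard and matches anywhere in the JSON line. As a minimal sketch of a stricter filter, the Python below checks the DOI field directly. It assumes the entity JSON carries the DOI under `ext_ids.doi`; the function names and the sample DOIs in the usage note are made up for illustration and are not part of fatcat-cli.

```python
import json

# Literal DOI prefix for PLOS issue images (no regex wildcards involved).
ISSUE_IMAGE_DOI_PREFIX = "10.1371/image."


def is_issue_image_candidate(release: dict) -> bool:
    """True for article-journal releases whose DOI starts with the prefix.

    Assumption: the DOI lives under release["ext_ids"]["doi"].
    """
    doi = (release.get("ext_ids") or {}).get("doi") or ""
    return (
        release.get("release_type") == "article-journal"
        and doi.lower().startswith(ISSUE_IMAGE_DOI_PREFIX)
    )


def filter_lines(lines):
    """Yield the JSON lines (one entity per line) that pass the check."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if is_issue_image_candidate(json.loads(line)):
            yield line
```

Fed the output of `--entity-json`, `filter_lines(sys.stdin)` would pass through only the candidate entities. For example, a record with `release_type` of `article-journal` and a (made-up) DOI `10.1371/image.pbio.v01.i01` is kept, while one with DOI `10.1371/journal.pone.0000001` is dropped.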

## Non-PLOS DOI Releases

    !doi_prefix:10.1371 container_id:iznnn644szdwva7khyxqzc73bi
    # 10

Some of these are "repo DOIs with `container_id`"; others come from DOAJ. The
DOAJ ones mostly failed to fuzzy-match because of Greek characters, and should
be merged, possibly manually. In this case there are only a handful, but there
will be more elsewhere.
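
How the existing fuzzy matcher handles Greek characters is not described here. Purely as an illustration of one way such mismatches can arise, the sketch below assumes that one record writes the character (`β`) while the other spells out the name (`beta`), and builds a comparison key that treats both alike. The `GREEK_NAMES` table and `title_key` are hypothetical and are not the matcher fatcat uses.

```python
import unicodedata

# Hypothetical, deliberately partial mapping of Greek letters to their names.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "κ": "kappa",
    "μ": "mu",
}


def title_key(title: str) -> str:
    """Build a loose comparison key for a title.

    NFKC folds compatibility forms (for example the micro sign into Greek mu),
    lowercasing handles capital Greek letters, and non-alphanumerics are dropped.
    """
    text = unicodedata.normalize("NFKC", title).lower()
    text = "".join(GREEK_NAMES.get(ch, ch) for ch in text)
    return "".join(ch for ch in text if ch.isalnum())
```

With this key, `title_key("TGF-β signaling")` and `title_key("TGF-beta Signaling")` compare equal, so a pair like that could be surfaced as a merge candidate for manual review rather than merged automatically.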

    fatcat-cli search releases title:"authors reply" 'container_id:*' 'doaj_id:*' --count
    # 275

    fatcat-cli search releases title:"authors reply" 'container_id:*' 'doaj_id:*' plos --count
    # 5

    fatcat-cli search releases '!doi_prefix:10.1371' '!pmid:*' '!doi:*' 'container_id:*' journal:plos 'doaj_id:*' --count
    # 1511

    fatcat-cli search releases '!doi_prefix:10.1371' '!pmid:*' '!doi:*' 'container_id:*' journal:plos 'doaj_id:*' '!title:correction' --count
    # 35

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos --count
    # 2012

(note: the above were run while a lot of "RWTH" repo DOIs were being removed)

Ok, after the batch fixups:

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos --count
    # 1507

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos '!doaj_id:*' --count
    # 4

Will fix these up manually. The DOAJ cleanups will be more involved; it would
probably help to add a simple blocklist to the DOAJ article importer so that it
skips these import attempts.
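
What the blocklist would key on is not decided in these notes. As a minimal sketch, assuming it keys on title prefixes (the counts above suggest most of the DOAJ-only records are corrections), a skip check could look like the following. The prefix list, the flat `title` field, and `should_skip_record` are all hypothetical and do not reflect the actual importer code.

```python
# Hypothetical blocklist of lowercase title prefixes to skip at import time.
BLOCKED_TITLE_PREFIXES = ("correction:", "correction to")


def should_skip_record(record: dict) -> bool:
    """True if the record's title starts with a blocked prefix.

    Assumption: the record is a flat dict with a "title" key; a real DOAJ
    article record may nest the title differently.
    """
    title = (record.get("title") or "").strip().lower()
    # str.startswith accepts a tuple and matches any of its entries.
    return title.startswith(BLOCKED_TITLE_PREFIXES)
```

An importer loop would call `should_skip_record(record)` before attempting a match or insert, and move on to the next record when it returns `True`.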