path: root/extra/bulk_edits/2022-02-09_plos_non_articles.md

PLOS publishes a number of non-articles (such as issue images), and many of
them are not correctly marked as such in the metadata.

## Issue Images

    fatcat-cli search releases doi_prefix:10.1371 title:image --index-json -n0 | rg '10.1371/image.' | wc -l
    # Got 1142 hits in 92ms
    # 348

    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" --count
    # 348
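The `rg '10.1371/image.'` stage matches that string anywhere in a JSON line, and its unescaped `.` is a regex wildcard. The count agrees with the `title:"issue image"` query, so it was good enough here. Below is a stricter variant, sketched under assumptions. `filter_issue_image_dois` is a hypothetical helper, not part of fatcat-cli. It also assumes the DOI appears as a compact `"doi":"<value>"` key, which is how `jq -c` would emit it in the pipelines below.

```shell
#!/bin/sh
# Hypothetical helper (not part of fatcat-cli): keep only JSON lines whose
# "doi" value starts with the PLOS issue-image prefix "10.1371/image.".
# The dots are escaped so they match literally, and the match is tied to the
# "doi" key rather than anywhere in the line. Assumes compact JSON (jq -c).
filter_issue_image_dois() {
    grep '"doi":"10\.1371/image\.'
}
```

It would slot in where `rg '10.1371/image.'` sits in those pipelines.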

    export FATCAT_AUTH_WORKER_CLEANUP=[...]
    export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_WORKER_CLEANUP
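These notes rely on `fatcat-cli` picking up the token from `FATCAT_API_AUTH_TOKEN`, which is why the cleanup-bot token is copied into it. A small guard can stop a session before any edit is attempted with an empty token. This is a minimal sketch, and `require_fatcat_token` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard: return non-zero (with a message on stderr) if the
# token variable that the batch-update commands rely on is unset or empty.
require_fatcat_token() {
    if [ -z "${FATCAT_API_AUTH_TOKEN:-}" ]; then
        echo "FATCAT_API_AUTH_TOKEN is not set; export it before running batch updates" >&2
        return 1
    fi
}
```

In a script, `require_fatcat_token || exit 1` near the top would abort before any `fatcat-cli batch update` runs.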

    # start small
    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" release_type:article-journal --entity-json -n 400 \
        | jq 'select(.release_type == "article-journal")' -c \
        | rg '10.1371/image.' \
        | head -n50 \
        | fatcat-cli batch update release release_type=graphic --description "PLoS Issue Images as type 'graphic'"
    # Got 348 hits in 121ms
    # editgroup_cq5cch7pmjglpehojhmza5hvxq

    # the rest
    fatcat-cli search releases doi_prefix:10.1371 title:"issue image" release_type:article-journal --entity-json -n 400 \
        | jq 'select(.release_type == "article-journal")' -c \
        | rg '10.1371/image.' \
        | fatcat-cli batch update release release_type=graphic --description "PLoS Issue Images as type 'graphic'" --auto-accept
    # Got 298 hits in 105ms
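The second run re-queries the search index, so it depends on the index already reflecting the first 50 updates. The drop from 348 hits to 298 is consistent with that. If the index might lag, one alternative is to save the filtered entities once and split that single snapshot into a reviewed first batch and the remainder. This is a minimal sketch: `split_batches` and the file names are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: split a saved file of entities (one JSON object per
# line) into "<file>.first" (the first N lines) and "<file>.rest" (the rest),
# so both batches come from the same snapshot of search results.
split_batches() {
    input=$1   # file with one entity per line
    n=$2       # size of the first, manually reviewed batch
    head -n "$n" "$input" > "$input.first"
    tail -n "+$((n + 1))" "$input" > "$input.rest"
}
```

Usage would look like this:

1. Redirect the `jq`/`rg` pipeline into, say, `issue_images.json`.
2. Run `split_batches issue_images.json 50`.
3. Feed `issue_images.json.first` to `fatcat-cli batch update` on stdin, then `issue_images.json.rest` once the first editgroup looks right.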
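
## Non-PLOS DOI Releases

Releases in a PLOS journal container (`iznnn644szdwva7khyxqzc73bi`) whose DOI
is not under the PLOS `10.1371` prefix:

    !doi_prefix:10.1371 container_id:iznnn644szdwva7khyxqzc73bi
    # 10
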
Some of these are repo DOIs with a `container_id` set, and some are DOAJ
records. The DOAJ ones mostly failed to fuzzy-match because of Greek
characters, and should be merged, probably manually. In this case there are
only a handful, but there will be more elsewhere.
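One way Greek characters can defeat title matching is when one record uses the letter itself (for example "α-synuclein") and the other spells it out ("alpha-synuclein"). The sketch below is a hypothetical normalizer, not fatcat's actual fuzzy-matching code. It maps a few Greek letters to their names before comparing titles. The letter list is illustrative and far from complete, and uppercase Greek is not handled.

```shell
#!/bin/sh
# Hypothetical title normalizer (NOT fatcat's fuzzy matcher): spell out a few
# lowercase Greek letters, lowercase ASCII, then drop everything that is not
# an ASCII letter or digit. Only the letters listed in the sed call are mapped.
normalize_title() {
    printf '%s\n' "$1" \
        | sed -e 's/α/alpha/g' -e 's/β/beta/g' -e 's/γ/gamma/g' -e 's/κ/kappa/g' \
        | LC_ALL=C tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C tr -cd 'a-z0-9'
}

# Two titles "match" if their normalized forms are identical.
titles_match() {
    [ "$(normalize_title "$1")" = "$(normalize_title "$2")" ]
}
```

With this, `titles_match 'α-Synuclein aggregation' 'alpha-synuclein aggregation'` succeeds, whereas a plain string comparison of the two titles fails.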

    fatcat-cli search releases title:"authors reply" 'container_id:*' 'doaj_id:*' --count
    # 275

    fatcat-cli search releases title:"authors reply" 'container_id:*' 'doaj_id:*' plos --count
    # 5

    fatcat-cli search releases '!doi_prefix:10.1371' '!pmid:*' '!doi:*' 'container_id:*' journal:plos 'doaj_id:*' --count
    # 1511

    fatcat-cli search releases '!doi_prefix:10.1371' '!pmid:*' '!doi:*' 'container_id:*' journal:plos 'doaj_id:*' '!title:correction' --count
    # 35

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos --count
    # 2012

(Note: the above queries were run while a large number of "RWTH" repo DOIs
were in the process of being removed.)
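The counts here were collected one query at a time. A hypothetical helper such as `count_queries` could re-run a fixed list of queries and print one `count<TAB>query` line each, so before/after snapshots can be saved and diffed. The sketch makes these assumptions:

- `fatcat-cli search releases ... --count` prints only the number, as the recorded outputs suggest.
- Each query is split on whitespace, so phrase terms like `title:"authors reply"` are not supported.
- `FATCAT_CLI` is a hypothetical override, mainly useful for testing with a stub.

```shell
#!/bin/sh
# Hypothetical helper: for each query string argument, run
#   fatcat-cli search releases <terms...> --count
# and print "<count><TAB><query>". Set FATCAT_CLI to override the command.
# Example (single quotes keep '!' and '*' away from interactive bash):
#   count_queries '!doi_prefix:10.1371 container_id:* journal:plos' > counts.tsv
count_queries() {
    cli=${FATCAT_CLI:-fatcat-cli}
    for q in "$@"; do
        # Split the query on whitespace with globbing disabled (set -f, inside
        # the command substitution's subshell) so terms like 'container_id:*'
        # reach fatcat-cli literally instead of being expanded as file globs.
        count=$(set -f; $cli search releases $q --count)
        printf '%s\t%s\n' "$count" "$q"
    done
}
```

Saving one snapshot before a fixup and one after, then running `diff` on them, would give the kind of before/after comparison recorded in these notes.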

Ok, after the batch fixups:

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos --count
    # 1507

    fatcat-cli search releases '!doi_prefix:10.1371' 'container_id:*' journal:plos '!doaj_id:*' --count
    # 4

Will fix these 4 up manually. The DOAJ cleanups will be more involved; should
probably add a simple blocklist to the DOAJ article importer so it skips these
import attempts.
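The blocklist itself would belong in the (Python) DOAJ article importer. As a stand-in, here is a hypothetical shell pre-filter. It drops DOAJ article records (one JSON object per line) that contain any string from a blocklist file, such as the ISSN of a journal whose articles should be skipped. It matches the strings anywhere in the line, which is cruder than checking a specific field. `drop_blocklisted` and the file names are illustrative.

```shell
#!/bin/sh
# Hypothetical pre-filter (not part of the fatcat importer): read DOAJ article
# records, one JSON object per line, on stdin and print only the lines that
# contain none of the fixed strings listed (one per line) in the blocklist file.
drop_blocklisted() {
    blocklist=$1
    # -F: patterns are fixed strings, -f: read them from a file,
    # -v: print the lines that do NOT match any of them.
    # grep exits 1 when every line is dropped; treat that as success.
    grep -v -F -f "$blocklist" || [ $? -eq 1 ]
}
```

For example: `drop_blocklisted doaj_blocklist.txt < doaj_articles.json > doaj_articles.filtered.json`, before handing the filtered file to the importer.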