From c37e552d2a05844d1bb84ae0b55b467fb9429229 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 1 Jul 2020 16:36:16 -0700
Subject: commit old example notes

---
 notes/cleanup_tasks.txt           | 18 ++++++++++++++++++
 notes/example_entities.txt        | 26 ++++++++++++++++++++++++++
 notes/merge_releases_examples.txt | 21 +++++++++++++++++++++
 3 files changed, 65 insertions(+)
 create mode 100644 notes/cleanup_tasks.txt
 create mode 100644 notes/example_entities.txt
 create mode 100644 notes/merge_releases_examples.txt

diff --git a/notes/cleanup_tasks.txt b/notes/cleanup_tasks.txt
new file mode 100644
index 00000000..bf418e59
--- /dev/null
+++ b/notes/cleanup_tasks.txt
@@ -0,0 +1,18 @@
+
+Cambridge Chemical Database (NCI)
+
+    doi_prefix:10.3406 release_type:article
+
+    193,346+ entities
+
+    should be 'dataset' not 'article'
+
+    datacite importer
+
+Frontiers
+
+    Frontiers non-PDF abstracts, which have DOIs like `10.3389/conf.*`. Should
+    crawl these, but `release_type` should be... `abstract`? There are at least
+    18,743 of these. Should be fixed in both crossref-bot, then a retro-active
+    cleanup.
+
diff --git a/notes/example_entities.txt b/notes/example_entities.txt
new file mode 100644
index 00000000..416da610
--- /dev/null
+++ b/notes/example_entities.txt
@@ -0,0 +1,26 @@
+
+errata/update:
+    Fourth Test of General Relativity: Preliminary Results
+    10.1103/physrevlett.20.1265
+    10.1103/physrevlett.21.266.3
+
+    same title; later is errata to the first.
+    very minor: The term "baud length" was consistently misprinted as "band length."
+
+DOIs for individual images
+    https://commons.wikimedia.org/wiki/Category:Media_from_Williams_et_al._2010_-_10.1371/journal.pone.0010676
+
+long-tail journal not in fatcat; web-native, tricky to crawl
+    https://angryoldmanmagazine.com/
+
+dataset
+    "ISSN-Matching of Gold OA Journals (ISSN-GOLD-OA) 2.0"
+    https://pub.uni-bielefeld.de/data/2913654
+    2 files
+    has DOI: 10.4119/unibi/2913654
+
+release group; single PDF is valid copy of two DOIs:
+    https://fatcat.wiki/file/wr64e37yvfcidgbowtslx7omne
+    10.5167/uzh-146424
+    10.1016/j.physletb.2017.12.006
+    ALSO: has CC-BY license_slug
diff --git a/notes/merge_releases_examples.txt b/notes/merge_releases_examples.txt
new file mode 100644
index 00000000..ca65705e
--- /dev/null
+++ b/notes/merge_releases_examples.txt
@@ -0,0 +1,21 @@
+
+https://fatcat.wiki/release/search?q=Validation+of+middle-atmospheric+campaign-based+water+vapour+measured+by+the+ground-based+microwave+radiometer
+
+    4 releases, all dois. 3x have same author list, 1 same authors different order
+
+https://fatcat.wiki/release/search?q=Perspectives+and+pregnancy+outcomes+of+maternal+Ramadan+fasting+in+the+second+trimester+of+pregnancy
+
+    6 releases:
+    2 figshare article
+    2 figshare files
+    1 primary
+    1 correction
+
+https://figshare.com/articles/Plasmodium_falciparum_evades_innate_immunity_by_hybrid_ABO_blood_group_phenotype_formation/8208689/119
+
+    119 versions (!)
+
+https://fatcat.wiki/release/search?q=NeuroTrends+Visualization
+
+    45 versions across two figshare works
+
-- 
cgit v1.2.3
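
The two rules in notes/cleanup_tasks.txt can be sketched as a small lookup, purely as an illustration. The function and constant names below are hypothetical and not part of fatcat or its importers; the Frontiers rule follows the notes' tentative suggestion of `abstract`, which the notes themselves leave as an open question.

```python
import re
from typing import Optional

# Hypothetical names; only the DOI prefix and pattern come from the notes.
DATASET_PREFIX = "10.3406"  # Cambridge Chemical Database (NCI), per the notes
FRONTIERS_ABSTRACT = re.compile(r"^10\.3389/conf\..*")  # Frontiers non-PDF abstracts


def suggested_release_type(doi: str, current: str) -> Optional[str]:
    """Return the release_type the notes propose, or None if no rule applies."""
    doi = doi.strip().lower()
    prefix = doi.split("/", 1)[0]
    # Rule 1: this prefix is currently typed 'article' but should be 'dataset'.
    if prefix == DATASET_PREFIX and current == "article":
        return "dataset"
    # Rule 2: Frontiers conference abstracts; 'abstract' is tentative in the notes.
    if FRONTIERS_ABSTRACT.match(doi):
        return "abstract"
    return None
```

A retroactive cleanup could call `suggested_release_type(doi, release_type)` for each release and skip any record where it returns `None`; a DOI such as 10.1103/physrevlett.20.1265 matches neither rule and would be left untouched.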