author Bryan Newbold <bnewbold@robocracy.org> 2020-07-01 16:36:16 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2020-07-01 16:36:16 -0700
commit c37e552d2a05844d1bb84ae0b55b467fb9429229
tree 0593888f7f51aa7c63013dcc121caec939a430eb
parent f53ada2addef33a0096af079281ad81143339136
commit old example notes
-rw-r--r-- notes/cleanup_tasks.txt 18
-rw-r--r-- notes/example_entities.txt 26
-rw-r--r-- notes/merge_releases_examples.txt 21
3 files changed, 65 insertions, 0 deletions
diff --git a/notes/cleanup_tasks.txt b/notes/cleanup_tasks.txt
new file mode 100644
index 00000000..bf418e59
--- /dev/null
+++ b/notes/cleanup_tasks.txt
@@ -0,0 +1,18 @@
+
+Cambridge Chemical Database (NCI)
+
+ doi_prefix:10.3406 release_type:article
+
+ 193,346+ entities
+
+ should be 'dataset' not 'article'
+
+ datacite importer
+
+Frontiers
+
+ Frontiers non-PDF abstracts, which have DOIs like `10.3389/conf.*`. Should
+ crawl these, but `release_type` should be... `abstract`? There are at least
+ 18,743 of these. Should be fixed in both crossref-bot, then a retro-active
+ cleanup.
+
diff --git a/notes/example_entities.txt b/notes/example_entities.txt
new file mode 100644
index 00000000..416da610
--- /dev/null
+++ b/notes/example_entities.txt
@@ -0,0 +1,26 @@
+
+errata/update:
+ Fourth Test of General Relativity: Preliminary Results
+ 10.1103/physrevlett.20.1265
+ 10.1103/physrevlett.21.266.3
+
+ same title; later is errata to the first.
+ very minor: The term "baud length" was consistently misprinted as "band length."
+
+DOIs for individual images
+ https://commons.wikimedia.org/wiki/Category:Media_from_Williams_et_al._2010_-_10.1371/journal.pone.0010676
+
+long-tail journal not in fatcat; web-native, tricky to crawl
+ https://angryoldmanmagazine.com/
+
+dataset
+ "ISSN-Matching of Gold OA Journals (ISSN-GOLD-OA) 2.0"
+ https://pub.uni-bielefeld.de/data/2913654
+ 2 files
+ has DOI: 10.4119/unibi/2913654
+
+release group; single PDF is valid copy of two DOIs:
+ https://fatcat.wiki/file/wr64e37yvfcidgbowtslx7omne
+ 10.5167/uzh-146424
+ 10.1016/j.physletb.2017.12.006
+ ALSO: has CC-BY license_slug
diff --git a/notes/merge_releases_examples.txt b/notes/merge_releases_examples.txt
new file mode 100644
index 00000000..ca65705e
--- /dev/null
+++ b/notes/merge_releases_examples.txt
@@ -0,0 +1,21 @@
+
+https://fatcat.wiki/release/search?q=Validation+of+middle-atmospheric+campaign-based+water+vapour+measured+by+the+ground-based+microwave+radiometer
+
+ 4 releases, all dois. 3x have same author list, 1 same authors different order
+
+https://fatcat.wiki/release/search?q=Perspectives+and+pregnancy+outcomes+of+maternal+Ramadan+fasting+in+the+second+trimester+of+pregnancy
+
+ 6 releases:
+ 2 figshare article
+ 2 figshare files
+ 1 primary
+ 1 correction
+
+https://figshare.com/articles/Plasmodium_falciparum_evades_innate_immunity_by_hybrid_ABO_blood_group_phenotype_formation/8208689/119
+
+ 119 versions (!)
+
+https://fatcat.wiki/release/search?q=NeuroTrends+Visualization
+
+ 45 versions across two figshare works
+