author | Bryan Newbold <bnewbold@robocracy.org> | 2020-07-01 16:36:16 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2020-07-01 16:36:16 -0700 |
commit | c37e552d2a05844d1bb84ae0b55b467fb9429229 (patch) | |
tree | 0593888f7f51aa7c63013dcc121caec939a430eb /notes/cleanup_tasks.txt | |
parent | f53ada2addef33a0096af079281ad81143339136 (diff) | |
download | fatcat-c37e552d2a05844d1bb84ae0b55b467fb9429229.tar.gz fatcat-c37e552d2a05844d1bb84ae0b55b467fb9429229.zip |
commit old example notes
Diffstat (limited to 'notes/cleanup_tasks.txt')
-rw-r--r-- | notes/cleanup_tasks.txt | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/notes/cleanup_tasks.txt b/notes/cleanup_tasks.txt
new file mode 100644
index 00000000..bf418e59
--- /dev/null
+++ b/notes/cleanup_tasks.txt
@@ -0,0 +1,18 @@
+
+Cambridge Chemical Database (NCI)
+
+    doi_prefix:10.3406 release_type:article
+
+    193,346+ entities
+
+    should be 'dataset' not 'article'
+
+    datacite importer
+
+Frontiers
+
+    Frontiers non-PDF abstracts, which have DOIs like `10.3389/conf.*`. Should
+    crawl these, but `release_type` should be... `abstract`? There are at least
+    18,743 of these. Should be fixed in both crossref-bot, then a retro-active
+    cleanup.
+
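
The two cleanup tasks in these notes both amount to overriding `release_type` based on the DOI. A minimal sketch of that idea follows, assuming a simple pattern table. The names `RELEASE_TYPE_OVERRIDES` and `override_release_type` are hypothetical and do not come from the fatcat codebase, and the `abstract` value for Frontiers conference DOIs is only the tentative suggestion from the notes, not a settled decision.

```python
import re
from typing import Optional

# Hypothetical override table derived from the notes; not fatcat's actual code.
RELEASE_TYPE_OVERRIDES = [
    # Notes: doi_prefix:10.3406 should be 'dataset', not 'article'.
    (re.compile(r"^10\.3406/"), "dataset"),
    # Notes: Frontiers DOIs like 10.3389/conf.* should perhaps be 'abstract'
    # (the notes phrase this as a question, so treat it as tentative).
    (re.compile(r"^10\.3389/conf\."), "abstract"),
]


def override_release_type(doi: str, release_type: Optional[str]) -> Optional[str]:
    """Return the corrected release_type for a DOI, or the input value unchanged."""
    normalized = doi.strip().lower()
    for pattern, corrected in RELEASE_TYPE_OVERRIDES:
        if pattern.match(normalized):
            return corrected
    return release_type
```

The same function could in principle be applied both at import time and in a retroactive cleanup pass over existing entities, which is the two-step fix the notes describe.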