author | Bryan Newbold <bnewbold@robocracy.org> | 2019-12-27 11:52:31 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-12-31 12:19:29 -0800 |
commit | eb65f50bd4980e421281a739fa93d7ff71e4cdbb (patch) | |
tree | b089a44ca88bfc9386c9543998c8d7ffee7119e8 /notes/bulk_edits/CHANGELOG.md | |
parent | 357142f45d075634b00b49dc983e0d5f394cfc72 (diff) | |
update bulk edit CHANGELOG and orcid notes
Diffstat (limited to 'notes/bulk_edits/CHANGELOG.md')
-rw-r--r-- | notes/bulk_edits/CHANGELOG.md | 19 |
1 files changed, 6 insertions, 13 deletions
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index 80760938..773d09ef 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -11,6 +11,12 @@ This file should not turn in to a TODO list!
 
 ## 2019-12
 
+Started continuous harvesting Datacite DOI metadata; first date harvested was
+`2019-12-13`. No importer running yet.
+
+Imported about 3.3m new ORCID identifiers from 2019 bulk dump (after converting
+from XML to JSON): <https://archive.org/details/orcid-dump-2019>
+
 Inserted about 154k new arxiv release entities. Still no automatic daily
 harvesting.
 
@@ -45,22 +51,9 @@ invalid ISSN checksum).
 Imported files (matched to releases by DOI) from Semantic Scholar
 (`DIRECT-OA-CRAWL-2019` crawl).
 
-Arabesque importer
-crawl-bot
-`s2_doi.sqlite`
-TODO: archive.org link
-TODO: rough count
-TODO: date
-
 Imported files (matched to releases by DOI) from pre-1923/pre-1909 items
 uploaded by a user to archive.org.
 
-Matched importer
-internetarchive-bot (TODO:)
-TODO: archive.org link
-TODO: counts
-TODO: date
-
 Imported files (matched to releases by DOI) from CORE.ac.uk
 (`DIRECT-OA-CRAWL-2019` crawl).
 