author | Bryan Newbold <bnewbold@robocracy.org> | 2019-10-08 16:47:03 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-10-08 16:47:03 -0700 |
commit | 5808b06162263dee7e7d86d7369d19f299ddf4a9 (patch) | |
tree | 0cefb9167ad9425cb221435da4ba50791380d5f0 /notes/bulk_edits/CHANGELOG.md | |
parent | 451815af3f0581c654cb38a2aabaef800789d037 (diff) | |
move corpus changes to 'notes/bulk_edits'
Diffstat (limited to 'notes/bulk_edits/CHANGELOG.md')
-rw-r--r-- | notes/bulk_edits/CHANGELOG.md | 52 |
1 files changed, 52 insertions, 0 deletions
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
new file mode 100644
index 00000000..97b8f8a2
--- /dev/null
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -0,0 +1,52 @@
+
+# Fatcat Production Import CHANGELOG
+
+This file tracks major content (metadata) imports to the Fatcat production
+database (at https://fatcat.wiki). It complements the code CHANGELOG file.
+
+In general, changes that impact more than 50k entities will get logged here;
+this file should probably get merged into the guide at some point.
+
+This file should not turn in to a TODO list!
+
+## 2019-09
+
+Created and updated metadata for tens of thousands of containers, using
+"chocula" pipeline.
+
+
+## 2019-08
+
+Merged/fixed roughly 100 container entities with invalid ISSN-L numbers (eg,
+invalid ISSN checksum).
+
+## 2019-04
+
+Imported files (matched to releases by DOI) from Semantic Scholar
+(`DIRECT-OA-CRAWL-2019` crawl).
+
+    Arabesque importer
+    crawl-bot
+    `s2_doi.sqlite`
+    TODO: archive.org link
+    TODO: rough count
+    TODO: date
+
+Imported files (matched to releases by DOI) from pre-1923/pre-1909 items uploaded
+by a user to archive.org.
+
+    Matched importer
+    internetarchive-bot (TODO:)
+    TODO: archive.org link
+    TODO: counts
+    TODO: date
+
+Imported files (matched to releases by DOI) from CORE.ac.uk
+(`DIRECT-OA-CRAWL-2019` crawl).
+
+Imported files (matched to releases by DOI) from the public web (including many
+repositories) from the `UNPAYWALL` 2018 crawl.
+
+## 2019-02
+
+Bootstrapped!
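
The 2019-08 entry in the diff refers to ISSN-L values with an invalid ISSN checksum. For readers unfamiliar with that check, here is a minimal sketch of the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10). The function names `issn_check_digit` and `is_valid_issn` are hypothetical and chosen for illustration; this is not the code used for the cleanup described in the commit.

```python
import re

# Canonical printed form: four digits, a hyphen, three digits, then a digit or X.
ISSN_RE = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if value == 10 else str(value)


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is well-formed and its check digit matches."""
    issn = issn.strip().upper()
    if not ISSN_RE.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7]
```

A check like this only confirms that an identifier is internally consistent; it cannot tell whether the ISSN is actually registered or whether it is the correct linking ISSN (ISSN-L) for a given container.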