author: Bryan Newbold <bnewbold@robocracy.org> 2019-10-08 16:47:03 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-10-08 16:47:03 -0700
commit: 5808b06162263dee7e7d86d7369d19f299ddf4a9
tree: 0cefb9167ad9425cb221435da4ba50791380d5f0 /notes/bulk_edits/CHANGELOG.md
parent: 451815af3f0581c654cb38a2aabaef800789d037
move corpus changes to 'notes/bulk_edits'
Diffstat (limited to 'notes/bulk_edits/CHANGELOG.md')
-rw-r--r-- notes/bulk_edits/CHANGELOG.md | 52
1 file changed, 52 insertions, 0 deletions
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
new file mode 100644
index 00000000..97b8f8a2
--- /dev/null
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -0,0 +1,52 @@
+
+# Fatcat Production Import CHANGELOG
+
+This file tracks major content (metadata) imports to the Fatcat production
+database (at https://fatcat.wiki). It complements the code CHANGELOG file.
+
+In general, changes that impact more than 50k entities will get logged here;
+this file should probably get merged into the guide at some point.
+
+This file should not turn into a TODO list!
+
+## 2019-09
+
+Created and updated metadata for tens of thousands of containers, using the
+"chocula" pipeline.
+
+
+## 2019-08
+
+Merged/fixed roughly 100 container entities with invalid ISSN-L numbers (e.g.,
+invalid ISSN checksum).
+
+## 2019-04
+
+Imported files (matched to releases by DOI) from Semantic Scholar
+(`DIRECT-OA-CRAWL-2019` crawl).
+
+ Arabesque importer
+ crawl-bot
+ `s2_doi.sqlite`
+ TODO: archive.org link
+ TODO: rough count
+ TODO: date
+
+Imported files (matched to releases by DOI) from pre-1923/pre-1909 items uploaded
+by a user to archive.org.
+
+ Matched importer
+ internetarchive-bot (TODO:)
+ TODO: archive.org link
+ TODO: counts
+ TODO: date
+
+Imported files (matched to releases by DOI) from CORE.ac.uk
+(`DIRECT-OA-CRAWL-2019` crawl).
+
+Imported files (matched to releases by DOI) from the public web (including many
+repositories) from the `UNPAYWALL` 2018 crawl.
+
+## 2019-02
+
+Bootstrapped!
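
The 2019-08 entry mentions container entities with an invalid ISSN checksum. As a rough illustration of what that check involves, here is a minimal sketch of the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10). The function names `issn_check_char` and `is_valid_issn` are hypothetical and are not taken from the fatcat codebase; the actual cleanup may have used different tooling.

```python
import re

# Hypothetical helper names; not part of the fatcat codebase.
# Accepts "NNNN-NNNC" or "NNNNNNNC", where C is a digit or X.
ISSN_RE = re.compile(r"^([0-9]{4})-?([0-9]{3})([0-9Xx])$")


def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if check == 10 else str(check)


def is_valid_issn(raw: str) -> bool:
    """Return True if `raw` is well-formed and its check character matches."""
    match = ISSN_RE.match(raw.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    return issn_check_char(digits) == match.group(3).upper()
```

A check like this only catches malformed or mistyped identifiers; it cannot tell whether a syntactically valid ISSN is actually the correct ISSN-L for a given container.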