author Bryan Newbold <bnewbold@robocracy.org> 2021-11-12 11:45:48 -0800
committer Bryan Newbold <bnewbold@robocracy.org> 2021-11-12 11:45:48 -0800
commit f157cc7a50e0fd9a1c79efb3c29be7d8508ffa66
tree 25bca40fe7c5752bd514b644a48989b413c14cf4
parent 51b81e0c48a1258958ff215bc5da29bef4df4009
document cleanups run this week
Diffstat (limited to 'notes/bulk_edits/CHANGELOG.md')
-rw-r--r-- notes/bulk_edits/CHANGELOG.md | 18
1 files changed, 18 insertions, 0 deletions
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index ed989c41..d82e126e 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -9,6 +9,24 @@ this file should probably get merged into the guide at some point.
This file should not turn into a TODO list!
+
+## 2021-11
+
+Ran a series of cleanups. See background and prep notes in `notes/cleanups/`
+and specific final commands in this directory. Quick summary:
+
+- more than 9.5 million file entities had truncated timestamps in wayback URLs,
+ and were fixed with the full timestamps. a small fraction (0.5%) were
+ identified but not corrected in this first pass
+- over 140k release entities with non-lowercase DOIs were updated with
+ lowercase DOIs. all DOIs in current release entities are now lowercase (at
+ least, no ASCII uppercase characters found)
+- over 220k file entities with an incorrect release relation, due to an
+ import-time code bug, were fixed. a couple hundred questionable cases remain,
+ but are all mismatched due to DOI slash/double-slash issues and will not be
+ fixed in an automated way.
+
+
## 2021-06
Created new containers via chocula pipeline. Did not update any existing