author | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-12 11:45:48 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-12 11:45:48 -0800 |
commit | f157cc7a50e0fd9a1c79efb3c29be7d8508ffa66 (patch) | |
tree | 25bca40fe7c5752bd514b644a48989b413c14cf4 /notes/bulk_edits/CHANGELOG.md | |
parent | 51b81e0c48a1258958ff215bc5da29bef4df4009 (diff) | |
document cleanups run this week
Diffstat (limited to 'notes/bulk_edits/CHANGELOG.md')
-rw-r--r-- | notes/bulk_edits/CHANGELOG.md | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index ed989c41..d82e126e 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -9,6 +9,24 @@ this file should probably get merged into the guide at some point.
 
 This file should not turn in to a TODO list!
 
+
+## 2021-11
+
+Ran a series of cleanups. See background and prep notes in `notes/cleanups/`
+and specific final commands in this directory. Quick summary:
+
+- more than 9.5 million file entities had truncated timestamps wayback URLs,
+  and were fixed with the full timestamps. there are still a small fraction
+  (0.5%) which were identified but not corrected in this first pass
+- over 140k release entities with non-lowercase DOIs were updated with
+  lowercase DOI. all DOIs in current release entities now lowercase (at least,
+  no ASCII uppercase characters found)
+- over 220k file entities with incorrect release relation, due to an
+  import-time code bug, were fixed. a couple hundred questionable cases remain,
+  but are all mismatched due to DOI slash/double-slash issues and will not be
+  fixed in an automated way.
+
+
 ## 2021-06
 
 Created new containers via chocula pipeline. Did not update any existing