Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | move 'cleanups' directory from notes to extra/ | Bryan Newbold | 2021-11-29 | 11 | -1306/+0 |
| | |||||
* | Merge branch 'bnewbold-container-merger' | Bryan Newbold | 2021-11-29 | 2 | -0/+160 |
|\ | |||||
| * | notes on container ISSN-L merging, tested in QA | Bryan Newbold | 2021-11-24 | 2 | -0/+160 |
| | | |||||
* | | notes on file_meta partial cleanup | Bryan Newbold | 2021-11-24 | 2 | -0/+196 |
|/ | |||||
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 2 | -0/+136 |
|\ | entity mergers framework. See merge request webgroup/fatcat!133 | ||||
| * | file de-dupe: notes on prep and QA testing | Bryan Newbold | 2021-11-23 | 2 | -0/+136 |
| | | |||||
* | | codepsell fixes to notes | Bryan Newbold | 2021-11-24 | 1 | -2/+2 |
|/ | |||||
* | document cleanups run this week | Bryan Newbold | 2021-11-12 | 1 | -0/+13 |
| | |||||
* | Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 1 | -0/+46 |
|\ | import refactors and deprecations. Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes. The Datacite-specific stuff could use review here. Remove unused/deprecated/dead code: cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers; "extid map" sqlite3 feature from several importers, was only used for initial bulk imports (and maybe should not have been used). Refactors: moved a number of large datastructures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just the ones that were either generic or very large (making it hard to read code); shuffled around relative imports and some function names ("clean_str" vs. "clean"). Some actual behavioral changes: remove some Datacite-specific license slugs; stop trying to fix double-slashes in DOIs, that was causing more harm than help (some DOIs do actually have double-slashes!); remove some excess metadata from datacite 'extra' fields. | ||||
| * | add notes about 'double slash in DOI' issue | Bryan Newbold | 2021-11-09 | 1 | -0/+46 |
| | |||||
* | wayback ts cleanup: one more filter tweak | Bryan Newbold | 2021-11-09 | 1 | -1/+2 |
| | |||||
* | update cleanups notes | Bryan Newbold | 2021-11-09 | 2 | -0/+72 |
| | |||||
* | initial file/release bugfix cleanup worker and notes | Bryan Newbold | 2021-11-09 | 1 | -0/+144 |
| | |||||
* | updates to lowercase DOI cleanup | Bryan Newbold | 2021-11-09 | 1 | -0/+71 |
| | |||||
* | more iteration on short wayback timestamp cleanup | Bryan Newbold | 2021-11-09 | 2 | -3/+128 |
| | |||||
* | cleanups: tweaks to wayback CDX cleanup scripts | Bryan Newbold | 2021-11-09 | 1 | -1/+8 |
| | |||||
* | wayback timestamps: updates to handle 4-digit case | Bryan Newbold | 2021-11-09 | 2 | -11/+108 |
| | |||||
* | start work on wayback short-timestamp cleanup | Bryan Newbold | 2021-11-09 | 2 | -0/+238 |