Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 1 | -0/+46 |
|\ | import refactors and deprecations. Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes. The Datacite-specific stuff could use review here. Remove unused/deprecated/dead code: cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers; "extid map" sqlite3 feature from several importers, was only used for initial bulk imports (and maybe should not have been used). Refactors: moved a number of large datastructures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`), didn't move all, just the ones that were either generic or very large (making it hard to read code); shuffled around relative imports and some function names ("clean_str" vs. "clean"). Some actual behavioral changes: remove some Datacite-specific license slugs; stop trying to fix double-slashes in DOIs, that was causing more harm than help (some DOIs do actually have double-slashes!); remove some excess metadata from datacite 'extra' fields. | ||||
| * | add notes about 'double slash in DOI' issue | Bryan Newbold | 2021-11-09 | 1 | -0/+46 |
| | | |||||
* | | wayback ts cleanup: one more filter tweak | Bryan Newbold | 2021-11-09 | 1 | -1/+2 |
| | | |||||
* | | update cleanups notes | Bryan Newbold | 2021-11-09 | 2 | -0/+72 |
| | | |||||
* | | initial file/release bugfix cleanup worker and notes | Bryan Newbold | 2021-11-09 | 1 | -0/+144 |
| | | |||||
* | | updates to lowercase DOI cleanup | Bryan Newbold | 2021-11-09 | 1 | -0/+71 |
| | | |||||
* | | more iteration on short wayback timestamp cleanup | Bryan Newbold | 2021-11-09 | 2 | -3/+128 |
| | | |||||
* | | cleanups: tweaks to wayback CDX cleanup scripts | Bryan Newbold | 2021-11-09 | 1 | -1/+8 |
| | | |||||
* | | wayback timestamps: updates to handle 4-digit case | Bryan Newbold | 2021-11-09 | 2 | -11/+108 |
| | | |||||
* | | start work on wayback short-timestamp cleanup | Bryan Newbold | 2021-11-09 | 2 | -0/+238 |
|/ | |||||
* | update CHANGELOG date and document v0.4 prod migration steps (tag: v0.4.0) | Bryan Newbold | 2021-10-14 | 1 | -0/+48 |
| | |||||
* | notes on v0.4 SQL migration in QA | Bryan Newbold | 2021-10-13 | 1 | -0/+42 |
| | |||||
* | another vanished content example | Bryan Newbold | 2021-10-07 | 1 | -0/+7 |
| | |||||
* | old dblp hacking notes | Bryan Newbold | 2021-06-23 | 1 | -0/+72 |
| | |||||
* | dblp import notes and bulk edit CHANGELOG update | Bryan Newbold | 2021-06-03 | 2 | -1/+47 |
| | |||||
* | DOAJ bulk import notes, and update bulk edit changelog | Bryan Newbold | 2021-06-02 | 2 | -0/+89 |
| | |||||
* | more interesting example entities (eg, to crawl) | Bryan Newbold | 2021-05-18 | 1 | -0/+19 |
| | |||||
* | dblp import notes; bulk edit changelog update | Bryan Newbold | 2020-12-29 | 2 | -1/+63 |
| | |||||
* | DOAJ import notes, and SQL/stats update | Bryan Newbold | 2020-12-23 | 1 | -0/+15 |
| | |||||
* | DOAJ import notes | Bryan Newbold | 2020-12-17 | 2 | -2/+23 |
| | |||||
* | notes on partial-progress DOAJ release metadata import | Bryan Newbold | 2020-12-14 | 1 | -0/+105 |
| | |||||
* | bulk import notes on ORCID | Bryan Newbold | 2020-12-14 | 1 | -0/+55 |
| | |||||
* | bulk edits: note ORCID update | Bryan Newbold | 2020-12-11 | 1 | -1/+5 |
| | |||||
* | ingest and proposal updates | Bryan Newbold | 2020-11-19 | 1 | -0/+44 |
| | |||||
* | more metadata cleanup task notes | Bryan Newbold | 2020-10-01 | 1 | -0/+7 |
| | |||||
* | file_meta import notes | Bryan Newbold | 2020-09-04 | 1 | -0/+75 |
| | |||||
* | bulk edit log: add notes on recent chocula import | Bryan Newbold | 2020-08-17 | 1 | -0/+17 |
| | |||||
* | example bad MAG match | Bryan Newbold | 2020-07-23 | 1 | -0/+6 |
| | |||||
* | commit old example notes | Bryan Newbold | 2020-07-01 | 3 | -0/+65 |
| | |||||
* | JALC bulk edit notes from 2020-03-23 | Bryan Newbold | 2020-07-01 | 1 | -0/+23 |
| | |||||
* | retro-active v0.3.2 changelog updates | Bryan Newbold | 2020-04-17 | 1 | -0/+9 |
| | |||||
* | notes: pubmed backfill (03/2020) | Martin Czygan | 2020-03-24 | 1 | -2/+22 |
| | |||||
* | notes on arxiv+pubmed backfill | Bryan Newbold | 2020-03-20 | 1 | -0/+37 |
| | |||||
* | basic notes in bulk edit changelog | Bryan Newbold | 2020-01-19 | 1 | -0/+7 |
| | |||||
* | bulk edit notes for datacite (QA) | Bryan Newbold | 2020-01-19 | 1 | -0/+152 |
| | |||||
* | pubmed update notes | Bryan Newbold | 2020-01-19 | 1 | -1/+46 |
| | |||||
* | chocula bulk edit note | Bryan Newbold | 2020-01-07 | 2 | -0/+15 |
| | |||||
* | update bulk edit CHANGELOG and orcid notes | Bryan Newbold | 2019-12-31 | 2 | -13/+49 |
| | |||||
* | bulk edit updates | Bryan Newbold | 2019-12-26 | 1 | -3/+4 |
| | |||||
* | pubmed bulk import notes (from QA) | Bryan Newbold | 2019-12-23 | 1 | -0/+45 |
| | |||||
* | arxiv bulk update notes | Bryan Newbold | 2019-12-22 | 2 | -2/+49 |
| | |||||
* | crossref patch bulk import | Bryan Newbold | 2019-11-12 | 2 | -0/+63 |
| | |||||
* | note file fixup pushed in prod | Bryan Newbold | 2019-10-09 | 2 | -1/+64 |
| | |||||
* | move corpus changes to 'notes/bulk_edits' | Bryan Newbold | 2019-10-08 | 3 | -0/+285 |
| | |||||
* | corpus CHANGELOG about chocula updates | Bryan Newbold | 2019-09-03 | 1 | -0/+6 |
| | |||||
* | note container updates in corpus changelog | Bryan Newbold | 2019-08-27 | 1 | -0/+5 |
| | |||||
* | recent bootstrap/import notes | Bryan Newbold | 2019-06-03 | 3 | -0/+495 |
| | |||||
* | migration notes | Bryan Newbold | 2019-05-23 | 2 | -0/+44 |
| | |||||
* | WIP metadata corpus changelog | Bryan Newbold | 2019-05-07 | 1 | -0/+41 |
| | |||||
* | user testing feedback (jefferson, old) | Bryan Newbold | 2019-05-07 | 1 | -0/+26 |
| |