path: root/notes
Commit message    Author    Age    Files    Lines
* Merge branch 'bnewbold-import-refactors' into 'master'    bnewbold    2021-11-11    1    -0/+46
|\
| |   import refactors and deprecations
| |
| |   Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes.
| |
| |   The Datacite-specific stuff could use review here.
| |
| |   Remove unused/deprecated/dead code:
| |
| |   - cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers
| |   - "extid map" sqlite3 feature from several importers, was only used for initial bulk imports (and maybe should not have been used)
| |
| |   Refactors:
| |
| |   - moved a number of large datastructures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just the ones that were either generic or very large (making it hard to read code)
| |   - shuffled around relative imports and some function names ("clean_str" vs. "clean")
| |
| |   Some actual behavioral changes:
| |
| |   - remove some Datacite-specific license slugs
| |   - stop trying to fix double-slashes in DOIs, that was causing more harm than help (some DOIs do actually have double-slashes!)
| |   - remove some excess metadata from datacite 'extra' fields
| |
| * add notes about 'double slash in DOI' issue    Bryan Newbold    2021-11-09    1    -0/+46
| |
* | wayback ts cleanup: one more filter tweak    Bryan Newbold    2021-11-09    1    -1/+2
| |
* | update cleanups notes    Bryan Newbold    2021-11-09    2    -0/+72
| |
* | initial file/release bugfix cleanup worker and notes    Bryan Newbold    2021-11-09    1    -0/+144
| |
* | updates to lowercase DOI cleanup    Bryan Newbold    2021-11-09    1    -0/+71
| |
* | more iteration on short wayback timestamp cleanup    Bryan Newbold    2021-11-09    2    -3/+128
| |
* | cleanups: tweaks to wayback CDX cleanup scripts    Bryan Newbold    2021-11-09    1    -1/+8
| |
* | wayback timestamps: updates to handle 4-digit case    Bryan Newbold    2021-11-09    2    -11/+108
| |
* | start work on wayback short-timestamp cleanup    Bryan Newbold    2021-11-09    2    -0/+238
|/
* update CHANGELOG date and document v0.4 prod migration steps [v0.4.0]    Bryan Newbold    2021-10-14    1    -0/+48
|
* notes on v0.4 SQL migration in QA    Bryan Newbold    2021-10-13    1    -0/+42
|
* another vanished content example    Bryan Newbold    2021-10-07    1    -0/+7
|
* old dblp hacking notes    Bryan Newbold    2021-06-23    1    -0/+72
|
* dblp import notes and bulk edit CHANGELOG update    Bryan Newbold    2021-06-03    2    -1/+47
|
* DOAJ bulk import notes, and update bulk edit changelog    Bryan Newbold    2021-06-02    2    -0/+89
|
* more interesting example entities (eg, to crawl)    Bryan Newbold    2021-05-18    1    -0/+19
|
* dblp import notes; bulk edit changelog update    Bryan Newbold    2020-12-29    2    -1/+63
|
* DOAJ import notes, and SQL/stats update    Bryan Newbold    2020-12-23    1    -0/+15
|
* DOAJ import notes    Bryan Newbold    2020-12-17    2    -2/+23
|
* notes on partial-progress DOAJ release metadata import    Bryan Newbold    2020-12-14    1    -0/+105
|
* bulk import notes on ORCID    Bryan Newbold    2020-12-14    1    -0/+55
|
* bulk edits: note ORCID update    Bryan Newbold    2020-12-11    1    -1/+5
|
* ingest and proposal updates    Bryan Newbold    2020-11-19    1    -0/+44
|
* more metadata cleanup task notes    Bryan Newbold    2020-10-01    1    -0/+7
|
* file_meta import notes    Bryan Newbold    2020-09-04    1    -0/+75
|
* bulk edit log: add notes on recent chocula import    Bryan Newbold    2020-08-17    1    -0/+17
|
* example bad MAG match    Bryan Newbold    2020-07-23    1    -0/+6
|
* commit old example notes    Bryan Newbold    2020-07-01    3    -0/+65
|
* JALC bulk edit notes from 2020-03-23    Bryan Newbold    2020-07-01    1    -0/+23
|
* retro-active v0.3.2 changelog updates    Bryan Newbold    2020-04-17    1    -0/+9
|
* notes: pubmed backfill (03/2020)    Martin Czygan    2020-03-24    1    -2/+22
|
* notes on arxiv+pubmed backfill    Bryan Newbold    2020-03-20    1    -0/+37
|
* basic notes in bulk edit changelog    Bryan Newbold    2020-01-19    1    -0/+7
|
* bulk edit notes for datacite (QA)    Bryan Newbold    2020-01-19    1    -0/+152
|
* pubmed update notes    Bryan Newbold    2020-01-19    1    -1/+46
|
* chocula bulk edit note    Bryan Newbold    2020-01-07    2    -0/+15
|
* update bulk edit CHANGELOG and orcid notes    Bryan Newbold    2019-12-31    2    -13/+49
|
* bulk edit updates    Bryan Newbold    2019-12-26    1    -3/+4
|
* pubmed bulk import notes (from QA)    Bryan Newbold    2019-12-23    1    -0/+45
|
* arxiv bulk update notes    Bryan Newbold    2019-12-22    2    -2/+49
|
* crossref patch bulk import    Bryan Newbold    2019-11-12    2    -0/+63
|
* note file fixup pushed in prod    Bryan Newbold    2019-10-09    2    -1/+64
|
* move corpus changes to 'notes/bulk_edits'    Bryan Newbold    2019-10-08    3    -0/+285
|
* corpus CHANGELOG about chocula updates    Bryan Newbold    2019-09-03    1    -0/+6
|
* note container updates in corpus changelog    Bryan Newbold    2019-08-27    1    -0/+5
|
* recent bootstrap/import notes    Bryan Newbold    2019-06-03    3    -0/+495
|
* migration notes    Bryan Newbold    2019-05-23    2    -0/+44
|
* WIP metadata corpus changelog    Bryan Newbold    2019-05-07    1    -0/+41
|
* user testing feedback (jefferson, old)    Bryan Newbold    2019-05-07    1    -0/+26
|