Commit message  Author  Age  Files  Lines
* file merger: add content_scope to list of merged fields  Bryan Newbold  2021-11-24  1  -1/+1
|
* release merger: some progress, but also disable (not complete)  Bryan Newbold  2021-11-23  1  -12/+72
|
* file merges: fixes from testing in QA  Bryan Newbold  2021-11-23  1  -14/+23
|
* file de-dupe: notes on prep and QA testing  Bryan Newbold  2021-11-23  2  -0/+136
|
* mergers: small tweaks  Bryan Newbold  2021-11-23  2  -3/+3
|
* mergers: remove entity mergers from __init__ (to work around warning)  Bryan Newbold  2021-11-23  1  -2/+0
|
* add proposal for entity mergers  Bryan Newbold  2021-11-23  1  -0/+110
|
* initial file merger, with tests  Bryan Newbold  2021-11-23  2  -0/+388
|
* mergers: fmt, lint, refactors  Bryan Newbold  2021-11-23  3  -96/+200
|     This old merger code is from an old branch and needed significant cleanup
* remove top-level fatcat_merge.py; going to call module __main__ going forward  Bryan Newbold  2021-11-23  1  -112/+0
|
* first iteration of mergers  Bryan Newbold  2021-11-23  4  -0/+355
|
* python_openapi_client: update metadata tags for next upload  Bryan Newbold  2021-11-23  1  -2/+2
|
* commit v0.5.0 deployment notes  Bryan Newbold  2021-11-22  1  -0/+85
|
* final CHANGELOG tweak for v0.5.0 (tag: v0.5.0)  Bryan Newbold  2021-11-22  1  -1/+2
|
* Merge branch 'bnewbold-content-scope'  Bryan Newbold  2021-11-22  33  -28/+346
|\
| * update CHANGELOG for v0.5.0  Bryan Newbold  2021-11-17  1  -1/+18
| |
| * bump rust code to 0.5.0  Bryan Newbold  2021-11-17  5  -6/+7
| |
| * bump python client to 0.5.0  Bryan Newbold  2021-11-17  10  -15/+15
| |
| * because of SQL change, this schema bump does warrant a minor version bump to v0.5.0 (not v0.4.1)  Bryan Newbold  2021-11-17  1  -1/+1
| |
| * content_scope: include in file ES schema and transform  Bryan Newbold  2021-11-17  2  -0/+2
| |
| * guide: document content_scope field  Bryan Newbold  2021-11-17  3  -1/+49
| |
| * minimal python test coverage of content_scope fields  Bryan Newbold  2021-11-17  3  -0/+6
| |
| * python code: update python_openapi_client in lockfile  Bryan Newbold  2021-11-17  1  -1/+1
| |
| * update python client library codegen for content_scope  Bryan Newbold  2021-11-17  9  -17/+95
| |
| * rust: bump crate version and lockfile  Bryan Newbold  2021-11-17  2  -3/+3
| |
| * rust: implement content_scope  Bryan Newbold  2021-11-17  5  -0/+22
| |
| * SQL implementation of content_scope  Bryan Newbold  2021-11-17  2  -0/+36
| |
| * codegen rust code for content_scope  Bryan Newbold  2021-11-17  3  -4/+19
| |
| * schema: add content_scope fields, and bump to 0.4.1  Bryan Newbold  2021-11-17  1  -1/+10
| |
| * proposal: content_scope field  Bryan Newbold  2021-11-17  1  -0/+84
| |
* | typo: don't expand containers for release revs (TOML)  Bryan Newbold  2021-11-19  1  -1/+1
| |
* | web editgroup diff: don't enrich in TOML diff; fix overlapping break  Bryan Newbold  2021-11-19  2  -5/+8
| |
* | web generic entity helpers: make enrichment optional  Bryan Newbold  2021-11-19  1  -18/+49
| |
* | polish editgroup diff view  Bryan Newbold  2021-11-18  4  -92/+83
| |     Still not as great as it could be, but useful in this state.
* | initial implementation of editgroup 'diff' for review  Bryan Newbold  2021-11-17  4  -6/+183
| |
* | web: fix API URL link for review pages of entities  Bryan Newbold  2021-11-17  1  -2/+2
|/
* updated notes on possible cleanups  Bryan Newbold  2021-11-17  1  -4/+27
|
* ISSN-L dupes check: output all matches  Bryan Newbold  2021-11-17  1  -1/+1
|
* document cleanups run this week  Bryan Newbold  2021-11-12  5  -0/+244
|
* web: handle ES non-int error codes better  Bryan Newbold  2021-11-12  1  -9/+12
|
* Merge branch 'bnewbold-import-refactors' into 'master'  bnewbold  2021-11-11  27  -1599/+874
|\
| |     import refactors and deprecations
| |
| |     Some of these are from old stale branches (the datacite subject metadata
| |     patch), but most are from yesterday and today. Sort of a hodge-podge, but
| |     the general theme is getting around to deferred cleanups and refactors
| |     specific to importer code before making some behavioral changes.
| |
| |     The Datacite-specific stuff could use review here.
| |
| |     Remove unused/deprecated/dead code:
| |
| |     - cdl_dash_dat and wayback_static importers, which were for specific early
| |       example entities and have been superseded by other importers
| |     - "extid map" sqlite3 feature from several importers, was only used for
| |       initial bulk imports (and maybe should not have been used)
| |
| |     Refactors:
| |
| |     - moved a number of large datastructures out of importer code and into a
| |       dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just
| |       the ones that were either generic or very large (making it hard to read
| |       code)
| |     - shuffled around relative imports and some function names ("clean_str"
| |       vs. "clean")
| |
| |     Some actual behavioral changes:
| |
| |     - remove some Datacite-specific license slugs
| |     - stop trying to fix double-slashes in DOIs, that was causing more harm
| |       than help (some DOIs do actually have double-slashes!)
| |     - remove some excess metadata from datacite 'extra' fields
| * update datacite tests for license slug changes  Bryan Newbold  2021-11-10  2  -8/+7
| |     Use datacite-specific wrapper function, and remove a couple non-OA/TDM-limited licenses.
| * improve lookup_license_slug helper and lookup table  Bryan Newbold  2021-11-10  2  -56/+62
| |
| * refactor importer metadata tables into separate file; move some helpers around  Bryan Newbold  2021-11-10  10  -702/+682
| |     - MAX_ABSTRACT_LENGTH set in a single place (importer common)
| |     - merge datacite license slug table into common table, removing some
| |       TDM-specific licenses (which do not apply in the context of preserving
| |       the full work)
| * importers: refactor imports of clean() and other normalization helpers  Bryan Newbold  2021-11-10  12  -95/+104
| |
| * remove cdl_dash_dat and wayback_static importers  Bryan Newbold  2021-11-10  4  -596/+0
| |     Cleaning out dead code.
| |
| |     These importers were used to create demonstration fileset and webcapture
| |     entities early in development. They have been replaced by the fileset and
| |     webcapture ingest importers.
| * datacite import: store less subject metadata  Bryan Newbold  2021-11-10  1  -1/+7
| |     Many of these 'subject' objects have the equivalent of several lines of
| |     text, with complex URLs that don't compress well. I think it is fine we
| |     have included these thus far instead of parsing more deeply, but going
| |     forward I don't think this nested 'extra' metadata is worth the database
| |     space.
| * add notes about 'double slash in DOI' issue  Bryan Newbold  2021-11-09  1  -0/+46
| |
| * importers: use clean_doi() in many more (all?) importers  Bryan Newbold  2021-11-09  6  -12/+29
| |
| * clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix  Bryan Newbold  2021-11-09  1  -1/+2
| |