Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | |||||
* | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 |
| | |||||
* | file merges: fixes from testing in QA | Bryan Newbold | 2021-11-23 | 1 | -14/+23 |
| | |||||
* | file de-dupe: notes on prep and QA testing | Bryan Newbold | 2021-11-23 | 2 | -0/+136 |
| | |||||
* | mergers: small tweaks | Bryan Newbold | 2021-11-23 | 2 | -3/+3 |
| | |||||
* | mergers: remove entity mergers from __init__ (to work around warning) | Bryan Newbold | 2021-11-23 | 1 | -2/+0 |
| | |||||
* | add proposal for entity mergers | Bryan Newbold | 2021-11-23 | 1 | -0/+110 |
| | |||||
* | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 2 | -0/+388 |
| | |||||
* | mergers: fmt, lint, refactors | Bryan Newbold | 2021-11-23 | 3 | -96/+200 |
| | | | | | This old merger code is from an old branch and needed significant cleanup | ||||
* | remove top-level fatcat_merge.py; going to call module __main__ going forward | Bryan Newbold | 2021-11-23 | 1 | -112/+0 |
| | |||||
* | first iteration of mergers | Bryan Newbold | 2021-11-23 | 4 | -0/+355 |
| | |||||
* | python_openapi_client: update metadata tags for next upload | Bryan Newbold | 2021-11-23 | 1 | -2/+2 |
| | |||||
* | commit v0.5.0 deployment notes | Bryan Newbold | 2021-11-22 | 1 | -0/+85 |
| | |||||
* | final CHANGELOG tweak for v0.5.0 (tag: v0.5.0) | Bryan Newbold | 2021-11-22 | 1 | -1/+2 |
| | |||||
* | Merge branch 'bnewbold-content-scope' | Bryan Newbold | 2021-11-22 | 33 | -28/+346 |
|\ | |||||
| * | update CHANGELOG for v0.5.0 | Bryan Newbold | 2021-11-17 | 1 | -1/+18 |
| | | |||||
| * | bump rust code to 0.5.0 | Bryan Newbold | 2021-11-17 | 5 | -6/+7 |
| | | |||||
| * | bump python client to 0.5.0 | Bryan Newbold | 2021-11-17 | 10 | -15/+15 |
| | | |||||
| * | because of SQL change, this schema bump does warrant a minor version bump to v0.5.0 (not v0.4.1) | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | | |||||
| * | content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 2 | -0/+2 |
| | | |||||
| * | guide: document content_scope field | Bryan Newbold | 2021-11-17 | 3 | -1/+49 |
| | | |||||
| * | minimal python test coverage of content_scope fields | Bryan Newbold | 2021-11-17 | 3 | -0/+6 |
| | | |||||
| * | python code: update python_openapi_client in lockfile | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | | |||||
| * | update python client library codegen for content_scope | Bryan Newbold | 2021-11-17 | 9 | -17/+95 |
| | | |||||
| * | rust: bump crate version and lockfile | Bryan Newbold | 2021-11-17 | 2 | -3/+3 |
| | | |||||
| * | rust: implement content_scope | Bryan Newbold | 2021-11-17 | 5 | -0/+22 |
| | | |||||
| * | SQL implementation of content_scope | Bryan Newbold | 2021-11-17 | 2 | -0/+36 |
| | | |||||
| * | codegen rust code for content_scope | Bryan Newbold | 2021-11-17 | 3 | -4/+19 |
| | | |||||
| * | schema: add content_scope fields, and bump to 0.4.1 | Bryan Newbold | 2021-11-17 | 1 | -1/+10 |
| | | |||||
| * | proposal: content_scope field | Bryan Newbold | 2021-11-17 | 1 | -0/+84 |
| | | |||||
* | | typo: don't expand containers for release revs (TOML) | Bryan Newbold | 2021-11-19 | 1 | -1/+1 |
| | | |||||
* | | web editgroup diff: don't enrich in TOML diff; fix overlapping break | Bryan Newbold | 2021-11-19 | 2 | -5/+8 |
| | | |||||
* | | web generic entity helpers: make enrichment optional | Bryan Newbold | 2021-11-19 | 1 | -18/+49 |
| | | |||||
* | | polish editgroup diff view | Bryan Newbold | 2021-11-18 | 4 | -92/+83 |
| | | | | | | | | Still not as great as it could be, but useful in this state. | ||||
* | | initial implementation of editgroup 'diff' for review | Bryan Newbold | 2021-11-17 | 4 | -6/+183 |
| | | |||||
* | | web: fix API URL link for review pages of entities | Bryan Newbold | 2021-11-17 | 1 | -2/+2 |
|/ | |||||
* | updated notes on possible cleanups | Bryan Newbold | 2021-11-17 | 1 | -4/+27 |
| | |||||
* | ISSN-L dupes check: output all matches | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | |||||
* | document cleanups run this week | Bryan Newbold | 2021-11-12 | 5 | -0/+244 |
| | |||||
* | web: handle ES non-int error codes better | Bryan Newbold | 2021-11-12 | 1 | -9/+12 |
| | |||||
* | Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 27 | -1599/+874 |
|\ | | | | | | | | import refactors and deprecations | ||||
| | | | | | | | | Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes. The Datacite-specific stuff could use review here. | ||||
| | | | | | | | | Remove unused/deprecated/dead code: | ||||
| | | | | | | | | - cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers | ||||
| | | | | | | | | - "extid map" sqlite3 feature from several importers, was only used for initial bulk imports (and maybe should not have been used) | ||||
| | | | | | | | | Refactors: | ||||
| | | | | | | | | - moved a number of large datastructures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just the ones that were either generic or very large (making it hard to read code) | ||||
| | | | | | | | | - shuffled around relative imports and some function names ("clean_str" vs. "clean") | ||||
| | | | | | | | | Some actual behavioral changes: | ||||
| | | | | | | | | - remove some Datacite-specific license slugs | ||||
| | | | | | | | | - stop trying to fix double-slashes in DOIs, that was causing more harm than help (some DOIs do actually have double-slashes!) | ||||
| | | | | | | | | - remove some excess metadata from datacite 'extra' fields | ||||
| * | update datacite tests for license slug changes | Bryan Newbold | 2021-11-10 | 2 | -8/+7 |
| | | | | | | | | | | Use datacite-specific wrapper function, and remove a couple non-OA/TDM-limited licenses. | ||||
| * | improve lookup_license_slug helper and lookup table | Bryan Newbold | 2021-11-10 | 2 | -56/+62 |
| | | |||||
| * | refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 10 | -702/+682 |
| | | | | | | | | | | | | | | - MAX_ABSTRACT_LENGTH set in a single place (importer common) | ||||
| | | | | | | | | | | | | | | - merge datacite license slug table into common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work) | ||||
| * | importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 12 | -95/+104 |
| | | |||||
| * | remove cdl_dash_dat and wayback_static importers | Bryan Newbold | 2021-11-10 | 4 | -596/+0 |
| | | | | | | | | | | | | | | | | Cleaning out dead code. These importers were used to create demonstration fileset and webcapture entities early in development. They have been replaced by the fileset and webcapture ingest importers. | ||||
| * | datacite import: store less subject metadata | Bryan Newbold | 2021-11-10 | 1 | -1/+7 |
| | | | | | | | | | | | | | | | | Many of these 'subject' objects have the equivalent of several lines of text, with complex URLs that don't compress well. I think it is fine we have included these thus far instead of parsing more deeply, but going forward I don't think this nested 'extra' metadata is worth the database space. | ||||
| * | add notes about 'double slash in DOI' issue | Bryan Newbold | 2021-11-09 | 1 | -0/+46 |
| | | |||||
| * | importers: use clean_doi() in many more (all?) importers | Bryan Newbold | 2021-11-09 | 6 | -12/+29 |
| | | |||||
| * | clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix | Bryan Newbold | 2021-11-09 | 1 | -1/+2 |
| | |