Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | crossref importer: skip affiliations lacking 'name' | Bryan Newbold | 2021-12-15 | 1 | -0/+3 |
| | | | | Relatedly, we should start handling ROR affiliations in contribs soon. | ||||
* | mergers: fix typo in env var name | Bryan Newbold | 2021-12-07 | 3 | -3/+3 |
| | |||||
* | ES container schema: add 'sim_pubid' and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 1 | -0/+2 |
| | |||||
* | ES transform: remove prototype microfilm links | Bryan Newbold | 2021-12-03 | 1 | -20/+0 |
| | | | | This ended up being a feature in scholar.archive.org, not fatcat. | ||||
* | chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6 |
| | |||||
* | chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7 |
| | |||||
* | chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7 |
| | |||||
* | chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6 |
| | |||||
* | chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2 |
| | | | | This kind of cleanup should be done in 'chocula' instead. | ||||
* | container merger: fix bug with filtering by release count | Bryan Newbold | 2021-11-30 | 1 | -13/+15 |
| | | | | | Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents. | ||||
* | release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | |||||
* | container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13 |
| | |||||
* | mergers: don't try to accept empty editgroups in dry-run-mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4 |
| | |||||
* | ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | | | | Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate. | ||||
* | container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | |||||
* | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 2 | -0/+353 |
| | |||||
* | file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | | | | | The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped" | ||||
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 5 | -0/+800 |
|\ | | | | | | | | | entity mergers framework. See merge request webgroup/fatcat!133 | ||||
| * | mergers common: remove inaccurate comment | Bryan Newbold | 2021-11-24 | 1 | -2/+0 |
| | | | | | | | | Caught in review, thanks miku | ||||
| * | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | |||||
| * | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 |
| | | |||||
| * | file merges: fixes from testing in QA | Bryan Newbold | 2021-11-23 | 1 | -14/+23 |
| | | |||||
| * | mergers: small tweaks | Bryan Newbold | 2021-11-23 | 2 | -3/+3 |
| | | |||||
| * | mergers: remove entity mergers from __init__ (to work around warning) | Bryan Newbold | 2021-11-23 | 1 | -2/+0 |
| | | |||||
| * | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 2 | -0/+388 |
| | | |||||
| * | mergers: fmt, lint, refactors | Bryan Newbold | 2021-11-23 | 3 | -96/+200 |
| | | | | | | | | | | This old merger code is from an old branch and needed significant cleanup | ||||
| * | remove top-level fatcat_merge.py; going to call module __main__ going forward | Bryan Newbold | 2021-11-23 | 1 | -112/+0 |
| | | |||||
| * | first iteration of mergers | Bryan Newbold | 2021-11-23 | 4 | -0/+355 |
| | | |||||
* | | codespell fixes to various other docs | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | |||||
* | | codespell fixes in python code (comments) | Bryan Newbold | 2021-11-24 | 4 | -6/+6 |
| | | |||||
* | | codespell fixes in web interface templates | Bryan Newbold | 2021-11-24 | 14 | -19/+19 |
|/ | |||||
* | Merge branch 'bnewbold-content-scope' | Bryan Newbold | 2021-11-22 | 5 | -1/+8 |
|\ | |||||
| * | bump python client to 0.5.0 | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | | |||||
| * | content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1 |
| | | |||||
| * | minimal python test coverage of content_scope fields | Bryan Newbold | 2021-11-17 | 3 | -0/+6 |
| | | |||||
| * | python code: update python_openapi_client in lockfile | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | | |||||
* | | typo: don't expand containers for release revs (TOML) | Bryan Newbold | 2021-11-19 | 1 | -1/+1 |
| | | |||||
* | | web editgroup diff: don't enrich in TOML diff; fix overlapping break | Bryan Newbold | 2021-11-19 | 2 | -5/+8 |
| | | |||||
* | | web generic entity helpers: make enrichment optional | Bryan Newbold | 2021-11-19 | 1 | -18/+49 |
| | | |||||
* | | polish editgroup diff view | Bryan Newbold | 2021-11-18 | 4 | -92/+83 |
| | | | | | | | | Still not as great as it could be, but useful in this state. | ||||
* | | initial implementation of editgroup 'diff' for review | Bryan Newbold | 2021-11-17 | 4 | -6/+183 |
| | | |||||
* | | web: fix API URL link for review pages of entities | Bryan Newbold | 2021-11-17 | 1 | -2/+2 |
|/ | |||||
* | web: handle ES non-int error codes better | Bryan Newbold | 2021-11-12 | 1 | -9/+12 |
| | |||||
* | Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 26 | -1599/+828 |
|\ | | | | | | | | | import refactors and deprecations. Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes. The Datacite-specific stuff could use review here. Remove unused/deprecated/dead code: (1) cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers; (2) "extid map" sqlite3 feature from several importers, which was only used for initial bulk imports (and maybe should not have been used). Refactors: (1) moved a number of large data structures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`); didn't move all, just the ones that were either generic or very large (making it hard to read code); (2) shuffled around relative imports and some function names ("clean_str" vs. "clean"). Some actual behavioral changes: (1) remove some Datacite-specific license slugs; (2) stop trying to fix double-slashes in DOIs, which was causing more harm than help (some DOIs do actually have double-slashes!); (3) remove some excess metadata from datacite 'extra' fields. | ||||
| * | update datacite tests for license slug changes | Bryan Newbold | 2021-11-10 | 2 | -8/+7 |
| | | | | | | | | | | Use datacite-specific wrapper function, and remove a couple non-OA/TDM-limited licenses. | ||||
| * | improve lookup_license_slug helper and lookup table | Bryan Newbold | 2021-11-10 | 2 | -56/+62 |
| | | |||||
| * | refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 10 | -702/+682 |
| | | | | | | | | | | | | | | MAX_ABSTRACT_LENGTH is now set in a single place (importer common). The datacite license slug table is merged into the common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work). | ||||
| * | importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 12 | -95/+104 |
| | | |||||
| * | remove cdl_dash_dat and wayback_static importers | Bryan Newbold | 2021-11-10 | 4 | -596/+0 |
| | | | | | | | | | | | | | | | | Cleaning out dead code. These importers were used to create demonstration fileset and webcapture entities early in development. They have been replaced by the fileset and webcapture ingest importers. | ||||
| * | datacite import: store less subject metadata | Bryan Newbold | 2021-11-10 | 1 | -1/+7 |
| | | | | | | | | | | | | | | | | Many of these 'subject' objects have the equivalent of several lines of text, with complex URLs that don't compress well. I think it is fine we have included these thus far instead of parsing more deeply, but going forward I don't think this nested 'extra' metadata is worth the database space. |