fatcat

	Commit message	Author	Date	Files	Lines
...
*	ES transform: remove prototype microfilm links	Bryan Newbold	2021-12-03	1	-20/+0
\|	This ended up being a feature in scholar.archive.org, not fatcat.
*	chocula importer: handle not-upper-case ISSNs	Bryan Newbold	2021-11-30	1	-2/+6
\|
*	chocula importer: handle broken ISSNs in extra metadata	Bryan Newbold	2021-11-30	1	-2/+7
\|
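The two chocula importer commits above deal with ISSNs that arrive in lower case or are malformed. A minimal sketch of that kind of normalization, assuming a hypothetical `normalize_issn()` helper (not fatcat's actual code), upper-cases the value and validates the ISO 3297 check character:

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not the chocula importer's real code.
ISSN_RE = re.compile(r"^\d{4}-\d{3}[\dX]$")


def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    # Weights run 8, 7, ..., 2 across the seven leading digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def normalize_issn(raw: Optional[str]) -> Optional[str]:
    """Return an upper-cased, validated ISSN, or None if it is broken."""
    if not raw:
        return None
    issn = raw.strip().upper()  # accept "1234-567x" as well as "1234-567X"
    if not ISSN_RE.match(issn):
        return None
    digits = issn.replace("-", "")
    if issn_check_digit(digits[:7]) != digits[7]:
        return None  # wrong check character: treat as a broken ISSN
    return issn
```

Returning `None` rather than raising lets an importer simply skip a bad ISSN in 'extra' metadata instead of failing the whole record.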
*	chocula importer: tweak counting, conditions for doing updates	Bryan Newbold	2021-11-30	1	-15/+7
\|
*	chocula importer: move issne/issnp 'extra' to top-level fields if doing updates	Bryan Newbold	2021-11-30	1	-0/+6
\|
*	chocula: don't do name cleanups in importer	Bryan Newbold	2021-11-30	1	-8/+2
\|	This kind of cleanup should be done in 'chocula' instead.
*	container merger: fix bug with filtering by release count	Bryan Newbold	2021-11-30	1	-13/+15
\|	Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents.
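The container merger fix above applies the "human edit" and "release count" checks only to the dupes, never to the primary container. A minimal sketch of that filtering rule, assuming a hypothetical `ContainerInfo` record and a `max_release_count` threshold (neither is fatcat's real schema or merger code):

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record; field names are illustrative assumptions.
@dataclass
class ContainerInfo:
    ident: str
    release_count: int
    has_human_edits: bool


def mergeable_dupes(
    primary: ContainerInfo, dupes: List[ContainerInfo], max_release_count: int
) -> List[ContainerInfo]:
    """Return the dupes that are safe to redirect to `primary`.

    The checks apply only to the dupes (the idents that would become
    redirects); the primary is kept regardless of its own history.
    """
    eligible = []
    for dupe in dupes:
        if dupe.ident == primary.ident:
            continue  # never redirect an entity to itself
        if dupe.has_human_edits:
            continue  # leave human-curated containers alone
        if dupe.release_count > max_release_count:
            continue  # too many releases to move safely (assumed threshold)
        eligible.append(dupe)
    return eligible
```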
*	release merger: same editgroup_id fixes as for file and container mergers	Bryan Newbold	2021-11-24	1	-1/+5
\|
*	container merger: fixes from QA testing	Bryan Newbold	2021-11-24	1	-8/+13
\|
*	mergers: don't try to accept empty editgroups in dry-run-mode	Bryan Newbold	2021-11-24	1	-2/+4
\|
*	ES release transform: handle redirected containers better	Bryan Newbold	2021-11-24	1	-1/+1
\|	Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate.
*	container merger: defer allocation of editgroup_id; and dummy code path	Bryan Newbold	2021-11-24	1	-1/+5
\|
*	initial implementation of container merger	Bryan Newbold	2021-11-24	2	-0/+353
\|
*	file merger: allocate editgroup id later in 'merge' process	Bryan Newbold	2021-11-24	1	-1/+5
\|	The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped".
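The merger commits above defer editgroup allocation so that dry runs, or batches where every entity is skipped, never create (or try to accept) an empty editgroup. A minimal sketch of that lazy-allocation pattern, where `create_editgroup` is a hypothetical stand-in for the real API call:

```python
from typing import Callable, List, Optional


class LazyEditgroupMerger:
    """Sketch: allocate an editgroup only when the first real edit happens.

    `create_editgroup` is a hypothetical callable, not fatcat's actual API.
    """

    def __init__(self, create_editgroup: Callable[[], str], dry_run: bool = False) -> None:
        self.create_editgroup = create_editgroup
        self.dry_run = dry_run
        self._editgroup_id: Optional[str] = None
        self.edits: List[str] = []

    def _get_editgroup_id(self) -> str:
        # Allocate lazily: a batch in which every entity is skipped
        # never creates an (empty) editgroup.
        if self._editgroup_id is None:
            self._editgroup_id = self.create_editgroup()
        return self._editgroup_id

    def merge(self, ident: str, skip: bool = False) -> None:
        if skip:
            return
        if self.dry_run:
            # Dry-run: record the would-be edit but allocate nothing.
            self.edits.append(ident)
            return
        editgroup_id = self._get_editgroup_id()
        self.edits.append(f"{editgroup_id}:{ident}")

    def finish(self) -> Optional[str]:
        """Return the editgroup id to accept, or None if none was created."""
        # Never try to accept an editgroup that was never allocated.
        return self._editgroup_id
```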
*	Merge branch 'bnewbold-mergers' into 'master'	bnewbold	2021-11-25	5	-0/+800
\|\	entity mergers framework
\| \|	See merge request webgroup/fatcat!133
\| *	mergers common: remove inaccurate comment	Bryan Newbold	2021-11-24	1	-2/+0
\| \|	Caught in review, thanks miku
\| *	file merger: add content_scope to list of merged fields	Bryan Newbold	2021-11-24	1	-1/+1
\| \|
\| *	release merger: some progress, but also disable (not complete)	Bryan Newbold	2021-11-23	1	-12/+72
\| \|
\| *	file merges: fixes from testing in QA	Bryan Newbold	2021-11-23	1	-14/+23
\| \|
\| *	mergers: small tweaks	Bryan Newbold	2021-11-23	2	-3/+3
\| \|
\| *	mergers: remove entity mergers from __init__ (to work around warning)	Bryan Newbold	2021-11-23	1	-2/+0
\| \|
\| *	initial file merger, with tests	Bryan Newbold	2021-11-23	2	-0/+388
\| \|
\| *	mergers: fmt, lint, refactors	Bryan Newbold	2021-11-23	3	-96/+200
\| \|	This old merger code is from an old branch and needed significant cleanup.
\| *	remove top-level fatcat_merge.py; going to call module __main__ going forward	Bryan Newbold	2021-11-23	1	-112/+0
\| \|
\| *	first iteration of mergers	Bryan Newbold	2021-11-23	4	-0/+355
\| \|
* \|	codespell fixes to various other docs	Bryan Newbold	2021-11-24	1	-1/+1
\| \|
* \|	codespell fixes in python code (comments)	Bryan Newbold	2021-11-24	4	-6/+6
\| \|
* \|	codespell fixes in web interface templates	Bryan Newbold	2021-11-24	14	-19/+19
\|/
*	Merge branch 'bnewbold-content-scope'	Bryan Newbold	2021-11-22	5	-1/+8
\|\
\| *	bump python client to 0.5.0	Bryan Newbold	2021-11-17	1	-1/+1
\| \|
\| *	content_scope: include in file ES schema and transform	Bryan Newbold	2021-11-17	1	-0/+1
\| \|
\| *	minimal python test coverage of content_scope fields	Bryan Newbold	2021-11-17	3	-0/+6
\| \|
\| *	python code: update python_openapi_client in lockfile	Bryan Newbold	2021-11-17	1	-1/+1
\| \|
* \|	typo: don't expand containers for release revs (TOML)	Bryan Newbold	2021-11-19	1	-1/+1
\| \|
* \|	web editgroup diff: don't enrich in TOML diff; fix overlapping break	Bryan Newbold	2021-11-19	2	-5/+8
\| \|
* \|	web generic entity helpers: make enrichment optional	Bryan Newbold	2021-11-19	1	-18/+49
\| \|
* \|	polish editgroup diff view	Bryan Newbold	2021-11-18	4	-92/+83
\| \|	Still not as great as it could be, but useful in this state.
* \|	initial implementation of editgroup 'diff' for review	Bryan Newbold	2021-11-17	4	-6/+183
\| \|
* \|	web: fix API URL link for review pages of entities	Bryan Newbold	2021-11-17	1	-2/+2
\|/
*	web: handle ES non-int error codes better	Bryan Newbold	2021-11-12	1	-9/+12
\|
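The web fix above concerns Elasticsearch errors whose status code is not an integer (for example, when no HTTP response was received at all). A minimal sketch of defensive handling, using a hypothetical `es_error_to_http()` helper whose status mapping is an assumption, not fatcat's actual behavior:

```python
from typing import Any, Tuple


def es_error_to_http(status_code: Any) -> Tuple[int, str]:
    """Map an Elasticsearch error status to an HTTP status for the web UI.

    Hypothetical helper: `status_code` may be an int (e.g. 400) or a
    non-integer placeholder when the search backend could not be reached.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(status_code, int) and not isinstance(status_code, bool):
        if status_code == 400:
            return 400, "bad search query"
        if 400 < status_code < 500:
            return status_code, "search request error"
    # Non-int codes and 5xx codes: treat the backend as unavailable.
    return 503, "search backend unavailable"
```

Checking the type before comparing avoids a `TypeError` from expressions like `"N/A" >= 400`, which is the kind of failure a non-int code would otherwise cause.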
*	Merge branch 'bnewbold-import-refactors' into 'master'	bnewbold	2021-11-11	26	-1599/+828
\|\	import refactors and deprecations
\| \|	Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes. The Datacite-specific stuff could use review here.
\| \|	Remove unused/deprecated/dead code:
\| \|	- cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers
\| \|	- "extid map" sqlite3 feature from several importers, which was only used for initial bulk imports (and maybe should not have been used)
\| \|	Refactors:
\| \|	- moved a number of large data structures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just the ones that were either generic or very large (which made the code hard to read)
\| \|	- shuffled around relative imports and some function names ("clean_str" vs. "clean")
\| \|	Some actual behavioral changes:
\| \|	- remove some Datacite-specific license slugs
\| \|	- stop trying to fix double-slashes in DOIs, which was causing more harm than good (some DOIs do actually have double-slashes!)
\| \|	- remove some excess metadata from datacite 'extra' fields
\| *	update datacite tests for license slug changes	Bryan Newbold	2021-11-10	2	-8/+7
\| \|	Use datacite-specific wrapper function, and remove a couple of non-OA/TDM-limited licenses.
\| *	improve lookup_license_slug helper and lookup table	Bryan Newbold	2021-11-10	2	-56/+62
\| \|
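The `lookup_license_slug` commit above pairs a helper with a lookup table. A minimal sketch of that pattern, assuming a tiny illustrative table and slugs (not the real contents of `biblio_lookup_tables.py`), normalizes URL variants before the dictionary lookup:

```python
from typing import Dict, Optional

# Illustrative subset only; keys and slugs are assumptions for this sketch.
LICENSE_SLUG_MAP: Dict[str, str] = {
    "//creativecommons.org/licenses/by/4.0/": "CC-BY",
    "//creativecommons.org/licenses/by-sa/4.0/": "CC-BY-SA",
}


def lookup_license_slug_sketch(raw: Optional[str]) -> Optional[str]:
    """Map a license URL to a short slug, or None if it is not recognized."""
    if not raw:
        return None
    url = raw.strip().lower()
    # Treat http/https, a "www." host, and a missing trailing slash as equivalent.
    url = url.replace("https://", "//").replace("http://", "//")
    url = url.replace("//www.", "//")
    if not url.endswith("/"):
        url += "/"
    return LICENSE_SLUG_MAP.get(url)
```

Normalizing first keeps the table small: one key per license instead of one per spelling of its URL.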
\| *	refactor importer metadata tables into separate file; move some helpers around	Bryan Newbold	2021-11-10	10	-702/+682
\| \|	- MAX_ABSTRACT_LENGTH set in a single place (importer common)
\| \|	- merge datacite license slug table into common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work)
\| *	importers: refactor imports of clean() and other normalization helpers	Bryan Newbold	2021-11-10	12	-95/+104
\| \|
\| *	remove cdl_dash_dat and wayback_static importers	Bryan Newbold	2021-11-10	4	-596/+0
\| \|	Cleaning out dead code. These importers were used to create demonstration fileset and webcapture entities early in development. They have been replaced by the fileset and webcapture ingest importers.
\| *	datacite import: store less subject metadata	Bryan Newbold	2021-11-10	1	-1/+7
\| \|	Many of these 'subject' objects have the equivalent of several lines of text, with complex URLs that don't compress well. I think it is fine that we have included these thus far instead of parsing more deeply, but going forward I don't think this nested 'extra' metadata is worth the database space.
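The datacite commit above trims verbose 'subject' objects before storing them in 'extra'. A minimal sketch of one possible trimming policy, using a hypothetical `slim_subjects()` helper that keeps only the subject text and scheme name (the exact fields fatcat keeps are not stated here):

```python
from typing import Any, Dict, List


def slim_subjects(subjects: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep a short subject string (and scheme name, if any) per entry.

    Assumed policy for illustration: long URI fields such as "schemeUri"
    and "valueUri" are dropped because they compress poorly.
    """
    slim = []
    for item in subjects:
        text = (item.get("subject") or "").strip()
        if not text:
            continue  # nothing worth storing
        entry = {"subject": text}
        scheme = item.get("subjectScheme")
        if scheme:
            entry["subjectScheme"] = scheme
        slim.append(entry)
    return slim
```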
\| *	importers: use clean_doi() in many more (all?) importers	Bryan Newbold	2021-11-09	6	-12/+29
\| \|
\| *	clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix	Bryan Newbold	2021-11-09	1	-1/+2
\| \|
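The `clean_doi` commit above stops collapsing double slashes in DOIs, except under the 10.1037 prefix. A minimal sketch of that rule, assuming a hypothetical `clean_doi_sketch()` (not fatcat's actual `clean_doi()`) and assuming the 10.1037 double slash appears directly after the prefix:

```python
import re
from typing import Optional

# Loose DOI shape check; an assumption for this sketch, not fatcat's regex.
DOI_RE = re.compile(r"^10\.\d{3,6}/\S+$")


def clean_doi_sketch(raw: Optional[str]) -> Optional[str]:
    """Normalize a DOI string, leaving most double slashes intact."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    # Only the 10.1037 prefix still gets its double slash collapsed.
    if doi.startswith("10.1037//"):
        doi = "10.1037/" + doi[len("10.1037//"):]
    if not DOI_RE.match(doi):
        return None
    return doi
```

Other DOIs that legitimately contain `//` pass through unchanged, which is the behavior the commit message describes.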
\| *	remove deprecated extid sqlite3 lookup table feature from importers	Bryan Newbold	2021-11-09	10	-203/+10
\| \|	This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall.