path: root/python/fatcat_tools
Commit message  Author  Age  Files  Lines
* fix typo in fileset comparison helper  Bryan Newbold  2022-03-23  1  -1/+1
* ingest fileset fixes, and some test coverage  Bryan Newbold  2022-03-23  2  -13/+30
* dataset ingest: JSON object fixes  Bryan Newbold  2022-03-22  1  -5/+5
* Merge branch 'bnewbold-container-web' into 'master'  bnewbold  2022-03-10  3  -2/+185
|\
| * move container_status ES query code from fatcat_web to fatcat_tools  Bryan Newbold  2022-02-09  3  -2/+185
* | entity updates: don't try to ingest arxiv DOIs (for now)  Bryan Newbold  2022-02-28  1  -0/+2
* | datacite importer: skip container_id for some repository sources  Bryan Newbold  2022-02-09  1  -0/+34
|/
* doaj importer: TODO note to skip some larger publishers  Bryan Newbold  2022-02-09  1  -0/+4
* container ES transform: include old extra.issne/p fields  Bryan Newbold  2022-02-03  1  -1/+4
* Merge branch 'bnewbold-file-es' into 'master'  bnewbold  2022-01-21  3  -4/+38
|\
| * entity worker: expand creators in release entities  Bryan Newbold  2021-12-15  1  -1/+1
| * small default config typo fixes for elasticsearch workers  Bryan Newbold  2021-12-15  1  -2/+2
| * file elasticsearch index worker  Bryan Newbold  2021-12-15  2  -1/+35
* | crossref importer: skip affiliations lacking 'name'  Bryan Newbold  2021-12-15  1  -0/+3
|/
* mergers: fix typo in env var name  Bryan Newbold  2021-12-07  3  -3/+3
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields  Bryan Newbold  2021-12-03  1  -0/+2
* ES transform: remove prototype microfilm links  Bryan Newbold  2021-12-03  1  -20/+0
* chocula importer: handle not-upper-case ISSNs  Bryan Newbold  2021-11-30  1  -2/+6
* chocula importer: handle broken ISSNs in extra metadata  Bryan Newbold  2021-11-30  1  -2/+7
* chocula importer: tweak counting, conditions for doing updates  Bryan Newbold  2021-11-30  1  -15/+7
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates  Bryan Newbold  2021-11-30  1  -0/+6
* chocula: don't do name cleanups in importer  Bryan Newbold  2021-11-30  1  -8/+2
* container merger: fix bug with filtering by release count  Bryan Newbold  2021-11-30  1  -13/+15
* release merger: same editgroup_id fixes as for file and container mergers  Bryan Newbold  2021-11-24  1  -1/+5
* container merger: fixes from QA testing  Bryan Newbold  2021-11-24  1  -8/+13
* mergers: don't try to accept empty editgroups in dry-run-mode  Bryan Newbold  2021-11-24  1  -2/+4
* ES release transform: handle redirected containers better  Bryan Newbold  2021-11-24  1  -1/+1
* container merger: defer allocation of editgroup_id; and dummy code path  Bryan Newbold  2021-11-24  1  -1/+5
* initial implementation of container merger  Bryan Newbold  2021-11-24  1  -0/+237
* file merger: allocate editgroup id later in 'merge' process  Bryan Newbold  2021-11-24  1  -1/+5
* Merge branch 'bnewbold-mergers' into 'master'  bnewbold  2021-11-25  4  -0/+640
|\
| * mergers common: remove inaccurate comment  Bryan Newbold  2021-11-24  1  -2/+0
| * file merger: add content_scope to list of merged fields  Bryan Newbold  2021-11-24  1  -1/+1
| * release merger: some progress, but also disable (not complete)  Bryan Newbold  2021-11-23  1  -12/+72
| * file merges: fixes from testing in QA  Bryan Newbold  2021-11-23  1  -14/+23
| * mergers: small tweaks  Bryan Newbold  2021-11-23  2  -3/+3
| * mergers: remove entity mergers from __init__ (to work around warning)  Bryan Newbold  2021-11-23  1  -2/+0
| * initial file merger, with tests  Bryan Newbold  2021-11-23  1  -0/+228
| * mergers: fmt, lint, refactors  Bryan Newbold  2021-11-23  3  -96/+200
| * first iteration of mergers  Bryan Newbold  2021-11-23  3  -0/+243
* | codespell fixes in python code (comments)  Bryan Newbold  2021-11-24  2  -3/+3
|/
* content_scope: include in file ES schema and transform  Bryan Newbold  2021-11-17  1  -0/+1
* Merge branch 'bnewbold-import-refactors' into 'master'  bnewbold  2021-11-11  18  -1462/+811
|\
| * improve lookup_license_slug helper and lookup table  Bryan Newbold  2021-11-10  2  -56/+62
| * refactor importer metadata tables into separate file; move some helpers around  Bryan Newbold  2021-11-10  10  -702/+682
| * importers: refactor imports of clean() and other normalization helpers  Bryan Newbold  2021-11-10  12  -95/+104
| * remove cdl_dash_dat and wayback_static importers  Bryan Newbold  2021-11-10  3  -510/+0
| * datacite import: store less subject metadata  Bryan Newbold  2021-11-10  1  -1/+7
| * importers: use clean_doi() in many more (all?) importers  Bryan Newbold  2021-11-09  6  -12/+29
| * clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix  Bryan Newbold  2021-11-09  1  -1/+2