path: root/python/fatcat_tools
Commit message  [Author, Date, Files changed, Lines -removed/+added]
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates  [Bryan Newbold, 2021-11-30, 1, -0/+6]
* chocula: don't do name cleanups in importer  [Bryan Newbold, 2021-11-30, 1, -8/+2]
* container merger: fix bug with filtering by release count  [Bryan Newbold, 2021-11-30, 1, -13/+15]
* release merger: same editgroup_id fixes as for file and container mergers  [Bryan Newbold, 2021-11-24, 1, -1/+5]
* container merger: fixes from QA testing  [Bryan Newbold, 2021-11-24, 1, -8/+13]
* mergers: don't try to accept empty editgroups in dry-run-mode  [Bryan Newbold, 2021-11-24, 1, -2/+4]
* ES release transform: handle redirected containers better  [Bryan Newbold, 2021-11-24, 1, -1/+1]
* container merger: defer allocation of editgroup_id; and dummy code path  [Bryan Newbold, 2021-11-24, 1, -1/+5]
* initial implementation of container merger  [Bryan Newbold, 2021-11-24, 1, -0/+237]
* file merger: allocate editgroup id later in 'merge' process  [Bryan Newbold, 2021-11-24, 1, -1/+5]
* Merge branch 'bnewbold-mergers' into 'master'  [bnewbold, 2021-11-25, 4, -0/+640]
|\
| * mergers common: remove inaccurate comment  [Bryan Newbold, 2021-11-24, 1, -2/+0]
| * file merger: add content_scope to list of merged fields  [Bryan Newbold, 2021-11-24, 1, -1/+1]
| * release merger: some progress, but also disable (not complete)  [Bryan Newbold, 2021-11-23, 1, -12/+72]
| * file merges: fixes from testing in QA  [Bryan Newbold, 2021-11-23, 1, -14/+23]
| * mergers: small tweaks  [Bryan Newbold, 2021-11-23, 2, -3/+3]
| * mergers: remove entity mergers from __init__ (to work around warning)  [Bryan Newbold, 2021-11-23, 1, -2/+0]
| * initial file merger, with tests  [Bryan Newbold, 2021-11-23, 1, -0/+228]
| * mergers: fmt, lint, refactors  [Bryan Newbold, 2021-11-23, 3, -96/+200]
| * first iteration of mergers  [Bryan Newbold, 2021-11-23, 3, -0/+243]
* | codespell fixes in python code (comments)  [Bryan Newbold, 2021-11-24, 2, -3/+3]
|/
* content_scope: include in file ES schema and transform  [Bryan Newbold, 2021-11-17, 1, -0/+1]
* Merge branch 'bnewbold-import-refactors' into 'master'  [bnewbold, 2021-11-11, 18, -1462/+811]
|\
| * improve lookup_license_slug helper and lookup table  [Bryan Newbold, 2021-11-10, 2, -56/+62]
| * refactor importer metadata tables into separate file; move some helpers around  [Bryan Newbold, 2021-11-10, 10, -702/+682]
| * importers: refactor imports of clean() and other normalization helpers  [Bryan Newbold, 2021-11-10, 12, -95/+104]
| * remove cdl_dash_dat and wayback_static importers  [Bryan Newbold, 2021-11-10, 3, -510/+0]
| * datacite import: store less subject metadata  [Bryan Newbold, 2021-11-10, 1, -1/+7]
| * importers: use clean_doi() in many more (all?) importers  [Bryan Newbold, 2021-11-09, 6, -12/+29]
| * clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix  [Bryan Newbold, 2021-11-09, 1, -1/+2]
| * remove deprecated extid sqlite3 lookup table feature from importers  [Bryan Newbold, 2021-11-09, 3, -160/+0]
* | Merge branch 'bnewbold-cleanups-nov2021' into 'master'  [bnewbold, 2021-11-11, 4, -0/+748]
|\ \
| * | file/release bugfix: handle files with multiple edits  [Bryan Newbold, 2021-11-09, 1, -6/+6]
| * | cleanups: add more state=active checks  [Bryan Newbold, 2021-11-09, 2, -0/+8]
| * | update link source filters in file/release bugfix  [Bryan Newbold, 2021-11-09, 1, -2/+8]
| * | initial file/release bugfix cleanup worker and notes  [Bryan Newbold, 2021-11-09, 1, -0/+231]
| * | updates to lowercase DOI cleanup  [Bryan Newbold, 2021-11-09, 1, -7/+15]
| * | lowercase DOI lint and check entity status  [Bryan Newbold, 2021-11-09, 1, -4/+5]
| * | more iteration on short wayback timestamp cleanup  [Bryan Newbold, 2021-11-09, 1, -1/+1]
| * | cleanups: tweaks to wayback CDX cleanup scripts  [Bryan Newbold, 2021-11-09, 1, -5/+13]
| * | cleanups: initial lowercase DOI cleanup script  [Bryan Newbold, 2021-11-09, 1, -0/+145]
| * | wayback short ts: another regression test, and some small fmt/tweaks  [Bryan Newbold, 2021-11-09, 1, -3/+38]
| * | wayback cleanup: actually update entity  [Bryan Newbold, 2021-11-09, 1, -2/+4]
| * | imports: generic file cleanup removes exact duplicate URLs  [Bryan Newbold, 2021-11-09, 1, -0/+9]
| * | wayback short ts: add regression test for dupe URLs  [Bryan Newbold, 2021-11-09, 1, -0/+44]
| * | short wayback ts: initial cleanup script implementation  [Bryan Newbold, 2021-11-09, 1, -0/+251]
| |/
* / pubmed: allow updates if PMCID does not exist yet  [Bryan Newbold, 2021-11-10, 1, -1/+6]
|/
* cleanups: create a separate JsonLinePusher for cleanup workers (distinct base...  [Bryan Newbold, 2021-11-03, 2, -2/+19]
* datacite importer: remove unused 'year_only' variable  [Bryan Newbold, 2021-11-03, 1, -2/+3]
* pubmed harvester: remove unused variables  [Bryan Newbold, 2021-11-03, 1, -2/+2]