| Commit message | Author | Age | Files | Lines |
* | release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
* | container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13 |
* | mergers: don't try to accept empty editgroups in dry-run-mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4 |
* | ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
* | container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
* | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 1 | -0/+237 |
* | file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 4 | -0/+640 |
|\ |
| * | mergers common: remove inaccurate comment | Bryan Newbold | 2021-11-24 | 1 | -2/+0 |
| * | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| * | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 |
| * | file merges: fixes from testing in QA | Bryan Newbold | 2021-11-23 | 1 | -14/+23 |
| * | mergers: small tweaks | Bryan Newbold | 2021-11-23 | 2 | -3/+3 |
| * | mergers: remove entity mergers from __init__ (to work around warning) | Bryan Newbold | 2021-11-23 | 1 | -2/+0 |
| * | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 1 | -0/+228 |
| * | mergers: fmt, lint, refactors | Bryan Newbold | 2021-11-23 | 3 | -96/+200 |
| * | first iteration of mergers | Bryan Newbold | 2021-11-23 | 3 | -0/+243 |
* | | codespell fixes in python code (comments) | Bryan Newbold | 2021-11-24 | 2 | -3/+3 |
|/ |
* | content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1 |
* | Merge branch 'bnewbold-import-refactors' into 'master' | bnewbold | 2021-11-11 | 18 | -1462/+811 |
|\ |
| * | improve lookup_license_slug helper and lookup table | Bryan Newbold | 2021-11-10 | 2 | -56/+62 |
| * | refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 10 | -702/+682 |
| * | importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 12 | -95/+104 |
| * | remove cdl_dash_dat and wayback_static importers | Bryan Newbold | 2021-11-10 | 3 | -510/+0 |
| * | datacite import: store less subject metadata | Bryan Newbold | 2021-11-10 | 1 | -1/+7 |
| * | importers: use clean_doi() in many more (all?) importers | Bryan Newbold | 2021-11-09 | 6 | -12/+29 |
| * | clean_doi: stop mutating double-slash DOIs, except for 10.1037 prefix | Bryan Newbold | 2021-11-09 | 1 | -1/+2 |
| * | remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 3 | -160/+0 |
* | | Merge branch 'bnewbold-cleanups-nov2021' into 'master' | bnewbold | 2021-11-11 | 4 | -0/+748 |
|\ \ |
| * | | file/release bugfix: handle files with multiple edits | Bryan Newbold | 2021-11-09 | 1 | -6/+6 |
| * | | cleanups: add more state=active checks | Bryan Newbold | 2021-11-09 | 2 | -0/+8 |
| * | | update link source filters in file/release bugfix | Bryan Newbold | 2021-11-09 | 1 | -2/+8 |
| * | | initial file/release bugfix cleanup worker and notes | Bryan Newbold | 2021-11-09 | 1 | -0/+231 |
| * | | updates to lowercase DOI cleanup | Bryan Newbold | 2021-11-09 | 1 | -7/+15 |
| * | | lowercase DOI lint and check entity status | Bryan Newbold | 2021-11-09 | 1 | -4/+5 |
| * | | more iteration on short wayback timestamp cleanup | Bryan Newbold | 2021-11-09 | 1 | -1/+1 |
| * | | cleanups: tweaks to wayback CDX cleanup scripts | Bryan Newbold | 2021-11-09 | 1 | -5/+13 |
| * | | cleanups: initial lowercase DOI cleanup script | Bryan Newbold | 2021-11-09 | 1 | -0/+145 |
| * | | wayback short ts: another regression test, and some small fmt/tweaks | Bryan Newbold | 2021-11-09 | 1 | -3/+38 |
| * | | wayback cleanup: actually update entity | Bryan Newbold | 2021-11-09 | 1 | -2/+4 |
| * | | imports: generic file cleanup removes exact duplicate URLs | Bryan Newbold | 2021-11-09 | 1 | -0/+9 |
| * | | wayback short ts: add regression test for dupe URLs | Bryan Newbold | 2021-11-09 | 1 | -0/+44 |
| * | | short wayback ts: initial cleanup script implementation | Bryan Newbold | 2021-11-09 | 1 | -0/+251 |
| |/ |
* / | pubmed: allow updates if PMCID does not exist yet | Bryan Newbold | 2021-11-10 | 1 | -1/+6 |
|/ |
* | cleanups: create a separate JsonLinePusher for cleanup workers (distinct base... | Bryan Newbold | 2021-11-03 | 2 | -2/+19 |
* | datacite importer: remove unused 'year_only' variable | Bryan Newbold | 2021-11-03 | 1 | -2/+3 |
* | pubmed harvester: remove unused variables | Bryan Newbold | 2021-11-03 | 1 | -2/+2 |
* | pubmed harvester: explicit assertions to mark unreachable code paths | Bryan Newbold | 2021-11-03 | 1 | -0/+2 |
* | typing: add assertions to fatcat_tool code to make type assumptions explicit | Bryan Newbold | 2021-11-03 | 3 | -0/+3 |
* | typing: add annotations to remaining fatcat_tools code | Bryan Newbold | 2021-11-03 | 9 | -122/+186 |