Commit message  Author  Age  Files  Lines
* small default config typo fixes for elasticsearch workers  Bryan Newbold  2021-12-15  1  -2/+2
|
* file elasticsearch index worker  Bryan Newbold  2021-12-15  3  -1/+63
|
* updates to guide based on feedback  Bryan Newbold  2021-12-08  3  -9/+35
|
* mergers: fix typo in env var name  Bryan Newbold  2021-12-07  3  -3/+3
|
* another file_meta update  Bryan Newbold  2021-12-06  1  -0/+60
|
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields  Bryan Newbold  2021-12-03  2  -0/+4
|
* ES transform: remove prototype microfilm links  Bryan Newbold  2021-12-03  1  -20/+0
|     This ended up being a feature in scholar.archive.org, not fatcat.
* SQL snapshots/exports: updated prod commands  Bryan Newbold  2021-12-03  1  -13/+15
|
* file_meta cleanup update  Bryan Newbold  2021-12-01  1  -0/+75
|
* initial 'far-future' release date updates  Bryan Newbold  2021-11-30  1  -0/+212
|
* chocula update notes  Bryan Newbold  2021-11-30  1  -0/+61
|
* container ISSN-L dedupe notes  Bryan Newbold  2021-11-30  1  -0/+198
|
* chocula importer: handle not-upper-case ISSNs  Bryan Newbold  2021-11-30  1  -2/+6
|
* chocula importer: handle broken ISSNs in extra metadata  Bryan Newbold  2021-11-30  1  -2/+7
|
* chocula importer: tweak counting, conditions for doing updates  Bryan Newbold  2021-11-30  1  -15/+7
|
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates  Bryan Newbold  2021-11-30  1  -0/+6
|
* chocula: don't do name cleanups in importer  Bryan Newbold  2021-11-30  1  -8/+2
|     This kind of cleanup should be done in 'chocula' instead.
* container merger: fix bug with filtering by release count  Bryan Newbold  2021-11-30  1  -13/+15
|     Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents.
* add stats (before re-indexing), and rename old files for consistency  Bryan Newbold  2021-11-30  6  -0/+47
|
* cleanups: springer 'page-one' sample PDFs  Bryan Newbold  2021-11-29  2  -0/+129
|
* cleanups: truncated wayback PDFs from common crawl  Bryan Newbold  2021-11-29  2  -0/+292
|
* update to truncated wayback timestamp issue  Bryan Newbold  2021-11-29  1  -0/+24
|
* update to file short wayback timestamp cleanup  Bryan Newbold  2021-11-29  2  -1/+30
|
* commit old 2021-11-11 stats file  Bryan Newbold  2021-11-29  1  -0/+1
|
* clean up extra/ folder a bit  Bryan Newbold  2021-11-29  11  -24/+0
|
* move notes/bulk_edits/ to extra/bulk_edits/  Bryan Newbold  2021-11-29  23  -0/+0
|
* move 'cleanups' directory from notes to extra/  Bryan Newbold  2021-11-29  11  -0/+0
|
* Merge branch 'bnewbold-container-merger'  Bryan Newbold  2021-11-29  7  -4/+532
|\
| * notes on container ISSN-L merging, tested in QA  Bryan Newbold  2021-11-24  2  -0/+160
| |
| * release merger: same editgroup_id fixes as for file and container mergers  Bryan Newbold  2021-11-24  1  -1/+5
| |
| * container merger: fixes from QA testing  Bryan Newbold  2021-11-24  1  -8/+13
| |
| * mergers: don't try to accept empty editgroups in dry-run-mode  Bryan Newbold  2021-11-24  1  -2/+4
| |
| * ES release transform: handle redirected containers better  Bryan Newbold  2021-11-24  1  -1/+1
| |     Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate.
| * container merger: defer allocation of editgroup_id; and dummy code path  Bryan Newbold  2021-11-24  1  -1/+5
| |
| * initial implementation of container merger  Bryan Newbold  2021-11-24  2  -0/+353
| |
* | notes on file_meta partial cleanup  Bryan Newbold  2021-11-24  4  -0/+239
|/
* notes from prod run of file de-dupe  Bryan Newbold  2021-11-24  2  -0/+36
|
* file merger: allocate editgroup id later in 'merge' process  Bryan Newbold  2021-11-24  1  -1/+5
|     The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped"
* Merge branch 'bnewbold-mergers' into 'master'  bnewbold  2021-11-25  8  -0/+1046
|\
| |     entity mergers framework
| |
| |     See merge request webgroup/fatcat!133
| * mergers common: remove inaccurate comment  Bryan Newbold  2021-11-24  1  -2/+0
| |     Caught in review, thanks miku
| * merger proposal typos  Bryan Newbold  2021-11-24  1  -2/+2
| |
| * file merger: add content_scope to list of merged fields  Bryan Newbold  2021-11-24  1  -1/+1
| |
| * release merger: some progress, but also disable (not complete)  Bryan Newbold  2021-11-23  1  -12/+72
| |
| * file merges: fixes from testing in QA  Bryan Newbold  2021-11-23  1  -14/+23
| |
| * file de-dupe: notes on prep and QA testing  Bryan Newbold  2021-11-23  2  -0/+136
| |
| * mergers: small tweaks  Bryan Newbold  2021-11-23  2  -3/+3
| |
| * mergers: remove entity mergers from __init__ (to work around warning)  Bryan Newbold  2021-11-23  1  -2/+0
| |
| * add proposal for entity mergers  Bryan Newbold  2021-11-23  1  -0/+110
| |
| * initial file merger, with tests  Bryan Newbold  2021-11-23  2  -0/+388
| |
| * mergers: fmt, lint, refactors  Bryan Newbold  2021-11-23  3  -96/+200
| |     This old merger code is from an old branch and needed significant cleanup