Commit message | Author | Age | Files | Lines
* ingest tool: clear_scroll allowed in es-public-proxy for some time | Bryan Newbold | 2022-01-21 | 1 | -8/+0
* container counts update process README | Bryan Newbold | 2022-01-21 | 1 | -0/+41
* Merge branch 'bnewbold-file-es' into 'master' | bnewbold | 2022-01-21 | 4 | -4/+66
|\
| * entity worker: expand creators in release entities | Bryan Newbold | 2021-12-15 | 1 | -1/+1
| * small default config typo fixes for elasticsearch workers | Bryan Newbold | 2021-12-15 | 1 | -2/+2
| * file elasticsearch index worker | Bryan Newbold | 2021-12-15 | 3 | -1/+63
* | update stats | Bryan Newbold | 2022-01-12 | 3 | -0/+49
* | ES: update README for v05-era indices | Bryan Newbold | 2022-01-12 | 1 | -15/+15
* | ES schema: fix typo in container issns alias | Bryan Newbold | 2022-01-12 | 1 | -1/+1
* | elasticsearch: bump timeout to 40 seconds (from default of 10) | Bryan Newbold | 2022-01-10 | 1 | -1/+1
* | make fmt | Bryan Newbold | 2021-12-15 | 2 | -5/+6
* | Merge branch 'martin-sentry-sdk' into 'master' | bnewbold | 2021-12-16 | 10 | -344/+396
|\ \
| * | move from raven to sentry_sdk | Martin Czygan | 2021-12-14 | 10 | -344/+396
| |/
* / crossref importer: skip affiliations lacking 'name' | Bryan Newbold | 2021-12-15 | 1 | -0/+3
|/
* updates to guide based on feedback | Bryan Newbold | 2021-12-08 | 3 | -9/+35
* mergers: fix typo in env var name | Bryan Newbold | 2021-12-07 | 3 | -3/+3
* another file_meta update | Bryan Newbold | 2021-12-06 | 1 | -0/+60
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 2 | -0/+4
* ES transform: remove prototype microfilm links | Bryan Newbold | 2021-12-03 | 1 | -20/+0
* SQL snashots/exports: updated prod commands | Bryan Newbold | 2021-12-03 | 1 | -13/+15
* file_meta cleanup update | Bryan Newbold | 2021-12-01 | 1 | -0/+75
* initial 'far-future' release date updates | Bryan Newbold | 2021-11-30 | 1 | -0/+212
* chocula update notes | Bryan Newbold | 2021-11-30 | 1 | -0/+61
* container ISSN-L dedupe notes | Bryan Newbold | 2021-11-30 | 1 | -0/+198
* chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6
* chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7
* chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6
* chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2
* container merger: fix bug with filtering by release count | Bryan Newbold | 2021-11-30 | 1 | -13/+15
* add stats (before re-indexing), and rename old files for consistency | Bryan Newbold | 2021-11-30 | 6 | -0/+47
* cleanups: springer 'page-one' sample PDFs | Bryan Newbold | 2021-11-29 | 2 | -0/+129
* cleanups: truncated wayback PDFs from common crawl | Bryan Newbold | 2021-11-29 | 2 | -0/+292
* update to truncated wayback timestamp issue | Bryan Newbold | 2021-11-29 | 1 | -0/+24
* update to file short wayback timestamp cleanup | Bryan Newbold | 2021-11-29 | 2 | -1/+30
* commit old 2021-11-11 stats file | Bryan Newbold | 2021-11-29 | 1 | -0/+1
* clean up extra/ folder a bit | Bryan Newbold | 2021-11-29 | 11 | -24/+0
* move notes/bulk_edits/ to extra/bulk_edits/ | Bryan Newbold | 2021-11-29 | 23 | -0/+0
* move 'cleanups' directory from notes to extra/ | Bryan Newbold | 2021-11-29 | 11 | -0/+0
* Merge branch 'bnewbold-container-merger' | Bryan Newbold | 2021-11-29 | 7 | -4/+532
|\
| * notes on container ISSN-L merging, tested in QA | Bryan Newbold | 2021-11-24 | 2 | -0/+160
| * release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5
| * container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13
| * mergers: don't try to accept empty editgroups in dry-run-mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4
| * ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1
| * container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5
| * initial implementation of container merger | Bryan Newbold | 2021-11-24 | 2 | -0/+353
* | notes on file_meta partial cleanup | Bryan Newbold | 2021-11-24 | 4 | -0/+239
|/
* notes from prod run of file de-dupe | Bryan Newbold | 2021-11-24 | 2 | -0/+36
* file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5