Commit message  [Author, Age, Files, Lines]
* update stats  [Bryan Newbold, 2022-01-12, 3, -0/+49]
|
* ES: update README for v05-era indices  [Bryan Newbold, 2022-01-12, 1, -15/+15]
|
* ES schema: fix typo in container issns alias  [Bryan Newbold, 2022-01-12, 1, -1/+1]
|
* elasticsearch: bump timeout to 40 seconds (from default of 10)  [Bryan Newbold, 2022-01-10, 1, -1/+1]
|
* make fmt  [Bryan Newbold, 2021-12-15, 2, -5/+6]
|
* Merge branch 'martin-sentry-sdk' into 'master'  [bnewbold, 2021-12-16, 10, -344/+396]
|\
| |     move from raven to sentry_sdk
| |
| |     See merge request webgroup/fatcat!135
| |
| * move from raven to sentry_sdk  [Martin Czygan, 2021-12-14, 10, -344/+396]
| |
| |     related docs:
| |
| |     * https://docs.sentry.io/platforms/python/guides/flask/migration/
| |     * https://docs.sentry.io/platforms/python/guides/asgi/configuration/integrations/flask/
| |
| |     > `fetch_git_sha` is gone, see: https://forum.sentry.io/t/fetch-git-sha-equivalent-in-the-unified-python-sdk/5521
| |
* | crossref importer: skip affiliations lacking 'name'  [Bryan Newbold, 2021-12-15, 1, -0/+3]
|/
|     Relatedly, we should start handling ROR affiliations in contribs soon.
|
* updates to guide based on feedback  [Bryan Newbold, 2021-12-08, 3, -9/+35]
|
* mergers: fix typo in env var name  [Bryan Newbold, 2021-12-07, 3, -3/+3]
|
* another file_meta update  [Bryan Newbold, 2021-12-06, 1, -0/+60]
|
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields  [Bryan Newbold, 2021-12-03, 2, -0/+4]
|
* ES transform: remove prototype microfilm links  [Bryan Newbold, 2021-12-03, 1, -20/+0]
|     This ended up being a feature in scholar.archive.org, not fatcat.
|
* SQL snashots/exports: updated prod commands  [Bryan Newbold, 2021-12-03, 1, -13/+15]
|
* file_meta cleanup update  [Bryan Newbold, 2021-12-01, 1, -0/+75]
|
* initial 'far-future' release date updates  [Bryan Newbold, 2021-11-30, 1, -0/+212]
|
* chocula update notes  [Bryan Newbold, 2021-11-30, 1, -0/+61]
|
* container ISSN-L dedupe notes  [Bryan Newbold, 2021-11-30, 1, -0/+198]
|
* chocula importer: handle not-upper-case ISSNs  [Bryan Newbold, 2021-11-30, 1, -2/+6]
|
* chocula importer: handle broken ISSNs in extra metadata  [Bryan Newbold, 2021-11-30, 1, -2/+7]
|
* chocula importer: tweak counting, conditions for doing updates  [Bryan Newbold, 2021-11-30, 1, -15/+7]
|
* chocula importer: move issne/issnp 'extra' to top-level fields if doing updates  [Bryan Newbold, 2021-11-30, 1, -0/+6]
|
* chocula: don't do name cleanups in importer  [Bryan Newbold, 2021-11-30, 1, -8/+2]
|     This kind of cleanup should be done in 'chocula' instead.
|
* container merger: fix bug with filtering by release count  [Bryan Newbold, 2021-11-30, 1, -13/+15]
|     Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents.
|
* add stats (before re-indexing), and rename old files for consistency  [Bryan Newbold, 2021-11-30, 6, -0/+47]
|
* cleanups: springer 'page-one' sample PDFs  [Bryan Newbold, 2021-11-29, 2, -0/+129]
|
* cleanups: truncated wayback PDFs from common crawl  [Bryan Newbold, 2021-11-29, 2, -0/+292]
|
* update to truncated wayback timestamp issue  [Bryan Newbold, 2021-11-29, 1, -0/+24]
|
* update to file short wayback timestamp cleanup  [Bryan Newbold, 2021-11-29, 2, -1/+30]
|
* commit old 2021-11-11 stats file  [Bryan Newbold, 2021-11-29, 1, -0/+1]
|
* clean up extra/ folder a bit  [Bryan Newbold, 2021-11-29, 11, -24/+0]
|
* move notes/bulk_edits/ to extra/bulk_edits/  [Bryan Newbold, 2021-11-29, 23, -0/+0]
|
* move 'cleanups' directory from notes to extra/  [Bryan Newbold, 2021-11-29, 11, -0/+0]
|
* Merge branch 'bnewbold-container-merger'  [Bryan Newbold, 2021-11-29, 7, -4/+532]
|\
| * notes on container ISSN-L merging, tested in QA  [Bryan Newbold, 2021-11-24, 2, -0/+160]
| |
| * release merger: same editgroup_id fixes as for file and container mergers  [Bryan Newbold, 2021-11-24, 1, -1/+5]
| |
| * container merger: fixes from QA testing  [Bryan Newbold, 2021-11-24, 1, -8/+13]
| |
| * mergers: don't try to accept empty editgroups in dry-run-mode  [Bryan Newbold, 2021-11-24, 1, -2/+4]
| |
| * ES release transform: handle redirected containers better  [Bryan Newbold, 2021-11-24, 1, -1/+1]
| |     Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate.
| |
| * container merger: defer allocation of editgroup_id; and dummy code path  [Bryan Newbold, 2021-11-24, 1, -1/+5]
| |
| * initial implementation of container merger  [Bryan Newbold, 2021-11-24, 2, -0/+353]
| |
* | notes on file_meta partial cleanup  [Bryan Newbold, 2021-11-24, 4, -0/+239]
|/
* notes from prod run of file de-dupe  [Bryan Newbold, 2021-11-24, 2, -0/+36]
|
* file merger: allocate editgroup id later in 'merge' process  [Bryan Newbold, 2021-11-24, 1, -1/+5]
|     The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped"
|
* Merge branch 'bnewbold-mergers' into 'master'  [bnewbold, 2021-11-25, 8, -0/+1046]
|\
| |     entity mergers framework
| |
| |     See merge request webgroup/fatcat!133
| |
| * mergers common: remove inaccurate comment  [Bryan Newbold, 2021-11-24, 1, -2/+0]
| |     Caught in review, thanks miku
| |
| * merger proposal typos  [Bryan Newbold, 2021-11-24, 1, -2/+2]
| |
| * file merger: add content_scope to list of merged fields  [Bryan Newbold, 2021-11-24, 1, -1/+1]
| |
| * release merger: some progress, but also disable (not complete)  [Bryan Newbold, 2021-11-23, 1, -12/+72]
| |
| * file merges: fixes from testing in QA  [Bryan Newbold, 2021-11-23, 1, -14/+23]
| |