Graph | Commit message | Author | Date | Files | Lines (-/+) | |
---|---|---|---|---|---|---|
... | ||||||
* | mergers: fix typo in env var name | Bryan Newbold | 2021-12-07 | 3 | -3/+3 | |
| | ||||||
* | another file_meta update | Bryan Newbold | 2021-12-06 | 1 | -0/+60 | |
| | ||||||
* | ES container schema: add `sim_pubid` and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 2 | -0/+4 | |
| | ||||||
* | ES transform: remove prototype microfilm links | Bryan Newbold | 2021-12-03 | 1 | -20/+0 | |
| | | | | This ended up being a feature in scholar.archive.org, not fatcat. | |||||
* | SQL snapshots/exports: updated prod commands | Bryan Newbold | 2021-12-03 | 1 | -13/+15 | |
| | ||||||
* | file_meta cleanup update | Bryan Newbold | 2021-12-01 | 1 | -0/+75 | |
| | ||||||
* | initial 'far-future' release date updates | Bryan Newbold | 2021-11-30 | 1 | -0/+212 | |
| | ||||||
* | chocula update notes | Bryan Newbold | 2021-11-30 | 1 | -0/+61 | |
| | ||||||
* | container ISSN-L dedupe notes | Bryan Newbold | 2021-11-30 | 1 | -0/+198 | |
| | ||||||
* | chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6 | |
| | ||||||
* | chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7 | |
| | ||||||
* | chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7 | |
| | ||||||
* | chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6 | |
| | ||||||
* | chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2 | |
| | | | | This kind of cleanup should be done in 'chocula' instead. | |||||
* | container merger: fix bug with filtering by release count | Bryan Newbold | 2021-11-30 | 1 | -13/+15 | |
| | | | | | Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents. | |||||
* | add stats (before re-indexing), and rename old files for consistency | Bryan Newbold | 2021-11-30 | 6 | -0/+47 | |
| | ||||||
* | cleanups: springer 'page-one' sample PDFs | Bryan Newbold | 2021-11-29 | 2 | -0/+129 | |
| | ||||||
* | cleanups: truncated wayback PDFs from common crawl | Bryan Newbold | 2021-11-29 | 2 | -0/+292 | |
| | ||||||
* | update to truncated wayback timestamp issue | Bryan Newbold | 2021-11-29 | 1 | -0/+24 | |
| | ||||||
* | update to file short wayback timestamp cleanup | Bryan Newbold | 2021-11-29 | 2 | -1/+30 | |
| | ||||||
* | commit old 2021-11-11 stats file | Bryan Newbold | 2021-11-29 | 1 | -0/+1 | |
| | ||||||
* | clean up extra/ folder a bit | Bryan Newbold | 2021-11-29 | 11 | -24/+0 | |
| | ||||||
* | move notes/bulk_edits/ to extra/bulk_edits/ | Bryan Newbold | 2021-11-29 | 23 | -0/+0 | |
| | ||||||
* | move 'cleanups' directory from notes to extra/ | Bryan Newbold | 2021-11-29 | 11 | -0/+0 | |
| | ||||||
* | Merge branch 'bnewbold-container-merger' | Bryan Newbold | 2021-11-29 | 7 | -4/+532 | |
|\ | ||||||
| * | notes on container ISSN-L merging, tested in QA | Bryan Newbold | 2021-11-24 | 2 | -0/+160 | |
| | | ||||||
| * | release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5 | |
| | | ||||||
| * | container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13 | |
| | | ||||||
| * | mergers: don't try to accept empty editgroups in dry-run mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4 | |
| | | ||||||
| * | ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1 | |
| | | | | | | | | | | Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate. | |||||
| * | container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5 | |
| | | ||||||
| * | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 2 | -0/+353 | |
| | | ||||||
* | | notes on file_meta partial cleanup | Bryan Newbold | 2021-11-24 | 4 | -0/+239 | |
|/ | ||||||
* | notes from prod run of file de-dupe | Bryan Newbold | 2021-11-24 | 2 | -0/+36 | |
| | ||||||
* | file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5 | |
| | | | | | The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped". | |||||
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 8 | -0/+1046 | |
|\ | | | | | | | | | entity mergers framework. See merge request webgroup/fatcat!133 | |||||
| * | mergers common: remove inaccurate comment | Bryan Newbold | 2021-11-24 | 1 | -2/+0 | |
| | | | | | | | | Caught in review, thanks miku | |||||
| * | merger proposal typos | Bryan Newbold | 2021-11-24 | 1 | -2/+2 | |
| | | ||||||
| * | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 | |
| | | ||||||
| * | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 | |
| | | ||||||
| * | file merges: fixes from testing in QA | Bryan Newbold | 2021-11-23 | 1 | -14/+23 | |
| | | ||||||
| * | file de-dupe: notes on prep and QA testing | Bryan Newbold | 2021-11-23 | 2 | -0/+136 | |
| | | ||||||
| * | mergers: small tweaks | Bryan Newbold | 2021-11-23 | 2 | -3/+3 | |
| | | ||||||
| * | mergers: remove entity mergers from __init__ (to work around warning) | Bryan Newbold | 2021-11-23 | 1 | -2/+0 | |
| | | ||||||
| * | add proposal for entity mergers | Bryan Newbold | 2021-11-23 | 1 | -0/+110 | |
| | | ||||||
| * | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 2 | -0/+388 | |
| | | ||||||
| * | mergers: fmt, lint, refactors | Bryan Newbold | 2021-11-23 | 3 | -96/+200 | |
| | | | | | | | | | | This merger code is from an old branch and needed significant cleanup. | |||||
| * | remove top-level fatcat_merge.py; will invoke the module's __main__ going forward | Bryan Newbold | 2021-11-23 | 1 | -112/+0 | |
| | | ||||||
| * | first iteration of mergers | Bryan Newbold | 2021-11-23 | 4 | -0/+355 | |
| | | ||||||
* | | CHANGELOG: note about spelling corrections | Bryan Newbold | 2021-11-24 | 1 | -0/+6 | |
| | |