Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | entity worker: expand creators in release entities | Bryan Newbold | 2021-12-15 | 1 | -1/+1 |
| | |||||
* | small default config typo fixes for elasticsearch workers | Bryan Newbold | 2021-12-15 | 1 | -2/+2 |
| | |||||
* | file elasticsearch index worker | Bryan Newbold | 2021-12-15 | 3 | -1/+63 |
| | |||||
* | updates to guide based on feedback | Bryan Newbold | 2021-12-08 | 3 | -9/+35 |
| | |||||
* | mergers: fix typo in env var name | Bryan Newbold | 2021-12-07 | 3 | -3/+3 |
| | |||||
* | another file_meta update | Bryan Newbold | 2021-12-06 | 1 | -0/+60 |
| | |||||
* | ES container schema: add 'sim_pubid' and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 2 | -0/+4 |
| | |||||
* | ES transform: remove prototype microfilm links | Bryan Newbold | 2021-12-03 | 1 | -20/+0 |
| | This ended up being a feature in scholar.archive.org, not fatcat. | ||||
* | SQL snapshots/exports: updated prod commands | Bryan Newbold | 2021-12-03 | 1 | -13/+15 |
| | |||||
* | file_meta cleanup update | Bryan Newbold | 2021-12-01 | 1 | -0/+75 |
| | |||||
* | initial 'far-future' release date updates | Bryan Newbold | 2021-11-30 | 1 | -0/+212 |
| | |||||
* | chocula update notes | Bryan Newbold | 2021-11-30 | 1 | -0/+61 |
| | |||||
* | container ISSN-L dedupe notes | Bryan Newbold | 2021-11-30 | 1 | -0/+198 |
| | |||||
* | chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6 |
| | |||||
* | chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7 |
| | |||||
* | chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7 |
| | |||||
* | chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6 |
| | |||||
* | chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2 |
| | This kind of cleanup should be done in 'chocula' instead. | ||||
* | container merger: fix bug with filtering by release count | Bryan Newbold | 2021-11-30 | 1 | -13/+15 |
| | Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents. | ||||
* | add stats (before re-indexing), and rename old files for consistency | Bryan Newbold | 2021-11-30 | 6 | -0/+47 |
| | |||||
* | cleanups: springer 'page-one' sample PDFs | Bryan Newbold | 2021-11-29 | 2 | -0/+129 |
| | |||||
* | cleanups: truncated wayback PDFs from common crawl | Bryan Newbold | 2021-11-29 | 2 | -0/+292 |
| | |||||
* | update to truncated wayback timestamp issue | Bryan Newbold | 2021-11-29 | 1 | -0/+24 |
| | |||||
* | update to file short wayback timestamp cleanup | Bryan Newbold | 2021-11-29 | 2 | -1/+30 |
| | |||||
* | commit old 2021-11-11 stats file | Bryan Newbold | 2021-11-29 | 1 | -0/+1 |
| | |||||
* | clean up extra/ folder a bit | Bryan Newbold | 2021-11-29 | 11 | -24/+0 |
| | |||||
* | move notes/bulk_edits/ to extra/bulk_edits/ | Bryan Newbold | 2021-11-29 | 23 | -0/+0 |
| | |||||
* | move 'cleanups' directory from notes to extra/ | Bryan Newbold | 2021-11-29 | 11 | -0/+0 |
| | |||||
* | Merge branch 'bnewbold-container-merger' | Bryan Newbold | 2021-11-29 | 7 | -4/+532 |
|\ | |||||
| * | notes on container ISSN-L merging, tested in QA | Bryan Newbold | 2021-11-24 | 2 | -0/+160 |
| | | |||||
| * | release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | | |||||
| * | container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13 |
| | | |||||
| * | mergers: don't try to accept empty editgroups in dry-run-mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4 |
| | | |||||
| * | ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate. | ||||
| * | container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | | |||||
| * | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 2 | -0/+353 |
| | | |||||
* | | notes on file_meta partial cleanup | Bryan Newbold | 2021-11-24 | 4 | -0/+239 |
|/ | |||||
* | notes from prod run of file de-dupe | Bryan Newbold | 2021-11-24 | 2 | -0/+36 |
| | |||||
* | file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped". | ||||
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 8 | -0/+1046 |
|\ | entity mergers framework. See merge request webgroup/fatcat!133 | ||||
| * | mergers common: remove inaccurate comment | Bryan Newbold | 2021-11-24 | 1 | -2/+0 |
| | | Caught in review, thanks miku | ||||
| * | merger proposal typos | Bryan Newbold | 2021-11-24 | 1 | -2/+2 |
| | | |||||
| * | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | |||||
| * | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 |
| | | |||||
| * | file merges: fixes from testing in QA | Bryan Newbold | 2021-11-23 | 1 | -14/+23 |
| | | |||||
| * | file de-dupe: notes on prep and QA testing | Bryan Newbold | 2021-11-23 | 2 | -0/+136 |
| | | |||||
| * | mergers: small tweaks | Bryan Newbold | 2021-11-23 | 2 | -3/+3 |
| | | |||||
| * | mergers: remove entity mergers from __init__ (to work around warning) | Bryan Newbold | 2021-11-23 | 1 | -2/+0 |
| | | |||||
| * | add proposal for entity mergers | Bryan Newbold | 2021-11-23 | 1 | -0/+110 |
| | | |||||
| * | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 2 | -0/+388 |
| | |