Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | cleanup another script | Martin Czygan | 2021-07-17 | 5 | -311/+72 |
| | |||||
* | cleanup skate-bref-id | Martin Czygan | 2021-07-17 | 2 | -42/+1 |
| | |||||
* | update indexing notes | Martin Czygan | 2021-07-17 | 1 | -0/+38 |
| | |||||
* | tasks: add data point | Martin Czygan | 2021-07-16 | 1 | -2/+3 |
| | |||||
* | reduce: use correct reducer | Martin Czygan | 2021-07-15 | 1 | -2/+2 |
| | |||||
* | tasks: ignore exit code 141 for now | Martin Czygan | 2021-07-15 | 1 | -1/+1 |
| | |||||
* | tasks: add BrefZipWayback | Martin Czygan | 2021-07-15 | 1 | -0/+20 |
| | |||||
* | register reducer | Martin Czygan | 2021-07-15 | 1 | -0/+14 |
| | |||||
* | add ZippyWayback reducer | Martin Czygan | 2021-07-15 | 3 | -54/+114 |
| | |||||
* | tasks: reduce sample size | Martin Czygan | 2021-07-15 | 1 | -1/+1 |
| | |||||
* | tasks: tweak CDXURL | Martin Czygan | 2021-07-15 | 1 | -2/+2 |
| | |||||
* | tasks: fix command | Martin Czygan | 2021-07-15 | 1 | -0/+1 |
| | |||||
* | tasks: tweak CDXURL | Martin Czygan | 2021-07-15 | 1 | -3/+5 |
| | |||||
* | tasks: add CDXURL | Martin Czygan | 2021-07-15 | 1 | -0/+28 |
| | |||||
* | mapper: add cdxu | Martin Czygan | 2021-07-15 | 2 | -0/+22 |
| | |||||
* | tasks: cleanup urls | Martin Czygan | 2021-07-15 | 1 | -0/+1 |
| | |||||
* | notes: add unique example | Martin Czygan | 2021-07-15 | 1 | -1/+1 |
| | |||||
* | tasks: add RefsURL | Martin Czygan | 2021-07-15 | 1 | -0/+26 |
| | |||||
* | map: add another mapper | Martin Czygan | 2021-07-15 | 2 | -3/+17 |
| | |||||
* | cdx reshape: only include hits | Martin Czygan | 2021-07-15 | 1 | -2/+1 |
| | |||||
* | cdx reshape: write json | Martin Czygan | 2021-07-15 | 1 | -2/+2 |
| | |||||
* | extra: cdx reshape | Martin Czygan | 2021-07-15 | 1 | -0/+19 |
| | |||||
* | update notes | Martin Czygan | 2021-07-15 | 1 | -0/+9 |
| | |||||
* | update docs | Martin Czygan | 2021-07-14 | 2 | -11/+11 |
| | |||||
* | reduce: add test | Martin Czygan | 2021-07-14 | 2 | -18/+41 |
| | |||||
* | notes: 2021-07-06 version | Martin Czygan | 2021-07-14 | 1 | -0/+38 |
| | |||||
* | tasks: update docs | Martin Czygan | 2021-07-14 | 1 | -0/+2 |
| | |||||
* | tasks: add performance note | Martin Czygan | 2021-07-14 | 1 | -0/+7 |
| | |||||
* | reduce: add todo | Martin Czygan | 2021-07-14 | 1 | -0/+2 |
| | |||||
* | v0.1.39 | Martin Czygan | 2021-07-14 | 1 | -1/+1 |
| | |||||
* | reduce: add csl field | Martin Czygan | 2021-07-14 | 4 | -8/+72 |
| | |||||
* | reduce: fix off-by-one error | Martin Czygan | 2021-07-14 | 2 | -2/+2 |
| | | | | duplication detection required a +1 on the index in the ref document | ||||
* | tasks: only include docs with a work id | Martin Czygan | 2021-07-14 | 1 | -4/+2 |
| | |||||
* | reduce: temp bug fix for line cutter | Martin Czygan | 2021-07-13 | 2 | -32/+61 |
| | | | | | | | | we wanted to trim whitespace at one point, because values contained the separator values; however, this breaks with empty values; move back to not trimming values, except for the newline when requesting the last value; moving forward, we need to clean or reject dirty values or use a different delimiter | ||||
* | v0.1.38 | Martin Czygan | 2021-07-13 | 1 | -1/+1 |
| | |||||
* | reduce: small tweaks | Martin Czygan | 2021-07-13 | 2 | -6/+7 |
| | |||||
* | fix typo | Martin Czygan | 2021-07-13 | 1 | -1/+1 |
| | |||||
* | wip: csl logging | Martin Czygan | 2021-07-13 | 1 | -1/+1 |
| | |||||
* | update docs | Martin Czygan | 2021-07-13 | 1 | -1/+7 |
| | |||||
* | reduce/schema: add csl | Martin Czygan | 2021-07-13 | 3 | -5/+70 |
| | |||||
* | wiki: include lang in encoded page title | Martin Czygan | 2021-07-13 | 2 | -8/+18 |
| | |||||
* | reduce: add todo | Martin Czygan | 2021-07-13 | 1 | -1/+3 |
| | |||||
* | separate slugify functions | Martin Czygan | 2021-07-13 | 4 | -28/+39 |
| | |||||
* | mock out time.Now for tests | Martin Czygan | 2021-07-13 | 4 | -1034/+1041 |
| | |||||
* | reduce: log broken line only | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | reduce: add key and indexed ts for exact matches | Martin Czygan | 2021-07-10 | 1 | -0/+2 |
| | |||||
* | batch: drop logging | Martin Czygan | 2021-07-10 | 1 | -4/+0 |
| | |||||
* | batch: log batch size | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | reduce: short circuit large groups | Martin Czygan | 2021-07-10 | 1 | -2/+12 |
| | | | | | | | | we saw a jump in memory usage, and it may be related to groups with thousands of elements; e.g. maybe some weird string that appears too many times as a key, e.g. 123/test; as a first measure, we short circuit further batching; another mitigation may be to limit group size completely | ||||
* | schema: prefer isbn13 | Martin Czygan | 2021-07-10 | 1 | -1/+5 |
| |
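
The note on the "reduce: temp bug fix for line cutter" commit describes why trimming whitespace breaks lines with empty values. The following is a minimal sketch of that pitfall, not the project's actual code: it assumes a tab separator, and the helper names `cutColumn` and `columnCountTrimmed` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cutColumn returns the 1-based column of a delimited line. It strips only
// the trailing newline and never trims the values themselves, so empty
// columns keep their position. Hypothetical helper for illustration.
func cutColumn(line, sep string, column int) string {
	parts := strings.Split(strings.TrimRight(line, "\r\n"), sep)
	if column < 1 || column > len(parts) {
		return ""
	}
	return parts[column-1]
}

// columnCountTrimmed shows the failure mode: trimming all surrounding
// whitespace also removes trailing tab separators, so empty trailing
// columns disappear. Hypothetical helper for illustration.
func columnCountTrimmed(line, sep string) int {
	return len(strings.Split(strings.TrimSpace(line), sep))
}

func main() {
	line := "key\t\t\n" // three columns, the last two are empty

	// Without trimming values, the third column exists and is empty.
	fmt.Printf("%q\n", cutColumn(line, "\t", 3))

	// With whitespace trimming, only one column is left.
	fmt.Println(columnCountTrimmed(line, "\t"))
}
```

A value that itself contains the separator is still split in the wrong place, which is why the note suggests cleaning or rejecting dirty values, or using a different delimiter.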