Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | reduce: log broken line only | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | reduce: add key and indexed ts for exact matches | Martin Czygan | 2021-07-10 | 1 | -0/+2 |
| | |||||
* | batch: drop logging | Martin Czygan | 2021-07-10 | 1 | -4/+0 |
| | |||||
* | batch: log batch size | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | reduce: short circuit large groups | Martin Czygan | 2021-07-10 | 1 | -2/+12 |
| | | | | | | | | we saw a jump in memory usage, which may be related to groups with thousands of elements, e.g. some odd string that appears too many times as a key, such as 123/test; as a first measure, we short circuit further batching; another mitigation may be to limit group size completely | ||||
* | schema: prefer isbn13 | Martin Czygan | 2021-07-10 | 1 | -1/+5 |
| | |||||
* | schema: render isbn as well | Martin Czygan | 2021-07-10 | 1 | -1/+7 |
| | |||||
* | reduce: ol, fuzzy, w/ unstructured | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | schema: add test | Martin Czygan | 2021-07-10 | 2 | -0/+20 |
| | |||||
* | schema: flesh out unstructured rendering | Martin Czygan | 2021-07-10 | 2 | -0/+56 |
| | |||||
* | release to unstructured stub | Martin Czygan | 2021-07-10 | 3 | -2/+84 |
| | |||||
* | update docs | Martin Czygan | 2021-07-10 | 1 | -0/+2 |
| | |||||
* | reduce: open library id tweaks | Martin Czygan | 2021-07-10 | 1 | -5/+27 |
| | |||||
* | tasks: bref, add wikipedia | Martin Czygan | 2021-07-10 | 1 | -2/+3 |
| | |||||
* | reduce: tweak wiki bref | Martin Czygan | 2021-07-10 | 1 | -4/+5 |
| | |||||
* | reduce: filter out duplicate wiki links | Martin Czygan | 2021-07-10 | 1 | -0/+8 |
| | |||||
* | wiki: use lowercase base32 of page title | Martin Czygan | 2021-07-09 | 1 | -2/+3 |
| | | | | * mostly case insensitive, same case as ident | ||||
* | reduce: use a base64 encoded title as key | Martin Czygan | 2021-07-09 | 1 | -1/+7 |
| | |||||
* | tasks: amend wiki bref task | Martin Czygan | 2021-07-09 | 1 | -0/+1 |
| | |||||
* | tasks: fix typo | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | tasks: use uncompressed stream | Martin Czygan | 2021-07-09 | 1 | -1/+2 |
| | |||||
* | wiki: cleanup redundant check | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | wiki: tweak whitespace handling | Martin Czygan | 2021-07-09 | 1 | -1/+7 |
| | |||||
* | wiki: more aggressive whitespace cleanup | Martin Czygan | 2021-07-09 | 1 | -1/+2 |
| | |||||
* | wiki: try a bit more cleanup | Martin Czygan | 2021-07-09 | 1 | -1/+5 |
| | |||||
* | tasks: wiki, sort by doi in first column | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | wiki: verify doi | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | unstructured: cleanup obsolete regex | Martin Czygan | 2021-07-09 | 1 | -9/+3 |
| | |||||
* | tasks: BrefZipWikiDOI | Martin Czygan | 2021-07-09 | 1 | -1/+8 |
| | |||||
* | reduce: wiki doc in column 3 | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | tests: sync verify test data | Martin Czygan | 2021-07-09 | 6 | -0/+176 |
| | |||||
* | tasks: wiki stub | Martin Czygan | 2021-07-09 | 2 | -0/+17 |
| | |||||
* | wiki: flip doi and page title column | Martin Czygan | 2021-07-09 | 1 | -3/+3 |
| | |||||
* | reduce: move batch size | Martin Czygan | 2021-07-09 | 2 | -9/+9 |
| | |||||
* | cli: try to always display shiv_root | Martin Czygan | 2021-07-08 | 1 | -1/+2 |
| | |||||
* | update proposal status | Martin Czygan | 2021-07-08 | 1 | -2/+2 |
| | |||||
* | reduce: prepare command line help | Martin Czygan | 2021-07-08 | 1 | -0/+12 |
| | |||||
* | note on timings | Martin Czygan | 2021-07-08 | 2 | -1/+9 |
| | |||||
* | update docs | Martin Czygan | 2021-07-08 | 1 | -3/+3 |
| | |||||
* | reduce: set default batch size | Martin Czygan | 2021-07-08 | 1 | -6/+8 |
| | |||||
* | simplify imports | Martin Czygan | 2021-07-08 | 9 | -9/+9 |
| | |||||
* | reduce: separate batch calls | Martin Czygan | 2021-07-08 | 2 | -20/+25 |
| | |||||
* | fix merge conflict | Martin Czygan | 2021-07-07 | 8 | -82/+136 |
|\ | |||||
| * | update docs | Martin Czygan | 2021-07-07 | 1 | -3/+8 |
| | | |||||
| * | skate: no need for alias | Martin Czygan | 2021-07-07 | 1 | -1/+1 |
| | | |||||
| * | add WikipediaDOI | Martin Czygan | 2021-07-07 | 1 | -0/+43 |
| | | |||||
| * | do not compress sort tmp files | Martin Czygan | 2021-07-06 | 1 | -21/+21 |
| | | |||||
| * | run a parity derivation | Martin Czygan | 2021-07-06 | 1 | -2/+2 |
| | | |||||
| * | util: cleanup encoder | Martin Czygan | 2021-07-06 | 1 | -19/+0 |
| | | |||||
| * | reduce: remove log line | Martin Czygan | 2021-07-06 | 1 | -1/+0 |
| | |