Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | reduce: use mockable time | Martin Czygan | 2021-11-23 | 1 | -6/+7 |
| | While basically the same, we save a bit with a default mock and prepare a bit better for some future encapsulation. | ||||
* | rename module to gitlab.com/internetarchive/refcat | Martin Czygan | 2021-10-20 | 1 | -3/+3 |
| | This changes all the import paths to the current canonical location on http://gitlab.com/internetarchive/refcat. | ||||
* | misc: fix and improve comments | Martin Czygan | 2021-09-23 | 1 | -0/+15 |
| | |||||
* | reduce: remove log statements | Martin Czygan | 2021-07-28 | 1 | -4/+0 |
| | |||||
* | leave ref.index unchanged | Martin Czygan | 2021-07-28 | 1 | -6/+6 |
| | previously, we started with 0-indexed input, but wanted 1-indexed values, so we added increments at various points, which probably led to a bug (missing refs), since at one point we would fuse the original ref data (w/o increments) with the matched data (w/ increments); with scholar:528804ad2e55983cf3e5e6659d8f46db0cab02b7 we can now leave indices as is | ||||
* | reduce: add case | Martin Czygan | 2021-07-28 | 1 | -0/+1 |
| | |||||
* | reduce: add more logging, temporarily | Martin Czygan | 2021-07-27 | 1 | -1/+6 |
| | |||||
* | update docs | Martin Czygan | 2021-07-27 | 1 | -4/+1 |
| | |||||
* | reuse timestamps | Martin Czygan | 2021-07-27 | 1 | -6/+14 |
| | while time.Now is not really slow, thanks to vDSO (cf. https://git.io/J4SOH), it will be even faster to just call it once at the start of the processing; also: https://twitter.com/davidcrawshaw/status/1414243408936280073 ("Turns out time.Now was taking its usual amount of time on linux, ~50 nanoseconds [...]") | ||||
* | reduce: explicitly name magic numbers | Martin Czygan | 2021-07-27 | 1 | -3/+8 |
| | |||||
* | reduce: use pascal case | Martin Czygan | 2021-07-26 | 1 | -2/+2 |
| | |||||
* | reduce: mention upcoming change to indexing | Martin Czygan | 2021-07-26 | 1 | -1/+1 |
| | see: scholar:528804ad2e55983cf3e5e6659d8f46db0cab02b7 | ||||
* | skate: pass-through match_provenance in more situations | Bryan Newbold | 2021-07-25 | 1 | -0/+2 |
| | |||||
* | schema: switch from '.name' to '.raw_name' for un-parsed CSL name field | Bryan Newbold | 2021-07-25 | 1 | -2/+2 |
| | |||||
* | skate: use date-parts for year, not 'raw' | Bryan Newbold | 2021-07-25 | 1 | -6/+7 |
| | |||||
* | schema: have issued+accessed (CSLDate) actually omitempty | Bryan Newbold | 2021-07-24 | 1 | -1/+1 |
| | Similar to TargetCSL, these should be pointer types so they don't get encoded as empty objects when not set. | ||||
* | xio: improve naming | Martin Czygan | 2021-07-21 | 1 | -7/+7 |
| | |||||
* | reduce: use fixed length sha1 for url id part | Martin Czygan | 2021-07-20 | 1 | -3/+5 |
| | base32 would occasionally exceed the elasticsearch id field limit ("must be no longer than 512 bytes but was: 649") | ||||
* | reduce: fix wb id | Martin Czygan | 2021-07-20 | 1 | -1/+1 |
| | |||||
* | reduce: a preliminary id for wb links | Martin Czygan | 2021-07-20 | 1 | -0/+5 |
| | |||||
* | reduce: temp fix 0 source release year | Martin Czygan | 2021-07-19 | 1 | -1/+4 |
| | |||||
* | add ZippyWayback reducer | Martin Czygan | 2021-07-15 | 1 | -1/+59 |
| | |||||
* | update docs | Martin Czygan | 2021-07-14 | 1 | -8/+7 |
| | |||||
* | reduce: add test | Martin Czygan | 2021-07-14 | 1 | -18/+21 |
| | |||||
* | reduce: add todo | Martin Czygan | 2021-07-14 | 1 | -0/+2 |
| | |||||
* | reduce: add csl field | Martin Czygan | 2021-07-14 | 1 | -3/+32 |
| | |||||
* | reduce: fix off-by-one error | Martin Czygan | 2021-07-14 | 1 | -1/+1 |
| | duplication detection required a +1 on the index in the ref document | ||||
* | reduce: temp bug fix for line cutter | Martin Czygan | 2021-07-13 | 1 | -1/+5 |
| | we wanted to trim whitespace at one point, because values contained the separator values; however this breaks with empty values; move back to not trimming values except for the newline, when requesting the last value; moving forward, we need to clean or reject dirty values or use a different delimiter | ||||
* | reduce: small tweaks | Martin Czygan | 2021-07-13 | 1 | -3/+4 |
| | |||||
* | wip: csl logging | Martin Czygan | 2021-07-13 | 1 | -1/+1 |
| | |||||
* | update docs | Martin Czygan | 2021-07-13 | 1 | -1/+7 |
| | |||||
* | reduce/schema: add csl | Martin Czygan | 2021-07-13 | 1 | -1/+7 |
| | |||||
* | wiki: include lang in encoded page title | Martin Czygan | 2021-07-13 | 1 | -7/+12 |
| | |||||
* | reduce: add todo | Martin Czygan | 2021-07-13 | 1 | -1/+3 |
| | |||||
* | mock out time.Now for tests | Martin Czygan | 2021-07-13 | 1 | -3/+6 |
| | |||||
* | reduce: log broken line only | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | reduce: add key and indexed ts for exact matches | Martin Czygan | 2021-07-10 | 1 | -0/+2 |
| | |||||
* | reduce: ol, fuzzy, w/ unstructured | Martin Czygan | 2021-07-10 | 1 | -1/+1 |
| | |||||
* | release to unstructured stub | Martin Czygan | 2021-07-10 | 1 | -2/+2 |
| | |||||
* | reduce: open library id tweaks | Martin Czygan | 2021-07-10 | 1 | -5/+27 |
| | |||||
* | reduce: tweak wiki bref | Martin Czygan | 2021-07-10 | 1 | -4/+5 |
| | |||||
* | reduce: filter out duplicate wiki links | Martin Czygan | 2021-07-10 | 1 | -0/+8 |
| | |||||
* | wiki: use lowercase base32 of page title | Martin Czygan | 2021-07-09 | 1 | -2/+3 |
| | mostly case insensitive, same case as ident | ||||
* | reduce: use a base64 encoded title as key | Martin Czygan | 2021-07-09 | 1 | -1/+7 |
| | |||||
* | reduce: wiki doc in column 3 | Martin Czygan | 2021-07-09 | 1 | -1/+1 |
| | |||||
* | reduce: move batch size | Martin Czygan | 2021-07-09 | 1 | -8/+6 |
| | |||||
* | reduce: set default batch size | Martin Czygan | 2021-07-08 | 1 | -6/+8 |
| | |||||
* | simplify imports | Martin Czygan | 2021-07-08 | 1 | -1/+1 |
| | |||||
* | reduce: separate batch calls | Martin Czygan | 2021-07-08 | 1 | -18/+18 |
| | |||||
* | reduce: remove log line | Martin Czygan | 2021-07-06 | 1 | -1/+0 |
| |
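
The 2021-07-24 commit "schema: have issued+accessed (CSLDate) actually omitempty" relies on a property of Go's encoding/json: `omitempty` never drops a struct value, only a nil pointer. The sketch below illustrates that behaviour. It is not the refcat schema; only the name `CSLDate` comes from the log, while the field `Raw`, the wrapper types `ByValue` and `ByPointer`, and the helper `encode` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CSLDate is a hypothetical stand-in for the schema type named in the commit.
type CSLDate struct {
	Raw string `json:"raw,omitempty"`
}

// ByValue holds the date as a struct value; omitempty has no effect on it.
type ByValue struct {
	Issued CSLDate `json:"issued,omitempty"`
}

// ByPointer holds the date as a pointer; a nil pointer is omitted.
type ByPointer struct {
	Issued *CSLDate `json:"issued,omitempty"`
}

// encode marshals v to JSON and returns it as a string (assumed helper).
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(ByValue{}))   // {"issued":{}}
	fmt.Println(encode(ByPointer{})) // {}
}
```

With the value type, an unset date still shows up as an empty object in the output, which is the behaviour the commit message describes; switching the field to a pointer lets the encoder skip it entirely.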