Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | skate: use SanitizeDOI in all inputs | Bryan Newbold | 2021-07-25 | 4 | -22/+9 | |
| | ||||||
* | skate: fast SanitizeDOI helper for normalizing DOIs | Bryan Newbold | 2021-07-25 | 2 | -0/+71 | |
| | ||||||
* | skate unstructured: don't parse DOI out of key | Bryan Newbold | 2021-07-25 | 1 | -16/+0 | |
| | | | | | | DOIs in keys, usually from Crossref, are the DOI of the *source* of the reference, not the *target* of the reference. Thus, they should not be parsed and copied to the ref.biblio.doi field. | |||||
* | skate: pass-through match_provenance in more situations | Bryan Newbold | 2021-07-25 | 1 | -0/+2 | |
| | ||||||
* | schema: switch from '.name' to '.raw_name' for un-parsed CSL name field | Bryan Newbold | 2021-07-25 | 3 | -6/+6 | |
| | ||||||
* | skate: use date-parts for year, not 'raw' | Bryan Newbold | 2021-07-25 | 2 | -8/+9 | |
| | ||||||
* | schema: have issued+accessed (CSLDate) actually omitempty | Bryan Newbold | 2021-07-24 | 3 | -5/+5 | |
| | | | | | Similar to TargetCSL, these should be pointer types so they don't get encoded as empty objects when not set. | |||||
* | add test for issued,accessed not being included in output JSON | Bryan Newbold | 2021-07-24 | 1 | -0/+17 | |
| | ||||||
* | fix typo in ref schema | Martin Czygan | 2021-07-23 | 1 | -1/+1 | |
| | ||||||
* | v0.1.40 | Martin Czygan | 2021-07-22 | 1 | -1/+1 | |
| | ||||||
* | cleanup (old) clustering related code | Martin Czygan | 2021-07-22 | 3 | -177/+39 | |
| | ||||||
* | minor doc fixes | Martin Czygan | 2021-07-21 | 2 | -4/+7 | |
| | ||||||
* | xio: improve naming | Martin Czygan | 2021-07-21 | 3 | -33/+30 | |
| | ||||||
* | reduce: use fixed length sha1 for url id part | Martin Czygan | 2021-07-20 | 1 | -3/+5 | |
| | | | | | base32 would occasionally exceed the Elasticsearch id field limit ("must be no longer than 512 bytes but was: 649") | |||||
* | reduce: fix wb id | Martin Czygan | 2021-07-20 | 1 | -1/+1 | |
| | ||||||
* | reduce: a preliminary id for wb links | Martin Czygan | 2021-07-20 | 1 | -0/+5 | |
| | ||||||
* | reduce: temp fix 0 source release year | Martin Czygan | 2021-07-19 | 1 | -1/+4 | |
| | ||||||
* | cleanup another script | Martin Czygan | 2021-07-17 | 5 | -311/+72 | |
| | ||||||
* | cleanup skate-bref-id | Martin Czygan | 2021-07-17 | 2 | -42/+1 | |
| | ||||||
* | reduce: use correct reducer | Martin Czygan | 2021-07-15 | 1 | -2/+2 | |
| | ||||||
* | register reducer | Martin Czygan | 2021-07-15 | 1 | -0/+14 | |
| | ||||||
* | add ZippyWayback reducer | Martin Czygan | 2021-07-15 | 3 | -54/+114 | |
| | ||||||
* | mapper: add cdxu | Martin Czygan | 2021-07-15 | 2 | -0/+22 | |
| | ||||||
* | map: add another mapper | Martin Czygan | 2021-07-15 | 2 | -3/+17 | |
| | ||||||
* | update docs | Martin Czygan | 2021-07-14 | 2 | -11/+11 | |
| | ||||||
* | reduce: add test | Martin Czygan | 2021-07-14 | 2 | -18/+41 | |
| | ||||||
* | reduce: add todo | Martin Czygan | 2021-07-14 | 1 | -0/+2 | |
| | ||||||
* | v0.1.39 | Martin Czygan | 2021-07-14 | 1 | -1/+1 | |
| | ||||||
* | reduce: add csl field | Martin Czygan | 2021-07-14 | 4 | -8/+72 | |
| | ||||||
* | reduce: fix off-by-one error | Martin Czygan | 2021-07-14 | 2 | -2/+2 | |
| | | | | duplication detection required a +1 on the index in the ref document | |||||
* | reduce: temp bug fix for line cutter | Martin Czygan | 2021-07-13 | 2 | -32/+61 | |
| | | | | | | | | we wanted to trim whitespace at one point, because values contained the separator values; however this breaks with empty values; move back to not trimming values except for the newline, when requesting the last value; moving forward, we need to clean or reject dirty values or use a different delimiter | |||||
* | v0.1.38 | Martin Czygan | 2021-07-13 | 1 | -1/+1 | |
| | ||||||
* | reduce: small tweaks | Martin Czygan | 2021-07-13 | 2 | -6/+7 | |
| | ||||||
* | fix typo | Martin Czygan | 2021-07-13 | 1 | -1/+1 | |
| | ||||||
* | wip: csl logging | Martin Czygan | 2021-07-13 | 1 | -1/+1 | |
| | ||||||
* | update docs | Martin Czygan | 2021-07-13 | 1 | -1/+7 | |
| | ||||||
* | reduce/schema: add csl | Martin Czygan | 2021-07-13 | 3 | -5/+70 | |
| | ||||||
* | wiki: include lang in encoded page title | Martin Czygan | 2021-07-13 | 2 | -8/+18 | |
| | ||||||
* | reduce: add todo | Martin Czygan | 2021-07-13 | 1 | -1/+3 | |
| | ||||||
* | separate slugify functions | Martin Czygan | 2021-07-13 | 4 | -28/+39 | |
| | ||||||
* | mock out time.Now for tests | Martin Czygan | 2021-07-13 | 4 | -1034/+1041 | |
| | ||||||
* | reduce: log broken line only | Martin Czygan | 2021-07-10 | 1 | -1/+1 | |
| | ||||||
* | reduce: add key and indexed ts for exact matches | Martin Czygan | 2021-07-10 | 1 | -0/+2 | |
| | ||||||
* | batch: drop logging | Martin Czygan | 2021-07-10 | 1 | -4/+0 | |
| | ||||||
* | batch: log batch size | Martin Czygan | 2021-07-10 | 1 | -1/+1 | |
| | ||||||
* | reduce: short circuit large groups | Martin Czygan | 2021-07-10 | 1 | -2/+12 | |
| | | | | | | | | we saw a jump in memory usage, and it may be related to groups with thousands of elements, e.g. some odd string that appears too many times as a key, such as 123/test; as a first measure, we short circuit further batching; another mitigation may be to limit group size completely | |||||
* | schema: prefer isbn13 | Martin Czygan | 2021-07-10 | 1 | -1/+5 | |
| | ||||||
* | schema: render isbn as well | Martin Czygan | 2021-07-10 | 1 | -1/+7 | |
| | ||||||
* | reduce: ol, fuzzy, w/ unstructured | Martin Czygan | 2021-07-10 | 1 | -1/+1 | |
| | ||||||
* | schema: add test | Martin Czygan | 2021-07-10 | 2 | -0/+20 | |
| |