path: root/skate/reduce.go
Commit message | Author | Age | Files | Lines
* update docs | Martin Czygan | 2021-07-27 | 1 | -4/+1
* reuse timestamps | Martin Czygan | 2021-07-27 | 1 | -6/+14
    while time.Now is not really slow, thanks to vDSO (cf. https://git.io/J4SOH), it will be even faster to just call it once at the start of the processing; also: https://twitter.com/davidcrawshaw/status/1414243408936280073
    > Turns out http://time.Now was taking its usual amount of time on linux, ~50 nanoseconds [...]
* reduce: explicitly name magic numbers | Martin Czygan | 2021-07-27 | 1 | -3/+8
* reduce: use pascal case | Martin Czygan | 2021-07-26 | 1 | -2/+2
* reduce: mention upcoming change to indexing | Martin Czygan | 2021-07-26 | 1 | -1/+1
    see: scholar:528804ad2e55983cf3e5e6659d8f46db0cab02b7
* skate: pass-through match_provenance in more situations | Bryan Newbold | 2021-07-25 | 1 | -0/+2
* schema: switch from '.name' to '.raw_name' for un-parsed CSL name field | Bryan Newbold | 2021-07-25 | 1 | -2/+2
* skate: use date-parts for year, not 'raw' | Bryan Newbold | 2021-07-25 | 1 | -6/+7
* schema: have issued+accessed (CSLDate) actually omitempty | Bryan Newbold | 2021-07-24 | 1 | -1/+1
    Similar to TargetCSL, these should be pointer types so they don't get encoded as empty objects when not set.
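
The behaviour described here follows from how encoding/json treats `omitempty`: a struct value is never considered empty, while a nil pointer is. A minimal sketch, assuming a simplified `CSLDate` with a single `Raw` field (the real schema type may differ; `byValue`, `byPointer` and `encode` are hypothetical names):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CSLDate is a simplified stand-in for the schema type; the field set is
// an assumption made for this example.
type CSLDate struct {
	Raw string `json:"raw,omitempty"`
}

// byValue embeds the date as a struct value; omitempty has no effect here.
type byValue struct {
	Issued CSLDate `json:"issued,omitempty"`
}

// byPointer uses a pointer; a nil pointer is omitted from the output.
type byPointer struct {
	Issued *CSLDate `json:"issued,omitempty"`
}

// encode marshals v to a JSON string, returning the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(byValue{}))   // {"issued":{}}
	fmt.Println(encode(byPointer{})) // {}
}
```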
* xio: improve naming | Martin Czygan | 2021-07-21 | 1 | -7/+7
* reduce: use fixed length sha1 for url id part | Martin Czygan | 2021-07-20 | 1 | -3/+5
    base32 would occasionally exceed elasticsearch id field limit ("must be no longer than 512 bytes but was: 649")
* reduce: fix wb id | Martin Czygan | 2021-07-20 | 1 | -1/+1
* reduce: a preliminary id for wb links | Martin Czygan | 2021-07-20 | 1 | -0/+5
* reduce: temp fix 0 source release year | Martin Czygan | 2021-07-19 | 1 | -1/+4
* add ZippyWayback reducer | Martin Czygan | 2021-07-15 | 1 | -1/+59
* update docs | Martin Czygan | 2021-07-14 | 1 | -8/+7
* reduce: add test | Martin Czygan | 2021-07-14 | 1 | -18/+21
* reduce: add todo | Martin Czygan | 2021-07-14 | 1 | -0/+2
* reduce: add csl field | Martin Czygan | 2021-07-14 | 1 | -3/+32
* reduce: fix off-by-one error | Martin Czygan | 2021-07-14 | 1 | -1/+1
    duplication detection required a +1 on the index in the ref document
* reduce: temp bug fix for line cutter | Martin Czygan | 2021-07-13 | 1 | -1/+5
    we wanted to trim whitespace at one point, because values contained the separator values; however this breaks with empty values; move back to not trimming values except for the newline, when requesting the last value; moving forward, we need to clean or reject dirty values or use a different delimiter
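
The failure mode described above can be reproduced with a small sketch. It assumes a tab-separated line and a hypothetical `cut` helper (the name, signature and 1-based column index are assumptions, not the function from this repository): values are left untrimmed, and only the trailing newline is stripped when the last column is requested.

```go
package main

import (
	"fmt"
	"strings"
)

// cut returns the value in the given 1-based column of line, split on sep.
// Values are not trimmed; only a trailing newline is removed, and only when
// the last column is requested. Out-of-range columns yield an empty string.
func cut(line, sep string, column int) string {
	parts := strings.Split(line, sep)
	if column < 1 || column > len(parts) {
		return ""
	}
	v := parts[column-1]
	if column == len(parts) {
		v = strings.TrimRight(v, "\r\n")
	}
	return v
}

func main() {
	line := "a\tb\t\n" // three columns, the last one is empty

	// Without trimming the line, the empty third column is still there.
	fmt.Printf("%q\n", cut(line, "\t", 3)) // ""

	// Trimming whitespace first also removes the trailing separator,
	// so the line appears to have only two columns.
	fmt.Println(len(strings.Split(strings.TrimSpace(line), "\t"))) // 2
}
```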
* reduce: small tweaks | Martin Czygan | 2021-07-13 | 1 | -3/+4
* wip: csl logging | Martin Czygan | 2021-07-13 | 1 | -1/+1
* update docs | Martin Czygan | 2021-07-13 | 1 | -1/+7
* reduce/schema: add csl | Martin Czygan | 2021-07-13 | 1 | -1/+7
* wiki: include lang in encoded page title | Martin Czygan | 2021-07-13 | 1 | -7/+12
* reduce: add todo | Martin Czygan | 2021-07-13 | 1 | -1/+3
* mock out time.Now for tests | Martin Czygan | 2021-07-13 | 1 | -3/+6
* reduce: log broken line only | Martin Czygan | 2021-07-10 | 1 | -1/+1
* reduce: add key and indexed ts for exact matches | Martin Czygan | 2021-07-10 | 1 | -0/+2
* reduce: ol, fuzzy, w/ unstructured | Martin Czygan | 2021-07-10 | 1 | -1/+1
* release to unstructured stub | Martin Czygan | 2021-07-10 | 1 | -2/+2
* reduce: open library id tweaks | Martin Czygan | 2021-07-10 | 1 | -5/+27
* reduce: tweak wiki bref | Martin Czygan | 2021-07-10 | 1 | -4/+5
* reduce: filter out duplicate wiki links | Martin Czygan | 2021-07-10 | 1 | -0/+8
* wiki: use lowercase base32 of page title | Martin Czygan | 2021-07-09 | 1 | -2/+3
    * mostly case insensitive, same case as ident
* reduce: use a base64 encoded title as key | Martin Czygan | 2021-07-09 | 1 | -1/+7
* reduce: wiki doc in column 3 | Martin Czygan | 2021-07-09 | 1 | -1/+1
* reduce: move batch size | Martin Czygan | 2021-07-09 | 1 | -8/+6
* reduce: set default batch size | Martin Czygan | 2021-07-08 | 1 | -6/+8
* simplify imports | Martin Czygan | 2021-07-08 | 1 | -1/+1
* reduce: separate batch calls | Martin Czygan | 2021-07-08 | 1 | -18/+18
* reduce: remove log line | Martin Czygan | 2021-07-06 | 1 | -1/+0
* reduce: move to threaded versions | Martin Czygan | 2021-07-06 | 1 | -25/+30
* add resource usage note | Martin Czygan | 2021-07-06 | 1 | -1/+1
* wip: improve reduce performance | Martin Czygan | 2021-07-06 | 1 | -50/+8
* wip: debug with stdlib json | Martin Czygan | 2021-07-05 | 1 | -1/+3
* we need a safe encoder, not just a safe writer | Martin Czygan | 2021-07-05 | 1 | -1/+1
* reduce: hard-code batch size for testing | Martin Czygan | 2021-07-05 | 1 | -0/+1
* test-run: batch reduce processing for performance | Martin Czygan | 2021-07-05 | 1 | -4/+6