path: root/skate
Commit message | Author | Age | Files | Lines
* update dependencies | Martin Czygan | 2021-07-27 | 2 | -6/+0
|
* remove unused/partially implemented skate-dot for now | Martin Czygan | 2021-07-27 | 2 | -75/+1
|
* minor tweaks and doc improvements | Martin Czygan | 2021-07-27 | 2 | -49/+45
|
* update todo notes | Martin Czygan | 2021-07-27 | 1 | -2/+22
|
* reuse timestamps | Martin Czygan | 2021-07-27 | 1 | -6/+14
|     while time.Now is not really slow, thanks to vDSO (cf. https://git.io/J4SOH), it will be even faster to just call it once at the start of the processing; also: https://twitter.com/davidcrawshaw/status/1414243408936280073
|     > Turns out time.Now was taking its usual amount of time on linux, ~50 nanoseconds [...]
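
The idea in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the `Record` type and the `processBatch` function are hypothetical names. The timestamp is taken once before the loop and reused for every record.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical output document carrying an indexing timestamp.
type Record struct {
	Key     string
	Indexed time.Time
}

// processBatch (hypothetical) calls time.Now exactly once and reuses the
// value for all records, instead of calling it once per record.
func processBatch(keys []string) []Record {
	now := time.Now().UTC() // single call, before the loop
	records := make([]Record, 0, len(keys))
	for _, k := range keys {
		records = append(records, Record{Key: k, Indexed: now})
	}
	return records
}

func main() {
	for _, r := range processBatch([]string{"a", "b", "c"}) {
		fmt.Println(r.Key, r.Indexed.Format(time.RFC3339Nano))
	}
}
```

A side effect of this design is that all records from one run share an identical timestamp, which may or may not be desirable depending on how the field is used downstream.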
* reduce: explicitly name magic numbers | Martin Czygan | 2021-07-27 | 1 | -3/+8
|
* schema: add note regarding field name | Martin Czygan | 2021-07-27 | 1 | -1/+1
|
* v0.2.1 | Martin Czygan | 2021-07-27 | 1 | -1/+1
|
* schema: tweaks | Martin Czygan | 2021-07-27 | 2 | -5/+112
|     add String() to CSLDate; we only cover a few typical cases
* cleanup and docs | Martin Czygan | 2021-07-27 | 1 | -25/+4
|
* reduce: use pascal case | Martin Czygan | 2021-07-26 | 1 | -2/+2
|
* v0.2.0 | Martin Czygan | 2021-07-26 | 1 | -1/+1
|     lots of tweaks
|     * normalize (e.g., lower-case) DOIs in all (or at least most?) situations, especially for equality comparisons
|     * don't try to parse DOI from ref_key (which may contain a source DOI, but not a target DOI)
|     * switch to using date-parts for year in target_csl output
|     * switch from author.name to author.raw_name in target_csl output (neither are standard; raw_name indicates this better)
|     * pass through match_provenance in unmatched case
|     * in target_csl output, don't always include issued and accessed dates as empty objects (could save significant ES index disk space?)
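
The first item above (normalizing DOIs before equality comparisons) can be illustrated with a small sketch. The `normalizeDOI` function below is hypothetical and is not the `SanitizeDOI` helper mentioned elsewhere in this log; the set of prefixes it strips and the validity check are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDOI is a hypothetical helper (not skate's SanitizeDOI). It trims
// whitespace, lower-cases the value and strips a few common URL or scheme
// prefixes. It returns "" if the result does not look like a DOI.
func normalizeDOI(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	prefixes := []string{
		"https://doi.org/",
		"http://doi.org/",
		"https://dx.doi.org/",
		"http://dx.doi.org/",
		"doi:",
	}
	for _, prefix := range prefixes {
		s = strings.TrimPrefix(s, prefix)
	}
	// Assumed minimal shape check: "10." prefix and a slash somewhere.
	if !strings.HasPrefix(s, "10.") || !strings.Contains(s, "/") {
		return ""
	}
	return s
}

func main() {
	a := normalizeDOI("10.1000/ABC.Def")
	b := normalizeDOI(" https://doi.org/10.1000/abc.def ")
	fmt.Println(a, b, a == b)
}
```

With both sides normalized, a plain string comparison is enough to decide whether two references point at the same DOI.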
* switch to slightly more performant string builder | Martin Czygan | 2021-07-26 | 3 | -43/+41
|
* reduce: mention upcoming change to indexing | Martin Czygan | 2021-07-26 | 1 | -1/+1
|     see: scholar:528804ad2e55983cf3e5e6659d8f46db0cab02b7
* skate: use SanitizeDOI in all inputs | Bryan Newbold | 2021-07-25 | 4 | -22/+9
|
* skate: fast SanitizeDOI helper for normalizing DOIs | Bryan Newbold | 2021-07-25 | 2 | -0/+71
|
* skate unstructured: don't parse DOI out of key | Bryan Newbold | 2021-07-25 | 1 | -16/+0
|     DOIs in keys, usually from Crossref, are the DOI of the *source* of the reference, not the *target* of the reference. Thus, they should not be parsed and copied to the ref.biblio.doi field.
* skate: pass-through match_provenance in more situations | Bryan Newbold | 2021-07-25 | 1 | -0/+2
|
* schema: switch from '.name' to '.raw_name' for un-parsed CSL name field | Bryan Newbold | 2021-07-25 | 3 | -6/+6
|
* skate: use date-parts for year, not 'raw' | Bryan Newbold | 2021-07-25 | 2 | -8/+9
|
* schema: have issued+accessed (CSLDate) actually omitempty | Bryan Newbold | 2021-07-24 | 3 | -5/+5
|     Similar to TargetCSL, these should be pointer types so they don't get encoded as empty objects when not set.
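
The behavior described here follows from how `encoding/json` treats `omitempty`: the option has no effect on struct values, while a nil pointer is omitted. A minimal sketch, using simplified stand-in types (`CSLDate`, `byValue`, `byPointer` and `encode` below are illustrative, not the definitions from the schema package):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CSLDate is a simplified stand-in for illustration only.
type CSLDate struct {
	DateParts [][]int `json:"date-parts,omitempty"`
	Raw       string  `json:"raw,omitempty"`
}

// byValue holds the date as a struct value; omitempty never omits a struct.
type byValue struct {
	Title  string  `json:"title,omitempty"`
	Issued CSLDate `json:"issued,omitempty"`
}

// byPointer holds the date as a pointer; a nil pointer is omitted.
type byPointer struct {
	Title  string   `json:"title,omitempty"`
	Issued *CSLDate `json:"issued,omitempty"`
}

// encode marshals v to a JSON string, returning the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(byValue{Title: "x"}))   // {"title":"x","issued":{}}
	fmt.Println(encode(byPointer{Title: "x"})) // {"title":"x"}
}
```

The trade-off of the pointer variant is that callers need a nil check before reading the date fields.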
* add test for issued,accessed not being included in output JSON | Bryan Newbold | 2021-07-24 | 1 | -0/+17
|
* fix typo in ref schema | Martin Czygan | 2021-07-23 | 1 | -1/+1
|
* v0.1.40 | Martin Czygan | 2021-07-22 | 1 | -1/+1
|
* cleanup (old) clustering related code | Martin Czygan | 2021-07-22 | 3 | -177/+39
|
* minor doc fixes | Martin Czygan | 2021-07-21 | 2 | -4/+7
|
* xio: improve naming | Martin Czygan | 2021-07-21 | 3 | -33/+30
|
* reduce: use fixed length sha1 for url id part | Martin Czygan | 2021-07-20 | 1 | -3/+5
|     base32 would occasionally exceed elasticsearch id field limit ("must be no longer than 512 bytes but was: 649")
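
Why a hash helps here: an encoding such as base32 grows with the length of the input URL, while a SHA-1 digest always renders as 40 hex characters. The sketch below assumes a hypothetical `urlIDPart` helper; the actual id layout used by the reducer is not shown in this log.

```go
package main

import (
	"crypto/sha1"
	"encoding/base32"
	"fmt"
	"strings"
)

// urlIDPart (hypothetical) returns a fixed-length identifier component for
// a URL: the SHA-1 digest rendered as 40 hex characters.
func urlIDPart(rawurl string) string {
	return fmt.Sprintf("%x", sha1.Sum([]byte(rawurl)))
}

func main() {
	long := "https://example.com/" + strings.Repeat("a", 400)

	// The base32 encoding grows with the URL and can exceed a 512 byte limit.
	enc := base32.StdEncoding.EncodeToString([]byte(long))
	fmt.Println("base32 length:", len(enc))

	// The SHA-1 based part has the same length for any input.
	fmt.Println("sha1 length:", len(urlIDPart(long)))
}
```

The cost of this choice is that the id is no longer reversible to the original URL, so the URL itself has to be stored in a separate field if it is needed.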
* reduce: fix wb id | Martin Czygan | 2021-07-20 | 1 | -1/+1
|
* reduce: a preliminary id for wb links | Martin Czygan | 2021-07-20 | 1 | -0/+5
|
* reduce: temp fix 0 source release year | Martin Czygan | 2021-07-19 | 1 | -1/+4
|
* cleanup another script | Martin Czygan | 2021-07-17 | 5 | -311/+72
|
* cleanup skate-bref-id | Martin Czygan | 2021-07-17 | 2 | -42/+1
|
* reduce: use correct reducer | Martin Czygan | 2021-07-15 | 1 | -2/+2
|
* register reducer | Martin Czygan | 2021-07-15 | 1 | -0/+14
|
* add ZippyWayback reducer | Martin Czygan | 2021-07-15 | 3 | -54/+114
|
* mapper: add cdxu | Martin Czygan | 2021-07-15 | 2 | -0/+22
|
* map: add another mapper | Martin Czygan | 2021-07-15 | 2 | -3/+17
|
* update docs | Martin Czygan | 2021-07-14 | 2 | -11/+11
|
* reduce: add test | Martin Czygan | 2021-07-14 | 2 | -18/+41
|
* reduce: add todo | Martin Czygan | 2021-07-14 | 1 | -0/+2
|
* v0.1.39 | Martin Czygan | 2021-07-14 | 1 | -1/+1
|
* reduce: add csl field | Martin Czygan | 2021-07-14 | 4 | -8/+72
|
* reduce: fix off-by-one error | Martin Czygan | 2021-07-14 | 2 | -2/+2
|     duplication detection required a +1 on the index in the ref document
* reduce: temp bug fix for line cutter | Martin Czygan | 2021-07-13 | 2 | -32/+61
|     we wanted to trim whitespace at one point, because values contained the separator values; however this breaks with empty values; move back to not trimming values except for the newline, when requesting the last value; moving forward, we need to clean or reject dirty values or use a different delimiter
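
The failure mode described above can be reproduced in a few lines. In this sketch, `cutColumn` is a hypothetical stand-in for the line cutter, and a tab is assumed as the delimiter. Only the trailing newline is removed, so an empty last column survives, whereas trimming all whitespace would also remove the separator in front of it.

```go
package main

import (
	"fmt"
	"strings"
)

// cutColumn (hypothetical) returns the value in the given 1-based column of
// a delimited line. Only a trailing newline is removed; values are otherwise
// left untouched, so empty columns are preserved.
func cutColumn(line, sep string, column int) (string, bool) {
	fields := strings.Split(strings.TrimRight(line, "\n"), sep)
	if column < 1 || column > len(fields) {
		return "", false
	}
	return fields[column-1], true
}

func main() {
	line := "id-1\t\n" // the second column is empty

	v, ok := cutColumn(line, "\t", 2)
	fmt.Printf("value=%q found=%v\n", v, ok)

	// Trimming all whitespace first also eats the tab, so the empty
	// second column disappears and only one field remains.
	fields := strings.Split(strings.TrimSpace(line), "\t")
	fmt.Println("fields after TrimSpace:", len(fields))
}
```

This keeps empty values intact but does nothing about values that themselves contain the delimiter; as the commit message notes, those still need to be cleaned, rejected, or handled with a different delimiter.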
* v0.1.38 | Martin Czygan | 2021-07-13 | 1 | -1/+1
|
* reduce: small tweaks | Martin Czygan | 2021-07-13 | 2 | -6/+7
|
* fix typo | Martin Czygan | 2021-07-13 | 1 | -1/+1
|
* wip: csl logging | Martin Czygan | 2021-07-13 | 1 | -1/+1
|
* update docs | Martin Czygan | 2021-07-13 | 1 | -1/+7
|