Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | update todo notes | Martin Czygan | 2021-07-27 | 1 | -2/+22 |
| | |||||
* | reuse timestamps | Martin Czygan | 2021-07-27 | 1 | -6/+14 |
| | | | | | | | | | | while time.Now is not really slow, thanks to vDSO (cf. https://git.io/J4SOH), it is faster still to call it just once at the start of processing; see also https://twitter.com/davidcrawshaw/status/1414243408936280073: "Turns out time.Now was taking its usual amount of time on linux, ~50 nanoseconds [...]" | ||||
* | reduce: explicitly name magic numbers | Martin Czygan | 2021-07-27 | 1 | -3/+8 |
| | |||||
* | schema: add note regarding field name | Martin Czygan | 2021-07-27 | 1 | -1/+1 |
| | |||||
* | v0.2.1 | Martin Czygan | 2021-07-27 | 1 | -1/+1 |
| | |||||
* | schema: tweaks | Martin Czygan | 2021-07-27 | 2 | -5/+112 |
| | | | | add String() to CSLDate; we only cover a few typical cases | ||||
* | cleanup and docs | Martin Czygan | 2021-07-27 | 1 | -25/+4 |
| | |||||
* | reduce: use pascal case | Martin Czygan | 2021-07-26 | 1 | -2/+2 |
| | |||||
* | v0.2.0 | Martin Czygan | 2021-07-26 | 1 | -1/+1 |
| | | | | | | | | | | | | lots of tweaks: (1) normalize (e.g., lower-case) DOIs in all (or at least most?) situations, especially for equality comparisons; (2) don't try to parse DOI from ref_key (which may contain a source DOI, but not a target DOI); (3) switch to using date-parts for year in target_csl output; (4) switch from author.name to author.raw_name in target_csl output (neither are standard; raw_name indicates this better); (5) pass through match_provenance in unmatched case; (6) in target_csl output, don't always include issued and accessed dates as empty objects (could save significant ES index disk space?) | ||||
* | switch to slightly more performant string builder | Martin Czygan | 2021-07-26 | 3 | -43/+41 |
| | |||||
* | reduce: mention upcoming change to indexing | Martin Czygan | 2021-07-26 | 1 | -1/+1 |
| | | | | see: scholar:528804ad2e55983cf3e5e6659d8f46db0cab02b7 | ||||
* | Merge branch 'bnewbold-skate-tweaks' into 'master' | Martin Czygan | 2021-07-26 | 9 | -54/+116 |
|\ | | | | | | | | | proposed changes and fixes to skate matching. See merge request martin/cgraph!3 | ||||
| * | skate: use SanitizeDOI in all inputs | Bryan Newbold | 2021-07-25 | 4 | -22/+9 |
| | | |||||
| * | skate: fast SanitizeDOI helper for normalizing DOIs | Bryan Newbold | 2021-07-25 | 2 | -0/+71 |
| | | |||||
| * | skate unstructured: don't parse DOI out of key | Bryan Newbold | 2021-07-25 | 1 | -16/+0 |
| | | | | | | | | | | | | DOIs in keys, usually from Crossref, are the DOI of the *source* of the reference, not the *target* of the reference. Thus, they should not be parsed and copied to the ref.biblio.doi field. | ||||
| * | skate: pass-through match_provenance in more situations | Bryan Newbold | 2021-07-25 | 1 | -0/+2 |
| | | |||||
| * | schema: switch from '.name' to '.raw_name' for un-parsed CSL name field | Bryan Newbold | 2021-07-25 | 3 | -6/+6 |
| | | |||||
| * | skate: use date-parts for year, not 'raw' | Bryan Newbold | 2021-07-25 | 2 | -8/+9 |
| | | |||||
| * | schema: have issued+accessed (CSLDate) actually omitempty | Bryan Newbold | 2021-07-24 | 3 | -5/+5 |
| | | | | | | | | | | Similar to TargetCSL, these should be pointer types so they don't get encoded as empty objects when not set. | ||||
| * | add test for issued,accessed not being included in output JSON | Bryan Newbold | 2021-07-24 | 1 | -0/+17 |
| | | |||||
* | | ci: show coverage | Martin Czygan | 2021-07-26 | 1 | -2/+1 |
| | | |||||
* | | add ci script | Martin Czygan | 2021-07-26 | 1 | -0/+6 |
|/ | |||||
* | tasks: simplify url list task | Martin Czygan | 2021-07-23 | 1 | -4/+1 |
| | |||||
* | tasks: update docs | Martin Czygan | 2021-07-23 | 1 | -64/+15 |
| | |||||
* | fix typo in ref schema | Martin Czygan | 2021-07-23 | 1 | -1/+1 |
| | |||||
* | start mag notes | Martin Czygan | 2021-07-22 | 1 | -0/+21 |
| | |||||
* | v0.1.4 | Martin Czygan | 2021-07-22 | 1 | -1/+1 |
| | |||||
* | update docs | Martin Czygan | 2021-07-22 | 1 | -4/+1 |
| | |||||
* | update makefile | Martin Czygan | 2021-07-22 | 1 | -3/+0 |
| | |||||
* | apply style fixes | Martin Czygan | 2021-07-22 | 2 | -0/+4 |
| | |||||
* | update README | Martin Czygan | 2021-07-22 | 5 | -4/+39 |
| | |||||
* | cli: show full path | Martin Czygan | 2021-07-22 | 1 | -1/+1 |
| | |||||
* | cli: display TAG directory | Martin Czygan | 2021-07-22 | 1 | -0/+1 |
| | |||||
* | add missing import | Martin Czygan | 2021-07-22 | 1 | -0/+1 |
| | |||||
* | add luigi as dependency | Martin Czygan | 2021-07-22 | 1 | -0/+1 |
| | |||||
* | remove reference to gluish | Martin Czygan | 2021-07-22 | 1 | -1/+1 |
| | |||||
* | cleanup currently unused dependencies | Martin Czygan | 2021-07-22 | 4 | -18/+344 |
| | | | | code from gluish copied into base.py | ||||
* | v0.1.40 | Martin Czygan | 2021-07-22 | 1 | -1/+1 |
| | |||||
* | cleanup (old) clustering related code | Martin Czygan | 2021-07-22 | 3 | -177/+39 |
| | |||||
* | minor doc fixes | Martin Czygan | 2021-07-21 | 2 | -4/+7 |
| | |||||
* | xio: improve naming | Martin Czygan | 2021-07-21 | 3 | -33/+30 |
| | |||||
* | reduce: use fixed length sha1 for url id part | Martin Czygan | 2021-07-20 | 1 | -3/+5 |
| | | | | | base32 would occasionally exceed elasticsearch id field limit ("must be no longer than 512 bytes but was: 649") | ||||
* | tasks: increase default limit for cdx | Martin Czygan | 2021-07-20 | 1 | -1/+1 |
| | |||||
* | reduce: fix wb id | Martin Czygan | 2021-07-20 | 1 | -1/+1 |
| | |||||
* | reduce: a preliminary id for wb links | Martin Czygan | 2021-07-20 | 1 | -0/+5 |
| | |||||
* | es indexing: update notes | Martin Czygan | 2021-07-20 | 1 | -1/+2 |
| | |||||
* | reduce: temp fix 0 source release year | Martin Czygan | 2021-07-19 | 1 | -1/+4 |
| | |||||
* | update notes on es indexing | Martin Czygan | 2021-07-19 | 1 | -0/+11 |
| | |||||
* | cleanup another script | Martin Czygan | 2021-07-17 | 5 | -311/+72 |
| | |||||
* | cleanup skate-bref-id | Martin Czygan | 2021-07-17 | 2 | -42/+1 |
| |
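
The commit "schema: have issued+accessed (CSLDate) actually omitempty" above notes that these fields should be pointer types so they do not get encoded as empty objects when not set. The following is a minimal sketch of that `encoding/json` behavior, not the project's actual schema: the `CSLDate` fields shown here and the names `byValue`, `byPointer` and `encode` are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CSLDate is a simplified, hypothetical stand-in for the schema type.
type CSLDate struct {
	DateParts [][]int `json:"date-parts,omitempty"`
	Raw       string  `json:"raw,omitempty"`
}

// byValue holds the date as a struct value. omitempty never omits a
// struct value, so an unset date is still encoded as {}.
type byValue struct {
	Title  string  `json:"title,omitempty"`
	Issued CSLDate `json:"issued,omitempty"`
}

// byPointer holds the date as a pointer. A nil pointer counts as empty,
// so the key is dropped from the output entirely.
type byPointer struct {
	Title  string   `json:"title,omitempty"`
	Issued *CSLDate `json:"issued,omitempty"`
}

// encode marshals v to a JSON string; it panics on error, which is
// acceptable for a sketch but not for production code.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(byValue{Title: "x"}))   // {"title":"x","issued":{}}
	fmt.Println(encode(byPointer{Title: "x"})) // {"title":"x"}
	set := byPointer{Title: "x", Issued: &CSLDate{DateParts: [][]int{{2021}}}}
	fmt.Println(encode(set)) // {"title":"x","issued":{"date-parts":[[2021]]}}
}
```

Across many indexed documents, dropping the empty `"issued":{}` and `"accessed":{}` keys is what the v0.2.0 note suggests could save index disk space.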
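
The commit "reduce: use fixed length sha1 for url id part" above reports that a base32-encoded URL would occasionally exceed the Elasticsearch id length limit of 512 bytes. The sketch below illustrates why, assuming the id part was derived by encoding the URL directly; the helper names `base32ID` and `sha1ID` and the example URL are hypothetical and not taken from the repository.

```go
package main

import (
	"crypto/sha1"
	"encoding/base32"
	"encoding/hex"
	"fmt"
	"strings"
)

// base32ID encodes the URL bytes directly. The result grows with the
// input: 8 output characters for every 5 input bytes (with padding).
func base32ID(url string) string {
	return base32.StdEncoding.EncodeToString([]byte(url))
}

// sha1ID hashes the URL first. The hex digest is always 40 characters,
// regardless of how long the URL is.
func sha1ID(url string) string {
	sum := sha1.Sum([]byte(url))
	return hex.EncodeToString(sum[:])
}

func main() {
	// A 420 byte URL: 20 bytes of prefix plus 400 bytes of path.
	long := "https://example.org/" + strings.Repeat("a", 400)
	fmt.Println(len(base32ID(long)), len(sha1ID(long))) // 672 40
}
```

The trade-off is that a hash is not reversible, so the original URL has to be stored in a separate field if it is needed later.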