Commit message (author, date, files changed, lines -removed/+added)
* add missing import (Martin Czygan, 2021-07-22, 1 file, -0/+1)
* add luigi as dependency (Martin Czygan, 2021-07-22, 1 file, -0/+1)
* remove reference to gluish (Martin Czygan, 2021-07-22, 1 file, -1/+1)
* cleanup currently unused dependencies (Martin Czygan, 2021-07-22, 4 files, -18/+344)
  code from gluish copied into base.py
* v0.1.40 (Martin Czygan, 2021-07-22, 1 file, -1/+1)
* cleanup (old) clustering related code (Martin Czygan, 2021-07-22, 3 files, -177/+39)
* minor doc fixes (Martin Czygan, 2021-07-21, 2 files, -4/+7)
* xio: improve naming (Martin Czygan, 2021-07-21, 3 files, -33/+30)
* reduce: use fixed length sha1 for url id part (Martin Czygan, 2021-07-20, 1 file, -3/+5)
  base32 would occasionally exceed the Elasticsearch id field limit ("must be no longer than 512 bytes but was: 649")
* tasks: increase default limit for cdx (Martin Czygan, 2021-07-20, 1 file, -1/+1)
* reduce: fix wb id (Martin Czygan, 2021-07-20, 1 file, -1/+1)
* reduce: a preliminary id for wb links (Martin Czygan, 2021-07-20, 1 file, -0/+5)
* es indexing: update notes (Martin Czygan, 2021-07-20, 1 file, -1/+2)
* reduce: temp fix 0 source release year (Martin Czygan, 2021-07-19, 1 file, -1/+4)
* update notes on es indexing (Martin Czygan, 2021-07-19, 1 file, -0/+11)
* cleanup another script (Martin Czygan, 2021-07-17, 5 files, -311/+72)
* cleanup skate-bref-id (Martin Czygan, 2021-07-17, 2 files, -42/+1)
* update indexing notes (Martin Czygan, 2021-07-17, 1 file, -0/+38)
* tasks: add data point (Martin Czygan, 2021-07-16, 1 file, -2/+3)
* reduce: use correct reducer (Martin Czygan, 2021-07-15, 1 file, -2/+2)
* tasks: ignore exit code 141 for now (Martin Czygan, 2021-07-15, 1 file, -1/+1)
* tasks: add BrefZipWayback (Martin Czygan, 2021-07-15, 1 file, -0/+20)
* register reducer (Martin Czygan, 2021-07-15, 1 file, -0/+14)
* add ZippyWayback reducer (Martin Czygan, 2021-07-15, 3 files, -54/+114)
* tasks: reduce sample size (Martin Czygan, 2021-07-15, 1 file, -1/+1)
* tasks: tweak CDXURL (Martin Czygan, 2021-07-15, 1 file, -2/+2)
* tasks: fix command (Martin Czygan, 2021-07-15, 1 file, -0/+1)
* tasks: tweak CDXURL (Martin Czygan, 2021-07-15, 1 file, -3/+5)
* tasks: add CDXURL (Martin Czygan, 2021-07-15, 1 file, -0/+28)
* mapper: add cdxu (Martin Czygan, 2021-07-15, 2 files, -0/+22)
* tasks: cleanup urls (Martin Czygan, 2021-07-15, 1 file, -0/+1)
* notes: add unique example (Martin Czygan, 2021-07-15, 1 file, -1/+1)
* tasks: add RefsURL (Martin Czygan, 2021-07-15, 1 file, -0/+26)
* map: add another mapper (Martin Czygan, 2021-07-15, 2 files, -3/+17)
* cdx reshape: only include hits (Martin Czygan, 2021-07-15, 1 file, -2/+1)
* cdx reshape: write json (Martin Czygan, 2021-07-15, 1 file, -2/+2)
* extra: cdx reshape (Martin Czygan, 2021-07-15, 1 file, -0/+19)
* update notes (Martin Czygan, 2021-07-15, 1 file, -0/+9)
* update docs (Martin Czygan, 2021-07-14, 2 files, -11/+11)
* reduce: add test (Martin Czygan, 2021-07-14, 2 files, -18/+41)
* notes: 2021-07-06 version (Martin Czygan, 2021-07-14, 1 file, -0/+38)
* tasks: update docs (Martin Czygan, 2021-07-14, 1 file, -0/+2)
* tasks: add performance note (Martin Czygan, 2021-07-14, 1 file, -0/+7)
* reduce: add todo (Martin Czygan, 2021-07-14, 1 file, -0/+2)
* v0.1.39 (Martin Czygan, 2021-07-14, 1 file, -1/+1)
* reduce: add csl field (Martin Czygan, 2021-07-14, 4 files, -8/+72)
* reduce: fix off-by-one error (Martin Czygan, 2021-07-14, 2 files, -2/+2)
  duplication detection required a +1 on the index in the ref document
* tasks: only include docs with a work id (Martin Czygan, 2021-07-14, 1 file, -4/+2)
* reduce: temp bug fix for line cutter (Martin Czygan, 2021-07-13, 2 files, -32/+61)
  we wanted to trim whitespace at one point, because values contained the separator; however, this breaks with empty values. Move back to not trimming values, except for the trailing newline when requesting the last value. Moving forward, we need to clean or reject dirty values, or use a different delimiter.
* v0.1.38 (Martin Czygan, 2021-07-13, 1 file, -1/+1)
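
The commit "reduce: use fixed length sha1 for url id part" replaces a base32 encoding of the URL with a SHA-1 digest because long URLs pushed document ids past Elasticsearch's 512-byte id limit. Base32 emits 8 characters per 5 input bytes, so its length grows with the URL, while a SHA-1 hex digest is always 40 characters. The sketch below illustrates the difference; `make_doc_id` and its `<prefix>_<sha1>` layout are hypothetical, since the log does not show the actual id format.

```python
import base64
import hashlib

# Elasticsearch rejects document ids longer than 512 bytes
# (the limit quoted in the commit message).
ES_ID_MAX_BYTES = 512


def base32_url_part(url: str) -> str:
    """Variable length: 8 output characters per 5 input bytes, rounded up."""
    return base64.b32encode(url.encode("utf-8")).decode("ascii")


def sha1_url_part(url: str) -> str:
    """Fixed length: always 40 hex characters, however long the URL is."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def make_doc_id(prefix: str, url: str) -> str:
    """Hypothetical id layout "<prefix>_<sha1(url)>", checked against the limit."""
    doc_id = f"{prefix}_{sha1_url_part(url)}"
    if len(doc_id.encode("utf-8")) > ES_ID_MAX_BYTES:
        raise ValueError("document id exceeds the Elasticsearch id limit")
    return doc_id
```

With a 420-byte URL, `base32_url_part` yields 672 characters, over the limit, while `make_doc_id("wb", url)` stays at 43. The trade-off is that a hash is not reversible: the URL can no longer be decoded from the id and has to be kept in a separate field.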
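
The commit "tasks: ignore exit code 141 for now" tolerates one specific failure status. 141 is 128 + 13, the way a shell reports a process killed by SIGPIPE (signal 13 on Linux and macOS). In a pipeline such as `... | head -n 100` run with `set -o pipefail`, the upstream command is typically killed this way once `head` stops reading. A minimal sketch, assuming commands are run through bash with `pipefail`; `run_pipeline` and `exit_ok` are hypothetical helpers, not the task code from the repository.

```python
import signal
import subprocess

# The shell reports a process killed by signal N as 128 + N.
# SIGPIPE is 13 on Linux and macOS, so this evaluates to 141.
SIGPIPE_EXIT = 128 + int(signal.SIGPIPE)

# Exit statuses treated as success; 141 is tolerated "for now".
ACCEPTED_EXIT_CODES = frozenset({0, SIGPIPE_EXIT})


def exit_ok(code: int) -> bool:
    """Return True if a pipeline's exit status should count as success."""
    return code in ACCEPTED_EXIT_CODES


def run_pipeline(cmd: str) -> int:
    """Run `cmd` with bash under pipefail; raise unless the status is accepted."""
    proc = subprocess.run(["bash", "-c", "set -o pipefail; " + cmd])
    if not exit_ok(proc.returncode):
        raise RuntimeError(f"exit code {proc.returncode}: {cmd}")
    return proc.returncode
```

For example, `run_pipeline("yes | head -n 1")` typically returns 141 instead of raising. Accepting 141 unconditionally can also mask a genuine broken pipe elsewhere in a pipeline, which is the cost of this shortcut.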
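
The commits "add ZippyWayback reducer" and "tasks: add BrefZipWayback" suggest joining two inputs that share a key, presumably references on one side and wayback/CDX records on the other. A common way to do this on data too large for memory is to sort both inputs by the same key and walk them in lockstep, a merge join. A minimal sketch, assuming both streams are already sorted by key and arrive as `(key, value)` pairs; `zippy` is a hypothetical name and this is not the reducer from the repository.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")

_DONE = (None, None)  # returned by next() once a stream is exhausted


def zippy(
    left: Iterable[Tuple[str, A]],
    right: Iterable[Tuple[str, B]],
    reduce_fn: Callable[[str, List[A], List[B]], R],
) -> Iterator[R]:
    """Merge-join two key-sorted streams.

    Calls reduce_fn(key, left_values, right_values) once for every key that
    occurs in both inputs; keys present on only one side are skipped.
    """
    lgroups = groupby(left, key=itemgetter(0))
    rgroups = groupby(right, key=itemgetter(0))
    lkey, lgroup = next(lgroups, _DONE)
    rkey, rgroup = next(rgroups, _DONE)
    while lkey is not None and rkey is not None:
        if lkey < rkey:
            lkey, lgroup = next(lgroups, _DONE)
        elif lkey > rkey:
            rkey, rgroup = next(rgroups, _DONE)
        else:
            # Materialize both groups before advancing: groupby invalidates them.
            yield reduce_fn(lkey, [v for _, v in lgroup], [v for _, v in rgroup])
            lkey, lgroup = next(lgroups, _DONE)
            rkey, rgroup = next(rgroups, _DONE)
```

Both inputs must be sorted in the order Python compares strings; for UTF-8 text files, `LC_ALL=C sort` yields byte order, which matches code point order. Memory use is bounded by the largest group for a single key rather than by the size of the inputs.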
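
The commits "extra: cdx reshape", "cdx reshape: write json", and "cdx reshape: only include hits" describe a small script that turns CDX output into JSON. The log shows neither the input format nor what counts as a "hit", so the sketch below makes two labeled assumptions: each line carries the Wayback CDX server's seven default space-separated fields, and a hit is a capture with HTTP status 200. `parse_cdx_line`, `is_hit`, and `reshape` are hypothetical names.

```python
import json
import sys
from typing import Dict, Iterable, Iterator, Optional

# Assumption: the Wayback CDX server's default output columns, in order.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length")


def parse_cdx_line(line: str) -> Optional[Dict[str, str]]:
    """Split one space-separated CDX line into a dict, or None if malformed."""
    parts = line.split()
    if len(parts) != len(CDX_FIELDS):
        return None
    return dict(zip(CDX_FIELDS, parts))


def is_hit(doc: Dict[str, str]) -> bool:
    # Assumption: a "hit" is a capture that was served with HTTP 200.
    return doc["statuscode"] == "200"


def reshape(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object (as a string) per CDX line that counts as a hit."""
    for line in lines:
        doc = parse_cdx_line(line)
        if doc is not None and is_hit(doc):
            yield json.dumps(doc)


if __name__ == "__main__":
    for out in reshape(sys.stdin):
        print(out)
```

Used as a filter (script name hypothetical), e.g. `python reshape.py < captures.cdx > captures.json`, it writes one JSON object per line and silently drops lines that do not have exactly seven fields.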
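
The commit "reduce: add csl field" attaches a CSL (Citation Style Language) representation to reduced documents. CSL-JSON items require `id` and `type` and use variables such as `title`, `author` (a list of name objects), `issued` (with `date-parts`), `container-title`, and `DOI`. A minimal sketch of building such an item from a hypothetical flat reference record; the input keys (`key`, `authors`, `year`, `container`, `doi`) are assumptions. Treating a year of 0 as unknown echoes the commit "reduce: temp fix 0 source release year", but how that commit actually handles it is not shown.

```python
from typing import Any, Dict


def to_csl(ref: Dict[str, Any]) -> Dict[str, Any]:
    """Build a CSL-JSON item from a hypothetical flat reference record."""
    # "id" and "type" are required in CSL-JSON; the default type is an assumption.
    item: Dict[str, Any] = {"id": ref["key"], "type": ref.get("type", "article-journal")}
    if ref.get("title"):
        item["title"] = ref["title"]
    if ref.get("authors"):
        # "literal" holds a name that has not been split into family/given parts.
        item["author"] = [{"literal": name} for name in ref["authors"]]
    # Assumption: a missing year or a year of 0 means "unknown" and is omitted.
    year = int(ref.get("year") or 0)
    if year > 0:
        item["issued"] = {"date-parts": [[year]]}
    if ref.get("container"):
        item["container-title"] = ref["container"]
    if ref.get("doi"):
        item["DOI"] = ref["doi"]
    return item
```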
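
The commit "reduce: temp bug fix for line cutter" explains why trimming is risky with a delimited format: `str.strip()` removes whitespace at both ends of the line, and a tab delimiter is itself whitespace, so an empty first or last field vanishes and column positions shift. A minimal sketch contrasting the two behaviors, assuming tab-separated lines; `cut_field` and `cut_field_trimmed` are hypothetical names. As in the commit message, only the trailing newline is removed, and only from the last field.

```python
def cut_field_trimmed(line: str, index: int, sep: str = "\t") -> str:
    """Problematic variant: strip() also removes separators at either end."""
    return line.strip().split(sep)[index]


def cut_field(line: str, index: int, sep: str = "\t") -> str:
    """Return the 0-based column `index` without trimming its value.

    Only the trailing newline is removed, and only when the last column
    is requested; any other whitespace is kept as-is.
    """
    fields = line.split(sep)
    if not 0 <= index < len(fields):
        raise IndexError(f"line has {len(fields)} fields, wanted index {index}")
    value = fields[index]
    if index == len(fields) - 1:
        value = value.rstrip("\n")
    return value
```

For `"a\tb\t\n"`, `cut_field(line, 2)` returns the empty string, whereas the trimmed variant sees only two fields and raises `IndexError`. For `"\tb\tc\n"`, the trimmed variant returns `"b"` for column 0 instead of the empty value.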