| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -0/+15 |
| | Was going to use textpipe, but dependency was too large and failed to install with halfway modern GCC (due to CLD2 issue): https://github.com/GregBowyer/cld2-cffi/issues/12 So instead basically pulled out the clean_text function, which is quite short. | | | | |
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -1/+1 |
* | initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -2/+2 |
* | crude djvu XML parsing | Bryan Newbold | 2020-05-16 | 2 | -0/+5158 |
* | basic biblio converter | Bryan Newbold | 2020-05-16 | 1 | -1/+10 |
* | start implementing ES transform helpers | Bryan Newbold | 2020-05-14 | 2 | -0/+20 |