| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | minimum viable tests for GROBID XML parsing and refs transform | Bryan Newbold | 2020-09-14 | 3 | -0/+535 |
* | another clean_str() test case | Bryan Newbold | 2020-08-12 | 1 | -0/+4 |
* | transform: more string cleaning | Bryan Newbold | 2020-08-12 | 1 | -1/+19 |
* | scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -1/+1 |
* | start some annotation fixes for pytype | Bryan Newbold | 2020-06-03 | 1 | -1/+1 |
* | flake8-annotation linting | Bryan Newbold | 2020-06-03 | 3 | -4/+4 |
| | Added some new annotations; need to finish more. | | | | |
* | flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 2 | -3/+0 |
* | reformat python code with black | Bryan Newbold | 2020-06-03 | 3 | -13/+19 |
* | improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -0/+15 |
| | Was going to use textpipe, but the dependency was too large and failed to install with a halfway-modern GCC due to a CLD2 issue (https://github.com/GregBowyer/cld2-cffi/issues/12). So instead basically pulled out the clean_text function, which is quite short. | | | | |
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -1/+1 |
* | initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -2/+2 |
* | crude djvu XML parsing | Bryan Newbold | 2020-05-16 | 2 | -0/+5158 |
* | basic biblio converter | Bryan Newbold | 2020-05-16 | 1 | -1/+10 |
* | start implementing ES transform helpers | Bryan Newbold | 2020-05-14 | 2 | -0/+20 |