path: root/tests
Commit message | Author | Age | Files | Lines
* transform: more string cleaning | Bryan Newbold | 2020-08-12 | 1 | -1/+19
* scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* start some annotaition fixes for pytype | Bryan Newbold | 2020-06-03 | 1 | -1/+1
* flake8-annotation linting | Bryan Newbold | 2020-06-03 | 3 | -4/+4
    Added some new annotations; need to finish more.
* flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 2 | -3/+0
* reformat python code with black | Bryan Newbold | 2020-06-03 | 3 | -13/+19
* improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -0/+15
    Was going to use textpipe, but dependency was too large and failed to install with halfway modern GCC (due to CLD2 issue): https://github.com/GregBowyer/cld2-cffi/issues/12
    So instead basically pulled out the clean_text function, which is quite short.
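The commit above swaps the textpipe dependency for a short standalone clean_text function, but the log does not show that function. The following is a hypothetical, stdlib-only Python sketch of what such a cleaner might do. The specific steps (strip tags, unescape HTML entities, NFKC-normalize, drop control characters, collapse whitespace) are assumptions for illustration. This is not textpipe's code or this repository's implementation.

```python
import html
import re
import unicodedata

# Hypothetical cleaner; the steps below are assumptions, not the project's actual code.
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Return a plain, whitespace-normalized version of `raw`."""
    # Replace anything that looks like an HTML/XML tag with a space.
    text = TAG_RE.sub(" ", raw)
    # Decode entities such as &amp; and &nbsp; (done after tag stripping,
    # so escaped markup like &lt;b&gt; is not mistaken for a tag).
    text = html.unescape(text)
    # NFKC folds compatibility characters (e.g. ligatures, non-breaking spaces).
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters (category "Cc") but keep whitespace ones like \n, \t.
    text = "".join(c for c in text if unicodedata.category(c) != "Cc" or c.isspace())
    # Collapse runs of whitespace to single spaces and trim the ends.
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("<p>Hello&nbsp;&amp;   world</p>\n"))  # prints "Hello & world"
```

The regex-based tag stripper is deliberately crude. Input that is real HTML would be better served by a proper parser, at the cost of the kind of extra dependency the commit was trying to avoid.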
* first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -1/+1
* initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -2/+2
* crude djvu XML parsing | Bryan Newbold | 2020-05-16 | 2 | -0/+5158
* basic biblio converter | Bryan Newbold | 2020-05-16 | 1 | -1/+10
* start implementing ES transform helpers | Bryan Newbold | 2020-05-14 | 2 | -0/+20