| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -38/+64 |
* | improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -13/+21 |
| | Was going to use textpipe, but the dependency was too large and failed to install with a halfway modern GCC (due to a CLD2 issue: https://github.com/GregBowyer/cld2-cffi/issues/12). So instead basically pulled out the clean_text function, which is quite short. | | | | |
* | add prefix scrubbing (esp. for abstracts) | Bryan Newbold | 2020-05-21 | 1 | -0/+18 |
* | use Beautiful Soup for XML scrubbing | Bryan Newbold | 2020-05-21 | 1 | -7/+6 |
* | be more inclusive of author names | Bryan Newbold | 2020-05-21 | 1 | -4/+4 |
* | fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -7/+11 |
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -0/+334 |