Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | collapse pages by SIM issue | Bryan Newbold | 2020-06-04 | 1 | -0/+1 | |
| | ||||||
* | fmt | Bryan Newbold | 2020-06-04 | 1 | -0/+2 | |
| | ||||||
* | start some annotation fixes for pytype | Bryan Newbold | 2020-06-03 | 1 | -1/+3 | |
| | ||||||
* | more flake8 | Bryan Newbold | 2020-06-03 | 1 | -1/+1 | |
| | ||||||
* | flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -1/+1 | |
| | ||||||
* | reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -38/+64 | |
| | ||||||
* | improve text scrubbing | Bryan Newbold | 2020-06-03 | 1 | -13/+21 | |
| | Was going to use textpipe, but dependency was too large and failed to install with halfway modern GCC (due to CLD2 issue): https://github.com/GregBowyer/cld2-cffi/issues/12. So instead basically pulled out the clean_text function, which is quite short. | |||||
* | add prefix scrubbing (esp. for abstracts) | Bryan Newbold | 2020-05-21 | 1 | -0/+18 | |
| | ||||||
* | use beautiful soup for XML scrubbing | Bryan Newbold | 2020-05-21 | 1 | -7/+6 | |
| | ||||||
* | be more inclusive of author names | Bryan Newbold | 2020-05-21 | 1 | -4/+4 | |
| | ||||||
* | fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -7/+11 | |
| | ||||||
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -0/+334 | |