Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | refs transform: both GROBID and fatcat refs | Bryan Newbold | 2020-09-13 | 1 | -5/+12 |
| | |||||
* | ref transform: support more GROBID fields | Bryan Newbold | 2020-09-13 | 1 | -10/+16 |
| | |||||
* | fixes to refs transform (for non-str author fields) | Bryan Newbold | 2020-09-04 | 1 | -2/+6 |
| | |||||
* | heavy to refs command | Bryan Newbold | 2020-09-04 | 1 | -2/+142 |
| | |||||
* | use simple names, not domain names, for some platforms | Bryan Newbold | 2020-08-12 | 1 | -3/+3 |
| | |||||
* | biblio metadata hacks at transform time | Bryan Newbold | 2020-08-12 | 1 | -2/+98 |
| | |||||
* | don't index sim_page without issue_item and first_page | Bryan Newbold | 2020-08-06 | 1 | -0/+3 |
| | |||||
* | handle integer conversion and bounding for ES schema | Bryan Newbold | 2020-08-06 | 1 | -10/+13 |
| | |||||
* | json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 1 | -1/+1 |
| | These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents | ||||
* | ensure SIM release date parses before assigning | Bryan Newbold | 2020-07-21 | 1 | -1/+6 |
| | |||||
* | make fmt | Bryan Newbold | 2020-06-29 | 1 | -8/+13 |
| | |||||
* | include GROBID-extracted abstracts in search documents | Bryan Newbold | 2020-06-29 | 1 | -10/+15 |
| | |||||
* | small improvements to SIM metadata maps | Bryan Newbold | 2020-06-29 | 1 | -6/+11 |
| | |||||
* | fixes for pdf_meta dict | Bryan Newbold | 2020-06-29 | 1 | -1/+2 |
| | |||||
* | remove old COVID19 thumbnail hack | Bryan Newbold | 2020-06-29 | 1 | -1/+2 |
| | |||||
* | fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -21/+13 |
| | This replaces the temporary COVID-19 content hack with production content (text, thumbnail URLs) stored in postgrest and seaweedfs. | ||||
* | collapse pages by SIM issue | Bryan Newbold | 2020-06-04 | 1 | -0/+3 |
| | |||||
* | flake8-annotation linting | Bryan Newbold | 2020-06-03 | 1 | -3/+3 |
| | Added some new annotations; need to finish more. | ||||
* | flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -11/+2 |
| | |||||
* | reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -109/+158 |
| | |||||
* | fixes from running pipeline | Bryan Newbold | 2020-06-03 | 1 | -1/+2 |
| | Not caught by mypy/lint? Hrm. | ||||
* | compute and use tags | Bryan Newbold | 2020-06-03 | 1 | -0/+41 |
| | |||||
* | fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -5/+4 |
| | |||||
* | fixes to release+sim pipeline | Bryan Newbold | 2020-05-20 | 1 | -1/+2 |
| | |||||
* | indexing tweaks | Bryan Newbold | 2020-05-20 | 1 | -3/+4 |
| | |||||
* | first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -0/+306 |
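
The "json: exclude None in output, and sort keys" commit (2020-07-27) describes two serialization choices: dropping `None` values and emitting keys in sorted order. A minimal sketch of that idea, using only the Python standard library, might look like the following. The helper names `drop_none` and `to_compact_json` are hypothetical, and this is not the code from the commit itself, which may rely on a different serialization library.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict entries whose value is None.

    None items inside lists are left in place so list positions are preserved.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def to_compact_json(doc: dict) -> str:
    """Serialize a document without None-valued fields and with sorted keys."""
    return json.dumps(drop_none(doc), sort_keys=True)


# Hypothetical example document; the field names are for illustration only.
example = {"title": "x", "doi": None, "biblio": {"volume": None, "year": 2020}}
print(to_compact_json(example))
```

Because every document then lists its fields in the same order, a run of such documents tends to share longer repeated byte sequences, which is the property the commit message expects to help gzip and elasticsearch storage.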