| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | deliver_file2disk.py fixes from prod | Bryan Newbold | 2020-05-29 | 1 | -0/+6 |
* | WIP on new makefile to drive pipeline | Bryan Newbold | 2020-05-29 | 1 | -0/+80 |
* | metadata parse: new column titles | Bryan Newbold | 2020-05-29 | 1 | -2/+4 |
* | clarify header of deliver_file2disk.py | Bryan Newbold | 2020-05-13 | 1 | -6/+0 |
* | improve pipeline commands | Bryan Newbold | 2020-05-13 | 2 | -8/+23 |
* | commit old CNKI/wanfang notes | Bryan Newbold | 2020-05-13 | 1 | -4/+8 |
* | small improvements to deliver_file2disk.py | Bryan Newbold | 2020-05-13 | 1 | -0/+4 |
* | deliver_file2disk: handle another wayback error condition | Bryan Newbold | 2020-04-13 | 1 | -0/+2 |
* | pipeline scripts | Bryan Newbold | 2020-04-10 | 2 | -2/+63 |
* | add jupyter notebook and client notes | Bryan Newbold | 2020-04-10 | 2 | -0/+264 |
* | handle ext_ids without _id in release schema | Bryan Newbold | 2020-04-09 | 1 | -4/+7 |
* | SERP: default to publisher DOI, not fatcat landing | Bryan Newbold | 2020-04-09 | 1 | -0/+2 |
* | direct-link to arxiv papers | Bryan Newbold | 2020-04-09 | 1 | -0/+2 |
* | add bs4 and lxml for abstract HTML stripping | Bryan Newbold | 2020-04-09 | 2 | -10/+67 |
* | clean up jinja2 template whitespace in HTML output | Bryan Newbold | 2020-04-09 | 1 | -0/+4 |
* | attempt somewhat more robust abstract cleaning | Bryan Newbold | 2020-04-09 | 1 | -7/+4 |
| | Note: there is still a security and robustness issue here in that highlights are marked "safe". Should come up with a better mechanism for escaping/safing. | | | | |
* | transform: remove more tags from abstracts | Bryan Newbold | 2020-04-09 | 1 | -1/+1 |
* | transform hacks for new fatcat documents | Bryan Newbold | 2020-04-09 | 1 | -1/+16 |
* | fix webface bug with missing import | Bryan Newbold | 2020-04-09 | 1 | -1/+1 |
* | fix webface bugs with default filter values | Bryan Newbold | 2020-04-09 | 2 | -4/+4 |
* | add dedupe and query-fatcat commands | Bryan Newbold | 2020-04-09 | 3 | -0/+137 |
* | bugfix: handle missing file mimetypes | Bryan Newbold | 2020-04-09 | 2 | -2/+2 |
* | inclusion notes and pipeline update | Bryan Newbold | 2020-04-09 | 2 | -16/+70 |
* | small search tweaks and fixes | Bryan Newbold | 2020-04-08 | 3 | -29/+10 |
* | refactor parse_cord19_csv.py into tool | Bryan Newbold | 2020-04-08 | 3 | -39/+55 |
* | helper bash scripts for PDF derivatives | Bryan Newbold | 2020-04-08 | 2 | -0/+38 |
| | These are lazy (won't run derive process if output already exists), which makes the process much faster | | | | |
* | serp: change stage tag colors | Bryan Newbold | 2020-04-08 | 1 | -2/+4 |
| | pink isn't very visible. and 'accepted' is reasonable status, so display as brown not red. | | | | |
* | parse_cnki_tables: pass file path as arg | Bryan Newbold | 2020-04-08 | 1 | -1/+1 |
* | special-case arxiv/medrxiv/biorxiv container names | Bryan Newbold | 2020-04-08 | 1 | -0/+11 |
* | transform: try to cleanup abstracts | Bryan Newbold | 2020-04-08 | 1 | -3/+31 |
* | tweak desktop style: larger SERP font | Bryan Newbold | 2020-04-08 | 3 | -50/+65 |
* | initial implementation of filters | Bryan Newbold | 2020-04-08 | 3 | -10/+98 |
* | refactor search to use elasticsearch-dsl | Bryan Newbold | 2020-04-08 | 2 | -68/+64 |
* | webface: disable 404, 5xx custom error pages | Bryan Newbold | 2020-04-08 | 1 | -11/+11 |
| | These break with 'lang_code' errors in our translation setup, I think because in this context we don't yet know what the language is. A work-around might be to manually set the language in the error handler function. | | | | |
* | add martin (@miku) as a contributor | Bryan Newbold | 2020-04-08 | 1 | -0/+2 |
* | Merge branch 'martin-i18n-de-updates' into 'master' | bnewbold | 2020-04-08 | 2 | -12/+11 |
|\ | i18n: update a few [de] translations. See merge request bnewbold/covid19.fatcat.wiki!1 | | | | |
| * | i18n: update a few [de] translations | Martin Czygan | 2020-04-08 | 2 | -12/+11 |
|/ | |||||
* | update wanfang fulltext scraping | Bryan Newbold | 2020-04-06 | 1 | -8/+32 |
* | pipeline note tweaks for 2020-04-03 | Bryan Newbold | 2020-04-03 | 1 | -6/+3 |
* | more i18n polish | Bryan Newbold | 2020-04-03 | 3 | -6/+6 |
* | include ia_pdf_url when available | Bryan Newbold | 2020-04-03 | 1 | -0/+4 |
* | wrangle lang_code / error templates a bit more | Bryan Newbold | 2020-04-03 | 2 | -8/+15 |
| | Still not perfect: a 404 is always in 'en' and all links are then not in the local lang. | | | | |
* | fix mangled HTML in translated about+sources | Bryan Newbold | 2020-04-03 | 4 | -40/+40 |
* | README: more deps | Bryan Newbold | 2020-04-03 | 1 | -0/+3 |
* | update gitignore | Bryan Newbold | 2020-04-03 | 1 | -5/+2 |
* | fixes from prod | Bryan Newbold | 2020-04-03 | 4 | -9/+19 |
* | update pipeline commands | Bryan Newbold | 2020-04-03 | 1 | -12/+16 |
* | tool: reorder commands | Bryan Newbold | 2020-04-03 | 1 | -24/+18 |
* | don't hide abstracts in fatcat fetch | Bryan Newbold | 2020-04-03 | 1 | -3/+3 |
* | missing translations on front page | Bryan Newbold | 2020-04-03 | 6 | -24/+61 |