Commit message (Author, Age, Files, Lines)
* improve pipeline commands (Bryan Newbold, 2020-05-13, 2 files, -8/+23)
|
* commit old CNKI/wanfang notes (Bryan Newbold, 2020-05-13, 1 file, -4/+8)
|
* small improvements to deliver_file2disk.py (Bryan Newbold, 2020-05-13, 1 file, -0/+4)
|
* deliver_file2disk: handle another wayback error condition (Bryan Newbold, 2020-04-13, 1 file, -0/+2)
|
* pipeline scripts (Bryan Newbold, 2020-04-10, 2 files, -2/+63)
|
* add jupyter notebook and client notes (Bryan Newbold, 2020-04-10, 2 files, -0/+264)
|
* handle ext_ids without _id in release schema (Bryan Newbold, 2020-04-09, 1 file, -4/+7)
|
* SERP: default to publisher DOI, not fatcat landing (Bryan Newbold, 2020-04-09, 1 file, -0/+2)
|
* direct-link to arxiv papers (Bryan Newbold, 2020-04-09, 1 file, -0/+2)
|
* add bs4 and lxml for abstract HTML stripping (Bryan Newbold, 2020-04-09, 2 files, -10/+67)
|
* clean up jinja2 template whitespace in HTML output (Bryan Newbold, 2020-04-09, 1 file, -0/+4)
|
* attempt somewhat more robust abstract cleaning (Bryan Newbold, 2020-04-09, 1 file, -7/+4)
|     Note: there is still a security and robustness issue here in that highlights are marked "safe". Should come up with a better mechanism for escaping/safing.
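One possible shape for the "better mechanism" mentioned in the note above is to escape the entire highlight fragment first and then restore only the highlight markers, so that no other markup from an abstract reaches the page. The sketch below is an illustration, not the project's code: the function name `escape_highlight` is hypothetical, and it assumes the highlighter wraps matches in `<em>`/`</em>` tags (the Elasticsearch default), which the log does not confirm.

```python
import html

# Assumed highlight markers; adjust if the search backend is configured differently.
PRE_TAG = "<em>"
POST_TAG = "</em>"


def escape_highlight(fragment: str, pre_tag: str = PRE_TAG, post_tag: str = POST_TAG) -> str:
    """Escape all HTML in a highlight fragment, then re-enable only the highlight tags."""
    # Escape everything, including the highlight tags themselves.
    escaped = html.escape(fragment, quote=True)
    # Restore the (now escaped) highlight tags to real markup.
    escaped = escaped.replace(html.escape(pre_tag), pre_tag)
    escaped = escaped.replace(html.escape(post_tag), post_tag)
    return escaped
```

Only the result of such a function would then be marked "safe" in the template. One caveat: a literal `<em>` already present in the source text is restored as well, which is harmless for emphasis tags but would matter if the markers were ever changed to something with attributes.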
* transform: remove more tags from abstracts (Bryan Newbold, 2020-04-09, 1 file, -1/+1)
|
* transform hacks for new fatcat documents (Bryan Newbold, 2020-04-09, 1 file, -1/+16)
|
* fix webface bug with missing import (Bryan Newbold, 2020-04-09, 1 file, -1/+1)
|
* fix webface bugs with default filter values (Bryan Newbold, 2020-04-09, 2 files, -4/+4)
|
* add dedupe and query-fatcat commands (Bryan Newbold, 2020-04-09, 3 files, -0/+137)
|
* bugfix: handle missing file mimetypes (Bryan Newbold, 2020-04-09, 2 files, -2/+2)
|
* inclusion notes and pipeline update (Bryan Newbold, 2020-04-09, 2 files, -16/+70)
|
* small search tweaks and fixes (Bryan Newbold, 2020-04-08, 3 files, -29/+10)
|
* refactor parse_cord19_csv.py into tool (Bryan Newbold, 2020-04-08, 3 files, -39/+55)
|
* helper bash scripts for PDF derivatives (Bryan Newbold, 2020-04-08, 2 files, -0/+38)
|     These are lazy (won't run derive process if output already exists), which makes the process much faster
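The "lazy" behavior described above amounts to an existence check before each derivation. The actual helpers are bash scripts; the following is only a minimal Python sketch of the same idea, with a hypothetical function name (`derive_if_missing`) and a caller-supplied `derive` callback standing in for whatever tool produces the derivative.

```python
from pathlib import Path
from typing import Callable


def derive_if_missing(source: Path, output: Path,
                      derive: Callable[[Path, Path], None]) -> bool:
    """Run derive(source, output) only when the output file does not exist yet.

    Returns True if the derivation ran, False if it was skipped.
    """
    if output.exists():
        # Output is already on disk: skip the (potentially slow) derive step.
        return False
    # Make sure the destination directory exists before deriving.
    output.parent.mkdir(parents=True, exist_ok=True)
    derive(source, output)
    return True
```

Re-running a batch over the same inputs then only does work for files that have no derivative yet. The trade-off is that a stale or partially written output is never regenerated unless it is deleted first.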
* serp: change stage tag colors (Bryan Newbold, 2020-04-08, 1 file, -2/+4)
|     pink isn't very visible. and 'accepted' is reasonable status, so display as brown not red.
* parse_cnki_tables: pass file path as arg (Bryan Newbold, 2020-04-08, 1 file, -1/+1)
|
* special-case arxiv/medrxiv/biorxiv container names (Bryan Newbold, 2020-04-08, 1 file, -0/+11)
|
* transform: try to cleanup abstracts (Bryan Newbold, 2020-04-08, 1 file, -3/+31)
|
* tweak desktop style: larger SERP font (Bryan Newbold, 2020-04-08, 3 files, -50/+65)
|
* initial implementation of filters (Bryan Newbold, 2020-04-08, 3 files, -10/+98)
|
* refactor search to use elasticsearch-dsl (Bryan Newbold, 2020-04-08, 2 files, -68/+64)
|
* webface: disable 404, 5xx custom error pages (Bryan Newbold, 2020-04-08, 1 file, -11/+11)
|     These break with 'lang_code' errors in our translation setup, I think because in this context we don't yet know what the language is. A work-around might be to manually set the language in the error handler function.
* add martin (@miku) as a contributor (Bryan Newbold, 2020-04-08, 1 file, -0/+2)
|
* Merge branch 'martin-i18n-de-updates' into 'master' (bnewbold, 2020-04-08, 2 files, -12/+11)
|\
| |   i18n: update a few [de] translations
| |
| |   See merge request bnewbold/covid19.fatcat.wiki!1
| * i18n: update a few [de] translations (Martin Czygan, 2020-04-08, 2 files, -12/+11)
|/
* update wanfang fulltext scraping (Bryan Newbold, 2020-04-06, 1 file, -8/+32)
|
* pipeline note tweaks for 2020-04-03 (Bryan Newbold, 2020-04-03, 1 file, -6/+3)
|
* more i18n polish (Bryan Newbold, 2020-04-03, 3 files, -6/+6)
|
* include ia_pdf_url when available (Bryan Newbold, 2020-04-03, 1 file, -0/+4)
|
* wrangle lang_code / error templates a bit more (Bryan Newbold, 2020-04-03, 2 files, -8/+15)
|     Still not perfect: a 404 is always in 'en' and all links are then not in the local lang.
* fix mangled HTML in translated about+sources (Bryan Newbold, 2020-04-03, 4 files, -40/+40)
|
* README: more deps (Bryan Newbold, 2020-04-03, 1 file, -0/+3)
|
* update gitignore (Bryan Newbold, 2020-04-03, 1 file, -5/+2)
|
* fixes from prod (Bryan Newbold, 2020-04-03, 4 files, -9/+19)
|
* update pipeline commands (Bryan Newbold, 2020-04-03, 1 file, -12/+16)
|
* tool: reorder commands (Bryan Newbold, 2020-04-03, 1 file, -24/+18)
|
* don't hide abstracts in fatcat fetch (Bryan Newbold, 2020-04-03, 1 file, -3/+3)
|
* missing translations on front page (Bryan Newbold, 2020-04-03, 6 files, -24/+61)
|
* first stab at translations (Bryan Newbold, 2020-04-03, 11 files, -9/+694)
|
* tweak home page (Bryan Newbold, 2020-04-03, 2 files, -36/+22)
|
* switch to 25 results by default (Bryan Newbold, 2020-04-03, 1 file, -2/+2)
|
* README, about page, sources page (Bryan Newbold, 2020-04-03, 3 files, -10/+206)
|