Commit message | Author | Age | Files | Lines
* deliver_file2disk.py fixes from prod | Bryan Newbold | 2020-05-29 | 1 | -0/+6
* WIP on new makefile to drive pipeline | Bryan Newbold | 2020-05-29 | 1 | -0/+80
* metadata parse: new column titles | Bryan Newbold | 2020-05-29 | 1 | -2/+4
* clarify header of deliver_file2disk.py | Bryan Newbold | 2020-05-13 | 1 | -6/+0
* improve pipeline commands | Bryan Newbold | 2020-05-13 | 2 | -8/+23
* commit old CNKI/wanfang notes | Bryan Newbold | 2020-05-13 | 1 | -4/+8
* small improvements to deliver_file2disk.py | Bryan Newbold | 2020-05-13 | 1 | -0/+4
* deliver_file2disk: handle another wayback error condition | Bryan Newbold | 2020-04-13 | 1 | -0/+2
* pipeline scripts | Bryan Newbold | 2020-04-10 | 2 | -2/+63
* add jupyter notebook and client notes | Bryan Newbold | 2020-04-10 | 2 | -0/+264
* handle ext_ids without _id in release schema | Bryan Newbold | 2020-04-09 | 1 | -4/+7
* SERP: default to publisher DOI, not fatcat landing | Bryan Newbold | 2020-04-09 | 1 | -0/+2
* direct-link to arxiv papers | Bryan Newbold | 2020-04-09 | 1 | -0/+2
* add bs4 and lxml for abstract HTML stripping | Bryan Newbold | 2020-04-09 | 2 | -10/+67
* clean up jinja2 template whitespace in HTML output | Bryan Newbold | 2020-04-09 | 1 | -0/+4
* attempt somewhat more robust abstract cleaning | Bryan Newbold | 2020-04-09 | 1 | -7/+4
* transform: remove more tags from abstracts | Bryan Newbold | 2020-04-09 | 1 | -1/+1
* transform hacks for new fatcat documents | Bryan Newbold | 2020-04-09 | 1 | -1/+16
* fix webface bug with missing import | Bryan Newbold | 2020-04-09 | 1 | -1/+1
* fix webface bugs with default filter values | Bryan Newbold | 2020-04-09 | 2 | -4/+4
* add dedupe and query-fatcat commands | Bryan Newbold | 2020-04-09 | 3 | -0/+137
* bugfix: handle missing file mimetypes | Bryan Newbold | 2020-04-09 | 2 | -2/+2
* inclusion notes and pipeline update | Bryan Newbold | 2020-04-09 | 2 | -16/+70
* small search tweaks and fixes | Bryan Newbold | 2020-04-08 | 3 | -29/+10
* refactor parse_cord19_csv.py into tool | Bryan Newbold | 2020-04-08 | 3 | -39/+55
* helper bash scripts for PDF derivatives | Bryan Newbold | 2020-04-08 | 2 | -0/+38
* serp: change stage tag colors | Bryan Newbold | 2020-04-08 | 1 | -2/+4
* parse_cnki_tables: pass file path as arg | Bryan Newbold | 2020-04-08 | 1 | -1/+1
* special-case arxiv/medrxiv/biorxiv container names | Bryan Newbold | 2020-04-08 | 1 | -0/+11
* transform: try to cleanup abstracts | Bryan Newbold | 2020-04-08 | 1 | -3/+31
* tweak desktop style: larger SERP font | Bryan Newbold | 2020-04-08 | 3 | -50/+65
* initial implementation of filters | Bryan Newbold | 2020-04-08 | 3 | -10/+98
* refactor search to use elasticsearch-dsl | Bryan Newbold | 2020-04-08 | 2 | -68/+64
* webface: disable 404, 5xx custom error pages | Bryan Newbold | 2020-04-08 | 1 | -11/+11
* add martin (@miku) as a contributor | Bryan Newbold | 2020-04-08 | 1 | -0/+2
* Merge branch 'martin-i18n-de-updates' into 'master' | bnewbold | 2020-04-08 | 2 | -12/+11
|\
| * i18n: update a few [de] translations | Martin Czygan | 2020-04-08 | 2 | -12/+11
|/
* update wanfang fulltext scraping | Bryan Newbold | 2020-04-06 | 1 | -8/+32
* pipeline note tweaks for 2020-04-03 | Bryan Newbold | 2020-04-03 | 1 | -6/+3
* more i18n polish | Bryan Newbold | 2020-04-03 | 3 | -6/+6
* include ia_pdf_url when available | Bryan Newbold | 2020-04-03 | 1 | -0/+4
* wrangle lang_code / error templates a bit more | Bryan Newbold | 2020-04-03 | 2 | -8/+15
* fix mangled HTML in translated about+sources | Bryan Newbold | 2020-04-03 | 4 | -40/+40
* README: more deps | Bryan Newbold | 2020-04-03 | 1 | -0/+3
* update gitignore | Bryan Newbold | 2020-04-03 | 1 | -5/+2
* fixes from prod | Bryan Newbold | 2020-04-03 | 4 | -9/+19
* update pipeline commands | Bryan Newbold | 2020-04-03 | 1 | -12/+16
* tool: reorder commands | Bryan Newbold | 2020-04-03 | 1 | -24/+18
* don't hide abstracts in fatcat fetch | Bryan Newbold | 2020-04-03 | 1 | -3/+3
* missing translations on front page | Bryan Newbold | 2020-04-03 | 6 | -24/+61