Commit message    Author    Age    Files    Lines
* improve text scrubbing    Bryan Newbold    2020-06-03    2    -13/+36
|
|   Was going to use textpipe, but dependency was too large and failed to
|   install with halfway modern GCC (due to CLD2 issue):
|
|   https://github.com/GregBowyer/cld2-cffi/issues/12
|
|   So instead basically pulled out the clean_text function, which is quite
|   short.
* partially resolve HTML form/hidden weirdness    Bryan Newbold    2020-06-03    2    -3/+13
|
* basic pagination    Bryan Newbold    2020-06-03    2    -0/+27
|
|   Not well tested
* tweak thumbnail vertical alignment    Bryan Newbold    2020-06-03    1    -1/+1
|
* compute and use tags    Bryan Newbold    2020-06-03    2    -2/+42
|
* add additional fatcat metadata tag    Bryan Newbold    2020-06-03    1    -5/+11
|
* start fleshing out /about and /help    Bryan Newbold    2020-06-03    4    -12/+152
|
* replace one of the PLOS thumbnails on homepage    Bryan Newbold    2020-06-03    1    -10/+11
|
* change availability filter phrasing; default to fulltext    Bryan Newbold    2020-06-03    1    -6/+6
|
* tweak search box placeholder text    Bryan Newbold    2020-06-03    1    -1/+1
|
* most links in new tab (target=_blank)    Bryan Newbold    2020-06-03    4    -30/+30
|
* commit prototype pipeline notes (in README)    Bryan Newbold    2020-06-03    1    -0/+47
|
* more petabox timeout handling    Bryan Newbold    2020-05-21    2    -0/+6
|
* handle petabox read timeouts a bit    Bryan Newbold    2020-05-21    2    -2/+12
|
* HTML strip in ES indexing    Bryan Newbold    2020-05-21    1    -4/+4
|
* add prefix scrubing (esp. for abstracts)    Bryan Newbold    2020-05-21    1    -0/+18
|
* use beautiful soup for XML scrubing    Bryan Newbold    2020-05-21    1    -7/+6
|
* make mypy happy    Bryan Newbold    2020-05-21    1    -1/+1
|
* helpers to fetch small-ish data samples    Bryan Newbold    2020-05-21    1    -0/+8
|
* implement crude availability filter    Bryan Newbold    2020-05-21    1    -0/+11
|
* fix typo in indexed document links    Bryan Newbold    2020-05-21    1    -1/+1
|
* be more inclusive of author names    Bryan Newbold    2020-05-21    1    -4/+4
|
* fix abstracts; experiment with search stemming    Bryan Newbold    2020-05-21    3    -8/+36
|
* first pass improving search scoring    Bryan Newbold    2020-05-21    2    -5/+36
|
* better translation marking; add some basic de and zh    Bryan Newbold    2020-05-21    12    -61/+534
|
|   Current translations are just from Google Translate
* UI mobile/tablet scaling; search error improvements    Bryan Newbold    2020-05-21    5    -12/+42
|
* mobile CSS/style changes, and other small UI tweaks    Bryan Newbold    2020-05-21    5    -42/+100
|
* fix typo with UnicodeDecodeError catch    Bryan Newbold    2020-05-21    1    -1/+1
|
* clean up domain/env detection code    Bryan Newbold    2020-05-21    3    -29/+16
|
* search query improvements    Bryan Newbold    2020-05-21    5    -145/+236
|
|   - wire up most of the filters and sort order
|   - query sticks around in search box
|   - crude error message (needs work)
* less whitespace in jinja2 output    Bryan Newbold    2020-05-21    1    -0/+4
|
* abstracts as object, not nested, until query parser    Bryan Newbold    2020-05-21    1    -5/+3
|
* skip pdftotext loading on unicode error    Bryan Newbold    2020-05-20    1    -0/+2
|
* skip SIM items w/o page_numbers (instead of asserting)    Bryan Newbold    2020-05-20    2    -2/+6
|
* fewer, longer highlights (2x of 250 chars)    Bryan Newbold    2020-05-20    1    -4/+4
|
* schema: releases as objects, not nested    Bryan Newbold    2020-05-20    1    -1/+1
|
|   With nested, we can't do simple aliases. In the future a proper query
|   parser will make this possible.
* schema: many more aliases    Bryan Newbold    2020-05-20    1    -1/+19
|
* add a helper tag for search index document    Bryan Newbold    2020-05-20    1    -1/+5
|
* fix some ext_id links    Bryan Newbold    2020-05-20    1    -4/+4
|
* fixes from manual testing    Bryan Newbold    2020-05-20    6    -25/+33
|
* local pdftotext cache dir hack    Bryan Newbold    2020-05-20    2    -1/+19
|
* fixes to release+sim pipeline    Bryan Newbold    2020-05-20    3    -12/+39
|
* working docker-compose with elasticsearch (with plugins)    Bryan Newbold    2020-05-20    2    -0/+24
|
* fixes to schema; actually working now    Bryan Newbold    2020-05-20    1    -3/+4
|
* default search locations for different environments    Bryan Newbold    2020-05-20    1    -1/+4
|
* local/dev indexing command    Bryan Newbold    2020-05-20    1    -8/+8
|
* indexing tweaks    Bryan Newbold    2020-05-20    2    -16/+11
|
* update search template for schema    Bryan Newbold    2020-05-20    1    -129/+95
|
* first pass transform from pipelines to ES schema    Bryan Newbold    2020-05-20    6    -27/+541
|
* WIP on SIM pipeline    Bryan Newbold    2020-05-19    2    -2/+175
|