path: root/proposals
Commit message                                                              Author         Date        Files  Lines (-/+)
* 'trawling' proposal (in progress)                                         Bryan Newbold  2022-01-27  1      -0/+177
* codespell fixes in proposals                                              Bryan Newbold  2021-11-24  8      -16/+16
* sql: grobid_refs table JSON as 'JSON' not 'JSONB'                         Bryan Newbold  2021-11-04  1      -2/+2
* update grobid refs proposal                                               Bryan Newbold  2021-11-04  1      -10/+72
* initial proposal for GROBID refs table and pipeline                       Bryan Newbold  2021-11-04  1      -0/+63
* sql: fixes to ingest_fileset_platform schema (from table creation)        Bryan Newbold  2021-11-01  1      -6/+6
* commit SPN account changes                                                Bryan Newbold  2021-10-15  1      -0/+14
* persist support for ingest platform table, using existing persist worker  Bryan Newbold  2021-10-15  1      -2/+2
* document passing back platform_base_url                                   Bryan Newbold  2021-10-15  1      -0/+1
* filesets: iteration of implementation and docs                            Bryan Newbold  2021-10-15  1      -14/+19
* updates to fileset ingest proposal                                        Bryan Newbold  2021-10-15  2      -239/+337
* fileset ingest notes                                                      Bryan Newbold  2021-10-15  1      -3/+23
* dataset ingest: start enumerating examples                                Bryan Newbold  2021-10-15  1      -0/+34
* initial dataset/fileset ingest proposal                                   Bryan Newbold  2021-10-15  1      -0/+185
* ingest: basic 'component' and 'src' support                               Bryan Newbold  2021-10-04  2      -0/+167
* crossref DB proposal, and include in SQL schema                           Bryan Newbold  2021-06-02  1      -0/+86
* update HTML ingest proposal                                               Bryan Newbold  2020-12-23  1      -1/+3
* html: update proposal (docs)                                              Bryan Newbold  2020-11-06  1      -19/+49
* xml: re-encode XML docs into UTF-8 for persisting                         Bryan Newbold  2020-11-03  1      -1/+18
* XML ingest proposal                                                       Bryan Newbold  2020-11-03  1      -0/+64
* commit WIP HTML ingest proposal                                           Bryan Newbold  2020-11-03  1      -0/+97
* store no-capture URLs in terminal_url                                     Bryan Newbold  2020-10-12  1      -0/+36
* seaweedfs proposal: fix typos and wording                                 Martin Czygan  2020-07-01  1      -9/+11
* tweak pdf_meta SQL schema                                                 Bryan Newbold  2020-06-17  1      -5/+5
* tweak kafka topic names and seaweedfs layout                              Bryan Newbold  2020-06-17  1      -3/+4
* pdf thumbnail+text+meta proposal                                          Bryan Newbold  2020-06-17  1      -0/+327
* Merge branch 'martin-seaweed-s3' into 'master'                            bnewbold       2020-05-26  1      -0/+424
|\
| * notes on seaweedfs (s3 backend)                                         Martin Czygan  2020-05-20  1      -0/+424
* | NSQ for job task manager/scheduler                                      Bryan Newbold  2020-04-28  1      -0/+79
|/
* ingest: add force_recrawl flag to skip historical wayback lookup          Bryan Newbold  2020-03-02  1      -0/+1
* move edit_extra path to top-level                                         Bryan Newbold  2020-02-18  1      -2/+1
* include rel and oa_status in ingest request 'extra'                       Bryan Newbold  2020-02-18  1      -0/+4
* move pdf_trio results back under key in JSON/Kafka                        Bryan Newbold  2020-02-13  1      -15/+18
* pdftrio JSON object as top-level in Kafka results                         Bryan Newbold  2020-02-12  1      -16/+16
* pdftrio basic python code                                                 Bryan Newbold  2020-02-12  1      -2/+2
* pdftrio proposal and start on schema+kafka                                Bryan Newbold  2020-02-12  1      -0/+101
* 2020q1 fulltext ingest plans                                              Bryan Newbold  2020-01-29  1      -0/+272
* clarify ingest result schema and semantics                                Bryan Newbold  2020-01-15  1      -23/+34
* clarify pmc/pmcid pairing                                                 Bryan Newbold  2020-01-14  1      -3/+3
* yet more tweaks to ingest proposal                                        Bryan Newbold  2020-01-02  1      -3/+2
* update ingest proposal source/link naming                                 Bryan Newbold  2019-12-13  1      -16/+26
* sql schema change proposals                                               Bryan Newbold  2019-12-11  1      -0/+40
* pdftotext proposal                                                        Bryan Newbold  2019-12-11  1      -0/+123
* update ingest proposal                                                    Bryan Newbold  2019-12-11  1      -11/+145
* add structure of ingest proposal                                          Bryan Newbold  2019-11-13  1      -0/+129