path: root/sql
Commit message | Author | Age | Files | Lines
* sql: fix reingest query missing type on LEFT JOIN; wrap in read-only transaction | Bryan Newbold | 2022-04-04 | 5 | -5/+27
* sql: script to reingest recent spn2 lookup failure in bulk mode | Bryan Newbold | 2022-02-08 | 5 | -18/+71
* 2021-12-02 database table size stats | Bryan Newbold | 2021-12-07 | 1 | -0/+22
* sandcrawler SQL dump and upload updates | Bryan Newbold | 2021-12-07 | 1 | -4/+12
* update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 2021-12-07 | 1 | -1/+3
* update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 2021-12-01 | 1 | -0/+13
* sandcrawler SQL stats | Bryan Newbold | 2021-11-27 | 2 | -12/+425
* sql: grobid_refs table JSON as 'JSON' not 'JSONB' | Bryan Newbold | 2021-11-04 | 1 | -1/+1
    I keep flip-flopping on this, but our disk usage is really large, and if 'JSON' is smaller than 'JSONB' in postgresql at all it is worth it.
* record SQL table sizes at start of crossref re-ingest | Bryan Newbold | 2021-11-04 | 1 | -0/+19
* add grobid_refs and crossref_with_refs to sandcrawler-db SQL schema | Bryan Newbold | 2021-11-04 | 1 | -0/+21
* SPN reingest: 6 hour minimum, 6 month max | Bryan Newbold | 2021-11-03 | 1 | -2/+2
* sql: fix typo in quarterly (not weekly) script | Bryan Newbold | 2021-11-03 | 1 | -1/+1
* sql: fixes to ingest_fileset_platform schema (from table creation) | Bryan Newbold | 2021-11-01 | 1 | -6/+6
* commit old ingest domain summary | Bryan Newbold | 2021-10-15 | 1 | -0/+345
* sql fileset ingest table iteration | Bryan Newbold | 2021-10-15 | 1 | -12/+11
* sql: initial ingest fileset table | Bryan Newbold | 2021-10-15 | 1 | -0/+38
* sql: fix typo in CHECK statement | Bryan Newbold | 2021-10-15 | 1 | -1/+1
* new SQL recent SPN request monitoring query | Bryan Newbold | 2021-10-04 | 1 | -0/+32
* refactor reingest scripts | Bryan Newbold | 2021-09-30 | 6 | -150/+90
* new 'daily' and 'priority' ingest request topics | Bryan Newbold | 2021-09-30 | 2 | -2/+2
    The old ingest request queue was always getting lopsided, suspect because it was scaled up (additional partitions) at some point in the past, hoping new topics will fix this. New '-priority' queue is like '-bulk', but for smaller-volume SPN-like requests. Eg, interactive mode.
* reingest: skip spn2 'unknown' errors | Bryan Newbold | 2021-07-21 | 2 | -0/+2
* crossref DB proposal, and include in SQL schema | Bryan Newbold | 2021-06-02 | 1 | -0/+7
* sql: do periodically retry spn2-wayback-error | Bryan Newbold | 2021-04-27 | 2 | -2/+0
* reingest scripts to run as sandcrawler | Bryan Newbold | 2021-04-09 | 2 | -12/+12
* sql: notes on sql restore | Bryan Newbold | 2021-04-09 | 1 | -0/+9
* sql: update paths to work with svc506 machine | Bryan Newbold | 2021-04-09 | 12 | -49/+49
* sql: before/after pg13 table size stats | Bryan Newbold | 2021-04-09 | 2 | -1/+43
* sql: update periodic retry/reingest scripts | Bryan Newbold | 2021-04-09 | 4 | -6/+14
* SQL snapshot doc update | Bryan Newbold | 2021-04-07 | 1 | -2/+5
* 2021-04-07 sandcrawler DB stats | Bryan Newbold | 2021-04-07 | 1 | -0/+428
* SQL: more ingest monitoring | Bryan Newbold | 2020-11-16 | 3 | -1/+660
* tweak html_meta SQL schema | Bryan Newbold | 2020-11-03 | 1 | -2/+2
* SQL: unmatched glutton query (old) | Bryan Newbold | 2020-11-03 | 1 | -0/+19
* monitoring: past-7-days summary query | Bryan Newbold | 2020-11-03 | 1 | -0/+26
* html: start on SQL table | Bryan Newbold | 2020-11-03 | 1 | -0/+15
* SQL: update weekly/quarterly ingest retry scripts | Bryan Newbold | 2020-10-21 | 5 | -18/+119
* sql stats: larger limits (more complete lists) | Bryan Newbold | 2020-10-21 | 1 | -8/+8
* update SQL ingest monitoring commands to be past-month by default | Bryan Newbold | 2020-10-17 | 1 | -5/+5
* dump_file_meta helper | Bryan Newbold | 2020-10-01 | 1 | -0/+12
* updated sandcrawler-db stats | Bryan Newbold | 2020-09-15 | 2 | -6/+346
* WIP weekly re-ingest script | Bryan Newbold | 2020-08-17 | 2 | -0/+97
* grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 4 | -10/+49
* commit stats from a couple weeks back | Bryan Newbold | 2020-08-05 | 1 | -0/+347
* sql stats commands updates | Bryan Newbold | 2020-08-05 | 1 | -2/+2
* commented special modes for dump_unextracted_pdf.sql | Bryan Newbold | 2020-06-25 | 1 | -1/+4
* pdftrio SQL queries | Bryan Newbold | 2020-06-25 | 1 | -0/+65
* SQL commands for re-trying PDF ingests | Bryan Newbold | 2020-06-25 | 1 | -0/+158
* unextracted PDF job dump command | Bryan Newbold | 2020-06-25 | 1 | -0/+16
* tweak pdf_meta SQL schema | Bryan Newbold | 2020-06-17 | 1 | -0/+26
* update sandcrawler stats for early may | Bryan Newbold | 2020-05-04 | 1 | -0/+418