| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | sandcrawler SQL stats | Bryan Newbold | 2021-11-27 | 2 | -12/+425 | 
| | | |||||
| * | sql: grobid_refs table JSON as 'JSON' not 'JSONB' | Bryan Newbold | 2021-11-04 | 1 | -1/+1 | 
| | I keep flip-flopping on this, but our disk usage is really large, and if 'JSON' is smaller than 'JSONB' in PostgreSQL at all, it is worth it. | | | | |
| * | record SQL table sizes at start of crossref re-ingest | Bryan Newbold | 2021-11-04 | 1 | -0/+19 | 
| | | |||||
| * | add grobid_refs and crossref_with_refs to sandcrawler-db SQL schema | Bryan Newbold | 2021-11-04 | 1 | -0/+21 | 
| | | |||||
| * | SPN reingest: 6 hour minimum, 6 month max | Bryan Newbold | 2021-11-03 | 1 | -2/+2 | 
| | | |||||
| * | sql: fix typo in quarterly (not weekly) script | Bryan Newbold | 2021-11-03 | 1 | -1/+1 | 
| | | |||||
| * | sql: fixes to ingest_fileset_platform schema (from table creation) | Bryan Newbold | 2021-11-01 | 1 | -6/+6 | 
| | | |||||
| * | commit old ingest domain summary | Bryan Newbold | 2021-10-15 | 1 | -0/+345 | 
| | | |||||
| * | sql fileset ingest table iteration | Bryan Newbold | 2021-10-15 | 1 | -12/+11 | 
| | | |||||
| * | sql: initial ingest fileset table | Bryan Newbold | 2021-10-15 | 1 | -0/+38 | 
| | | |||||
| * | sql: fix typo in CHECK statement | Bryan Newbold | 2021-10-15 | 1 | -1/+1 | 
| | | |||||
| * | new SQL recent SPN request monitoring query | Bryan Newbold | 2021-10-04 | 1 | -0/+32 | 
| | | |||||
| * | refactor reingest scripts | Bryan Newbold | 2021-09-30 | 6 | -150/+90 | 
| | | |||||
| * | new 'daily' and 'priority' ingest request topics | Bryan Newbold | 2021-09-30 | 2 | -2/+2 | 
| | The old ingest request queue was always getting lopsided, I suspect because it was scaled up (additional partitions) at some point in the past; hoping new topics will fix this. The new '-priority' queue is like '-bulk', but for smaller-volume SPN-like requests, e.g., interactive mode. | | | | |
| * | reingest: skip spn2 'unknown' errors | Bryan Newbold | 2021-07-21 | 2 | -0/+2 | 
| | | |||||
| * | crossref DB proposal, and include in SQL schema | Bryan Newbold | 2021-06-02 | 1 | -0/+7 | 
| | | |||||
| * | sql: do periodically retry spn2-wayback-error | Bryan Newbold | 2021-04-27 | 2 | -2/+0 | 
| | | |||||
| * | reingest scripts to run as sandcrawler | Bryan Newbold | 2021-04-09 | 2 | -12/+12 | 
| | | |||||
| * | sql: notes on sql restore | Bryan Newbold | 2021-04-09 | 1 | -0/+9 | 
| | | |||||
| * | sql: update paths to work with svc506 machine | Bryan Newbold | 2021-04-09 | 12 | -49/+49 | 
| | | |||||
| * | sql: before/after pg13 table size stats | Bryan Newbold | 2021-04-09 | 2 | -1/+43 | 
| | | |||||
| * | sql: update periodic retry/reingest scripts | Bryan Newbold | 2021-04-09 | 4 | -6/+14 | 
| | | |||||
| * | SQL snapshot doc update | Bryan Newbold | 2021-04-07 | 1 | -2/+5 | 
| | | |||||
| * | 2021-04-07 sandcrawler DB stats | Bryan Newbold | 2021-04-07 | 1 | -0/+428 | 
| | | |||||
| * | SQL: more ingest monitoring | Bryan Newbold | 2020-11-16 | 3 | -1/+660 | 
| | | |||||
| * | tweak html_meta SQL schema | Bryan Newbold | 2020-11-03 | 1 | -2/+2 | 
| | | |||||
| * | SQL: unmatched glutton query (old) | Bryan Newbold | 2020-11-03 | 1 | -0/+19 | 
| | | |||||
| * | monitoring: past-7-days summary query | Bryan Newbold | 2020-11-03 | 1 | -0/+26 | 
| | | |||||
| * | html: start on SQL table | Bryan Newbold | 2020-11-03 | 1 | -0/+15 | 
| | | |||||
| * | SQL: update weekly/quarterly ingest retry scripts | Bryan Newbold | 2020-10-21 | 5 | -18/+119 | 
| | | |||||
| * | sql stats: larger limits (more complete lists) | Bryan Newbold | 2020-10-21 | 1 | -8/+8 | 
| | | |||||
| * | update SQL ingest monitoring commands to be past-month by default | Bryan Newbold | 2020-10-17 | 1 | -5/+5 | 
| | | |||||
| * | dump_file_meta helper | Bryan Newbold | 2020-10-01 | 1 | -0/+12 | 
| | | |||||
| * | updated sandcrawler-db stats | Bryan Newbold | 2020-09-15 | 2 | -6/+346 | 
| | | |||||
| * | WIP weekly re-ingest script | Bryan Newbold | 2020-08-17 | 2 | -0/+97 | 
| | | |||||
| * | grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 4 | -10/+49 | 
| | | |||||
| * | commit stats from a couple weeks back | Bryan Newbold | 2020-08-05 | 1 | -0/+347 | 
| | | |||||
| * | sql stats commands updates | Bryan Newbold | 2020-08-05 | 1 | -2/+2 | 
| | | |||||
| * | commented special modes for dump_unextracted_pdf.sql | Bryan Newbold | 2020-06-25 | 1 | -1/+4 | 
| | | |||||
| * | pdftrio SQL queries | Bryan Newbold | 2020-06-25 | 1 | -0/+65 | 
| | | |||||
| * | SQL commands for re-trying PDF ingests | Bryan Newbold | 2020-06-25 | 1 | -0/+158 | 
| | | |||||
| * | unextracted PDF job dump command | Bryan Newbold | 2020-06-25 | 1 | -0/+16 | 
| | | |||||
| * | tweak pdf_meta SQL schema | Bryan Newbold | 2020-06-17 | 1 | -0/+26 | 
| | | |||||
| * | update sandcrawler stats for early may | Bryan Newbold | 2020-05-04 | 1 | -0/+418 | 
| | | |||||
| * | more monitoring queries | Bryan Newbold | 2020-03-30 | 1 | -5/+29 | 
| | | |||||
| * | make monitoring commands ingest_request local, not ingest_file_result | Bryan Newbold | 2020-03-17 | 1 | -2/+2 | 
| | | |||||
| * | DOI prefix example queries (SQL) | Bryan Newbold | 2020-03-10 | 1 | -3/+17 | 
| | | |||||
| * | helpful daily/weekly monitoring SQL queries | Bryan Newbold | 2020-03-10 | 1 | -0/+94 | 
| | | |||||
| * | sandcrawler schema: add MD5 index | Bryan Newbold | 2020-03-05 | 1 | -0/+1 | 
| | | |||||
| * | more SQL queries | Bryan Newbold | 2020-03-02 | 1 | -0/+57 | 
| | | |||||
