sandcrawler
path: root/sql
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| sql: Makefile for SQL dumps/uploads | Bryan Newbold | 2022-11-23 | 1 | -0/+35 |
| reingests: update scripts and SQL | Bryan Newbold | 2022-10-03 | 7 | -6/+127 |
| sandcrawler SQL-based status (sept 2022) | Bryan Newbold | 2022-09-07 | 1 | -0/+438 |
| stats: may 2022 ingest-by-domain stats | Bryan Newbold | 2022-07-07 | 1 | -0/+410 |
| some weekly crawl numbers (not very helpful) | Bryan Newbold | 2022-05-03 | 1 | -0/+191 |
| switch default kafka-broker host from wbgrp-svc263 to wbgrp-svc350 | Bryan Newbold | 2022-05-03 | 4 | -4/+4 |
| April 2022 sandcrawler DB stats | Bryan Newbold | 2022-04-27 | 1 | -0/+432 |
| sql: add source/created index on ingest_request table | Bryan Newbold | 2022-04-04 | 1 | -0/+1 |
| sql: fix reingest query missing type on LEFT JOIN; wrap in read-only transaction | Bryan Newbold | 2022-04-04 | 5 | -5/+27 |
| sql: script to reingest recent spn2 lookup failure in bulk mode | Bryan Newbold | 2022-02-08 | 5 | -18/+71 |
| 2021-12-02 database table size stats | Bryan Newbold | 2021-12-07 | 1 | -0/+22 |
| sandcrawler SQL dump and upload updates | Bryan Newbold | 2021-12-07 | 1 | -4/+12 |
| update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 2021-12-07 | 1 | -1/+3 |
| update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 2021-12-01 | 1 | -0/+13 |
| sandcrawler SQL stats | Bryan Newbold | 2021-11-27 | 2 | -12/+425 |
| sql: grobid_refs table JSON as 'JSON' not 'JSONB' | Bryan Newbold | 2021-11-04 | 1 | -1/+1 |
| record SQL table sizes at start of crossref re-ingest | Bryan Newbold | 2021-11-04 | 1 | -0/+19 |
| add grobid_refs and crossref_with_refs to sandcrawler-db SQL schema | Bryan Newbold | 2021-11-04 | 1 | -0/+21 |
| SPN reingest: 6 hour minimum, 6 month max | Bryan Newbold | 2021-11-03 | 1 | -2/+2 |
| sql: fix typo in quarterly (not weekly) script | Bryan Newbold | 2021-11-03 | 1 | -1/+1 |
| sql: fixes to ingest_fileset_platform schema (from table creation) | Bryan Newbold | 2021-11-01 | 1 | -6/+6 |
| commit old ingest domain summary | Bryan Newbold | 2021-10-15 | 1 | -0/+345 |
| sql fileset ingest table iteration | Bryan Newbold | 2021-10-15 | 1 | -12/+11 |
| sql: initial ingest fileset table | Bryan Newbold | 2021-10-15 | 1 | -0/+38 |
| sql: fix typo in CHECK statement | Bryan Newbold | 2021-10-15 | 1 | -1/+1 |
| new SQL recent SPN request monitoring query | Bryan Newbold | 2021-10-04 | 1 | -0/+32 |
| refactor reingest scripts | Bryan Newbold | 2021-09-30 | 6 | -150/+90 |
| new 'daily' and 'priority' ingest request topics | Bryan Newbold | 2021-09-30 | 2 | -2/+2 |
| reingest: skip spn2 'unknown' errors | Bryan Newbold | 2021-07-21 | 2 | -0/+2 |
| crossref DB proposal, and include in SQL schema | Bryan Newbold | 2021-06-02 | 1 | -0/+7 |
| sql: do periodically retry spn2-wayback-error | Bryan Newbold | 2021-04-27 | 2 | -2/+0 |
| reingest scripts to run as sandcrawler | Bryan Newbold | 2021-04-09 | 2 | -12/+12 |
| sql: notes on sql restore | Bryan Newbold | 2021-04-09 | 1 | -0/+9 |
| sql: update paths to work with svc506 machine | Bryan Newbold | 2021-04-09 | 12 | -49/+49 |
| sql: before/after pg13 table size stats | Bryan Newbold | 2021-04-09 | 2 | -1/+43 |
| sql: update periodic retry/reingest scripts | Bryan Newbold | 2021-04-09 | 4 | -6/+14 |
| SQL snapshot doc update | Bryan Newbold | 2021-04-07 | 1 | -2/+5 |
| 2021-04-07 sandcrawler DB stats | Bryan Newbold | 2021-04-07 | 1 | -0/+428 |
| SQL: more ingest monitoring | Bryan Newbold | 2020-11-16 | 3 | -1/+660 |
| tweak html_meta SQL schema | Bryan Newbold | 2020-11-03 | 1 | -2/+2 |
| SQL: unmatched glutton query (old) | Bryan Newbold | 2020-11-03 | 1 | -0/+19 |
| monitoring: past-7-days summary query | Bryan Newbold | 2020-11-03 | 1 | -0/+26 |
| html: start on SQL table | Bryan Newbold | 2020-11-03 | 1 | -0/+15 |
| SQL: update weekly/quarterly ingest retry scripts | Bryan Newbold | 2020-10-21 | 5 | -18/+119 |
| sql stats: larger limits (more complete lists) | Bryan Newbold | 2020-10-21 | 1 | -8/+8 |
| update SQL ingest monitoring commands to be past-month by default | Bryan Newbold | 2020-10-17 | 1 | -5/+5 |
| dump_file_meta helper | Bryan Newbold | 2020-10-01 | 1 | -0/+12 |
| updated sandcrawler-db stats | Bryan Newbold | 2020-09-15 | 2 | -6/+346 |
| WIP weekly re-ingest script | Bryan Newbold | 2020-08-17 | 2 | -0/+97 |
| grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 4 | -10/+49 |