index : sandcrawler
path: root/sql
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| 2021-04-07 sandcrawler DB stats | Bryan Newbold | 2021-04-07 | 1 | -0/+428 |
| SQL: more ingest monitoring | Bryan Newbold | 2020-11-16 | 3 | -1/+660 |
| tweak html_meta SQL schema | Bryan Newbold | 2020-11-03 | 1 | -2/+2 |
| SQL: unmatched glutton query (old) | Bryan Newbold | 2020-11-03 | 1 | -0/+19 |
| monitoring: past-7-days summary query | Bryan Newbold | 2020-11-03 | 1 | -0/+26 |
| html: start on SQL table | Bryan Newbold | 2020-11-03 | 1 | -0/+15 |
| SQL: update weekly/quarterly ingest retry scripts | Bryan Newbold | 2020-10-21 | 5 | -18/+119 |
| sql stats: larger limits (more complete lists) | Bryan Newbold | 2020-10-21 | 1 | -8/+8 |
| update SQL ingest monitoring commands to be past-month by default | Bryan Newbold | 2020-10-17 | 1 | -5/+5 |
| dump_file_meta helper | Bryan Newbold | 2020-10-01 | 1 | -0/+12 |
| updated sandcrawler-db stats | Bryan Newbold | 2020-09-15 | 2 | -6/+346 |
| WIP weekly re-ingest script | Bryan Newbold | 2020-08-17 | 2 | -0/+97 |
| grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 4 | -10/+49 |
| commit stats from a couple weeks back | Bryan Newbold | 2020-08-05 | 1 | -0/+347 |
| sql stats commands updates | Bryan Newbold | 2020-08-05 | 1 | -2/+2 |
| commented special modes for dump_unextracted_pdf.sql | Bryan Newbold | 2020-06-25 | 1 | -1/+4 |
| pdftrio SQL queries | Bryan Newbold | 2020-06-25 | 1 | -0/+65 |
| SQL commands for re-trying PDF ingests | Bryan Newbold | 2020-06-25 | 1 | -0/+158 |
| unextracted PDF job dump command | Bryan Newbold | 2020-06-25 | 1 | -0/+16 |
| tweak pdf_meta SQL schema | Bryan Newbold | 2020-06-17 | 1 | -0/+26 |
| update sandcrawler stats for early may | Bryan Newbold | 2020-05-04 | 1 | -0/+418 |
| more monitoring queries | Bryan Newbold | 2020-03-30 | 1 | -5/+29 |
| make monitoring commands ingest_request local, not ingest_file_result | Bryan Newbold | 2020-03-17 | 1 | -2/+2 |
| DOI prefix example queries (SQL) | Bryan Newbold | 2020-03-10 | 1 | -3/+17 |
| helpful daily/weekly monitoring SQL queries | Bryan Newbold | 2020-03-10 | 1 | -0/+94 |
| sandcrawler schema: add MD5 index | Bryan Newbold | 2020-03-05 | 1 | -0/+1 |
| more SQL queries | Bryan Newbold | 2020-03-02 | 1 | -0/+57 |
| recent sandcrawler-db / ingest stats (interesting) | Bryan Newbold | 2020-02-24 | 2 | -0/+488 |
| dump_regrobid_pdf_petabox.sql script | Bryan Newbold | 2020-02-12 | 1 | -0/+15 |
| sandcrawler-db extra stats | Bryan Newbold | 2020-02-12 | 1 | -0/+42 |
| pdftrio proposal and start on schema+kafka | Bryan Newbold | 2020-02-12 | 1 | -0/+13 |
| more random sandcrawler-db queries | Bryan Newbold | 2020-02-03 | 2 | -32/+62 |
| more SQL commands | Bryan Newbold | 2020-02-02 | 1 | -0/+15 |
| sql stats: typo fix | Bryan Newbold | 2020-01-28 | 1 | -1/+1 |
| sql howto: database dumps | Bryan Newbold | 2020-01-28 | 1 | -0/+7 |
| clarify ingest result schema and semantics | Bryan Newbold | 2020-01-15 | 1 | -0/+16 |
| database stats | Bryan Newbold | 2020-01-14 | 2 | -0/+289 |
| sql: more cool random queries | Bryan Newbold | 2020-01-02 | 1 | -0/+5 |
| SQL docs update for diesel change | Bryan Newbold | 2020-01-02 | 2 | -0/+48 |
| move SQL schema to diesel migration pattern | Bryan Newbold | 2020-01-02 | 5 | -70/+157 |
| add some GROBID metadata schema docs to SQL schema | Bryan Newbold | 2019-12-11 | 1 | -0/+11 |
| add note to CDX backfill script that we should be filtering (oops) | Bryan Newbold | 2019-11-12 | 1 | -0/+1 |
| SQL stats and commands (mostly from sept 2019) | Bryan Newbold | 2019-11-12 | 4 | -0/+96 |
| rename postgrest directory sql | Bryan Newbold | 2019-09-23 | 9 | -0/+768 |