<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/sql, branch trawler</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=trawler</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=trawler'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2021-12-08T03:10:23+00:00</updated>
<entry>
<title>2021-12-02 database table size stats</title>
<updated>2021-12-08T03:10:23+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-04T00:37:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=57441fda8be33594898c1836fba22b12fb3e94e8'/>
<id>urn:sha1:57441fda8be33594898c1836fba22b12fb3e94e8</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sandcrawler SQL dump and upload updates</title>
<updated>2021-12-08T03:10:23+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-04T00:37:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d'/>
<id>urn:sha1:9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update fatcat_file SQL table schema, and add backfill notes</title>
<updated>2021-12-08T03:10:23+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=3f01f73563f40869c82b7ad3e21c4183fdee8206'/>
<id>urn:sha1:3f01f73563f40869c82b7ad3e21c4183fdee8206</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update fatcat_file SQL table schema, and add backfill notes</title>
<updated>2021-12-02T03:06:33+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=4f02b47f57364195e7302ec80565ce51fd20048d'/>
<id>urn:sha1:4f02b47f57364195e7302ec80565ce51fd20048d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sandcrawler SQL stats</title>
<updated>2021-11-27T19:57:34+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-27T19:57:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=d6858033082825cb56a5000e74fe46c4cbbee86c'/>
<id>urn:sha1:d6858033082825cb56a5000e74fe46c4cbbee86c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sql: grobid_refs table JSON as 'JSON' not 'JSONB'</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T03:05:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=4315b44a93ca31725b9b0a2a55c310725ac55efe'/>
<id>urn:sha1:4315b44a93ca31725b9b0a2a55c310725ac55efe</id>
<content type='text'>
I keep flip-flopping on this, but our disk usage is really large, and if
'JSON' is at all smaller than 'JSONB' in PostgreSQL, it is worth it.
</content>
</entry>
<entry>
<title>record SQL table sizes at start of crossref re-ingest</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T01:10:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=7d4e3cd7229f55af8a5e443e6a22a4550b05ff7b'/>
<id>urn:sha1:7d4e3cd7229f55af8a5e443e6a22a4550b05ff7b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add grobid_refs and crossref_with_refs to sandcrawler-db SQL schema</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-30T01:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=59af5ddd0a9587eaf53b4f6965c0d6290295ce55'/>
<id>urn:sha1:59af5ddd0a9587eaf53b4f6965c0d6290295ce55</id>
<content type='text'>
</content>
</entry>
<entry>
<title>SPN reingest: 6 hour minimum, 6 month max</title>
<updated>2021-11-04T01:33:07+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-04T01:33:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=848556a64d13955c2978bad352f2e2cd9edb62d0'/>
<id>urn:sha1:848556a64d13955c2978bad352f2e2cd9edb62d0</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sql: fix typo in quarterly (not weekly) script</title>
<updated>2021-11-04T01:32:12+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-04T01:32:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=4109cd96a95e321c2e420e453da5f2779a5f873b'/>
<id>urn:sha1:4109cd96a95e321c2e420e453da5f2779a5f873b</id>
<content type='text'>
</content>
</entry>
</feed>
