<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler, branch trawler</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=trawler</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=trawler'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2021-12-09T22:12:18Z</updated>
<entry>
<title>notes on re-GROBID-ing (and re-extracting) some files</title>
<updated>2021-12-09T22:12:18Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-09T22:12:18Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e5c021bfeb03c50924160616dc64d44617d45933'/>
<id>urn:sha1:e5c021bfeb03c50924160616dc64d44617d45933</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid: set a maximum file size (256 MByte)</title>
<updated>2021-12-08T03:44:53Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-08T03:44:53Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=89b5f51e57d3a0cc043640262e396e28297e7c00'/>
<id>urn:sha1:89b5f51e57d3a0cc043640262e396e28297e7c00</id>
<content type='text'>
</content>
</entry>
<entry>
<title>worker: add kafka_group_suffix option</title>
<updated>2021-12-08T03:10:23Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-08T03:09:54Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=833f9bb5181419ca9f5af0f9ba0e2e047ee164d4'/>
<id>urn:sha1:833f9bb5181419ca9f5af0f9ba0e2e047ee164d4</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ingest tool: allow configuration of GROBID endpoint</title>
<updated>2021-12-08T03:10:23Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-04T00:38:28Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=5c82ee1b965e1f3901294c752d8b2d24c6bdc974'/>
<id>urn:sha1:5c82ee1b965e1f3901294c752d8b2d24c6bdc974</id>
<content type='text'>
</content>
</entry>
<entry>
<title>2021-12-02 database table size stats</title>
<updated>2021-12-08T03:10:23Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-04T00:37:45Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=57441fda8be33594898c1836fba22b12fb3e94e8'/>
<id>urn:sha1:57441fda8be33594898c1836fba22b12fb3e94e8</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sandcrawler SQL dump and upload updates</title>
<updated>2021-12-08T03:10:23Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-04T00:37:22Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d'/>
<id>urn:sha1:9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update fatcat_file SQL table schema, and add backfill notes</title>
<updated>2021-12-08T03:10:23Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:33Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=3f01f73563f40869c82b7ad3e21c4183fdee8206'/>
<id>urn:sha1:3f01f73563f40869c82b7ad3e21c4183fdee8206</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update fatcat_file SQL table schema, and add backfill notes</title>
<updated>2021-12-02T03:06:33Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:33Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=4f02b47f57364195e7302ec80565ce51fd20048d'/>
<id>urn:sha1:4f02b47f57364195e7302ec80565ce51fd20048d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>commit old patch crawl notes</title>
<updated>2021-12-02T03:06:00Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:00Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=85a9c9008ab66680047fb151996c55566d56cbe3'/>
<id>urn:sha1:85a9c9008ab66680047fb151996c55566d56cbe3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Revert "pipenv: update deps"</title>
<updated>2021-12-02T01:37:19Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T01:37:10Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=6777004f20f742134105c18d6bae06d0ce362d50'/>
<id>urn:sha1:6777004f20f742134105c18d6bae06d0ce362d50</id>
<content type='text'>
This reverts commit 7a5b203dbb37958a452eb1be3bd1bf8ed94cbbce.

There is a problem with `internetarchive` 2.2.0, so reverting for now.
</content>
</entry>
</feed>
