<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/notes/tasks, branch trawler</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=trawler</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=trawler'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2021-12-09T22:12:18+00:00</updated>
<entry>
<title>notes on re-GROBID-ing (and re-extracting) some files</title>
<updated>2021-12-09T22:12:18+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-09T22:12:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e5c021bfeb03c50924160616dc64d44617d45933'/>
<id>urn:sha1:e5c021bfeb03c50924160616dc64d44617d45933</id>
<content type='text'>
</content>
</entry>
<entry>
<title>wrap up crossref refs backfill notes</title>
<updated>2021-11-11T01:25:34+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-11T01:25:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=bdccd79d741cab89cd28202a352044ed55624503'/>
<id>urn:sha1:bdccd79d741cab89cd28202a352044ed55624503</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update crossref/grobid refs generation notes</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-05T00:17:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=c0da811394b9de8e30e94fa46933c72b8e5fdb19'/>
<id>urn:sha1:c0da811394b9de8e30e94fa46933c72b8e5fdb19</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid refs backfill progress</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T03:04:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=2e9cd60819531ad73ce71f3a84109ad164624a40'/>
<id>urn:sha1:2e9cd60819531ad73ce71f3a84109ad164624a40</id>
<content type='text'>
</content>
</entry>
<entry>
<title>start notes on crossref refs backfill</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T00:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=1996cabae1be70d40cee03d1804a33797fc6d663'/>
<id>urn:sha1:1996cabae1be70d40cee03d1804a33797fc6d663</id>
<content type='text'>
</content>
</entry>
<entry>
<title>old (2020) notes on pdfextract cleanup</title>
<updated>2021-10-04T19:56:56+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-04T19:56:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=a613cf2fa66e59c412b9de15e487ab5d3431bb51'/>
<id>urn:sha1:a613cf2fa66e59c412b9de15e487ab5d3431bb51</id>
<content type='text'>
</content>
</entry>
<entry>
<title>notes on dumping PDF URL lists for partners</title>
<updated>2021-10-04T19:56:29+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-04T19:56:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=eebacc4fb395d829e342dab02f42591b29b942ae'/>
<id>urn:sha1:eebacc4fb395d829e342dab02f42591b29b942ae</id>
<content type='text'>
</content>
</entry>
<entry>
<title>notes on file_meta task (from august)</title>
<updated>2020-10-02T02:52:06+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-10-02T02:52:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9753876b85c767a9848467065b4d4dd613d5ed68'/>
<id>urn:sha1:9753876b85c767a9848467065b4d4dd613d5ed68</id>
<content type='text'>
</content>
</entry>
<entry>
<title>follow-up notes on processing 'holes'</title>
<updated>2020-09-02T23:10:13+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-09-02T23:10:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=8cc3cebd2392d16026214f5e92b99a322ef2e044'/>
<id>urn:sha1:8cc3cebd2392d16026214f5e92b99a322ef2e044</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid+pdftext missing catch-up commands</title>
<updated>2020-08-05T20:10:56+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-08-05T20:10:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=af140307a025738767e740fea8da8d15e20fb983'/>
<id>urn:sha1:af140307a025738767e740fea8da8d15e20fb983</id>
<content type='text'>
</content>
</entry>
</feed>
