<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/notes, branch trawler</title>
<id>https://git.bnewbold.net/sandcrawler/atom?h=trawler</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=trawler'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2021-12-09T22:12:18+00:00</updated>
<entry>
<title>notes on re-GROBID-ing (and re-extracting) some files</title>
<updated>2021-12-09T22:12:18+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-09T22:12:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e5c021bfeb03c50924160616dc64d44617d45933'/>
<id>urn:sha1:e5c021bfeb03c50924160616dc64d44617d45933</id>
<content type='text'>
</content>
</entry>
<entry>
<title>commit old patch crawl notes</title>
<updated>2021-12-02T03:06:00+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-12-02T03:06:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=85a9c9008ab66680047fb151996c55566d56cbe3'/>
<id>urn:sha1:85a9c9008ab66680047fb151996c55566d56cbe3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>wrap up crossref refs backfill notes</title>
<updated>2021-11-11T01:25:34+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-11T01:25:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=bdccd79d741cab89cd28202a352044ed55624503'/>
<id>urn:sha1:bdccd79d741cab89cd28202a352044ed55624503</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update crossref/grobid refs generation notes</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-05T00:17:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=c0da811394b9de8e30e94fa46933c72b8e5fdb19'/>
<id>urn:sha1:c0da811394b9de8e30e94fa46933c72b8e5fdb19</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid refs backfill progress</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T03:04:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=2e9cd60819531ad73ce71f3a84109ad164624a40'/>
<id>urn:sha1:2e9cd60819531ad73ce71f3a84109ad164624a40</id>
<content type='text'>
</content>
</entry>
<entry>
<title>start notes on crossref refs backfill</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-02T00:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=1996cabae1be70d40cee03d1804a33797fc6d663'/>
<id>urn:sha1:1996cabae1be70d40cee03d1804a33797fc6d663</id>
<content type='text'>
</content>
</entry>
<entry>
<title>old (2020) notes on pdfextract cleanup</title>
<updated>2021-10-04T19:56:56+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-04T19:56:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=a613cf2fa66e59c412b9de15e487ab5d3431bb51'/>
<id>urn:sha1:a613cf2fa66e59c412b9de15e487ab5d3431bb51</id>
<content type='text'>
</content>
</entry>
<entry>
<title>notes on dumping PDF URL lists for partners</title>
<updated>2021-10-04T19:56:29+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-04T19:56:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=eebacc4fb395d829e342dab02f42591b29b942ae'/>
<id>urn:sha1:eebacc4fb395d829e342dab02f42591b29b942ae</id>
<content type='text'>
</content>
</entry>
<entry>
<title>daily OA crawl improvements/notes</title>
<updated>2021-09-08T19:16:44+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-09-08T19:16:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=ce25d59845083ca0beab98144b0c43bfc4254d6d'/>
<id>urn:sha1:ce25d59845083ca0beab98144b0c43bfc4254d6d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>OAI-PMH patch and ingest improvement notes</title>
<updated>2021-09-04T01:34:33+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-09-04T01:34:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=d749a7a6a1c1d439596c5d053daf904638b4dbc2'/>
<id>urn:sha1:d749a7a6a1c1d439596c5d053daf904638b4dbc2</id>
<content type='text'>
</content>
</entry>
</feed>
