<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/python/tests/files, branch trawler</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=trawler</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=trawler'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2021-11-05T00:19:52+00:00</updated>
<entry>
<title>initial crossref-refs via GROBID helper routine</title>
<updated>2021-11-05T00:19:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-29T19:16:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=d3fa74e941aa11f79cee2d0adcb5cbc70884ef48'/>
<id>urn:sha1:d3fa74e941aa11f79cee2d0adcb5cbc70884ef48</id>
<content type='text'>
</content>
</entry>
<entry>
<title>updates/corrections to old small.json GROBID metadata example file</title>
<updated>2021-10-28T02:10:57+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-28T02:10:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=7ccfd1989fe4192d66a3d6f8dd9807dcfee055a5'/>
<id>urn:sha1:7ccfd1989fe4192d66a3d6f8dd9807dcfee055a5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>xml: re-encode XML docs into UTF-8 for persisting</title>
<updated>2020-11-04T06:40:14+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-11-04T06:40:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=653fac9632c6ae9dd036ad844454cf419cd5320b'/>
<id>urn:sha1:653fac9632c6ae9dd036ad844454cf419cd5320b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>html: work around firstmonday DOCTYPE issue</title>
<updated>2020-10-31T00:20:22+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-10-31T00:20:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e61d6e8cc3b6824816a83dff56ffbdbbb6329e57'/>
<id>urn:sha1:e61d6e8cc3b6824816a83dff56ffbdbbb6329e57</id>
<content type='text'>
</content>
</entry>
<entry>
<title>html: more metadata tests</title>
<updated>2020-10-29T21:31:21+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-10-29T21:31:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=3d56509ef83226a808ebb078f5cac9815afb5d9d'/>
<id>urn:sha1:3d56509ef83226a808ebb078f5cac9815afb5d9d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>start HTML metadata extraction code</title>
<updated>2020-10-27T22:42:22+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-10-27T22:27:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=ae851f3f205b741dbc826c3197cdd3cc9bde8802'/>
<id>urn:sha1:ae851f3f205b741dbc826c3197cdd3cc9bde8802</id>
<content type='text'>
</content>
</entry>
<entry>
<title>basic elife+plos extraction tests</title>
<updated>2020-01-10T00:30:02+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-01-09T02:24:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=7eb8f74bc15d1acb5771320ec4e2342d85077555'/>
<id>urn:sha1:7eb8f74bc15d1acb5771320ec4e2342d85077555</id>
<content type='text'>
Ripped out some HTML, but these files could have been minimized even further
to keep the repository from growing large.
</content>
</entry>
<entry>
<title>teixml2json test update for skipping null JSON keys</title>
<updated>2020-01-03T02:12:58+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-12-27T05:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9756f536b24ba13c34d0dcc49153020477cce012'/>
<id>urn:sha1:9756f536b24ba13c34d0dcc49153020477cce012</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid2json: language_code</title>
<updated>2019-10-04T21:09:59+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-10-04T21:09:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=ee797ddc0a1377423cfe1939634e6d019eecea9e'/>
<id>urn:sha1:ee797ddc0a1377423cfe1939634e6d019eecea9e</id>
<content type='text'>
</content>
</entry>
<entry>
<title>python tests for pusher classes</title>
<updated>2019-10-03T01:02:40+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-10-03T01:02:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=910e59e9011935fecaae62520eb3fc30cbd65800'/>
<id>urn:sha1:910e59e9011935fecaae62520eb3fc30cbd65800</id>
<content type='text'>
</content>
</entry>
</feed>
