<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/extra/elasticsearch, branch v0.3.3</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.3.3</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.3.3'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2020-07-01T23:34:13+00:00</updated>
<entry>
<title>commit example of an elasticsearch SQL query</title>
<updated>2020-07-01T23:34:13+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-07-01T23:34:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=274da5d3994e9f1a4ddabf2d3ddba06c5db1aa73'/>
<id>urn:sha1:274da5d3994e9f1a4ddabf2d3ddba06c5db1aa73</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ES schema: add best_url to file schema</title>
<updated>2020-06-04T21:12:30+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-06-04T21:01:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=a42d5f0d00e76bf8474647fae4e1d9d61693a7d9'/>
<id>urn:sha1:a42d5f0d00e76bf8474647fae4e1d9d61693a7d9</id>
<content type='text'>
This will increase index size (URLs are often long in our corpus, and we
have many file entities), but seems worth it.

Initially added `ia_url` as a second field, guaranteed to always be an
`*.archive.org` URL, but `best_url` defaults to that anyway, so it didn't
seem worthwhile.
</content>
</entry>
<entry>
<title>ES README: really need to limit to 1k esbulk batches</title>
<updated>2020-02-27T07:11:11+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-27T07:11:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=b4ab8501636b2891976ff867a064e02a478de065'/>
<id>urn:sha1:b4ab8501636b2891976ff867a064e02a478de065</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update ES transform README</title>
<updated>2020-02-26T20:27:30+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-26T20:27:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=0ab3f66664fd4cc63cf9040e351d725c6a5c22b9'/>
<id>urn:sha1:0ab3f66664fd4cc63cf9040e351d725c6a5c22b9</id>
<content type='text'>
- smaller batch sizes to prevent esbulk errors
- file transform/index
</content>
</entry>
<entry>
<title>ES container last tweaks</title>
<updated>2020-02-26T19:29:30+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-26T19:28:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=4e6bc246d01183f4c7ffad7d0d474e683f04c07f'/>
<id>urn:sha1:4e6bc246d01183f4c7ffad7d0d474e683f04c07f</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ES release: last minor tweaks</title>
<updated>2020-02-26T19:22:30+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-26T19:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=0450f22006c9b991cdc4695458fc3b3e3e97bfbb'/>
<id>urn:sha1:0450f22006c9b991cdc4695458fc3b3e3e97bfbb</id>
<content type='text'>
</content>
</entry>
<entry>
<title>release schema: do doc_values on DOIs</title>
<updated>2020-02-13T22:23:01+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-13T22:22:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=2f8788152ff740d049d11e2e263cac978d526e2a'/>
<id>urn:sha1:2f8788152ff740d049d11e2e263cac978d526e2a</id>
<content type='text'>
Because DOIs are pseudo-structured (a prefix, and often structure within
the publisher-controlled area), I suspect we will in fact want to do
analytics over these strings.
</content>
</entry>
<entry>
<title>ES release: actually do want doc_values for work_id</title>
<updated>2020-02-05T23:42:45+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-05T23:42:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=3655bbe6c539fdeccfbfaa19b6fc93a4859e0ca7'/>
<id>urn:sha1:3655bbe6c539fdeccfbfaa19b6fc93a4859e0ca7</id>
<content type='text'>
E.g., for fast "unique count" aggregations.
</content>
</entry>
<entry>
<title>fix axiv/arxiv typo in release schema</title>
<updated>2020-02-04T23:10:26+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-04T23:10:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=8007cdfc4e06753a9bbba56d1fa7f9686775e5e8'/>
<id>urn:sha1:8007cdfc4e06753a9bbba56d1fa7f9686775e5e8</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ES release schema: fix typo</title>
<updated>2020-01-31T21:33:38+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-01-31T21:33:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=fbd79c7315cad4789eb0e92c136c59da8f38c4f3'/>
<id>urn:sha1:fbd79c7315cad4789eb0e92c136c59da8f38c4f3</id>
<content type='text'>
</content>
</entry>
</feed>
