-rw-r--r-- notes/background_reading.md 49
1 files changed, 49 insertions, 0 deletions
diff --git a/notes/background_reading.md b/notes/background_reading.md
new file mode 100644
index 0000000..054479a
--- /dev/null
+++ b/notes/background_reading.md
@@ -0,0 +1,49 @@
+
+## Fulltext Search
+
+["Is searching full text more effective than searching abstracts?"](https://fatcat.wiki/release/od4ug42qtjgm7g5um6hua7t4tm)
+
+["The structural and content aspects of abstracts versus bodies of full text journal articles are different"](https://fatcat.wiki/release/h7dvxxzhwncq7jjbjwst7vuvz4)
+
+Anserini:
+- [github repo](https://github.com/castorini/anserini)
+- [Elastirini: Anserini Integration with Elasticsearch](https://github.com/castorini/anserini/blob/master/docs/elastirini.md)
+
+[Forty Days and Forty Nights: Re-indexing 7+ million books (part 1)](https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1) (2011)
+
+[Challenges for HathiTrust full-text search](https://www.hathitrust.org/blogs/large-scale-search/challenges)
+
+As a partial solution, perhaps for autocorrect,
+[sonic](https://github.com/valeriansaliou/sonic) is a fast FST-based search index.
+
+
+## Search Experience, Design, Features
+
+[HathiTrust Features Analysis](https://www.hathitrust.org/full-text-search-features-and-analysis)
+
+"From Keyword Search to Exploration: Designing Future Search Interfaces for the Web"
+https://web.archive.org/web/20110611085129/http://www.cs.swan.ac.uk/~csmax/pubs/FnTWebSci-Wilson.pdf
+
+"The Researcher's Journey: Scholarly Navigation of an Academic Library Web Site"
+https://www.tandfonline.com/doi/full/10.1080/19322909.2010.525368
+
+"Do, or Do Not, Make Them Think?: A Usability Study of an Academic Library Search Box"
+https://www.tandfonline.com/doi/full/10.1080/19322909.2019.1684223
+
+"Designing Search: Effective Search Interfaces for Academic Library Web Sites"
+https://www.tandfonline.com/doi/abs/10.1080/19322900802473944
+
+"The Archived Web: Doing History in the Digital Age" (review)
+https://www.tandfonline.com/doi/full/10.1080/19322909.2019.1656496
+
+
+## Derived Datasets
+
+[2016 HathiTrust extracted features](https://www.hathitrust.org/extracted-features-announcement)
+
+
+## Tech Tools
+
+[Pub2TEI: JATS XML to GROBID-style TEI-XML](https://github.com/kermitt2/Pub2TEI)
+
+https://github.com/kermitt2/article-dataset-builder