author Bryan Newbold <bnewbold@archive.org> 2022-12-23 15:24:33 -0800
committer Bryan Newbold <bnewbold@archive.org> 2022-12-23 15:24:33 -0800
commit c42a7f46d29ca9d6e7c84b1b0617f272f71f25a5
tree efe52857722e33473626b74e41a5fcdbcdd30a6c /notes/examples/random_datasets.md
parent 3a7a946a1f271cd0f334129fa3bb51c451b82966
notes: old examples
Diffstat (limited to 'notes/examples/random_datasets.md')
-rw-r--r-- notes/examples/random_datasets.md | 19
1 files changed, 19 insertions, 0 deletions
diff --git a/notes/examples/random_datasets.md b/notes/examples/random_datasets.md
new file mode 100644
index 0000000..b69132c
--- /dev/null
+++ b/notes/examples/random_datasets.md
@@ -0,0 +1,19 @@
+
+Possible external datasets to ingest (which are not entire platforms):
+
+- https://research.google/tools/datasets/
+- https://openslr.org/index.html
+- https://www.kaggle.com/datasets?sort=votes&tasks=true
+- https://archive.ics.uci.edu/ml/datasets.php
+
+Existing archive.org datasets to ingest:
+
+- https://archive.org/details/allthemusicllc-datasets
+
+Papers on archive.org to ingest:
+
+- <https://archive.org/details/journals?and%5B%5D=%21collection%3Aarxiv+%21collection%3Ajstor_ejc+%21collection%3Apubmed&sin=>
+- <https://archive.org/details/biorxiv>
+- <https://archive.org/details/philosophicaltransactions?tab=collection>
+- <https://archive.org/search.php?query=doi%3A%2A>
+- <https://archive.org/details/folkscanomy_academic>