author Bryan Newbold <bnewbold@archive.org> 2021-10-04 16:13:30 -0700
committer Bryan Newbold <bnewbold@archive.org> 2021-10-15 18:15:25 -0700
commit 147319ae00a6b788104209083f65cbaa4329c862
tree b3a880781acccbff1f298b74f28b067b670df605 /proposals
parent 271f110e5ad4091e8d683b4365bc565ae0466916
dataset ingest: start enumerating examples
Diffstat (limited to 'proposals')
-rw-r--r-- proposals/2021-09-09_dataset_ingest.md | 34
1 files changed, 34 insertions, 0 deletions
diff --git a/proposals/2021-09-09_dataset_ingest.md b/proposals/2021-09-09_dataset_ingest.md
index 801a8e5..d4d2be4 100644
--- a/proposals/2021-09-09_dataset_ingest.md
+++ b/proposals/2021-09-09_dataset_ingest.md
@@ -183,3 +183,37 @@ Second implement fatcat importer and test locally and/or in QA.
Lastly implement infrastructure, automation, and other "glue".
+
+## Example Entities
+
+### ArchiveOrg: CAT dataset
+
+<https://archive.org/details/CAT_DATASET>
+
+`release_36vy7s5gtba67fmyxlmijpsaui`
+
+### ArchiveOrg: academictorrents
+
+<https://archive.org/details/academictorrents_70e0794e2292fc051a13f05ea6f5b6c16f3d3635>
+
+doi:10.1371/journal.pone.0120448
+
+Single .rar file
+
+### Dataverse
+
+<https://dataverse.rsu.lv/dataset.xhtml?persistentId=doi:10.48510/FK2/IJO02B>
+
+Single Excel file
+
+### Dataverse
+
+<https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/CLSFKX&version=1.1>
+
+doi:10.7910/DVN/CLSFKX
+
+Multiple files; multiple versions?
+
+Single file inside:
+
+<https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/CLSFKX/XWEHBB>
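The Dataverse example URLs carry the DOI in a `persistentId` query parameter, sometimes with a `version`, and the Harvard file-level identifier extends its dataset DOI by one more path segment. The following is a minimal Python sketch (standard library only) of pulling those pieces apart. It assumes the `/dataset.xhtml` and `/file.xhtml` URL shapes seen in these two examples hold for other Dataverse hosts. It also assumes file DOIs are always nested under the dataset DOI, which is true of this one example but not verified in general. Both function names, `parse_dataverse_url` and `file_in_dataset`, are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_dataverse_url(url: str) -> Tuple[str, str, str, Optional[str]]:
    """Split a Dataverse page URL into (host, kind, persistent_id, version).

    Assumption: dataset pages live at /dataset.xhtml and file pages at
    /file.xhtml, with the DOI in `persistentId` and an optional `version`,
    as in the example URLs; other Dataverse URL forms are not handled.
    """
    parsed = urlparse(url)
    params = parse_qs(parsed.query)

    pids = params.get("persistentId")
    if not pids:
        raise ValueError(f"no persistentId in URL: {url}")

    if parsed.path.endswith("/dataset.xhtml"):
        kind = "dataset"
    elif parsed.path.endswith("/file.xhtml"):
        kind = "file"
    else:
        raise ValueError(f"not a Dataverse dataset or file page: {url}")

    # `version` is absent on some links; None means "unspecified version".
    version = params.get("version", [None])[0]
    return parsed.netloc, kind, pids[0], version


def file_in_dataset(file_pid: str, dataset_pid: str) -> bool:
    """Guess whether a file PID belongs to a dataset PID.

    Assumption: the file DOI is the dataset DOI plus one extra path segment,
    as with doi:10.7910/DVN/CLSFKX/XWEHBB; this is not guaranteed in general.
    """
    return file_pid.startswith(dataset_pid + "/")


if __name__ == "__main__":
    host, kind, pid, version = parse_dataverse_url(
        "https://dataverse.harvard.edu/dataset.xhtml"
        "?persistentId=doi:10.7910/DVN/CLSFKX&version=1.1"
    )
    print(host, kind, pid, version)
    # prints: dataverse.harvard.edu dataset doi:10.7910/DVN/CLSFKX 1.1
    print(file_in_dataset("doi:10.7910/DVN/CLSFKX/XWEHBB", pid))
    # prints: True
```

Keying on `persistentId` rather than on the URL path keeps the DOI as the stable identifier. The optional `version` would then decide which snapshot of a multi-version dataset gets fetched.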