author: Bryan Newbold <bnewbold@robocracy.org> 2019-05-07 17:30:10 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2020-08-19 12:08:53 -0700
commit: 88a99387e09c7c43803129e72215ef3f6b4cafc6
tree: f039da420f4f4cb032561d194ae809396037dbff
parent: 5007ee299ce07b31db6d48cd4ab2587f87af53ab
initial sitemap.xml notes/template
-rw-r--r-- extra/sitemap/README.md | 23
-rw-r--r-- extra/sitemap/sitemap.xml | 6
2 files changed, 29 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 00000000..6963bb1f
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,23 @@
+
+Google has a limit of 50k URLs / 10 MByte for text sitemap files, and 50k
+URLs / 50 MByte for XML sitemap files.
+
+With a baseline of 100 million entities at 50k URLs per file, that requires an
+index file pointing to at least 2,000 individual sitemaps. 3 hex characters is
+12 bits, or 4096 options; that seems like an OK granularity to start with.
+
+Should look into how archive.org generates its sitemap.xml; it seems simple,
+and comes in batches of exactly 50k.
+
+## Text Sitemaps
+
+It should be possible to create simple text-style sitemaps, one URL per line,
+and link to these from a sitemap index. This is appealing because the sitemaps
+can be generated very quickly from identifier SQL dump files run through UNIX
+commands (e.g., to split and turn into URLs). A script to create the XML
+sitemap index pointing at all the sitemaps would still be needed, though.
+
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
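
The sizing arithmetic in the README notes above can be checked with a short Python sketch. The function names are hypothetical and not part of this commit, and the per-shard average assumes entities are spread evenly across hex prefixes, which the notes do not state.

```python
# Sketch of the sizing arithmetic from the README notes.
# Function names are illustrative only; they do not exist in the repository.

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted in the notes


def min_sitemap_files(entity_count: int, per_file: int = MAX_URLS_PER_SITEMAP) -> int:
    """Smallest number of sitemap files that can hold entity_count URLs."""
    # Integer ceiling division, so no floating point rounding is involved.
    return -(-entity_count // per_file)


def hex_prefix_shards(prefix_len: int) -> int:
    """Number of distinct shards when splitting on a hex prefix of this length."""
    return 16 ** prefix_len


def avg_urls_per_shard(entity_count: int, prefix_len: int) -> float:
    """Average shard size, ASSUMING entities are spread evenly across prefixes."""
    return entity_count / hex_prefix_shards(prefix_len)


if __name__ == "__main__":
    entities = 100_000_000  # baseline from the notes
    print(min_sitemap_files(entities))
    print(hex_prefix_shards(3))
    print(avg_urls_per_shard(entities, 3))
```

Under that even-spread assumption, 4096 shards leave each file at roughly half the 50k limit, which gives some room for growth; a skewed distribution would need a separate check per shard.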
diff --git a/extra/sitemap/sitemap.xml b/extra/sitemap/sitemap.xml
new file mode 100644
index 00000000..4404bdc2
--- /dev/null
+++ b/extra/sitemap/sitemap.xml
@@ -0,0 +1,6 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+ <url>
+ <loc>{{page[0]|safe}}</loc>
+ </url>
+</urlset>
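
The README notes in this commit mention text sitemaps (one URL per line) plus a script to build the XML sitemap index that points at them. A minimal sketch of that idea follows, using only the Python standard library. The helper names and the `example.org` URLs are hypothetical; the `sitemapindex`/`sitemap`/`loc` element names come from the sitemaps.org protocol, not from this commit.

```python
# Sketch: split a URL stream into text sitemaps and build an XML sitemap index.
# Helper names and example URLs are illustrative assumptions.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, per_file=50_000):
    """Yield successive lists of at most per_file URLs."""
    batch = []
    for url in urls:
        batch.append(url)
        if len(batch) == per_file:
            yield batch
            batch = []
    if batch:
        yield batch


def text_sitemap(urls):
    """Render a text-style sitemap: one URL per line."""
    return "".join(url + "\n" for url in urls)


def sitemap_index(sitemap_urls):
    """Render an XML sitemap index pointing at each sitemap URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for loc in sitemap_urls:
        lines.append("  <sitemap>")
        # Escape &, < and > so query strings do not break the XML.
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical identifiers standing in for rows from an SQL dump.
    urls = (f"https://example.org/entity/{i}" for i in range(5))
    batches = list(chunk_urls(urls, per_file=2))
    index = sitemap_index(
        f"https://example.org/sitemap-{n}.txt" for n in range(len(batches))
    )
    print(index)
```

The chunking step does the same job as `split -l 50000` on a file of URLs, so the UNIX-command approach from the notes and this sketch are interchangeable for that part; only the index needs a script.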