Diffstat (limited to 'extra/sitemap/README.md')
-rw-r--r-- extra/sitemap/README.md | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 0000000..242378a
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,21 @@
+
+## HOWTO: Update
+
+Requires [fatcat-cli](https://gitlab.com/bnewbold/fatcat-cli) and `jq`
+installed. Run these commands on a production machine.
+
+    cd /srv/fatcat_scholar/sitemap
+    export DATE=`date --iso-8601`
+    /srv/fatcat_scholar/src/extra/sitemap/work_urls_query.sh $DATE
+    rm *.txt.gz
+    /srv/fatcat/src/extra/sitemap/generate_sitemap_indices.py
+
+## Background
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50k
+lines / 50 MByte for XML sitemap files. Google Scholar has indicated a smaller
+limit of 20k URLs / 5 MByte.
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
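
The limits in the Background section suggest splitting the URL list into several text sitemap files. The following is a minimal sketch of that idea, not the logic of `work_urls_query.sh` or `generate_sitemap_indices.py`, whose contents are not shown here. The function name `chunk_urls`, the constants, and the reading of "5 MB" as 5,000,000 bytes are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

# Assumed values, taken from the Google Scholar figures quoted in Background.
MAX_URLS = 20_000            # at most 20k URLs per sitemap file
MAX_BYTES = 5 * 1000 * 1000  # "5 MB", assumed to mean 5,000,000 bytes


def chunk_urls(
    urls: Iterable[str],
    max_urls: int = MAX_URLS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[List[str]]:
    """Yield lists of URLs that each stay within both limits.

    Hypothetical helper: each yielded list could be written out as one
    text sitemap file, one URL per line.
    """
    chunk: List[str] = []
    size = 0
    for url in urls:
        line_bytes = len(url.encode("utf-8")) + 1  # +1 for the newline
        # Start a new chunk if adding this URL would exceed either limit.
        if chunk and (len(chunk) >= max_urls or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(url)
        size += line_bytes
    if chunk:
        yield chunk


if __name__ == "__main__":
    # example.org URLs are placeholders, not real catalog entries.
    demo = [f"https://example.org/work/{i}" for i in range(45_000)]
    for n, part in enumerate(chunk_urls(demo)):
        print(n, len(part))
```

Checking both the line count and the byte size matters because a file with long URLs could reach the size limit well before it reaches the URL limit.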