author Bryan Newbold <bnewbold@archive.org> 2020-04-03 15:16:17 -0700
committer Bryan Newbold <bnewbold@archive.org> 2020-04-03 15:16:17 -0700
commit fb767adb9472ff85b46b5a383f3986950b12dd27
tree 724af4412353c627b0eae26fd4d7fd1164bf2b55
parent 4cbbdf33ee2a9651f79f96e4bf290d8bc721f69d
move more directories around
-rw-r--r-- extra/scrape/.gitignore (renamed from scrape/.gitignore) 0
-rw-r--r-- extra/scrape/README.md (renamed from scrape/README.md) 0
-rwxr-xr-x extra/scrape/parse_cnki_tables.py (renamed from scrape/parse_cnki_tables.py) 0
-rwxr-xr-x extra/scrape/parse_wanfang_html.py (renamed from scrape/parse_wanfang_html.py) 0
-rw-r--r-- notes/fulltext_search.md 3
-rw-r--r-- notes/machine_translation.md 22
6 files changed, 25 insertions, 0 deletions
diff --git a/scrape/.gitignore b/extra/scrape/.gitignore
index b2bc71b..b2bc71b 100644
--- a/scrape/.gitignore
+++ b/extra/scrape/.gitignore
diff --git a/scrape/README.md b/extra/scrape/README.md
index 97bb6fe..97bb6fe 100644
--- a/scrape/README.md
+++ b/extra/scrape/README.md
diff --git a/scrape/parse_cnki_tables.py b/extra/scrape/parse_cnki_tables.py
index 3763550..3763550 100755
--- a/scrape/parse_cnki_tables.py
+++ b/extra/scrape/parse_cnki_tables.py
diff --git a/scrape/parse_wanfang_html.py b/extra/scrape/parse_wanfang_html.py
index 85187f5..85187f5 100755
--- a/scrape/parse_wanfang_html.py
+++ b/extra/scrape/parse_wanfang_html.py
diff --git a/notes/fulltext_search.md b/notes/fulltext_search.md
new file mode 100644
index 0000000..938abff
--- /dev/null
+++ b/notes/fulltext_search.md
@@ -0,0 +1,3 @@
+
+https://ambar.cloud/blog/2016/11/24/es-hl-large-docs/
+https://ambar.cloud/blog/2017/01/02/es-large-text/
diff --git a/notes/machine_translation.md b/notes/machine_translation.md
new file mode 100644
index 0000000..519f0c1
--- /dev/null
+++ b/notes/machine_translation.md
@@ -0,0 +1,22 @@
+
+Overall concept is to use machine translation to fill gaps or provide a
+starting point for human translation. Quality would obviously be best from
+experienced human translators with domain knowledge.
+
+Ideas:
+- translate GROBID XML or JSON output offline
+- call a platform/API to translate individual papers
+- can directly link to Google Translate (or other platform) of the paper
+
+## Free Software
+
+NiuTrans: Chinese-to-English focus, C++
+
+Apertium: popular, but no Chinese?
+
+Moses: no Chinese examples?
+
+
+## Platforms
+
+Google Translate
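
Two of the ideas listed in notes/machine_translation.md (translating GROBID output offline, and linking directly to a translation platform) might look roughly like the Python sketch below. It is not part of this commit. The function names are hypothetical, the `translate` callable stands in for whichever engine is eventually chosen, and the Google Translate URL pattern (`sl`, `tl`, `u` query parameters) is an assumption that should be checked against the live service before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.parse import urlencode

# GROBID emits TEI XML, which uses this namespace.
TEI = "http://www.tei-c.org/ns/1.0"


def tei_paragraphs(tei_xml: str) -> List[str]:
    """Return the non-empty body paragraphs of a TEI document as plain text."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//{%s}body" % TEI)
    if body is None:
        return []
    paragraphs = []
    for p in body.iter("{%s}p" % TEI):
        # Flatten inline markup (refs, formulas) and collapse whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


def translate_paragraphs(paragraphs: List[str],
                         translate: Callable[[str], str]) -> List[str]:
    """Apply a translation function (hypothetical placeholder) to each paragraph."""
    return [translate(p) for p in paragraphs]


def translate_link(paper_url: str, source: str = "zh-CN", target: str = "en") -> str:
    """Build a link that opens a paper in Google Translate (assumed URL pattern)."""
    query = urlencode({"sl": source, "tl": target, "u": paper_url})
    return "https://translate.google.com/translate?" + query
```

Keeping the translation step behind a plain callable means an offline engine such as NiuTrans and a hosted API could be swapped in without touching the extraction code.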