Diffstat (limited to 'python')
-rw-r--r-- python/notes/overview.md 20
1 files changed, 20 insertions, 0 deletions
diff --git a/python/notes/overview.md b/python/notes/overview.md
new file mode 100644
index 0000000..27b9177
--- /dev/null
+++ b/python/notes/overview.md
@@ -0,0 +1,20 @@
+# Generic data processing approach
+
+## A basic setup
+
+* Python for orchestration: data dependencies, structured outputs, and multistage
+  pipelines with inspectable intermediate results
+* Go for custom, fast tools when needed
+
+## Quick Fusion
+
+For data fusion, e.g. merging Open Library (OL) works and editions into a joint dataset:
+
+* select and tabularize data
+* write a one-off merge script, e.g. using Pandas
+
+## Hadoop Scripting
+
+* Pig Latin
+* PySpark
+
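The "Python for orchestration" bullet (data dependencies, structured outputs, multistage pipelines with inspectable intermediate results) could be sketched as below. This is an illustration, not the author's setup: it assumes JSON Lines as the structured intermediate format and a make-style rule (re-run a stage only if its output is missing or older than its input). The stage names and the functions `run_pipeline`, `drop_untitled` and `add_title_length` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator

# A stage maps an iterable of records (dicts) to an iterable of records.
Stage = Callable[[Iterable[dict]], Iterable[dict]]


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def write_jsonl(path: Path, records: Iterable[dict]) -> None:
    """Write records as JSON Lines, atomically replacing `path` at the end."""
    # A crashed stage leaves only the .tmp file behind, never a truncated
    # output that a later run would mistake for a finished result.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, sort_keys=True) + "\n")
    tmp.replace(path)


def run_pipeline(source: Path, stages: list[tuple[str, Stage]], workdir: Path) -> Path:
    """Run named stages in order; each stage reads the previous stage's file.

    Every intermediate result stays on disk as <workdir>/<name>.jsonl. A stage
    is re-run only if its output is missing or older than its input
    (assumed make-style data-dependency check).
    """
    current = source
    for name, stage in stages:
        out = workdir / f"{name}.jsonl"
        if not out.exists() or out.stat().st_mtime < current.stat().st_mtime:
            write_jsonl(out, stage(read_jsonl(current)))
        current = out
    return current


# Two hypothetical example stages.
def drop_untitled(records: Iterable[dict]) -> Iterator[dict]:
    return (r for r in records if r.get("title"))


def add_title_length(records: Iterable[dict]) -> Iterator[dict]:
    for r in records:
        yield {**r, "title_len": len(r["title"])}


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "input.jsonl"
    write_jsonl(source, [{"title": "Dune"}, {"title": ""}])
    final = run_pipeline(
        source, [("01-filter", drop_untitled), ("02-enrich", add_title_length)], workdir
    )
    print(final, list(read_jsonl(final)))
```

A run leaves `01-filter.jsonl` and `02-enrich.jsonl` in the work directory, so each stage's output can be inspected on its own (e.g. with `head` or `jq`). A Go tool would fit into the same scheme as a stage that reads one file and writes the next.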
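The two "Quick Fusion" steps (select and tabularize, then merge with a one-off Pandas script) might look like the sketch below. It assumes an Open Library dump-style layout of one record per line with five tab-separated columns (type, key, revision, last modified, JSON record), and that an edition points to its work through a `works` list of `{"key": ...}` objects; both should be checked against the dump actually used. The functions `tabularize`, `first_work_key` and `fuse_works_editions` and the output column names are hypothetical.

```python
import json
from typing import Callable, Iterable

import pandas as pd

# Pulls one value out of a parsed JSON record.
Extractor = Callable[[dict], object]


def tabularize(lines: Iterable[str], columns: dict[str, Extractor]) -> pd.DataFrame:
    """Select a few fields from dump-style TSV lines into a flat table.

    Assumed line layout: type <TAB> key <TAB> revision <TAB> last_modified <TAB> JSON
    """
    rows = []
    for line in lines:
        _type, key, _revision, _modified, payload = line.rstrip("\n").split("\t", 4)
        doc = json.loads(payload)
        rows.append({"key": key, **{name: get(doc) for name, get in columns.items()}})
    return pd.DataFrame(rows)


def first_work_key(edition: dict):
    # Assumption: editions list their work(s) as [{"key": "/works/..."}];
    # only the first one is kept here.
    works = edition.get("works") or []
    return works[0].get("key") if works else None


def fuse_works_editions(work_lines, edition_lines) -> pd.DataFrame:
    works = tabularize(work_lines, {"work_title": lambda d: d.get("title")})
    editions = tabularize(
        edition_lines,
        {"edition_title": lambda d: d.get("title"), "work_key": first_work_key},
    )
    # Left join: keep every edition and attach work fields where a work matches.
    # validate="many_to_one" raises if a work key occurs more than once in `works`.
    return editions.merge(
        works.rename(columns={"key": "work_key"}),
        on="work_key",
        how="left",
        validate="many_to_one",
    )


if __name__ == "__main__":
    work_lines = [
        "/type/work\t/works/OL1W\t3\t2020-01-01T00:00:00\t" + json.dumps({"title": "Dune"}),
    ]
    edition_lines = [
        "/type/edition\t/books/OL1M\t5\t2020-01-01T00:00:00\t"
        + json.dumps({"title": "Dune (1st ed.)", "works": [{"key": "/works/OL1W"}]}),
        "/type/edition\t/books/OL2M\t1\t2020-01-01T00:00:00\t"
        + json.dumps({"title": "Orphan edition"}),
    ]
    print(fuse_works_editions(work_lines, edition_lines))
```

The left join is a deliberate choice: editions without a matching work survive with an empty `work_title`, so gaps in the fused dataset stay visible instead of silently disappearing as they would with an inner join. For real dump files, assuming they are gzip-compressed text, an open handle such as `gzip.open(path, "rt", encoding="utf-8")` can be passed in place of the in-memory lists.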