blob: d14d2ae05019b71b2812a0ea9f0d644f249a9f2e (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
As of March 2018, the archive runs Pig version 0.12.0, via CDH5.0.1 (Cloudera
Distribution).
"Local mode" unit tests in this folder run with Pig version 0.17.0 (controlled
by `fetch_deps.sh`) due to [dependency/jar issues][pig-bug] in local mode of
0.12.0.
[pig-bug]: https://issues.apache.org/jira/browse/PIG-3530
## Development and Testing
To run tests, you need Java installed and `JAVA_HOME` configured.
Fetch dependencies (pig):
./fetch_deps.sh
Write .pig scripts here, and add a pytho wrapper test to `./tests/` when done.
Test vector files (input/output) can go in `./tests/files/`.
Run the tests with:
pipenv run pytest
Could also, in theory, use a docker image ([local-pig][]), but it's pretty easy
to just download.
[local-pig]: https://hub.docker.com/r/chalimartines/local-pig
## Run in Production
pig -param INPUT="/user/bnewbold/pdfs/global-20171227034923" \
-param OUTPUT="/user/bnewbold/pdfs/gwb-pdf-20171227034923-surt-filter" \
filter-cdx-paper-pdfs.pig
|