aboutsummaryrefslogtreecommitdiffstats
path: root/pig/README.md
blob: d14d2ae05019b71b2812a0ea9f0d644f249a9f2e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

As of March 2018, the archive runs Pig version 0.12.0, via CDH5.0.1 (Cloudera
Distribution).

"Local mode" unit tests in this folder run with Pig version 0.17.0 (controlled
by `fetch_deps.sh`) due to [dependency/jar issues][pig-bug] in local mode of
0.12.0.

[pig-bug]: https://issues.apache.org/jira/browse/PIG-3530

## Development and Testing

To run tests, you need Java installed and `JAVA_HOME` configured.

Fetch dependencies (pig):

    ./fetch_deps.sh

Write .pig scripts here, and add a pytho wrapper test to `./tests/` when done.
Test vector files (input/output) can go in `./tests/files/`.

Run the tests with:

    pipenv run pytest

Could also, in theory, use a docker image ([local-pig][]), but it's pretty easy
to just download.

[local-pig]: https://hub.docker.com/r/chalimartines/local-pig

## Run in Production

    pig -param INPUT="/user/bnewbold/pdfs/global-20171227034923" \
        -param OUTPUT="/user/bnewbold/pdfs/gwb-pdf-20171227034923-surt-filter" \
        filter-cdx-paper-pdfs.pig