Getting Started with Screwjack(Basic)
Before trying the following steps, make sure screwjack is installed. Please refer to Introduction for detailed installation steps.
Step 1: Initialize a module
First, assume you want to create a basic module, which is a template with basic functionality. If you are interested in writing a ‘Hive’ module, please refer to Getting Started with Screwjack(Hive).
You can create a basic module with screwjack:
screwjack init basic --name="SVM" --description="A simple SVM"
Screwjack will then prompt for the remaining options, as shown below. In this tutorial we use scikit-learn, which is packaged in the base image zetdata/sci-python:2.7.
Module Version [0.1]:
Module Entry Command [python main.py]:
Base Image [zetdata/ubuntu:trusty]: zetdata/sci-python:2.7
Sucessfully created 'svm'
Alternatively, you can do this with a single command:
screwjack init basic --name=SVM --description="A simple SVM" --version="0.1" --cmd="/usr/bin/python main.py" --base-image="zetdata/sci-python:2.7"
You will now have a directory containing the initial version of the basic module:
svm
|-- Dockerfile
|-- main.py
|-- requirements.txt
`-- spec.json
0 directories, 4 files
Then change to the directory of the new module. The following steps assume you are working in that directory.
cd svm
Step 2: Add Input/Output/Param
Imagine we want to create a module with two Inputs (X and Y), one Output (MODEL), and one parameter (C), as the following diagram shows:
Now we can add a parameter using the following command:
screwjack param_add C float
Next, add two Inputs with the following commands. The first argument (X) is the name of the input or output, and the second argument (csv) is its type. A type can be any string, such as “csv” or “hive.hdfs.table:sub:x”. For more information about types, please see Input/Output Types.
screwjack input_add X csv
screwjack input_add Y csv
Finally, add an Output:
screwjack output_add MODEL model.svm
Step 3: Fill your code implementation
Now you can write your awesome implementation:
vim main.py
In this tutorial, we implement main.py like this:
from datacanvas import DatacanvasRuntime
from sklearn.svm import LinearSVC
import numpy as np
import pickle
def main():
    rt = DatacanvasRuntime()
    settings = rt.settings
    # TODO: Add your code here
    # Load the feature matrix and the labels, skipping each file's header row.
    X = np.genfromtxt(settings.Input.X, delimiter=',', skip_header=1)
    Y = np.genfromtxt(settings.Input.Y, delimiter=',', skip_header=1)
    svc = LinearSVC(C=float(settings.Param.C))
    svc.fit(X, Y)
    # Pickle data is binary, so open the output file in "wb" mode.
    with open(settings.Output.MODEL, "wb") as f:
        pickle.dump(svc, f)
    print("Done")

if __name__ == "__main__":
    main()
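The code above reads everything it needs from rt.settings: input paths under settings.Input, the output path under settings.Output, and the parameter under settings.Param. The following is a minimal sketch of a stand-in object with the same attribute layout, which could be handy for exercising the training logic outside the runtime. Note that make_fake_settings is a hypothetical helper, not part of screwjack or datacanvas. It assumes Python 3 (for types.SimpleNamespace) and assumes parameter values arrive as strings, which the float(...) call in main.py suggests.

```python
from types import SimpleNamespace


def make_fake_settings(x_path, y_path, model_path, c="1.0"):
    """Build a stand-in with the attribute layout that main.py reads.

    Hypothetical helper for local experiments; the real object comes
    from DatacanvasRuntime().settings.
    """
    return SimpleNamespace(
        Input=SimpleNamespace(X=x_path, Y=y_path),
        Output=SimpleNamespace(MODEL=model_path),
        # Assumption: parameters are delivered as strings.
        Param=SimpleNamespace(C=c),
    )
```

If the training code were moved into a function that takes settings as an argument, it could be called with make_fake_settings("a.csv", "b.csv", "tmp.model", c="0.1") without starting the runtime.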
If you want to add additional files to this module, don’t forget to add them in the Dockerfile.
vim Dockerfile
For example, if you have an additional file, append the following line to the Dockerfile:
ADD your_additional_file /home/run/
If you want to add an additional folder, append a line like this:
ADD your_additional_folder /home/run/your_additional_folder
For more information about the Dockerfile format, please refer to the Dockerfile documentation.
Step 4.1: Test in local
After writing your own implementation, you will probably want to test it. The screwjack run subcommands are designed for this.
screwjack run local --help
Usage: screwjack run local [OPTIONS]
Options:
--param-C TEXT Param(string)
--X TEXT Input
--Y TEXT Input
--MODEL TEXT Output
--help Show this message and exit.
Now we can test our module in the local environment, which is very close to your development environment.
screwjack run local --param-C=0.1 --X=a.csv --Y=b.csv --MODEL=tmp.model
Note: if you cannot import “sklearn.svm”, you probably need to install scikit-learn locally:
pip install -U scikit-learn
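The run command above expects a.csv and b.csv to exist already. As a minimal sketch, assuming the layout that main.py reads (comma-separated values with one header row, features in X and one label per row in Y), a small script along these lines could generate toy inputs. The function name, column names, and cluster positions are illustrative assumptions.

```python
import random


def write_sample_data(x_path, y_path, n_rows=20, seed=0):
    """Write a tiny two-class dataset as a features CSV and a labels CSV."""
    rng = random.Random(seed)
    with open(x_path, "w") as fx, open(y_path, "w") as fy:
        # main.py reads with skip_header=1, so each file starts with a header row.
        fx.write("f1,f2\n")
        fy.write("label\n")
        for i in range(n_rows):
            label = i % 2  # alternate between class 0 and class 1
            # Class 0 clusters around (0, 0), class 1 around (5, 5).
            center = 5.0 * label
            f1 = center + rng.uniform(-1.0, 1.0)
            f2 = center + rng.uniform(-1.0, 1.0)
            fx.write("%.3f,%.3f\n" % (f1, f2))
            fy.write("%d\n" % label)
```

Calling write_sample_data("a.csv", "b.csv") in the module directory produces the two files used in the commands of this step.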
Step 4.2: Test in docker
Then we can try running the module with docker:
screwjack run docker --param-C=0.1 --X=a.csv --Y=b.csv --MODEL=tmp.model
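After a run finishes, tmp.model should contain the pickled classifier written by main.py. The following is a minimal sketch for reading it back, assuming the file was written with pickle.dump as in Step 3. Note that load_model is a hypothetical helper name. Unpickling a scikit-learn model requires scikit-learn to be importable in the loading environment, and pickle files should only be loaded from trusted sources.

```python
import pickle


def load_model(path):
    """Load a pickled object from disk; pickle files are read in binary mode."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With scikit-learn installed, load_model("tmp.model") should return the fitted LinearSVC, whose predict method accepts rows shaped like those in a.csv.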