- Overview
- Prerequisites
- Features
- How to bring a Python script on Klera
- Example Script
- Features
- Guidelines
- Sending intermediate output from python script to Klera
Klera Script Integration Service
Overview
Klera Script Integration Service enables the user to register a Python script as an operation and/or a formula. Once registered, these scripts can be run like other Klera operations.
Prerequisites
Hardware Requirements:
Parameter | Specifications (Minimum Recommended) |
RAM | 4 GB |
CPU | 2 Cores or 4 Hyper threads |
Disk | 100 GB |
Note: RAM requirements depend on the scripts that the user registers.
Software Requirements:
Parameter | Specifications (Minimum Recommended) |
Operating System | Windows Server 2012 R2 (64 Bit), Windows Server 2019 |
Features
- Register: To register a Python script as a Klera operation or formula or both
- Show: To show the registered Python scripts
- Delete: To delete the selected script
- Edit: To edit the selected script
- Export: Enables user to export an operation created by using the Python script
- Import: Import an operation
Notes:
- Supported data types:
- Integer, Long, Double, Boolean, String, Date, Time, DateTime
- If an argument of Python script can accept Integer/Long/Double, use “Numeric” as data type for that argument in “Register Python Script” form.
- An argument can accept more than one data type
- Date, Time, DateTime data is received as
- Long (epoch in millisecond), if the argument is of type ‘Data’
- String (date/datetime/time), if the argument is of type ‘Param’
- Variables and imports declared at the script level are not accessible inside functions. To use them in a function, declare them as global within that function. This also applies to lambda functions.
- Catch exceptions (try/except) in Python scripts so that they run smoothly without exiting on an exception.
- Script can accept single object or list of objects of supported data types.
- Script can give one or more outputs (DST). Each output should be a dictionary of pandas data frames. Data types of output columns are detected dynamically.
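The notes on date values, global declarations and exception handling can be combined in one small helper. The following is a minimal sketch, not part of the Klera API: the function name `epoch_millis_to_iso` is hypothetical, and interpreting the epoch value as UTC is an assumption, since the time zone is not stated above.

```python
from datetime import datetime, timezone

# Script-level constant; declared global inside the function, as the notes advise.
MILLIS_PER_SECOND = 1000


def epoch_millis_to_iso(value):
    """Convert a 'Data' date value (epoch milliseconds) to an ISO 8601 string.

    Returns None instead of raising, so one bad record does not stop the script.
    UTC is an assumption made for this sketch.
    """
    global datetime, timezone, MILLIS_PER_SECOND
    try:
        moment = datetime.fromtimestamp(value / MILLIS_PER_SECOND, tz=timezone.utc)
        return moment.isoformat()
    except (TypeError, ValueError, OverflowError, OSError):
        return None
```

For example, `epoch_millis_to_iso(0)` yields the start of the Unix epoch, while a non-numeric value such as `"abc"` yields `None` rather than an exception.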
How to bring a Python script on Klera
- Input
- Add a line at the top of the script that defines a list named "klera_in", which holds the input variable names. For example, if the script requires two input arguments, "features" and "n_clusters", define the list as follows:
klera_in = ["features", "n_clusters"]
- Supported data types: Integer, Long, Double, Boolean, String, Date, Time, DateTime, File
- If an argument of Python script can accept Integer/Long/Double, then use “Numeric” as data type for that argument in “Register Python Script” form.
- Script can accept single object or list of objects of supported data types.
- If an argument accepts predefined values, then the values should be mentioned in the script through ‘klera_in_details’ dictionary.
- klera_in_details is a dictionary. Keys are input arguments. All the input arguments must be present as keys.
- If an argument takes predefined values, the value for that key is a dictionary. The value of the “enum” key of this dictionary holds the actual values, and the value of the “enumTitles” key holds the display values. The values of the “enumTitles” key are shown in the drop-down in the input form for the given input argument. If “enumTitles” is not present, the actual values are displayed in the form.
- If an argument does not have any predefined value, then the value for that key should be an empty dictionary. All the input arguments should be present as keys in the ‘klera_in_details’ dictionary.
- Ex:
klera_in_details = {
    "text": {},
    "from_lang": {
        "enum": ["en", "hi"],
        "enumTitles": ["English", "Hindi"]
    },
    "to_lang": {
        "enum": ["en", "hi"],
        "enumTitles": ["English", "Hindi"]
    },
    "Date_column": {
        "description": "Date",
        "datatype": ["INT"],
        "argtype": "Data",
        "multivalued": False,
        "multicolumn": False
    },
    "Steps": {
        "description": "Steps Count",
        "datatype": ["INT", "LONG"],
        "argtype": "Param",
        "multivalued": False,
        "multicolumn": False,
        "min": 5,
        "max": 10,
        "default": 7
    }
}
- In the above example, the ‘text’ argument does not have a predefined set of values, so it is shown as a text box in the input form. ‘from_lang’ accepts predefined values: its actual values are ["en","hi"] and the corresponding display values are ["English","Hindi"]. This field is shown as a drop-down in the form, and when the user selects a display value, the corresponding actual value is sent to the Python script. The ‘Date_column’ argument is defined with datatype, argtype, and so on; these are filled in automatically while registering the script. The ‘Steps’ argument contains min, max and default keys; this field is shown in the form with the given default value and is validated against the given min and max values.
- To mask a password, add the key-value pair ‘“masked”:True’ to the argument's dictionary; the string entered in the input form is then masked.
- If klera_in is an empty list (klera_in = []), it is interpreted that the script does not take any inputs/arguments and the registered operation is exposed on the floor.
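Because every name in "klera_in" must also appear as a key in ‘klera_in_details’, a quick local check before registering can catch omissions. The sketch below is only an illustration: the helper `check_input_details` and the argument name "api_key" are hypothetical, and the rule that “enum” and “enumTitles” should have the same length is an assumption inferred from the one-to-one mapping described above.

```python
klera_in = ["text", "to_lang", "api_key"]

klera_in_details = {
    "text": {},  # no predefined values: shown as a text box
    "to_lang": {"enum": ["en", "hi"], "enumTitles": ["English", "Hindi"]},
    "api_key": {"masked": True},  # hypothetical masked argument
}


def check_input_details(names, details):
    """Return a list of problems found in a klera_in / klera_in_details pair."""
    problems = []
    for name in names:
        if name not in details:
            problems.append(f"missing key: {name}")
    for name, spec in details.items():
        enum = spec.get("enum")
        titles = spec.get("enumTitles")
        # Assumption: display values must pair up one-to-one with actual values.
        if titles is not None and (enum is None or len(enum) != len(titles)):
            problems.append(f"enum/enumTitles mismatch: {name}")
    return problems


problems = check_input_details(klera_in, klera_in_details)
```

An empty `problems` list means every input argument has an entry and every drop-down has matching display values.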
- Output (for klera operation) :
- Define a list "klera_dst" to hold multiple outputs from script. Each output from the script should be a dictionary of pandas data frames.
- Key in the dictionary is used as "DST Name" for the Klera DST and column names of the pandas data frame are used as column names for DST.
- Examples
- If pandas data frames 'df1' and 'df2' need to be shown as output DSTs to user, define a dictionary with key as DST name and value as data frame as following.
out_dict = {"Entities": df1, "Objects": df2}
# Define a list 'klera_dst' to hold multiple items to be returned. Each item should be a dictionary of data frames as following.
klera_dst = [out_dict]
# To return two dictionaries of data frames, out_dict and status, define klera_dst as following:
klera_dst = [out_dict, status]
- If a scalar (single value) 'lang' has to be returned as column 'Language' in DST 'Detected languages', do the following:
# Convert the scalar value to a data frame with given column name and put in a dictionary with key as DST name.
import pandas as pd
out_dict = {'Detected languages': pd.DataFrame({'Language': lang}, index=[0])}
# Add the dictionary to the 'klera_dst' list.
klera_dst = [out_dict]
- If multiple scalars need to be shown as output ('lang' as column 'Language' and 'langcode' as column 'Language code' in DST 'Detected languages'), do the following:
# Convert the scalar values to a data frame with given column names and put in a dictionary with key as DST name.
import pandas as pd
out_dict = {'Detected languages': pd.DataFrame({'Language': lang, 'Language code': langcode}, index=[0])}
# Add the dictionary to the 'klera_dst' list.
klera_dst = [out_dict]
- If a list 'langs' has to be returned as column 'Languages' in DST 'Detected languages', do the following:
# Convert the list to a data frame with given column name and put in a dictionary with key as DST name.
import pandas as pd
out_dict = {'Detected languages': pd.DataFrame({'Languages': langs})}
# Add the dictionary to the 'klera_dst' list.
klera_dst = [out_dict]
- If multiple lists need to be shown as output ('langs' as column 'Languages' and 'langcodes' as column 'Language codes' in DST 'Detected languages'), do the following:
# Convert the lists to a data frame with given column names and put in a dictionary with key as DST name.
import pandas as pd
out_dict = {'Detected languages': pd.DataFrame({'Languages': langs, 'Language codes': langcodes})}
# Add the dictionary to the 'klera_dst' list.
klera_dst = [out_dict]
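The scalar and list cases above differ only in whether an index is passed to the data frame constructor, so they can be folded into one helper. This is a minimal sketch: `make_dst` is a hypothetical name, not something Klera provides, and the sample values are illustrative.

```python
import pandas as pd


def make_dst(dst_name, columns):
    """Build one klera_dst item: a dictionary mapping the DST name to a data frame.

    'columns' maps column names to either scalars or lists.
    """
    all_scalar = not any(isinstance(v, (list, tuple)) for v in columns.values())
    if all_scalar:
        # Scalars need an explicit index to form a single-row data frame.
        frame = pd.DataFrame(columns, index=[0])
    else:
        frame = pd.DataFrame(columns)
    return {dst_name: frame}


klera_dst = [
    make_dst("Detected languages", {"Language": "English", "Language code": "en"}),
]
```

Lists passed for several columns are expected to have the same length, as in the examples above.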
- Output (for klera formula) :
- Define "klera_scalar" variable and assign the value to be returned as the output in the case of formula
- Define "klera_scalar_datatype" variable to denote the data type of the output.
- If the klera_scalar is of integer data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'INT'
- If the klera_scalar is of long data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'LONG'
- If the klera_scalar is of double data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'DOUBLE'
- If the klera_scalar is of numeric (int, long, double) data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'NUMERIC'
- If the klera_scalar is of string data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'STRING'
- If the klera_scalar is of boolean data type, define 'klera_scalar_datatype' as follows:
klera_scalar_datatype = 'BOOL'
- Define "klera_scalar_ismultivalue" variable to denote if the output is multi-value or not.
- If an integer value in variable 'intValue' needs to be returned as output for a formula, do the following:
# Output value
klera_scalar = intValue
# Output data type
klera_scalar_datatype = 'INT'
# Output isMultivalue
klera_scalar_ismultivalue = False
- If a numeric value in variable 'numericValue' needs to be returned as output for a formula, do the following:
# Output value
klera_scalar = numericValue
# Output data type
klera_scalar_datatype = 'NUMERIC'
# Output isMultivalue
klera_scalar_ismultivalue = False
- If a multi-value string (i.e. list) in variable 'strValueList' needs to be returned as output for a formula, do the following:
# Output value (strValueList should be a list)
klera_scalar = strValueList
# Output data type
klera_scalar_datatype = 'STRING'
# Output isMultivalue
klera_scalar_ismultivalue = True
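The three formula variables have to agree with each other, which is easy to get wrong when the result type changes. The following sketch derives the data type and the multi-value flag from the Python value itself. The helper `describe_formula_output` is hypothetical; it assumes the first element of a list is representative, and it never infers 'LONG' or 'NUMERIC', which would still need to be set by hand.

```python
def describe_formula_output(value):
    """Return (datatype, is_multivalue) for a formula result."""
    is_multivalue = isinstance(value, list)
    sample = value[0] if is_multivalue and value else value
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(sample, bool):
        datatype = "BOOL"
    elif isinstance(sample, int):
        datatype = "INT"
    elif isinstance(sample, float):
        datatype = "DOUBLE"
    elif isinstance(sample, str):
        datatype = "STRING"
    else:
        raise TypeError(f"unsupported formula output: {type(sample).__name__}")
    return datatype, is_multivalue


klera_scalar = ["en", "hi"]
klera_scalar_datatype, klera_scalar_ismultivalue = describe_formula_output(klera_scalar)
```

An empty list raises `TypeError` in this sketch, because no element is available to infer the type from.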
Note :
- Klera Script Integration Service comes with Python and some popular packages. If your script requires packages other than those provided, mention the required package name(s) while registering your script.
- Only one level of nesting of user-defined functions is supported in a script, i.e. your script can call a user-defined function, but that function cannot call another user-defined function.
- Complete script should be in single file.
- Scripts shall not use __future__ statements.
- It is advisable to register scripts that do not require a lot of computational resources (CPU, RAM).
- All the operations registered using Script integration are exposed on container level only.
- If the input or output details in the script file are edited, the script must be re-registered.
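One way to stay within the one-level nesting rule is to keep helper functions independent of each other and chain them from the script's top level. The sketch below illustrates that layout; the function names and the sample text are hypothetical, and `raw_text` is a stand-in for a value Klera would supply at run time.

```python
def normalize(value):
    """Trim surrounding whitespace and lower-case the text."""
    return value.strip().lower()


def count_words(value):
    """Count whitespace-separated words."""
    return len(value.split())


# Stand-in so the sketch runs outside Klera.
raw_text = "  Hello Klera World  "

# The script body calls each helper directly; neither helper calls the other.
cleaned = normalize(raw_text)
word_count = count_words(cleaned)
```

If `count_words` called `normalize` itself, that would be a second level of nesting, which the note above says is not supported.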
Example Script
Please follow the below steps to register a clustering script on Klera:
Actual Script :
# Clustering
from sklearn.cluster import KMeans
import numpy as np
# Number of clusters
num_clusters = 3
# Features
features = np.array([[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(features)
# Get the cluster labels for each point
cluster_labels = kmeans.labels_.tolist()
Modified Script as per the guidelines:
# Clustering
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
# klera_in : List of Input variable names
klera_in = ["features","num_clusters"]
kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(features)
# Get the cluster labels for each point
cluster_labels = kmeans.labels_.tolist()
# Output Details for Klera Operation
#Define a dictionary of pandas data frames
out_dict = {"Cluster Labels": pd.DataFrame(data={"Clusters": cluster_labels})}
# klera_dst : List to hold multiple outputs
klera_dst = [out_dict]
# Output Details for Klera Formula
# Klera Formula output
klera_scalar = cluster_labels
# Klera Formula data type
klera_scalar_datatype = "INT"
# Klera Formula isMultivalue (cluster_labels is a list, so the output is multi-value)
klera_scalar_ismultivalue = True
Features
Register
Registers a Python script as a Klera operation, a formula, or both
The operation is exposed on DST>Scripts>Register
Script
- Operation Name: Name of the operation to be exposed on Klera
- Formula Name: Name of the Formula to be exposed on Klera
- Short Description for Operation/Formula: Description of the operation/formula
- Hierarchy to guide user (Level1/Level2): Hierarchy of the operation that would be created and visible to user
For example: if the hierarchy is created as UserOps, Operation, Test, then the operation would be exposed as UserOps>Operation>Test on Klera
By default, the hierarchy would be given as Scripts, Operations
- Script: Browse to the script file to be registered as an operation
After the user clicks Submit, the script will be validated based on the input(s) provided. Kindly ensure that the operation is run on appropriate DST which has the correct input data types or parameters that the script requires.
Packages
If your script(s) require any Python package, then it can be installed using this option.
- Package Name: Enter the name of the package.
- Version: Enter the package version number. If nothing is mentioned, then the latest version would be installed.
Note: Requires an active internet connection.
- Offline package(s): Use this option to install Python package(s) offline.
Dependencies
Script dependent files: Any other dependent file that is required by the script
Show
Shows the registered Python scripts
Operation is exposed on floor. OnFloor>Scripts>Show
Delete
Deletes the selected Python script
Operation would be exposed on the columns of “show” operation output DST.
OperationName>Scripts>Delete
Edit
Edits the selected Python script
Operation would be exposed on the columns of “show” operation output DST.
OperationName>Scripts>Edit
Export
Export an operation created by using the Python script.
Operation would be exposed on the columns of “show” operation output DST.
Operation Name>Scripts>Export
Import
Import an operation created by using the Python script.
Operation is exposed on Floor
On Floor>Scripts>Import
Guidelines
- Do not modify the input arguments/variables. In the following code, input argument “File_Path” is modified. Avoid this and use a different variable to hold the modified value of “File_Path”.
klera_in_details = {
    "File_Path": {
        "datatype": ["string"],
        "argtype": "Data",
        "required": True,
        "multiplerows": True
    }
}
# Avoid: this overwrites the input argument "File_Path"
File_Path = File_Path[8:]
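A version that follows this guideline keeps the input untouched and stores the derived value under a new name. In the sketch below, the sample value and the variable name `local_path` are assumptions for illustration; inside Klera, "File_Path" is supplied at run time.

```python
# Stand-in value so the sketch runs outside Klera.
File_Path = "file:///C:/data/input.csv"

# Leave the input argument as received and keep the trimmed value separately.
local_path = File_Path[8:]
```

The rest of the script then reads from `local_path`, while `File_Path` still holds the value that was passed in.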
Sending intermediate output from python script to Klera
This feature is available from KSI 2.2.0 onwards.
In previous versions of KSI, python scripts could send output only after the script had finished executing. With this provision, python scripts can send intermediate outputs in addition to the final output.
If a python script has some intermediate output to be sent to Klera, follow the below steps:
- Import MessageSender and MessageType from the MessageSender module:
from MessageSender import MessageSender, MessageType
- To send intermediate output (available in klera_dst) from python script, create a message block and push it to queue using the 'push_data_to_queue' function of 'klera_message_sender' global object.
- Create a dictionary for data_block with 'klera_dst' and 'klera_meta_out' keys
- klera_dst is assigned as value for 'klera_dst' key in data_block dictionary. klera_dst is created in the same way the output is prepared in normal python scripts which send output at the end of the execution.
- klera_meta_out is assigned as value for 'klera_meta_out' key in data_block dictionary. If klera_meta_out is not available, assign None for 'klera_meta_out' key.
- Create a dictionary for message_block with 'message_type' and 'data_block' keys.
- Message type is assigned as value for 'message_type' key in message_block dictionary. Use MessageType.DATA enum value for data block.
- The data_block created in the previous step is assigned as value for 'data_block' key in message_block dictionary.
- Push the message block using the function ‘push_data_to_queue’ of ‘klera_message_sender’ object. ‘klera_message_sender‘ object is already created and available for this script.
Example code to push intermediate data to Klera:
# Create a data block
data_block = {}
data_block['klera_dst'] = klera_dst
data_block['klera_meta_out'] = None
# Prepare a message block.
message_block = {}
# Message Type
message_block['message_type'] = MessageType.DATA
message_block['data_block'] = data_block
# Push the data to Klera
klera_message_sender.push_data_to_queue(message_block)
- For reference, an example python script sending intermediate output is attached.
Limitations
- Do not send intermediate output from python scripts which are created to process each input record (The operations for which “Apply on each record” check box is selected).
Guidelines
- Do not send too many intermediate outputs from a python script in a short interval of time, as this degrades system performance.
- Choose the size of the intermediate outputs and the gap between them such that system performance is not affected.
- If a script sends an output multiple times, make sure the meta of the data is the same each time. If klera_meta_out contains meta details for an output, it is used. Otherwise, meta is derived from the first intermediate part of an output and is assumed to be the same for the subsequent parts.
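The guidelines on spacing out intermediate outputs can be sketched as a small loop that pauses between pushes. This is an illustration under stated assumptions, not the KSI implementation: `build_message_block`, `push_parts` and the one-second default gap are hypothetical, and the stand-in `MessageType` exists only so the sketch runs outside Klera, where the MessageSender module is not available.

```python
import time
from enum import Enum

try:
    # Provided by KSI 2.2.0 onwards, per the steps above.
    from MessageSender import MessageSender, MessageType
except ImportError:
    # Stand-in for running outside Klera; not the real KSI class.
    class MessageType(Enum):
        DATA = "data"


def build_message_block(klera_dst, klera_meta_out=None):
    """Wrap one intermediate output in the message_block layout described above."""
    data_block = {"klera_dst": klera_dst, "klera_meta_out": klera_meta_out}
    return {"message_type": MessageType.DATA, "data_block": data_block}


def push_parts(sender, parts, min_gap_seconds=1.0):
    """Push each part, pausing between pushes; return the number pushed."""
    pushed = 0
    for part in parts:
        if pushed:
            # Hypothetical throttle: wait before every push after the first.
            time.sleep(min_gap_seconds)
        sender.push_data_to_queue(build_message_block(part))
        pushed += 1
    return pushed
```

Inside a registered script, the call would look like `push_parts(klera_message_sender, parts)`, where each element of `parts` is a klera_dst list prepared as for a normal output. A suitable gap depends on the size of each part and on the system, so the default here is only a placeholder.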