Handling JSON with MDR_DATAGEN & MDR_DATAINTO
Overview
The RPGLE compiler determines the layout of the JSON structure at compile-time for RPG DATA-INTO and DATA-GEN functions.
This is only available from V7R2 upwards, and is not available at all in COBOL or other languages.
MDR_DATAGEN and MDR_DATAINTO are MDRFRAME functions that do the equivalent of IBM's DATA-GEN and DATA-INTO operations.
A schema is required to map between the JSON structure and the program DATA structures for MDR_DATAGEN and MDR_DATAINTO.
Create an MDRFRAME schema
The MDRCBLCPY command creates copybook source with these formats from a JSON example.
JSON Schema format
The purpose of this format (or “schema”) is to represent the layout of the data structure for a program that uses MDR_DATAINTO or MDR_DATAGEN.
{
  "variable name": { "=type": "XXX", "=len": xx },
  "variable name": { "=type": "XXX", "=len": xx },
  "variable name": { "=type": "XXX", "=len": xx },
  "structure name": {
    "=dim": xx,
    "variable": { "=type": "XXX", "=len": xx, "=dec": xx },
    "array variable": { "=type": "XXX", "=len": yyy, "=dim": xx }
  },
  "=sym": "X"
}
Names:
Names of variables should match the JSON – except they should be converted to valid variables according to the “case=convert” rules specified for RPG’s DATA-INTO. There is a procedure called MDR_caseConvert that you can call to convert the name.
- Any accented characters are converted to unaccented variants.
- Anything that remains that is not a letter or number is converted to the “symbol” (=sym) character.
- Any consecutive symbols are merged into a single symbol.
- If there is a symbol at the start of a variable name, it is removed.
- If the variable name begins with a numeric digit, it is prefixed with the letter N.
For example: consider the following JSON key name: "21!!ALó"
- The accented character is replaced with an unaccented one: "21!!ALO"
- Symbols are replaced with an underscore: "21__ALO"
- Consecutive symbols are merged: "21_ALO"
- Because the variable name begins with a digit, an N is added as a prefix: "N21_ALO" (or "N21-ALO" for COBOL)
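The conversion rules above can be sketched in a few lines of Python. This is only an illustration of the documented steps, not the MDR_caseConvert procedure itself: the function name convert_name is hypothetical, "letter or number" is assumed to mean ASCII letters and digits, and letter case is left unchanged because the rules above do not mention it.

```python
import re
import unicodedata


def convert_name(name: str, sym: str = "_") -> str:
    """Apply the documented name-conversion steps to one JSON key (sketch)."""
    # Step 1: remove accents by decomposing and dropping combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 2: anything that is not a letter or digit becomes the symbol.
    # (Assumption: only ASCII letters and digits count as valid.)
    replaced = re.sub(r"[^A-Za-z0-9]", lambda m: sym, plain)
    # Step 3: merge consecutive symbols into one.
    merged = re.sub("(?:" + re.escape(sym) + ")+", lambda m: sym, replaced)
    # Step 4: drop a symbol at the start of the name.
    if merged.startswith(sym):
        merged = merged[len(sym):]
    # Step 5: a name that begins with a digit is prefixed with N.
    if merged[:1].isdigit():
        merged = "N" + merged
    return merged


print(convert_name("21!!ALó"))           # underscore as the symbol
print(convert_name("21!!ALó", sym="-"))  # hyphen, as used for COBOL
```

Apart from letter case, the first call reproduces the N21_ALO result of the worked example, and the second reproduces the COBOL variant.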
Attributes:
Attributes describe a variable. Attribute names begin with the = (equal sign) character to distinguish them from variable names. The equal sign was chosen because it exists in all character sets, but is not normally used as part of a variable name in programming languages.
- =type – (REQUIRED FOR ALL FIELDS) data type of variable. Can be:
  - char = fixed-length character.
  - charz = C-style, zero-terminated variable-length character string. The length indicates the maximum length of the string, including the zero terminator.
  - varchar(2) = RPG/SQL varchar data type. The first 2 bytes are an unsigned integer representing the current length; this is followed by up to =len characters (=len indicates the maximum).
  - varchar(4) = same as varchar(2), except the length prefix is 4 bytes long. This is required for fields that are larger than 65535.
  - ucs2 = fixed-length UCS-2 characters. These are double-byte characters in Unicode.
  - varucs2(2) = variable-length UCS-2. Like varchar(2), except that the characters are double-byte characters in UCS-2 rather than single-byte characters in EBCDIC. The length specifies the number of characters (not the number of bytes).
  - varucs2(4) = same as varucs2(2), except with a 4-byte length prefix.
  - ind = indicator. =len must be 1. Value must be either ‘0’ or ‘1’.
  - zoned = zoned decimal type. =len indicates the total number of digits (including fractions) and =dec indicates the number of fractional digits.
  - packed = packed decimal type. =len indicates the total number of digits (including fractions) and =dec indicates the number of fractional digits.
  - binary = meant to be used for RPG’s “bindec” (fixed format type B) fields. This should be avoided where possible. For =len of 1-4, a 2-byte binary is generated. For =len of 5-9, a 4-byte binary is generated (same as RPG). =dec indicates the number of decimal places.
  - int = integer (RPG fixed format type I, free format “int”). The =len value must be 3 for a 1-byte field, 5 for 2 bytes, 10 for 4 bytes, or 20 for 8 bytes (same as RPG). Integers may not have decimal places; the =dec attribute is ignored.
  - uns = unsigned integer. Same as “int”, but allows larger numbers and cannot be negative. Lengths are the same as “int”. Cannot have decimals.
  - float = floating point. =len must be 8 for double-precision floating point (8 bytes), or 4 for single precision (4 bytes). Since the number of decimal places can vary, the =dec attribute is not used during calculations; however, =dec is used to control the number of decimal places generated when the value is added to a JSON/XML document. For example, if you had the number 1.5 and =dec is 2, it would be formatted as 1.50 when added to a JSON/XML document. NOTE: Floating point fields are imprecise, and not recommended.
  - date = date field. The format is assumed to match the format of the customer data; no conversions are done.
  - time = time field. The format is assumed to match the format of the customer data; no conversions are done.
  - timestamp = timestamp field. The format is assumed to match the format of the customer data; no conversions are done.
- =dim – (ARRAYS ONLY) array dimensions (only include if the field is an array or data structure array).
- =len – (REQUIRED FOR ALL FIELDS) length of field in digits/characters.
- =dec – (NUMERIC ONLY) number of decimal positions.
- =sym – replacement symbol, generally _ (underscore) for most languages, or - (hyphen) for COBOL. This should be specified only once, at the top level; the same symbol applies to the entire format (it is not specific to a subfield).
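Several of the attribute rules above are easy to get wrong when writing a schema by hand, so a quick check before embedding it in a copybook can help. The following Python sketch tests only the rules stated in this section (=type and =len are required, and int/uns, float and ind restrict =len). The function name validate_schema is hypothetical and is not part of MDRFRAME.

```python
KNOWN_TYPES = {
    "char", "charz", "varchar(2)", "varchar(4)", "ucs2", "varucs2(2)",
    "varucs2(4)", "ind", "zoned", "packed", "binary", "int", "uns",
    "float", "date", "time", "timestamp",
}
INTEGER_LENGTHS = {3, 5, 10, 20}


def validate_schema(schema: dict, path: str = "") -> list:
    """Return a list of problems found in a schema (empty list = none found)."""
    errors = []
    for name, spec in schema.items():
        if name.startswith("="):
            continue  # attribute of the enclosing structure (=dim, =sym)
        where = path + name
        if not isinstance(spec, dict):
            errors.append(f"{where}: expected an object")
            continue
        if "=type" not in spec:
            # Nested names mean a structure; otherwise =type is missing.
            if any(not key.startswith("=") for key in spec):
                errors.extend(validate_schema(spec, where + "."))
            else:
                errors.append(f"{where}: =type is required")
            continue
        ftype, flen = spec["=type"], spec.get("=len")
        if ftype not in KNOWN_TYPES:
            errors.append(f"{where}: unknown =type {ftype!r}")
        if not isinstance(flen, int):
            errors.append(f"{where}: =len is required")
        elif ftype in ("int", "uns") and flen not in INTEGER_LENGTHS:
            errors.append(f"{where}: =len must be 3, 5, 10 or 20")
        elif ftype == "float" and flen not in (4, 8):
            errors.append(f"{where}: =len must be 4 or 8")
        elif ftype == "ind" and flen != 1:
            errors.append(f"{where}: =len must be 1")
    return errors


sample = {
    "price": {"=type": "packed", "=len": 9, "=dec": 2},
    "lines": {"=dim": 5, "qty": {"=type": "int", "=len": 4}},
    "=sym": "_",
}
for problem in validate_schema(sample):
    print(problem)
```

Here the nested qty field is reported because 4 is not one of the permitted integer lengths.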
Countprefix fields:
If the customer uses the “countprefix” option, the count fields MUST also be included in the above JSON (even though they will not be included in the payload JSON/XML). Countprefix fields must always be “=type”: “int”, “=len”: 10.
Renameprefix fields:
If the customer uses the “renameprefix” option, the rename fields MUST also be included in the above JSON (even though they will not be included in the payload JSON/XML). A renameprefix field must be coded with “=type” set to char, charz, varchar(2) or varchar(4), and should have “=len” set to a number between 1 and 4096.
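The two prefix rules can be checked mechanically. The Python sketch below walks a schema and reports prefixed fields that break them. The name check_prefix_fields is hypothetical, and the prefix comparison is assumed to be case-insensitive; the documentation does not state how MDRFRAME compares prefixes.

```python
RENAME_TYPES = {"char", "charz", "varchar(2)", "varchar(4)"}


def check_prefix_fields(schema, countprefix=None, renameprefix=None):
    """Report count/rename fields that break the documented rules."""
    problems = []
    for name, spec in schema.items():
        if name.startswith("=") or not isinstance(spec, dict):
            continue  # skip attributes such as =dim and =sym
        if "=type" not in spec:
            # A structure: check its subfields as well.
            problems += check_prefix_fields(spec, countprefix, renameprefix)
            continue
        lowered = name.lower()  # assumption: prefixes match without regard to case
        if countprefix and lowered.startswith(countprefix.lower()):
            if spec["=type"] != "int" or spec.get("=len") != 10:
                problems.append(f"{name}: count field must be int with =len 10")
        elif renameprefix and lowered.startswith(renameprefix.lower()):
            length = spec.get("=len", 0)
            if spec["=type"] not in RENAME_TYPES or not 1 <= length <= 4096:
                problems.append(
                    f"{name}: rename field must be character with =len 1-4096"
                )
    return problems


schema = {
    "num_items": {"=type": "int", "=len": 5},
    "items": {
        "=dim": 10,
        "name_code": {"=type": "packed", "=len": 5},
        "code": {"=type": "char", "=len": 10},
    },
}
for problem in check_prefix_fields(schema, "num_", "name_"):
    print(problem)
```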
Example:
Suppose that you wanted a COBOL program to use MDR_DATAGEN to create a response in the following JSON format:
{
"Stock": [
{
"Department": "STRING",
"Category": {
"MainCategory": "STRING",
"Sub-Category": "STRING"
},
"Sizes": [ NUMBER, NUMBER, NUMBER ],
"Colours": [ "White", "Black", "Green", "Red", "Blue" ]
}
]
}
Note
All of the arrays are variable-length. For the sake of this example, we will say that there is a maximum of 10 elements in any array.
To produce this data from COBOL, you would generate the JSON from the following data structure:
       01 WS-RESP.
           10 WS-NUM-STOCK PIC S9(8) USAGE BINARY VALUE 0.
           10 WS-RESP-STOCK OCCURS 10.
               15 WS-STOCK-DEPARTMENT PIC X(10).
               15 WS-STOCK-CATEGORY.
                   20 WS-STOCK-CAT-MAINCATEGORY PIC X(10).
                   20 WS-STOCK-CAT-NAME-SUB-CATEGORY PIC X(20).
                   20 WS-STOCK-CAT-SUB-CATEGORY PIC X(10).
               15 WS-STOCK-NUM-SIZES PIC S9(8) USAGE BINARY VALUE 0.
               15 WS-STOCK-SIZES PIC S9(8) USAGE BINARY OCCURS 10.
               15 WS-STOCK-NUM-COLOURS PIC S9(8) USAGE BINARY VALUE 0.
               15 WS-STOCK-COLOURS PIC X(20) OCCURS 10.
The countprefix feature will be used to control the number of array elements for each of the arrays. The caller will specify countprefix=num_ so that any field that begins with “num_” is the count of a corresponding element.
The renameprefix feature will be used to allow the “Sub-Category” field to have a variable name. The caller will specify renameprefix=name_ so that the “Name_Sub_Category” field supplies the name written for “Sub_Category”.
The schema should look as follows:
{
"num_Stock": {"=type": "int","=len": 10 },
"stock": { "=dim": 10,
"Department": { "=type":"char","=len":10},
"Category":{
"MainCategory":{"=type":"char", "=len": 10 },
"Name_Sub_Category": { "=type": "char", "=len": 20 },
"Sub_Category": {"=type": "char", "=len": 10 }
},
"num_sizes": {"=type":"int","=len":10},
"sizes":{"=type":"int","=len":10,"=dim":10},
"num_Colours":{"=type":"int","=len":10},
"Colours":{"=type":"char","=len": 20, "=dim": 10}
}
}
These field names (aside from the num_ and name_ fields) must match those that will be generated into the resulting JSON or XML elements.
Notice that the actual COBOL names don’t have to match the schema (MDR_DATAGEN and MDR_DATAINTO have no way to know what the actual COBOL names are). In this respect, it is different from RPG’s DATA-INTO and DATA-GEN, where the schema is based on the actual names in the program.
The fact that the names don’t need to match is helpful, because in COBOL it’s often advantageous to prefix fields with something like “WS-RESP”, such as “WS-RESP-STOCK”. (This isn’t always ideal, though, as it prevents features like MOVE CORRESPONDING from working nicely.)
MDR_DATAGEN will create the JSON or XML payload names exactly as they appear in the schema, except where they are changed with a renameprefix. It uses the schema’s lengths, data types and decimal positions to calculate the correct place in the caller’s memory, reads the data from memory at that position, and formats it for the payload document based on the data type.
MDR_DATAINTO loads each field in the JSON/XML payload by converting the payload field name using the case=convert methodology, then looking for the matching field in the schema. If found, it calculates the correct place in memory based on the lengths specified in the schema, and stores the data at that location based on the data type, length, and decimal positions given in the schema.
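Because both functions locate fields purely from the schema, the schema must describe exactly the same memory layout as the program's data structure. The Python sketch below shows one way to cross-check this for the example above. It is not MDRFRAME's implementation: the names element_size, total_size and offsets are hypothetical, fields are assumed to be contiguous with no alignment padding, the zoned (one byte per digit) and packed (digits // 2 + 1 bytes) sizes are the usual IBM i sizes rather than something this page states, and the remaining types are left out.

```python
INTEGER_BYTES = {3: 1, 5: 2, 10: 4, 20: 8}
LENGTH_PREFIX = {"varchar(2)": 2, "varchar(4)": 4}


def element_size(spec):
    """Bytes used by one element of a field or structure (ignores =dim)."""
    if "=type" not in spec:  # a structure: add up its subfields
        return sum(total_size(sub) for key, sub in spec.items()
                   if not key.startswith("="))
    ftype, flen = spec["=type"], spec["=len"]
    if ftype in ("char", "charz", "ind", "float", "zoned"):
        return flen
    if ftype in LENGTH_PREFIX:
        return LENGTH_PREFIX[ftype] + flen
    if ftype in ("int", "uns"):
        return INTEGER_BYTES[flen]
    if ftype == "binary":
        return 2 if flen <= 4 else 4
    if ftype == "packed":
        return flen // 2 + 1  # assumption: standard packed-decimal size
    if ftype == "ucs2":
        return 2 * flen
    raise ValueError(f"type {ftype!r} is not covered by this sketch")


def total_size(spec):
    """Bytes used by a field including all of its array elements."""
    return element_size(spec) * spec.get("=dim", 1)


def offsets(schema, base=0, prefix=""):
    """Map each field name to the byte offset of its first element."""
    result, position = {}, base
    for name, spec in schema.items():
        if name.startswith("="):
            continue
        result[prefix + name] = position
        if "=type" not in spec:
            result.update(offsets(spec, position, prefix + name + "."))
        position += total_size(spec)
    return result


SCHEMA = {
    "num_Stock": {"=type": "int", "=len": 10},
    "stock": {
        "=dim": 10,
        "Department": {"=type": "char", "=len": 10},
        "Category": {
            "MainCategory": {"=type": "char", "=len": 10},
            "Name_Sub_Category": {"=type": "char", "=len": 20},
            "Sub_Category": {"=type": "char", "=len": 10},
        },
        "num_sizes": {"=type": "int", "=len": 10},
        "sizes": {"=type": "int", "=len": 10, "=dim": 10},
        "num_Colours": {"=type": "int", "=len": 10},
        "Colours": {"=type": "char", "=len": 20, "=dim": 10},
    },
}

print(element_size(SCHEMA))  # total bytes described by the schema
print(offsets(SCHEMA))
```

Under these assumptions the schema describes 2984 bytes: 4 for the count, plus 10 occurrences of 298 bytes each. That matches the WS-RESP structure, where each PIC S9(8) USAGE BINARY item occupies 4 bytes.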
Once the correct format (schema) is determined, the JSON should be “squished” (spaces, indents, carriage returns, etc. removed) and placed in a variable in the copybook. MDR_DATAINTO and MDR_DATAGEN will detect whether your variable is fixed-length character, C-style null-terminated character, or VARCHAR(4); it must be provided in one of those formats. Typically, in COBOL the most efficient way is to generate a data structure where the first 4 bytes are the length and the remainder is the schema; this is equivalent to a VARCHAR(4) in RPG.
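Squishing the schema and working out the length value by hand is error-prone. The sketch below shows one way to do it with Python's standard json module; it is a convenience illustration and not part of MDRFRAME (the MDRCBLCPY command is the supported way to generate copybook source). The helper names squish and chunks are hypothetical, and the length is assumed to equal the character count, which holds when the schema contains only single-byte characters.

```python
import json

PRETTY_SCHEMA = """
{
  "num_Stock": {"=type": "int", "=len": 10},
  "stock": {
    "=dim": 10,
    "Department": {"=type": "char", "=len": 10},
    "Category": {
      "MainCategory": {"=type": "char", "=len": 10},
      "Name_Sub_Category": {"=type": "char", "=len": 20},
      "Sub_Category": {"=type": "char", "=len": 10}
    },
    "num_sizes": {"=type": "int", "=len": 10},
    "sizes": {"=type": "int", "=len": 10, "=dim": 10},
    "num_Colours": {"=type": "int", "=len": 10},
    "Colours": {"=type": "char", "=len": 20, "=dim": 10}
  }
}
"""


def squish(schema_text: str) -> str:
    """Re-serialize JSON with no spaces, indents or line breaks."""
    return json.dumps(json.loads(schema_text), separators=(",", ":"),
                      ensure_ascii=False)


def chunks(text: str, width: int = 256) -> list:
    """Split the squished schema into pieces that fit fixed-size fields."""
    return [text[i:i + width] for i in range(0, len(text), width)]


squished = squish(PRETTY_SCHEMA)
print(len(squished))                      # value for the 4-byte length field
print([len(part) for part in chunks(squished)])
```

For the example schema this gives a length of 412, split into pieces of 256 and 156 characters.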
For example:
       01 WS-RESP-FORMAT.
           10 WS-RESP-LEN PIC 9(8) USAGE BINARY VALUE 412.
           10 WS-RESP-001 PIC X(256) VALUE '{"num_Stock":{"=type":"int",
      -        '"=len":10},"stock":{"=dim":10,"Department":{"=type":"cha
      -        'r","=len":10},"Category":{"MainCategory":{"=type":"char"
      -        ',"=len":10},"Name_Sub_Category":{"=type":"char","=len":2
      -        '0},"Sub_Category":{"=type":"char","=len":10}},"num_sizes
      -        '":{"'.
           10 WS-RESP-002 PIC X(156) VALUE '=type":"int","=len":10},"siz
      -        'es":{"=type":"int","=len":10,"=dim":10},"num_Colours":{"
      -        '=type":"int","=len":10},"Colours":{"=type":"char","=len"
      -        ':20,"=dim":10}}}'.
The WS-RESP-FORMAT may now be passed to MDR_DATAGEN in the first parameter.
           CALL PROCEDURE 'MDR_DATAGEN' USING WS-RESP-FORMAT
                                              WS-RESP
                                              OMITTED
                                              WS-RESP-OPT
           END-CALL.
Options (the 4th parameter):
Both MDR_DATAGEN and MDR_DATAINTO provide options that can be used to control how they read/generate JSON or XML documents. The following are the properties they understand:
- trace = specifies a diagnostic trace file in the IFS. The value should start with FILE= or FILEAPPEND= followed by the filename. For example, trace=FILE=/tmp/trace.txt will create a file named /tmp/trace.txt and place the trace in it. If there is already a file with that name, it will be replaced. If FILEAPPEND is given, the trace is added to the end of the existing file instead of replacing it.
- trim = can be “all” (default), which causes leading/trailing blanks to be trimmed from character strings, or “none”, which causes the blanks to be retained.
- countprefix = specifies a prefix to be placed before field names in the data structure that are to be used to control the number of elements for that item.
- renameprefix = (MDR_DATAGEN only) specifies that fields beginning with a given prefix will be used to dynamically set the name of the field being output.
- req = can be set to no to not process the HTTP request, or yes (default) to process the request. If yes, MDR_DATAINTO will read the request body sent via HTTP instead of reading the user’s payload variable. MDR_DATAGEN will write the output to the HTTP response if req=yes.
- value_true – when a boolean value in the JSON is true, this specifies the value to be written to the user’s variable. Default is ‘1’ (because it’s assumed that you will map booleans to indicators).
- value_false – when a boolean value in the JSON is false, this specifies the value to be written to the user’s variable. Default is ‘0’ (because it’s assumed that you will map booleans to indicators).
- value_null – when a JSON document contains a null element, this is the value that will be written to your variable. Default: *NULL.
- ccsid = may be ucs2, utf16, utf8 or job.
- document_name = the name of the outermost element of the payload document. When generating XML, this corresponds to the name of the outermost XML tag. When reading data, this corresponds to the name of the data structure in the program.
- doc = controls where the JSON comes from (MDR_DATAINTO) or goes to (MDR_DATAGEN) when using req=no. Default is doc=string, which means the third parameter to MDR_DATAGEN/MDR_DATAINTO will be used for the data for the payload document. Also supported is doc=file, which causes the third parameter to be used as an IFS pathname, and the document will be read from or written to that IFS file.
- datasubf = when working with XML, this determines which subfield of a structure is used for the “data”. Other subfields are treated as sub-element names or attributes. This parameter is ignored for JSON documents.
- ns = when reading XML, this determines what should happen with namespaces. The default value ns=keep means that the variable name is expected to include the XML namespace. Specify ns=remove to strip the namespace from the name. This parameter is ignored for JSON documents.
- beautify = when generating a payload document, this determines whether the generator adds indenting and linefeeds to make it easier for a human to read. Default is beautify=no. Specify beautify=yes to make it nicer for a person to read.
- format = when generating a payload, this can be used to override the format. Specify format=xml to cause the output to be XML, or format=json for JSON. By default, the format will be JSON. If the input is read from an HTTP request, the content type will be used to default the output format, so content types text/xml or application/xml will cause the output to be treated as XML; anything else will cause the output to be JSON. The format parameter has no effect on MDR_DATAINTO; instead, MDR_DATAINTO will attempt to determine the format from the content type (if reading from HTTP) or the contents of the document (when not reading from HTTP).
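The way the output format is chosen for MDR_DATAGEN, as described in the format option above, can be summarized in a short Python sketch. This only restates the documented precedence; the function name choose_output_format is hypothetical, and ignoring parameters such as "; charset=utf-8" after the media type is an assumption.

```python
def choose_output_format(options: dict, content_type: str = None) -> str:
    """Pick "xml" or "json" following the documented precedence (sketch)."""
    # An explicit format= option overrides everything else.
    explicit = options.get("format")
    if explicit in ("xml", "json"):
        return explicit
    # Otherwise the HTTP request's content type sets the default.
    if content_type:
        # Assumption: parameters after ';' are not part of the comparison.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type in ("text/xml", "application/xml"):
            return "xml"
    # Anything else results in JSON.
    return "json"


print(choose_output_format({}, "application/xml"))
print(choose_output_format({"format": "json"}, "text/xml"))
print(choose_output_format({}))
```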
Options are specified in a string in the 4th parameter to MDR_DATAGEN or MDR_DATAINTO, as a space-separated list. For example:
'countprefix=num_ renameprefix=name_ format=json req=no'
A value after the equal-sign may be put in quotes if you need to include spaces in the value. For example:
'trace="FILE=/tmp/diagnostic log.txt" req=no'
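To make the quoting rule concrete, here is a small Python sketch that splits an options string into name/value pairs in the way the text above describes. How MDRFRAME parses the string internally is not documented here, so treat this as an illustration only; parse_options is a hypothetical name.

```python
import re

# name=value, where the value is either "quoted text" or a run of non-blanks
_OPTION = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_options(text: str) -> dict:
    """Split a space-separated option string into a dictionary (sketch)."""
    options = {}
    for match in _OPTION.finditer(text):
        name, quoted, bare = match.groups()
        # A quoted value may contain spaces; the quotes are not kept.
        options[name] = quoted if quoted is not None else bare
    return options


print(parse_options('countprefix=num_ renameprefix=name_ format=json req=no'))
print(parse_options('trace="FILE=/tmp/diagnostic log.txt" req=no'))
```

Only the first equal sign separates the name from the value, so trace=FILE=/tmp/trace.txt yields the value FILE=/tmp/trace.txt.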