SPL can present a steeper learning curve compared with non-streaming languages. But once you get some basics, it is very rewarding because it gives you so much freedom. That said, SPL's JSON path notation takes some getting used to. The JSON functions are actually fine once you understand the notation. Before I give my suggestions, let's examine your original attempt.
| spath input=json.msg output=msg_raw path=json.msg
This will not give you the desired output because the embedded JSON object in json.msg does not contain a path named json.msg. The object that does contain this path is _raw. If you try
| spath ``` input=_raw implied ``` output=msg_raw path=json.msg
you would have extracted a field named msg_raw that duplicates the value of json.msg:
json.msg | msg_raw |
{"name":"", "connection":22234743, "time":20000, "success":false, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["select * from whatever.whatever w where w.whatever in (?,?,?) "], "params":[["1","2","3"]]} | {"name":"", "connection":22234743, "time":20000, "success":false, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["select * from whatever.whatever w where w.whatever in (?,?,?) "], "params":[["1","2","3"]]} |
Of course, this is not what you wanted. What did we learn here? The path option in spath is evaluated inside the JSON object given by input (_raw by default), starting from that object's top level.
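To make the path semantics concrete, here is a rough Python model (not Splunk code): treat spath's path as a dotted walk that starts at the top of whatever the input holds. The helper `walk_path` is hypothetical, ignores spath's array notation, and runs against an event trimmed down from the emulation at the end of this post.

```python
import json

# Trimmed-down sample event: like the real one, json.msg holds a JSON *string*.
raw = json.dumps({
    "json": {
        "level": "WARN",
        "msg": json.dumps({
            "type": "Prepared",
            "querySize": 1,
            "query": ["select * from whatever.whatever w where w.whatever in (?,?,?) "],
            "params": [["1", "2", "3"]],
        }),
    }
})


def walk_path(doc, path):
    """Hypothetical helper: follow a dotted path from the top level of a parsed
    JSON document and return None if any step is missing. It does not model
    spath's array notation ({}) at all."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Like `spath path=json.msg` with the default input=_raw: the walk starts at
# the top of _raw, finds json.msg, and returns its value (a string of JSON).
from_raw = walk_path(json.loads(raw), "json.msg")

# Like `spath input=json.msg path=json.msg`: the walk starts inside the
# embedded object, which has no "json" key, so nothing is extracted.
msg_obj = json.loads(from_raw)
from_msg = walk_path(msg_obj, "json.msg")

print(type(from_raw).__name__, from_msg)  # prints: str None
```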
But if you try
| spath input=json.msg
you will get these fields from json.msg:
batch | batchSize | connection | name | params{}{} | querySize | query{} | success | time | type |
false | 0 | 22234743 |  | 1 2 3 | 1 | select * from whatever.whatever w where w.whatever in (?,?,?) | false | 20000 | Prepared |
What did we learn here? To extract from a field whose value is itself a valid JSON object, put that field name directly in spath's input option. Additionally, Splunk uses {} to denote fields extracted from a JSON array and turns them into a multivalue field; params{}{} carries two pairs of braces because params is an array of arrays.
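If the {} naming feels opaque, the following Python sketch approximates how the field names above come about. It is my own rough model, not Splunk's implementation: object keys are joined with dots, each array level appends {} to the name, and every leaf is collected into a list so that repeated names behave like a multivalue field. The `flatten` helper is hypothetical.

```python
import json

# json.msg from the original single-query event, already unescaped.
msg = json.loads(
    '{"name":"", "connection":22234743, "time":20000, "success":false, '
    '"type":"Prepared", "batch":false, "querySize":1, "batchSize":0, '
    '"query":["select * from whatever.whatever w where w.whatever in (?,?,?) "], '
    '"params":[["1","2","3"]]}'
)


def flatten(node, prefix="", out=None):
    """Rough model of spath-style naming: object keys are joined with '.',
    each array level appends '{}' to the name, and leaf values are collected
    into lists so a repeated name becomes a multivalue field. Leaves stay
    Python values here (False, 20000), whereas Splunk stores strings."""
    if out is None:
        out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(node, list):
        for item in node:
            flatten(item, prefix + "{}", out)
    else:
        out.setdefault(prefix, []).append(node)
    return out


fields = flatten(msg)
for name in sorted(fields):
    print(name, fields[name])
```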
In your other comment, you said you want the equivalent of `jq '.json.msg|fromjson|.query[0]'`. That is trivial to get from the above result. Add
| eval jq_equivalent = mvindex('params{}{}', 0)
| fields params* jq_equivalent
you get
params{}{} | jq_equivalent |
1 2 3 | 1 |
What did we learn here? 1. mvindex selects a value from a multivalue field (here params{}{}) using a zero-based index; 2. in eval, enclose a field name that contains special characters in single quotes to dereference its value.
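For readers coming from languages with zero-based arrays, here is a small Python function that mimics mvindex as I understand the Splunk eval documentation: one index returns one value, an optional end index returns an inclusive sub-list, negative indexes count back from the end, and an out-of-range index yields nothing (null in Splunk, None here). The name mirrors the SPL function, but this is only an illustration, and its edge-case handling is my approximation.

```python
def mvindex(values, start, end=None):
    """Illustrative model of SPL's mvindex (not Splunk code).
    Indexes are zero-based and negative ones count back from the end.
    With 'end', an inclusive sub-list is returned. Out-of-range handling
    here (returning None) is my approximation of Splunk's null result."""
    n = len(values)
    if start < 0:
        start += n
    if not 0 <= start < n:
        return None
    if end is None:
        return values[start]
    if end < 0:
        end += n
    if end < start:
        return None
    return values[start:min(end, n - 1) + 1]


# params{}{} from the single-query event, as a multivalue list.
params = ["1", "2", "3"]
print(mvindex(params, 0))      # 1  (what jq_equivalent holds above)
print(mvindex(params, -1))     # 3
print(mvindex(params, 0, 1))   # ['1', '2']
print(mvindex(params, 5))      # None
```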
A word of caution: if all you want from params{}{} is a single multivalue field, the above may be sufficient. But params[[]] is an array of arrays. To complicate things, your developer did you no favors by putting the query[] array in the same flat structure. Because the JSON array query can have more than one element, my speculation is that your developer intended each element of the top-level params array to hold the params for the corresponding element of query[].
What if, instead of
{\"name\":\"\", \"connection\":22234743, \"time\":20000, \"success\":false, \"type\":\"Prepared\", \"batch\":false, \"querySize\":1, \"batchSize\":0, \"query\":[\"select * from whatever.whatever w where w.whatever in (?,?,?) \"], \"params\":[[\"1\",\"2\",\"3\"]]}
your raw data contained a json.msg with this value?
"{\"name\":\"\", \"connection\":22234743, \"time\":20000, \"success\":false, \"type\":\"Prepared\", \"batch\":false, \"querySize\":2, \"batchSize\":0, \"query\":[\"select * from whatever.whatever w where w.whatever in (?,?,?) \", \"select * from whatever.whatever2 w where w.whatever2 in (?,?) \"], \"params\":[[\"1\",\"2\",\"3\"],[\"4\",\"5\"]]}"
i.e., query[] and params[] each contain two elements? (For convenience, I assume that querySize represents the number of elements in these arrays. We could live without this external count, but why complicate our lives in a tutorial?) Using the above search, you will find that query{} and params{}{} contain
querySize | query{} | params{}{} |
2 | select * from whatever.whatever w where w.whatever in (?,?,?) select * from whatever.whatever2 w where w.whatever2 in (?,?) | 1 2 3 4 5 |
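This is one of the shortcomings of flattening structured data like JSON. It is not unique to SPL, but here it becomes more obvious: nothing in the flattened query{} and params{}{} tells you which params go with which query. On top of that, spath flattens an array of arrays into a single multivalue field, so the boundary between ["1","2","3"] and ["4","5"] is lost. Now what?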
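A tiny Python illustration (my own, not Splunk internals) of why the flattened params{}{} cannot be split back per query: two different groupings produce the identical multivalue list, so the grouping information is simply gone after flattening. The second grouping is hypothetical, included only for contrast.

```python
def flatten_nested(arrays):
    """What params{}{} effectively holds: the values of every inner array
    concatenated into one multivalue list, with inner boundaries dropped."""
    return [value for inner in arrays for value in inner]


actual = [["1", "2", "3"], ["4", "5"]]        # params in the two-query example
alternative = [["1", "2"], ["3", "4", "5"]]   # hypothetical grouping, for contrast

print(flatten_nested(actual))                                  # ['1', '2', '3', '4', '5']
print(flatten_nested(actual) == flatten_nested(alternative))   # True
```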
Here is what I would use to get past this barrier. (This is not the only way, but the JSON functions introduced in 8.2 work really well while preserving semantic context.)
| spath input=json.msg
| eval params_array = json_array_to_mv(json_extract('json.msg', "params"))
| eval idx = mvrange(0, querySize) ``` assuming querySize is size of query{} ```
| eval query_params = mvmap(idx, json_object("query", mvindex('query{}', idx), "params", mvindex(params_array, idx)))
| fields - json.msg params* query{} idx
| mvexpand query_params
With this, the output contains
batch | batchSize | connection | name | querySize | query_params | success | time | type |
false | 0 | 22234743 |  | 2 | {"query":"select * from whatever.whatever w where w.whatever in (?,?,?) ","params":"[\"1\",\"2\",\"3\"]"} | false | 20000 | Prepared |
false | 0 | 22234743 |  | 2 | {"query":"select * from whatever.whatever2 w where w.whatever2 in (?,?) ","params":"[\"4\",\"5\"]"} | false | 20000 | Prepared |
I think you know what I am going for by now. What did we learn here? To compensate for the unfortunate implied semantics your developer forces on you, first construct an intermediary JSON object that binds each query to its array of params. Then use mvexpand to separate the elements. (Admittedly, json_array_to_mv is an oddball function at first glance. But once you understand how Splunk uses multivalue fields, you'll get used to the concept. Hopefully you will come to see the merits of a multivalue representation.)
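If it helps to see the same idea outside SPL, the Python sketch below walks through the pipeline step by step on the two-query json.msg. Each step carries a comment naming the SPL command it roughly corresponds to. This is an analogy I wrote for illustration, not how Splunk evaluates the search; in particular, Python's json.dumps adds spaces after separators, so the JSON text differs cosmetically from Splunk's compact output.

```python
import json

# json.msg from the two-query emulation, already unescaped.
msg = (
    '{"name":"", "connection":22234743, "time":20000, "success":false, '
    '"type":"Prepared", "batch":false, "querySize":2, "batchSize":0, '
    '"query":["select * from whatever.whatever w where w.whatever in (?,?,?) ", '
    '"select * from whatever.whatever2 w where w.whatever2 in (?,?) "], '
    '"params":[["1","2","3"],["4","5"]]}'
)
doc = json.loads(msg)

# SPL analogue: json_array_to_mv(json_extract('json.msg', "params"))
# One JSON-text value per inner array, so each array stays intact.
params_array = [json.dumps(inner) for inner in doc["params"]]

# SPL analogue: mvrange(0, querySize), assuming querySize equals len(query).
idx = list(range(doc["querySize"]))

# SPL analogue: mvmap(idx, json_object("query", ..., "params", ...))
# Bind query i to params i inside one small JSON object.
query_params = [
    json.dumps({"query": doc["query"][i], "params": params_array[i]})
    for i in idx
]

# SPL analogue: fields - ... | mvexpand query_params
# One row per bound pair, with the remaining scalar fields copied to each row.
base = {k: v for k, v in doc.items() if k not in ("query", "params")}
rows = [dict(base, query_params=qp) for qp in query_params]

for row in rows:
    print(row["querySize"], row["query_params"])
```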
From here, you can use spath again to get the desired results, but I find the JSON functions simpler AND more semantic, considering there are only two keys in this intermediary JSON. Add the following to the above
| eval query = json_extract(query_params, "query")
| eval params = json_array_to_mv(json_extract(query_params, "params"))
With this, you get the final result
batch | batchSize | connection | name | params | query | querySize | success | time | type |
false | 0 | 22234743 |  | 1 2 3 | select * from whatever.whatever w where w.whatever in (?,?,?) | 2 | false | 20000 | Prepared |
false | 0 | 22234743 |  | 4 5 | select * from whatever.whatever2 w where w.whatever2 in (?,?) | 2 | false | 20000 | Prepared |
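The last step is easy to misread because the params value inside query_params is itself JSON text of an array, so it gets parsed twice. The Python sketch below mirrors that two-stage unpacking on the two query_params values shown earlier. The comments name the SPL functions each line roughly corresponds to; this is an analogy for illustration only.

```python
import json

# The two query_params values left after mvexpand, in the compact form shown above.
query_params_rows = [
    '{"query":"select * from whatever.whatever w where w.whatever in (?,?,?) ",'
    '"params":"[\\"1\\",\\"2\\",\\"3\\"]"}',
    '{"query":"select * from whatever.whatever2 w where w.whatever2 in (?,?) ",'
    '"params":"[\\"4\\",\\"5\\"]"}',
]

final = []
for qp in query_params_rows:
    obj = json.loads(qp)
    # SPL analogue: json_extract(query_params, "query") gives a plain string.
    query = obj["query"]
    # SPL analogue: json_array_to_mv(json_extract(query_params, "params")).
    # The extracted value is JSON text of an array, so parse it once more
    # to get a list (the multivalue field).
    params = json.loads(obj["params"])
    final.append({"params": params, "query": query})

for row in final:
    print(row["params"], row["query"])
```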
Hope this is a useful format for your further processing.
Below is an emulation of the above two-query mock data, which I adapted from @ITWhisperer's original emulation. Play with it and compare it with your real data.
| makeresults
| eval _raw="{ \"time\": \"2024-09-19T08:03:02.234663252Z\", \"json\": { \"ts\": \"2024-09-19T15:03:02.234462341+07:00\", \"logger\": \"<anonymized>\", \"level\": \"WARN\", \"class\": \"net.ttddyy.dsproxy.support.SLF4JLogUtils\", \"method\": \"writeLog\", \"file\": \"<anonymized>\", \"line\": 26, \"thread\": \"pool-1-thread-1\", \"arguments\": {}, \"msg\": \"{\\\"name\\\":\\\"\\\", \\\"connection\\\":22234743, \\\"time\\\":20000, \\\"success\\\":false, \\\"type\\\":\\\"Prepared\\\", \\\"batch\\\":false, \\\"querySize\\\":2, \\\"batchSize\\\":0, \\\"query\\\":[\\\"select * from whatever.whatever w where w.whatever in (?,?,?) \\\", \\\"select * from whatever.whatever2 w where w.whatever2 in (?,?) \\\"], \\\"params\\\":[[\\\"1\\\",\\\"2\\\",\\\"3\\\"],[\\\"4\\\",\\\"5\\\"]]}\", \"scope\": \"APP\" }, \"kubernetes\": { \"pod_name\": \"<anonymized>\", \"namespace_name\": \"<anonymized>\", \"labels\": { \"whatever\": \"whatever\" }, \"container_image\": \"<anonymized>\" }}"
| spath ``` data emulation ```
Hope this helps.