how to do trivial projection and json parsing? (2024)

SPL can present a steeper learning curve than non-streaming languages. But once you get some basics, it is very rewarding, for it gives you so much freedom. That said, SPL's JSON path notation takes some getting used to. The JSON functions are actually OK once you understand the notation. Before I give my suggestions, let's examine your original trial.

| spath input=json.msg output=msg_raw path=json.msg

This will not give you the desired output because the embedded JSON object in json.msg does not contain a path named json.msg. The object that does contain this path is _raw. If you try

| spath ``` input=_raw implied ``` output=msg_raw path=json.msg

you would have extracted a field named msg_raw that duplicates the value of json.msg:

json.msg: {"name":"", "connection":22234743, "time":20000, "success":false, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["select * from whatever.whatever w where w.whatever in (?,?,?) "], "params":[["1","2","3"]]}
msg_raw: {"name":"", "connection":22234743, "time":20000, "success":false, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["select * from whatever.whatever w where w.whatever in (?,?,?) "], "params":[["1","2","3"]]}

Of course, this is not what you wanted. What did we learn here? The path option in spath is resolved inside the JSON object that spath reads as its input.
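To make this concrete outside Splunk, here is a minimal Python sketch (not SPL). It assumes a trimmed-down event that carries only the json.msg key, and the helper name lookup is hypothetical: it only mimics resolving a dotted path inside a given JSON document, not how spath is implemented.

```python
import json

# Trimmed-down event: json.msg holds a JSON document encoded as a string.
inner_text = json.dumps({"querySize": 1, "query": ["select 1"]})
raw = json.dumps({"json": {"msg": inner_text}})

def lookup(document: str, path: str):
    """Resolve a dotted path inside a JSON document; return None if absent."""
    node = json.loads(document)
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Resolved inside _raw, the path json.msg finds the embedded string.
msg_raw = lookup(raw, "json.msg")

# Resolved inside the embedded object, the same path finds nothing,
# because that object has no "json" key at all.
missing = lookup(msg_raw, "json.msg")

print(type(msg_raw).__name__, missing)
```

The second lookup comes back empty for the same reason the original spath call did: the path is searched for inside whatever document is supplied as input.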

But if you try

| spath input=json.msg

you will get these fields from json.msg:

batch: false
batchSize: 0
connection: 22234743
name: (empty string)
params{}{} (multivalue):
    1
    2
    3
querySize: 1
query{}: select * from whatever.whatever w where w.whatever in (?,?,?)
success: false
time: 20000
type: Prepared

What did we learn here? To extract from a field whose value is itself a valid JSON object, put that field name directly in spath's input option. Additionally, Splunk uses {} to denote fields extracted from a JSON array, and turns them into multivalue fields.
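The {} naming can be mimicked in a few lines of Python (not SPL). This is only a sketch of the naming convention under the assumption that each array level appends {} to the field name; flatten and extract_fields are hypothetical helper names, and the sample message is shortened.

```python
import json

def flatten(node, prefix=""):
    """Yield (field_name, value) pairs; each array level adds '{}' to the name."""
    if isinstance(node, dict):
        for key, value in node.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, prefix + "{}")
    else:
        yield prefix, node

def extract_fields(document: str) -> dict:
    """Collect every value under its flattened name, like a multivalue field."""
    fields = {}
    for name, value in flatten(json.loads(document)):
        fields.setdefault(name, []).append(value)
    return fields

# Shortened stand-in for the value of json.msg.
msg = '{"querySize":1, "query":["select 1"], "params":[["1","2","3"]]}'
fields = extract_fields(msg)
print(fields)
```

Here params, an array inside an array, ends up under the name params{}{} with three values, which matches the shape of the field list above.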

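In your other comment, you said you want the equivalent of `jq '.json.msg|fromjson|.query[0]'`. That is trivial from the above result: mvindex('query{}', 0) gives the first query. The same technique works on any multivalue field. Using params{}{} as the example, add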

| eval jq_equivalent = mvindex('params{}{}', 0)
| fields params* jq_equivalent

you get

params{}{} (multivalue):
    1
    2
    3
jq_equivalent: 1

What did we learn here? 1. mvindex selects a value from a multivalue field (params{}{}), using a 0-based index; 2. Use single quotes to dereference the value of a field whose name contains special characters.
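For comparison with the jq expression, the same two lookups can be written in Python (not SPL). This is a sketch on a shortened, made-up event; the variable names are illustrative only.

```python
import json

# Shortened stand-in for the event: json.msg is JSON stored as a string.
event = {"json": {"msg": '{"query":["select 1"], "params":[["1","2","3"]]}'}}

# jq: .json.msg | fromjson | .query[0]
inner = json.loads(event["json"]["msg"])
first_query = inner["query"][0]

# Counterpart of mvindex('params{}{}', 0): index 0 of the flattened values.
params_flat = [value for group in inner["params"] for value in group]
first_param = params_flat[0]

print(first_query, first_param)
```

Both lookups use 0-based indexing, the same as mvindex.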

A word of caution: If all you want from params{}{} is a single multivalue field, the above can be sufficient. But params is an array of arrays. To complicate things, your developer doesn't do you the best of service by throwing the query[] array into the same flat structure. As the JSON array query can have more than one element, my speculation is that your developer intended each element in the top-level array of params to hold the parameters for the corresponding element of query[].

What if, instead of

{\"name\":\"\", \"connection\":22234743, \"time\":20000, \"success\":false, \"type\":\"Prepared\", \"batch\":false, \"querySize\":1, \"batchSize\":0, \"query\":[\"select * from whatever.whatever w where w.whatever in (?,?,?) \"], \"params\":[[\"1\",\"2\",\"3\"]]}

your raw data contains json.msg of this value?

"{\"name\":\"\", \"connection\":22234743, \"time\":20000, \"success\":false, \"type\":\"Prepared\", \"batch\":false, \"querySize\":2, \"batchSize\":0, \"query\":[\"select * from whatever.whatever w where w.whatever in (?,?,?) \", \"select * from whatever.whatever2 w where w.whatever2 in (?,?) \"], \"params\":[[\"1\",\"2\",\"3\"],[\"4\",\"5\"]]}"

i.e., query[] and params[] each contain two elements? (For convenience, I assume that querySize represents the number of elements in these arrays. We can live without this external count, but why complicate our lives in a tutorial?) Using the above search, you will find query{} and params{}{} to contain

querySize: 2
query{} (multivalue):
    select * from whatever.whatever w where w.whatever in (?,?,?)
    select * from whatever.whatever2 w where w.whatever2 in (?,?)
params{}{} (multivalue):
    1
    2
    3
    4
    5

This is one of the shortcomings of flattening structured data like JSON. It is not unique to SPL, but it becomes more obvious here. On top of the flattened structure, the spath command cannot preserve the boundaries within an array of arrays. Now what?
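To see exactly what is lost, here is a small Python sketch (not SPL) with shortened placeholder queries. The second nesting, called other, is made up purely for contrast.

```python
import json

msg = json.loads('{"query":["q1","q2"], "params":[["1","2","3"],["4","5"]]}')

# The real nesting: three params for q1, two for q2.
nested = msg["params"]
flat = [value for group in nested for value in group]

# A different, made-up nesting: two params for q1, three for q2.
other = [["1", "2"], ["3", "4", "5"]]
other_flat = [value for group in other for value in group]

# Both nestings collapse to the same flat list, so the flat form alone
# cannot say which params belong to which query.
print(flat, flat == other_flat)
```

Once the values are flattened, the boundary between the two inner arrays has to be recovered from somewhere else, which is what the JSON functions make possible.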

Here is what I would use to get past this barrier. (This is not the only way. But the JSON functions introduced in 8.2 work really well while preserving semantic context.)

| spath input=json.msg
| eval params_array = json_array_to_mv(json_extract('json.msg', "params"))
| eval idx = mvrange(0, querySize) ``` assuming querySize is size of query{} ```
| eval query_params = mvmap(idx, json_object("query", mvindex('query{}', idx), "params", mvindex(params_array, idx)))
| fields - json.msg params* query{} idx
| mvexpand query_params

With this, the output contains

Row 1
batch: false
batchSize: 0
connection: 22234743
name: (empty string)
querySize: 2
query_params: {"query":"select * from whatever.whatever w where w.whatever in (?,?,?) ","params":"[\"1\",\"2\",\"3\"]"}
success: false
time: 20000
type: Prepared

Row 2
batch: false
batchSize: 0
connection: 22234743
name: (empty string)
querySize: 2
query_params: {"query":"select * from whatever.whatever2 w where w.whatever2 in (?,?) ","params":"[\"4\",\"5\"]"}
success: false
time: 20000
type: Prepared

I think you know what I am going for by now. What did we learn here? To compensate for the unfortunate implied semantics your developer forces on you, first construct an intermediary JSON object that binds each query to its own array of params. Then, use mvexpand to separate the elements. (Admittedly, json_array_to_mv looks like an oddball function at first glance. But once you understand how Splunk uses multivalue fields, you'll get used to the concept. Hopefully you will find many merits in the multivalue representation.)

From here, you can use spath again to get the desired results, but I find the JSON functions simpler AND more semantic, considering there are only two keys in this intermediary JSON. Add the following to the above:

| eval query = json_extract(query_params, "query")
| eval params = json_array_to_mv(json_extract(query_params, "params"))

With this, you get the final result

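Row 1
batch: false
batchSize: 0
connection: 22234743
name: (empty string)
params (multivalue):
    1
    2
    3
query: select * from whatever.whatever w where w.whatever in (?,?,?)
querySize: 2
success: false
time: 20000
type: Prepared

Row 2
batch: false
batchSize: 0
connection: 22234743
name: (empty string)
params (multivalue):
    4
    5
query: select * from whatever.whatever2 w where w.whatever2 in (?,?)
querySize: 2
success: false
time: 20000
type: Prepared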

Hope this is a useful format for your further processing.
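As a cross-check outside Splunk, the pairing and expansion can be sketched in Python (not SPL). This assumes, as speculated earlier, that the i-th element of params belongs to the i-th element of query; the message is shortened and q1/q2 are placeholders.

```python
import json

# Shortened stand-in for the two-query json.msg value.
msg_text = ('{"name":"", "connection":22234743, "querySize":2, '
            '"query":["q1","q2"], "params":[["1","2","3"],["4","5"]]}')
msg = json.loads(msg_text)

# Fields shared by every output row (everything except the two arrays).
common = {key: value for key, value in msg.items()
          if key not in ("query", "params")}

# Bind each query to its params by position, then emit one row per pair.
# zip stops at the shorter array, so a length mismatch drops the extras.
rows = [
    {**common, "query": query, "params": params}
    for query, params in zip(msg["query"], msg["params"])
]

for row in rows:
    print(row)
```

Each row keeps the shared fields and carries one query with its own list of params, the same shape as the final result above.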

Below is an emulation of the above 2-query mock data that I adapted from @ITWhisperer's original emulation. Play with it and compare with real data.

| makeresults
| eval _raw="{ \"time\": \"2024-09-19T08:03:02.234663252Z\", \"json\": { \"ts\": \"2024-09-19T15:03:02.234462341+07:00\", \"logger\": \"<anonymized>\", \"level\": \"WARN\", \"class\": \"net.ttddyy.dsproxy.support.SLF4JLogUtils\", \"method\": \"writeLog\", \"file\": \"<anonymized>\", \"line\": 26, \"thread\": \"pool-1-thread-1\", \"arguments\": {}, \"msg\": \"{\\\"name\\\":\\\"\\\", \\\"connection\\\":22234743, \\\"time\\\":20000, \\\"success\\\":false, \\\"type\\\":\\\"Prepared\\\", \\\"batch\\\":false, \\\"querySize\\\":2, \\\"batchSize\\\":0, \\\"query\\\":[\\\"select * from whatever.whatever w where w.whatever in (?,?,?) \\\", \\\"select * from whatever.whatever2 w where w.whatever2 in (?,?) \\\"], \\\"params\\\":[[\\\"1\\\",\\\"2\\\",\\\"3\\\"],[\\\"4\\\",\\\"5\\\"]]}\", \"scope\": \"APP\" }, \"kubernetes\": { \"pod_name\": \"<anonymized>\", \"namespace_name\": \"<anonymized>\", \"labels\": { \"whatever\": \"whatever\" }, \"container_image\": \"<anonymized>\" }}"
| spath
``` data emulation ```

Hope this helps.

Author: Mrs. Angelic Larkin