CloverETL is now CloverDX - Learn Why

Back to CloverDX Blog on Data Integration

Parsing List Data in XML and JSON with CloverDX 4.1

Posted by Jan Sedláček on Jan 30, 2016 2:09:44 PM

Parsing of XML and JSON data is an important task in data integration because the majority of today's APIs and Web Services are built around these two formats. CloverDX has JSONExtract and XMLExtract components which do just that.

With the release of CloverDX version 4.1 we added the ability to to parse JSON array elements and repeating elements in XML (sequences) directly into lists in CloverDX records.

A metadata field usually holds only a single value of predefined data type (string, integer, etc.). However, you can switch the field to “list” or “map” mode which makes it hold an array of values (of the same data type). Some components can work with these directly, or you can easily manipulate lists in CTL.

You can read our blog on list and map container types or check out our documentation for finer details.

Parsing list data. Set CloverETL to list or map.

Look at “galleries” and “tags” in these simple examples:

JSON


{
"filename": "dsc_74532.jpg",
"galleries": ["family", "2015"],
"tags": ["summer", "children", "Peter", "Mary"]
}

XML


<?xml version="1.0" encoding="UTF-8" ?>
<catalog>
<file>
<filename>dsc_74532.jpg</filename>
<galleries>family</galleries>
<galleries>2015</galleries>
<tags>summer</tags>
<tags>children</tags>
<tags>Peter</tags>
<tags>Mary</tags>
</file>
</catalog>

You can see these have multiple values that need to be parsed and processed. Previously, you had to work around such multi-value elements by having multiple record streams (and therefore multiple edges) instead of using multi-value fields.

See how it looks with lists instead of mutiple record streams:

Graph for parsing list data before and after update.

As you can see, we've saved a component for each multi-value element, plus a 'join' component to put everything back together. Undoubtedly, it can save a lot of time in development when incoming data structures change. And of course, this improvement also positively affects performance.

Here's how it works...

Let’s see how the new list mapping feature works for the sample XML data above (galleries and tags).

Map the parsing list in the CloverETL.
The mapping looks simple, just like if we were dealing with regular single value fields. For obvious reasons this only works if the repeating element does not contain additional sub elements (i.e. if is not a structured element on its own).

I am sure it will be helpful for many of you.

Why not share your feedback in the comments...

Data integration software and ETL tools provided by the CloverDX platform (formerly known as CloverETL) offer solutions for data management tasks such as data integration, data migration, or data quality. CloverDX is a vital part of enterprise solutions such as data warehousing, business intelligence (BI) or master data management (MDM). CloverDX Designer (formerly known as CloverETL Designer) is a visual data transformation designer that helps define data flows and transformations in a quick, visual, and intuitive way. CloverDX Server (formerly known as CloverETL Server) is an enterprise ETL and data integration runtime environment. It offers a set of enterprise features such as automation, monitoring, user management, real-time ETL, data API services, clustering, or cloud data integration.