Working with subgraphs in CloverETL 4.0

Posted by Jan Sedláček on Nov 25, 2014 12:03:34 PM

Subgraphs are one of several exciting new features in CloverETL 4.0. You may have already read some articles about them. To better illustrate how subgraphs are used and what they offer, let me guide you through a detailed example.

I was preparing a presentation for one of our new customers, and part of it was a graph that generated data. The specific request was:

- to generate a data sample that looks like real people's contact information

- to show how to use the data sample for testing purposes

I started to build a graph to generate imaginary contacts. It takes the most common first names and surnames from a web source and combines them at random. It then generates an email address for each contact.

Here is what the graph looks like:

Even for a simple task, the graph is already quite complex.

The graph consists of 3 different jobs:

1. It reads a list of first names and puts them into a lookup table

2. It reads a list of last names and puts them into a lookup table

3. It generates random combinations of first and last names
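To make the third job concrete, here is a minimal Python sketch of what it does. It is only an illustration written outside CloverETL, which builds this job visually from components rather than from code like this. The function name generate_contacts, the Contact record, and the first.last@example.com email format are all assumptions, since the post does not say how the addresses are formed.

```python
import random
from dataclasses import dataclass


@dataclass
class Contact:
    first_name: str
    last_name: str
    email: str


def generate_contacts(first_names: list[str], last_names: list[str],
                      count: int, seed: int = 0) -> list[Contact]:
    """Pick a random first and last name for each record, then derive an email.

    The email format (first.last@example.com) is an illustrative assumption.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    contacts = []
    for _ in range(count):
        first = rng.choice(first_names)
        last = rng.choice(last_names)
        email = f"{first.lower()}.{last.lower()}@example.com"
        contacts.append(Contact(first, last, email))
    return contacts


if __name__ == "__main__":
    # Hypothetical lookup contents; the real graph loads them from the web.
    for contact in generate_contacts(["James", "Mary"], ["Smith", "Brown"], count=3, seed=42):
        print(contact)
```

Fixing the seed means the same sample is produced on every run, which helps when the generated data is meant for testing.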

I realized that this graph would also let me demonstrate subgraphs to the customer: downloading and parsing names happens twice in the graph and is almost identical in both cases. The only differences are the HTTP request URL and the regular expression pattern used to extract the data. This kind of duplication is typical of a good subgraph candidate, since the two uses will differ only in their subgraph parameter values. Creating a subgraph now will also save me a lot of work in the future, because I will be able to build any graph with lookup data in just a few minutes.
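The refactoring idea can be illustrated with a small Python analogy (not CloverETL code): the duplicated download-and-parse branch becomes one function whose only inputs are the URL and the regex pattern, and it is then called twice with different values. The name load_names, the example URLs, and the sample pages are hypothetical; the injected fetch callable stands in for the real HTTP request so the sketch runs offline, and a plain list stands in for the lookup table.

```python
import re
from typing import Callable


def load_names(url: str, pattern: str, fetch: Callable[[str], str]) -> list[str]:
    """Reusable download-and-parse step.

    The URL and the regex are the only things that change between uses,
    so they are parameters. `fetch` stands in for the HTTP request.
    """
    page = fetch(url)
    # With one capture group, re.findall returns the captured strings.
    return re.findall(pattern, page)


# Hypothetical pages keyed by hypothetical URLs; a real fetch would do an HTTP GET.
PAGES = {
    "https://example.com/first-names": "<li>James</li><li>Mary</li>",
    "https://example.com/last-names": "<td>Smith</td><td>Brown</td>",
}


def fake_fetch(url: str) -> str:
    return PAGES[url]


# Two uses of the same step, differing only in their parameter values.
first_names = load_names("https://example.com/first-names", r"<li>(\w+)</li>", fake_fetch)
last_names = load_names("https://example.com/last-names", r"<td>(\w+)</td>", fake_fetch)
print(first_names, last_names)
```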

So how can I create a subgraph from an existing graph?

Wrapping part of the existing graph into a subgraph

Because I wanted to reuse part of the existing graph, the “Wrap as subgraph” functionality was exactly what I needed. I selected the five components that should go into the subgraph, then right-clicked to open the context menu.

Selected components are highlighted

CloverETL shows a dialog with the subgraph wrapping wizard. You can change the name of the subgraph, configure its input and output ports, and check a preview of the wrapped subgraph.

Basic subgraph settings: changing the name and configuring ports.

You can also preview what the original graph will look like after you wrap its parts into a subgraph.

Preview of the parent graph after wrapping the subgraph.

I was fine with the default name and input/output configuration, so I didn't change anything. Clicking "Finish" at this point closes the wizard, creates the new subgraph, and updates the parent graph. You can see that the wrapped components are now gone and a single subgraph component has replaced them.

The graph after finishing the wrapping wizard.

After clicking on the subgraph component, CloverETL opens the subgraph's content.

Complexity is hidden in the subgraph.

Note the green and blue bars that represent the inputs and outputs of the subgraph component in the parent graph. In this case, they show that there is one input and two outputs. The areas to the left and right of the bars let you debug the subgraph without touching the parent graph: you can add any components there without affecting the parent graph at all.

Setting the parameters of a subgraph

For a subgraph to be truly reusable and actually save work when building graphs, it is important to parametrize its functionality so it can be used in different situations. To do this, you can define public parameters in the subgraph and set their values in the parent graph. I want to use a URL as a parameter for our subgraph, which is pretty simple: just use "Export as subgraph parameter" on the attribute you want to export. You can then update this parameter directly in the parent graph, without ever opening the subgraph.

Choose the parameters that will be visible from the parent graph.

Using the subgraph

The final step is to rebuild the graph from scratch, this time using the subgraph.

You can set the value of parameters as an attribute of the subgraph component in the parent graph.

Parameters are propagated to the subgraph's properties.
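CloverETL references parameter values inside component attributes with the ${NAME} placeholder form. Python's string.Template happens to use the same placeholder syntax, so it can serve as a rough model of how a value set on the subgraph component in the parent graph ends up in an attribute inside the subgraph. This is an analogy only: the attribute names fileURL and regexp and the parameter names URL and PATTERN are hypothetical, and CloverETL's own resolution rules are not modeled.

```python
from string import Template

# Hypothetical attributes inside the subgraph that reference exported
# parameters through ${NAME} placeholders.
SUBGRAPH_ATTRIBUTES = {
    "fileURL": "${URL}",
    "regexp": "${PATTERN}",
}


def resolve(attributes: dict[str, str], parameters: dict[str, str]) -> dict[str, str]:
    """Substitute parameter values supplied by the parent graph into attributes.

    Template.substitute raises KeyError if a referenced parameter has no value.
    """
    return {name: Template(value).substitute(parameters)
            for name, value in attributes.items()}


# Each subgraph component in the parent graph supplies its own values.
first = resolve(SUBGRAPH_ATTRIBUTES, {"URL": "https://example.com/first-names",
                                      "PATTERN": r"<li>(\w+)</li>"})
last = resolve(SUBGRAPH_ATTRIBUTES, {"URL": "https://example.com/last-names",
                                     "PATTERN": r"<td>(\w+)</td>"})
print(first["fileURL"], last["fileURL"])
```

The same subgraph definition is resolved twice with different values, which mirrors placing two subgraph components with different parameter settings in one parent graph.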

The final graph looks like this:

The simplification is immediately visible.

The original graph contained 16 components and was quite hard to understand. The new version, using a subgraph, now contains only half the number of components and is much easier to understand and navigate.

Reusing the subgraph also reduces the time needed to create new graphs, thanks to this new CloverETL 4.0 feature.

There are many other features, such as metadata propagation and the execution view. We will get to these features later, but for now, let's stick with the basic concepts. I encourage you to download CloverETL 4.0 now and try subgraphs for yourself. Start by wrapping parts of existing graphs into subgraphs: it is the easiest way to learn how to use them, and you will immediately see how powerful they can be.

Data integration software and ETL tools provided by the CloverDX platform (formerly known as CloverETL) offer solutions for data management tasks such as data integration, data migration, or data quality. CloverDX is a vital part of enterprise solutions such as data warehousing, business intelligence (BI) or master data management (MDM). CloverDX Designer (formerly known as CloverETL Designer) is a visual data transformation designer that helps define data flows and transformations in a quick, visual, and intuitive way. CloverDX Server (formerly known as CloverETL Server) is an enterprise ETL and data integration runtime environment. It offers a set of enterprise features such as automation, monitoring, user management, real-time ETL, data API services, clustering, or cloud data integration.