Querying Twitter in CloverETL

Posted by Jan Sedláček on Jan 23, 2014 10:17:37 AM

These days, social networks are pervasive. It’s virtually impossible to avoid some kind of interaction with at least a few of them. Not only that, but the mere fact that so many people use them means there’s a ton of interesting data available within.

A typical example of such a popular network is Twitter, with more than 500 million tweets sent each day. Wouldn’t it be useful to be able to query Twitter for the tweets you want and then process them in bulk? The capacity to dig through heaps of social interactions effectively is one of the core promises of Big Data, and it’s a valuable one. In this post, I will show you how to do it with CloverETL.

First of all, you need to register an application with Twitter so that you have credentials for accessing the API later. Log in to https://dev.twitter.com/apps and select “Create new application” to set up your application. Fill in the name, description, and website, leave the Callback URL field empty, and submit the form.

After submitting, you’ll get to a page with application details. There is an OAuth settings section on this page where you can find “Consumer key” and “Consumer secret.” You’ll need these to connect from CloverETL.

Further down on the page, there is a “Your access token” section. Click the “Create my access token” button. Generating the token might take some time, so wait a few seconds and then reload the page. You should now see your “Access token” and “Access token secret” there. These two values will be used in CloverETL too.

With that, you’re done working on the Twitter side. Let’s now proceed to CloverETL.

We’re going to be using the Twitter REST API, so we’ll basically be performing HTTP requests. The best component for this job is the HTTPConnector.

To configure the HTTPConnector component, you need to specify these five attributes:

  • URL
  • OAuth Consumer key
  • OAuth Consumer secret
  • OAuth Access token
  • OAuth Access token secret

All four OAuth attributes are taken from the registered Twitter application (see above). The URL depends on the REST API method you want to use. For example, https://api.twitter.com/1.1/search/tweets.json?q=%40CloverETL will search for tweets related to @CloverETL (%40 is the URL-encoded form of the @ character).
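To make it clearer what those five attributes are used for, here is a rough sketch in Python of how an OAuth 1.0a (HMAC-SHA1) signed request to the search URL could be put together outside CloverETL. This is not how HTTPConnector is implemented, only an illustration of the general signing scheme. The function names `build_search_url` and `build_oauth_header` and all credential values are made up for this example.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def pct(value):
    """Percent-encode a string, leaving only RFC 3986 unreserved characters."""
    return quote(value, safe="~")


def build_search_url(term):
    """Build the search URL from the post; '@CloverETL' becomes '%40CloverETL'."""
    return "https://api.twitter.com/1.1/search/tweets.json?q=" + pct(term)


def build_oauth_header(method, base_url, query, consumer_key, consumer_secret,
                       access_token, access_token_secret,
                       nonce=None, timestamp=None):
    """Return an OAuth 1.0a Authorization header value (hypothetical helper)."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp or str(int(time.time())),
        "oauth_token": access_token,
        "oauth_version": "1.0",
    }
    # The signature covers query and OAuth parameters, sorted after encoding.
    pairs = sorted((pct(k), pct(v)) for k, v in {**query, **oauth}.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(base_url), pct(param_string)])
    # The signing key is built from the two secrets.
    signing_key = f"{pct(consumer_secret)}&{pct(access_token_secret)}"
    digest = hmac.new(signing_key.encode(), base_string.encode(),
                      hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode()
    fields = ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth.items()))
    return "OAuth " + fields


if __name__ == "__main__":
    # Placeholder credentials; use the values from your Twitter application.
    header = build_oauth_header(
        "GET", "https://api.twitter.com/1.1/search/tweets.json",
        {"q": "@CloverETL"},
        "my-consumer-key", "my-consumer-secret",
        "my-access-token", "my-access-token-secret",
    )
    print(build_search_url("@CloverETL"))
    print(header)
```

The resulting string would be sent as the `Authorization` header of the GET request. In the graph itself, you only fill in the attributes and leave this step to the component.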

The result is returned in JSON format. You can either store it in a file (the Output file URL attribute of the component) or map the response content to an output port and process it with downstream components (e.g. JSONReader).
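If you want a feel for what the downstream parsing step has to do, the following Python sketch pulls a few tweet attributes out of a search response. It assumes the v1.1 search response layout, with a top-level `statuses` array whose items carry `id_str`, `created_at`, `text`, and a nested `user.screen_name`. The sample payload is invented for illustration and heavily trimmed, and `extract_tweets` is a hypothetical helper name.

```python
import json

# Invented, trimmed-down sample of a search response (not real data).
SAMPLE_RESPONSE = """
{
  "statuses": [
    {
      "id_str": "1",
      "created_at": "Thu Jan 23 10:17:37 +0000 2014",
      "text": "Trying out @CloverETL",
      "user": {"screen_name": "example_user"}
    }
  ],
  "search_metadata": {"count": 1}
}
"""


def extract_tweets(response_body):
    """Flatten each status into a small dict of the attributes we care about."""
    payload = json.loads(response_body)
    return [
        {
            "id": status["id_str"],
            "created_at": status["created_at"],
            "author": status["user"]["screen_name"],
            "text": status["text"],
        }
        for status in payload.get("statuses", [])
    ]


if __name__ == "__main__":
    for tweet in extract_tweets(SAMPLE_RESPONSE):
        print(tweet["author"], "-", tweet["text"])
```

In a graph, JSONReader plays the same role: you map the fields you need from the JSON structure to output record fields.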

The attached example graph queries the current Twitter trends and the tweets related to them, parses the returned JSON for tweet attributes, and stores them in an XML file. Download Example: QueryingTwitterInCloverETL.zip
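For comparison, the last step of such a pipeline, writing tweet attributes to XML, might look roughly like this in Python. The element and attribute names (`tweets`, `tweet`, `author`, `created_at`, `text`) are assumptions chosen for this sketch, and the records are invented. The layout produced by the example graph may well differ.

```python
import xml.etree.ElementTree as ET

# Invented sample records standing in for parsed tweet attributes.
SAMPLE_TWEETS = [
    {"id": "1", "created_at": "Thu Jan 23 10:17:37 +0000 2014",
     "author": "example_user", "text": "Trying out @CloverETL"},
    {"id": "2", "created_at": "Thu Jan 23 10:20:00 +0000 2014",
     "author": "another_user", "text": "ETL & social data <3"},
]


def tweets_to_xml(tweets):
    """Serialize tweet dicts into an XML string (hypothetical layout)."""
    root = ET.Element("tweets")
    for tweet in tweets:
        node = ET.SubElement(root, "tweet", id=tweet["id"])
        ET.SubElement(node, "author").text = tweet["author"]
        ET.SubElement(node, "created_at").text = tweet["created_at"]
        # ElementTree escapes characters such as '&' and '<' on output.
        ET.SubElement(node, "text").text = tweet["text"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tweets_to_xml(SAMPLE_TWEETS))
```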

And with that, you've now waded through the noise to find exactly what you're looking for.
