Apervi Blog

Viewpoints, musings and articles to enable thought and action

Why invest in a Big Data Integration tool?

November 14th, 2015 – Harish Narayan, BD and Marketing, Apervi

Companies continually have to evaluate an explosion of new tools and paradigms as they evolve and compete in a data-driven world. The growth of big data, the number of sources it comes from, and its spectrum from unstructured to structured mean that companies need a strategy for how to ingest, store and process data, and how to leverage the resulting information to drive decisions and analytics. To integrate data so that it is useful, should a company invest in developing and coding custom solutions, or in a platform that focuses on data integration and orchestration to enable analytics? Are you facing this situation?

Answers to certain key questions could help determine your choice. These questions are:

  • Does the vision of your enterprise include a modern data architecture in which traditional and new technologies co-exist?
  • Will traditional ETL continue to have a place in enabling analytics?
  • Does your enterprise expect to leverage existing talent to implement projects involving big data?
  • How critical is not only the cost of implementing solutions, but also the cost of maintaining them?

In most enterprises, the evolving data architecture will be one where newer technologies complement existing ones. Specifically, if your enterprise wants to make better use of existing batch data, expects continued growth in the volume and variety of data, and sees the need to process data in real time, then two main needs open up. First, the need to quickly connect to sources that have or provide such data; and second, the need to quickly massage and process such data. Inevitably, storage and processing needs will lead you to consider technologies like columnar databases and Hadoop. An enterprise-grade tool that can not only ingest data from a myriad of sources, but also move data for storage and processing on Hadoop, in addition to existing technologies, would be critical to your overall success.

To process large volumes of batch data, ETL is still a great option. If your organization has ETL tools, an EDW, and a BI program in place, these will not be replaced overnight. Chances are that traditional BI and emerging analytics will co-exist for a while. However, in order to optimize cost, much of the EDW storage can be moved to Hadoop. Instead of only ETL, the strategy could also involve EL to Hadoop followed by transformation in place, leveraging the Hadoop ecosystem. This usually results in huge cost savings. There could also be emerging real-time processing needs using streaming data. To implement a solution efficiently, your enterprise would now need to create and maintain multiple pipelines, for example ETL to EDW, ELT to Hadoop, and real-time integration to engines like Spark. A data integration tool that can accomplish all this, while still offering the plug-n-play UI capability that traditional ETL tools offer, would ensure faster progress and significant savings in the effort to create and maintain such data pipelines.
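
To make the difference between the ETL and ELT patterns above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the target store (an EDW or Hadoop in the text), and the table names, sample rows and function names are all hypothetical. It is not how any particular integration tool implements these pipelines.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a source system.
RAW_ORDERS = [
    ("o-1", " alice ", "19.90"),
    ("o-2", "BOB", "5.00"),
    ("o-3", "Carol", "12.50"),
]


def etl(conn, rows):
    """ETL: transform in the pipeline first, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (id TEXT, customer TEXT, amount REAL)")
    cleaned = [(oid, name.strip().lower(), float(amount)) for oid, name, amount in rows]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the target store."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    # The transformation runs as SQL in the store that already holds the data.
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT id, lower(trim(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM orders_raw"
    )


def run_both():
    """Run both patterns on the same input and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    etl(conn, RAW_ORDERS)
    elt(conn, RAW_ORDERS)
    etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
    elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()
    conn.close()
    return etl_rows, elt_rows
```

Both functions end with the same cleaned table. What differs is where the transformation runs: in the pipeline before loading (ETL), or inside the target store after the raw data has landed (ELT).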

Completely reinvesting in new talent to drive data-based initiatives is neither prudent nor feasible. If, like most enterprises, your organization has skilled engineers who are well-versed in ETL tools, database analysts with expertise in querying, administrators who schedule and monitor jobs, and analysts who depend on BI, then ensuring that these roles continue to succeed is critical. A data integration tool that offers similar paradigms for designing workflows, features to embed queries in the pipeline, and the ability to schedule and monitor jobs will ensure much faster adoption and a more seamless transformation towards big data analytics.

Finally, almost all integration that involves big data, from ingestion and loading to transformation and preparation for analytics, could be accomplished in your enterprise without a data integration tool. However, many of the new technologies that support big data analytics require custom coding and scripting in languages like Java, Pig, Scala, etc. Such skills are not typically available in an enterprise and are in very short supply in the market. In addition, such solutions significantly increase development cost, and in a world where new data continually needs to be merged with existing data sets, maintaining integration pipelines would be even more costly. A data integration platform that can not only speed up design and implementation by abstracting away the need to write any code, but also make maintenance very cost-efficient, would be worth its weight in gold to your enterprise.

Consider this – data integration technologies have been shown to speed up the implementation of data-based solutions several times over, and to reduce cost by up to 10X.


At Strata+Hadoop World NY 2015, it was all about Spark, Fast Data Integration, and Enterprise Analytics

October 14th, 2015 – Harish Narayan, BD and Marketing, Apervi

The expansive Javits Center in mid-town Manhattan, NYC was humming with activity between September 29th and October 1st, and the latest ‘buzz’ was all things Big Data. Many now well-known companies, including our team from Apervi, shared technology and possibilities at the Expo, while interesting case studies piqued my interest at the conference.

Three themes struck a note with me from this conference:

  • The explosion of Spark for streaming
  • Application of ML and Data Science in the Enterprise
  • Emergence of big data integration into the mainstream

From a ‘Spark’ to an Explosion – It seems that every demo or presentation is touting Spark for all things streaming (well, it’s actually micro-batch, but that’s for another day). And why not? Spark’s in-memory processing, especially for iterative computations on blocks of data, with performance sometimes quoted as ‘nearly 100x faster than Hadoop’, is very enticing. It is cost-effective and plays well with all file types and formats supported by Hadoop. The available APIs, ML libraries and graph processing capabilities make for a rich ecosystem. While Spark is not as mature as Hadoop yet, there were a number of announcements of Spark support from Talend, Syncsort and SnapLogic, in addition to Spark as a Service offerings from Databricks, Qubole, Altiscale and others. We at Apervi also announced full support in our platform for processing workloads on Spark.

Enterprise ML is evolving and Data Science offers opportunity – Enterprises are using technologies, open source or otherwise, and leveraging key partnerships to improve decision making with predictive analytics. One presentation by Intel and Penn Medicine shared that, by using more data and advanced analytics, they can predict patient illness, increase clinicians’ detection rates, and reduce care costs. This article provides more detail. O’Reilly’s 2015 Data Science Salary Survey points to the gap and need for Data Scientists in enterprises, while showing how SQL is still ‘numero uno’. In addition to technical and statistical skills, the ability of data scientists to present findings and collaborate with the business continues to be a differentiator. We also saw many vendors expand their ML and predictive analytics offerings.

Big Data Integration is now more important than ever – It is no secret. Most enterprises spend more than 50% of their time preparing data for integration. Many spend over 40% of the time just connecting to data sources. Data integration technology is clearly needed to make this process easy and efficient for enterprises, and it is a key to successful adoption. Key vendors announced additional offerings around integration. Notable was Hortonworks DataFlow (HDF), designed to help users move large datasets. Companies emerged out of stealth, and after the conference we have seen further announcements of investment and growth in this space.

Finally, a note about our platform, Apervi Conflux – It is focused on big data integration, offering full support for batch, streaming and micro-batch data pipelines. The platform ensures low cost of ownership by fully leveraging a customer's existing Hadoop investment. Users can design high-throughput streaming data flows in micro-batch mode and deploy them on Spark Streaming. The platform's differentiated set of features for processing data with Storm includes event and time-based windowing, in-memory and DB caching, real-time lookups, processing of multiple streams, and support for event injection. Here’s our most recent release.
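
For readers new to the term, time-based windowing simply means grouping a stream of events into time intervals before processing them. The sketch below shows the simplest variant, a fixed-size, non-overlapping ("tumbling") window, in plain Python. It is an illustration only: the function name, event shape and sample data are hypothetical, and it does not represent how Conflux, Storm or Spark implement windowing.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples (hypothetical shape).
    Returns {window_start: {key: count}}, ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample stream: (seconds since start, event type).
EVENTS = [
    (0, "login"), (3, "click"), (9, "click"),
    (10, "login"), (14, "click"), (21, "click"),
]

result = tumbling_window_counts(EVENTS, window_seconds=10)
```

With a 10-second window, the six sample events fall into three windows starting at 0, 10 and 20 seconds. A real streaming engine would emit each window's result as the window closes rather than after the whole input has been read.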


Try This! Integrate with Hadoop leveraging Apervi Conflux on Hortonworks Ambari Views

September 25th, 2015 – Uday Sagi, SVP Product and Engineering, Apervi.

Apervi Conflux is already one of the first big data integration products to be beta tested as an Ambari View. It will be ready in Q4 2015 and will be downloadable from the Hortonworks marketplace.

Ambari provides a common, secure, pluggable UX for Hadoop to help operators, system admins, data engineers and application developers. Ambari Views provides a common point of entry for user communities, with ‘Views’ embedded in the Ambari UI. It is a UI framework enabled by Hortonworks, where community members can contribute ‘Views’ that plug into the framework.

Presently, views that have been deployed include operational managers, query editors, and visualizers. I see views as a way to also deploy ‘fully functional trial versions’ of more complex big data products. It is a way to engage the user community in using new products in the ever-changing big data space.

It is fairly straightforward to deploy views on the Ambari framework. Client-side assets can be deployed as an ‘Ambari Web’ view, and server-side resources as an ‘Ambari Server View’. This Hortonworks Gallery Repo is a good place to start.

Working closely with Hortonworks, we are in the process of deploying such a view for Apervi Conflux in Q4 2015. It will be downloadable from the Hortonworks marketplace. Conflux is already one of the first big data integration products to be beta tested as an Ambari View inside the Ambari container. We will deploy the HTML5 Conflux UI as an Ambari Web view, and the Conflux web application service on the server side. The figure below is a simple illustration.

[Figure: Integrate with Hadoop]

Apervi Conflux’s low footprint and extensible, services-based web architecture have allowed it to plug seamlessly into Ambari, and we believe this will be a great way for data engineers to try an enterprise-ready product quickly and easily.

Meet us at Strata NY 2015 – Accelerate data integration using Apervi Conflux

September 24th, 2015 - Siddu Tummala, CEO, Apervi.

Data integration, and the time it takes to get data ready for applications and analytics, is one of the top concerns of executives, along with finding the vendor with the right product and skill set. We offer a platform to address this challenge.

We are very excited that we will be at the Strata + Hadoop World conference in NY from Sep 29th to Oct 1st, at Booth #P11. We are thrilled to showcase our flagship product, Apervi Conflux, the industry's first web-based big data integration and orchestration platform providing full support for batch, streaming and micro-batch data pipelines.

We have made significant progress since our last visit here in 2014. Strata + Hadoop World has always attracted the best the big data and analytics world has to offer, including business decision makers, developers and analysts, and we are looking forward to interacting with thought leaders in our rapidly expanding industry.

The Big Data technology market is poised to hit nearly $50 Billion by 2018 (IDC). Our enterprise customers are continuously looking for solutions that will simplify their big data application development lifecycles and deployments. They would like to protect their investments in an increasingly complex and evolving data architecture landscape.

Apervi Conflux is a platform that accelerates designing and deploying data pipelines and IoT applications in the Hadoop ecosystem. It enables users to integrate data from various sources and create reusable data pipelines for data at rest (batch processing) and real-time data (stream processing). Users can build data applications within minutes, and deploy them on any execution engine in the Hadoop ecosystem, including Storm, Spark and Tez. Apervi Conflux drastically reduces the need to write custom code using technologies like MapReduce, Pig, Scala, Hive, etc., by packaging all those features in the platform.

[Figure: Accelerate data integration]

In addition to strong support for Storm and Spark Streaming, the platform also offers solutions for the most common use cases that are ready to configure and deploy. Enterprises can quickly get started with our solutions for EDW modernization, real-time operational reporting, and log analytics.

We continue to get great feedback from our customers, and market validation for our product and solution offerings, which reaffirms our belief that easy-to-use data integration tools are an absolute must in any enterprise architecture.

Please stop by our booth, P11, to meet me or one of my colleagues face to face, see Apervi Conflux in action, brainstorm about your Big Data challenges, or just say hello!
