Apache Cayenne 4.0 — What It Is All About

Didn’t have much time for writing lately. Still felt the topic of Apache Cayenne 4.0 is important enough to say a few words, even though this is coming to you months after the final release. So here is a belated overview of the release scope and its significance. Enjoy!

As you may know, Cayenne is a Java ORM, i.e. an intermediary between your Java objects and the database, that keeps object and relational worlds in sync and lets you run queries and persist data. The current stable version of Cayenne is 4.0 and it was released in August 2018.

Original Link

What is Serverless? — Part 3: Kubernetes and Serverless

This is a 5-part blog series. See part 1 and part 2.

Kubernetes is an open-source solution for automating deployment, scaling, and management of containerized applications. The business value provided by Kubernetes has been extended into the serverless world as well. In general, serverless loves Kubernetes — with Kubernetes being the ideal infrastructure for running serverless applications for a few key reasons.

Original Link

Java 8 Stream API Guide

Java 8 introduces a new package called java.util.stream. This package consists of classes, interfaces, and an enum that allow functional-style operations on streams of elements. You can use streams by importing the java.util.stream package in your programs.

In this guide, we will explore important Stream APIs/Methods with examples.
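
To set the stage, here is a minimal sketch of the kind of pipeline the guide covers: filter, map, and collect chained on a stream (the data is illustrative, not taken from the guide itself):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamIntro {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Anna", "Bob", "Charlie", "Ben");

        // keep names starting with "B", upper-case them, and collect into a new list
        List<String> result = names.stream()
                .filter(n -> n.startsWith("B"))
                .map(String::toUpperCase)
                .collect(Collectors.toList());

        System.out.println(result); // [BOB, BEN]
    }
}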

Original Link

Couchbase Lite for Data Storage in Ionic App Using Cordova Plugin

Couchbase Lite is an embedded NoSQL database for the iOS, Android, and .NET platforms. The framework’s API supports native platform bindings for Android (Java), iOS (Swift, ObjC), and UWP/Xamarin (C#). This implies that if you are building a Cordova app and you want to use Couchbase Lite as your embedded data persistence layer, you will have to find a way to access Couchbase Lite’s native APIs from within your Cordova web application. You can accomplish that with Cordova plugins. Cordova plugins allow web-based apps running in a Cordova webview to access native platform functionality through a JavaScript interface.

Architecture

At a high level, the architecture of a Cordova application that uses Cordova Plugins to access native code libraries is pretty straightforward.

Original Link

Building Enterprise Java Applications the Spring Way

I think it is fair to say that Java EE has gained a pretty bad reputation among Java developers. Despite the fact that it has certainly improved on all fronts over the years, and even moved to the Eclipse Foundation to become Jakarta EE, its bitter taste is still quite strong. On the other side, we have the Spring Framework (or, to reflect reality better, a full-fledged Spring Platform), a brilliant, lightweight, fast, innovative, and hyper-productive Java EE replacement. So why bother with Java EE?

We are going to answer this question by showing how easy it is to build modern Java applications using most of the Java EE specs. And the key ingredient to succeeding here is Eclipse MicroProfile: enterprise Java in the age of microservices.

Original Link

Working With Stream APIs in Java 1.8

The Stream concept was introduced in Java 1.8 and lives in the java.util.stream package. It is used to process objects from a collection, any group of objects, or another data source. The name makes the concept easy to picture: in a water stream, water flows from a source to a destination, and we can perform operations like filtering and collecting along the way to get a useful flow. It is the same when working with Java streams: objects flow from a source to a destination through stream pipelines. Keep in mind: a stream pipeline is composed of a stream source, zero or more intermediate operations, and a terminal operation. Let’s take a closer look!

We should be familiar with a few basic concepts before going any further. Remember one thing: Streams don’t actually store elements; they are computed on demand.
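
A quick sketch of that on-demand behavior (hypothetical data; peek() is used only to show when elements actually flow through the pipeline):

import java.util.stream.Stream;

public class LazyStreamDemo {
    public static void main(String[] args) {
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
                .peek(n -> System.out.println("intermediate op sees " + n)) // not executed yet
                .filter(n -> n % 2 == 0);

        System.out.println("Pipeline built, nothing processed so far");

        // only the terminal operation pulls elements through the pipeline
        pipeline.forEach(n -> System.out.println("terminal op got " + n));
    }
}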

Original Link

Reactive Programming With Project Reactor

If you are building reactive microservices, you would probably have to merge data streams from different source APIs into a single result stream. It inspired me to create this article containing some of the most common scenarios of using reactive streams in microservice-based architecture during inter-service communication. I have already described some aspects related to reactive programming with Spring based on Spring WebFlux and Spring Data JDBC projects in the following articles:

Spring Framework has supported reactive programming since version 5. That support is built on top of Project Reactor. Reactor is a fourth-generation reactive library for building non-blocking applications on the JVM, based on the Reactive Streams Specification. Working with this library can be difficult at first, especially if you don’t have any experience with reactive streams. Reactive Core gives us two data types that enable us to produce a stream of data: Mono and Flux. With Flux, we can emit 0..n elements, while with Mono we can create a stream of 0..1 elements. Both of these types implement the Publisher interface, and both are lazy, which means they won’t be executed until you consume them. Therefore, when building reactive APIs, it is important not to block the stream. Spring WebFlux doesn’t allow that.
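
Here is a minimal sketch of those two types, assuming reactor-core is on the classpath (the values are made up for illustration):

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ReactorBasics {
    public static void main(String[] args) {
        // Flux: 0..n elements; nothing is emitted until someone subscribes
        Flux<String> names = Flux.just("reactive", "streams")
                .map(String::toUpperCase);

        // Mono: 0..1 elements
        Mono<Integer> answer = Mono.just(42);

        names.subscribe(n -> System.out.println("flux -> " + n));
        answer.subscribe(n -> System.out.println("mono -> " + n));
    }
}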

Original Link

Introduction to Reactive APIs With Postgres, R2DBC, Spring Data JDBC and Spring WebFlux

I know — there are A LOT of technologies listed in the title of this article. Spring WebFlux has been introduced with Spring 5 and Spring Boot 2 as a project for building reactive-stack web applications. I have already described how to use it together with Spring Boot and Spring Cloud for building reactive microservices in this article: Reactive Microservices with Spring WebFlux and Spring Cloud. Spring 5 has also introduced projects that support reactive access to NoSQL databases, like Cassandra, MongoDB, or Couchbase. But there was still a lack of support for reactive access to relational databases. The change is coming together with the R2DBC (Reactive Relational Database Connectivity) project. That project is also being developed by Pivotal members. It seems to be a very interesting initiative; however, it is at the beginning of the road. Anyway, there is a module for integration with Postgres, and we will use it for our demo application.

R2DBC will not be the only new and interesting solution described in this article. I will also show you how to use Spring Data JDBC, another really interesting project that was released recently. It is worth mentioning the features of Spring Data JDBC. This project has already been released and is available under version 1.0. It is a part of the bigger Spring Data framework. It offers a repository abstraction based on JDBC. The main reason for creating that library is to allow access to relational databases using Spring Data (through CrudRepository interfaces) without including the JPA library in the application dependencies. Of course, JPA is still certainly the main persistence API used for Java applications. Spring Data JDBC aims to be simpler conceptually than JPA by not implementing popular patterns like lazy loading, caching, dirty tracking, and sessions. It also provides very limited support for annotation-based mapping. Finally, it provides an implementation of reactive repositories that use R2DBC for accessing the relational database. Although that module is still under development (only a SNAPSHOT version is available), we will try to use it in our demo application. Let’s proceed to the implementation.

Original Link

LocalDateTime Class API Guide in Java 8

In this article, we will learn more about the commonly-used LocalDateTime Class APIs with examples. LocalDateTime represents a combination of date and time.

This is the most commonly used class when we need a combination of date and time. The class offers a variety of APIs, and we will look at some of the most commonly used ones.
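
As a taste of what follows, here is a small sketch of a few of those APIs (the dates and values are arbitrary):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LocalDateTimeDemo {
    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();                        // current date and time
        LocalDateTime meeting = LocalDateTime.of(2018, 9, 14, 10, 30);  // a specific date-time

        LocalDateTime followUp = meeting.plusWeeks(1);                  // date arithmetic
        String formatted = now.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);

        System.out.println(formatted);
        System.out.println(followUp.getDayOfWeek());
    }
}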

Original Link

Java: Gain Performance Using SingletonStream

Background

The Stream library in Java 8 is one of the most powerful additions to the Java language ever. Once you start to understand its versatility and the resulting code readability, your Java code style will change forever. Instead of bloating your code with all the nitty-gritty details of for, if, and switch statements and numerous intermediate variables, you can use a Stream that just contains a description of what to do, and not really how it is done.
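
To make that concrete, here is a tiny before/after sketch (not from the article) contrasting the imperative loop with the equivalent stream pipeline:

import java.util.Arrays;
import java.util.List;

public class DeclarativeVsImperative {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Al", "Eve", "Zlatan");

        // imperative: spell out how it is done
        String firstLong = null;
        for (String n : names) {
            if (n.length() > 3) {
                firstLong = n;
                break;
            }
        }
        System.out.println(firstLong);

        // declarative: describe what to do
        names.stream()
                .filter(n -> n.length() > 3)
                .findFirst()
                .ifPresent(System.out::println);
    }
}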

Some years ago, we had to make an API decision for a Java project: which return type should we select for two fast local in-memory data cache methods?

Original Link

Java 11: a New Way to Handle HTTP and WebSockets in Java!

Once upon a time, using the Java SE (Standard Edition) APIs to perform common HTTP operations, such as REST API calls, might have been described as unnatural and cumbersome. Java 11 officially changes this. With Java 11, the incubated HTTP APIs from Java 9 are now officially incorporated into the Java SE API. The JDK Enhancement Proposal (JEP) 321 was the JEP behind this effort. Since being integrated into Java 11, the API has seen a few changes. As of Java 11, the API is now fully asynchronous. This article will attempt to show you the basic use of the new API by performing a REST API call. We will be using OpenJDK 11.

The APIs use java.util.concurrent.CompletableFuture<T> to provide asynchronous, non-blocking request/response behavior, allowing for dependent actions. The API uses new-style OOP method chaining, like a builder, where each call returns an object that the next method call can act on.
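
Here is a minimal sketch of such an asynchronous call with the Java 11 API (the URL is just a placeholder, not the article’s example):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class Java11HttpDemo {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/resource")) // placeholder endpoint
                .GET()
                .build();

        // asynchronous, non-blocking call returning a CompletableFuture
        CompletableFuture<Void> call = client
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(response -> System.out.println(response.statusCode()));

        call.join(); // block here only so the JVM doesn't exit before the response arrives
    }
}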

Original Link

Kotlin Collections’ API Performance Anti-Patterns

Kotlin’s collections API is expressive and rich — but with great power comes great responsibility. There are certain practices that can cause unnecessary time-complexity and object allocation overhead.

To fully understand the context, make sure to check the previous article, first:

Original Link

4 Things I Include in My Agile Estimations


Estimation is one of the most difficult aspects of the Agile process. The natural tendency of team members is to include only the time it will take to complete the actual work for the item they are estimating. I have a process where I break each work item down into 4 parts to help me get a more accurate estimate. This is a process I use all the time in my current role as CTO of CUE Marketplace and I hope it helps you in your Agile estimations.

Understanding the Big Picture Estimate

I want to know every aspect of the work item that I’ll be completing, so I add any time it would take for me to fully understand it. It’s a huge time saver if my Product Owner has written the work items as user stories. That format helps with the “who,” “what,” and “why.” Other items that could take time include understanding any UI designs/clickable demos, reviewing usability tests and getting to know the “who” part of the story by researching the customer or persona.

Original Link

Full-Stack Test Automation Frameworks — API Usability, Part 2

In the last article from the series, we talked about API usability, one of the must-have features of full-stack test automation frameworks. Here, I am going to show you how API usability goes hand-in-hand with extensibility features. You will see examples of how to create additional element locators and waits. Additionally, we are going to talk about typified elements for accelerating test development and making tests more readable.

NOTE: You cannot just copy-paste or use most of the examples directly unless you use the Bellatrix framework. However, you can get lots of ideas on how you can make your test automation framework easier to use, extensible, and customizable.

Original Link

Problems Missed When Only Testing APIs Through UI

UI testing is an important part of quality assurance. Specifically, UI testing refers to the practice of testing front-end components to make sure that they do what they’re supposed to. If a user clicks the Login button, the login modal appears. If they click a link, they’re brought to the appropriate part of the application. With automation platforms, these individual tests can be linked together into workflows and automated. Business-driven development style tests can be created in this fashion. The UI can be tested to see that each individual path that a user may take is functional and that the interface is responding appropriately. Other platforms exist that allow these workflows to be tested on simulated resolutions and devices, ensuring that the user experience is consistent across all possible combinations of browser and device.

API testing lives a layer below UI testing. The UI is fed by these APIs and renders the DOM based upon conditions set by both the user and the developer. These conditions determine the sort of API call that’s made to populate the viewport. When we’re UI Testing, it could be argued that we are indirectly testing the API layer. It’s actually pretty fair to say so. Many of the actions that our UI platform will take will issue API calls. If the DOM rerenders correctly, we can assume to an extent that the API call was successful. The dangerous ground here is the assumption.

Original Link

Java 11 String API Updates

It turns out that the new upcoming LTS JDK 11 release is bringing a few interesting String API updates to the table.

Let’s have a look at them and the interesting facts surrounding them.
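
For a quick taste, here is a sketch exercising several of the methods new in Java 11 (strip(), isBlank(), repeat(), and lines()); the sample strings are made up:

public class Java11Strings {
    public static void main(String[] args) {
        System.out.println("  hello  ".strip());   // Unicode-aware trim -> "hello"
        System.out.println("   ".isBlank());       // true
        System.out.println("ab".repeat(3));        // "ababab"

        // lines() returns a Stream<String>, one element per line
        "line1\nline2".lines().forEach(System.out::println);
    }
}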

Original Link

The Ultimate Guide to the Java Stream API groupingBy() Collector

groupingBy() is one of the most powerful and customizable Stream API collectors.

If you constantly find yourself not going beyond the following use of groupingBy():
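
That basic, single-argument form typically looks something like the sketch below (illustrative data; not necessarily the article’s exact snippet):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingByBasics {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry");

        // the single-argument form: group elements by a classifier function
        Map<Character, List<String>> byFirstLetter = words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));

        System.out.println(byFirstLetter); // {a=[apple, avocado], b=[banana], c=[cherry]}
    }
}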

Original Link

Read/Write in Excel Sheet Using Apache POI With Scala

Yes, you read the title right — using Apache POI, you can easily read and write MS Excel files using Java/Scala.

So, before we get started with the implementation, let’s have a quick introduction of Apache POI.
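
To give a sense of the API before the walkthrough, here is a minimal Java sketch of writing a workbook with POI’s XSSF usermodel classes (the file name and data are made up; the Scala version in the article uses the same underlying classes):

import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class PoiWriteSketch {
    public static void main(String[] args) throws Exception {
        try (Workbook workbook = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("demo.xlsx")) {

            Sheet sheet = workbook.createSheet("people");

            Row header = sheet.createRow(0);
            header.createCell(0).setCellValue("name");
            header.createCell(1).setCellValue("age");

            Row row = sheet.createRow(1);
            row.createCell(0).setCellValue("Alice");
            row.createCell(1).setCellValue(30);

            workbook.write(out); // writes demo.xlsx to the working directory
        }
    }
}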

Original Link

Keys to Effective API Testing

API endpoints make websites work. Simply put, they are the conduits that data moves through. Login functionality? Frequently an API call for authentication. Click on a new section of a webpage? Often an API call for content. Clearly, APIs are a critically important part of any web application. The way that we test these endpoints is incredibly important. At API Fortress, we like to maintain the following best practices for effective API testing.

Rule 1: Keep It DRY

DRY is an acronym for “Don’t Repeat Yourself.” This simple idea forms a core principle of good programming. When we are writing tests, even in API Fortress’ visual composer, we’re still programming and should make every effort to adhere to the principles of writing good code. Let’s say that I have an endpoint that provides user data. The relational database providing the user data has ten entries, and I’d like to write tests to validate the responses when each one of these entries is called. There’s no “All Users” route as our organization has no real business need for one. The only way to access the data in each of these endpoints is to send multiple calls, one for each entry in the database. How could we accomplish this without repeating code?

Original Link

The Simple Way to Parse JSON Responses Using Groovy and Katalon Studio

Many people in the Katalon forum have asked about retrieving information from JSON responses and parsing the JSON format in Katalon Studio. In this post, I will show a simple way to do so. Let’s get started.

JSON Response Example

Suppose we have the following JSON response, and we want to parse and retrieve its data:

Original Link

Reviewing FASTER: Summary

FASTER is an interesting project with some unique approaches to solving their tasks, which I haven’t encountered before. When I initially read the paper about a year or so ago, I was impressed with what they were doing even though I didn’t quite grasp exactly what was going on. After reading the code, this is now much clearer. I don’t remember where I read it, but I remember reading a Googler talking about the difference between Microsoft and Google with regards to publishing technical papers. Google would only publish something after it has been in production for a while (and probably ready to sunset) while Microsoft would publish papers about software that hasn’t been deployed yet.

The reason I mention this is that FASTER isn’t suitable for production. Not by a long shot. I’m talking about issues such as swallowing errors, writing to the console as an error handling approach, calling sleep(), and a lack of logging/tracing/visibility into what is going on in the system. In short, FASTER looks like it was produced to support the paper. It is proof-of-concept/research code, not something you can take and use.

Original Link

Mark Your Safe Zone! Utilizing Access Modifiers in Java

Recently, I was wondering which keyword in Java is the most used by programmers. Is it final, return, class, or maybe something else?

Unfortunately, I haven’t found any broader statistics on the Internet or even on GitHub. However, I remained curious, so I wrote a simple file crawler and ran it on several big projects found on GitHub. The outcome was horrible.

Original Link

Fluent Design Style Button, Toggle Button, and Tooltip for Java, JavaFX

This weekend, in my spare time, I’ve continued to work on JMetro. The end result is a new button and toggle button in both the dark and light style. These styles include a new animation when the button is pressed. This can be turned on and off through CSS.

Finally, I’ve quickly tweaked the tooltip style.

This bumps up the JMetro version number to 4.4.

Original Link

Programming to an Interface: A Simple Explanation

As an architect, you know that programming to an interface is good. It’s what everyone should do.

But what does that mean? And why should you do it?
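
As a simple illustration of the idea (a sketch, not the article’s own example): code that depends only on the List interface keeps working no matter which implementation is handed to it.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ProgramToInterface {

    // the caller depends only on the List interface, not on a concrete class
    static int countNonEmpty(List<String> values) {
        int count = 0;
        for (String v : values) {
            if (!v.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> arrayBacked = new ArrayList<>(Arrays.asList("a", "", "b"));
        List<String> linked = new LinkedList<>(Arrays.asList("", "c"));

        // implementations can be swapped without touching countNonEmpty
        System.out.println(countNonEmpty(arrayBacked)); // 2
        System.out.println(countNonEmpty(linked));      // 1
    }
}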

Original Link

Transactional Patterns: Conversation vs. Batch

When I designed RavenDB, I had a very particular use case at the forefront of my mind. That scenario was a business application talking to a database, usually as a web application.

These kinds of applications have a particular style of communication with the user. As you can see below, there are two very distinct operations. Show the user the data, followed by some “think time” (seconds at minimum, but can be much longer) and then followed by an action.

Original Link

Python Lives: Why This Old School Language Keeps Getting More Popular

Python is an old mainstay when it comes to programming languages and it is still on top. Python was the “most wanted” language in StackOverflow’s 2018 Developer Survey, and recently received the top rating for a programming language in the IEEE Spectrum ranking. Not only is it the most highly rated, it’s also one of the most versatile languages, applicable to web, enterprise, and embedded systems programming.

Despite the fact that Python has been around longer than Facebook, Google, or even Amazon, the language has retained its popularity over time. In fact, it has found a strong niche in some of the trendiest technology applications, as the preferred language for machine learning and data science capabilities. Technologists currently use Python to do everything from testing microchips to powering Instagram and building video games.

Original Link

Building a Dating Site With Neo4j: Part 7

Now it is time to create the timeline for our users. Most of the time, the user wants to see posts from people they could High Five in order to elicit a conversation. Sometimes, they want to see what their competition is doing and what kind of posts are getting responses…also who they can low five. I don’t think they want to see messages from people who are not like them and don’t want to date them, but I could be wrong.

We need a bunch of parameters for our method. There are the obvious ones, but we’re also adding "city," "state," and "distance" so a user who is traveling can see potential dates from locations outside their typical place. Long distance relationships are hard, but short out of town dates are not. We are also including a "competition" flag to see those posts instead. We’ll make use of these later.

Original Link

Comprehensive API Testing Tools You Need to Know in 2018

API testing (Application Programming Interface Testing) is a type of software testing which focuses on determining if the developed APIs meet the expectations regarding functionality, reliability, performance, and security of the application.

The interest in API testing has been growing steadily over the last couple of years, according to Google Trends. Research by SmartBear of over 5,000 software professionals in 2017 showed API testers automating more than 50% of their tests and expecting the numbers to grow by 30% (from 59% to 77%) in the next two years. 80% of survey participants reported they were responsible for testing APIs.

Original Link

Why Is Java Great?

Java is a programming language in the tradition of C and C++. So, if you have any experience with C or C++, you’ll find yourself in familiar territory as you learn the different features of Java.

However, Java differs from other programming languages in a few critical ways. The following sections describe the most important differences.

Original Link

The 6 Best Time-Saving Tools

When you’re working as a business professional, your efficiency and productivity are typically only as good as the tools you use. Fortunately, with the advent of mobile data, smart devices, and software, the amount of time you have to spend stuck in the office on mundane tasks can be shaved down to a bare minimum. We’ve assembled a list of the best time-saving tools that also help you do more and stay organized. Here’s a quick look at each tool to help you get started.

Communications

Twilio

When you need to push communications for your business, you can do it the easy way or the hard way, and Twilio is one of the best tools to make the process as easy as possible. It allows for easy integration of text, voice and video communications through an easy-to-use API with endless scalability. As a cloud-based service, it’s easily accessible for fast tweaks on the go while remaining fully featured from day one, allowing you access to all the communication tools you need to stay in touch with your client base.

Original Link

Creating a Docker Overlay Network


Summary

When we get started using Docker, the typical configuration is to create a standalone application on our desktop.

For the most part, it’s not practical to run all your applications on a single machine, and when it’s not, you’ll need an approach for distributing the applications across many machines. This is where a Docker Swarm comes in.

Docker Swarm provides capabilities for clustering, scalability, discovery, and security, to name a few. In this article, we’ll create a basic Swarm configuration and perform some experiments to illustrate discovery and connectivity.

In this demo, we’ll create a Swarm overlay cluster that will consist of a Swarm manager and a worker. For convenience, it will be running in AWS.

Architecture

Our target Architecture will consist of a couple of Docker containers running inside AWS AMI images on different EC2 hosts. The purpose of these examples is to demonstrate the concepts of how a Docker swarm can be used to discover services running on different host machines and communicate with one another.


In our hypothetical network above, we depict the interconnections of a Docker swarm manager and a couple of swarm workers. In the examples which follow we’ll use a single manager and a single worker to keep complexity and costs low. Keep in mind that your real configurations will likely consist of many swarm workers.

Here’s an example of what a potential Use Case may look like. An AWS load balancer configured to distribute load to a Docker swarm running on 2 or more EC2 instances.


We’ll show in the examples below how you can create a Docker swarm overlay network that will allow DNS discovery of members and allow members to communicate with one another.

Prerequisites

We assume you’re somewhat familiar with Docker and have some familiarity setting up EC2 instances in AWS.

If you’re not confident with AWS or would like a little refresher, please review the following articles:

Some AWS services will incur charges, so be sure to stop and/or terminate any services you aren’t using. Additionally, consider setting up billing alerts to warn you of charges exceeding a threshold that may cause you concern.

Configuration

Begin by creating two (2) EC2 instances (free tier should be fine), and install Docker on each EC2 instance. Refer to the Docker Supported platforms section for Docker installation guidance and instructions for your instance.

Here are the AWS ports to open to support Docker Swarm and our port connection test:

Open ports in AWS Mule SG:

Type              Protocol   Port Range   Source            Description
Custom TCP Rule   TCP        2377         10.193.142.0/24   Docker swarm management
Custom TCP Rule   TCP        7946         10.193.142.0/24   Container network discovery
Custom UDP Rule   UDP        4789         10.193.142.0/24   Container ingress network
Custom TCP Rule   TCP        8083         10.193.142.0/24   Demo port for machine to machine communications

For our examples, we’ll use the following IP addresses to represent Node 1 and Node 2:

  • Node 1: 10.193.142.248
  • Node 2: 10.193.142.246

Before getting started, let’s take a look at the existing Docker networks.

Docker Networks

docker network ls

The output of the network list should look at least like the listing below if you’ve never added a network or initialized a swarm on this Docker daemon. Other networks may be shown as well.

Results of Docker Network Listing:

NETWORK ID          NAME                DRIVER              SCOPE
fa977e47b9f3        bridge              bridge              local
705fc078c278        host                host                local
bd4caf6c1751        none                null                local

From Node 1, let’s begin by initializing the swarm.

Create the Swarm Master Node

docker swarm init --advertise-addr=10.193.142.248

You should get a response that looks like the one below. We’ll use the token provided to join our other node to the swarm.

Results of swarm init

Swarm initialized: current node (v9c2un5lqf7iapnv96uobag00) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-5bbh9ksinfmajdqnsuef7y5ypbwj5d9jazt47urenz3ksuw9lk-227dtheygwbxt8dau8ul791a7 10.193.142.248:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

It takes a minute or two for the Root CA Certificate to synchronize through the swarm, so if you get an error, give it a few minutes and try again.

If you happen to misplace the token, you can use the join-token argument to list tokens for managers and workers. For example, on Node 1, run the following:

Manager Token for Node 1

docker swarm join-token manager

Next, let’s join the swarm from Node 2.

Node 2 Joins Swarm

docker swarm join --token SWMTKN-1-5bbh9ksinfmajdqnsuef7y5ypbwj5d9jazt47urenz3ksuw9lk-227dtheygwbxt8dau8ul791a7 10.193.142.248:2377
This node joined a swarm as a worker.

From Node 1, the swarm master, we can now look at the connected nodes.

On Master, List All Nodes

docker node ls

Results of Listing Nodes

ID                          HOSTNAME            STATUS   AVAILABILITY   ENGINE VERSION
2quenyegseco1w0e5n1qe58r3   ip-10-193-142-248   Ready    Active         18.03.1-ce
wrjk02g909c6fnuxlepmksuz4   ip-10-193-142-246   Ready    Active         18.03.1-ce

Also, notice that an ingress network has been created; this provides an entry point for our swarm network.

Results of Docker Network Listing

NETWORK ID          NAME                DRIVER              SCOPE
fa977e47b9f3        bridge              bridge              local
705fc078c278        host                host                local
bd4caf6c1751        none                null                local
qrppfipdu098        ingress             overlay             swarm

Let’s go ahead and create our Overlay network for standalone containers.

Overlay Network Creation on Node 1

docker network create --driver=overlay --attachable my-overlay-net
docker network ls

Results of Docker Network Listing

NETWORK ID          NAME                DRIVER              SCOPE
fa977e47b9f3        bridge              bridge              local
705fc078c278        host                host                local
bd4caf6c1751        none                null                local
qrppfipdu098        ingress             overlay             swarm
vn12jyorp1ey        my-overlay-net      overlay             swarm

Note the addition of our new overlay network to the swarm. Now we join the overlay network from Node 1.

Run Our Container, Join the Overlay Net

docker run -it --name alpine1 --network my-overlay-net alpine

Join the overlay network from Node 2; we’ll open port 8083 to test connectivity into our running container.

Run Our Container, Join the Overlay Net

docker run -it --name alpine2 -p 8083:8083 --network my-overlay-net alpine

Verify Our Overlay Network Connectivity

With our containers running, we can test that we can discover our hosts using the DNS configured by the swarm. From Node 2, let’s ping the Node 1 container.

Node 2 Pings Node 1, Listens on Port 8083

ip addr            # show our ip address
ping -c 2 alpine1  # ping the alpine1 container
nc -l -p 8083      # create a listener on port 8083

From Node 1, let’s ping the Node 2 container and connect to its open listener on port 8083.

Node 1 Pings Node 2, Connect to Node 2 Listener on Port 8083

ip addr            # show our ip address
ping -c 2 alpine2  # ping the alpine2 container
nc alpine2 8083    # connect to the alpine2 listener on port 8083
Hello Alpine2
^C

There you have it: you created a TCP connection from Node 1 to Node 2 and sent a message. Similarly, your services can connect with one another and exchange data when running in the Docker overlay cluster.

With these fundamental building blocks in place, you’re ready to apply these principles to real-world designs.

Cleanup

With our testing complete, we can tear down the swarm configuration.

Remove Node 2 Swarm

docker container stop alpine2
docker container rm alpine2
docker swarm leave

Remove Node 1 Swarm

docker container stop alpine1
docker container rm alpine1
docker swarm leave --force

This concludes our brief examples of creating Docker overlay networks. With these fundamental building blocks in place, you now have the essential pieces necessary for building larger, more complex Docker container interactions.

Be sure to remove any AWS assets you may have used in these examples so you don’t incur any ongoing costs.

I hope you enjoyed reading this article as much as I have enjoyed writing it. I’m looking forward to your feedback!

Original Link

Operating Your API In The Cloud Kill Zone

When you operate your application within the API ecosystem of a large platform, depending on the platform, you might have to worry about the platform operator copying and emulating what you do. Twitter has long been accused of sharecropping within their ecosystem, and other larger platforms have come out with similar features to what you can find within their API communities. Not all providers take the ideas; it is also very common for API platforms to acquire talent, features, and applications from their ecosystems, something that Twitter has done regularly. Either way, API ecosystems are the R&D and innovation labs for many platforms, where the latest features get proven.

As the technology playing field has consolidated across three major cloud providers, AWS, Azure, and Google, this R&D and innovation zone has become more of a cloud kill zone for API providers, where the cloud giants can see the traction you are getting and decide behind the scenes whether or not they want to launch a competing solution. Investors are tuning into this new cloud kill zone, and in many cases opting not to invest in startups who operate on a cloud platform, afraid that the cloud giant will just come along and copy a service and begin directly competing with companies operating within their own ecosystem. This makes it a kill zone for API providers, who can easily be assimilated into the AWS, Azure, or Google stack, and left helpless to do anything but wither on the vine and die.

Much like other API ecosystems, AWS, Azure, and Google all have the stats on who is performing across their platforms, and they know which solutions developers are demanding. They factor the latest growth trends into their own road maps and make the calculations around whether they will invest in their own solutions or work to partner with, and eventually acquire, a company operating within this new kill zone. The 1,000 lb cloud gorillas carry a lot of weight in regards to whether or not they choose to partner and acquire, or just crush a startup. I’m guessing there are a lot of factors they consider along the way that will contribute to whether they play nicely or not. There are no rules to this game, and they really can do whatever they want with as much market share and control over the resources as they all possess. It will be interesting to begin tracking acquisitions and partnerships across all players to better understand the score.

I wrote last year about how the API space is in the tractor beam of the cloud providers now, and it is something I think will only continue in coming years. It will be hard to deploy, scale, and operate your API without doing it on one of the cloud platforms, or multiple cloud platforms, forcing all API providers to operate within the cloud kill zone. This exposes all new ideas, requiring them to share their analytics with their platform overlords and opening them up to being copied, or at least hopefully acquired. It is something that will stunt investment in new APIs, making it harder for them to scale and grow on the business side of things. Any way you look at it, the cloud providers have the upper hand when it comes to cherry-picking the best ideas and features, with AWS having a significant advantage in the game with their dominant cloud market position. It will be pretty hard to do APIs in the next decade without AWS, Azure, and Google knowing what you are doing, and having the last vote in whether you are successful or not.

Original Link

Expose RESTful APIs Using Spring Boot in Seven Minutes

Today, we are going to learn how to expose standalone RESTful web services. The purpose of this post is to enable a reader to write their own RESTful web services.

Yes, that’s right! I’m sure that after watching this short video you will be able to write your own RESTful web service. Let’s get started.

You can check out this sample project at GitHub.
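
If you prefer reading over watching, the essence of such a service is only a few lines. Here is a minimal sketch (class and path names are illustrative, assuming spring-boot-starter-web is on the classpath), not necessarily the sample project’s exact code:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class HelloApplication {

    @GetMapping("/hello")
    public String hello() {
        return "Hello from Spring Boot!";
    }

    public static void main(String[] args) {
        SpringApplication.run(HelloApplication.class, args);
    }
}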

Original Link

Building a Fully Automated CI/CD Process for API Development With WSO2 API Manager

It is the age of automation, where everyone tries to minimize manual steps while building software. API management is no exception. The term “CI/CD,” which expands to “Continuous Integration/Continuous Deployment,” is the more technical term for this automation process. According to this definition, it describes a two-step process for automating your software development and delivery.

  • Continuous Integration — This means that whenever you make a change to your source code, it needs to be well tested before integrating into the existing stable codebase.
  • Continuous Deployment — This means that once the integration tests have passed, the deployment of the new piece of code to the relevant environments (staging, prod) needs to happen automatically as part of the build process.

When it comes to the development of enterprise software, enterprise architects design the software in such a manner that the entire software system breaks down into several layers (a layered architecture). The back-end services, which implement the business logic, are developed with a general-purpose programming language and the help of existing frameworks. Implementing a CI/CD process on top of this kind of approach is quite straightforward, since it is easier to write unit tests using the same programming language and the source code can be managed in a GitHub repository. With a tool like TravisCI, you can easily tie the integration and deployment processes together whenever there is a change in the source code.
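
For example, a trivial JUnit test of the kind such a pipeline would run on every commit might look like this (the class under test is hypothetical):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class PriceCalculatorTest {

    // hypothetical business logic under test
    private static int applyDiscount(int price, int percent) {
        return price - (price * percent / 100);
    }

    @Test
    public void discountIsApplied() {
        assertEquals(90, applyDiscount(100, 10));
    }
}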

WSO2 API Manager is a full lifecycle API management platform which comes with a componentized architecture to make life easier for API designers, API consumers, and API owners. WSO2 API Manager comes with a dedicated web application called API Publisher for designing and developing the APIs. Even though this is a pretty useful tool for manually building APIs through a proper governance workflow, applying a CI/CD process for this is not possible. But WSO2 API Manager is designed in such a manner that it exposes a set of REST APIs at the product core runtime level which can be used to build a proper CI/CD pipeline for API development.

This article is hugely inspired by the work done by @Manjula Rathnayake of the WSO2 team, where he has published a similar concept and source code on GitHub. I will be reusing his code to explain the process in a more granular and API Manager-focused manner.

At a higher level, we are going to build a CI/CD pipeline to deploy an API to the WSO2 API Manager, which connects to a given backend URL. In this scenario, the source code of the API definition is stored in a GitHub repository, and there is a TravisCI build job configured on the master branch of that repository so that when a change occurs in the GitHub repo, it will automatically deploy the API to the staging environment after testing in the development environment. This process is explained in the above GitHub repository, and I’m using the diagram which was used there.

If you want to see how things work first, you can follow the steps mentioned here.

I’m going to explain what is actually happening under the hood so that anyone can use the scripts which are implemented here and change them according to their requirements. Once you clone the above GitHub repository, you will find there are 6 main directories under the root directory of the repository. The contents of those directories are explained below.

  • BackendServiceImpl: This is the source code of the backend service which is going to be exposed through the API created in WSO2 API Manager. This has been implemented using the WSO2 MSF4J framework. It contains a build script to build the flat jar for the microservice.
  • DeployBackendService: This directory contains the shell script to deploy the backend service to the relevant environment based on the flag set when running this script. This can be used to deploy the backend service to either a dev environment (tenant) or staging environment based on the passed value.
  • DeployAPI: This directory contains important information about the API definition and its metadata required for WSO2 API Manager. The API definition is included in the swagger.json file in a standard format. Users can define their API with this file. This information is not enough to create an API within the WSO2 API Manager. It requires a set of metadata about the API (e.g. endpoint URL, throttling, security, caching, mediation, etc.) to be passed in when creating an API. These metadata and the requests which need to be sent to the API Manager runtime are defined within a Postman collection file with the name “WSO2_API_PUBLISHER.postman_collection.json.” This is the step where we connect with the API Manager runtime and use the product-level APIs to create the API definition within WSO2 API Manager. It contains the following requests in the given order.
  • Dynamic Client Registration request: First, we need to register a dynamic client to create the API in the runtime. The collection then calls the following URLs in the given order:
{{gatewayURL}}/client-registration/register
{{gatewayURL}}/token
{{gatewayURL}}/token
{{gatewayURL}}/api/am/publisher/apis?query=name:{{apiName}}
{{gatewayURL}}/api/am/publisher/apis
{{gatewayURL}}/api/am/publisher/apis/{{apiId}}/swagger
{{gatewayURL}}/token
{{gatewayURL}}/api/am/publisher/apis/change-lifecycle?action=Publish&apiId={{apiId}}

The above steps are sequentially configured in the Postman collection which will be executed by the “newman” tool. Once this script is executed, the API will be created in the relevant tenant and will be in the “published” state.

  • TestAPI: This directory includes a Postman collection which is used to test the deployed API. Before publishing into the upper environments, we need to make sure that tests are properly passing in the lower environments. We are executing the following steps within this Postman collection.
  • Dynamic Client Registration request: First, we need to register a dynamic client to subscribe to the API in the runtime. The collection then calls the following URLs in the given order:
{{gatewayURL}}/client-registration/register
{{gatewayURL}}/token
{{gatewayURL}}/api/am/store/applications?query=testingApp
{{gatewayURL}}/api/am/store/applications
{{gatewayURL}}/api/am/store/apis?query=name:{{apiName}}
{{gatewayURL}}/api/am/store/subscriptions
{{gatewayURL}}/api/am/store/applications/{{applicationId}}
{{gatewayURL}}/api/am/store/applications/generate-keys?applicationId={{applicationId}}
{{gatewayURL}}/token
{{gatewayURL}}/t/{{tenantDomain}}/hello/1.0.0/

Once the above steps are executed with the Postman collection, we move on to the next step of the execution flow.

  • DevEnvironment: This directory contains the environment variables related to the development environment. The file “Development.postman_environment.json” is passed in as an input parameter to Newman when executing the Postman collection stored in the “DeployAPI” section. The relevant URLs and tenant domain information are extracted from this file during the deployment. The contents within the “backendService.properites” are used when deploying the backend service within the bash script of “DeployBackendService” directory.
  • StagingEnvironment: This directory contains the same content as the previous “DevEnvironment” directory, which is related to the staging environment.

Last but not least, we have the Travis YAML descriptor file, which defines the execution flow of the build and deploy pipeline. The sequence is as follows:

  1. Build the backend service implementation
  2. Deploy the backend service to integration cloud “Development” tenant (environment)
  3. Deploy the API to the WSO2 API cloud “Development” tenant
  4. Test the API deployed at “Development” tenant
  5. Deploy the backend service to integration cloud “Staging” tenant
  6. Deploy the API to the WSO2 API cloud “Staging” tenant
  7. Test the API deployed at “Staging” tenant

With TravisCI, you can configure the above script to be executed when there is a change in the GitHub repository. You can follow the steps there to see this in action.

In this post, I have explained the steps to fully automate the deployment of APIs with WSO2 API Manager. You can tweak the steps mentioned in this post and define your own CI/CD pipeline. On top of this automation, if you have more than one node, you can have proper file sharing and database sharing mechanisms to deploy the APIs across all the nodes in the cluster.

Original Link

Thread Methods destroy() and stop(Throwable) Removed in JDK 11

Original Link

How to Visualize Java Module Graphs

This article is a demonstration of how we can visualize a Jigsaw module graph in a Java application. The module API can list Jigsaw modules and their dependencies, as shown below.

Set<Module> modules = ModuleLayer.boot().modules();
Set<Requires> requires = module.getDescriptor().requires();

With these two simple commands, we can access the module relation graph in the running application.

To visualize module relations, vis.js can be used. It is easy to create network graphs with vis.js. Take a look at the following code snippet!

// create an array with nodes
var nodes = new vis.DataSet([
  {id: 'java.base', label: 'java.base'},
  {id: 'java.logging', label: 'java.logging'},
  {id: 'java.sql', label: 'java.sql'}
]);

// create an array with edges
var edges = new vis.DataSet([
  {from: 'java.sql', to: 'java.base'},
  {from: 'java.sql', to: 'java.logging'},
  {from: 'java.logging', to: 'java.base'}
]);

// create a network
var container = document.getElementById('mynetwork');
var data = {
  nodes: nodes,
  edges: edges
};
var options = {};
var network = new vis.Network(container, data, options);

The view should be similar to the image below:


Let’s create a live module graph visualizer. First, add the following:

public class Node {
    private String id;
    private String label;
    // getters, setters, constructors
}

Node.java represents Node data. Each module name will have one node:

public class Edge {
    private String from;
    private String to;
    // getters, setters, constructors
}

Edge.java represents an edge between two nodes:

@RestController
public class ModuleGraphController {

    @GetMapping("/modules")
    public Map<String, HashSet<?>> moduleInfo() {
        var nodes = new HashSet<Node>();               // <1>
        var edges = new HashSet<Edge>();               // <2>
        fillNodeAndEdges(nodes, edges);                // <3>
        return Map.of("nodes", nodes, "edges", edges); // <4>
    }

    private void fillNodeAndEdges(HashSet<Node> nodes, HashSet<Edge> edges) {
        Set<Module> modules = ModuleLayer.boot().modules();              // <5>
        for (Module module : modules) {
            String moduleName = module.getName();
            if (moduleNotContain(moduleName, "jdk")) {                   // <6>
                nodes.add(new Node(moduleName));
            }
            Set<Requires> requires = module.getDescriptor().requires();  // <7>
            for (Requires require : requires) {
                edges.add(new Edge(moduleName, require.name()));         // <8>
            }
        }
    }

    private boolean moduleNotContain(String moduleName, String text) {
        return !moduleName.startsWith(text);
    }
}
  1. Create node set
  2. Create edge set
  3. Fill node and edge sets
  4. Return edge and node sets in a map
  5. Access module list
  6. Skip jdk internal modules for clarity
  7. Access the module’s dependencies
  8. Fill edge between module and dependency

That’s all!

Here is the final result:

Image title

In order to run the demo, follow these steps:

mvn clean install
java -jar target/module-graph.jar
// Then open http://localhost:8080

or

mvn clean install
docker build -t rahmanusta/module-graph .
docker run -it -p 8080:8080 rahmanusta/module-graph
// Then open http://localhost:8080

You can access full source code here: https://github.com/rahmanusta/module-graph.

Original Link

Secure Your Dropwizard Server With Single Sign-on and User Management

We all know that authentication is no fun, especially when you have to build it in-house, and rolling your own can lead to security and scale issues in the future. With Okta, you can add secure authentication to your Dropwizard site in a matter of minutes. Dropwizard is recognized as the pioneer in turn-key Java API frameworks, and it rivals Spring Boot for ease of adoption. By combining Dropwizard’s production-ready essential libraries and Okta’s identity platform, you can construct a fully secured internet-facing web service with little effort. Read on to see how!

This tutorial assumes familiarity with Java, Maven, and basic web security concepts. The first section sets up a new Dropwizard server from scratch. So, if you already have one up and running, feel free to skip straight to how to integrate with Okta. You can also find the completed code example on GitHub.

Dropwizard Versus Spring Boot

A number of excellent articles and blog posts — notably Takipi and Schibsted — provide thorough comparisons of the two frameworks on both a feature-set level and from an architectural perspective. While Spring Boot has been eclipsing Dropwizard lately in regards to popularity, Dropwizard still provides a compelling out-of-the-box distribution.

Most of the differences boil down to what is included by default versus what is offered as an add-on library. With no other setup needed, Dropwizard gives you exhaustive API metrics, logging, and a handful of useful libraries and tools, such as Jackson, Liquibase, Hibernate, and a few page-templating frameworks. Spring Boot requires that you specify most of these extras, which adds a little more thought, planning, and effort to new server creation. The benefit, however, is greater flexibility and a wider variety of options, such as multiple HTTP server alternatives and less coupling to specific libraries.

Both frameworks provide a well-integrated, mature, production-ready insta-server. The choice between the two usually falls according to the preference of one system’s libraries over the other. If you need the Swiss Army knife or prefer to leave more options open, Spring Boot may be the way to go. If you’re a fan of Jetty, Hibernate, Jersey, et al. and just want to start coding immediately, Dropwizard is hard to beat.

Generate a New Dropwizard Server

First things first, you’ll need a running server. The Dropwizard Maven archetype is a convenient way to create a new project. You can execute the following command to start in interactive mode:

mvn archetype:generate \
  -DarchetypeGroupId=io.dropwizard.archetypes \
  -DarchetypeArtifactId=java-simple

When prompted for various project names, this example used com.example for the groupId, demo for the artifactId, and Demo for the name. The rest were given default values.

Maven archetype output

Almost like a TODO list, Dropwizard outlines the fundamental components of your server by creating a bunch of empty directories. Most of those can be left alone for now.

Directory structure

To start penciling in the new server, create a HomePageResource.java class in the com.example.resources package. This will serve as the “Hello world” entry point for testing and can be enhanced later on with one of Dropwizard’s built-in HTML templating libraries. Two key annotations are needed: one @Path("/") annotation at the class level, indicating that this resource will handle requests to your server’s root URI, and one JAX-RS @GET annotation applied to a simple function that returns a test string.

@Path("/")
public class HomePageResource { @GET public String handleGetRequest(){ return "Hello from Dropwizard!"; }
}

Now, back in com.example.DemoApplication, register this new resource with Jersey in the provided run() method:

@Override
public void run(final DemoConfiguration configuration, final Environment environment) {
    environment.jersey().register(new HomePageResource());
}

With that, it’s time to give the server a quick run and make sure all is working as expected. The following two commands will build and start the server on its default port of 8080:

mvn package
java -jar target/demo-1.0-SNAPSHOT.jar server

Once it’s running, visit http://localhost:8080 in your browser and this should relay your “Hello, world” message:

Basic hello world response

Before getting too fancy with the UI, now is a great opportunity to enable single sign-on for your server. The next section will walk you through the process!

Integrate With Okta for OAuth 2.0

Since one of Dropwizard’s goals is to make it easy to create RESTful applications, it provides support for creating an OAuth 2.0 resource server. However, the actual implementation is just a stub and requires you to implement the actual handling of the access token. Okta has created an access token validation library (okta-jwt-verifier) to make it easy to plug this logic into any application.

Create an Okta Account and Gather Credentials

If you don’t already have a free Okta account, you can follow these instructions to create one and set up your first Okta application. There are four important values you will want to take note of:

Use Dropwizard Configuration to Store Your OAuth Settings

Dropwizard’s configuration mechanism is quite easy to work with. All that’s needed is a YAML file with some config values defined and a matching POJO to access the values at runtime (in this case, that POJO is the DemoConfiguration class). You should already have a config.yml in the root of the example directory. Create a new oktaOAuth section and add your OAuth connection details as follows:

oktaOAuth:
  baseUrl: https://dev-820448.oktapreview.com
  issuer: "https://dev-820448.oktapreview.com/oauth2/default"
  clientId: "{yourClientId}"
  audience: "{yourAudience}" # defaults to 'api://default'

To cut down on boilerplate, this example just adds these fields as publicly accessible members. You may prefer adding getters and setters.

Create a new class com.example.models.OktaOAuthConfig:

public class OktaOAuthConfig {
    public String baseUrl;
    public String clientId;
    public String issuer;
    public String audience;
}

Now, add our new model to the com.example.DemoConfiguration class.

public class DemoConfiguration extends Configuration {
    public OktaOAuthConfig oktaOAuth = new OktaOAuthConfig();
}

Now, these config values can be easily retrieved in the DemoApplication class via its inherited configuration member.

Handle the OAuth 2.0 Access Token

As I mentioned above, Dropwizard’s OAuth support still requires you to handle the access token yourself. No worries! You can do that in a few lines of code with the Okta JWT Verifier.

First up, add the dropwizard-auth and okta-jwt-verifier dependencies to your pom.xml:

<dependency>
    <groupId>io.dropwizard</groupId>
    <artifactId>dropwizard-auth</artifactId>
</dependency>
<dependency>
    <groupId>com.okta.jwt</groupId>
    <artifactId>okta-jwt-verifier</artifactId>
    <version>0.3.0</version>
</dependency>

Create a Principal Implementation

Next up, I need to create a class to hold the user’s information. Dropwizard expects this class to implement java.security.Principal. Create a new class com.example.auth.AccessTokenPrincipal:

public class AccessTokenPrincipal implements Principal {

    private final Jwt accessToken;

    AccessTokenPrincipal(Jwt accessToken) {
        this.accessToken = accessToken;
    }

    @Override
    public String getName() {
        // the 'sub' claim in the access token will be the email address
        return (String) accessToken.getClaims().get("sub");
    }
}

The above class basically just wraps a com.okta.jwt.Jwt, exposes it as a Principal, and uses the email address in the sub claim for the name.

Dropwizard Authentication

So far so good. Next, create a new class com.example.auth.OktaOAuthAuthenticator. This is where the magic happens! This class will implement io.dropwizard.auth.Authenticator and validate the access token:

public class OktaOAuthAuthenticator implements Authenticator<String, AccessTokenPrincipal> {

    private final JwtVerifier jwtVerifier;

    public OktaOAuthAuthenticator(JwtVerifier jwtVerifier) {
        this.jwtVerifier = jwtVerifier;
    }

    @Override
    public Optional<AccessTokenPrincipal> authenticate(String accessToken) throws AuthenticationException {
        try {
            Jwt jwt = jwtVerifier.decodeAccessToken(accessToken);
            // if we made it this far we have a valid jwt
            return Optional.of(new AccessTokenPrincipal(jwt));
        } catch (JoseException e) {
            throw new AuthenticationException(e);
        }
    }
}

That is it! Basically, it will consist of two lines of code — one to validate the token and another to return our custom principal type!

Wire it up!

The last step is to wire this all up in our application class. Edit DemoApplication and create a new method, configureOAuth():

private void configureOAuth(final DemoConfiguration configuration, final Environment environment) {
    try {
        OktaOAuthConfig widgetConfig = configuration.oktaOAuth;

        // Configure the JWT Validator, it will validate Okta's JWT access tokens
        JwtHelper helper = new JwtHelper()
                .setIssuerUrl(widgetConfig.issuer)
                .setClientId(widgetConfig.clientId);

        // set the audience only if set, otherwise the default is: api://default
        String audience = widgetConfig.audience;
        if (StringUtils.isNotEmpty(audience)) {
            helper.setAudience(audience);
        }

        // register the OktaOAuthAuthenticator
        environment.jersey().register(new AuthDynamicFeature(
                new OAuthCredentialAuthFilter.Builder<AccessTokenPrincipal>()
                        .setAuthenticator(new OktaOAuthAuthenticator(helper.build()))
                        .setPrefix("Bearer")
                        .buildAuthFilter()));

        // Bind our custom principal to the @Auth annotation
        environment.jersey().register(new AuthValueFactoryProvider.Binder<>(AccessTokenPrincipal.class));
    } catch (Exception e) {
        throw new IllegalStateException("Failed to configure JwtVerifier", e);
    }
}

This method does a couple of things. It creates a JwtVerifier based on the properties in our configuration file, registers the new OktaOAuthAuthenticator class, and finally binds the @Auth annotation to our new AccessTokenPrincipal class.

Don’t forget to update the run() method with a call to our new configureOAuth() method.

@Override
public void run(final DemoConfiguration configuration, final Environment environment) {
    // configure OAuth
    configureOAuth(configuration, environment);

    // add resources
    environment.jersey().register(new HomePageResource());
}

Finally, update the HomePageResource to require authentication and add a bit more personalization using the @Auth annotation.

@Path("/")
public class HomePageResource {

    @GET
    public String handleGetRequest(@Auth AccessTokenPrincipal tokenPrincipal) {
        return "Hello! We'll be contacting you at: " + tokenPrincipal.getName();
    }
}

You could restart the server and start handling requests! But, of course, you need to get an access token from somewhere. If you were handling calls from another OAuth-capable application, you could stop here. However, since this is an example, I’m going to add a simple login page using the Okta Sign-In Widget.

Add the Okta Sign-In Widget

Adding a login page to our RESTful application does create a few concerns. I’m going to do this to simplify the example and at the same time show you a few more cool things with Dropwizard.

Expose the OAuth Configuration via REST

Since there is nothing secret in our OAuth configuration (access tokens do not require a client secret to be validated), we can expose our OktaOAuthConfig with a new JAX-RS resource. Create a new class com.example.resources.LoginWidgetConfigResource:

@Path("/signInConfig")
@Produces("application/json")
public class LoginWidgetConfigResource {

    private final OktaOAuthConfig config;

    public LoginWidgetConfigResource(OktaOAuthConfig config) {
        this.config = config;
    }

    @GET
    public OktaOAuthConfig getConfig() {
        return config;
    }
}

It’s pretty simple and it’s just a getter with a @GET annotation! Back in our DemoApplication class, you need to register the new resource in the run method:

environment.jersey().register(new LoginWidgetConfigResource(configuration.oktaOAuth));

Add a Login Page

Before creating a login page, I need to configure Dropwizard to serve static assets using the concept of an AssetsBundle. This will require another dependency in your pom.xml:

<dependency>
    <groupId>io.dropwizard</groupId>
    <artifactId>dropwizard-assets</artifactId>
</dependency>

In your DemoApplication class, you can register this bundle in the initialize method:

@Override
public void initialize(final Bootstrap<DemoConfiguration> bootstrap) {
    bootstrap.addBundle(new AssetsBundle("/assets/", "/", "index.html"));
}

This configures the application to serve all of the files in src/main/resources/assets at the root (/) of your application. It also defines index.html as the default welcome file.

This creates a small issue: if you restarted your application now, it would throw an exception, because both our static assets and our resources are being served from the root context. The easy fix is to serve your API resources at /api/* with a single line in your application’s run method; the whole method should now look like this:

@Override
public void run(final DemoConfiguration configuration, final Environment environment) {
    // base url for our resources
    environment.jersey().setUrlPattern("/api/*");

    // configure OAuth
    configureOAuth(configuration, environment);

    // add resources
    environment.jersey().register(new HomePageResource());
    environment.jersey().register(new LoginWidgetConfigResource(configuration.oktaOAuth));
}

The only thing left to do is to create a login page. I’m actually going to create a simple SPA app with a single index.html file. This page will load the widget configuration from /api/signInConfig, prompt the user to login, and then display the results from a call to /api/message. I’m not going to dig into the contents of the HTML. If you are interested, you should be able to follow the comments.

<!doctype html>
<html lang="en">
<head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <title>Dropwizard OAuth 2.0 Example</title> <base href="/"> <script src="https://ok1static.oktacdn.com/assets/js/sdk/okta-signin-widget/2.6.0/js/okta-sign-in.min.js" type="text/javascript"></script> <link href="https://ok1static.oktacdn.com/assets/js/sdk/okta-signin-widget/2.6.0/css/okta-sign-in.min.css" type="text/css" rel="stylesheet"> <link href="https://ok1static.oktacdn.com/assets/js/sdk/okta-signin-widget/2.6.0/css/okta-theme.css" type="text/css" rel="stylesheet"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"> <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js" type="text/javascript"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" type="text/javascript"></script>
</head> <body>
<!-- Render the login widget here -->
<div id="okta-login-container"></div> <div class="container"> <!-- Render the REST response here --> <div id="api-message"></div> <!-- And a logout button, hidden by default --> <button id="logout" type="button" class="btn btn-danger" style="display:none">Logout</button>
</div>
<script> $.ajax({ url: "/api/signInConfig", }).then(function(data) { // we are priming our config object with data retrieved from the server in order to make this example easier to run // You could statically define your config like if you wanted too: /* let config = { baseUrl: 'https://dev-123456.oktapreview.com', clientId: '00icu81200icu812w0h7', redirectUri: 'http://localhost:8080', authParams: { issuer: 'https://dev-123456.oktapreview.com/oauth2/default', responseType: ['id_token', 'token'] } }; */ window.oktaSignIn = new OktaSignIn({ baseUrl: data.baseUrl, clientId: data.clientId, redirectUri: window.location.href, authParams: { issuer: data.issuer, responseType: ['id_token', 'token'], scopes: ["openid", "profile", "email"] } }); // handle the rest of the page doInit(); }); /** * Makes a request to a REST resource and displays a simple message to the page. * @param accessToken The access token used for the auth header */ function renderApiMessage(accessToken) { // include the Bearer token in the request $.ajax({ url: "/api/message", headers: { 'Authorization': "Bearer " + accessToken }, }).then(function(data) { // Render the message of the day let htmlToRender = ` <h1>Message: <small>/api/message</small> </h1> <p>${data}</p>`; $('#api-message').append(htmlToRender); }) .fail(function(data) { // handle any errors $('#api-message').append("ERROR, check your browsers console log!"); console.log("ERROR!!"); console.log(data.responseJSON); }); // show the logout button $( "#logout" )[0].style.display = 'block'; } function doInit() { $( "#logout" ).click(function() { oktaSignIn.signOut(() => { oktaSignIn.tokenManager.clear(); location.reload(); }); }); // Check if we already have an access token const token = oktaSignIn.tokenManager.get('my_access_token'); // if we do great, just go with it! if (token) { renderApiMessage(token.accessToken) } else { // otherwise show the login widget oktaSignIn.renderEl( {el: '#okta-login-container'}, function (response) { // check if success if (response.status === 'SUCCESS') { // for our example we have the id token and the access token // oktaSignIn.tokenManager.add('my_id_token', response[0]); oktaSignIn.tokenManager.add('my_access_token', response[0]); // hide the widget oktaSignIn.hide(); // now for the fun part! renderApiMessage(response[1].accessToken); } }, function (err) { // handle any errors console.log(err); } ); } }
</script>
</body>
</html>

Whew! You’ve emerged from the jungle of hand-rolled OIDC clients and now have authorization in your Dropwizard server! There were quite a few code examples above, so if you need to verify anything you built along the way, you can always access the complete source for this project on GitHub.

Ok… time to see it in action! You can once again build the project with:

mvn clean package

But this time, you’ll also need to specify the location of the config.yml as a command line argument when starting the server. It needs to include the path relative to the current working directory:

java -jar target/demo-1.0-SNAPSHOT.jar server config.yml

Visit http://localhost:8080 in your browser; it should redirect you to sign in on Okta’s domain and, after that, present a message with your email address. If so, congratulations! If you’ve had difficulty at any point along the way, try running the example as is.

Post login message showing API response

You probably have noticed the console warnings when starting your application:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! THIS APPLICATION HAS NO HEALTHCHECKS. THIS MEANS YOU WILL NEVER KNOW !
! IF IT DIES IN PRODUCTION, WHICH MEANS YOU WILL NEVER KNOW IF YOU'RE !
! LETTING YOUR USERS DOWN. YOU SHOULD ADD A HEALTHCHECK FOR EACH OF YOUR !
! APPLICATION'S DEPENDENCIES WHICH FULLY (BUT LIGHTLY) TESTS IT. !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Dropwizard makes it really easy to add existing health checks or to create your own. I’ll leave that as an exercise for you!
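
If you want a starting point, the sketch below shows what a trivial custom health check could look like. It is only an illustration (the OktaConfigHealthCheck class and its check logic are hypothetical, not part of the project above); Dropwizard health checks extend com.codahale.metrics.health.HealthCheck and are registered on the environment.

public class OktaConfigHealthCheck extends HealthCheck {

    private final OktaOAuthConfig config;

    public OktaConfigHealthCheck(OktaOAuthConfig config) {
        this.config = config;
    }

    @Override
    protected Result check() {
        // a trivial sanity check: make sure the OAuth config was actually loaded
        if (config.issuer == null || config.issuer.isEmpty()) {
            return Result.unhealthy("Okta issuer is not configured");
        }
        return Result.healthy();
    }
}

You would then register it in run() with something like environment.healthChecks().register("oktaConfig", new OktaConfigHealthCheck(configuration.oktaOAuth));, which is enough to silence the warning above.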

Learn More

In this post, I’ve created a self-contained Dropwizard application with a couple JAX-RS resources and a simple HTML page. Take a look at Dropwizard’s getting started guide or these resources for more info.

Original Link

Dynamic DNS using Alibaba Cloud DNS API

Written by Alberto Roura, Alibaba Cloud Tech Share author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

We show you how to set up Dynamic DNS on Alibaba Cloud ECS using the Alibaba Cloud DNS API. Dynamic DNS is a method of automatically updating a name server record.

According to Wikipedia, “Dynamic DNS (DDNS or DynDNS) is a method of automatically updating a name server record, often in real time, with the active Dynamic DNS configuration of its configured hostnames, addresses or other information.”

Typically, a server has a static IP, and the related domain name contains an A Record stating which one it is. The illustration below shows an example of how a machine resolves the IP of wikipedia.org:

[Image: the steps involved in resolving wikipedia.org to its IP address]

As you can see, there are a lot of steps involved for the visitor’s machine to “translate” wikipedia.org into 145.97.39.155. After the DNS resolves wikipedia.org into its IP address, the computer can locate where the page is hosted on the Internet. This is also the common case for most websites.

Why We Need a Dynamic DNS Solution

For the most part, static IPs work well for accessing the Internet. The problem arises when we want to design a mobile (not just cell phones) network.

For example, if we have some personal NAS or IoT devices, or even a cell phone, we can’t use the same IP address outside of our personal network. 

In this tutorial, we hope to set up a similar network for home devices that we want to access from the outside. For example, you may have a smart home or security device set up and you need to access it while being away from home.

What Do We Need

This tutorial assumes that you already have the following products with Alibaba Cloud:

  • A domain.
  • An ECS instance with Apache & PHP.

If you are not sure how to set up a domain, you can check out some tutorials on Alibaba Cloud Getting Started, or visit the Documentation Center for more information.

The whole idea is to schedule a cron job on a device at home that uses curl to run a PHP script hosted on our ECS instance; the script uses the Alibaba Cloud DNS API to update the A Record of the given domain.

The standardized method for dynamically updating a domain name server record is defined in RFC2136, commonly known as dynamic DNS update. This method is a network protocol for use with managed DNS servers, and it includes a security mechanism. Check the relevant documents for RFC2136 if you want to dig deeper into it.

So, knowing how DNS works and why we need to set up Dynamic DNS for our home use, let’s dive into the details. We will use alicloud-php-dns-updater, a PHP script made specifically for this purpose. It is based on a ready-to-use class.

Clone the Repo

SSH into your Alibaba Cloud ECS instance and go to the /var/www/html directory (or whichever directory you use to serve public content).

Once there, type git clone https://github.com/roura356a/alicloud-php-dns-updater.git dyndns-updater.

Get Your Access Keys from Alibaba Cloud

Getting a key pair is easy, and it lets you use more API features apart from the DNS one.

In order to get one, log into your Alibaba Cloud console, hover over your email address in the top navigation bar, and click “accesskeys,” as illustrated below.

[Image: the “accesskeys” link under your email address in the Alibaba Cloud console]

Once in the keys screen, copy the Access Key ID and the Access Key Secret into a safe place. To show the Access Key Secret, you need to click “Show.” Be careful where you save this data, as it is very sensitive and could potentially cause irreversible damage if mishandled. Also, you should consider creating more limited keys using their policies, but that’s a topic for another entry.

Setting the Dynamic DNS Updater Script up in the ECS

Going back to our ECS, we need to open the index.php file and replace the placeholders with the information you gathered before, such as ACCESS_KEY_ID and ACCESS_KEY_SECRET.

In this example, I have assumed that our ACCESS_KEY_ID is CAmKUmIUGiMO83mS, our ACCESS_KEY_SECRET is CjKaN02Ann9maMmiauusmoGOI7mn, and the domain is customnasathome.com. The index.php file should look like this:


<?php
// create the $updater instance here with your ACCESS_KEY_ID and ACCESS_KEY_SECRET,
// and resolve the current external IP into $newIp (see the repository's README)

$updater->setDomainName('customnasathome.com');
$updater->setRecordType('A');
$updater->setRR('@');
$updater->setValue($newIp);

print_r($updater->sendRequest());

Testing the Updater

Now that we have finished all the steps above, it’s time to test if everything is correctly set up. At this point, you should have a public URL (http://11.111.11.111/dyndns-updater/) that runs the updater just by being visited. Open it in your browser and look at the output.

If the API response is positive, the output should look like this:


Array
(
[RecordId] => 3666544576879860
[RequestId] => F4VDF8A-D2DF-49VV-ER00-458D6918FDDE
)

Hooray! You successfully updated the A Record of your domain by using Alibaba Cloud DNS API. Easy, right?

Securing the Script

So we are able to change the A Record of a given domain just by opening a URL, either from a browser or using curl. But the URL is publicly accessible by default, and even if you don’t share it with anyone, it is really bad practice to leave it like that. To secure access, we will use Apache .htaccess and .htpasswd.

.htaccess

Put this file (.htaccess) in the same folder as index.php:


AuthType Basic
AuthName "DNS Updater Access"
AuthUserFile /var/www/dyndns-updater/.htpasswd
Require valid-user

.htpasswd

For this step you need to run a command to create the user and its password.

From any location, run htpasswd -c /var/www/dyndns-updater/.htpasswd updater_user.

This will create the file for the first time. “updater_user” is the username you are adding, and you will be asked for the password when you run the command. According to the official Apache documentation, htpasswd encrypts passwords using either bcrypt, a version of MD5 modified for Apache, SHA1, or the system’s crypt() routine, so the password will never be saved in plain text. This is important to know, as you will need to keep the password in a safe place after executing the command. You won’t be able to recover it if you forget it, because it is encrypted.

After that you should be able to access the URL by providing the username and password.

Cron Job

Cron is a time-based job scheduler utility in Unix-like operating systems. It comes in very handy for running automatic backups or other routine tasks. It suits our case perfectly, as we need to check from time to time whether the external IP has changed and update the A Record of our domain accordingly.

The location of the crontab in your instance does not matter, as we will add the cronjob by using the command line.

Run crontab -e and select your favorite editor (if not sure, choose nano, as it is the easiest one out there).

If you choose nano, remember that to exit and save the file, you need to press ctrl + x, then y and enter.

For this tutorial, we are setting the scheduled job to run every 30 minutes. You can see that in the */30 part of the schedule. If you want it to run every 15 minutes, change that part to */15. For more advanced cron adjustments, check the official Linux cron guide.

Without authentication:

Go to the bottom of the crontab file and add */30 * * * * curl http://11.111.11.111/dyndns-updater/.

With authentication:

In this case, we will need to add the credentials for basic authentication to curl in order to get access. Go to the bottom of the crontab file and add */30 * * * * curl -u "updater_user:YOUR_PASSWORD" http://11.111.11.111/dyndns-updater/.

Wrapping Up

By default, Alibaba Cloud sends you an email whenever a record changes, so you will be able to keep track of all the automated updates the moment they happen. If you want to know more about the Alibaba Cloud API, you can visit the official Developer Resources, where you can check all the Alibaba Cloud API references.

Original Link

Streaming Options In The AWS Serverless Application Repository

We are learning more about the AWS Serverless Application Repository, trying to understand what types of functions people are publishing there, and how it might fit into the bigger event-driven architectural picture. The repository is a place to discover, deploy, and publish serverless applications. We want to understand how it all fits into the bigger picture, and make sure we track things as it continues to evolve.

To try and understand how the AWS Serverless Application Repository fits into the event-driven picture we wanted to search and see what types of streaming applications were being developed—here is what we’ve come across so far.

The list provides an interesting snapshot of what is emerging within the repository, and across the serverless landscape, when it comes to streaming. While the majority of it has an AWS service focus, you do see other products like Log4j and Splunk emerging. Plus, the AWS-focused solutions begin to paint a picture of what types of streaming approaches developers are interested in and what AWS systems are being used to publish and generate streams.

We’ve set up a script to help us monitor any new streaming serverless functions. We’ll also be profiling other serverless scripts in future stories, trying to paint a picture of what serverless means to the event-driven evolution going on across the API sector. The serverless movement is just getting going, but with it already being baked into the AWS, Google, and Azure clouds, it is something that will undoubtedly continue to make a mark on how we deliver API infrastructure. The question is, what does it mean for streaming, event-driven, and other real-time aspects of doing business with APIs?

Original Link

Vectorized Algorithms in Java

There has been a Cambrian explosion of JVM data technologies in recent years. It’s all very exciting, but is the JVM really competitive with C in this area? I would argue that there is a reason Apache Arrow is polyglot, and it’s not just interoperability with Python. To pick on one project impressive enough to be thriving after seven years, if you’ve actually used Apache Spark, you will be aware that it looks fastest next to its predecessor, MapReduce. Big data is a lot like teenage sex: everybody talks about it, nobody really knows how to do it, and everyone keeps their embarrassing stories to themselves. In games of incomplete information, it’s possible to overestimate the competence of others: nobody opens up about how slow their Spark jobs really are because there’s a risk of looking stupid.

If it can be accepted that Spark is inefficient, the question becomes: Is Spark fundamentally inefficient? Flare provides a drop-in replacement for Spark’s backend, but replaces JIT-compiled code with highly efficient native code, yielding order-of-magnitude improvements in job throughput. Some of Flare’s gains come from generating specialized code, but the rest comes from just generating better native code than C2 does. If Flare validates Spark’s execution model, perhaps it raises questions about the suitability of the JVM for high-throughput data processing.

I think this will change radically in the coming years. I think the most important reason is the advent of explicit support for SIMD provided by the vector API, which is currently incubating in Project Panama. Once the vector API is complete, I conjecture that projects like Spark will be able to profit enormously from it. This post takes a look at the API in its current state and ignores performance.

Why Vectorization?

Assuming a flat processor frequency, throughput is improved by a combination of executing many instructions per cycle (pipelining) and processing multiple data items per instruction (SIMD). SIMD instruction sets are provided by Intel as the various generations of SSE and AVX. If throughput is the only goal, maximizing SIMD may even be worth reducing the frequency, which can happen on Intel chips when using AVX. Vectorization allows throughput to be increased by the use of SIMD instructions.

Analytical workloads are particularly suitable for vectorization, especially over columnar data, because they typically involve operations consuming the entire range of a few numerical attributes of a dataset. Vectorized analytical processing with filters is explicitly supported by vector masks, and vectorization is also profitable for operations on indices typically performed for filtering prior to calculations. I don’t actually need to make a strong case for the impact of vectorization on analytical workloads: just read the work of top researchers like Daniel Abadi and Daniel Lemire.

Vectorization in the JVM

C2 provides quite a lot of auto-vectorization, which works very well sometimes, but the support is limited and brittle. I have written about this several times. Because AVX can reduce the processor frequency, it’s not always profitable to vectorize, so compilers employ cost models to decide when they should do so. Such cost models require platform specific calibration, and sometimes C2 can get it wrong. Sometimes, specifically in the case of floating point operations, using SIMD conflicts with the JLS, and the code C2 generates can be quite inefficient. In general, data parallel code can be better optimized by C compilers such as GCC than C2 because there are fewer constraints, and there is a larger budget for analysis at compile time. This all makes having intrinsics very appealing, and as a user, I would like to be able to:

  1. Bypass JLS floating point constraints.
  2. Bypass cost model-based decisions.
  3. Avoid JNI at all costs.
  4. Use a modern “object-functional” style. SIMD intrinsics in C are painful.

There is another attempt to provide SIMD intrinsics to JVM users via LMS, a framework for writing programs which write programs, designed by Tiark Rompf (who is also behind Flare). This work is very promising (I have written about it before), but it uses JNI. It’s only at the prototype stage, but currently, the intrinsics are auto-generated from XML definitions, which leads to a one-to-one mapping to the intrinsics in immintrin.h, yielding a similar programming experience. This could likely be improved a lot, but the reliance on JNI is fundamental, albeit with minimal boundary crossing.

I am quite excited by the vector API in Project Panama because it looks like it will meet all of these requirements, at least to some extent. It remains to be seen quite how far the implementors will go in the direction of associative floating point arithmetic, but it has to opt out of JLS floating point semantics to some extent, which I think is progressive.

The Vector API

Disclaimer: Everything below is based on my experience with a recent build of the experimental code in the Project Panama fork of OpenJDK. I am not affiliated with the design or implementation of this API, may not be using it properly, and it may change according to its designers’ will before it is released!

To understand the vector API, you need to know that there are different register widths and different SIMD instruction sets. Because of my area of work, and because 99% of the server market is Intel, I am only interested in AVX, but ARM has its own implementations with different maximum register sizes, which presumably need to be handled by a JVM vector API. On Intel CPUs, the SSE instruction sets use up to 128-bit registers (xmm, four ints), AVX and AVX2 use up to 256-bit registers (ymm, eight ints), and AVX-512 uses up to 512-bit registers (zmm, sixteen ints).

The instruction sets are typed, and instructions designed to operate on packed doubles can’t operate on packed ints without explicit casting. This is modeled by the interface Vector<Shape>, parameterized by the Shape interface, which models the register width.

The types of the vector elements are modeled by abstract element type specific classes such as IntVector. At the leaves of the hierarchy are the concrete classes specialized both to element type and register width, such as IntVector256, which extends IntVector<Shapes.S256Bit>.

Since EJB, the word factory has been a dirty word, which might be why the word species is used in this API. To create an IntVector<Shapes.S256Bit>, you can create the factory/species as follows:

public static final IntVector.IntSpecies<Shapes.S256Bit> YMM_INT =
        (IntVector.IntSpecies<Shapes.S256Bit>) Vector.species(int.class, Shapes.S_256_BIT);

There are now various ways to create a vector from the species, which all have their use cases. First, you can load vectors from arrays: Imagine you want to calculate the bitwise intersection of two int[]s. This can be written quite cleanly without any shape/register information.

public static int[] intersect(int[] left, int[] right) {
    assert left.length == right.length;
    int[] result = new int[left.length];
    for (int i = 0; i < left.length; i += YMM_INT.length()) {
        YMM_INT.fromArray(left, i)
               .and(YMM_INT.fromArray(right, i))
               .intoArray(result, i);
    }
    return result;
}

A common pattern in vectorized code is to broadcast a variable into a vector, for instance, to facilitate the multiplication of a scalar by a vector.

IntVector<Shapes.S256Bit> multiplier = YMM_INT.broadcast(x);

Or to create a vector from some scalars; for instance, in a lookup table.

IntVector<Shapes.S256Bit> vector = YMM_INT.scalars(0, 1, 2, 3, 4, 5, 6, 7);

A zero vector can be created from a species:

IntVector<Shapes.S256Bit> zero = YMM_INT.zero();

The big split in the class hierarchy is between integral and floating point types. Integral types have meaningful bitwise operations (I am looking forward to trying to write a vectorized population count algorithm), which are absent from FloatVector and DoubleVector, and there is no concept of fused-multiply-add for integral types, so there is obviously no IntVector.fma. The common subset of operations is arithmetic, casting, and loading/storing operations.

I generally like the API a lot: It feels familiar to programming with streams but, on the other hand, it isn’t too far removed from traditional intrinsics. Below is an implementation of a fast matrix multiplication written in C, and below it is the same code written with the vector API:

static void mmul_tiled_avx_unrolled(const int n, const float *left, const float *right, float *result) { const int block_width = n >= 256 ? 512 : 256; const int block_height = n >= 512 ? 8 : n >= 256 ? 16 : 32; for (int column_offset = 0; column_offset < n; column_offset += block_width) { for (int row_offset = 0; row_offset < n; row_offset += block_height) { for (int i = 0; i < n; ++i) { for (int j = column_offset; j < column_offset + block_width && j < n; j += 64) { __m256 sum1 = _mm256_load_ps(result + i * n + j); __m256 sum2 = _mm256_load_ps(result + i * n + j + 8); __m256 sum3 = _mm256_load_ps(result + i * n + j + 16); __m256 sum4 = _mm256_load_ps(result + i * n + j + 24); __m256 sum5 = _mm256_load_ps(result + i * n + j + 32); __m256 sum6 = _mm256_load_ps(result + i * n + j + 40); __m256 sum7 = _mm256_load_ps(result + i * n + j + 48); __m256 sum8 = _mm256_load_ps(result + i * n + j + 56); for (int k = row_offset; k < row_offset + block_height && k < n; ++k) { __m256 multiplier = _mm256_set1_ps(left[i * n + k]); sum1 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j), sum1); sum2 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 8), sum2); sum3 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 16), sum3); sum4 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 24), sum4); sum5 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 32), sum5); sum6 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 40), sum6); sum7 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 48), sum7); sum8 = _mm256_fmadd_ps(multiplier, _mm256_load_ps(right + k * n + j + 56), sum8); } _mm256_store_ps(result + i * n + j, sum1); _mm256_store_ps(result + i * n + j + 8, sum2); _mm256_store_ps(result + i * n + j + 16, sum3); _mm256_store_ps(result + i * n + j + 24, sum4); _mm256_store_ps(result + i * n + j + 32, sum5); _mm256_store_ps(result + i * n + j + 40, sum6); _mm256_store_ps(result + i * n + j + 48, sum7); _mm256_store_ps(result + i * n + j + 56, sum8); } } } }
}
 private static void mmul(int n, float[] left, float[] right, float[] result) { int blockWidth = n >= 256 ? 512 : 256; int blockHeight = n >= 512 ? 8 : n >= 256 ? 16 : 32; for (int columnOffset = 0; columnOffset < n; columnOffset += blockWidth) { for (int rowOffset = 0; rowOffset < n; rowOffset += blockHeight) { for (int i = 0; i < n; ++i) { for (int j = columnOffset; j < columnOffset + blockWidth && j < n; j += 64) { var sum1 = YMM_FLOAT.fromArray(result, i * n + j); var sum2 = YMM_FLOAT.fromArray(result, i * n + j + 8); var sum3 = YMM_FLOAT.fromArray(result, i * n + j + 16); var sum4 = YMM_FLOAT.fromArray(result, i * n + j + 24); var sum5 = YMM_FLOAT.fromArray(result, i * n + j + 32); var sum6 = YMM_FLOAT.fromArray(result, i * n + j + 40); var sum7 = YMM_FLOAT.fromArray(result, i * n + j + 48); var sum8 = YMM_FLOAT.fromArray(result, i * n + j + 56); for (int k = rowOffset; k < rowOffset + blockHeight && k < n; ++k) { var multiplier = YMM_FLOAT.broadcast(left[i * n + k]); sum1 = sum1.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j)); sum2 = sum2.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 8)); sum3 = sum3.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 16)); sum4 = sum4.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 24)); sum5 = sum5.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 32)); sum6 = sum6.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 40)); sum7 = sum7.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 48)); sum8 = sum8.fma(multiplier, YMM_FLOAT.fromArray(right, k * n + j + 56)); } sum1.intoArray(result, i * n + j); sum2.intoArray(result, i * n + j + 8); sum3.intoArray(result, i * n + j + 16); sum4.intoArray(result, i * n + j + 24); sum5.intoArray(result, i * n + j + 32); sum6.intoArray(result, i * n + j + 40); sum7.intoArray(result, i * n + j + 48); sum8.intoArray(result, i * n + j + 56); } } } } }

They just aren’t that different, and it’s easy to translate between the two. I wouldn’t expect it to be fast yet, though. I have no idea what the scope of work involved in implementing all of the C2 intrinsics to make this possible is, but I assume it’s vast. The class jdk.incubator.vector.VectorIntrinsics seems to contain all of the intrinsics implemented so far, and it doesn’t contain every operation used in my array multiplication code. There is also the question of value types and vector box elimination. I will probably look at this again in the future when more of the JIT compiler work has been done, but I’m starting to get very excited about the possibility of much faster JVM-based data processing.

Original Link

Monitoring Kubernetes (Part 2): Best Practices for Alerting on Kubernetes

A step by step cookbook on how to configure alerting in your Kubernetes cluster with a focus on the infrastructure layer.

This article is part of our series on operating Kubernetes in production. Part 1 covered the basics of Kubernetes and monitoring tools; this part covers Kubernetes alerting best practices. Next, we’ll cover troubleshooting Kubernetes service discovery, and the final section is a real-world use case of monitoring Kubernetes.

Monitoring is a fundamental piece of every successful infrastructure and the base of the hierarchy of reliability. Monitoring helps reduce response time to incidents by enabling the detection, troubleshooting, and debugging of systemic problems. In the cloud era, where infrastructure is highly dynamic, monitoring is also fundamental for capacity planning.

Effective alerting is at the bedrock of a monitoring strategy. Naturally, with the shift to containers and Kubernetes-orchestrated environments, your alerting strategy will need to evolve as well.

This is due to a few core reasons, many of which we covered in How To Monitor Kubernetes:

  • Visibility: Containers are black boxes. Traditional tools can only check against public monitoring endpoints. If you want to deeply monitor the service in question, you need to take a different approach.
  • New infrastructure layers: Between your services and the host now you have a new layer: the containers and the container orchestrator. These are new internal services that you need to monitor and your monitoring system needs to understand them.
  • Dynamic rescheduling: Containers are not coupled with nodes the way services were before, so traditional monitoring doesn’t work effectively. There is no static endpoint where service metrics are available and no static number of service instances running (think of a canary deployment or auto-scaling setup). It is fine for a process to be killed on one node, because chances are it is being rescheduled somewhere else in your infrastructure.
  • Metadata and labels: With services spread across multiple containers, you are monitoring system-level and service-specific metrics for all of those, plus all the new services that Kubernetes brings in, so how do you give all this information meaning? Sometimes you want to see a metric, like network requests, across a service distributed in containers on different nodes; sometimes you want to see the same metric for all containers on a specific node, no matter what service they belong to. This is basically a multi-dimensional metric system. You need to look at the metrics from different perspectives. If we automatically tag metrics with the different labels existing in Kubernetes and our monitoring system understands Kubernetes metadata, we have a way of aggregating and segmenting metrics as required in each situation.

With these issues in mind, let’s create a set of alerts that are essential to any Kubernetes environment. Our Kubernetes alerts tutorial will cover:

As a bonus, we’ll also look into how your alerting can accelerate troubleshooting by monitoring your syscalls around problems. Let’s dig in!

Alerting on Application Layer Metrics

Metrics that allow you to confirm that your application performs as expected are known as working metrics. These metrics typically come from your users’ or consumers’ expected actions on the service, or from metrics generated internally by your applications via StatsD or JMX. If your monitoring system natively provides network data, you can also use that to create response time-based alerting.

The following example is a public REST API endpoint monitoring alert for latency over 1 second in a 10-minute window, over the javaapp deployment in the production namespace prod.

All these alerts will highly depend on your application, your workload patterns, etc., but the real question is how you’re going to get the data itself consistently across all your services. In an ideal environment, you don’t need to purchase a synthetic monitoring service in addition to your core monitoring tools to get this one critical set of metrics.

Some metrics and their alerts often found in this category are:

  • Service response time
  • Service availability up/down
  • SLA compliance
  • Successful / error requests per second

Alerting on Services Running on Kubernetes

When looking at the service level, things shouldn’t be very different from what you were doing before Kubernetes if you had your services clustered. Think of databases like MySQL/MariaDB or MongoDB, where you will look at the replication status and lag. Is there anything new to take into account now, then?

Yes! If you want to know how your service operates and performs globally, you will need to leverage your monitoring tool’s capabilities to do metric aggregation and segmentation based on container metadata.

We know Kubernetes labels containers within a deployment, or those exposed through a service, as we explained in How to Monitor Kubernetes. Now you need to take that into account when you define your alerts, for example, scoping alerts only to the production environment, probably defined by a namespace.

The following is an example from a Cassandra cluster:

The metric cassandra.compactions.pending exists per instance whether we run Cassandra in Kubernetes, AWS EC2, or OpenStack, but now we want to look at this metric aggregated across the cassandra replication controller inside our prod namespace. Both labels come from Kubernetes, but we could also change the scope to include tags coming from AWS or other cloud providers, like availability zones, for example.

Some metrics and their alerts often found in this category are:

  • HTTP requests
  • Database connections, replication
  • Threads, file descriptors, connections
  • Middleware specific metrics: Python uwsgi workers, JVM heap size, etc

Also, if external managed services are being used, you most probably want to import metrics from those providers as they still might have incidents that you need to react to.

Alerting on the Kubernetes Infrastructure

Monitoring and alerting at the container orchestration level is two-fold. On one side, we need to monitor whether the services handled by Kubernetes meet the requirements we defined. On the other side, we need to make sure all the components of Kubernetes are up and running.

Services Handled by Kubernetes

1.1 Do we have enough pods/containers running for each application?

Kubernetes has a few options to handle an application that has multiple pods: Deployments, Replica Sets, and Replication Controllers. There are a few differences between them, but all three can be used to maintain a number of instances running the same application. The number of running instances can be changed dynamically by scaling up and down, and this process can even be automated with auto-scaling.

There are also multiple reasons why the number of running containers can change: containers being rescheduled on a different host because a node failed or because there are not enough resources, a rolling deployment of a new version, etc. If the number of replicas or instances running over an extended period of time is lower than the number of replicas we desire, it’s a symptom of something not working properly (not enough nodes or resources available, a Kubernetes or Docker Engine failure, a broken Docker image, etc.).

An alert that compares:

timeAvg(kubernetes.replicaSet.replicas.running) < timeAvg(kubernetes.replicaSet.replicas.desired) 

across all services is almost a must in any Kubernetes deployment. As we mentioned before, this situation is acceptable during container rescheduling and migrations, so keep an eye on the configured .spec.minReadySeconds value for each container (the time from container start until it becomes available in ready status). You might also want to check .spec.strategy.rollingUpdate.maxUnavailable, which defines how many containers can be taken offline during a rolling deployment.

The following is an example alert with this condition applied to a deployment wordpress-wordpress within a wordpress namespace in a cluster with name kubernetes-dev.

1.2 Do we have any pod/containers for a given application?

Similar to the previous alert but with higher priority (this one, for example, is a candidate for getting paged in the middle of the night), we will alert if there are no containers at all running for a given application.

In the following example, we apply the alert for the same deployment but triggering if running pods is < 1 during 1 minute:

1.3 Is there any pod/container in a restart loop?

When deploying a new version that is broken, if there are not enough resources available, or if some requirements or dependencies are not in place, we might end up with a container or pod restarting continuously in a loop. That’s called CrashLoopBackOff. When this happens, pods never reach ready status and are therefore counted as unavailable and not as running, so this scenario is already captured by the previous alerts. Still, I like to set up an alert that catches this behavior across our entire infrastructure and lets us know the specific problem right away. It’s not the kind of alert that interrupts your sleep, but it gives you useful information.

This is an example applied across the entire infrastructure detecting more than 4 restarts over the last 2 minutes:

Monitoring Kubernetes Services

In addition to making sure Kubernetes is doing its job, we also want to monitor Kubernetes internal health. This will depend on the different components of your Kubernetes setup, as these can change depending on your deployment choices, but there are some basic components that will definitely be there.

2.1 Is etcd running?

etcd is the distributed service discovery, communication, and command channel for Kubernetes. Monitoring etcd can go as deep as monitoring any distributed key-value database, but we will keep things simple here: can we reach etcd?

We can go further and monitor set commands failure or node count, but we will leave that for a future article around etcd monitoring.

2.2 Do we have enough nodes in our cluster?

A node failure is not a problem in Kubernetes, since the scheduler will spawn containers on other available nodes. But what if we are running out of nodes? What if the resource requirements of the deployed applications overbook the existing nodes? Or are we hitting a quota limit?

Alerting in these cases is not easy, as it will depend on how many nodes you want to have on standby or how far you want to push oversubscription on your existing nodes. To monitor node status, alert on the metrics kube_node_status_ready and kube_node_spec_unschedulable.

If you want to alert on capacity, you will have to sum each scheduled pod’s CPU and memory requests and then check that the totals don’t go over each node’s kube_node_status_capacity_cpu_cores and kube_node_status_capacity_memory_bytes.

Alerting on the Host/Node Layer

Alerting at the host layer shouldn’t be very different from monitoring VMs or machines. It’s going to be mostly about whether the host is up or down/unreachable, and about resource availability (CPU, memory, disk, etc.).

The main difference is the severity of the alerts now. Before, a system down likely meant you had an application down and an incident to handle (barring effective high availability). With Kubernetes, services are ready to move across hosts, so host alerts should never wake you up from bed.

Let’s see a couple of options that we should still consider:

1. Host is down

If a host is down or unreachable, we want to receive a notification. We will apply this single alert across our entire infrastructure. We are going to give it a 5-minute wait time in our case, since we don’t want to see noisy alerts on network connectivity hiccups. You might want to lower that to 1 or 2 minutes, depending on how quickly you want to receive a notification.

2. Disk usage

This is a slightly more complex alert. We apply it across all file systems of our entire infrastructure. We manage to do that by setting everywhere as the scope and firing a separate evaluation/alert per fs.mountDir.

This is a generic alert that triggers at over 80% usage, but you might want different policies, like a second higher-priority alert with a higher threshold (say, 95%) or different thresholds depending on the file system.

If you want to create different thresholds for different services or hosts, simply change the scope where you want to apply a particular threshold.

3. Some other resources

Usual suspects in this category are alerts on load, CPU usage, memory, and swap usage. You probably want to alert if any of these is significantly high over a prolonged time frame. A compromise needs to be found between the threshold, the wait time, and how noisy your alerting system can become with non-actionable alerts.

If you still want to set up alerts for these resources, look at the following metrics:

  • For load: load.average.1m, load.average.5m and load.average.15m
  • For CPU: cpu.used.percent
  • For memory: memory.used.percent or memory.bytes.used
  • For swap: memory.swap.used.percent or memory.swap.bytes.used

Some people also include in this category monitoring the cloud provider resources that are part of their infrastructure.

Sysdig Bonus: Monitoring Syscalls

From the moment an alert triggers and you receive the notification, the real work starts for the members of the DevOps team on duty. Sometimes the run book is as simple as checking it was just a minor anomaly in the workload. It might be an incident in the cloud provider, but hopefully, Kubernetes is taking care of that and just struggling for a few moments coping with the load.

But if a hairier problem is in front of us, we may find ourselves pulling out all the weapons: provider status pages, logs (hopefully from a central location), any kind of APM or distributed tracing (if developers instrumented the code), or maybe external synthetic monitoring. Then we pray the incident left some clues in any of these places so we can trace the problem down to the multiple root causes that came together.

What if we could automatically start a dump of all system calls when the alert fires? System calls are the source of truth of what happens and contain all the information we can get. Chances are that the issue is still exposing its root causes when the alert is triggered. Sysdig Monitor allows you to automatically start a capture of all system calls when an alert is triggered.

For example I always configure this for CrashLoopBackOff scenarios:

The Sysdig Monitor agent exposes metrics calculated from system call interception that wouldn’t otherwise be possible unless you had instrumented your code. Think of HTTP response time or SQL response time. The Sysdig Monitor agent captures read() and write() system calls on sockets, decodes the application protocol, and calculates the metrics that are forwarded into our time series database. The beauty of this is getting dashboards and alerts out of the box without touching your developers’ code:

Conclusions

We have seen how using container orchestration platforms increases the number of pieces moving around in your system. Having container-native monitoring in place is a key element of a reliable infrastructure. Monitoring cannot just focus on the infrastructure layer; it needs to understand the entire stack, from the hosts at the bottom up to the top, where the application metrics are.

Being able to leverage Kubernetes and cloud providers metadata to aggregate and segment metrics and alerts will be a requirement for effective monitoring across all layers. We have seen how to use the labels to update the alerts we already had or create the new ones required on Kubernetes.

Next, let’s take a look at a typical service discovery troubleshooting challenge in a Kubernetes-orchestrated environment.

Eager to learn more? Join our webinar Container Troubleshooting with Sysdig

By the way, we are running a webinar discussing the challenges of troubleshooting issues and errors in Docker containers and Kubernetes, like pods in CrashLoopBackOff. Join this session and learn:

  • How to gain visibility into Docker containers with Sysdig open source and Sysdig Inspect
  • Demo: troubleshoot a 502 Bad Gateway error on containerized app with HAproxy
  • Demo: troubleshoot a web application that mysteriously dies after some time
  • Demo: an Nginx Kubernetes pod goes into CrashLoopBackOff. What can you do? We will show you how to find the error without SSHing into production servers

Original Link

Everything You Need to Know About API Testing

API testing, a.k.a. Application Programming Interface testing, is a term that has garnered growing attention in the past five years. It is a staple of any internet-based product testing team, used for everything from small stuff like image loading to huge stuff like payment processing.


An API, or application programming interface, is a set of tools, protocols, and programs that glues all of our digital worlds together. If you are able to log in to Medium, Quora, and other popular websites using “Login with Google,” the main hero behind that is an API.

How APIs Made Our Lives Easier

Remember the Trivago guy? Aggregator websites like Trivago bring you the offer prices of various hotels from multiple sources like Expedia, Hotels.com, Goibibo, etc., all in a single platform. A user can book a hotel and take an offer rolled out by Expedia without even logging into Expedia!

So, how does this happen? The simple answer to your question turns out to be APIs.

So, with the help of APIs, your application can communicate with third-party applications without any human intervention, acting as a communication bridge.

API Testing: What Led to the Growth?

Verifying that all the API endpoints act as expected without any breaks in between is the main aim of API Testing. It is one of the most important aspects of a testing process because of:

1. Agile Practices

Organizations are lovingly embracing agile development, which calls for dramatically different approaches to automated testing. Continuous builds ask for continuous feedback and improvement, and GUI tests tend to take longer to run. Since API tests do not lean on the UI, they can be run frequently enough to keep pace with agile development.

2. Internet of Things

IoT is no doubt gaining speed, and various sources predict that the number of connected IoT devices will keep growing, reaching 20 billion by the end of 2020. Devices connected to the cloud are heavily backed by APIs. You won’t be launching satellites or developing Google again to connect devices to the cloud; all you’ll be using is APIs. So, it’s incumbent on you to make sure that the connected devices stay connected.

Where API Testing Stands in Services-Based Architecture

[Image: where API testing sits in a services-based architecture]

Types of API Testing

Integration testing, security testing, performance testing, and usability testing are some terms that you might be aware of. API testing holds all these terms under a single umbrella. When you perform API testing, you make sure that your API passes the following tests:

  1. Functional Testing: To make sure that all the API endpoints are up and working and doing exactly what they are supposed to do.

  2. Reliability Testing: Making sure that the API works when connecting to various devices and doesn’t get disconnected.

  3. Load Testing: When various servers send a request to an API, it is necessary to make sure that the API responds to all of them.

  4. Stress Testing: When more than a set number of requests is received by the API, how does it behave? Does it send a message? It’s mandatory to check whether it works as intended.

  5. Security Testing: While giving authentication, it is important to make sure that no security breaches happen in between and that no more than the required data is shared. Have appropriate authentications, permissions, and access controls.

  6. Integration Testing: Ensures that all the APIs connected to each other communicate properly and that the addition of features to one API does not cause bugs in other API modules.

  7. Usability Testing: The API is functional, and on top of it, user-friendly.

The Test Pyramid: Pumping Up API Testing

Coupled with major use cases like authentication and saving developers the pain of rewriting already-written code, there are certain factors that add to the need for API testing.

One of them was well explained by Mike Cohn in his book Succeeding with Agile: Software Development Using Scrum with the help of the test pyramid, according to which an automation test strategy calls for automating tests at different levels:

[Image: the test pyramid]

The need for automating tests increases from top to bottom. Unit tests, forming the base of the pyramid, should be automated first, and GUI tests, forming the top, are the ones that require the least automation. But we are concerned with the ones in the middle: service/API layer tests. Their proportion reflects how relevant they are for automation.

Furthermore, automated API testing takes far less time than automated UI testing. In some cases, it takes less than 1 second to run a single end-to-end API test, which blends well with CI pipelines.
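
As a rough illustration of how lightweight such a test can be, here is a minimal sketch of an automated end-to-end API test written in Java with REST Assured (the library choice, the endpoint URL, and the token placeholder are my own assumptions, not something prescribed by this article):

import org.junit.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.lessThan;

public class MessageApiTest {

    // hypothetical placeholder; a real test would obtain a token from its auth setup
    private static final String ACCESS_TOKEN = "replace-with-a-real-token";

    @Test
    public void messageEndpointRespondsWithinOneSecond() {
        given()
            .header("Authorization", "Bearer " + ACCESS_TOKEN)
        .when()
            .get("https://api.example.com/api/message") // hypothetical endpoint
        .then()
            .statusCode(200)           // functional check
            .time(lessThan(1000L));    // response time check, in milliseconds
    }
}

A check like this exercises functionality, availability, and response time in one go, without spinning up a browser.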

The bottom line is, when you are developing an application, smooth communication with various other apps must be at the top of your checklist, and API testing helps you complete that checklist.

Happy testing!

Original Link

Kubernetes, Kafka Event Sourcing Architecture Patterns, and Use Case Examples

With the rapidly changing business and technology landscape of today, developers, data scientists, and IT operations are working together to build intelligent applications with new technologies and dynamic architectures because of the flexibility, speed of delivery, and maintainability that they make possible. This post will go over the technologies that are facilitating evolutionary architectures: containers, Kubernetes, and the Kafka API. Then, we will look at some Kafka event sourcing architecture patterns and use case examples.

Containers

Containers simplify going from development to deployment without having to worry about portability or reproducibility. Developers can package an application plus all its dependencies, libraries, and configuration files needed to execute the application into a container image. A container is a runnable instance of an image. Container images can be pulled from a registry and deployed anywhere the container runtime is installed: your laptop, servers on-premises, or in the cloud.


Compared to virtual machines, containers have similar resources and isolation benefits but are lighter in weight because containers virtualize the operating system instead of the hardware. Containers are more portable and efficient, take up less space, use far fewer system resources, and can be spun up in seconds.

Kubernetes

Kubernetes provides a platform to configure, automate, and manage:

  • Intelligent and balanced scheduling of containers
  • Creation, deletion, and movement of containers
  • Easy scaling of containers
  • Monitoring and self-healing abilities

A Kubernetes cluster is comprised of at least one master node, which manages the cluster, and multiple worker nodes, where containerized applications run using Pods. A Pod is a logical grouping of one or more containers, which are scheduled together and share resources. Pods enable multiple containers to run on a host machine and share resources, such as storage, networking, and container runtime information.

The Master node manages the cluster in this way:

  • The API server parses the YAML configuration and stores the configuration in the etcd key-value store.
  • The etcd stores and replicates the current configuration and run state of the cluster.
  • The scheduler schedules pods on worker nodes.
  • The controller manager manages the state of non-terminating control loops, such as pod replicas.

The microservice architectural style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. A microservice approach is well aligned to containers and Kubernetes. You can gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many nodes. Microservices modularity facilitates independent updates/deployments and helps to avoid single points of failure, which can help prevent large-scale outages.

The MapR Data Fabric includes a natively integrated Kubernetes volume driver to provide persistent storage volumes for access to any data located on-premises, across clouds, and to the edge. Stateful applications can now be easily deployed in containers for production use cases, machine learning pipelines, and multi-tenant use cases.

Event-Driven Microservices

Most business data is produced as a sequence of events or an event stream; e.g. web or mobile app interactions, sensor data, bank transactions, and medical devices all continuously generate events. Microservices often have an event-driven architecture using an append-only event stream such as Kafka or MapR Event Streams (which provides a Kafka API).

With MapR-ES (or Kafka), events are grouped into logical collections of events called “topics.” Topics are partitioned for parallel processing. You can think of a partitioned topic as an event log: new events are appended to the end and, like a queue, they are delivered in the order they are received.

Unlike a queue, events are not deleted after they are delivered; they remain on the partition, available to other consumers.

Older messages are automatically deleted based on the stream’s time-to-live setting; if the setting is 0, then they will never be deleted.

Messages are not deleted from topics when read, and topics can have multiple different consumers. This allows for the processing of the same messages by different consumers for different purposes. Pipelining is also possible, where a consumer enriches an event and publishes it to another topic.

MapR-ES provides scalable high-performance messaging, easily delivering millions of messages per second on modest hardware. The publish/subscribe Kafka API provides decoupled communications, making it easy to add new listeners or new publishers without disrupting existing processes.

When you combine these messaging capabilities with the simple concept of microservices, you can greatly enhance the agility with which you build, deploy, and maintain complex data pipelines. Pipelines are constructed by simply chaining together multiple microservices, each of which listens for the arrival of some data, performs its designated task, and optionally publishes its own messages to a topic.
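To make the pipeline idea concrete, here is a minimal sketch of one such microservice written against the standard Apache Kafka Java client (MapR-ES exposes the same Kafka API). The broker address, topic names, and the enrichment step are assumptions made purely for illustration, not part of any particular MapR example.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EnrichmentMicroservice {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "enrichment-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            // Listen for the arrival of raw events on one topic...
            consumer.subscribe(Collections.singletonList("payments"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // ...perform this service's designated task (a made-up enrichment)...
                    String enriched = record.value() + ",channel=mobile";
                    // ...and publish the result to another topic for the next stage.
                    producer.send(new ProducerRecord<>("payments-enriched", record.key(), enriched));
                }
            }
        }
    }
}

Because topics are decoupled, another consumer group could read the same "payments" topic for a completely different purpose without touching this service.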

The Stream Is the System of Record

Event sourcing is an architectural pattern in which the state of the application is determined by a sequence of events, each of which is recorded in an append-only event store or stream. As an example, imagine that each “event” is an incremental update to an entry in a database. In this case, the state of a particular entry is simply the accumulation of events pertaining to that entry. In the example below, the stream persists the queue of all deposit and withdrawal events, and the database table persists the current account balances.

Which one of these — the stream or the database — makes a better system of record? The events in the stream can be used to reconstruct the current account balances in the database, but not the other way around. Database replication actually works by suppliers writing changes to a change log, and consumers applying the changes locally.
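As a minimal sketch of the deposit and withdrawal example above, the following Java snippet folds a stream of events into the current account balances; the event shape, field names, and values are invented for illustration.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AccountProjection {

    // A made-up event shape: each event is an incremental update to one account.
    record AccountEvent(String accountId, String type, double amount) { }

    // The current state (the "database table") is just the accumulation of all
    // events pertaining to each account, replayed from oldest to newest.
    static Map<String, Double> replay(List<AccountEvent> stream) {
        Map<String, Double> balances = new HashMap<>();
        for (AccountEvent event : stream) {
            double delta = "DEPOSIT".equals(event.type()) ? event.amount() : -event.amount();
            balances.merge(event.accountId(), delta, Double::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        List<AccountEvent> stream = List.of(
                new AccountEvent("acct-1", "DEPOSIT", 100.0),
                new AccountEvent("acct-1", "WITHDRAWAL", 30.0),
                new AccountEvent("acct-2", "DEPOSIT", 250.0));

        // Prints the reconstructed balances (acct-1 -> 70.0, acct-2 -> 250.0);
        // the stream can rebuild this table, but the table cannot rebuild the stream.
        System.out.println(replay(stream));
    }
}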

Adding Microservices to a Bank Monolithic Application With Change Data Capture

Banks often have mainframe applications, which are expensive to run, difficult to update, and also difficult to completely replace. Let’s look at how we could incrementally add event-driven microservices to a monolithic bank application, which consists of payment transactions and batch jobs for fraud detection, statements, and promotion emails.

In the design shown below, payment transactions from the monolithic database commit log are published to a stream, which is set to never throw data away. The immutable event store (stream) becomes the system of record, with events processed by different data pipelines based on the use case. Event data pipelines funnel out to polyglot persistence and different data storage technologies (MapR-DB HBase and MapR-DB JSON document, graph, and search databases), each one providing a different materialized view, so that microservices always have the most up-to-date view of their data in the most appropriate format. Using a different model for reading than for writing is the Command Query Responsibility Segregation (CQRS) pattern.

The event store provides for rebuilding state by re-running the events in the stream. This is the event sourcing pattern. Events can be reprocessed to create a new index, cache, or view of the data.

The consumer simply reads from the oldest message to the latest to create a new view of the data.

With the payment transactions now coming in as an event stream, real-time fraud detection, using Spark Machine Learning and Streaming, could be added more easily than before, as shown in the data flow below:

Having a long retention time for events in the stream allows for more analysis and functionality to be added. For example, a materialized view of card location histories could be stored in a data format such as Parquet, which provides very efficient querying.

Evolving the Architecture by Adding Events and Microservices

With more event sources, stream processing and machine learning can be added to provide new functionality. Machine learning techniques across a wide range of interactions — including clickstream, click through rates, call center reports, customer preferences, and purchase data — can be used to provide insights such as financial recommendations, predictions, alerts, and relevant offers. For example, web clickstream analysis combined with purchase history can be used to segment customers who share behavioral affinities into groups in order to better target advertisements. Lead events can be added to a stream when a customer clicks on targeted offers, triggering updates to the customer profile in MapR-DB and automated campaigns to prospects.

Healthcare Event Sourcing Examples

Now, let’s look at how a stream-first architecture has been implemented in healthcare by Liaison Technologies for a state health information network. Data from hospitals, providers, and labs flow into the ALLOY Health Platform. MapR-ES solves the data lineage problem of HIPAA compliance because the stream becomes a system of record by being an infinite, immutable log of each data change. Polyglot persistence solves the problem of storing multiple data formats. By streaming data changes in real-time to the MapR-DB HBase API/MapR-DB JSON API, graph, and search databases, materialized views can be provided, explored, and analyzed for different use cases, such as population health queries and patient matching.

Other healthcare stream processing and machine learning data pipeline examples include:

  • UnitedHealthcare and Optum Labs are using predictive analytics on claims events to reduce fraud, waste, and abuse in healthcare payments.
  • Optum Labs is using predictive analytics across multiple sources from over 30 million patients to:

Retail Event Sourcing Example

A major retailer wanted to increase in-season agility and inventory discipline in order to react to demand changes and reduce markdowns.

Data is collected from point-of-sale transactions, inventory status and pricing, competitive intelligence, social media, weather, and customers (scrubbed of personal identification), allowing for a centralized analysis of correlations and patterns that are relevant to improving business. Big data algorithms analyze in-store and online purchases, Twitter trends, local sports events, and weather-buying patterns to build innovative applications that personalize customer experience while increasing the efficiency of logistics. Point-of-sale transactions are analyzed to provide product recommendations or discounts based on which products were bought together or before another product. Predictive analytics is used to know what products sell more on particular days in certain kinds of stores in order to reduce overstock and stay properly stocked on the most in-demand products, thereby helping to optimize the supply chain.

Conclusion

A confluence of several different technology shifts has dramatically changed the way that applications are being built. The combination of event-driven microservices, containers, Kubernetes, and machine learning data pipelines is accelerating the development of next-generation intelligent applications, which are taking advantage of modern computational paradigms, powered by modern computational infrastructure. The MapR Converged Data Platform integrates global event streaming, real-time database capabilities, and scalable enterprise storage with a collection of data processing and analytical engines to power this new generation of data processing pipelines and intelligent applications.

Original Link

Getting Started With the Node-Influx Client Library

When in doubt, start at the beginning — an adage that applies to any learning journey, including getting started with the node-influx client library. Let’s take a look at the InfluxDB client libraries — in particular, node-influx, an InfluxDB client for JavaScript users. This client library features a simple API for most InfluxDB operations and is fully supported in Node and the browser, all without needing any extra dependencies.

Embark on a new journey with node-influx!

There’s a great tutorial for the node-influx library available online, as well as some handy documentation, which I recommend reading through beforehand. Here, we will just cover a few of the basics.

What You’ll Need

For this tutorial, I’ll be running a local installation of InfluxDB; you can learn how to get that up and running here. You’ll also need Node installed. If Node.js is not your cup of tea, there are plenty of other client libraries to work with and several guides on using InfluxDB with other languages available, such as these posts on Python and Ruby.

Set the Scene

Image of two surfers walking into the ocean

How to get slotted

Let’s imagine for a minute you have an inexplicable love for surfing. You find yourself in Hawaii — on a journey following in Duke’s footsteps — and you’re trying to find the best surf spot and the best time at which to surf said amazing spot. Makes sense to take a look at the tides, right? Well, according to our trusty friend Wikipedia, ocean tides are a great example of time series data. They ebb and flow over time (yes, I know I’m laying it on rather thick here). So, let’s practice putting some sample tide data into InfluxDB using the node-influx library and see what happens.

First things first, we need to install the node-influx library in the application folder where it will be used.

$ npm install --save influx

This adds the node-influx library to our node_modules; we also need to require the library into our server file, like so:

const Influx = require('influx');

We’ll use the following constructor function to connect to a single InfluxDB instance and specify our connection options.

const influx = new Influx.InfluxDB({
  host: 'localhost',
  database: 'ocean_tides',
  schema: [
    {
      measurement: 'tide',
      fields: { height: Influx.FieldType.FLOAT },
      tags: ['unit', 'location']
    }
  ]
});

There are a few different options available here:

  • You could connect to a single host by passing the DSN as a string into the constructor argument, like so:
const influx = new Influx.InfluxDB('http://user:password@host:8086/database')
  • You could also pass in a full set of config details and specify properties such as username, password, database, host, port, and schema — that’s what we did above.
  • If you have multiple Influx nodes to connect to, you can pass in a cluster config. For example:
    const client = new Influx.InfluxDB({
      database: 'my_database',
      username: 'duke_kahanamoku',
      password: 'aloha',
      hosts: [
        { host: 'db1.example.com' },
        { host: 'db2.example.com' },
      ],
      schema: [
        {
          measurement: 'tide',
          fields: { height: Influx.FieldType.FLOAT },
          tags: ['unit', 'location']
        }
      ]
    })
    

It’s worth noting here that within your schema design, you will need to designate the FieldType for your field values using Influx.FieldType — they can be strings, integers, floats, or booleans.

Checking the Database

We can use influx.getDatabaseNames() to first check if our database already exists. If it doesn’t, we can then use influx.createDatabase() to create our database. See below:

influx.getDatabaseNames()
  .then(names => {
    if (!names.includes('ocean_tides')) {
      return influx.createDatabase('ocean_tides');
    }
  })
  .then(() => {
    app.listen(app.get('port'), () => {
      console.log(`Listening on ${app.get('port')}.`);
    });
    writeDataToInflux(hanalei);
    writeDataToInflux(hilo);
    writeDataToInflux(honolulu);
    writeDataToInflux(kahului);
  })
  .catch(error => console.log({ error }));

We are first grabbing all the databases available from our connected Influx instance, and then cycling through the returned array to see if any of the names match up with ocean_tides. If none do, then we create a new database with that name. The callback from that then writes our data into the database.

Writing Data to InfluxDB

Using influx.writePoints(), we can write our data points into the database.

influx.writePoints([
  {
    measurement: 'tide',
    tags: {
      unit: locationObj.rawtide.tideInfo[0].units,
      location: locationObj.rawtide.tideInfo[0].tideSite,
    },
    fields: { height: tidePoint.height },
    timestamp: tidePoint.epoch,
  }
], {
  database: 'ocean_tides',
  precision: 's',
})
.catch(error => {
  console.error(`Error saving data to InfluxDB! ${error.stack}`)
});

To keep things simple, I just pull in a few sample data files, then loop through them by location and write each data point to InfluxDB under the measurement name tide with location and unit tags (both are strings). There is only one field here, height, and I send in a timestamp, as well, although that is not technically required (it’s more accurate, though). You can specify additional options such as the database to write to, the time precision, and the retention policy.

Querying the Database

We’ve learned how to write data into the database; now, we need to know how to query for that data. It’s simple: we can use influx.query() and pass in our InfluxQL statement to retrieve the data we want.

influx.query(`
  select * from tide
  where location =~ /(?i)(${place})/
`)
.then(result => response.status(200).json(result))
.catch(error => response.status(500).json({ error }));

Here, we are querying the database for any data from measurement tide where the location contains the place name passed in (using a regular expression). If you’ve stored a lot of data, it’s a good idea to also limit your query to a certain time span. You can additionally pass in an options object (database, retention policy, and time precision) to the influx.query() method.

Conclusion

That covers all the basics for the node-influx client library. Have a scan of the docs and let us know if there are other use cases you’d like to hear about! I’ve also posted all this code in a repository on GitHub if you want to try it out for yourself. Questions and comments? Reach out to us on Twitter: @mschae16 or @influxDB. Now go forth and find that monster wave — surf’s up!

Original Link

Introduction to GraphQL

Creating a web application can seem like a complicated task, depending on how much functionality it should have. In any case, development requires a lot of effort from developers: starting with architecture, then frontend and backend development, and then, of course, testing. Some compare the process of building an application to an onion, with lots of different layers. One of the really important layers is the one where you have to set up a query tool.

Don’t know what it is?

Query tools are the bridges that make it possible to get data from the backend. Sounds clear and simple? Building such a tool can be really challenging, especially if you are a junior developer. To help with this, various companies and organizations have created helpful platforms that make it easier to get started and to understand the development process. One such platform is GraphQL. Despite its relative newness, many top development companies around the world already use it heavily in their projects.

Surely, you want to know more, but let’s start with the basics.

What Is the Purpose of GraphQL?

GraphQL was created specifically for APIs. Its main purpose is to provide a flexible syntax and type system for describing data requirements and interactions. Over time, GraphQL has become an example of reliable, well-functioning software that can be used in a fairly simple way, even by junior-level programmers. Thanks to the features its creators implemented, GraphQL has been able to replace earlier custom tools designed for the same purpose. When discussing GraphQL, it is essential to present those key capabilities.

GraphQL is built around a handful of important concepts. What are they?

  • First of all, the essential primary concept is the schema. It lets you organize your queries properly and avoid a mess. Even better, thanks to its simplicity, collecting all the queries you need becomes much faster and easier, so there is no need to write a huge piece of code just to achieve a simple goal.
  • The second one is the query. A query handles read requests for types and their attributes. For any system that works with APIs, it can play one of the most important roles. To keep things straightforward, the creators replaced various functions and options with this single concept, which becomes really convenient once you have a bit of practice.
  • Mutation is another essential concept, focused on write requests. It is worth noting that GraphQL operates with only a few such constructs, which compares favorably to the systems that were used previously.
  • Attributes (arguments) make it possible to run a filtered, detailed search. They become valuable when you need a direct and precise response to a specific query.
  • The last two concepts, type and field, describe an entity and the key characteristics of a type, respectively (see the sketch right after this list).
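The walkthrough later in this article uses a Laravel package, but purely to illustrate how schema, query, mutation, type, and field fit together in code, here is a hedged sketch using the graphql-java library; the schema, field names, and resolvers are all invented for illustration and are not taken from the original article.

import graphql.ExecutionResult;
import graphql.GraphQL;
import graphql.schema.GraphQLSchema;
import graphql.schema.idl.RuntimeWiring;
import graphql.schema.idl.SchemaGenerator;
import graphql.schema.idl.SchemaParser;
import graphql.schema.idl.TypeDefinitionRegistry;

public class GraphQLConceptsSketch {

    public static void main(String[] args) {
        // Schema: types and their fields, plus a query and a mutation, described in SDL.
        String sdl = "type Query { greeting(name: String): String } "
                   + "type Mutation { setGreeting(text: String!): String }";
        TypeDefinitionRegistry registry = new SchemaParser().parse(sdl);

        // Resolvers: attach behavior to each field (a made-up in-memory value).
        final String[] stored = {"Aloha"};
        RuntimeWiring wiring = RuntimeWiring.newRuntimeWiring()
                .type("Query", t -> t.dataFetcher("greeting",
                        env -> stored[0] + ", " + env.getArgument("name")))
                .type("Mutation", t -> t.dataFetcher("setGreeting",
                        env -> stored[0] = env.getArgument("text")))
                .build();

        GraphQLSchema schema = new SchemaGenerator().makeExecutableSchema(registry, wiring);
        GraphQL graphQL = GraphQL.newGraphQL(schema).build();

        // A query with an attribute (argument) filtering what comes back.
        ExecutionResult result = graphQL.execute("{ greeting(name: \"Duke\") }");
        System.out.println(result.getData().toString());
    }
}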

Advantages and Disadvantages of GraphQL

No tool is totally perfect. If you are thinking about starting to use GraphQL, it's good to learn its advantages and disadvantages first.

What Are the Pros?

  • Development time. As you may have already noticed, one of the main benefits of GraphQL is that it lets you get development work done much more quickly. For example, instead of writing huge blocks of code, it may be enough to use one or two primary constructs to achieve what you need.
  • Easy to make changes. The next pro is a high level of flexibility in your project, made possible by the simplicity of the code. Even if you have a fully operational, finished program, you can still alter it in any way you need, making it more sophisticated or, on the contrary, simpler and clearer for the people who will use your application. This flexibility of the code structure, and of the system as a whole, deserves first place on the list of GraphQL's benefits.
  • Simple to understand. Another positive feature is the high level of organization within the system itself. Remember the type concept mentioned earlier: thanks to it, searching becomes elementary, and anyone can get what they need without wading through a mound of useless data. This is what makes GraphQL usable by people of any qualification level.
  • Documentation magic. Another benefit of GraphQL is self-documentation, which means you don't have to worry about the formalities. With older tools similar to GraphQL, you can end up with useless and rather complicated documentation inside the code itself, which is not good when you are dealing with a big project and have many other essential aspects to prioritize.

However, don't forget about the inconvenient parts. You may think they won't matter much during development, but forewarned is forearmed.

What Are the Cons?

GraphQL suffers from the lack of a proper middleware structure. This can be worked around by splitting the API into different schemas, grouping functionality according to the middleware it needs to sit behind, but maintaining many schemas is not the best option.

Of course, the development process won't grind to a halt if your project is compact in its functionality, but keep in mind that you may have to deal with this issue.

Another thing is bugs. During development, it will not be a surprise at all if you run into several severe bugs that make it impossible to properly manage the API. Because of this, making backups becomes an essential task if you want a good result and a fully functioning, well-behaved program. But who doesn't face this, right?

How to Use GraphQL

These steps will help you organize the right flow. Just follow them one by one.

Installation of GraphQL

Working with this language, in theory, looks pretty simple. To install GraphQL, you need only your brain, your hands, and a computer. The installation starts with Composer, which is already quite comprehensive. This means that you do not need a ton of additional extensions and programs to get things running. Installation is driven by one particular file: composer.json. If you are going to use it with Laravel, going here is a good decision.

How to Create a Schema

As with most things in GraphQL, creating a schema is simple. Schemas are useful if you need to include public endpoints in your project; those that require authentication may need more of your attention. To arrange a schema, you need two basic tools: a function and a facade. Combining them, you can work with a finished version of a schema for your API without problems or complications.

How to Create a Query

First of all, you have to define a type. After you create one, you must register it in a special configuration file, config/graphql.php. At that point, half of the job is done and you only need to implement a few final pieces to get a proper result. Next, be careful to work out which queries have to return that particular type. Finally, add everything you have to the same file, config/graphql.php.

How to Arrange a Mutation

Creating a mutation is similar to creating a query. Many operations in GraphQL follow similar patterns, so you just need to know the basics and then layer more knowledge on top of that foundation. Mutations accept arguments and return certain types; for example, you might use one when a user has to change their password. Here you need the resolve function, which lets you update an existing model and return it after the changes. You can also apply different validation rules; yes, that makes things a bit more complicated, but it gives you flexibility. In the very end, when the function is finished, add it to the already familiar file, config/graphql.php.

How to Make a Flow

The flow is the most complex part of using GraphQL because it involves a lot of steps. First of all, you have to write a type for each of the objects, and to create a type, you need the finished model.

After creating it, you must define which fields you need and how they are returned. Then, you have to register all of the types in the schema, as in the example above. There is one important thing to keep in mind: if the object you are creating is independent, you must write a query for it. Finally, once you have all the needed elements, you design a mutation using all of those components.

Conclusion

In this article, our team did its best to show how to work with a query manager as widespread as GraphQL. Because it is becoming more and more popular, many business owners and beginning programmers interested in building their own applications start looking for skilled experts who are ready to help. Mostly, they face two options: expensive IT companies or freelancers, who are not always reliable.

Original Link

Mocking JDBC Using a Set of SQL String/Result Pairs

In a previous post, I showed how the programmatic MockDataProvider can be used to mock the entire JDBC API through a single functional interface:

// context contains the SQL string and bind variables, etc.
MockDataProvider provider = context -> {

    // This defines the update counts, result sets, etc.
    // depending on the context above.
    return new MockResult[] { ... };
};

Writing the provider manually can be tedious in some cases, especially when a few static SQL strings need to be mocked and constant result sets would be OK. In that case, the MockFileDatabase is a convenient implementation that is based on a text file (or SQL string), which contains a set of SQL string/result pairs of the form:
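Roughly, each pair consists of the SQL statement, the expected result rows prefixed with >, and a row count line. The following sketch, with an invented actor table and data, shows what such a pair might look like and how the MockFileDatabase plugs into a MockConnection; the exact file syntax may differ slightly from what the jOOQ manual documents, so treat it as an approximation.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import org.jooq.tools.jdbc.MockConnection;
import org.jooq.tools.jdbc.MockFileDatabase;

public class MockFileDatabaseSketch {

    public static void main(String[] args) throws Exception {
        // One SQL string/result pair: the statement, the expected rows
        // (prefixed with ">"), and a row count line. Table and data are invented.
        String pairs =
            "select first_name, last_name from actor;\n" +
            "> first_name last_name\n" +
            "> ---------- ---------\n" +
            "> GINA       DEGENERES\n" +
            "> WALTER     TORN\n" +
            "@ rows: 2\n";

        // MockFileDatabase is itself a MockDataProvider, so it plugs straight into
        // a MockConnection, which behaves like an ordinary JDBC Connection.
        try (Connection connection = new MockConnection(new MockFileDatabase(pairs));
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("select first_name, last_name from actor")) {

            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getString(2));
            }
        }
    }
}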

Original Link

Apache Flink Basic Transformation Example

Apache Flink is a stream processing framework with added capabilities such as batch processing, graph algorithms, machine learning, reports, and trends insight. Using Apache Flink can help you build a vast amount of data in a very efficient and scalable manner.

In this article, we’ll be reading data from a file, transforming it to uppercase, and writing it into a different file.

Gradle Dependencies

dependencies {
    compile "org.apache.flink:flink-java:1.4.2"
    compile "org.apache.flink:flink-streaming-java_2.11:1.4.2"
    compile "org.apache.flink:flink-clients_2.11"
}

Core Concept of Flink API

When working with the Flink API:

  • DataSource represents a connection to the original data source.
  • Transformation represents what needs to be performed on the events within the data streams. A variety of functions for transforming data are provided, including filtering, mapping, joining, grouping, and aggregating.
  • Data sink triggers the execution of a stream to produce the desired result of the program, such as saving the result to the file system, printing it to the standard output, writing to the database, or writing to some other application.

The data source and data sink components can be set up easily using built-in connectors that Flink provides to different kinds of sources and sinks.

Flink transformations are lazy, meaning that they are not executed until a sink operation is invoked.

Code:

package com.uppi.poc.flink;

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UpperCaseTransformationApp {

    public static void main(String[] args) throws Exception {
        DataStream<String> dataStream = null;
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        final ParameterTool params = ParameterTool.fromArgs(args);
        env.getConfig().setGlobalJobParameters(params);
        if (params.has("input") && params.has("output")) {
            // data source
            dataStream = env.readTextFile(params.get("input"));
        } else {
            System.err.println("No input specified. Please run 'UpperCaseTransformationApp --input <file-to-path> --output <file-to-path>'");
            return;
        }
        if (dataStream == null) {
            System.err.println("DataStream created as null, check file path");
            System.exit(1);
            return;
        }
        // transformation
        SingleOutputStreamOperator<String> soso = dataStream.map(String::toUpperCase);
        // data sink
        soso.writeAsText(params.get("output"), WriteMode.OVERWRITE);
        env.execute("read and write");
    }
}

DataStream API Transformation

As you can see, dataStream is initialized as null; we will create it later.

DataStream<String> dataStream=null;

Initializing Flink Environment

The next step is to initialize stream execution environment by calling this helper method:

StreamExecutionEnvironment.getExecutionEnvironment()

Flink figures out which environment you submitted the job to (whether it is a local environment or a cluster environment).

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Data Stream Creation 

To get the data stream from the data source, just call the built-in Flink API method readTextFile() from StreamExecutionEnvironment. This method reads file content from a given file and returns it as a dataStream object.

Example:

dataStream = env.readTextFile(params.get("input"));

ParameterTool

The ParameterTool class represents the user's command-line arguments. Example:

$ flink run flink-basic-example-1.0.jar --input c:\tools\input.txt --output c:\tools\output.txt

We need to send all command line arguments to execution environment by calling:

env.getConfig().setGlobalJobParameters(params);

Writing Output to Data Sink

print(): The data stream print method writes each entity of the data stream to a Flink log file.

writeAsText(): This method has two arguments: the first argument is the output file/path and the second argument is writer mode.

Example:

soso.writeAsText(params.get("output"),WriteMode.OVERWRITE);

Trigger Flow Execution

None of the actions specified earlier actually run until you call the execute method; if you don't call it, your program will complete without doing anything. When calling the execute method, you can specify the name of the job. Example:

env.execute("read and write");

Running Flink Application

Step 1: Clone the project from GitHub and run the Gradle command >  gradlew clean build. Once the build is a success, it generates a flink-basic-example-1.0.jar file in the current project folder’s /build/libs directory.

Step 2: Run the Flink server on Windows with start-local.bat.

Step 3: Run your Flink application from the command line by going to the Flink installation folder and typing the following command:

flink run <path-to-jar-file> --input <path-to-file> --output <path-to-file>

input.txt:

training in Big Data Hadoop, Apache Spark, Apache Flink, Apache Kafka, Hbase, Apache Hadoop Admin

output.txt:

TRAINING IN BIG DATA HADOOP, APACHE SPARK, APACHE FLINK, APACHE KAFKA, HBASE, APACHE HADOOP ADMIN

Step 5: Run the Flink application using the Flink web UI. Type localhost:8081 (by default, Flink runs on port 8081).

Click Submit New Job > Add New Task and upload the generated JAR file. You can generate this JAR with:

gradlew clean build (jar location - /build/libs)

Enter the program arguments in the input box as:

--input <path-to-file> --output <path-to-file>

And hit Submit. After the job has run, it will appear under Completed Jobs.

Here’s the GitHub link.

Original Link

A Guide to Creating an API for Your Mobile App

In the current innovative atmosphere, a host of helpful and informative content on designing proper APIs that let mobile applications communicate with specific web or cloud-based services has been put out. Although most of these concepts and practices may have aided mobile API designers over the years, many others have faded away as time passed. Making the most of back-end APIs for fantastic mobile consumer experience is something that should be considered by everyone who is looking to improve the performance of a mobile app. This piece will showcase some proven tips when it comes to designing excellent mobile APIs that will ensure that customers are remotely served with app and data resources. Read on and learn more.

Optimize LocalStorage and Caching

If you want to stop slow mobile networks from dragging down your app's performance, make sure that you store the CSS, HTML, and all images in localStorage. Many mobile app owners have reportedly seen the average size of their HTML payload drop from 200KB to 30KB. Additionally, it is ideal to move unchanging data, such as main navigation and categories, into your mobile app itself. By doing this, you avoid a trip over the mobile network, and pre-fetched information (like queries, user data, and paginated results) can be loaded on the device with no extra requests.

Pagination of Results Is Compulsory

Pagination is a technique used to prevent returning thousands of records to numerous consumers at once. Hence, while building your mobile API, do not forget to paginate the results of any endpoint that returns a list of items. You can conveniently implement pagination manually, in batches, using LIMIT and OFFSET clauses in your queries. A point to bear in mind is that pagination metadata must be returned with each paginated result. A feasible option for this is using HTTP Link headers in the responses. This header includes full URLs to the first, last, next, and previous pages of the result set, making it easier for clients to deal with multiple paginated results at once and keeping the responses simple to parse.
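As a rough sketch of the idea (the endpoint URL, parameter names, and page sizes here are invented), the following Java snippet computes LIMIT/OFFSET values for a page and builds an RFC 5988-style Link header pointing to the first, last, next, and previous pages.

public class PaginationLinks {

    // Builds an RFC 5988-style Link header for a paginated endpoint.
    // The base URL and parameter names are invented for illustration.
    static String linkHeader(String baseUrl, int page, int perPage, int totalItems) {
        int lastPage = Math.max(1, (int) Math.ceil(totalItems / (double) perPage));
        StringBuilder header = new StringBuilder();
        appendLink(header, baseUrl, 1, perPage, "first");
        appendLink(header, baseUrl, lastPage, perPage, "last");
        if (page < lastPage) {
            appendLink(header, baseUrl, page + 1, perPage, "next");
        }
        if (page > 1) {
            appendLink(header, baseUrl, page - 1, perPage, "prev");
        }
        return header.toString();
    }

    private static void appendLink(StringBuilder header, String baseUrl, int page, int perPage, String rel) {
        if (header.length() > 0) {
            header.append(", ");
        }
        header.append(String.format("<%s?page=%d&per_page=%d>; rel=\"%s\"", baseUrl, page, perPage, rel));
    }

    public static void main(String[] args) {
        int page = 3, perPage = 20;

        // The values a LIMIT/OFFSET query would use for this page.
        int limit = perPage;
        int offset = (page - 1) * perPage;
        System.out.println("LIMIT " + limit + " OFFSET " + offset);

        System.out.println("Link: " + linkHeader("https://api.example.com/items", page, perPage, 95));
    }
}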

Use JSON

The days when passing POST data as URL-encoded form data was seen as the best option are gone. Today, things have changed, which is why it is recommended that you use JSON to send data to your endpoints. By doing this, you make requests more readable and easier to assemble for the user. Rails has served as a fast means of dealing with JSON-encoded parameters: the Rails ActionController automatically unwraps data sent as JSON, letting you access it through the params hash.

Do Not Neglect Authentication for a Non-Public API

Web hackers are all over the place. If you are creating a non-public API, you need to make a streamlined authentication system available. Rather than the traditional TokenAuthenticatable approach that has been used for authentication in private APIs, it is ideal to leverage HTTP's basic authentication. Implemented in every HTTP client, basic authentication expects consumers to supply a valid username and password to gain access to the API. Additionally, you can let your users sign into your API using private access tokens.
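For illustration only, here is a tiny Java sketch of the header a client sends with HTTP basic authentication; the credentials are placeholders, and in practice they must only ever travel over HTTPS.

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    public static void main(String[] args) {
        // Placeholder credentials; a real client would load these from secure storage.
        String username = "api_user";
        String password = "s3cret";

        String token = Base64.getEncoder()
                .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));

        // The header each request carries; a private access token scheme works similarly.
        System.out.println("Authorization: Basic " + token);
    }
}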

Make Sure You Have Proper Versioning From the Start

Since your mobile API will go through changes at some point in the future, it is up to you to prepare it for proper versioning. Serving as the contract between the backend and the applications using it, your API should be available in multiple versions. That way, existing app versions keep working while newer app versions adopt the latest API changes. Ignoring versioning of your API would make apps non-functional whenever any modification is made to that API.

Implement a Rate Limit Early in the API Designing Process

Even though things are calm in the early phase when clients begin to make use of your API, things may eventually get out of control if your app becomes a massive success. More and more people will opt to integrate your API into their infrastructure and workflow, calling your endpoints and requesting the same URL over and over, thousands of times every single hour. This is where a rate limit becomes a ray of hope. A sensible rate limit prevents your servers from being taken down by, say, a runaway CI server, and gives users a clearer indication of how your API is meant to be used. For larger infrastructure, Nginx limit_req is an appropriate way to enforce a rate limit; for others, Redis can work wonders.
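Nginx limit_req or Redis counters would normally enforce this at the edge; purely to illustrate the idea, here is a minimal in-process fixed-window limiter in Java, with the limit and window values invented for illustration.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal fixed-window rate limiter: at most maxRequests per client per window.
// Real deployments would push this to the edge (e.g. Nginx limit_req) or share
// counters in Redis; this in-process sketch only illustrates the idea.
public class FixedWindowRateLimiter {

    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    private static final class Window {
        long start;
        int count;
    }

    public FixedWindowRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allow(String clientId) {
        long now = System.currentTimeMillis();
        Window w = windows.computeIfAbsent(clientId, id -> new Window());
        if (now - w.start >= windowMillis) {
            w.start = now;
            w.count = 0;
        }
        if (w.count < maxRequests) {
            w.count++;
            return true;   // serve the request
        }
        return false;      // respond with HTTP 429 Too Many Requests
    }

    public static void main(String[] args) {
        FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(100, 60_000);
        System.out.println(limiter.allow("client-42")); // true until the limit is hit
    }
}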

Your API Should Be Accompanied by Commendable Documentation

Documentation is a paramount part of your API design plan. Ambiguous or unclear documentation can frustrate app developers to the point that they abandon your product for something else. Hence, it is important to offer clear, error-free documentation that is not buried under lengthy code snippets. Most developers prefer simply browsing through examples, so make sure you include them to provide a better understanding of the API and its capabilities. If you do want to offer code snippets to your users, it's better to include them in test scenarios; doing this validates that your documentation stays up to date every time the API is modified. To create the appropriate documentation for your API, go for one of the renowned ready-made tools, such as Apipie, or choose a custom-made solution.

Original Link

How an API-in-a-Box Can Deliver Data Analytics Nirvana

“Information  is the oil of the 21st century, and analytics is the combustion engine.”

As early as 2011, Peter Sondergaard, a Senior Vice President at Gartner, predicted a change in data management strategies known as big data, the pursuit of which would create an unprecedented amount of information of enormous variety and complexity.

He was right. Today, organizations store vast amounts of data, much of it across multiple, disparate databases that are unable to talk to each other. It’s a problem that’s exacerbated by mergers and acquisitions where new datasets are inherited.

Organizations generally understand the power behind analytics, but how do you make it work culturally and technically? We take a look at the barriers to data analytics success and suggest new approaches that buck the system, with dramatic results.

The Technology Challenge

Different departments will always need separate access rights. Most HR data, for example, needs different access rules than financial data. Highly sensitive data should only be accessed by authorized personnel, while other data (sales/marketing) might need to be shared among cross-functional teams during certain time intervals.

A common approach to this problem is to store all “sharable” data in a data warehouse — which is an expensive and time-consuming approach. 80% of the effort of a typical data project is focused solely on cleaning data. Furthermore, extracting data from its original source into a warehouse duplicates that data, increasing storage demands. Another problem is that data, such as sales forecasts, ages quickly. Without a way to continually and automatically update that data in the warehouse — in real-time — your analytics will be founded on outdated information.

The Cultural Challenge

To become a truly data-driven organization, a cultural shift is necessary. Change always prompts concern. Fear of change and subsequent data fiefdoms are some of the main reasons why data analytics projects fail. Data owners fear they’ll lose relevance or control of data if they are forced to share datasets with other departments, agencies, or external expertise is brought in.

Without a Shift, Data Analytics Is Destined to Fail

Welcome to the world where technological complexity and cultural fiefdoms are killing data analytics projects. It comes as no surprise that 60% of data analytics projects fail.

How can organizations counteract these challenges and find a way to connect disparate data (only using the original data source) while gaining buy-in from the team? The answer lies in an unlikely source: application program interfaces (APIs).

Addressing the Technology Challenge: Break Down Silos With APIs

Data may be today’s oil, but it will be tomorrow’s oxygen. Mobile devices, IoT, and cloud applications generate vast data streams. We’ve come to expect access to valuable information at our fingertips.

Enterprise data problem-solving has also changed. Gone are the days when software giants, such as Microsoft, SAP, Oracle, and MicroStrategy, were one-stop shops for addressing your data challenges. Today, you can mix and match data from different systems without the help of the big guys.

Thanks to APIs, disparate systems can now interact with one another and exchange data. With APIs being lightweight (APIs eliminate the need for traditional hard-coded system integration), modern, flexible, and less risky than other data sharing approaches, API use is booming.

Concurrently, data warehouses are losing relevance. They still have a role to play but are no longer the single or predominant source of data in the enterprise. And that’s okay. It’s not necessary to maintain a “golden record” of all the data entities in your organization. And that’s where APIs truly excel. They allow you to work with real-time data (as opposed to historical data) and real-time analytics to provide a better understanding of what’s going on at any given time.

Addressing the Cultural Challenge: Take an API-Enabled Iterative Approach

With APIs, fear and entrenched data fiefdoms are a thing of the past. Instead of grabbing data from a department’s database, cleaning it and prepping it for analysis, the data stays right where it is, under the control of the data owner. Opening your API also helps you maintain the health of your business intelligence program by promoting data hygiene. Knowing that their data will be shared, data owners instinctively become more accountable for keeping that data clean. Whereas with a data warehouse approach, once the data leaves the department, data owners no longer feel responsible for it.

APIs also support an iterative approach to analytics. Data owners can decide what to share based on what they feel most comfortable with. As they see the fruits of their sharing, they start giving up their data monopoly. It’s a nimble and cost-effective approach that increases team buy-in.

Of course, it doesn’t happen overnight. How can your organization achieve this shorter, nimbler path to actionable data insights? Read more about what we call the Minimal Viable Prediction (MVP) approach.

Turn Your APIs Into a Powerful Analytics Foundation: Meet the API-in-a-Box

More and more businesses are embracing an API business model. But how do you enable this API-driven analytics transformation? Allow us to introduce the API-in-a-box.

An API-in-a-box is a containerized API adapter that can be deployed in a plug-and-play fashion, quickly and cost-effectively. It integrates disparately stored data by providing a safe passage for non-sensitive data or data that’s been given a green light by a department to be shared. With an API-in-a-Box, data remains in situ at its original source but is accessed in real-time.

APIs are a proven method for encouraging cross-departmental collaboration, analytics, and reporting, while facilitating the identification and correction of data discrepancies. Teams maintain full control of their data and can provide exact rules as to who can access that data.

An API-in-a-box can be spun up in an extremely short period of time, eliminating the time-consuming data integration problem. Plus, after data errors are found and one department’s data is merged with another, actionable insights start to emerge and the barriers of fear and fiefdom start to break down.

Go Ahead, Resist the Big Bang Approach

The traditional approach to data analytics is often risky big bang-thinking. Some of these projects have worked, but those successes are few and far between. They call for a huge planning endeavor: one that’s beyond the time and resources of many organizations. That old safeguard, the data warehouse, has also run its course as the stalwart of business intelligence initiatives.

As the arguments in this piece show, it’s time for a new approach.

Using new technology concepts (API-in-a-Box) and iterative approaches (minimal viable prediction), results emerge, sometimes in a matter of weeks, not months or years, and at a fraction of the cost of doing it the old way.

Data owners become heroes as new and actionable insights are achieved. A culture shift starts to take place as more people pull in the direction of a data culture.

The field of API for data analytics is still new, but to be successful in the long term, it’s an approach that we vehemently advocate. Give it a try.

Original Link
