Hive Metastore Configuration After Fresh Installation

For beginners experimenting with Hive, the first stumbling block is usually configuration. After placing the Hive libraries in their designated folders and updating the necessary environment variables, the first eager execution of hive often fails with the exception “HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient”. That’s when the Hive Metastore needs to be configured, which is pretty simple and straightforward.

There are two ways to configure Hive Metastore. We can use ‘schematool’ or directly source the hive-schema-3.1.0.mysql.sql script provided by Hive into the Metastore database.
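
For example, with a MySQL-backed Metastore whose connection details are already set in hive-site.xml, initializing the schema with schematool is a single command (a sketch; adjust -dbType to your database):

schematool -dbType mysql -initSchema
schematool -dbType mysql -info    # verify the schema version afterwards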

Original Link

Feature Management for DevOps [Video]

At our September meeting of Test in Production, our own Tim Wong—LaunchDarkly’s Principal TAM and Chief of Staff—gave a talk about Feature Management for DevOps.

“You can get to a point where you are pushing configuration based on routes or other aspects of your instance. This is kind of poor man’s tracing in some ways or you can degrade functionality based on a particular route or a particular IP or a particular node set, particular method, particular account. You can provide different configuration sets for different types of things as long as you instrument them that way.”

Original Link

Top AWS Lambda Gotchas You Must Know Before Configuring Them

“Once You Set It, You Forget It.”

Often, we see cloud practitioners quote this line about serverless technologies. While serverless services have brought a great deal of convenience, they have brought a few challenges along with them, such as no visibility into the abstracted layers that the service provider takes care of. One such serverless service is AWS Lambda, where AWS allocates CPU power to each function on your behalf depending on the memory allocation. "Some abstractions do not actually simplify our lives as much as they were meant to do," as Joel Spolsky says in his article, "The Law of Leaky Abstractions."

"What’s with Lambda? Misconfigured functions, most of the time?"

So, in this post, we walk you through how CPU allocation affects Lambda execution time.

Original Link

How to Create a Self-Healing IT Infrastructure

There always is some transition from here to there, some evolutionary process. For example, the next big thing in the automotive industry is the worldwide acceptance of self-driving cars. Despite certain fatal failures from Elon Musk’s Tesla autopilot, Ford plans to produce self-driving cars and Daimler-Benz is already testing self-driving trucks. These manufacturers act according to a 5-step plan to achieve driverless cars.

The hard part of the transition is the attitude shift — drivers must accept the role of passengers rather than masters of the road. The benefits are supreme, though: a fully automated delivery system working 24/7 and ensuring fewer car crashes and human casualties on the roads. The road to this utopia might seem long, yet the automotive giants are covering it in seven-league strides.

Original Link

Running Jenkins Server With Configuration-as-Code

Some days ago, I came across a newly created Jenkins plugin called Configuration as Code (JCasC). This plugin allows you to define Jenkins configuration in a very popular format — YAML notation. It is interesting that such a plugin was not created earlier, but better late than never. Of course, we could have used some other Jenkins plugins, like the Job DSL Plugin, but that one is based on the Groovy language.

If you have any experience with Jenkins, you probably know how many plugins and other configuration settings it requires in order to work as your organization's main CI server. With the JCasC plugin, you can store such configuration in human-readable, declarative YAML files.
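
As a taste of the format, a minimal, illustrative jenkins.yaml might look like this (the keys shown, systemMessage and numExecutors, mirror Jenkins' own configuration attributes; the values are made up):

jenkins:
  systemMessage: "Jenkins configured automatically by JCasC"
  numExecutors: 2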

Original Link

Jenkins Configuration as Code: Need for Speed!

This blog post is the fourth in the Jenkins configuration as code series.

Using Jenkins Configuration as Code, one can describe the Jenkins master configuration with simple, declarative YAML files and manage them as code. But one can argue this was already feasible by managing Jenkins’ XML configuration files as code and storing them in a Git repository.

Original Link

Kubernetes FAQ: How Do I Configure Storage for a Bare Metal Kubernetes Cluster?

In our ongoing series on the most frequently asked questions from the Kubernetes community meetings, we are going to look at how to configure storage for bare metal installations. Much like the problems with defining ingress and routing traffic for bare metal, you obviously can’t rely on the convenient services that are available from the major cloud providers to provide persistent storage volumes for your stateful applications.

On the other hand, you don’t want to fall into the trap of having to look after your persistent volumes like pets. But let’s back up a little bit and explain why state can be a problem with Kubernetes and why you even need to consider storage management for your application.

Original Link

Introduction to JSON With Java

Programming plays an indispensable role in software development, but while it is effective at systematically solving problems, it falls short in a few important categories. In the cloud and distributed age of software, data plays a pivotal role in generating revenue and sustaining an effective product. Whether configuration data or data exchanged over a web-based Application Programming Interface (API), having a compact, human-readable data language is essential to maintaining a dynamic system and allowing non-programmer domain experts to understand the system.

While binary-based data can be very condensed, it is unreadable without a proper editor or extensive knowledge. Instead, most systems use a text-based data language, such as eXtensible Markup Language (XML), YAML Ain’t Markup Language (YAML), or JavaScript Object Notation (JSON). While XML and YAML have advantages of their own, JSON has become a front-runner in the realm of configuration and Representational State Transfer (REST) APIs. It combines simplicity with just enough richness to easily express a wide variety of data.

Original Link

Setting Up an Azure B2C Tenant: The Long Walk

B2C is one of Microsoft’s offerings that allows us programmers to pass the business of managing logins and users over to people who want to be bothered with such things. This post contains very little code, but lots of pictures of configuration screens that will probably be out of date by the time you read it.

A B2C set-up starts with a tenant. So the first step is to create one.

Original Link

Making Django, Elastic Beanstalk and AWS RDS Play Well Together

A couple of days ago I decided I should learn a bit more hands-on AWS stuff. So I created a free tier AWS account, and looked around. I decided I’d take a common use case: deploy a web application to Elastic Beanstalk and add a domain and SSL.

Setting Up Tools

Step 1: reading documentation. AWS has a lot of documentation, and it is mostly written in a friendly manner with easy-to-follow instructions. Based on the documentation I opted for using the command line Elastic Beanstalk tool. To use this you need Python and pip. You can install it with the command

Original Link

DevOps Tools Are Not Magic Bullets!

They’re shiny. They’re new. And they’re what DevOps is all about! The cool toolsets that enable such amazing IT performance call out to us, "Download me! Install me! I’ll make your IT life great!"

Don’t fall for their siren song!

Original Link

How to Enable HTTP/HTTPS on Spring Boot [Snippet]

You can configure Spring Boot services to be accessed over SSL, and the configuration can be done in the application configuration YML or property file. If there is ever a need to expose the services over both HTTP and HTTPS, you will have to plug in an additional connector.

By default, Spring Boot allows only one connector to be configured through the properties, so supporting both connectors requires some customization.
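
A common approach (sketched below against the Spring Boot 1.x embedded-Tomcat API; the class name and package are illustrative, and Boot 2.x uses TomcatServletWebServerFactory instead) is to keep HTTPS configured through server.port and server.ssl.* in the property file and register an additional plain-HTTP connector programmatically:

package com.example.config;

import org.apache.catalina.connector.Connector;
import org.springframework.boot.context.embedded.EmbeddedServletContainerFactory;
import org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HttpConnectorConfig {

    // server.port and server.ssl.* keep serving HTTPS; this factory
    // registers an extra connector for plain HTTP alongside it.
    @Bean
    public EmbeddedServletContainerFactory servletContainer() {
        TomcatEmbeddedServletContainerFactory factory = new TomcatEmbeddedServletContainerFactory();
        factory.addAdditionalTomcatConnectors(httpConnector());
        return factory;
    }

    private Connector httpConnector() {
        Connector connector = new Connector("org.apache.coyote.http11.Http11NioProtocol");
        connector.setScheme("http");
        connector.setPort(8080); // illustrative plain-HTTP port
        return connector;
    }
}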

Original Link

Why JSON Isn’t a Good Configuration Language

Many projects use JSON for configuration files. Perhaps the most obvious example is the package.json file used by npm and yarn, but there are many others, including CloudFormation (originally JSON only, but now supports YAML as well) and composer (PHP).

However, JSON is actually a pretty terrible configuration language for a number of reasons. Don’t get me wrong – I like JSON. It is a flexible format that is relatively easy for both machines and humans to read, and it’s a pretty good data interchange and storage format. But as a configuration language, it falls short.

Why Is JSON Popular as a Config Language?

There are several reasons why JSON is used for configuration files. The biggest reason is probably that it is easy to implement. Many languages have JSON support in the standard library, and those that don’t almost certainly have an easy-to-use JSON package readily available. Then there is the fact that developers and users are probably already familiar with JSON and don’t need to learn a new configuration format to use the product. And that’s not to mention all the existing tooling for JSON, including syntax highlighting, auto-formatting, validation tools, etc.

These are actually all pretty good reasons. It’s too bad that this ubiquitous format is so ill-suited for configuration.

The Problems With JSON

Lack of Comments

One feature that is absolutely vital for a configuration language is comments. Comments are necessary to annotate what different options are for and why a particular value was chosen, and, perhaps most importantly, to temporarily comment out parts of the config while using a different configuration for testing and debugging. If you think of JSON as a data interchange format, then it doesn’t really make sense to have comments.

There are, of course, workarounds for adding comments to JSON. One common workaround is to use a special key in an object for a comment, such as “//” or “__comment”. However, this syntax isn’t very readable, and in order to include more than one comment in a single object, you need to use unique keys for each. Douglas Crockford (the inventor of JSON) suggests using a preprocessor to remove comments. If you are using an application that requires JSON configuration, I recommend that you do just that, especially if you already have any kind of build step before the configuration is used. Of course, that does add some work to editing the configuration, so if you are creating an application that parses a configuration file, don’t depend on your users being able to do that.

Some JSON libraries do allow comments as input. For example, Ruby’s JSON module and the Java Jackson library with the JsonParser.Feature.ALLOW_COMMENTS feature enabled will handle JavaScript-style comments just fine in JSON input. However, this is non-standard, and many editors don’t properly handle comments in JSON files, which makes editing them a little harder.

Overly Strict

The JSON specification is pretty restrictive. Its restrictiveness is part of what makes it easy to implement a JSON parser, but in my opinion, it also hurts the readability and, to a lesser extent, writability by humans.

Low Signal to Noise

Compared to many other configuration languages, JSON is pretty noisy. There is a lot of punctuation that doesn’t aid human readability, although it does make it easier to write implementations for machines. In particular, for configuration files, the keys in objects are almost always identifiers, so the quotation marks around the keys are redundant.

Also, JSON requires curly braces around the entire document, which is part of what makes it an (almost) subset of JavaScript and helps delimit different objects when multiple objects are sent over a stream. But, for a configuration file, the outermost braces are just useless clutter. The commas between key-value pairs are also mostly unnecessary in config files. Generally, you will have a single key-value pair per line, so it would make sense to accept a newline as a delimiter.

Speaking of commas, JSON doesn’t accept trailing commas. If it is going to require commas after each pair, it should at least accept trailing commas, since trailing commas make adding new entries to the end easier and lead to cleaner commit diffs.

Long Strings

Another problem with JSON as a configuration format is it doesn’t have any support for multi-line strings. If you want newlines in the string, you have to escape them with “\n”, and what’s worse, if you want a string that carries over onto another line of the file, you are just out of luck. If your configuration doesn’t have any strings that are too long to fit on a line, this isn’t a problem. However, if your configuration includes long strings, such as the description of a project or a GPG key, you probably don’t want to put it on a single line with “\n” escapes instead of actual newlines.

Numbers

In addition, JSON’s definition of a number can be problematic in some scenarios. As defined in the JSON spec, numbers are arbitrary precision finite floating point numbers in decimal notation. For many applications, this is fine. But if you need to use hexadecimal notation or represent values like infinity or NaN, then TOML or YAML would be able to handle the input better.
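
For comparison, here is a small, illustrative YAML fragment (the keys are made up) with values that plain JSON cannot express:

retry_mask: 0x1F        # hexadecimal integer
timeout_seconds: .inf   # positive infinity
error_rate: .nan        # not-a-number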

{ "name":"example", "description":"A really long description that needs multiple lines.\nThis is a sample project to illustrate why JSON is not a good configuration format. This description is pretty long, but it doesn't have any way to go onto multiple lines.", "version":"0.0.1", "main":"index.js", "//":"This is as close to a comment as you are going to get", "keywords":[ "example", "config" ], "scripts":{ "test":"./test.sh", "do_stuff":"./do_stuff.sh" }, "bugs":{ "url":"https://example.com/bugs" }, "contributors":[ { "name":"John Doe", "email":"johndoe@example.com" }, { "name":"Ivy Lane", "url":"https://example.com/ivylane" } ], "dependencies":{ "dep1":"^1.0.0", "dep2":"3.40", "dep3":"6.7" }
}

What You Should Use Instead

The configuration language you choose will depend on your application. Each language has different pros and cons, but here are some choices to consider. They are all languages that are designed for configuration first and would each be a better choice than a data language like JSON.

TOML

TOML is an increasingly popular configuration language. It is used by Cargo (Rust build tool), pip (Python package manager), and dep (golang dependency manager). TOML is somewhat similar to the INI format, but unlike INI, it has a standard specification and well-defined syntax for nested structures. It is substantially simpler than YAML, which is attractive if your configuration is fairly simple. But if your configuration has a significant amount of nested structure, TOML can be a little verbose, and another format, such as YAML or HOCON, may be a better choice.

name = "example"
description = """
A really long description that needs multiple lines.
This is a sample project to illustrate why JSON is not a \
good configuration format. This description is pretty long, \
but it doesn't have any way to go onto multiple lines."""
version = "0.0.1"
main = "index.js"
# This is a comment
keywords = ["example", "config"]

[bugs]
url = "https://example.com/bugs"

[scripts]
test = "./test.sh"
do_stuff = "./do_stuff.sh"

[[contributors]]
name = "John Doe"
email = "johndow@example.com"

[[contributors]]
name = "Ivy Lane"
url = "https://example.com/ivylane"

[dependencies]
dep1 = "^1.0.0"
# Why we depend on dep2
dep2 = "3.40"
dep3 = "6.7"

HJSON

HJSON is a format based on JSON but with greater flexibility to make it more readable. It adds support for comments, multi-line strings, unquoted keys and strings, and optional commas. If you want the simple structure of JSON but something more friendly for configuration files, HJSON is probably the way to go. There is also a command line tool that can convert HJSON to JSON, so if you are using a tool that requires plain JSON, you can write your configuration in HJSON and convert it to JSON as a build step. JSON5 is another option that is pretty similar to HJSON.

{
  name: example
  description:
    '''
    A really long description that needs multiple lines.
    This is a sample project to illustrate why JSON is not a
    good configuration format. This description is pretty long,
    but it doesn't have any way to go onto multiple lines.
    '''
  version: 0.0.1
  main: index.js
  # This is a comment
  keywords: ["example", "config"]
  scripts: {
    test: ./test.sh
    do_stuff: ./do_stuff.sh
  }
  bugs: {
    url: https://example.com/bugs
  }
  contributors: [
    {
      name: John Doe
      email: johndoe@example.com
    }
    {
      name: Ivy Lane
      url: https://example.com/ivylane
    }
  ]
  dependencies: {
    dep1: ^1.0.0
    # Why we have this dependency
    dep2: "3.40"
    dep3: "6.7"
  }
}

HOCON

HOCON is a configuration format designed for the Play Framework and is fairly popular among Scala projects. It is a superset of JSON, so existing JSON files can be used. Besides the standard features of comments, optional commas, and multi-line strings, HOCON supports importing from other files, referencing other keys to avoid duplicating values, and using dot-delimited keys to specify paths to a value, so users do not have to put all values directly in a curly-brace object.

name = example
description = """
A really long description that needs multiple lines.
This is a sample project to illustrate why JSON is not a
good configuration format. This description is pretty long,
but it doesn't have any way to go onto multiple lines.
"""
version = 0.0.1
main = index.js
# This is a comment
keywords = ["example", "config"]
scripts {
  test = ./test.sh
  do_stuff = ./do_stuff.sh
}
bugs.url = "https://example.com/bugs"
contributors = [
  {
    name = John Doe
    email = johndoe@example.com
  }
  {
    name = Ivy Lane
    url = "https://example.com/ivylane"
  }
]
dependencies {
  dep1 = ^1.0.0
  # Why we have this dependency
  dep2 = "3.40"
  dep3 = "6.7"
}

YAML

YAML (YAML Ain’t Markup Language) is a very flexible format that is almost a superset of JSON and is used in several prominent projects such as Travis CI, Circle CI, and AWS CloudFormation. Libraries for YAML are almost as ubiquitous as those for JSON. In addition to supporting comments, newline delimiting, multi-line strings, bare strings, and a more flexible type system, YAML also allows you to reference earlier structures in the file to avoid code duplication.

The main downside to YAML is that the specification is pretty complicated, which results in inconsistencies between different implementations. It also treats indentation levels as syntactically significant (similar to Python), which some people like and others don’t. It can also make copy and pasting tricky. See YAML: probably not so great after all for a more complete description of downsides to using YAML.

name: example
description: >
  A really long description that needs multiple lines.
  This is a sample project to illustrate why JSON is not a
  good configuration format. This description is pretty long,
  but it doesn't have any way to go onto multiple lines.
version: 0.0.1
main: index.js
# this is a comment
keywords:
  - example
  - config
scripts:
  test: ./test.sh
  do_stuff: ./do_stuff.sh
bugs:
  url: "https://example.com/bugs"
contributors:
  - name: John Doe
    email: johndoe@example.com
  - name: Ivy Lane
    url: "https://example.com/ivylange"
dependencies:
  dep1: ^1.0.0
  # Why we depend on dep2
  dep2: "3.40"
  dep3: "6.7"

Scripting Language

If your application is written in a scripting language such as Python or Ruby, and you know the configuration comes from a trusted source, the best option may be to simply use a file written in that language for your configuration. It’s also possible to embed a scripting language such as Lua in compiled languages if you need a truly flexible configuration option. Doing so gives you the full flexibility of the scripting language and can be simpler to implement than using a different configuration language. The downside to using a scripting language is it may be too powerful, and of course, if the source of the configuration is untrusted, it introduces serious security problems.
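
As a minimal sketch of this approach (the file name and keys are illustrative), a Python application could keep its settings in an ordinary module and simply import it:

# config.py -- settings are plain Python, so comments, multi-line strings,
# and even computed values come for free
name = "example"
description = """
A really long description that needs multiple lines.
"""
dependencies = {
    "dep1": "^1.0.0",
    # Why we depend on dep2
    "dep2": "3.40",
}

The application then reads the values with a plain import, e.g. import config followed by config.name.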

Write Your Own

If for some reason a key-value configuration format doesn’t meet your needs, and you can’t use a scripting language due to performance or size constraints, then it might be appropriate to write your own configuration format. But if you find yourself in this scenario, think long and hard before making a choice that will not only require you to write and maintain a parser but also require your users to become familiar with yet another configuration format.

Conclusion

With so many better options for configuration languages, there’s no good reason to use JSON. If you are creating a new application, framework, or library that requires configuration, choose something other than JSON.

Original Link

GDPR Forget-Me App (Part 3): Conditional Configuration With Spring Boot 2

In the previous part, I explained message flows in detail by implementing inbound and outbound messaging with Spring Integration’s AMQP support. I briefly mentioned that data handler adapters are loaded dynamically and plugged into the message flow. In this third part, we’ll explore in detail one of the technical challenges that the application’s modular design raises and how it can be tackled by using Spring Boot 2’s new property Binder API.

What Will You Learn After Reading This Part?

For most use cases, having a predefined, static message flow is sufficient. However, that’s not the case for the forget-me app, as multiple data handlers can be configured to carry out data erasure. One major challenge to address is deciding whether or not a particular data handler needs to be initialized and plugged into the main message flow. I can tell you beforehand that Spring’s conditional configuration will be used to do that.

You can find the entire source code of the app on GitHub. Be aware, however, that the app hasn’t been released yet and I do code reorganizations from time to time.

Conditional Configuration With Spring Boot 2

I had been searching for a proper solution for configuring and initializing the data handlers’ child application contexts dynamically. Eventually, I stumbled upon OAuth2ClientRegistrationRepositoryConfiguration, and it gave me a few ideas.

The app is going to come with a fixed number of built-in modules (data handlers, or adapters, as I sometimes refer to them). They are pre-configured with both metadata and runtime configuration data. Here is an example of such a configuration:

forgetme:
  data-handler:
    registration:
      mailerlite:
        name: mailerlite
        display-name: MailerLite
        description: Email Marketing
        url: https://www.mailerlite.com/
        data-scopes:
          - notification
          - profile
    provider:
      mailerlite:
        api-key: ${MAILERLITE_API_KEY:#{null}}

If you used the new OAuth2 support in Spring Security, you probably noticed that this piece of configuration looks very similar.

The first part (registration) of this configuration holds metadata about the data handler, while the second part (provider) may contain arbitrary key-value pairs for configuring it. In this case, MailerLite needs only an API key.

Here is how this is going to work. When the MAILERLITE_API_KEY is defined, the corresponding child application context gets loaded. Otherwise, it remains inactive. As the configuration key/value pairs for individual data handlers cannot be known in advance, Spring Boot 2’s property Binder API is a good fit for loading them.

@Getter
@Setter
public class DataHandlerRegistration {

    static final Bindable<Map<String, String>> DATA_HANDLER_PROVIDER_BINDABLE =
            Bindable.mapOf(String.class, String.class);

    static final String DATA_HANDLER_PROVIDER_PREFIX = "forgetme.data-handler.provider";

    static final Bindable<Map<String, DataHandlerRegistration>> DATA_HANDLER_REGISTRATION_BINDABLE =
            Bindable.mapOf(String.class, DataHandlerRegistration.class);

    static final String DATA_HANDLER_REGISTRATION_PREFIX = "forgetme.data-handler.registration";

    private String name;
    private String displayName;
    private String description;
    private URI url;
    private Set<DataScope> dataScopes;

    public Optional<URI> getUrl() {
        return Optional.ofNullable(url);
    }

    public void validate() {
        Assert.hasText(getName(), "Data handler name must not be empty.");
        Assert.hasText(getDisplayName(), "Data handler display-name must not be empty.");
        Assert.hasText(getDescription(), "Data handler description must not be empty.");
        Assert.notEmpty(getDataScopes(), "Data handler data-scopes must not be empty.");
    }

    public enum DataScope {
        ACCOUNT,
        CORRESPONDENCE,
        ENQUIRY,
        NOTIFICATION,
        PROFILE,
        PUBLICATION,
        USAGE;
    }
}

What’s relevant here is the DATA_HANDLER_PROVIDER_BINDABLE, which is basically a mapping between a set of properties and a binding definition. The framework returns a BindResult object. Although it resembles its well-known counterpart, BindingResult from Spring MVC, it also embraces lambdas. You can use it in a similar way to java.util.Optional.

public abstract class AbstractDataHandlerConfiguredCondition extends SpringBootCondition {

    private final String dataHandlerName;

    public AbstractDataHandlerConfiguredCondition(String dataHandlerName) {
        this.dataHandlerName = dataHandlerName;
    }

    @Override
    public ConditionOutcome getMatchOutcome(ConditionContext context, AnnotatedTypeMetadata metadata) {
        ConditionMessage.Builder message = ConditionMessage
                .forCondition("Data handler configured:", dataHandlerName);

        Map<String, String> dataHandlerProperties =
                getDataHandlerProperties(context.getEnvironment());

        if (isDataHandlerConfigured(dataHandlerProperties)) {
            return ConditionOutcome.match(message.available(dataHandlerName));
        }

        return ConditionOutcome.noMatch(message.notAvailable(dataHandlerName));
    }

    protected abstract boolean isDataHandlerConfigured(Map<String, String> dataHandlerProperties);

    private Map<String, String> getDataHandlerProperties(Environment environment) {
        String propertyName = DATA_HANDLER_PROVIDER_PREFIX + "." + dataHandlerName;

        return Binder.get(environment)
                .bind(propertyName, DATA_HANDLER_PROVIDER_BINDABLE)
                .orElse(Collections.emptyMap());
    }
}

Actual data handler adapters implement AbstractDataHandlerConfiguredCondition, where they define which constellation of provider properties enables that particular data handler. In MailerLite’s case there is only a single property, the API key, and the handler is enabled as long as it contains a non-empty piece of text.

@Configuration
@Conditional(MailerLiteConfiguredCondition.class)
public class MailerLiteFlowConfig extends AbstractDataHandlerFlowConfig {

    static final String DATA_HANDLER_NAME = "mailerlite";

    @Override
    protected String getDataHandlerName() {
        return DATA_HANDLER_NAME;
    }

    static class MailerLiteConfiguredCondition extends AbstractDataHandlerConfiguredCondition {

        public MailerLiteConfiguredCondition() {
            super(DATA_HANDLER_NAME);
        }

        @Override
        protected boolean isDataHandlerConfigured(Map<String, String> dataHandlerProperties) {
            return Optional.ofNullable(dataHandlerProperties.get("api-key"))
                    .filter(StringUtils::hasText)
                    .isPresent();
        }
    }
}

Here, you can see MailerLite’s own configuration, which is enabled only when an API key is set. Most of the heavy lifting, creating and configuring the child application context, is done by AbstractDataHandlerFlowConfig.

Conclusion

Using conditional configuration with Spring Boot 2’s new property Binder API is a powerful combination.

  • It comes in very handy when you want to bind an arbitrary set of key-value pairs without knowing in advance whether they’re present.
  • Compared to Environment, it’s much more convenient to use, and it provides an API similar to java.util.Optional.
  • You can even delay the initialization of @ConfigurationProperties-annotated configuration, because configuration data is available through the Binder API even before that happens.

Original Link

Configuring Memory for Postgres

work_mem is perhaps the most confusing setting within Postgres. work_mem is a configuration within Postgres that determines how much memory can be used during certain operations. At its surface, the work_mem setting seems simple: after all, work_mem just specifies the amount of memory available to be used by internal sort operations and hash tables before writing data to disk. And yet, leaving work_mem unconfigured can bring on a host of issues. What perhaps is more troubling, though, is when you receive an out of memory error on your database and you jump in to tune work_mem, only for it to behave in an un-intuitive manner.

Setting Your Default Memory

The work_mem value defaults to 4MB in Postgres, and that’s likely a bit low. This means that each Postgres activity (each join, some sorts, etc.) can consume 4MB before it starts spilling to disk. When Postgres starts writing temp files to disk, things will obviously be much slower than in memory. You can find out if you’re spilling to disk by searching for “temporary file” within your PostgreSQL logs when you have log_temp_files enabled. If you see “temporary file”, it can be worth increasing your work_mem.
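
As a rough sketch, the relevant postgresql.conf entries look like this (the values shown are illustrative, not recommendations):

# postgresql.conf
work_mem = 64MB       # memory each sort/hash operation may use before spilling to disk
log_temp_files = 0    # log every temporary file created, so spills show up in the logs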

On Citus Cloud (our fully-managed database as a service that scales out Postgres horizontally), we automatically tune work_mem based on the overall memory available to the box. Our tuning is based on the years of experience of what we’ve seen work for a variety of production Postgres workloads, coupled with statistics to compute variations based on cluster sizing.

It’s tough to get work_mem exactly right, but often a sane default is something like 64 MB if you’re looking for a one-size-fits-all answer.

It’s Not Just About the Memory for Queries

Let’s use an example to explore how to think about optimizing your work_mem setting.

Say you have a certain amount of memory, say 10 GB. If you have 100 running Postgres queries, and each of those queries has a 10 MB connection overhead, then 100*10 MB (1 GB) of memory is taken up by the 100 connections, which leaves you with 9GB of memory.

With 9 GB of memory remaining, say you give 90 MB to work_mem for the 100 running queries. But wait, it’s not that simple. Why? Well, work_mem isn’t set on a per-query basis; rather, it applies to each sort/hash operation. But how many sorts, hashes, and joins happen per query? Now that is a complicated question, made more complicated if you have other processes that also consume memory, such as autovacuum.

Let’s reserve a little for maintenance tasks and for vacuum and we’ll be okay as long as we limit our connections right? Not so fast my friend.

Postgres now has parallel queries. If you’re using Citus for parallelism, you’ve had this for a while, but now you have it on single-node Postgres as well. What this means is that a single query can have multiple processes running and performing work. This can result in significant improvements in query speed, but each of those running processes can consume the specified amount of work_mem. With our 64 MB default and 100 connections, we could now have each connection running work on every core, consuming far more memory than we anticipated.

More work_mem, More Problems

So, we can see that getting it perfect is a little more work than ideal. Let’s go back a little and try this more simply. We can start work_mem small, at say 16 MB, and gradually increase it when we see “temporary file” in the logs. But why not give each query as much memory as it would like? If we were to just say each process could consume up to 1 GB of memory, what’s the harm? Well, the other extreme is that queries begin consuming too much memory, more than you have available on your box. Picture 100 queries, each with 5 different sort operations and a few hash joins in them: it’s in fact very possible to exhaust all the memory available to your database.

When you consume more memory than is available on your machine, you can start to see out of memory errors within your Postgres logs, or in worse cases, the OOM killer can start to randomly kill running processes to free up memory. An out of memory error in Postgres simply errors on the query you’re running, whereas the OOM killer in Linux begins killing running processes, which in some cases might even include Postgres itself.

When you see an out of memory error, you either want to increase the overall RAM on the machine itself by upgrading to a larger instance, or you want to decrease the amount of memory that work_mem uses. Yes, you read that right: when you run out of memory, it’s often better to decrease work_mem rather than increase it, since work_mem is the amount of memory each operation can consume, and too many operations are leveraging up to that much memory.

General Guidance for work_mem

While you can continually tune and tweak work_mem, a couple of broad guidelines for pairing to your workload can generally get you into a good spot:

If you have a number of short-running queries that run very frequently and perform simple lookups and joins, then maintaining a lower work_mem is ideal; in this case, you get diminishing returns from setting it significantly higher because the extra memory simply goes unused. If your workload consists of relatively few active queries at a time that do very complex sorts and joins, then granting more memory to prevent things from spilling can give you great returns.

Happy Database Tuning

Postgres’ powerful feature set and flexibility mean you have a lot of knobs you can turn and levers you can pull when tuning it. Postgres is often used for embedded systems and time series data, as well as OLTP and OLAP workloads. This flexibility can often mean an overwhelming set of options when tuning. On Citus Cloud, we’ve configured this to be suitable for most workloads we see. Think of it as one size fits most, and then, when you need to, you’re able to customize. If you’re not running on Citus Cloud, consider leveraging pgtune to help you get to a good starting point.

Original Link

Property Injection in Java With CDI

One of the more common tasks faced in Java application development environments is the need to obtain constant values (Strings, numbers, etc.) from external properties files or the environment. Out of the box, Java provides methods such as System.getenv and Properties.load for retrieving such values, but their use can often lead to excess boilerplate code and logic checking for missing values in order to apply defaults, etc. Using CDI or CDI extensions, most of the boilerplate can be avoided, moving property retrieval and usage out of the way.

Consider the simple case where we have a class declaring a String field that must be set dynamically at runtime to a property value. The assumption is that this class is being managed by a CDI runtime such as a Java EE container.

In this example, we would like to set the value of `simple` to be the value of a system property, if available. Otherwise, set the value to be the property contained in a standard format properties file on the class path. Finally, set the value to be an empty String if nothing is found. By convention, both the system property and the name of the properties file match the fully qualified name of the class plus the field. Exceptions are thrown to the caller for the sake of brevity.

package com.example.injection;

import java.io.InputStream;
import java.util.Properties;

public class Example {

    private String simple = null;

    public String getSimple() throws Exception {
        if (this.simple == null) {
            // Do we have a System property to use?
            String systemSimple = System.getProperty("com.example.injection.Example.simple");

            if (systemSimple == null) {
                /* No System property found, check in
                 * Example.properties on class path */
                Properties classProperties = new Properties();
                ClassLoader loader = getClass().getClassLoader();
                String resName = "com/example/injection/Example.properties";

                try (InputStream in = loader.getResourceAsStream(resName)) {
                    classProperties.load(in);
                }

                this.simple = classProperties.getProperty("simple", "");
            } else {
                this.simple = systemSimple;
            }
        }

        return this.simple;
    }
}

There is quite a bit of code here for something that, on the surface, seemed to be a simple task. The level of complexity increases if we want to parse the property into another type of object such as an Integer or a Date, or if we need to cache the Properties for performance reasons. No developer wants to pollute the code with methods like this (not to mention test it).

Enter CDI and CDI extensions. In a Java EE environment or other runtime that supports CDI, we can replicate the functionality above in a much simpler way.

package com.example.injection;

import javax.inject.Inject;
import io.xlate.inject.Property;

public class Example {

    @Inject
    @Property(defaultValue = "")
    private String simple;

    public String getSimple() {
        return this.simple;
    }
}

This example makes use of a small library Property Inject to obtain values from system properties and/or property files. Using default naming conventions, all of the logic from the earlier example is handled under the hood. When necessary, we can override the default behavior and specify the name of the system property to use, the URL containing the Properties we want to reference, and the name of the key within those properties.

Consider the example where we would like to override the property location and naming. Below, the CDI extension will first attempt to find the value in the system property called `global.simple` (e.g., the command line argument -Dglobal.simple="really simple"). If not found, the properties file named `config/my-app.properties` will be loaded from the class path and searched for the entry `example.simple`. Finally, if nothing has been found, the value will default to null since no defaultValue has been defined.

package com.example.injection;

import javax.inject.Inject;
import io.xlate.inject.Property;
import io.xlate.inject.PropertyResource;

public class Example {

    @Inject
    @Property(name = "example.simple",
              resource = @PropertyResource("classpath:config/my-app.properties"),
              systemProperty = "global.simple")
    private String simple;

    public String getSimple() {
        return this.simple;
    }
}

In addition to Strings, Property Inject also supports the injection of all Java primitive types and their wrapper classes, BigInteger, BigDecimal, Date, JsonArray, JsonObject, and java.util.Properties collections themselves.

How are you using properties in your CDI-enabled applications today?

Original Link

Logging With Log4j in Java

If we use SOP (System.out.print()) statements to print log messages, then we can run into some disadvantages:

  1. We can print log messages on the console only. So, when the console is closed, we will lose all of those logs.
  2. We can’t store log messages in any permanent place. These messages will print one by one on the console because it is a single-threaded environment.

To overcome these problems, the Log4j framework came into the picture. Log4j is an open source framework provided by Apache for Java projects.

Log4j Components

Log4j has three main components, which are the following:

  1. Logger
  2. Appender
  3. Layout

Logger

Logger is a class in the org.apache.log4j.* package. We have to initialize one Logger object for each Java class. We use Logger’s methods to generate log statements. Log4j provides the factory method to get Logger objects.

Syntax to get Logger objects:

static Logger logger = Logger.getLogger(CurrentClass.class.getName());

Note: CurrentClass is the name of the Java class for which we are getting the Logger object.

Example

public class Student {

    private static final Logger LOGGER = Logger.getLogger(Student.class);

    public void getStudentRecord() {
    }
}

The Logger class has some methods that are used to print application status.

We have five methods in the Logger class

  1. info()
  2. debug()
  3. warn()
  4. fatal()
  5. error()

How and when to use these methods depends on us. Here, the method names are different, but the process is the same for all of them: all will print a message only.

Levels

Level is a class in the org.apache.log4j.* package. We can also make a custom level by extending the Level class. Each level has a different priority order, like this:

debug < info < warn < error < fatal

This means fatal has the highest priority and is used for severe errors, such as when the database is down.

Appender

Appender is used to write messages to a destination such as a file, a database, or an SMTP server.

Log4j has different types of appenders:

  1. SyslogAppender
  2. SMTPAppender
  3. JDBCAppender
  4. FileAppender
  5. SocketHubAppender
  6. SocketAppender
  7. TelnetAppender
  8. ConsoleAppender

Layout

This is used to define the format in which log messages are written to a destination.

We have different types of layouts:

  1. PatternLayout
  2. SimpleLayout
  3. XMLLayout
  4. HTMLLayout

Log4j: Configuration

log4j.properties

# Root logger option
log4j.rootLogger=INFO, file, stdout

# configuration to print into file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=D:\log\logging.log
log4j.appender.file.MaxFileSize=12MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# configuration to print on console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

Description of the log4j.properties file:

The log4j.rootLogger entry sets the root logging level and lists the appenders, and the log4j.appender.file and log4j.appender.stdout entries define the appender types, i.e. where application logs are stored: RollingFileAppender writes all logs to a file, while ConsoleAppender prints all logs to the console.

log4j.appender.file.File specifies the log file location.

The layout and ConversionPattern entries specify the pattern in which logs are written to the file and to the console.

Example:

import org.apache.log4j.Logger;

public class Student {

    static Logger logger = Logger.getLogger(Student.class);

    public static void main(String[] args) {
        logger.debug("This is debug message");
        logger.info("This is info message");
        logger.warn("This is warn message");
        logger.fatal("This is fatal message");
        logger.error("This is error message");
        System.out.println("Logic executed successfully....");
    }
}

logging.log(log file):

2018-05-02 16:01:45 INFO Student:12 - This is info message
2018-05-02 16:01:45 WARN Student:13 - This is warn message
2018-05-02 16:01:45 FATAL Student:14 - This is fatal message
2018-05-02 16:01:45 ERROR Student:15 - This is error message

It does not print debug-level logs because we defined our root logger at INFO level in the log4j.properties file. Only messages with a priority equal to or higher than INFO are printed.

Console logs:

16:01:45,511  INFO Student:12 - This is info message
16:01:45,517  WARN Student:13 - This is warn message
16:01:45,517 FATAL Student:14 - This is fatal message
16:01:45,518 ERROR Student:15 - This is error message
Logic executed successfully....

Original Link

Get to Know Customization: JSON Binding Overview Series

Let’s take a look at how the annotation model and runtime configuration work when customizing the JSON Binding serialization and deserialization processes.

Annotation Method

Using the annotation method, it’s possible to customize the default serialization and deserialization behavior by annotating fields, JavaBean methods, and classes.

@JsonbNillable
@JsonbPropertyOrder(PropertyOrderStrategy.REVERSE)
public class Book {

    @JsonbProperty("cost")
    @JsonbNumberFormat("#0.00")
    private Float price;
}

For example, you could use the @JsonbNillable annotation to customize null handling and the @JsonbPropertyOrder annotation to customize the property order. These two annotations are specified at the class level.

You could specify the number format with the @JsonbNumberFormat annotation and change the name of a field with the @JsonbProperty annotation.

Runtime Configuration

Alternatively, you could choose to handle customization with the runtime configuration builder, by configuring an instance of JsonbConfig and passing it to the create method of the JsonbBuilder, as shown in this code snippet.

JsonbConfig jsonbConfig = new JsonbConfig()
        .withPropertyNamingStrategy(PropertyNamingStrategy.LOWER_CASE_WITH_DASHES)
        .withNullValues(true)
        .withFormatting(true);

Jsonb jsonb = JsonbBuilder.create(jsonbConfig);

Either way, the JSON Binding API provides extensive capabilities for the serialization and deserialization of Java objects. Let’s move on and look at how JSON-B handles custom object creation.

There is plenty more to know about the JSON Binding API than what I talk about in these blog posts. In my new book, Java EE 8: Only What’s New, I cover this API in much more detail.

Original Link

Configurations: Are You Doing it Wrong?

Let’s go over some configuration formats that most of us may be familiar with. Talking about the good old days (not really sure about “good,” though), the de-facto standard configuration format in Java was properties. Wait… this historic configuration format is still widely used nowadays. The parser (a.k.a. java.util.Properties, a.k.a. the Hashtable with load and store functions) was released in JDK 1.0. Did you know that you can parse a .properties file, store it in a Properties object (a Hashtable), and then export it into an XML file? It sounds promising until you see its DTD (not a schema).

<!-- Copyright 2006 Sun Microsystems, Inc. All rights reserved.-->
<!-- DTD for properties -->
<!ELEMENT properties ( comment?, entry* ) >
<!ATTLIST properties version CDATA #FIXED "1.0">
<!ELEMENT comment (#PCDATA) >
<!ELEMENT entry (#PCDATA) >
<!ATTLIST entry key CDATA #REQUIRED>

Based on this DTD, the configuration is still flat. It is just… hmm… more verbose. It’s probably not a good idea to convert a succinct file format into another format that is more powerful without using that power. XML is a good alternative because it supports types (you can even define your own) and namespaces (not XML namespaces, but configuration namespaces via a nested structure). It’s fine if you don’t mind the verbosity and the complexity that come with XML. Anyway, if you’re going to use XML as your configuration file format, just don’t use java.util.Properties. Do yourself a favor and go for Apache commons-configuration.
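
For reference, the round trip mentioned above, loading a .properties file and exporting it as XML, is just a pair of calls (a minimal sketch; the file paths are illustrative):

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class PropertiesToXml {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get("app.properties"))) {
            props.load(in); // parse the flat .properties file
        }
        try (OutputStream out = Files.newOutputStream(Paths.get("app.xml"))) {
            props.storeToXML(out, "exported from app.properties"); // emits the DTD-based XML shown above
        }
    }
}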

What Else Do We Have Besides .properties and XML?

JSON? JSON is nice, but you may need something that you can put comments on. This format does not support comments. Putting a comment in a field as part of data is not quite practical.

Sure, everyone these days knows YAML. There are also other formats such as TOML and HOCON. They’re all awesome. It will come down to a personal preference when talking about which one is the best and what format you should use if there are no other restrictions. I personally like Lightbend config, which is a configuration library for HOCON.

Let’s take a look at the simple configuration written in properties file format.

# DRY (Do Repeat Yourself)
configuration.parser = java.util.Properties
configuration.path = /usr/local/etc/...
# Array (No such thing as an array. Parse your own string)
configuration.formats = XML, PROPERTIES

Note that the properties format is perfectly legit in HOCON, as is JSON. Lightbend config can parse a configuration file in either properties or JSON format without any problem. Let’s refactor the properties config a bit to make it look good in HOCON, as follows.

# DRY (Don't Repeat Yourself)
configuration {
  parser = "com.typesafe.config.ConfigFactory"
  path = "/usr/local/etc/..."
  formats = ["JSON", "PROPERTIES", "HOCON"]
}

There are a lot of other things that you’ll love in HOCON (same for YAML and TOML). Just stay away from Properties if you’re going to implement something serious these days. I don’t need to go into the details on how to use these modern configuration file formats because they all have a ton of great documentation out there that are surely better sources than me.

One thing that I need to point out, since we’re in an era where multi-threaded systems are the norm: Properties is thread-safe because it is a Hashtable. But that comes at a cost, because the read method is synchronized; in other words, no other thread can enter the read method until the current one finishes. And if you think that’s bad, there is a worse issue: it’s mutable! The configuration should be a single source of truth. Thus, it must not be altered at runtime. Period.

Ok. Let’s stop trash talking java.util.Properties. It had its day. The problem is that it still does. (Alright, I’m stopping…)

My next topic will be about Java System’s Properties. You should avoid it at all costs. Ok, sometimes the cost is too high to bear. Avoid it if it’s possible — or make it possible. It has every single bad trait of java.util.Properties because it is actually a java.util.Properties on a global scale.

Avoid Using System’s Properties

The following are some reasons why you should avoid using System’s properties as your source of configuration.

  1. It’s slow in a multi-threading environment because Hashtable methods are synchronized.
  2. It’s mutable. It sounds like a good idea that you can override a configuration at runtime. It’s actually a nightmare in a large system.
  3. We learned in programming 101 that the global variable is bad. System’s properties are like global variables. Assume that you have a module that uses system properties as its configuration. You cannot have more than one instance of that module running in the same JVM with a different set of system properties sharing the same names. You need to run it in a separate VM. That module may need 30 other modules to function properly. The workaround of having a separate VM is very inefficient in this case.
  4. It’s very hard to test. Many test frameworks can run tests in parallel, which could save a tremendous time on a regression test. How can you test your application with a different set of system properties in this case? You have to disable the parallel feature in your test framework, or it will be a tough task for you to write proper tests in that condition — or else don’t write tests at all (take this advice at your own risk).

What if You Cannot Avoid It?

I know it’s virtually impossible to avoid using system properties as a configuration in reality. It’s super convenient. I’d be surprised if it were not used at all. Actually, many popular and production-grade libraries and frameworks use it, and they pass this obligation onto you.

However, there is a way to mitigate the issue of having configurations in system properties. First, you have to lower the number of configurations in system properties as much as possible. Basically, you just don’t add your own configurations to system properties. Second, you have to convert the configurations in system properties into a Config object (Lightbend config).

Here is a short snippet on how to do that.

Config config = ConfigFactory.load();

That’s it! The load method loads and combines the configuration in the following order: system properties, application.conf, application.json, application.properties, and reference.conf. It’s ordered by priority as well. If you have a configuration named “format” in both system properties and application.conf, the one in system properties will be used. 

Note that the document says that this method should be used by libraries and frameworks. You may want to consider another way to construct a config object.

You can customize the load order and precedence however you like:

Config config = ConfigFactory.systemProperties()
        .withFallback(ConfigFactory.parseFile(new File("myapp.conf")))
        .withFallback(ConfigFactory.parseResources("myapp.conf"));

The config above will load system properties, the myapp.conf file in the current directory, and the myapp.conf file in the classpath, respectively. Again, you can change the order any way you want. This makes it easy for testing as well because you can override anything easily. The config object is immutable. Thus, it’s thread safe without any performance penalty. It can convert system properties into a Config object. No one can mess with your system at runtime. You just have to make sure that you load the configuration before anyone else.

While system properties are bad in many respects, they’re useful sometimes. Just make sure to keep their usage minimal. My statement about avoiding them at all costs was a total exaggeration. Thank you for reading.

Original Link

Changing the Default Port of Spring Boot Apps [Snippets]

By default, Spring Boot applications run on an embedded Tomcat via port 8080. In order to change the default port, you just need to modify the server.port attribute, which is automatically read at runtime by Spring Boot applications.

In this tutorial, we provide a few common ways of modifying the server.port attribute.

application.properties

Create an application.properties file under src/main/resources and define the server.port attribute inside it:

server.port=9090

EmbeddedServletContainerCustomizer

You can customize the properties of the default servlet container by implementing the EmbeddedServletContainerCustomizer interface as follows:

package com.programmer.gate;

import org.springframework.boot.context.embedded.ConfigurableEmbeddedServletContainer;
import org.springframework.boot.context.embedded.EmbeddedServletContainerCustomizer;

public class CustomContainer implements EmbeddedServletContainerCustomizer {

    @Override
    public void customize(ConfigurableEmbeddedServletContainer container) {
        container.setPort(9090);
    }
}

The port defined inside the CustomContainer always overrides the value defined inside application.properties.

Command Line

The third way is to set the port explicitly when starting up the application through the command line. You can do this in two different ways:

  • java -Dserver.port=9090 -jar executable.jar
  • java -jar executable.jar --server.port=9090

A port defined this way overrides any port defined through the other methods.

Original Link

Spotlight on Kubernetes

It’s easy to overlook how young Kubernetes is in the world of containerization. Given its explosion in popularity, you’d be forgiven for forgetting that the software is not even four years old yet. Those using the software are quite literally on the frontier of cutting-edge technology that is leaving other platforms in its wake.

With the aid of containers, software development is simplified for developers through the abstraction of application execution details. Getting these operations right though has become critical for competing platforms. Running modular containerized deployments allows IT teams to drastically reduce overheads as well as operational complexity, in comparison to virtual machines.

Last year, Docker—the original frontrunner of container technology since its release in 2013—ceded the orchestration floor announcing that it too would be offering upcoming support for Kubernetes (also known as Kube or by the numeronym K8s). The revelation was made by CTO Solomon Hykes in October 2017 and engineers can already sign up for the beta version here.

Introduction to Kubernetes

Developed by Google, Kubernetes is a powerful platform for managing containerized applications in a clustered environment. In this article, we’ll shine a spotlight on the architecture, examine the problems it can solve, and take a look at the components the Kube model uses to handle containerized deployments and scaling.

So, What Is Kube?

Kube helps address the logistical challenges organizations face in managing and orchestrating containers in production, development, and test environments through declarative code, which limits errors. Without Kube’s—or Docker Swarm’s—container orchestration capabilities, teams would need to manually update hundreds of containers every time new features are released, making deployments error-prone and slow. Kube is an open source container orchestration platform for managing distributed containerized applications at massive scale. With Kube, teams can automate application configuration, manage application life cycles, and maintain and track resource allocations within server clusters. Containerized applications typically run on Docker or rkt.

As well as automating release and deployment processes, Kube provides a “self-healing” environment for application infrastructure should anything crash. In the event of an incident, the platform reconciles the observed cluster state with the user’s desired state. If a worker node crashes, for example, all of its pods are rescheduled to available nodes.

The Benefits

With Kubernetes, teams can:

  • Automatically, and immediately, scale clusters on demand (scaling back when not needed to save resources and money).
  • Run anywhere: on-premises, in a public or private cloud (e.g., AWS or Google Cloud), or in a hybrid configuration.
  • Deploy consistently across bare metal, local development, and cloud environments, thanks to its portability.
  • Spend less time debugging and more time delivering business value.
  • Separate and automate operations and development.
  • Rapidly iterate deployment cycles and improve system resilience.

Components

The learning curve for Kube is considered somewhat steeper than Docker's, as concepts from vanilla Docker Swarm don't translate directly. To work productively with Kube, therefore, it's worth understanding its components and their functionality within the architecture.

Pods

“Pods are the smallest deployable units of computing that can be created and managed in Kubernetes,” states Kubernetes’ official docs. In Docker, such units are single containers. In Kubernetes, however, a pod can contain a single container, but it is not limited to one and can include as many as necessary. All containers within a pod run together as though on a single host, sharing a set of Linux namespaces, an IP address, and port space. As a group of one or more containers, the containers in a pod communicate over the standard inter-process communication (IPC) namespace and access the same shared volumes. By itself, a pod is ephemeral and will not be rescheduled to a new node if it dies. This can be overcome by keeping one or more instances of a pod alive with replica sets. More on these later.
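To make this concrete, here is a minimal pod manifest sketch; the names, image, and volume are illustrative assumptions rather than anything from the article. It shows two containers sharing the pod's IP address and a volume:

apiVersion: v1
kind: Pod
metadata:
  name: web-pod              # hypothetical name, for illustration only
  labels:
    app: web
spec:
  volumes:
    - name: shared-data      # emptyDir volume shared by both containers
      emptyDir: {}
  containers:
    - name: app
      image: nginx:1.15      # illustrative image
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: sidecar
      image: busybox         # illustrative sidecar sharing the same IP and volume
      command: ["sh", "-c", "while true; do date > /data/index.html; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data

Applied with kubectl apply -f, both containers are scheduled together on the same node; the sidecar writes a file that the web container serves, illustrating the shared volume and network namespace.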

Labels & Selectors

Labels are key/value attributes that can be assigned to objects such as pods or nodes. They should be used to capture distinguishing object characteristics that are significant and appropriate to the user. Labels can be assigned at the time of object creation, or they can be attached or modified later. Use labels to identify, organize, and select object subsets and to create order within the many dimensions of a development pipeline. Information such as release (beta, stable), environment type (dev, prod), and/or architectural tier (frontend/backend) can all be captured in labels. Labels and selectors work in tandem as the core means for managing objects and groups. There are two types of Kubernetes selectors: equality-based and set-based. Equality-based selectors use key-value pairs to sort objects/groups according to basic equality (or inequality). Set-based selectors sort keys according to sets of values.
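For illustration, the fragments below (keys and values are assumed examples, not taken from the article) show labels attached in an object's metadata, an equality-based selector, and a set-based selector:

# Labels attached to an object at creation time
metadata:
  labels:
    environment: dev          # e.g. dev, prod
    release: stable           # e.g. beta, stable
    tier: frontend            # e.g. frontend, backend
---
# Equality-based selector, as used in a replica set or deployment spec
selector:
  matchLabels:
    environment: dev
    tier: frontend
---
# Set-based selector using matchExpressions
selector:
  matchExpressions:
    - key: release
      operator: In
      values: [beta, stable]
    - key: environment
      operator: NotIn
      values: [prod]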

Replica Sets

As mentioned above, a pod won’t be rescheduled if the node it runs on goes down. Replica sets overcome this issue in Kube by ensuring that a specified number of pod instances (replicas) are running at any given time. Therefore, to keep your pod alive, make sure that there is at least one replica set assigned to it. As well as managing single pods, replica sets can manage, and scale to large numbers, groups of pods categorized by a common label. As much of this is automated within deployments, you will rarely need to actively manage this scaling capability, but it’s worth understanding how the system functions to better manage your applications.
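As a sketch (the name, label, and image are illustrative assumptions), a replica set that keeps three copies of a labelled pod running could look like this:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-rs               # hypothetical name
spec:
  replicas: 3                # desired number of pod instances to keep running
  selector:
    matchLabels:
      app: web               # manages all pods carrying this label
  template:                  # pod template used to create replacement pods
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: nginx:1.15  # illustrative image
          ports:
            - containerPort: 80

If a node dies or a pod is deleted, the replica set controller notices the observed count has dropped below three and creates replacement pods from the template.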

Networking

Within Kube, networking is all about connecting the network endpoints (pods). Containers in different pods must communicate via an alternative method to IPC due to their distinct IP addresses. Kubernetes networking solves this cross-node pod-to-pod connectivity as well as achieving service discovery and pod-to-pod load balancing. Pods are secured by limiting access through network segmentation. Network policies define how subsets of pods are allowed to interact with each other and other network endpoints. Configuration is on a per-namespace basis.
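As a hedged sketch (the namespace, names, labels, and port are assumptions, not from the article), a network policy that only lets frontend pods reach database pods on port 3306 might look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend    # hypothetical name
  namespace: default         # network policies are applied per namespace
spec:
  podSelector:
    matchLabels:
      app: db                # the subset of pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # only pods with this label may connect
      ports:
        - protocol: TCP
          port: 3306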

Services

A Kubernetes service is an abstraction which outlines a logical subset of pods according to labels (see above). Kube’s services identify and leverage label selectors to target the groups of pods assigned to them; this ease of endpoint management through services comes down to the labels. As well as service discovery, this abstraction provides internal load balancing for the pods within a cluster. Kubernetes provides two primary methods of finding a service. As a pod is run on a node, the kubelet (node agent) adds environment variables for each active service according to predefined conventions. The other method is to use the built-in DNS service (a cluster add-on). The DNS server monitors the Kubernetes API for new services and assigns a set of DNS records to each. When DNS is enabled throughout the cluster, all pods should be able to resolve service names automatically.
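A minimal service sketch (the names and ports are illustrative) that selects pods by label and load balances traffic to them inside the cluster:

apiVersion: v1
kind: Service
metadata:
  name: web-service          # with the DNS add-on, resolvable as web-service.<namespace>.svc
spec:
  selector:
    app: web                 # the service targets all pods carrying this label
  ports:
    - protocol: TCP
      port: 80               # port exposed by the service inside the cluster
      targetPort: 80         # container port on the selected pods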

To meet the demand for Kube support, Caylent has already begun development on a Kubernetes offering. We hope to be our audience’s go-to choice for multi-cloud Kubernetes before long.

Looking for help running containers in the more immediate future? Caylent has you covered. Check out our new DevOps-as-a-Service offering; it’s like having a full-time DevOps Engineer on staff for a fraction of the cost.

Original Link

Data Migration Assistant Custom Configuration

The Data Migration Assistant (DMA) is a great tool made available by Microsoft. Successor to the SQL Server Upgrade Advisor, the DMA will perform an assessment of your database against a target version. The DMA can also perform the migration of both schema and data, if desired.

The other day, I wanted to run DMA from a command line. When I went to the install directory, I noticed that there were some .config files available:

Data Migration Assistant configuration file

Yes, very interesting. So, I did what anyone else would do: I opened the file to have a look. The file type was set to open with Visual Studio Code by default, so opening the file was easily done. Once opened, I found what I expected: an XML file with configuration settings for DMA.

I scrolled through the file looking for interesting pieces of information. I could write a whole series of posts on what that file contains, but today, I will keep it short. We will focus on one item: stretch database recommendations.

The file contains this line:

<!-- Tables are eligible to stretch only if the number of rows is equal or greater than the recommendedNumberOfRows treshold -->
<stretchDBAdvisor useSimulator="false" timeBetweenIssues="0.00:00:00.10" timeBetweenTables="0:00:00:00.10" recommendedNumberOfRows="100000" />

So, the DMA will recommend a table as a stretch database candidate if the number of rows is equal to or greater than 100,000. We could debate all day whether that is the correct number, but the DMA needs a starting point, and 100,000 isn’t a bad place to start. But here’s where things get interesting.

If your shop has specific requirements, you can edit these config files to match your requirements. For example, maybe you want a minimum number of rows to be 1,000,000. You can edit the config file and run your assessment. Let’s take a look.

Check out what the assessment looks like against a copy of GalacticWorks with a target of SQL 2017:

[Screenshot: DMA assessment results before modifying the configuration]

You can see that we have 2 results returned, and I have highlighted the row count and size of the table. Now, we will modify the config file. I will set the default row number to 10,000, because I want to show you that the number of recommendations will increase. So, same source database; the only change being made here is the configuration of DMA. Here’s what the assessment looks like after the change:

[Screenshot: DMA assessment results after modifying the configuration]

Notice that the first image returns a total of 2 objects, one “High value” and one “Medium value”, as candidates for stretch. After modifying the config file, we see a total of 19 objects. I have highlighted the 2MB Sales.Customer table and the 19,820 rows it contains.

This is just an example to show that modifying the config file changed how the DMA worked. I am not recommending that you stretch such small tables, I just wanted to show you a quick test. Here’s a handful of other items inside the config file that may be of interest to you:

  • BCP argument defaults (both BCP in and out)

  • Database collation settings

  • Scripting options

Go download the Data Migration Assistant and have a look for yourself.

The Data Migration Assistant is a great tool to help you evaluate and migrate your database to newer versions of SQL Server, including Azure SQL Database. The DMA is also customizable to a certain degree, and can be run from a command line. With a handful of lines in PowerShell you could run assessments against a large number of databases in a short period of time.

(If you liked this post, you’ll love our session at SQL Konferenz later this month. We are going to walk you through the entire migration process, helping you to understand how to avoid the common pitfalls that affect many migration projects.)

Original Link

Karaf Configuration as a Groovy File

By default, Apache Karaf keeps configuration for bundles in the etc directory as flat properties files. We can override the configuration for the storing mechanism by providing our own implementation of the org.apache.felix.cm.PersistenceManager interface and use a much more readable format for bundle properties, e.g. Groovy config.

Turning Off Built-In Karaf Persistence

As we can read in the Karaf documentation:

Apache Karaf persists configuration using own persistence manager in case of when available persistence managers do not support that.

We will use our custom implementation of persistence, so Karaf persistence is not needed. We can turn it off by setting the variable storage to an empty value:

$ cat etc/org.apache.karaf.config.cfg
storage=

This option has been available since version 4.1.3 when this issue was resolved.

Registering a Custom Persistence Manager

First, we have to create and register an OSGi service implementing org.apache.felix.cm.PersistenceManager. If we build and install the bundle with such a service while Karaf is running (e.g. by putting a JAR in the deploy directory), then we should have at least two PersistenceManager services registered:

karaf@root()> ls org.apache.felix.cm.PersistenceManager
[org.apache.felix.cm.PersistenceManager]
----------------------------------------
 service.bundleid = 7
 service.description = Platform Filesystem Persistence Manager
 service.id = 14
 service.pid = org.apache.felix.cm.file.FilePersistenceManager
 service.ranking = -2147483648
 service.scope = singleton
 service.vendor = Apache Software Foundation
Provided by :
 Apache Felix Configuration Admin Service (7)
Used by:
 Apache Felix Configuration Admin Service (7)

[org.apache.felix.cm.PersistenceManager]
----------------------------------------
 osgi.service.blueprint.compname = groovyConfigPersistenceManager
 service.bundleid = 56
 service.id = 117
 service.pid = com.github.alien11689.osgi.util.groovyconfig.impl.GroovyConfigPersistenceManager
 service.ranking = 100
 service.scope = bundle
Provided by :
 groovy-config (56)
Used by:
 Apache Felix Configuration Admin Service (7)

Loaded configurations will be cached by the configuration admin. We can use an org.apache.felix.cm.NotCachablePersistenceManager interface if we want to implement a custom caching strategy.

Creating a New Properties File

Let’s create a new properties file in Groovy, e.g:

$ cat etc/com.github.alien11689.test1.groovy
a = '7'
b {
    c {
        d = 1
        e = 2
    }
    z = 9
}
x.y.z = 'test'

If we search for properties with the pid com.github.alien11689.test1, Karaf will find these.

karaf@root()> config:list '(service.pid=com.github.alien11689.test1)'
----------------------------------------------------------------
Pid: com.github.alien11689.test1
BundleLocation: null
Properties:
   a = 7
   b.c.d = 1
   b.c.e = 2
   b.z = 9
   service.pid = com.github.alien11689.test1
   x.y.z = test

If we make any changes to the file, they won’t be mapped to properties because there are no file watchers defined for it.

We could manage such properties using Karaf commands instead.

Managing Configuration via Karaf Commands

We can define a new pid using Karaf commands:

karaf@root()> config:property-set -p com.github.alien11689.test2 f.a 6
karaf@root()> config:property-set -p com.github.alien11689.test2 f.b 'test'

Since our PersistenceManager has a higher service.ranking (100 > -2147483648), the new pid will be stored as a Groovy file:

$ cat etc/com.github.alien11689.test2.groovy
f {
    b = 'test'
    a = '6'
}

We can also change/remove properties or remove the whole configuration pid using Karaf commands, and it will all be mapped to Groovy configuration files.

Sources

Sources are available on GitHub.

Original Link

AWS Security: What Makes Misconfiguration Critical?

In the cloud, where there are no perimeters and limitless endpoints, there are many ways attackers can get direct access to your environment if you make the wrong move. Given the speed that companies are moving to and scaling in the cloud, it’s easy to miss a step along the way and leave your business wide open for an attack.

In a recent survey, we found that 73 percent of companies have critical AWS cloud security misconfigurations. Issues like wide-open SSH and infrequent software updates are among the top risks identified, and of course, some of the biggest exposures in the recent past (Verizon, Dow Jones, and the RNC) were the result of AWS S3 configuration errors. But there are many others that are more obscure, yet just as dangerous if left unaddressed.

So, how do you know whether a misconfiguration is going to put you at risk? And how do you identify where your gaps are? In this post, we’ll walk through the four signs of a critical misconfiguration, how to spot one, and how you can fix it — fast.

Signs of a Critical AWS Security Misconfiguration

The beauty of the cloud is that you can configure it in any number of ways to fit your organization’s unique needs. The only problem is, it can be difficult to know the difference between a configuration that deviates from the norm but does not put your security at risk and one that could lead to a breach.

If a misconfiguration could lead to any of the following situations, then it’s considered critical:

  1. Can be leveraged in a direct data breach
  2. Can be leveraged in a more complex attack
  3. Enables trivial attacks on an AWS console
  4. Reduces or eliminates critical visibility (security or compliance)

The best way to determine whether a misconfiguration could lead to any of the above is to think like an attacker. If you can envision an attack based on a misconfiguration, chances are, someone else can too.

How to Spot a Critical Misconfiguration

The best process for spotting misconfigurations is to scan for them as soon as you move to the cloud and again each time you make a change to your environment. Running a configuration audit will help you see what you may have missed and give you the opportunity to remediate before attackers can find and exploit it.

Looking for some examples? Mishaps like leaving SSH wide open to the internet can allow an attacker to attempt remote server access from anywhere, rendering traditional network controls like VPN and firewalls moot. Failing to enforce multi-factor authentication (MFA) is another big misconfiguration concern. In our survey, 62 percent of companies did not actively require users to use MFA, making brute force attacks all too easy for adversaries to carry out. Auditing your configurations regularly will show you how you hold up against CIS Benchmarks and AWS best practices.

The sooner you begin to regularly audit your configurations, the faster you’ll be able to spot misconfigurations before someone else does.

Original Link

Getting Started With Ansible

What Is Configuration Management?

Before starting with Ansible, let’s discuss what configuration management is. Configuration Management (CM) is the process of handling changes in any system systematically, so that the system maintains its consistency. It retains this consistency because it is applied over the entire lifecycle of the system. Configuration management gives us the ability to control and monitor the performance of the system; using this monitoring, we can prevent errors by tracking and controlling every change made to the system. If any node in the cluster fails, we can reconfigure it. Configuration management also keeps snapshots of every version of the infrastructure.

Why Configuration Management?

The reason we should use configuration management is to overcome the difficult situations we face while setting up a cluster. A few of these are:

  • Managing multiple servers
  • Scaling up and scaling down
  • Syncing up with development team and infrastructure team

What Is Ansible?

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It can be used for advanced tasks such as continuous deployments or zero downtime rolling updates. Ansible is a push-based configuration management tool; it means we can push configuration onto the node machine directly without having any central node. It communicates with remote machines over SSH. The main goals of Ansible are ease of use and simplicity.  

Features of Ansible

  • Simple and easy to use
  • Agentless
  • YAML-based playbook

Installation

Ansible communicates with other nodes using the SSH protocol. It does not require any database installation or any running agent process. You only need to install it on one machine; it can even be your local machine. You just need Python installed on your system. However, Windows is not supported as the control machine.

Ubuntu:

sudo apt install ansible

CentOS/Fedora :

sudo dnf install ansible
yum install ansible

 

Arch Linux-based :

sudo pacman -S ansible

FreeBSD :

sudo pkg install ansible

 

PIP :

sudo pip install ansible

Ansible Inventory

Ansible can work for multiple systems at a time. It achieves this by selecting portions of systems listed in Ansible’s inventory, which is by default saved in the location /etc/ansible/hosts. You can specify a different inventory file using the -i <path> option on command line.

The inventory file can be in one of many formats, depending on the inventory plugins you have. For this example, the format for /etc/ansible/hosts is INI-like and looks like this:

[mailservers]
mail.example.com

[webservers]
foo.example.com
bar.example.com

[dbservers]
one.example.com
two.example.com
three.example.com

A YAML version would look like this:

all:
  hosts:
    mail.example.com:
  children:
    webservers:
      hosts:
        foo.example.com:
        bar.example.com:
    dbservers:
      hosts:
        one.example.com:
        two.example.com:
        three.example.com:

It is easy to assign variables to hosts that will be used in playbooks:

[testservers]
host1 http_port=80 maxRequestsPerChild=808
host2 http_port=303 maxRequestsPerChild=909

We can also define variables that can be applied to an entire group:

[testservers]
host1
host2

[testservers:vars]
ntp_server=ntp.testservers.example.com
proxy=proxy.atlanta.example.com

In Ansible inventory, we can create groups of groups and can set a variable for those groups of groups:

[india]
host1
host2

[japan]
host3
host4

[asia:children]
india
japan

[asia:vars]
ansible_user=value

[world:children]
asia
europe
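For reference, here is a sketch of the same groups-of-groups layout in the YAML inventory format; host5 under europe is added purely for illustration, since the INI example leaves that group empty:

all:
  children:
    world:
      children:
        asia:
          children:
            india:
              hosts:
                host1:
                host2:
            japan:
              hosts:
                host3:
                host4:
          vars:
            ansible_user: value    # group variable applied to every host under asia
        europe:
          hosts:
            host5:                 # illustrative host, not in the INI example above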

There are two default groups: all and ungrouped. all contains every host and ungrouped contains all the hosts that do not have a group aside from all.

Ansible Ad-Hoc Commands

An ad-hoc command is something that you might type in to do something rapidly, but don’t want to save for later, just like executing a command in the shell instead of writing a shell script for it. An ad-hoc command contains two parameters: the host group on which the task will run and the module to run. If you want to ping each host with a single command, you can do it using the following:

ansible host_group -m ping

Similarly, you can perform many other operations using ansible like copying a file, managing packages, gathering facts, etc.

Ad-hoc commands are a powerful yet straightforward feature of Ansible.

Ansible Playbook and Modules

Playbooks are a completely different way to use Ansible than ad-hoc task execution mode, and they are particularly powerful. They provide a way to send an ordered set of commands to remote nodes, much like a shell script that contains a set of commands. Ansible Playbooks are written in the YAML format. YAML is a data serialization language.

Every Playbook contains one or more “plays” in a list. The goal of a play is to map a group of hosts to certain functions. Ansible does this through tasks, which are nothing more than calls to Ansible modules.

Example of a playbook:

---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum: name=httpd state=latest
  - name: write the apache config file
    template: src=/srv/httpd.j2 dest=/etc/httpd.conf
    notify:
    - restart apache
  - name: ensure apache is running (and enable it at boot)
    service: name=httpd state=started enabled=yes
  handlers:
    - name: restart apache
      service: name=httpd state=restarted

Every playbook starts with three dashes (---), followed by the host list, then a variable list, then a task list, and at the end the handlers.

The host list contains the list of hosts where we want to run the task.

The variable list is to set the properties for the current play.

The task list contains the tasks that are going to execute.

The handlers are also tasks; the only difference is that executing a handler requires a trigger in the task list, such as notify. These ‘notify’ actions are triggered at the end of each block of tasks in a play, and will only be triggered once even if notified by multiple different tasks.

To run a playbook, we can use the following command:

ansible-playbook <playbook-file>.yml

Ansible ships with many modules (called the “module library”) that can be executed directly on remote hosts or through Playbooks.

Users can also write their own modules. These modules can control system resources like services, packages, or files (anything really), or handle executing system commands.

Original Link

Tidy Config With Owner

It is a truth universally acknowledged that a single program in possession of a good configuration must be in want of a way to easily access it.
—Jane Austen?

Any non-trivial software, at some point, needs a way to allow users to configure it. By far the easiest solution is to use a text file with some convention (.ini, .yaml, .json, .xml, .you name it) and parse it at the start.

Java has had support for properties files since version 1, and it’s probably the easiest way to configure Java programs. The class Properties has the methods load and store to read and write property files. So far so good.

// Write properties to a file
FileOutputStream out = new FileOutputStream("appProperties");
Properties prop = new Properties();
prop.setProperty("port", "1234");
prop.store(out, "comment");
out.close();
. . .
// Read properties back from the file
Properties prop = new Properties();
FileInputStream in = new FileInputStream("appProperties");
prop.load(in);
String port = prop.getProperty("port");
in.close();

Where do we keep the configuration in the program?

There are three common solutions:

  1. A singleton. Once read, we can keep the Properties object as a singleton and then we are able to read from it anywhere in the code.
  2. A higher level object that wraps all the around configuration. Typically, this is called ConfigManager or PropertyManager. Then, we can pass this object to any method that needs to read the configuration.
  3. We parse the config into specific immutable value objects, keeping the configuration logically organized.

The singleton solution is the easiest to implement, but it’s also the most fragile: if we want to change a configuration item, we have to check all our code, since it can be read from anywhere. Then, for testing any class that uses it, we need to mock the singleton. Finally, there could be some logic around accessing configuration, and we need to remember to use the same logic everywhere.

The manager solution has the advantage of keeping all the logic together, and it is usually easier to mock, since we own the interface. The problem is that we need to write more code, again and again.

The value objects are the cleanest solution, but they require a lot of boilerplate code to read the property file and set the value object. Or do they?

Enter Owner.

Disclaimer and full disclosure: The author of Owner is a dear friend of mine and one of the best programmers I have ever worked with.

As you may have guessed at this point, Owner is a library that loads value objects from your configuration files, or creates the config files from your objects.

Now I have a small project on GitHub that exposes a rest API to do simple mortgage-like calculations. I’m using it as an exercise and starting point for a more complicated proof of concept.

Now I want a configuration file to specify the port to listen on and the list of UserAccounts.

The first step is straightforward enough: just add the dependency in Gradle (or Maven). Note that if you are using a Java version older than Java 8, you need the owner package, not owner-java8.

dependencies {
    compile 'com.sparkjava:spark-core:2.6.0'
    testCompile "junit:junit:4.12"
    compile 'org.slf4j:slf4j-simple:1.7.21'
    compile 'org.aeonbits.owner:owner-java8:1.0.9'
}

Step two: create an interface with all the properties you need. You can specify the default values using annotations.

Note that all properties come properly typed. While using a standard Properties class, all properties are Strings. For List properties, you just have to specify values separated by commas.

By default, Owner looks for the config file inside the resources, with the same package/file of the interface. I prefer an external text file, so I just have to specify the file location with another annotation. Note that I can even specify more than one location and the algorithm used to merge them!

Step three: You just have to call ConfigFactory.create and everything just works!

Owner can do much more than this: You can specify property separators, hot-reload, remote properties (Zookeeper), mutable values, etc. Just read the documentation.

Have fun!

Original Link

MySQL vs. MariaDB: Default Configuration Differences

In this blog post, I’ll discuss some of the MySQL and MariaDB default configuration differences, focusing on MySQL 5.7 and MariaDB 10.2.

MariaDB Server is a general purpose, open-source database created by the founders of MySQL. MariaDB Server (referred to as MariaDB for brevity) has similar roots as Percona Server for MySQL but is quickly diverging from MySQL compatibility and growing on its own. MariaDB has become the default installation for several operating systems (such as Red Hat Enterprise Linux/CentOS/Fedora). Changes in the default variables can make a large difference in the out-of-box performance of the database, so knowing what is different is important.

As MariaDB grows on its own and doesn’t remain 100% compatible with MySQL, the default configuration settings might not mean everything or behave the way they used to. It might use different variable names or implement the same variables in new ways. You also need to take into account that MariaDB uses its own Aria storage engine that has many configuration options that do not exist in MySQL.

Note: In this blog, I am looking at variables common to both MySQL or MariaDB, but have different defaults, not variables that are specific to either MySQL or MariaDB (except for the different switches inside the optimizer_switch).

Binary Logs

Variable | MariaDB Default | MySQL Default
sync_binlog | 0 | 1
binlog_format | Mixed | Row

MySQL has taken a more conservative stance when it comes to the binary log. In the newest versions of MySQL 5.7, they have updated two variables to help ensure all committed data remains intact and identical. binlog_format was updated to row in MySQL in order to prevent non-deterministic statements from having different results on the slave. Row-based replication also helps when performing a lot of smaller updates. MariaDB defaults to the mixed format. The mixed format uses the statement-based format unless certain criteria are met; in that case, it uses the row format. You can see the detailed criteria for when the row format is used here.

The other difference that can cause a significant impact on performance is related to sync_binlog. sync_binlog controls the number of commit groups to collect before synchronizing the binary log to disk. MySQL has changed this to 1, which means that every transaction is flushed to disk before it is committed. This guarantees that there can never be a committed transaction that is not recorded (even during a system failure). This can create a big impact on performance, as shown by Roel Van de Paar in his post.

MariaDB utilizes a value of 0 for sync_binlog, which allows the operating system to determine when the binlog needs to be flushed. This provides better performance but adds the risk that if MariaDB crashes (or power is lost) that some data may be lost.

MyISAM

Variable | MariaDB Default | MySQL Default
myisam_recover_options | BACKUP,QUICK | OFF
key_buffer_size | 134217728 | 8388608

InnoDB has replaced MyISAM as the default storage engine for some time now, but it is still used for many system tables. MySQL has tuned down the MyISAM settings since it is not heavily used.

When mysqld opens a table, it checks whether the table is marked as crashed or was not closed properly, and runs a check on it based on the myisam_recover_options settings. MySQL disables this by default, preventing recovery. MariaDB has enabled the BACKUP and QUICK recovery options. BACKUP causes a table_name-datetime.bak file to be created whenever a data file is changed during recovery. QUICK causes mysqld to not check the rows in a table if there are no delete blocks, ensuring recovery can occur faster.

MariaDB 10.2 increased the key_buffer_size. This allows for more index blocks to be stored in memory. All threads use this buffer, so a small buffer can cause information to get moved in and out of it more quickly. MariaDB 10.2 uses a buffer 16 times the size of MySQL 5.7: 134217728 in MariaDB 10.2 vs. 8388608 in MySQL 5.7.

InnoDB

Variable | MariaDB Default | MySQL Default
innodb_max_undo_log_size | 10485760 (10 MiB) | 1073741824 (1024 MiB)

InnoDB variables have remained primarily unchanged between MariaDB 10.2 and MySQL 5.7. MariaDB has reduced the innodb_max_undo_log_size starting in 10.2.6. This was reduced from MySQL’s default of 1073741824 (1024 MiB) to 10485760 (10 MiB). These sizes reflect the maximum size an undo tablespace can become before it is marked for truncation. The tablespace doesn’t get truncated unless innodb_undo_log_truncate is enabled, and it is disabled in MySQL 5.7 and MariaDB 10.2 by default.

Logging

Variable | MariaDB Default | MySQL Default
log_error | (empty) | /var/log/mysqld.log
log_slow_admin_statements | ON | OFF
log_slow_slave_statements | ON | OFF
lc_messages_dir | (empty) | /usr/share/mysql

Logs are extremely important for troubleshooting any issues, so the different choices in logging for MySQL 5.7 and MariaDB 10.2 are very interesting.

The log_error variable allows you to control where errors get logged. MariaDB 10.2 leaves this variable blank, writing all errors to stderr. MySQL 5.7 uses an explicitly created file at /var/log/mysqld.log.

MariaDB 10.2 has also enabled additional slow statement logging. log_slow_admin_statements creates a record for any administrative statements that are not typically written to the binlog. log_slow_slave_statements logs the replicated statements sent from the master if they are slow to complete. MySQL 5.7 does not enable logging of these statements by default.

lc_messages_dir is the directory that contains the error message files for various languages. The variable defaults might be a little misleading in MariaDB 10.2. lc_messages_dir  is left empty by default, although it still uses the same path as MySQL 5.7. The files are located in /usr/share/mysql by default for both databases.

Performance Schema

Variable | MariaDB Default | MySQL Default
performance_schema | OFF | ON
performance_schema_setup_actors_size | 100 | -1 (auto adjusted)
performance_schema_setup_objects_size | 100 | -1 (auto adjusted)

The performance schema is an instrumentation tool that is designed to help troubleshoot various performance concerns. MySQL 5.7 enables the performance schema and many of its instruments by default. MySQL even goes so far as to detect the appropriate value for many Performance Schema variables instead of setting a static default. The Performance Schema does come with some overhead, and there are many blogs regarding how much this can impact performance. I think Sveta Smirnova said it best in her blog Performance Schema Benchmarks OLTP RW: “…test on your system! No generic benchmark can exactly repeat a workload on your site.”

MariaDB has disabled the Performance Schema by default, as well as adjusted a couple of the dynamic variables. Note that if you wish to disable or enable the Performance Schema, it requires a restart of the server since these variables are not dynamic. performance_schema_setup_actors_size and performance_schema_setup_objects_size have both been set to a static 100, instead of the dynamic -1 used in MySQL 5.7. These both limit the number of rows that can be stored in relative tables. This creates a hard limit to the size these tables can grow to, helping to reduce their data footprint.

SSL/TLS

Variable | MariaDB Default | MySQL Default
ssl_ca | (empty) | ca.pem
ssl_cert | (empty) | server-cert.pem
ssl_key | (empty) | server-key.pem

Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographic protocols that allow for secure communication. SSL is actually the predecessor of TLS, although both are often referred to as SSL. MySQL 5.7 and MariaDB 10.2 support both yaSSL and OpenSSL. The default configurations for SSL/TLS differ only slightly between MySQL 5.7 and MariaDB 10.2. MySQL 5.7 sets a specific file name for ssl_ca, ssl_cert, and ssl_key. These files are created in the base directory, identified by the variable basedir. Each of these variables is left blank in MariaDB 10.2, so you need to set them before using secure connections. These variables are not dynamic, so be sure to set the values before starting your database.

Query Optimizer

MariaDB 10.2 | MySQL 5.7 | Optimization | Meaning | Switch
N/A | OFF | Batched Key Access | Controls use of BKA join algorithm | batched_key_access
N/A | ON | Block Nested-Loop | Controls use of BNL join algorithm | block_nested_loop
N/A | ON | Condition Filtering | Controls use of condition filtering | condition_fanout_filter
Deprecated | ON | Engine Condition Pushdown | Controls engine condition pushdown | engine_condition_pushdown
ON | N/A | Engine Condition Pushdown | Controls ability to push conditions down into non-mergeable views and derived tables | condition_pushdown_for_derived
ON | N/A | Exists Subquery | Allows conversion of exists statements to in statements | exists_to_in
ON | N/A | Exists Subquery | Allows conversion of in statements to exists statements | in_to_exists
N/A | ON | Index Extensions | Controls use of index extensions | use_index_extensions
OFF | N/A | Index Merge | Allows index_merge for non-equality conditions | index_merge_sort_intersection
ON | N/A | Join Algorithms | Perform index lookups for a batch of records from the join buffer | join_cache_bka
ON | N/A | Join Algorithms | Controls use of BNLH and BKAH algorithms | join_cache_hashed
ON | N/A | Join Algorithms | Controls use of incremental algorithms | join_cache_incremental
ON | N/A | Join Algorithms | Controls use of block-based algorithms for outer joins | outer_join_with_cache
ON | N/A | Join Algorithms | Controls block-based algorithms for use with semi-join operations | semijoin_with_cache
OFF | N/A | Join Buffer | Creates the join buffer with an estimated size based on the estimated number of rows in the result | optimize_join_buffer_size
ON | N/A | Materialized Temporary Tables | Allows index creation on derived temporary tables | derived_keys
ON | N/A | Materialized Temporary Tables | Controls use of the rowid-merge strategy | partial_match_rowid_merge
ON | N/A | Materialized Temporary Tables | Controls use of the partial_match_table-scan strategy | partial_match_table_scan
OFF | ON | Multi-Range Read | Controls use of the multi-range read strategy | mrr
OFF | ON | Multi-Range Read | Controls use of cost-based MRR, if mrr=on | mrr_cost_based
OFF | N/A | Multi-Range Read | Enables key ordered scans if mrr=on | mrr_sort_keys
ON | N/A | Order By | Considers multiple equalities when ordering results | orderby_uses_equalities
ON | N/A | Query Plan | Allows the optimizer to use hidden components of InnoDB keys | extended_keys
ON | N/A | Query Plan | Controls the removal of irrelevant tables from the execution plan | table_elimination
ON | N/A | Subquery | Stores subquery results and correlation parameters for reuse | subquery_cache
N/A | ON | Subquery Materialization | Controls use of cost-based materialization | subquery_materialization_cost_based
N/A | ON | Subquery Materialization and Semi-Join | Controls the semi-join duplicate weedout strategy | duplicateweedout

The query optimizer has several variances that not only affect query performance but also how you write SQL statements. The query optimizer is substantially different between MariaDB and MySQL, so even with identical configurations, you are likely to see varying performance.

sql_mode puts restrictions on how you can write queries. MySQL 5.7 has several additional restrictions compared to MariaDB 10.2. only_full_group_by requires that all fields in any select…group by statement are either aggregated or inside the group by clause. The optimizer doesn’t assume anything regarding the grouping, so you must specify it explicitly.

no_zero_date and no_zero_in_date both affect how the server interprets zeroes in dates. When no_zero_date is enabled, values of 0000-00-00 are permitted but produce a warning. With strict mode enabled, then the value is not permitted and produces an error. no_zero_in_date is similar, except it applies to any section of the date (month, day, or year). With this disabled, dates with 0 parts, such as 2017-00-16 are allowed as-is. When enabled, the date is changed to 0000-00-00 without warning. Strict mode prevents the date being inserted unless ignore is provided, as well. INSERT IGNORE and UPDATE IGNORE insert the dates as 0000-00-00. 5.7.4 changed this. no_zero_in_date was consolidated with strict mode, and the explicit option is deprecated.

The query_prealloc_size determines the size of the persistent buffer used for statement parsing and execution. If you regularly use complex queries, it can be useful to increase the size of this buffer, as it does not need to allocate additional memory during the query parsing. MySQL 5.7 has set this buffer to 8192, with a block size of 1024. MariaDB increased this value in 10.1.2 up to 24576.

query_alloc_block_size dictates the size in bytes of any extra blocks allocated during query parsing. If memory fragmentation is a common problem, you might want to look at increasing this value. MySQL 5.7 uses 8192, while MariaDB 10.2 uses 16384 (twice that). Be careful when adjusting the block sizes: going too high consumes more than the needed amount of memory, and too low causes significant fragmentation.

The optimizer_switch variable contains many different switches that impact how the query optimizer plans and performs different queries. MariaDB 10.2 and MySQL 5.7 have many differences in their enabled options and even in the available options. You can see a brief breakdown of each of the options in the table above. Any option marked N/A is not supported on that server.

Miscellaneous

Variable | MariaDB Default | MySQL Default
default_tmp_storage_engine | NULL | InnoDB
group_concat_max_len | 1048576 (1M) | 1024 (1K)
lock_wait_timeout | 86400 (1 day) | 31536000 (1 year)
max_allowed_packet | 16777216 (16MB) | 4194304 (4MB)
max_write_lock_count | 4294967295 | 18446744073709551615
old_passwords | OFF | 0
open_files_limit | 0 | dependent on OS
pid_file | /var/lib/mysql/ | /var/run/mysqld/
secure_file_priv | (empty) | varies by installation
sort_buffer_size | 2097152 | 262144
table_definition_cache | 400 | autosized
table_open_cache_instances | 8 | 16
thread_cache_size | autosized | autosized
thread_stack | 292KB | 192KB/256KB

There are many variables that do not fit well into a group. I will go over those here.

When creating temporary tables, if you do not specify a storage engine, then a default is used. In MySQL 5.7, this is set to InnoDB, the same as the default_storage_engine. MariaDB 10.2 also uses InnoDB, but it is not explicitly set. MariaDB sets the default_tmp_storage_engine to NULL , which causes it to use the default_storage_engine. This is important to remember if you change your default storage engine, as it would also change the default for temporary tables.

An important note: In MariaDB, this is only relevant to tables created with CREATE TEMPORARY TABLE. Internal in-memory temporary tables use the memory storage engine, and internal, on-disk temporary tables use the aria engine by default.

The group_concat function can cause some very large results if left unchecked. You can restrict the maximum size of results from this function with group_concat_max_len. MySQL 5.7 limits this to 1024 (1K). MariaDB increased the value in 10.2.4 up to 1048576 (1M).

lock_wait_timeout controls how long a thread waits as it attempts to acquire a metadata lock. Several statements require a metadata lock, including DDL and DML operations, lock tables, flush tables with read lock, and handler statements. MySQL 5.7 defaults to the maximum possible value (one year), while MariaDB 10.2 has toned this down to one day.

max_allowed_packet sets a limit to the maximum size of a packet, or a generated/intermediate string. This value is intentionally kept small (4MB) on MySQL 5.7 in order to detect the larger, intentionally incorrect packets. MariaDB has increased this value to 16MB. If using any large BLOB fields, you need to adjust this value to the size of the largest BLOB, in multiples of 1024, or you risk running into errors transferring the results.

max_write_lock_count controls the number of write locks that can be given before some read lock requests being processed. In extremely heavy write loads, your reads can pile up while waiting for the writes to complete. Modifying the max_write_lock_count allows you to tune how many writes can occur before some reads are allowed against the table. MySQL 5.7 keeps this value at the maximum (18446744073709551615), while MariaDB 10.2 lowered this to 4294967295. One thing to note is that this is still the maximum value on MariaDB 10.2.

old_passwords controls the hashing method used by the password function, create user, and grant statements. This variable has undergone several changes in MySQL 5.7. As of 5.7.4, the valid options were MySQL 4.1 native hashing, Pre-4.1 (“old”) hashing, and SHA-256 hashing. Version 5.7.5 removed the “old” Pre-4.1 method, and in 5.7.6, the variable has been deprecated with the intent of removing it entirely. MariaDB 10.2 uses a simple boolean value for this variable instead of the enumerated one in MySQL 5.7, though the intent is the same. Both default the old_passwords to OFF, or 0, and allow you to enable the older method if necessary.

open_files_limit restricts the number of file descriptors mysqld can reserve. If set to 0 (the default in MariaDB 10.2), then mysqld reserves max_connections * 5 or max_connections + table_open_cache * 2, whichever is larger. It should be noted that mysqld cannot use an amount larger than the hard limit imposed by the operating system. MySQL 5.7 is also restricted by the operating systems hard limit but is set at runtime to the real value permitted by the system (not a calculated value).

pid_file allows you to control where you store the process ID file. This isn’t a file you typically need, but it is good to know where it is located in case some unusual errors occur. On MariaDB, you can find this inside /var/lib/mysql/, while on MySQL 5.7, you will find it inside /var/run/mysqld/. You will also notice a difference in the actual name of the file. MariaDB 10.2 uses the hostname as the name of the pid, while MySQL 5.7 simply uses the process name (mysqld.pid).

secure_file_priv is a security feature that allows you to restrict the location of files used in data import and export operations. When this variable is empty, which was the default in MySQL before 5.7.6, there is no restriction. If the value is set to NULL, import and export operations are not permitted. The only other valid value is the directory path where files can be imported from or exported to. MariaDB 10.2 defaults to empty. As of MySQL 5.7.6, the default will depend on the install_layout CMAKE option.

INSTALL_LAYOUT | DEFAULT VALUE
STANDALONE, WIN | NULL (>= MySQL 5.7.16), empty (< MySQL 5.7.16)
DEB, RPM, SLES, SVR4 | /var/lib/mysql-files
Other | mysql-files under the CMAKE_INSTALL_PREFIX value

mysqld uses a sort buffer regardless of storage engine. Every session that must perform a sort allocates a buffer equal to the value of sort_buffer_size. This buffer should at minimum be large enough to contain 15 tuples. In MySQL 5.7, this defaults to 262144, while MariaDB 10.2 uses the larger value 2097152.

The table_definition_cache restricts the number of table definitions that can be cached. If you have a large number of tables, mysqld may have to read the .frm file to get this information. MySQL 5.7 auto detects the appropriate size to use, while MariaDB 10.2 defaults this value to 400. On my small test VM, MySQL 5.7 chose a value of 1400.

The table_open_cache_instances vary in implementation between MySQL and MariaDB. MySQL 5.7 creates multiple instances of the table_open_cache, each holding a portion of the tables. This helps reduce contention, as a session needs to lock only one instance of the cache for DML statements. In MySQL 5.7.7 the default was a single instance, but this was changed in MySQL 5.7.8 (increased to 16). MariaDB has a more dynamic approach to the table_open_cache. Initially there is only a single instance of the cache, and the table_open_cache_instances variable is the maximum number of instances that can be created. If contention is detected on the single cache, another instance is created and an error logged. MariaDB 10.2 suspects that the maximum eight instances it sets by default should support up to 100 CPU cores.

The thread_cache_size controls when a new thread is created. When a client disconnects, the thread is stored in the cache, as long as the cache is not already at its maximum size. Although this is not typically noticeable, if your server sees hundreds of connections per second, you should increase this value so that new connections can use the cache. thread_cache_size is an automatically detected variable in both MySQL 5.7 and MariaDB 10.2, but their methods to calculate the default vary significantly. MySQL uses a formula, with a maximum of 100: 8 + (max_connections / 100). MariaDB 10.2 uses the smaller of 256 or the max_connections size.

The thread_stack is the stack size for each thread. If the stack size is too small, it limits the complexity of SQL statements, the recursion depth of stored procedures and other memory-consuming actions. MySQL 5.7 defaults the stack size to 192KB on 32-bit platforms and 256KB on 64-bit systems. MariaDB 10.2 adjusted this value several times. MariaDB 10.2.0 used 290KB, 10.2.1 used 291KB and 10.2.5 used 292KB.

Hopefully, this helps you with the configurations options between MySQL and MariaDB. Use the comments for any questions.

Original Link

PaaS for Java Developers (Part 3)

I want to start Part 3 (check out Part 1 and Part 2 if you haven’t already) by saying that I really do like and recommend Pivotal Web Services and Cloud Foundry as a simple and robust way to deploy Java applications. I’ve been running Structurizr on Pivotal Web Services for over three years now and I’ve had very few issues with the core platform. The marketplace services, on the other hand, are a different story.

In addition to providing a deployment platform to run your code, most of the Platform-as-a-Service providers (Pivotal Web Services, Heroku, Azure, etc) provide a collection of “marketplace services”. These are essentially add-on services that give you easy access to databases, messaging providers, monitoring tools, etc. As I write this, the Pivotal Web Services marketplace includes many of the popular technologies you would expect to see; including MySQL, PostgreSQL, Redis, Memcached, MongoDB, RabbitMQ, etc.

MySQL-as-a-Service

Let’s imagine that you’re building a Java web application and you’d like to store data in a MySQL database. You have a few options. One option is to build your own database server somewhere like Amazon AWS. Of course, you need to have the skills to do this and, given that Part 1 was all about the benefits of PaaS over building your own infrastructure, the DIY approach is not necessarily appealing to everybody.

Another option is to find a “Database-as-a-Service” provider that will create and run a MySQL server for you. ClearDB is one such example, and it’s also available on the Pivotal Web Services marketplace. All you need to do is create a subscription to ClearDB through the marketplace (there is a free plan), connect to the database and create your schema. That’s it. Most of the operational aspects of the MySQL database are taken care of; including backups and replication.

To connect your Java application to ClearDB, again, you have some options. The first is to place the database endpoint URL, username, and password in configuration, like you might normally do.

The other option is to use the Cloud Foundry command line interface to issue a “cf bind” command to bind your ClearDB database instance to your application instance(s), and use Cloud Foundry’s auto-reconfiguration feature. If you’re building a Spring-based application and you have a MySQL DataSource configured (some caveats apply), Cloud Foundry will automagically reconfigure the DataSource to point to the MySQL database that you have bound to your application. When you’re getting started, this is a fantastic feature as it’s one less thing to worry about. It also means that you don’t need to update URLs, usernames, and passwords if they change.

I used this approach for a couple of years and, if you look at the Structurizr changelog, you can see the build number isn’t far off 1000. Each build number represents a separate (automated) deployment to Pivotal Web Services. So I’ve run a lot of builds. And most of them have worked. Occasionally though, I would see deployments fail because services (like ClearDB) couldn’t be bound to my application instances. Often these were transient errors, and restarting the deployment process would fix it. Other times I had to raise a support ticket because there was literally nothing I could do. One of the big problems with PaaS is that you’re stuck when it goes wrong, because you don’t have access to the underlying infrastructure. Thankfully this didn’t happen often enough to cause me any real concern, but it was annoying nonetheless.

More annoying was a little bug that I found with Structurizr and UTF-8 character encoding. When people sign up for an account, a record is stored in MySQL and a “please verify your e-mail address” e-mail is sent. If the person’s name included any UTF-8 characters, it would look fine in the initial e-mail but not in subsequent e-mails. The problem was that the UTF-8 characters were not being stored correctly in MySQL. After replicating the problem in my dev environment, I was able to fix it by adding a characterEncoding parameter to the JDBC URL. Pushing this fix to the live environment is problematic though, because Cloud Foundry is automatically reconfiguring my DataSource URLs. The simple solution here is to not use automatic reconfiguration, and it’s easy to disable via the Java buildpack or by simply not binding a MySQL database instance to the Java application. At this point, I’m still using ClearDB via the marketplace, but I’m specifying the connection details explicitly in configuration.

The final problem I had with ClearDB was earlier this summer. I would often see error messages in my logs saying that I’d exceeded the maximum number of connections. The different ClearDB plans provide differing levels of performance and numbers of connections. I think the ClearDB databases offered via the marketplace are multi-tenanted, and there’s a connection limit to ensure quality of service for all customers. And that’s okay, but I still couldn’t work out why I was exceeding my quota because I know exactly how many app instances I have running and the maximum number of permitted connections in the connection pools per app instance. I ran some load tests with Apache Benchmark and I couldn’t get the number of open connections to exceed what had been configured in the connection pool. Often I would be watching the ClearDB dashboard, which shows you the number of open connections, and my applications wouldn’t be able to connect despite the dashboard only showing a couple of live connections.

Back to vendor lock-in and migration cost. The cost of migrating from ClearDB to another MySQL provider is low, especially since I’m no longer using the Cloud Foundry automatic reconfiguration mechanism. So I exported the data and created a MySQL database on Amazon RDS instead. For not much more money per month, I have a MySQL database running in multiple availability zones, with encrypted data at rest and I know for sure that the JDBC connection is happening over SSL (because that’s how I’ve configured it).

Email-Delivery-as-a-Service

Another marketplace service that I used from an early stage is SendGrid, which provides “e-mail delivery as a service”. There’s a theme emerging here! Again, you can run a “cf bind” command to bind the SendGrid service to your application. In this case, though, no automatic reconfiguration takes place, because SendGrid exposes a web API. This raises the question of where you find the API credentials. One of the nice features of the marketplace services is that you can get access to the service dashboards (e.g. the ClearDB dashboard, SendGrid dashboard, etc) via the Pivotal Web Services UI, using single sign-on. The service credentials are usually found somewhere on those service dashboards.

After finding my SendGrid password, I hardcoded it into a configuration file and pushed my application. To my surprise, trying to connect to SendGrid resulted in an authentication error because my password was incorrect. So I again visited the dashboard and yes, the password was now different. It turns out that, and I don’t know if this is still the case, the process of running a “cf bind” command would result in the SendGrid credentials being changed. What I didn’t realize is that service credentials are set in the VCAP_SERVICES environment variable of the running JVMs, and you’re supposed to extract credentials from there. This is just a regular environment variable, with JSON content. All you need to do is grab it and parse out the credentials that you need, either using one of the many code samples or libraries on GitHub to do this. From a development perspective, I now have a tiny dependency on this VCAP stuff, and I need to make sure that my local Apache Tomcat instance is configured in the same way, with a VCAP_SERVICES environment variable on startup.

Some time later, SendGrid moved to v3 of their API, which included a new version of the Java library. So I upgraded, which resulted in the API calls failing. After signing in to the SendGrid dashboard, I noticed that I now have the option of connecting via an API key. Long story short, I ditched the VCAP stuff and configured the SendGrid client to use the API with the API key, which I’ve also added to my deployment configuration.

Other Services

I used the Pivotal SSL Service for a while too, which provides a way to upload your own SSL certificate. When used in conjunction with the Cloud Foundry router, you can serve traffic from your own domain name with a valid SSL certificate. I also had a few issues with this, resulting in downtime. The Java applications were still running and available via the cfapps.io domain, but not via the structurizr.com domain. I’ve since switched to using CloudFlare’s dedicated SSL certificate service for $5 per month. I did try the free SSL certificate, but some people reported SSL handshake issues on some corporate networks when uploading software architecture models via Structurizr’s web API.

I also used the free Redis marketplace service for a while, in conjunction with Spring Session, as a way to store HTTP session information. I quickly used up the quota on that though, and found it more cost effective to switch to a Redis Cloud plan directly with Redis Labs.

PaaS Without the Marketplace

There are certainly some benefits to using the marketplace services associated with your PaaS of choice. It’s quick and easy to get started because you just choose a service, subscribe to it and you’re ready to go. All of your services are billed and managed in one place, which is nice too. And, with Cloud Foundry, I can live with configuration via the VCAP_SERVICES environment variable; at least everything is in one place.

If you’re just starting out with PaaS, I’d certainly take a look at the marketplace services on offer, but your mileage may vary and I find it hard to recommend them for production use. As I said at the start of this post, the core PaaS functionality on Pivotal Web Services has been solid for the three years I’ve been using it. Any instability I’ve experienced has been around the edge, related to the marketplace services. It’s also unclear what you’re actually getting in some cases, and where the services are running. If you look at the ClearDB plans, the free plan (“Spark DB”) says that it’s “Perfect for proof-of-concept and initial development”, whereas the $100 per month “Shock DB” plan says “Designed for apps where high performance is crucial”. These plans are not listed on the ClearDB website, so it’s hard to tell whether they are multi-tenant or single-tenant services. Some of the passwords created by marketplace services also look remarkably short (e.g. 8 characters) considering they are Internet-accessible.

With all of this in mind, I prefer to sign up with a service directly and integrate it in the usual way. I don’t feel that the pros of using the marketplace services outweigh the cons. I’m also further reducing my migration cost, should I ever need to move away from my PaaS. In summary then, the live deployment diagram for Structurizr now looks like this:

Structurizr - Deployment

The Java applications are hosted at Pivotal Web Services, and everything else is running outside, yet still within Amazon’s us-east-1 AWS region. This should hopefully help to address another common misconception: that you need to run everything inside of a PaaS environment. You don’t. There’s nothing preventing you from running Java applications on a PaaS and having them connect to a database server that you’ve built yourself. And it gives you the freedom to use any technology you choose, whether it’s available on the marketplace or not. You do need to think about colocation, performance, and security, of course.

So that’s a summary of my experience with the marketplace services. In Part 4, I’ll discuss more about my build/deployment script, and how straightforward it is to do zero-downtime, blue-green deployments via Cloud Foundry. Comments or questions? Tweet me at @simonbrown.

Original Link

Configuring Spring Boot on Kubernetes With Secrets

In Part 1 of this series, we saw how to use ConfigMaps to configure a Spring Boot app on Kubernetes. ConfigMaps are OK when we use simple configuration data that does not contain sensitive information. When using sensitive data like API keys, passwords, etc., Secrets are the preferred and recommended way. In this second part of the series, we will explore configuring Spring Boot on Kubernetes with Secrets.

The sources for this blog post are available in my GitHub repo.

Setup

You might need access to a Kubernetes cluster to play with this application. The easiest way to get a local Kubernetes cluster up and running is using minikube. The rest of this post assumes you have minikube up and running.

Like ConfigMaps, Secrets can be configured in two ways:

  1. As Environment Variables
  2. As Files

Secrets as Environment Variables

The Spring Boot application that we will build in this blog post uses spring-security. Spring Security, by default, enables security on the entire Spring Boot application.

The default user and password of the application will be displayed to the developer during application boot up.

Using default security password: 981d5f9f-c8ea-413f-8f3b-71daaa20d53c

To override the default security user/password, you need to update the application.properties to be:

security.user.name=${SECRETS_DEMO_USER:demo}
security.user.password=${SECRETS_DEMO_USER_PASSWD:demo}

Let’s now follow the next steps to inject the environment variables.

Create Secrets

Developers can start by creating a Kubernetes Secret called spring-security. This is just the name I am using; it could be anything of your choice, but remember to use the same name in the deployment.yaml that we will configure later.

You can then add two properties — “spring.user.name” and “spring.user.password” — to the Secrets by executing the following command:

kubectl create secret generic spring-security \
--from-literal=spring.user.name=demo \
--from-literal=spring.user.password=password

If you wish to see how your Secrets look, execute the following command:

kubectl get secret spring-security -o yaml

The sample output of the above command is shown below.

apiVersion: v1
data:
  spring.user.name: ZGVtbw==
  spring.user.password: cGFzc3dvcmQ=
kind: Secret
metadata:
  creationTimestamp: 2017-09-19T15:24:29Z
  name: spring-security
  namespace: default
  resourceVersion: "71363"
  selfLink: /api/v1/namespaces/default/secrets/spring-security
  uid: a0e0254e-9d4e-11e7-9b8d-080027da6995
type: Opaque

NOTE: All the values of the properties in the Secrets will be displayed as base64 encoded values.
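
If you want to double-check a value, it decodes with any standard Base64 decoder; for example, a quick one-off check in Java:

import java.util.Base64;

public class DecodeSecretValue {

    public static void main(String[] args) {
        // ZGVtbw== is the base64-encoded spring.user.name value from the output above.
        String decoded = new String(Base64.getDecoder().decode("ZGVtbw=="));
        System.out.println(decoded); // prints "demo"
    }
}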

Create the Fragment deployment.yaml

To configure the Spring Boot application on Kubernetes and inject environment variables from Secrets, we need to create a deployment.yaml fragment. Fragments are just bits and pieces of complete Kubernetes resources such as deployments, services, etc. It is the responsibility of the fabric8-maven-plugin to merge the existing fragments into complete Kubernetes resources, or to generate any that are missing.

The following sections show the required fragments that can be created by the developers inside $PROJECT_HOME/src/main/fabric8 folder:

deployment.yaml

spec:
  template:
    spec:
      containers:
        - env:
            - name: SECRETS_DEMO_USER
              valueFrom:
                secretKeyRef:
                  name: spring-security
                  key: spring.user.name
            - name: SECRETS_DEMO_USER_PASSWD
              valueFrom:
                secretKeyRef:
                  name: spring-security
                  key: spring.user.password

The environment variables SECRETS_DEMO_USER and SECRETS_DEMO_USER_PASSWD will have their values injected from the Secret whose name matches secretKeyRef -> name, using the value of the Secret entry specified by secretKeyRef -> key.

NOTE: As the application is configured to use fabric8-maven-plugin, we can create a Kubernetes deployment and service as fragments in ‘$PROJECT_HOME/src/main/fabric8’. The fabric8-maven-plugin takes care of building the complete Kubernetes manifests by merging the contents of the fragment(s) from ‘$PROJECT_HOME/src/main/fabric8’ during the deploy.

Deploy the Application

To deploy the application, execute the following command from the $PROJECT_HOME:

./mvnw clean fabric8:deploy

Access the Application

The application status can be checked with the command kubectl get pods -w . Once the application is deployed, let’s do a simple curl like this:

curl $(minikube service spring-boot-secrets-demo --url)/; echo "";

It should return an HTTP 401 Unauthorized error, as we did not provide the credentials to access the app.

Now run the same request with credentials:

curl -u demo:password $(minikube service spring-boot-secrets-demo --url)/; echo "";

This time it should return HTTP 404 rather than 401: we don’t have any resource at that URI, but we are now authorized.

NOTE:

  • The very first deployment of this application tends to take a bit of time, as Kubernetes needs to download the required Docker images for application deployment.
  • The application service URL is found using the command:

    minikube service blog-configmaps-secrets-demo --url

Mounting Secrets as Files

Let’s consider a very simple scenario. Say you want to write a REST API that will call the GitHub API to get all the organizations that your GitHub user account is associated with. The GitHub API to get the organizations you belong to is an authorized call, meaning you need to send a GitHub Personal Access Token as part of the request. Injecting your personal access token as an environment variable might not be as secure as you think, so how do you do it?

The simple way to do that is to have the application mount the Secrets as volumes. Once we can do that, we can set permissions on those files, just as we would for an SSH private key.

Before we get started, I assume that you have created a GitHub Personal Access Token. Once you have it, store your GitHub user name and the token in two files, github.user and github.token.

Create Secrets From a File

Let’s start by creating a new Secret called spring-github-demo, similar to how we created the spring-security Secret when configuring the application to use Secrets as environment variables.

kubectl create secret generic spring-github-demo \
  --from-file ./github.user \
  --from-file ./github.token

When we execute the command kubectl get secret spring-github-demo -o yaml, it will display an output similar to the one shown below.

apiVersion: v1
data:
  github.token: NjE2OTliMjJjOWQ3YTQ5MDJjZjI5NjBhZThjOWMxNWIxMGQzMmI3Ngo=
  github.user: a2FtZXNoc2FtcGF0aAo=
kind: Secret
metadata:
  creationTimestamp: 2017-09-18T13:55:59Z
  name: spring-github-demo
  namespace: default
  resourceVersion: "28217"
  selfLink: /api/v1/namespaces/default/secrets/spring-github-demo
  uid: 19ad298b-9c79-11e7-9b8d-080027da6995
type: Opaque

Update Fragment deployment.yaml

Update the deployment.yaml to add the volume mounts that will mount the Secret entries under /deployments/github inside the container.

spec:
  template:
    spec:
      containers:
        - env:
            - name: SECRETS_DEMO_USER
              valueFrom:
                secretKeyRef:
                  name: spring-security
                  key: spring.user.name
            - name: SECRETS_DEMO_USER_PASSWD
              valueFrom:
                secretKeyRef:
                  name: spring-security
                  key: spring.user.password
          volumeMounts:
            - name: github-user
              mountPath: "/deployments/github"
              readOnly: true
      volumes:
        - name: github-user
          secret:
            secretName: spring-github-demo
            items:
              - key: github.user
                path: user
              - key: github.token
                path: token

The container will now have the secrets:

  • github.user mounted as a file inside the container at /deployments/github/user
  • github.token mounted as a file inside the container at /deployments/github/token

The “GitHubController” REST controller loads your GitHub user and token from the mounted paths and uses them when interacting with the GitHub API. You can access the REST URI path /mygithuborgs, which will return, as JSON, all of the organizations your GitHub account is associated with.
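
The controller itself isn’t listed in this post, but a minimal sketch of what it might look like is shown below. The mounted file paths match the deployment fragment above, while the use of RestTemplate and the GET https://api.github.com/user/orgs call with Basic authentication are my assumptions for illustration rather than the exact code in the repo.

package com.redhat.developers;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class GitHubController {

    @GetMapping("/mygithuborgs")
    public String myGitHubOrganizations() throws Exception {
        // Read the user and token from the files mounted from the Secret.
        String user = new String(Files.readAllBytes(Paths.get("/deployments/github/user"))).trim();
        String token = new String(Files.readAllBytes(Paths.get("/deployments/github/token"))).trim();

        // GitHub accepts Basic authentication with the personal access token as the password.
        String basicAuth = Base64.getEncoder().encodeToString((user + ":" + token).getBytes());
        HttpHeaders headers = new HttpHeaders();
        headers.set(HttpHeaders.AUTHORIZATION, "Basic " + basicAuth);

        // Returns the organizations the authenticated user belongs to, as JSON.
        return new RestTemplate()
                .exchange("https://api.github.com/user/orgs", HttpMethod.GET,
                        new HttpEntity<Void>(headers), String.class)
                .getBody();
    }
}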

Deploy the application again using the command ./mvnw clean fabric8:deploy and access the application using this curl command:

curl -u demo:password $(minikube service blog-configmaps-secrets-demo --url)/mygithuborgs

If you omit the -u demo:password, then it will result in an HTTP 401 Unauthorized error.

In Part 2 of this blog series, we saw how to configure Spring Boot on Kubernetes with Secrets. In the next part, we will see how to use the spring-cloud-kubernetes Spring module to configure Spring Boot applications on Kubernetes.

Original Link

Configuring Spring Boot on Kubernetes With ConfigMap

ConfigMaps are the Kubernetes counterpart of Spring Boot’s externalized configuration. A ConfigMap is a simple key/value store that can hold anything from simple literal values to entire files. In this post, we will see how to use ConfigMaps to externalize application configuration.

One way to configure Spring Boot applications on Kubernetes is to use ConfigMaps. ConfigMaps is a way to decouple the application-specific artifacts from the container image, thereby enabling better portability and externalization.

The sources of this blog post are available in my GitHub repo. In this blog post, we will build a simple GreeterApplication, which exposes a REST API to greet the user. The GreeterApplication will use ConfigMaps to externalize its application properties.

Setup

You might need access to a Kubernetes cluster to play with this application. The easiest way to get a local Kubernetes cluster up and running is using minikube. The rest of the blog assumes you have minikube up and running.

There are two ways to use ConfigMaps:

  1. ConfigMaps as Environment variables
  2. Mounting ConfigMaps as files

ConfigMaps as Environment Variables

Assuming you have cloned my GitHub repo, let’s refer to the cloned location of the source code as $PROJECT_HOME throughout this document.

You will notice that com.redhat.developers.GreeterController has code to look up an environment variable called GREETING_PREFIX.

package com.redhat.developers;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class GreeterController {

    @Value("${greeter.message}")
    private String greeterMessageFormat;

    @GetMapping("/greet/{user}")
    public String greet(@PathVariable("user") String user) {
        String prefix = System.getenv().getOrDefault("GREETING_PREFIX", "Hi");
        log.info("Prefix :{} and User:{}", prefix, user);
        if (prefix == null) {
            prefix = "Hello!";
        }
        return String.format(greeterMessageFormat, prefix, user);
    }
}

By convention, Spring Boot applications — rather, Java applications — pass these kinds of values via system properties. Let us now see how we can do the same with a Kubernetes deployment.

  • Let’s create a Kubernetes ConfigMap to hold the property called greeter.prefix, which will then be injected into the Kubernetes deployment via an environment variable called GREETING_PREFIX.

Create ConfigMap

kubectl create configmap spring-boot-configmaps-demo --from-literal=greeter.prefix="Hello"

  • You can see the contents of the ConfigMap using the command: kubectl get configmap spring-boot-configmaps-demo -o yaml

Create Fragment deployment.yaml

Once we have the Kubernetes ConfigMaps created, we then need to inject the GREETER_PREFIX as an environment variable into the Kubernetes deployment. The following code snippet shows how to define an environment variable in a Kubernetes deployment.yaml.

spec:
  template:
    spec:
      containers:
        - env:
            - name: GREETING_PREFIX
              valueFrom:
                configMapKeyRef:
                  name: spring-boot-configmaps-demo
                  key: greeter.prefix

  • The above snippet defines an environment variable called GREETING_PREFIX, which will have its value set from the greeter.prefix key of the ConfigMap spring-boot-configmaps-demo.

NOTE: As the application is configured to use fabric8-maven-plugin, we can create a Kubernetes deployment and service as fragments in ‘$PROJECT_HOME/src/main/fabric8’. The fabric8-maven-plugin takes care of building the complete Kubernetes manifests by merging the contents of the fragment(s) from ‘$PROJECT_HOME/src/main/fabric8’ during deployment.

Deploy Application

To deploy the application, execute the following command from $PROJECT_HOME:

./mvnw clean fabric8:deploy

Access Application

The application status can be checked with the command kubectl get pods -w . Once the application is deployed, let’s do a simple curl like:

curl $(minikube service spring-boot-configmaps-demo --url)/greet/jerry; echo "";

The command will return the message “Hello jerry! Welcome to Configuring Spring Boot on Kubernetes!”. The returned message has the prefix “Hello”, which we injected via the environment variable GREETING_PREFIX with the value from the ConfigMap property “greeter.prefix”.

Mounting ConfigMaps as Files

Kubernetes ConfigMaps also allow us to load a file as a ConfigMap property. That gives us an interesting option: loading the Spring Boot application.properties via a Kubernetes ConfigMap.

To be able to load application.properties via ConfigMaps, we need to mount the ConfigMap as a volume inside the Spring Boot application container. Spring Boot automatically looks for configuration in a ./config directory relative to its working directory, which is /deployments for the fabric8 Java base images, so a file mounted at /deployments/config/application.properties is picked up without any extra configuration.

Update application.properties

greeter.message=%s %s! Spring Boot application.properties has been mounted as volume on Kubernetes!

Create ConfigMap from File

kubectl create configmap spring-app-config --from-file=src/main/resources/application.properties

The command above will create a ConfigMap called spring-app-config with the application.properties file stored as one of the properties.

The sample output of kubectl get configmap spring-app-config -o yaml is shown below.

apiVersion: v1
data:
  application.properties: |
    greeter.message=%s %s! Spring Boot application.properties has been mounted as volume on Kubernetes!
kind: ConfigMap
metadata:
  creationTimestamp: 2017-09-19T04:45:27Z
  name: spring-app-config
  namespace: default
  resourceVersion: "53471"
  selfLink: /api/v1/namespaces/default/configmaps/spring-app-config
  uid: 5bac774a-9cf5-11e7-9b8d-080027da6995

Modifying GreeterController

package com.redhat.developers;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
public class GreeterController {

    @Value("${greeter.message}")
    private String greeterMessageFormat;

    @GetMapping("/greet/{user}")
    public String greet(@PathVariable("user") String user) {
        String prefix = System.getenv().getOrDefault("GREETING_PREFIX", "Hi");
        log.info("Prefix :{} and User:{}", prefix, user);
        if (prefix == null) {
            prefix = "Hello!";
        }
        return String.format(greeterMessageFormat, prefix, user);
    }
}

Update Fragment deployment.yaml

Update the deployment.yaml to add the volume mounts that will allow us to mount the application.properties file under /deployments/config.

spec:
  template:
    spec:
      containers:
        - env:
            - name: GREETING_PREFIX
              valueFrom:
                configMapKeyRef:
                  name: spring-boot-configmaps-demo
                  key: greeter.prefix
          volumeMounts:
            - name: application-config
              mountPath: "/deployments/config"
              readOnly: true
      volumes:
        - name: application-config
          configMap:
            name: spring-app-config
            items:
              - key: application.properties
                path: application.properties

Let’s deploy and access the application like we did earlier, but this time the response will use the application.properties from our ConfigMap.

In this Part 1 of our blog series, we saw how to configure Spring Boot on Kubernetes with ConfigMaps. In Part 2, we will see how to use Kubernetes Secrets to configure Spring Boot applications.

Original Link