Terraform Design Pattern – multiple providers

Having gotten to grips with terraform over the past few months, I’ve learned an awful lot from the multitude of useful posts and books published on the subject, as well as from firsthand experience.

One of the design patterns in terraform that has piqued my interest is the use of multiple providers, as given in the aws_vpc_peering_connection_accepter example in the terraform documentation.  Though fairly lightly covered in the documentation, it does lend itself to a good many use cases.

The most popular is the ability for the provider to assume a different role (even across different accounts) to set up and manage resources.  I’ve found this incredibly useful as a design pattern, as it also allows for finer-grained permissions management of those resources.  With the ability to call dynamic applications to provide creds to the AWS provider, using the credential_process parameter, it should also be possible to ensure that the only resource allocation in AWS is what’s managed by your master terraform configuration.

How do you use multiple providers in terraform?
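
At its simplest, you declare the same provider twice, give the second declaration an alias, and point individual resources at that alias.  The sketch below is in the spirit of the docs example rather than lifted from a real configuration: the region, account ID, role name and the requester-side aws_vpc_peering_connection.main resource are all placeholders.

# Default provider – the account terraform normally runs against.
provider "aws" {
  region = "eu-west-2"
}

# Aliased provider that assumes a role in the peer account.
provider "aws" {
  alias  = "peer"
  region = "eu-west-2"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/terraform"
  }
}

# The accepter side is created in the peer account simply by pointing
# the resource at the aliased provider.
resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = "aws.peer"
  vpc_peering_connection_id = "${aws_vpc_peering_connection.main.id}"
  auto_accept               = true
}

On the credentials side, the AWS config file can hand credential generation off to an external program via credential_process; the profile name and script path here are purely illustrative:

[profile terraform-master]
credential_process = /usr/local/bin/fetch-terraform-creds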

Hello AWS, my old friend.

For the past few months I’ve been working back on Amazon Web Services (AWS) – trying to remember all the knowledge I curated during the earlier years of my career.  It’s been really interesting to see quite how far AWS has developed since I first used it nearly a decade ago.  Whilst I’d seen the headline stats published at the start of the ACloudGuru training course, it’s still a massive shock seeing quite how many services AWS now runs out of the box!

Having now spent two months back in the AWS world, one of the things that really struck me was my dependency on other AWS engineers.  Whilst my sweet spot is designing systems at the architectural level, having someone who understands the lower-level intricacies of AWS is vastly important.  The construction analogy is massively overused, but it really is congruent with a building architect working with an engineering team to make sure the whole thing doesn’t fall down.

One of the most frustrating things for me was that some of the stuff I’d consider ‘simple’ still has a relatively steep learning curve, and there’s a significant paradigm shift from network to host-based security.  What this means in reality is that the IAM (Identity and Access Management) module should be your first port of call.  Operating with the principle of least privilege is absolutely beautiful.  I’d recommend that any old-school SysAdmins go and sign up to ACloudGuru or Udemy to get clued up on the implications.

The next ‘hidden’ gem on AWS for me has been the EC2 parameter store, in conjunction with IAM roles and lambda.  I do need to write a more detailed post on my setup once I’ve validated it’s not too heavily over-engineered – but the combination of KMS keys, IAM roles, lambda (to run a simple random password generator), and EC2 parameter store does give me a warm glowing feeling inside.  Setting something like this up 10 years ago was a feat of engineering and relatively fragile (it wouldn’t survive a reboot!) – I really like the cleanliness of this new approach.
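
My lambda handles the generation and storage, but the underlying parameter store mechanics are easy to illustrate from the CLI; the parameter name and KMS key alias below are made up for the example:

# Generate a random password and store it encrypted with a KMS key.
aws ssm put-parameter --name "/myapp/db-password" --type SecureString \
  --key-id "alias/myapp" --value "$(openssl rand -base64 24)"

# Anything with the right IAM role (an instance, a lambda) can then decrypt it.
aws ssm get-parameter --name "/myapp/db-password" --with-decryption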

The one thing that does bother me is competition.  I need to make some time to work with Azure to replicate the AWS environments on a competitor cloud (with all the support trimmings) – but also investigate minimising the barrier to exit from any cloud platform to running bare metal.

Facebook “We’re not listening.” Overreach?

Facebook are currently disputing the idea that they listen in on real-world conversations by over-reaching their permissions to access the microphone. Yes – users want the apps to access the microphone when actively recording video or using audio – but not when having a private conversation with someone in the ‘real world’ – with the phone off.

The biggest issue I have with this is the denial. They explicitly denied listening via the core Facebook app and via their Messenger app. However, we’re all using a variety of additional apps curated and managed by Facebook.

  • Facebook.
  • Messenger.
  • WhatsApp. (needs microphone for audio/video calls)
  • Instagram. (needs microphone for video)

It’s not practicable to manage or revoke access based on individual use. What’s required here is for iOS/Android to provide better UX around which apps are accessing which features each time they’re accessed. In the same way we have the battery monitor, the OS providers should be providing an audit log of what exactly your phone has been doing.

I’m not sure what privacy campaigners are calling it yet – I need to have a read up and familiarise myself.  I’d call it overreach. We should be protected from such overreach, and the OS creators need to provide better tools by default, rather than requiring rooting and/or technological expertise to understand what a device that you’re paying for, and that is with you practically 24 hours a day, is doing with your data.

Back to the DevOps world (temporarily)

For the past few years I’ve been kept away from lower-level coding and ops through being employed to fulfil quite different functions. It’s all part of a wider career ambition, so I can’t fault the fact I’ve not been down and dirty with the code – but coming back to it has been quite a shock (in a good way).

I left full-time SysAdmining towards the end of 2011, with puppet becoming mainstream and containers becoming a thing (but with OpenVZ, rather than docker at that stage) – at least for me.

In the current brave new world I’ve been introduced to terraform. It sits firmly towards the ‘dev’ end of the DevOps spectrum, with a fairly low barrier to entry but an obvious need for structure, design and adherence to best practices.  Kudos to those publishing their modules and sharing their ideas, especially the modules already up on the terraform registry. It’s also quite fun to be working on something pre v1.0.0 (love SemVer!) and with progress really still to be made on tooling.

Two things have really helped me get up to speed quickly with it, and a big shout out has to go to GruntWork.io for their part.

Here’s my learning so far:

1) Followed the Getting Started guide and got some stuff up. Did this in my personal account, on the free tier – then pulled it all down. Neat!

2) Did a few bits of hacking, and put together a module for an Airflow service.

3) Got into work early one morning to get 45 minutes to properly watch this awesome video on reusable terraform modules. It’s wonderfully articulated, and the guy presenting has a great and relaxed style. (cheers Jim!)

4) Did a bit more hacking and discovered terraform env is deprecated in favour of terraform workspace (see the commands after this list).  Hmmm.

5) Lastly, (and only tonight) I bought “Terraform: Up and Running” for my kindle and read it cover to cover (despite the fact I only had three hours sleep last night). Though it’s a little bit out of date already (it covers v0.8, whereas we’re now on v0.10), it covers off some really good concepts, and shares some best practices around working in larger teams.  Hopefully soon I’ll be posting on which concepts introduced since v0.8 I’m finding useful.
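
On point 4 above: the replacement for terraform env (as of v0.10) is terraform workspace, which covers the same ground:

terraform workspace new staging
terraform workspace select staging
terraform workspace list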

Let me know if you’re using terraform, and any tips or problems you’ve faced.

Apache Nifi setup on Ubuntu

I feel a bit like I’m going back to my roots with this blog post.  Technical documentation on how I’ve set up a relatively niche product, with an opinionated stance on how it should be done.

What is Apache Nifi?

Apache Nifi is an easy to use, powerful, and reliable system to process and distribute data.  It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. See http://nifi.apache.org/ for more info.

What’s the context of the setup?

I want to wrap a control process around some of the disparate lambda tasks that I have triggered by S3.  I need a control function for reporting & monitoring, as well as something relatively simple to use, to allow support workers to identify and make adjustments upstream of the ‘critical path.’  I reviewed Luigi, Airflow & Nifi as options, and ended up picking Nifi for its speed of setup and flexibility of use.

Docker (ECS) v. Barebones (EC2)

The first decision was on the deployment model for Nifi.

At this stage the bounding resource for the use case that I’m pushing through Nifi is likely to be memory, as some of the files passing through the process may reach up to 5GB in size; that said, I hope that the majority are going to be closer to the 100MB end of the scale.  For initial discovery, I’d used a well-made docker image, mkobit/nifi, to stand up and validate Nifi.

Unfortunately I found a gotcha with the docker setup (having not been “down in the dirt” for a few months): either docker’s changed its default behaviour, or the image’s persistence model has changed.  Either way, without explicitly defining a volume for /opt/nifi/conf, upon container restart I’d lost all my configuration.  The good news is that this allowed me to validate how simple Nifi was – within 15 minutes I’d been able to rebuild my model from scratch, having now familiarised myself with a number of Nifi standard processors.
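
If you do want to stay on the container route, explicitly mounting the config directory avoids that surprise.  This is a sketch rather than a tested invocation – the host path, and the assumption that the image exposes the default web port of 8080, are mine:

docker run -d --name nifi \
  -p 8080:8080 \
  -v /opt/nifi-conf:/opt/nifi/conf \
  mkobit/nifi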

I therefore took the decision to build a barebones installation of Nifi on top of the latest Ubuntu LTS release.  There’s no official package, so I’ll be relying on the releases directly from nifi.apache.org.

Initial installation

Once you’ve got your vanilla LTS release stood up, add the following to the end of /etc/security/limits.conf
* hard nofile 50000
* soft nofile 50000
* hard nproc 10000
* soft nproc 10000

The documentation mentions you may also need to edit /etc/security/limits.d/90-nproc.conf – however that is not required on Ubuntu.

You’ll also need to allow additional tcp sockets, as that’s how your flows will communicate with each other.  Once we’ve got our flows set up and stable, these should be monitored in a production environment to make sure that capacity levels are set correctly.

In /etc/sysctl.conf add the following line:

net.ipv4.ip_local_port_range=10000 65000

I mentioned earlier that the process is likely to be memory bound.  For performance reasons (and to avoid disk i/o becoming a bottleneck) we’ll tell the kernel to avoid swapping wherever possible, so that we don’t end up with weird behaviours.

In /etc/sysctl.conf add the following line:

vm.swappiness = 0

The final implementation step will be looking at the overhead of things like file access time (atime) updates.  Assuming that this instance is tailored specifically for Nifi (which it should be; 1 server per role and all that jazz), we can disable access time logging by editing /etc/fstab and adding noatime to each of the volumes that will have high input/output for Nifi.  Again, I’d recommend putting /opt/nifi in its own volume so the scope is clear (and you keep the atime logging for your generic system files & logs).
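
As an example, a dedicated volume for /opt/nifi in /etc/fstab might look like the line below; the device name and filesystem are assumptions for your own environment:

/dev/xvdf  /opt/nifi  ext4  defaults,noatime  0  2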

Package dependencies

Nifi requires a JRE, so run the following to get OpenJDK version 8 JRE:

apt-get install openjdk-8-jre

Installing Nifi

To install Nifi in the base system, I’m going to home it in /opt/nifi/.
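
Assuming the directory doesn’t exist yet, create it and work from there for the next few commands:

mkdir -p /opt/nifi
cd /opt/nifi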

Download the latest binary from https://nifi.apache.org/download.html – which for this tutorial is 1.3.0.

wget http://www.mirrorservice.org/sites/ftp.apache.org/nifi/1.3.0/nifi-1.3.0-bin.tar.gz

Then extract the file:

tar zxvf nifi-1.3.0-bin.tar.gz

Once that’s done – let’s add our new nifi scripts to our PATH environment variable:

echo 'export PATH=$PATH:/opt/nifi/nifi-1.3.0/bin' >> /etc/bash.bashrc

You’ll need to reload the shell; run /bin/bash to do so (or simply close your session and open a new one).

We can now install nifi as a service in Ubuntu by running nifi.sh install.  This will make monitoring slightly easier, and allow us to start the service on boot.  Unfortunately there’s a bug in the service script which causes the following error when trying to enable it using systemctl:

systemctl enable nifi

insserv: There is a loop at service nifi if started.

Instead you can start nifi without installing it as a service by running:

nifi.sh start
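
If you’d still like Nifi to come up on boot, a minimal systemd unit along these lines can stand in for the broken init script – the paths assume the install location above, and this is a sketch I’d expect to need tweaking rather than a drop-in answer:

# /etc/systemd/system/nifi.service
[Unit]
Description=Apache Nifi
After=network.target

[Service]
Type=forking
ExecStart=/opt/nifi/nifi-1.3.0/bin/nifi.sh start
ExecStop=/opt/nifi/nifi-1.3.0/bin/nifi.sh stop

[Install]
WantedBy=multi-user.target

After a systemctl daemon-reload, the earlier systemctl enable nifi should then behave.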

The next post will be on configuring nifi.properties and setting up a certificate store for client-side SSL authentication.