Continuous Security, the Next Evolution of Developer Velocity – thenewstack.io

Posted: October 15, 2022 at 4:28 pm

This is the second in a three-part series on continuous security.

In our previous post, we outlined Jit's philosophy behind continuous security, and how elite, modern engineering teams that embrace this approach will actually increase velocity, despite the common misconception that security bogs down engineering.

Security is here to stay and needs to be embedded early. We've learned this from shift left, which is already proving not to be early enough, and born-left security is now an emerging practice that embeds security as early as the first line of code.

However, applying this in practice requires some realignment of process and practice. By identifying fixes that can be made immediately and prioritizing the rest, we can create a solid framework and a baseline security posture to maintain and improve.

Continuous security also consists of a few pillars that bring security closer to engineering practices and unleash the true potential of developer-owned security.

These pillars include:

- Security as code (SaC)
- Security orchestration
- Continuous monitoring

We spoke about this briefly in our previous post, but to unpack it more deeply, let's take a look at what elite engineering looks like, and what security can take from this approach. When we talk about the metrics that quantify elite engineering, DORA looks at two primary categories: speed and safety.

With the continued evolution of the attack surface and the growing sophistication of exploits as the stakes rise with each breach, safety must stop being an afterthought in engineering. While safety is usually quantified in metrics such as change failure rate (CFR) and mean time to restore (MTTR), another extremely important focus gaining centrality in engineering processes is security risk management.

But this raises the question: Why hasn't security been a native citizen in development processes until now, despite all of the efforts to shift it left, farther left, and even make it born left?

This is because the security mindset is issue-driven, while the engineering mindset is fix-driven.

Let's take a look at the DORA safety metrics, CFR and MTTR. CFR measures how often you introduce failure into your systems; MTTR measures how quickly you can ship the fix and restore them. Together, these are largely regarded as metrics that define elite engineering teams. Security tools today, however, mostly introduce noise with the many issues they flag, and very few take a fix-first approach.

The most common OSS tools in use today are laser-focused on detection and much less on remediation. Even those that provide guidance for fixing an issue rarely point you to the exact line of code to fix.

Engineering teams focused on high velocity aren't interested in what is wrong. They're interested in how to fix it when it has gone wrong (they'll reserve the what-happened for the post-mortem or the sprint retrospective). Resolving the issue becomes the highest value in software delivery.

DevOps and automation have introduced best practices, and eventually even playbooks that automate responses to the most common failures.

Starting with the first pillar of continuous security: security as code (SaC) aligns with developer workflows and provides fixes to known problems throughout the coding process. Even more importantly, because security is codified and therefore programmable, it is extensible, which ultimately allows teams to manage their own custom risks and processes.
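To make the "codified and programmable" idea concrete, here is a minimal sketch of what a security-as-code rule engine could look like. All names here (`Finding`, `check_open_ssh`, the `RULES` registry) are illustrative assumptions, not a real product's API; the point is that a custom risk becomes an ordinary function a team can write, review and version like any other code.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    rule_id: str
    resource: str
    message: str

def check_open_ssh(resource):
    """Custom rule: flag security-group entries exposing SSH to the world."""
    return [
        Finding("SG-001", resource["name"], "SSH open to 0.0.0.0/0")
        for rule in resource.get("ingress", [])
        if rule.get("port") == 22 and rule.get("cidr") == "0.0.0.0/0"
    ]

# Because rules are plain functions, teams extend this list with their own
# codified risks instead of waiting for a vendor to ship a new check.
RULES = [check_open_ssh]

def scan(resources):
    return [f for res in resources for rule in RULES for f in rule(res)]

resources = [
    {"name": "web-sg", "ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]},
    {"name": "db-sg", "ingress": [{"port": 5432, "cidr": "10.0.0.0/8"}]},
]
print([f.message for f in scan(resources)])  # ['SSH open to 0.0.0.0/0']
```

Because the rule set is just data, the same scan can run in a pre-commit hook, a PR check, or a scheduled job without changing the rules themselves.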

Security orchestration is achieved by evolving remediation processes to be more automated, batch-oriented and simpler. Finally, continuous monitoring ensures that no new threats are introduced, and that emerging threats in running services are rapidly caught and mitigated.

While the fix-first mindset keeps us from introducing new issues, it doesn't negate the dedicated effort and resources needed to continuously reduce the security debt of existing critical issues. That is the orchestration part.

By aggregating similar issues and processing them as a batch, you can achieve greater security efficiency and burn down your backlog more rapidly (many security products now take this approach). Continuous monitoring and security orchestration go hand in hand: once you are aware of existing problems and discover production issues, a good automation process will help mitigate these risks far more rapidly than former processes did.
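The batching idea can be sketched in a few lines. This is an illustrative example, not any particular product's implementation: findings are grouped by rule type so that each group becomes one remediation task (for instance, a single dependency-bump PR) instead of one ticket per finding.

```python
from collections import defaultdict

# Hypothetical findings as a scanner might emit them.
findings = [
    {"rule": "outdated-dependency", "package": "requests"},
    {"rule": "outdated-dependency", "package": "urllib3"},
    {"rule": "hardcoded-secret", "file": "settings.py"},
]

# Aggregate similar issues so they can be remediated as one batch.
batches = defaultdict(list)
for f in findings:
    batches[f["rule"]].append(f)

for rule, group in batches.items():
    # One remediation task per batch, rather than a ticket per finding.
    print(f"{rule}: {len(group)} findings -> 1 batch fix")
```

Two dependency findings collapse into a single batch here, which is exactly the efficiency gain the paragraph above describes.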

This is similar to fixing a breach in a boat: you fix the hole before bailing out the water. Once you can nail down the mechanism for fixing new issues as they come up and instill a fix-first mindset, it becomes possible to decouple this from the effort of fixing existing problems and to automate that process as well.

Yet this is precisely where security fails. A laundry list of vulnerabilities simply doesn't do the job any longer.

While visibility and observability are the first steps to fixing failure, the true measure of making security a first-class citizen in engineering and delivery is understanding how to follow through and fix issues rapidly, automating what's possible and prioritizing the rest.

Great, so how's that done?

Once we understand this fundamental mindset shift, we can reverse engineer how to apply a fix-first approach to security in modern engineering processes, where the primary goal is to ship code to production as rapidly as possible with security already embedded.

In our previous post, we discussed the three core pieces that make this possible: differentiation, prioritization and remediation. Below we'll take a deep dive into how to apply these in your engineering practices right away.

Let's talk about current security gating and where it can be optimized, automated or moved to the backlog when necessary. When we talk about continuous integration and deployment, the typical diagram includes Build >> Test >> Release >> Deploy.

To this, over the years, weve added layers for pre-coding, coding and post-deployment so this now looks like: Plan >> Code >> Build >> Test >> Release >> Deploy >> Run / Operate

To each one of these phases we have tried to embed security as seamlessly as possible, and this has had some successes and some failures.

One of the great successes of the DevSecOps approach was embedding security checks in a way that is code-centric, at a place that already has other gates: the pull request (PR), with different controls for build, test and release.

This made it possible to include actionable security fixes alongside other code and bug fixes, while still in the context of engineering that same piece of code. It's a method that has proven highly effective for embedding security into code early, before merging and deploying to production.

What has proven less successful is the way security vulnerabilities have been handled at runtime, both at the level of the cloud provider infrastructure and the application. The common practice for this layer is to run checks on a predefined schedule and alert the DevOps or cloud engineer to any issue (during the run/operate phase). This is completely decoupled from any engineering process, and it often leaves this area in no man's land or opens the door to infrastructure drift. The same type of problem occurs with security findings discovered against the runtime application.

Once the code is deployed and running in production, tracking down the code owner is difficult, and bringing them back into that piece of code's context even more so. Infrastructure security issues that arise after the fact are a common contributor to infrastructure drift, as engineers often prefer to make changes in the console or UI rather than through infrastructure as code, which would require the code to be redeployed through the regular pipelines and checks and would add more humans to the process.

The other half of the problem is that even when debugging is done in production, the urgency of the fix means these changes often bypass code gating as well. This also assumes that both the detection and the fix are simple, when in reality neither is true. Rarely is a solution provided as code, and when it is, such solutions tend to be error-prone and complex, so the fix is not always straightforward. Yet more than ever, there is a need for shift-left practices for runtime as well.

This commonly happens because after deployment there is no longer real clarity about ownership, and scheduled checks are decoupled from any ongoing engineering process. If an alert arises, the engineer will want to deal with it as quickly as possible, and any manual changes or drift will only be detected on the next scheduled run. That can be hours, days or weeks away, when another engineer is on call.

A good practice would be to move these checks and controls into the same code-centric gate, the PR, and to ensure that at the very least they are caught upon first deployment to staging, while the engineer who wrote the code is still in context, so they do not reach production again. This makes it possible to ensure there are no alerts or issues with the infrastructure and runtime of choice.

To take this further, there are security measures we can take as early as the coding itself through in-IDE security alerts and pre-commit hooks to help embed security as early as possible into our products and systems.
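As a sketch of what such an early check could look like, here is a minimal pre-commit-style scan for likely hardcoded secrets in changed lines. The patterns and the single-function shape are simplified assumptions for illustration, not a complete secret scanner; real tools cover far more credential formats.

```python
import re

# Two illustrative patterns: the shape of an AWS access key ID, and an
# inline password literal. A real scanner would carry many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)password\s*=\s*['\"].+['\"]"),
]

def find_secrets(lines):
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    hits = []
    for lineno, line in enumerate(lines, 1):
        if any(pat.search(line) for pat in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

demo = ["API_KEY = 'AKIAABCDEFGHIJKLMNOP'", "x = 1"]
print(find_secrets(demo))  # flags line 1 only
```

Wired into a git pre-commit hook (feeding it the output of `git diff --cached` and exiting non-zero on any hit), a check like this blocks the secret before it ever enters history, which is exactly the born-left point above.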

The code-centric aspects are the easy part. They are often already implemented in security-minded organizations, where each PR is viewed as a new security delta on top of existing code.

Non-code-centric changes, such as infrastructure changes, IaC and even config as code, are slightly harder. To be more accurate, these include changes in code that lead to non-code issues: a change in IaC that has consequences in the infrastructure runtime, or a change in application code that leads to non-code issues in the runtime. But it is not impossible to find a good framework for defining a baseline and ensuring it is maintained with every deployment or environment change.

Anything that doesn't fall into these two categories of immediately fixable issues is treated as a backlog fix, and goes through the prioritization and remediation framework we define based on severity, fixability and the ability to be automated and orchestrated.
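A hedged sketch of that prioritization idea: score each backlog issue by severity, fixability and automatability, then work the queue in score order. The weights and field names below are illustrative assumptions, not a standard scoring scheme.

```python
SEVERITY = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def priority(issue):
    """Rank an issue: severity dominates, known fixes and automatable
    remediation bump an issue up because they are cheap to clear."""
    score = SEVERITY[issue["severity"]] * 10
    if issue["fix_available"]:
        score += 5
    if issue["automatable"]:
        score += 3
    return score

backlog = [
    {"id": "VULN-1", "severity": "low", "fix_available": True, "automatable": True},
    {"id": "VULN-2", "severity": "critical", "fix_available": False, "automatable": False},
    {"id": "VULN-3", "severity": "high", "fix_available": True, "automatable": True},
]

for issue in sorted(backlog, key=priority, reverse=True):
    print(issue["id"], priority(issue))
# VULN-2 40, VULN-3 38, VULN-1 18
```

Note how a high-severity issue with a ready, automatable fix lands just below a critical one with neither: the framework rewards clearing cheap, automatable debt without letting it outrank truly critical risk.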

Examples include:

- The severity of the issue
- Whether a known fix is available (fixability)
- Whether remediation can be automated and orchestrated

These are just some of the parameters that affect our prioritization and decision-making around issues in our backlog.

Continuous security is possible by breaking down the formerly daunting domain of security into developer-centric language, tools, workflows and processes. By delivering security as code, the automation already possible in other engineering disciplines is now possible in security as well. Once we identify the areas we can automate, it's possible to achieve true security orchestration: the automation of workflows, not simply one-off tasks. The final piece is to ensure we constantly maintain the baseline security posture we have defined and achieved, through continuous monitoring and grooming of our security backlog.

In our next post, we'll dive into how the adoption of this approach benefits CISOs as well, sharing the CISO perspective and showing how formerly opposing views, until recently a source of friction and frustration in many engineering organizations, are converging into a single worldview. Together, these pieces become the enabler of engineering velocity, making it possible for CTOs and CISOs alike to deploy rapidly and with confidence.
