
Overview

Welcome to the docs for CloudGraph, the universal GraphQL API for AWS, Azure, GCP, and K8s. Query resources, relationships, and insight data to solve security, compliance, asset inventory, and billing challenges. These docs are currently a work in progress but should be enough to get you started with the major AWS, Azure, GCP, and K8s services.

example queries

Why CloudGraph

AWS, Azure, and GCP have done a wonderful job of building solutions that let engineers like us create systems to power our increasingly interconnected world. Over the last 15 years, products such as EC2, S3, RDS, and Lambda have fundamentally changed how we think about computing, storage, and databases.

With the proliferation of Kubernetes and Serverless in the last 5 or so years, cloud services have become increasingly abstracted from the racks of physical servers they run on. To end-users, everything on the cloud is just an API, so we don't necessarily need to know how Lambda Functions or EKS work under the hood to be able to use them for building applications. With a little documentation, API or console access, and a tutorial, anyone can create pretty much anything they need.

These abstractions have led to massive improvements in the overall convenience and breadth of cloud service provider (CSP) offerings. What was once a painstaking, time-consuming, and error-prone process of provisioning new servers, databases, or filesystems can now be done in seconds with just the click of a button or a deployment of infrastructure as code (IaC). Since everything is just an API abstraction, when a CSP is ready to introduce a new "product" they simply need to expose a new API - yes, I'm of course simplifying slightly :)

Anyone familiar with the CSPs knows that service APIs are almost always split into modular namespaces that contain dozens, if not hundreds, of separate API methods for single resources. For example, the AWS EC2 service contains over 500 different API methods, with new ones added occasionally. Any company building substantial systems on a CSP is likely using many, many different services.

While a masterpiece of datacenter architecture, this choice of hundreds of services and configuration options puts the burden of knowing how to properly use these services squarely on us engineers. As a result, we find ourselves having to constantly stay up to date and learn about all the service offerings and new changes. This takes a significant amount of time and mental energy. As developers, it can be difficult, time-consuming, and frustrating to use the AWS CLI to make 5 different API calls to describe, as an example, an AWS ECS cluster, its services, task definitions, tasks, container definitions, etc. We often find ourselves lost in documentation and having to use half a dozen APIs to get answers to questions like "What exactly is running in this VPC?"

This means that AWS, Azure, and GCP can quickly feel overwhelming, even to seasoned cloud architects. While the CSPs are fantastic at building the actual services that power our businesses, not a lot of headway has been made in simplifying the day-to-day UX of querying these hundreds of services in a sane manner.

New solutions like the Cloud Control API for AWS have attempted to create a standardized interface for querying many different types of AWS resources. Unfortunately, the Cloud Control API's usage is severely limited, and users still need to know how to correctly query their data. This means more time spent reading documentation and understanding how services work and are related to one another.

While the modularity of the CSP APIs is a great logical organization system and does make sense, it's a burden on end-users in terms of the cognitive overhead and learning curve. Having to remember how hundreds of constantly changing services work and are connected leads to a caffeine addiction and time wasted playing detective.

Wouldn't it be great if we as DevOps/Cloud engineers had a simpler way to get our data out of AWS, Azure, GCP, and the others? One that reflected our need to easily query any data about any service in any account without having to spend hours on docs or Stack Overflow?

It is for these reasons that we built CloudGraph, the GraphQL API for everything cloud. CloudGraph extracts, normalizes, processes, and enriches your cloud data, allowing you to access deep insights across multiple providers effortlessly. Check out our blog post "The GraphQL API for everything" to learn more.

How It Works

Note that CloudGraph requires READ ONLY permissions to run and as such can never mutate your actual cloud infrastructure. Additionally, none of your cloud environment information is ever sent to or shared with CloudGraph, AutoCloud, or any other third parties.

Under the hood, CloudGraph reaches out to your cloud provider(s), sucks up all of the configuration data, processes it, and stores a copy of this data for you in Dgraph. It then exposes an endpoint at http://localhost:8997 that allows you to write GraphQL queries against your stored data. These queries not only allow you to do anything that you would do with, say, the AWS SDK/CLI, but they also allow you to run much more powerful queries. CloudGraph ships with pre-packaged GraphQL query tools, including GraphQL Playground and Altair, but feel free to use your own. It also includes a schema visualization tool called Voyager so you can understand relationships between entities.
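If you would rather script against the endpoint than use one of the bundled query tools, a plain HTTP POST is enough. The following is a minimal sketch, not an official client. It assumes the GraphQL API is served at the /graphql path under the address above, and the query name and fields (queryawsEc2, id, arn) are illustrative placeholders; check the schema generated for your own scan before relying on them.

```python
import json
import urllib.request

# Assumption: the GraphQL API is served at /graphql under the local address.
ENDPOINT = "http://localhost:8997/graphql"

# Assumption: query name and fields are illustrative; consult your own schema.
EXAMPLE_QUERY = """
query {
  queryawsEc2 {
    id
    arn
  }
}
"""


def build_request(query, variables=None, endpoint=ENDPOINT):
    """Build a GraphQL-over-HTTP POST request without sending it."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, endpoint=ENDPOINT):
    """Send the query to the local endpoint and return the decoded JSON body."""
    request = build_request(query, variables, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With CloudGraph running locally, calling run_query(EXAMPLE_QUERY) should return a dictionary with a "data" key on success or an "errors" key if the query does not match the schema, as is usual for GraphQL responses.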

Architecture

The CloudGraph CLI tool is built using a plugin architecture. These plugins provide all provider-specific functionality to the tool, such as fetching data, enriching that data, and much more! All CloudGraph plugins are installed at runtime, so any user can create their own providers, policy packs, and other extensions to enrich their cloud provider data. Please refer to the contributing documentation for more details on how to create your own CloudGraph providers and plugins. Below is a simple diagram of the CloudGraph tool's architecture.
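To make the plugin idea concrete, here is a small conceptual sketch of a registry that delegates fetching and enriching to whichever providers are registered at runtime. This is not CloudGraph's actual plugin interface; every name in it (ProviderPlugin, PluginRegistry, FakeProvider, fetch, enrich, scan) is hypothetical and chosen only for illustration. The contributing documentation describes the real one.

```python
from typing import Dict, List, Protocol


class ProviderPlugin(Protocol):
    """Hypothetical shape of a provider plugin (not the real CloudGraph API)."""

    name: str

    def fetch(self) -> List[dict]:
        """Return raw resource records from the provider."""
        ...

    def enrich(self, resources: List[dict]) -> List[dict]:
        """Return the records with extra derived fields added."""
        ...


class PluginRegistry:
    """Holds plugins registered at runtime and runs them on demand."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ProviderPlugin] = {}

    def register(self, plugin: ProviderPlugin) -> None:
        self._plugins[plugin.name] = plugin

    def scan(self) -> Dict[str, List[dict]]:
        # The core tool knows nothing provider-specific; it only calls the
        # two methods every plugin is expected to implement.
        return {
            name: plugin.enrich(plugin.fetch())
            for name, plugin in self._plugins.items()
        }


class FakeProvider:
    """Stand-in provider that returns canned data instead of calling a cloud API."""

    name = "fake"

    def fetch(self) -> List[dict]:
        return [{"id": "vm-1"}]

    def enrich(self, resources: List[dict]) -> List[dict]:
        return [{**resource, "provider": self.name} for resource in resources]
```

Registering FakeProvider() and calling scan() runs the fetch and enrich steps for it. The point of the design is that adding a new provider means adding a new plugin, not changing the core tool.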

Document image



Updated 14 Feb 2022