Google Cloud Architecture - Secure Service Deployment

Notes from https://cloud.google.com/security/security-design/resources/google_infrastructure_whitepaper_fa.pdf.

"Service" = An application binary that a developer wants to deploy to the infrastructure.

  • Services are controlled by a cluster orchestration service called "Borg".
  • The infrastructure does not assume any trust between the services running on it.

Service identity, integrity and isolation

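  • Internal network segmentation and firewalling are not relied on as the primary security mechanisms.
  • Ingress and egress filtering is still used at various points in the network to prevent IP spoofing, as a further security layer.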
  • Each service has a service account identity.
  • Each service has a set of cryptographic credentials that it uses to prove its identity when making or receiving RPCs.
    • Used by clients to ensure they're talking to the intended server.
    • Used by servers to limit access to data and methods to particular services.
  • Current and historic source code is held in a central repository that is fully auditable.
  • Code reviews:
    • The infrastructure can be configured to require that service binaries are built from reviewed, checked-in and tested source code.
    • Reviews require inspection and approval from at least one engineer other than the author.
    • Code modifications must be approved by the owners of the system.
    • Provides a forensic trail from a service back to its source.
  • Isolation and sandboxing:
    • Used to protect a service from other services running on the same machine.
    • Techniques:
      • Linux user separation.
      • Language and kernel-based sandboxes.
      • Hardware virtualisation.
    • More layers of isolation are used for riskier workloads.
    • Very sensitive services (e.g. cluster orchestration and key management) run on dedicated machines.

Inter-service access management

  • A service owner can specify which other services (identified by their service account identities) are allowed to communicate with it, e.g. by whitelisting the services that may call its API. The infrastructure enforces this restriction.
  • Engineers are issued individual identities so services can be configured to allow or deny their access.
  • All identities (machine, service and employee) are in a global namespace maintained by the infrastructure. End-user identities are handled separately.
  • Identity management:
    • Infrastructure provides identity management workflow:
      • Approval chains
      • Logging
      • Notification
    • Identities can be assigned to access control groups.
    • The workflow supports two-party control: one engineer proposes a change to a group, and another engineer who is an administrator of that group must approve it.
  • The infrastructure also lets services read from central ACL and group databases so they can implement their own fine-grained access control where needed.
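
The access rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the infrastructure implements it: the class names (AccessGroup, ServicePolicy), the function authorize_rpc and the service names in the usage note are all hypothetical. It assumes that a group change must be approved by an administrator other than the proposer, and that any caller not explicitly allowed is denied.

```python
class AccessGroup:
    """Hypothetical access control group with two-party control."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.members = set()
        self._pending = {}  # proposed identity -> engineer who proposed it

    def propose_add(self, proposer, identity):
        # Anyone may propose; nothing changes until an admin approves.
        self._pending[identity] = proposer

    def approve_add(self, approver, identity):
        proposer = self._pending.get(identity)
        if proposer is None:
            raise KeyError(f"no pending proposal for {identity!r}")
        # Assumption: the approver must be a group administrator and must
        # not be the engineer who made the proposal.
        if approver not in self.admins or approver == proposer:
            raise PermissionError("approval needs a different group administrator")
        del self._pending[identity]
        self.members.add(identity)


class ServicePolicy:
    """Hypothetical allowlist that a service owner attaches to a service."""

    def __init__(self, allowed_identities=(), allowed_groups=()):
        self.allowed_identities = set(allowed_identities)
        self.allowed_groups = list(allowed_groups)

    def permits(self, caller):
        if caller in self.allowed_identities:
            return True
        return any(caller in group.members for group in self.allowed_groups)


def authorize_rpc(policies, caller, callee):
    """Default deny: only callers permitted by the callee's policy pass."""
    policy = policies.get(callee)
    return policy is not None and policy.permits(caller)
```

For example, with policies = {"storage": ServicePolicy(allowed_identities={"svc-frontend"})}, the call authorize_rpc(policies, "svc-frontend", "storage") is allowed, while any other caller, or any callee without a registered policy, is refused.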

Encryption of inter-service communication

  • Infrastructure provides cryptographic privacy and integrity for RPC data on the network.
  • These benefits are also available to other protocols (e.g. HTTP) by encapsulating them into infrastructure RPC mechanisms.
  • This provides application layer isolation and ensures inter-service communications remain secure even if the network is tapped or a network device is compromised.
  • Services can configure the level of cryptographic protection they want for each RPC (e.g. integrity-only protection for low-value data inside a data centre).
  • To protect against WAN-tapping, the infrastructure encrypts all RPC traffic between data centres without each service needing to configure it.
  • Hardware cryptographic accelerators are being deployed to extend this default encryption to all RPC traffic inside the data centres.
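
As an illustration of what a lower, integrity-only level of protection could look like, the sketch below appends an HMAC-SHA256 tag to an RPC payload using Python's standard hmac module. The notes do not describe the actual wire format or algorithms, so the functions protect and verify, the shared key and the message layout are all assumptions. Integrity-only protection detects tampering but does not hide the payload; privacy would additionally need encryption.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, message: bytes) -> bytes:
    """Return the payload if its tag is valid, otherwise raise ValueError."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short to carry a tag")
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload
```

A receiver that shares the key calls verify(key, message) and only processes the payload if no exception is raised; a payload modified in transit, or one tagged with a different key, is rejected.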

Access management of end-user data

  • A service (e.g. contacts) can be configured to only allow requests from specified other services (e.g. Gmail).
  • To restrict a service to accessing only a particular end user's data, the infrastructure lets the service present an "end user permission ticket" as part of the RPC. The ticket proves that the service is handling a request on behalf of that end user.
  • These permission tickets are provided via a central user identity service.
  • Central user identity service:
    • Verifies an end-user login.
    • Issues a credential (e.g. cookie, OAuth token) to the user's device.
    • This credential is required to access the user's data.
  • Use of end-user tokens:
    • When a service receives the token, it passes it to the central identity service (CIS) for verification.
    • If verification is successful, the CIS provides a short-lived end user permission ticket that can be used for RPCs related to the request.
    • For any cascading calls, the calling service hands the ticket down to the callee as part of the RPC.
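
The token-to-ticket flow above can be sketched as follows. This is a toy model under stated assumptions, not the real protocol: the class CentralIdentityService, its methods, the 60-second lifetime and the HMAC-signed JSON ticket format are all hypothetical, and the two service functions simply borrow the Gmail and contacts example from these notes. For simplicity the downstream service asks the CIS to check the ticket; how tickets are really verified is not covered by the notes.

```python
import hashlib
import hmac
import json
import time


class CentralIdentityService:
    """Toy CIS: verifies end-user credentials and issues short-lived tickets."""

    def __init__(self, signing_key: bytes, ticket_ttl: float = 60.0):
        self._key = signing_key
        self._ttl = ticket_ttl      # assumed ticket lifetime in seconds
        self._sessions = {}         # end-user credential -> user id

    def login(self, user: str, credential: str) -> None:
        # Stands in for verifying a login and issuing a cookie or OAuth token.
        self._sessions[credential] = user

    def issue_ticket(self, credential: str, now=None) -> dict:
        user = self._sessions.get(credential)
        if user is None:
            raise PermissionError("unknown end-user credential")
        now = time.time() if now is None else now
        body = json.dumps({"user": user, "exp": now + self._ttl}, sort_keys=True)
        sig = hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()
        return {"body": body, "sig": sig}

    def check_ticket(self, ticket: dict, now=None) -> str:
        expected = hmac.new(self._key, ticket["body"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(ticket["sig"], expected):
            raise PermissionError("ticket signature invalid")
        claims = json.loads(ticket["body"])
        now = time.time() if now is None else now
        if now >= claims["exp"]:
            raise PermissionError("ticket expired")
        return claims["user"]


def contacts_lookup(cis, ticket, requested_user, now=None):
    """Downstream service: serves data only for the user named in the ticket."""
    if cis.check_ticket(ticket, now) != requested_user:
        raise PermissionError("ticket does not cover this user's data")
    return f"contacts of {requested_user}"


def gmail_handle_request(cis, credential, now=None):
    """Front-end service: swaps the end-user credential for a ticket, then
    hands the ticket down with the cascading call."""
    ticket = cis.issue_ticket(credential, now)
    user = cis.check_ticket(ticket, now)
    return contacts_lookup(cis, ticket, user, now)
```

The optional now parameter exists only so that expiry can be exercised deterministically; a ticket issued at time t is rejected from t + ticket_ttl onwards, and a ticket for one user cannot be used to read another user's data.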