Developer Documentation
# Let's Data : Focus on the data - we'll manage the infrastructure!
Cloud infrastructure that simplifies how you process, analyze and transform data.
Vpcs
Connector destinations such as AWS Kafka require a Virtual Private Cloud (VPC). Let's Data automatically creates and secures the VPC when the write connector is initialized and deletes the VPC when the write connector is deleted. #LetsData provides self-service infrastructure to enable connectivity to these VPCs.
IP Address Management
#LetsData manages the VPC IP addresses automatically. Here are some details on how #LetsData manages the IP addresses for the datasets:
- Each VPC is assigned an IP address range (cidrBlock) from Amazon's recommended private IP space. (We currently assign IPs in the 10.0.0.0 IP address range.)
- We allocate a defined IP range to each tenant (TenantIPRange): a 10.X.X.X/21 CIDR, about 2,000 IPs. Each dataset allocates a dataset IP range (DatasetIPRange) from the tenant's TenantIPRange. Dataset resources are then created using IPs from this DatasetIPRange.
- Important: if the customer wants to establish a VpcPeeringConnection to the dataset's VPC, the customer VPC's cidrBlock must not overlap with the dataset's cidrBlock. Choosing a specific IP range is not allowed at this time; we may allow users to select a non-overlapping IP address range in the future.
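The overlap rule in the last point can be checked before a peering request is sent. The following is a minimal sketch using Python's standard ipaddress module. The function name cidrs_overlap and the example CIDR values are hypothetical; use the cidrBlock reported for your dataset's VPC.

```python
import ipaddress


def cidrs_overlap(customer_cidr: str, dataset_cidr: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    customer = ipaddress.ip_network(customer_cidr, strict=True)
    dataset = ipaddress.ip_network(dataset_cidr, strict=True)
    return customer.overlaps(dataset)


# Hypothetical values for illustration only.
dataset_cidr = "10.0.8.0/25"  # cidrBlock of the dataset's VPC

print(cidrs_overlap("10.0.0.0/16", dataset_cidr))    # True: peering would conflict
print(cidrs_overlap("172.31.0.0/16", dataset_cidr))  # False: safe to peer
```

A client VPC whose check returns True would need a different cidrBlock before it can be peered with the dataset's VPC.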
To control how many IPs are allocated to each dataset, users can specify a VPC size (small, medium, large) in their dataset configuration. We currently support the following VPC sizes:
- small: This allocates an IP range with the subnet mask set to 25 (a /25 block), which is 128 IPs.
- medium: This allocates an IP range with the subnet mask set to 23 (a /23 block), which is 512 IPs.
- large: This allocates an IP range with the subnet mask set to 22 (a /22 block), which is 1024 IPs.
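The arithmetic behind these sizes can be sketched as follows. The prefix lengths come from the size list above. The tenant range 10.0.0.0/21 and the helper names are hypothetical, and the per-tenant count assumes, purely for illustration, that the whole TenantIPRange is divided into datasets of a single size.

```python
import ipaddress

# Prefix lengths taken from the VPC size list above.
VPC_SIZE_PREFIX = {"small": 25, "medium": 23, "large": 22}


def addresses_for_size(size: str) -> int:
    """Number of IPv4 addresses in a CIDR block of the given VPC size."""
    return 2 ** (32 - VPC_SIZE_PREFIX[size])


def blocks_per_tenant(size: str, tenant_cidr: str = "10.0.0.0/21") -> int:
    """How many blocks of this size fit in a (hypothetical) tenant range."""
    tenant = ipaddress.ip_network(tenant_cidr)
    return sum(1 for _ in tenant.subnets(new_prefix=VPC_SIZE_PREFIX[size]))


for size in VPC_SIZE_PREFIX:
    print(size, addresses_for_size(size), blocks_per_tenant(size))
```

These are raw block sizes. The number of addresses usable by resources is somewhat lower, since AWS reserves five addresses in every subnet.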
Network Architecture
Our standard VPC setup is currently as follows:
- Private Subnets: Three private subnets, each in a different availability zone. The cluster resources are created in these private subnets.
- Public Subnets: Three public subnets, one in each availability zone. These are connected to 3 NAT Gateways (one for each availability zone) for connectivity to AWS resources.
- Lambda Connectivity: Elastic network interfaces are created for the Data Task Lambda function to connect to the resources in the VPC.
- Security Groups / ACLs: The security groups and network ACLs are currently configured to allow inbound / outbound TCP traffic on all ports.
- External Connectivity: No routes are currently configured to route external traffic to the resources in the private / public subnets. When a VpcPeeringConnection is established, we allow inbound TCP traffic from the peer VPC on all ports. (If you are testing connectivity, ICMP (ping), UDP, etc. might not work; test with TCP.)
- Customer Access: Customers can access the VPC by creating a VpcPeeringConnection to this VPC.
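Because only TCP is allowed from the peer VPC, a connectivity test should open a TCP connection rather than ping. The following is a minimal sketch using Python's standard socket module, meant to be run from a machine inside the peered client VPC. The function name tcp_reachable and the host and port in the usage comment are hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (hypothetical private IP and port of a resource in the dataset's VPC):
#   tcp_reachable("10.0.8.15", 9092)
```

A False result only says that the TCP handshake did not complete within the timeout. It does not distinguish between a missing route, a rejected peering connection and a closed port.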
VPC Peering Connections
Customers can establish VPC Peering Connections to #Let's Data Vpcs to access the resources in the VPC. You can learn about VPC Peering Connections at: AWS Docs - VPC peering basics
To establish a VPC Peering connection, we'll need a client VPC - this is a VPC that the customer has created. We also refer to it as the requester VPC. The client / requester VPC should be in the dataset's customerAccountForAccess AWS account. We'll use this VPC to connect to the #Let's Data Write Connector VPC that has resources that we need to access e.g. the Write Connector Kafka cluster.
Setup
Creating a VPC peering connection is essentially a two-step process: the requester VPC (client VPC) sends a VPC Peering Connection request, and the accepter VPC (#LetsData VPC) accepts the request. Here are the commands to set up a VPC Peering Connection:
- Gather the #LetsData VPC Details: Get the dataset's #Let's Data VPC details. We'll use the LetsData CLI's vpcs list command and will need the vpcId, ownerId and cidrBlock from the output.
- Create a VPC Peering Connection Request: Create a VPC peering connection request using the AWS CLI (aws ec2 create-vpc-peering-connection). Save the vpcPeeringConnectionId from the output; we'll need it in the next step.
- Accept the VPC Peering Connection: Accept the VPC peering connection on behalf of #LetsData by using the vpcs vpcPeeringConnections accept LetsData CLI command.
- List VPC Peering Connections: List the VPC peering connections for a #LetsData VPC by using the vpcs vpcPeeringConnections list LetsData CLI command.
- Delete a VPC Peering Connection: To delete a VPC peering connection for a #LetsData VPC, use the vpcs vpcPeeringConnections delete LetsData CLI command.
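The peering request step can be sketched as a small helper that assembles the AWS CLI arguments from the values gathered in the first step. This is an illustration, not the exact command from the #LetsData documentation: the function name and all IDs are hypothetical placeholders, and the flags are those of the AWS CLI's aws ec2 create-vpc-peering-connection command. The LetsData CLI commands for the other steps are not shown because their arguments are not listed here.

```python
import shlex
from typing import List, Optional


def peering_request_command(
    client_vpc_id: str,
    letsdata_vpc_id: str,
    letsdata_owner_id: str,
    peer_region: Optional[str] = None,
) -> List[str]:
    """Build the AWS CLI argument list for a VPC peering connection request."""
    args = [
        "aws", "ec2", "create-vpc-peering-connection",
        "--vpc-id", client_vpc_id,            # requester (client) VPC
        "--peer-vpc-id", letsdata_vpc_id,     # vpcId from the vpcs list output
        "--peer-owner-id", letsdata_owner_id, # ownerId from the vpcs list output
    ]
    if peer_region:
        # Only needed when the two VPCs are in different regions.
        args += ["--peer-region", peer_region]
    return args


# Placeholder IDs for illustration only.
command = peering_request_command(
    "vpc-0aaa1111bbbb2222c", "vpc-0ddd3333eeee4444f", "123456789012"
)
print(shlex.join(command))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run with credentials for the customerAccountForAccess AWS account.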