Step-3: IoT to Cloud Integration

The data is now redundant, and fault tolerance of up to one disk per RAID 1 pair in the nested RAID 10 array has been achieved. However, this only protects against individual disk failures. In a system-wide outage, such as a natural disaster or a total hardware failure, the entire array is lost, and RAID alone cannot protect the data. To cover this case, another layer of data retention is added: cloud backup.

Selecting the Best Cloud Storage Solution

There are many options for backing up data to a cloud drive; popular ones include Google Drive, Apple's iCloud, and Microsoft's OneDrive. Each has its own advantages, but the user has little control over how the data is secured, and pricing comes in fixed slabs rather than scaling with the storage actually used. Taking Google Drive as an example: beyond the free 15 GB, the user must pay a premium of 210 rupees per month for 200 GB of extra storage. This is costly and neither efficient nor scalable, so it does not suit all users. Suppose a user wants to store a total of 65 GB: 15 GB is free, and to store the remaining 50 GB he must purchase the next lowest available plan (210 rupees per month for 200 GB). The additional 150 GB goes unused, yet he pays for that empty space, because the provider offers no finer-grained scaling.

Most consumer cloud storage providers follow this same slab-based model; none offers dynamic, pay-per-use storage, which is a deal breaker for many users. Instead, AWS resources can be used to store the data. While selecting the AWS storage service for this application, the following points should be considered:

  1. The solution must be low-cost

  2. It should behave dynamically (storage size should grow and shrink with the user's needs)

  3. For its price, retrieval performance should suit the user's application

To select the best fit for this application, the available AWS storage services are compared below:

| Service | Full Name | Description |
|---------|-----------|-------------|
| EBS | Elastic Block Store | A network drive that must be attached to an EC2 instance (can only be attached to one instance at a time) |
| EFS | Elastic File System | Managed NFS storage from AWS (can be mounted by multiple EC2 instances) |
| FSx | - | Managed third-party file storage service with higher performance (available in multiple variants) |
| S3 | Simple Storage Service | Object-based storage service with multiple tiers based on application type (independent of EC2 instances) |

Table: AWS Storage Services

EBS, EFS, and FSx cannot be used here because they must be attached to an EC2 instance to work and cannot be mounted directly from a Raspberry Pi on the local network. That leaves AWS S3. AWS does offer other storage services, but they are also unsuitable for this application, and S3 is the most cost-efficient option for simply backing up data to the cloud.

AWS S3, or Simple Storage Service, is an object-based storage service offered by AWS. It is scalable and highly available: it can store an unlimited amount of data, offers 99.999999999% (11 9's) durability, and provides security features such as Access Control Lists (ACLs), bucket policies, encryption at rest using Server-Side Encryption with KMS or Client-Side Encryption, and IAM policies for access control. S3 also provides data-management tools such as lifecycle policies, which change the tier of an object based on configured transition and expiration rules. Each of the available S3 tiers targets a different access pattern, and the tier is chosen based on the application. The tiers are as follows:
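As an illustration of these features, the sketch below shows how a versioned, encrypted backup bucket might be created with the AWS CLI. The bucket name pi-raid-backup-demo and the ap-south-1 region are placeholders, not values from this project:

```bash
#!/bin/bash
# Hypothetical illustration only: bucket name and region are placeholders.
BUCKET="pi-raid-backup-demo"
REGION="ap-south-1"

# Create the backup bucket in the chosen region.
aws s3api create-bucket --bucket "$BUCKET" --region "$REGION" \
    --create-bucket-configuration LocationConstraint="$REGION"

# Enable versioning so older object versions can be expired by lifecycle rules.
aws s3api put-bucket-versioning --bucket "$BUCKET" \
    --versioning-configuration Status=Enabled

# Enable default encryption at rest (SSE-S3 here; use "aws:kms" with a key ID
# for SSE-KMS as mentioned above).
aws s3api put-bucket-encryption --bucket "$BUCKET" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```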

 

| Storage Tier | Durability | Availability | Retrieval Charge | First-Byte Latency |
|--------------|------------|--------------|------------------|--------------------|
| S3 Standard | 99.999999999% (11 9's) | 99.99% | N/A | Milliseconds |
| S3 Intelligent-Tiering | 99.999999999% (11 9's) | 99.9% | N/A | Milliseconds |
| S3 Standard-IA | 99.999999999% (11 9's) | 99.9% | Per GB retrieved | Milliseconds |
| S3 One Zone-IA | 99.999999999% (11 9's) | 99.5% | Per GB retrieved | Milliseconds |
| S3 Glacier Instant Retrieval | 99.999999999% (11 9's) | 99.9% | Per GB retrieved | Milliseconds |
| S3 Glacier Flexible Retrieval | 99.999999999% (11 9's) | 99.99% | Per GB retrieved | Minutes to hours |
| S3 Glacier Deep Archive | 99.999999999% (11 9's) | 99.99% | Per GB retrieved | Hours |

Table: Details of all the available AWS S3 tiers

| Storage Usage | Price |
|---------------|-------|
| First 50 TB/month | $0.023 per GB ≈ ₹1.91/GB |
| Next 450 TB/month | $0.022 per GB ≈ ₹1.832/GB |
| Over 500 TB/month | $0.021 per GB ≈ ₹1.748/GB |

Table: AWS S3 Standard Tier Pricing

| S3 Storage Tier | Price |
|-----------------|-------|
| S3 Standard-IA | $0.0125 per GB ≈ ₹1.04/GB |
| S3 One Zone-IA | $0.01 per GB ≈ ₹0.8327/GB |
| S3 Glacier Instant Retrieval | $0.004 per GB ≈ ₹0.3331/GB |
| S3 Glacier Flexible Retrieval | $0.0036 per GB ≈ ₹0.2998/GB |
| S3 Glacier Deep Archive | $0.00099 per GB ≈ ₹0.0824/GB |

Table: Pricing of AWS S3 - other tiers (comparison)

AWS S3 lifecycle policies automate the management of objects in S3 buckets. They enable transitioning objects between storage classes, setting expiration and deletion rules, and applying filters to specific objects using prefixes. You can also automate the deletion of non-current versions in versioned buckets. These policies help optimize storage costs, enforce data retention requirements, and streamline data management. You can monitor policy effects and track object transitions and deletions. Common use cases include cost optimization, compliance, data archiving, and data access pattern-based transitions.
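A minimal sketch of such a lifecycle configuration, applied with the AWS CLI, is shown below. The bucket name, the backups/ prefix, and the 30/90/180-day thresholds are illustrative assumptions, not this project's actual rules:

```bash
#!/bin/bash
# Hypothetical rule set: bucket name, prefix, and day thresholds are assumptions.
BUCKET="pi-raid-backup-demo"

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 180 }
    }
  ]
}
EOF

# Attach the lifecycle configuration to the backup bucket.
aws s3api put-bucket-lifecycle-configuration \
    --bucket "$BUCKET" --lifecycle-configuration file://lifecycle.json
```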

Now there is a device on the local network and a storage service, S3, in the cloud. To connect the two and enable file transfer between the Raspberry Pi and AWS S3, a couple of tools are combined into a bash script so that the process is as automated as possible. The steps are described below, and a sketch of such a script follows at the end of this subsection.

First, a tool called the AWS CLI is installed on the Raspberry Pi and configured with an access key from the AWS Management Console. Before generating the access key, a new IAM user is created with an IAM policy attached that allows access only to the S3 bucket and nothing else. This follows the cloud-security principle of least privilege, which says a user should be granted only the permissions it actually needs.
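A hedged sketch of this least-privilege setup follows; the user name pi-backup-user, the policy name, and the bucket name are placeholders introduced for illustration:

```bash
#!/bin/bash
# Hypothetical names: pi-backup-user, s3-backup-only, and the bucket are placeholders.
BUCKET="pi-raid-backup-demo"

# Inline policy granting access to this one bucket and nothing else.
cat > s3-only-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# Create the dedicated backup user, attach only this policy, and generate the
# access key that `aws configure` on the Raspberry Pi will consume.
aws iam create-user --user-name pi-backup-user
aws iam put-user-policy --user-name pi-backup-user \
    --policy-name s3-backup-only --policy-document file://s3-only-policy.json
aws iam create-access-key --user-name pi-backup-user
```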

After the AWS CLI has been configured on the Raspberry Pi, the next step is a one-time synchronization between the RAID mount point and the S3 bucket created for the backup. Then, using a tool called s3fs, the bucket is mounted as a local filesystem, turning the one-time sync into continuous two-way synchronization: changes in the local folder propagate to AWS and vice versa. Data from the RAID mount point is thus uploaded into the S3 bucket, and changes made online are replicated back to the local RAID mount path.
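Below is a minimal sketch of the backup script described above, assuming the RAID array is mounted at /mnt/raid10 and the bucket is named pi-raid-backup-demo; the exact paths, bucket name, and s3fs options used in the project may differ:

```bash
#!/bin/bash
# Sketch of the backup automation described above. The mount path, bucket name,
# and credential file location are assumptions, not the project's exact values.
BUCKET="pi-raid-backup-demo"
RAID_MOUNT="/mnt/raid10"
S3_MOUNT="/mnt/s3backup"

# One-time initial synchronization: push the RAID contents to the bucket.
aws s3 sync "$RAID_MOUNT" "s3://$BUCKET/backups/"

# s3fs reads credentials as ACCESS_KEY:SECRET_KEY from a 600-permission file.
echo "ACCESS_KEY:SECRET_KEY" > "$HOME/.passwd-s3fs"
chmod 600 "$HOME/.passwd-s3fs"

# Mount the bucket as a local filesystem; files written here appear in S3 and
# objects changed in S3 appear here, giving the two-way behaviour described.
mkdir -p "$S3_MOUNT"
s3fs "$BUCKET" "$S3_MOUNT" -o passwd_file="$HOME/.passwd-s3fs"
```

For fully hands-off operation, the initial `aws s3 sync` line can also be re-run periodically from cron as a safety net alongside the s3fs mount.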
