The continued, exponential growth of a business brings a proportional and compounding demand to scale infrastructure. As the pace of scaling picks up, security has to keep up with it for that growth to be sustainable. In our quest to achieve this here at CRED, we explored various cloud-native security tools, integrated third-party tools, and set up a variety of tooling and controls to ensure overall security. With time, though, we realized that all this tooling and process will only help us up to the limit of what we know.
And that brings us to the important question:
What about the things we don’t know?
“Security through obscurity is the belief that a system can be secure as long as nobody on the outside understands what’s going on the inside”. But what if even the insiders don’t know what’s going on?
And that is what prompted us to take a step back and think about solving this holistically. What we wanted to achieve was clear to us: creating a security visibility strategy.
To tackle this problem before it becomes a legacy issue, we started working on what we call “the three essences of Cloud Security”.
Peanut butter of our Cloud Security
Let’s start with the first two essences of security – logging and monitoring.
Logging everything without monitoring is just like a ship without a rudder: you will not be able to make sense of log data unless you monitor what you have logged. Logging provides visibility into which malicious events are flowing through each infrastructure component, shows what caused an incident, and helps us drill deeper into security incidents. We started logging everything to ensure that we could answer the 3 W’s (what, who, and when), or more specifically:
- What’s the event being logged, i.e., the event name?
- Who has created the event, i.e., the offender?
- When does the event take place, i.e., the time frame?
Tracking logs and events, and monitoring them for unusual or suspicious activity, completes the combination.
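As a rough illustration of the 3 W’s, the sketch below models a single log record that cannot be created without an event name, an actor, and a timestamp. The `SecurityEvent` class, its field names, and the helper functions are hypothetical and are not our actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SecurityEvent:
    """One log record answering the 3 W's (hypothetical schema)."""
    what: str  # the event name
    who: str   # the actor that triggered the event
    when: str  # ISO 8601 timestamp, normalized to UTC


def make_event(what: str, who: str, when: datetime) -> SecurityEvent:
    # Reject records that cannot answer all three questions.
    if not what or not who:
        raise ValueError("an event needs both a name and an actor")
    # Timezone-aware datetimes are converted to UTC so events sort consistently.
    return SecurityEvent(what, who, when.astimezone(timezone.utc).isoformat())


def to_log_line(event: SecurityEvent) -> str:
    # One JSON object per line is easy for log shippers to pick up.
    return json.dumps(asdict(event), sort_keys=True)
```

Keeping the three fields mandatory at creation time means every record that reaches the log pipeline can be searched by event, actor, and time window.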
How do we go about our “butter”? – Monitoring malicious traffic
Sponsoring India’s biggest cricketing event meant bracing ourselves for huge traffic. And with huge traffic comes a fair share of malicious attackers.
So it was imperative for us to monitor any malicious traffic hitting our applications. We could do this thanks to the CloudFront logging provided by AWS, on top of which we built our own monitoring and alerting system. We extensively used open-source options like the ELK (Elasticsearch, Logstash, and Kibana) stack for complete infrastructure visibility, with ElastAlert for effective alerting. This gave us great insight into what actually hit our servers, what kind of traffic we were receiving, whether there were any indicators of compromise (IOCs) in our infrastructure, etc., while also allowing us to aggregate, store, search, and visualize the log data.
We have several use cases penned down for which we continuously monitor our logs.
- Top and bottom URLs being hit on our server with 3xx/4xx response codes – Indicates someone might be trying Layer 7 attacks on our infrastructure.
- Top IPs hitting us – Our geolocation enrichment service gives us an idea of where we are getting the most hits from. Combining this with our IP threat enrichment service, we can easily find out whether an IP is anonymous, malicious, or blacklisted.
- Top HTTP methods being used in the requests – Could indicate an anomaly if there is a high number of requests using HTTP verbs that we don’t see in regular traffic.
- Average number of bytes served/received – Gives us a baseline for the average number of bytes that goes along with our requests.
- Top URLs for which the IP rate limit was exceeded – A possible indication of a DDoS attack or API brute-forcing.
- User agents hitting us outside the app – User agents can reveal automated scanners used against us, and they also tell us which kinds of clients, other than our regular app users, are trying to connect to us.
Any anomaly in the above metrics can indicate an incident. For example:
- Huge app traffic in a short timeline – Indicator of a possible attack.
- The average number of bytes being sent increases abnormally – Indicates someone might be sending malformed data.
These are just a few basic cases that we built; any anomaly is sent out as an alert with GeoIP/threat enrichment and meta IP information.
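A few of the checks above can be sketched in plain Python. This is a minimal illustration rather than our ELK queries: it assumes the access log has already been parsed into dictionaries keyed by CloudFront field names (`c-ip`, `cs-method`, `cs-uri-stem`, `sc-status`, `cs-bytes`), and the function names, the expected verb set, and the spike factor are all hypothetical.

```python
from collections import Counter
from statistics import mean


def top_client_ips(records, n=3):
    """Most frequent client IPs as (ip, hits) pairs."""
    return Counter(r["c-ip"] for r in records).most_common(n)


def top_error_urls(records, n=3):
    """URLs most often answered with a 3xx or 4xx status."""
    hits = Counter(
        r["cs-uri-stem"] for r in records if 300 <= int(r["sc-status"]) < 500
    )
    return hits.most_common(n)


def unusual_methods(records, expected=("GET", "POST")):
    """Count requests whose HTTP verb is outside the expected set (assumed)."""
    return Counter(
        r["cs-method"] for r in records if r["cs-method"] not in expected
    )


def bytes_spike(baseline_records, current_records, field="cs-bytes", factor=3.0):
    """True when the current average exceeds `factor` times the baseline average."""
    baseline = mean(int(r[field]) for r in baseline_records)
    current = mean(int(r[field]) for r in current_records)
    return current > factor * baseline
```

In practice each result would be compared against a baseline for the same time window, and only the deviations would be forwarded as alerts.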
Security in the Cloud
As much as we care about what’s going on in the cloud, we also put effort into detecting and preventing unusual activity in the cloud infrastructure itself. AWS provides extensive logging through AWS CloudTrail, which enables governance and compliance and helps with operational and risk auditing of AWS accounts. With AWS CloudTrail, we were able to discover and troubleshoot security and operational issues by capturing a comprehensive history of the changes that occurred in an AWS account within a specified period of time. We have also created in-house security and audit automation scripts that continuously look for any unusual activity in our AWS infrastructure.
We knew that many of these use cases could have been covered by AWS Config rules. But keeping cost in mind, and given our ability to build things in-house, we charted out the use cases for spotting abnormal and unauthorized activity in our cloud infrastructure and built them on our own. Below are some of them –
- Continuously monitor AWS assets – Asset monitoring is critical not just from a cost perspective but also for detecting any unauthorized creation or deletion of instances. We continuously monitor the asset landscape, and any unauthorized change that is detected is immediately alerted on.
- Keep an eye on DNS misconfigurations – Vulnerabilities like subdomain takeover or internal domains getting created over the public route are mostly due to misconfiguration. We monitor every change made to DNS to detect unauthorized ones.
- Prohibited data exfiltration – We constantly check whether any object or bucket has been made public by mistake, or whether a bucket has had a policy added that could lead to a security risk, and we monitor for any unauthorized exfiltration of data from S3 buckets.
- Detecting risks and pitfalls of IAM – Our in-house tool continuously looks for any modifications made to IAM and immediately alerts us. All the activity pertaining to these users and roles is actively monitored.
- The danger of wide-open security groups – Security groups are among the most important baseline building blocks in any AWS cloud deployment. For example, allowing incoming access by opening up ports to 0.0.0.0/0 in a security group is one of the most common mistakes made when provisioning resources. We have set up continuous monitoring and alerting for security groups: whenever the ingress or egress rules of a security group change, a notification is automatically sent to SNS subscriptions, alerting our security team.
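The wide-open security group check from the list above can be sketched as follows. This is an illustration, not our in-house tool: it assumes the security group is a dictionary shaped like an entry in the `SecurityGroups` list returned by boto3’s `describe_security_groups`, and the `allowed_ports` default is a made-up policy.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_group):
    """Return (protocol, from_port, to_port, cidr) for world-open ingress rules."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr in OPEN_CIDRS:
                findings.append((
                    perm.get("IpProtocol"),
                    perm.get("FromPort"),  # may be absent for "all traffic" rules
                    perm.get("ToPort"),
                    cidr,
                ))
    return findings


def format_alerts(security_group, allowed_ports=(443,)):
    """Build alert lines, skipping single ports that are meant to be public."""
    lines = []
    for proto, low, high, cidr in open_ingress_rules(security_group):
        if low is not None and low == high and low in allowed_ports:
            continue
        ports = "all ports" if low is None else f"ports {low}-{high}"
        group_id = security_group.get("GroupId", "unknown")
        lines.append(f"{group_id}: {proto} {ports} open to {cidr}")
    return lines
```

The returned lines could then be published to a notification channel; the function only formats findings and does not talk to AWS.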
Ingesting AWS CloudTrail events into our log management and analytics solution has enabled us to monitor and track a large set of network events and behaviors and to target distinct use cases.
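To make one such use case concrete, here is a minimal sketch that filters CloudTrail records for IAM changes. It assumes each record is a dictionary with the standard CloudTrail fields `eventSource`, `eventName`, `eventTime`, `sourceIPAddress`, and `userIdentity`; the prefix list is illustrative and not exhaustive, and the function name is hypothetical.

```python
IAM_SOURCE = "iam.amazonaws.com"
# Prefixes of API calls that change IAM state (illustrative, not exhaustive).
MUTATING_PREFIXES = (
    "Create", "Delete", "Put", "Attach", "Detach", "Update", "Add", "Remove",
)


def iam_changes(records):
    """Yield what/who/when summaries for IAM-modifying CloudTrail records."""
    for rec in records:
        if rec.get("eventSource") != IAM_SOURCE:
            continue
        name = rec.get("eventName", "")
        if not name.startswith(MUTATING_PREFIXES):
            continue  # read-only calls such as List* and Get* are ignored
        yield {
            "what": name,
            "who": rec.get("userIdentity", {}).get("arn", "unknown"),
            "when": rec.get("eventTime"),
            "source_ip": rec.get("sourceIPAddress"),
        }
```

Each summary carries the event name, the actor, and the time, which is the minimum an analyst needs to decide whether a change was authorized.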
As we started seeing effective results, we went on to cover more ground and strengthen our logging and monitoring system. S3 bucket access logs and database audit logs are a couple of the new entries in our stack. Monitoring them helps us find out whether any unauthorized action has taken place.
As we move forward, we are increasingly convinced that it’s possible to implement and monitor the entire spectrum of control areas, ranging from network controls (including firewalls and intrusion detection services) to continuous vulnerability scanning.
Another use case we started building upon was monitoring our VPN traffic, since the VPN instance is in the DMZ and is exposed to public traffic. Our ELK stack is in the picture again!
We set up the logging solution to monitor for any network intrusion activity and used Suricata as the network intrusion detection system (NIDS). It is an open-source network threat detection engine that provides capabilities including intrusion detection (IDS), intrusion prevention (IPS), and network security monitoring.
We chose Suricata because it does extremely well at deep packet inspection and pattern matching, which makes it incredibly useful for threat and attack detection. It ships with a default rule set, on top of which we built our own rules to broaden our threat coverage.
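As an illustration of how Suricata output can feed such a pipeline, the sketch below picks high-priority alerts out of EVE JSON lines. It assumes Suricata’s EVE JSON output is enabled and that each line is one JSON event; the function name and the severity threshold are hypothetical, and our own setup ships these events to ELK rather than filtering them in a script.

```python
import json


def high_severity_alerts(eve_lines, max_severity=2):
    """Yield summaries of alert events at or above the given priority.

    In Suricata, severity 1 is the highest priority, so lower numbers
    are more urgent.
    """
    for line in eve_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if event.get("event_type") != "alert":
            continue
        alert = event.get("alert", {})
        if alert.get("severity", 99) > max_severity:
            continue
        yield {
            "when": event.get("timestamp"),
            "src_ip": event.get("src_ip"),
            "dest_ip": event.get("dest_ip"),
            "signature": alert.get("signature"),
            "severity": alert.get("severity"),
        }
```

Because the function accepts any iterable of lines, it can be pointed at an open log file or at a list of strings in a test.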
Monitoring VPN traffic using Suricata IDS
We set up AWS VPC traffic mirroring, which helps us monitor network traffic, analyze traffic patterns, and proactively detect malicious traffic. VPC traffic mirroring makes it much easier for us to monitor network traffic within our AWS VPCs. Some of the benefits it provides are –
- Detect Network & Security Anomalies – You can extract traffic of interest from any workload in a VPC and route it to the detection tools of your choice. You can detect and respond to attacks more quickly than is possible with traditional log-based tools.
- Gain Operational Insights – You can use VPC Traffic Mirroring to get the network visibility and control that will let you make better-informed security decisions.
- Implement Compliance & Security Controls – You can meet regulatory & compliance requirements that mandate monitoring, logging, and so forth.
All the logging and monitoring in the world won’t be helpful if no one is watching to see if there are problems! This is where alerting comes in.
Jelly is everywhere – Alerting
We are a Slack-first team, so our logging and monitoring platform sends out alerts to Slack as and when any unusual event is detected. To achieve this, we have integrated ElastAlert (a simple framework for alerting on anomalies, spikes, or other patterns of interest in Elasticsearch data) into our ELK stack, which not only alerts us but also provides complete insight into the action items.
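For readers who want to see what such an alert looks like on the wire, here is a minimal sketch of posting a message to a Slack incoming webhook. ElastAlert ships its own Slack alerter, so this is not what our stack runs; the function names and the message layout are hypothetical, and the webhook URL is something you would supply yourself.

```python
import json
import urllib.request


def build_slack_payload(rule_name, summary, details):
    """Return the JSON body for a Slack incoming webhook message."""
    lines = [f"*{rule_name}*", summary]
    # Sorted keys keep the message layout stable between alerts.
    lines += [f"• {key}: {value}" for key, value in sorted(details.items())]
    return {"text": "\n".join(lines)}


def send_to_slack(webhook_url, payload, timeout=5):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating payload construction from sending keeps the message format testable without any network access.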
The complete stack has helped us tie together monitoring, vulnerability scanning, and control plane visibility along with alerting to create a complete continuous monitoring strategy.
With the proper visibility in place through logging and monitoring, along with large-scale analytics and data processing tools and capabilities, we can now track and monitor both control plane activity and threats from both internal and external sources over time. With a more complete picture of behavior, we are now able to detect malicious, suspicious, and accidental/unintended actions and events.
As we scale and bring in new team members and technologies into the system, we will have to strengthen our practices to ensure they hold up to the demands. Since security is a continuous process, we continue working towards strengthening our infrastructure and apps’ security through tools and processes.
In the following weeks, we intend to capture our internal processes in more depth. Give us a follow to get updates as soon as we post them.
Thanks to Rashid Feroze, Govind Menon, and Avinash Jain for continuously making CRED secure.
- The Peanut, Butter, and Jelly in Cloud Security - December 2, 2020