About the Company

The client is a health care organization in New Jersey whose priority is improving population health by strengthening New Jersey’s health system. The Department’s five branches (Public Health Services, Health Systems, Integrated Health, the Office of Population Health, and the Office of Policy and Strategic Planning) work collaboratively toward that goal. Population health focuses on keeping healthy New Jerseyans well, preventing those at risk from getting sick, and keeping those with chronic conditions from getting sicker. It promotes prevention, wellness, and equity in all environments, resulting in a healthy New Jersey.

The Challenge

The company had an immediate need to implement enterprise data lake services for COVID-19. The data analytics solution needed to support efficient contact tracing, track COVID-19 vaccination and immunization, and provide public and internal reports on overall performance. The data lake would later be extended to other sources and use cases. The system had to be highly secure and highly available because it dealt with critical COVID-19 data. Reports generated from this information were used by the New Jersey Governor to brief the press and news agencies.


The Solution

Kapstone proposed and implemented a solution on Amazon Web Services (AWS) that met all of the company’s needs. Various third-party data sources were integrated using services such as API Gateway, Lambda, Kinesis, S3, and EC2. S3 buckets were designed to accommodate several data set types (JSON, CSV, and Parquet) and schemas, and Glue Crawlers were developed to parse the data and generate logical schemas. The data was then reported through Tableau dashboards, which queried the data in S3 via Athena. Where data had to be pulled from Azure Storage, an EC2 instance was spun up in a private subnet. This instance runs Rclone, which pulls gigabytes of data from Azure every hour. The data is processed on the EC2 instance and then pushed to S3.
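
The exact bucket layout is not described in this case study. As a minimal sketch, assuming a Hive-style partitioned layout (a common convention that Glue Crawlers can map to partition columns and that Athena can use to skip irrelevant data), object keys for incoming files could be built as shown below. The function name, the prefix order, and the `contact_tracing` source label are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Formats named in the case study; everything else here is an assumed layout.
SUPPORTED_FORMATS = {"json", "csv", "parquet"}


def build_object_key(source: str, fmt: str, received_at: datetime, name: str) -> str:
    """Return an S3 object key of the (hypothetical) form
    <format>/<source>/year=YYYY/month=MM/day=DD/<name>.<format>
    """
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Normalize to UTC so that partitions do not depend on the sender's time zone.
    ts = received_at.astimezone(timezone.utc)
    return (
        f"{fmt}/{source}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{name}.{fmt}"
    )


if __name__ == "__main__":
    key = build_object_key(
        "contact_tracing",  # hypothetical source label
        "CSV",
        datetime(2021, 3, 5, 12, 0, tzinfo=timezone.utc),
        "batch-001",
    )
    print(key)
```

Keeping each format under its own top-level prefix is one way to let a separate crawler handle each data set type, though other layouts would work equally well.
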
Any solution provided to a health care company has to be HIPAA compliant. To achieve the necessary compliance and security, end users of the data lake were given least-privilege access. We used AWS Secrets Manager to securely hold all endpoints, usernames, and passwords for third-party applications. Monitoring was set up with Amazon CloudWatch, which helps in investigating service logs when issues arise. The data in the S3 buckets and the environment variables used by the Lambda functions are encrypted using AWS KMS. Tableau uses a designated IAM user with limited access to query Athena.
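
The way secrets were structured is not stated here. As a rough sketch, assuming each third-party application has one JSON secret with `endpoint`, `username`, and `password` fields (an assumption, as are all names below), a Lambda function or EC2 job could load its credentials at run time instead of hard-coding them. The only AWS call relied on is `get_secret_value(SecretId=...)`, which the boto3 Secrets Manager client provides; the in-memory client is a stand-in so the sketch runs without AWS access.

```python
import json

# Assumed field names for a third-party credential bundle.
REQUIRED_FIELDS = ("endpoint", "username", "password")


def load_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch one secret and return only the expected fields."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    missing = [field for field in REQUIRED_FIELDS if field not in secret]
    if missing:
        raise KeyError(f"secret {secret_id!r} is missing fields: {missing}")
    return {field: secret[field] for field in REQUIRED_FIELDS}


class InMemorySecretsClient:
    """Demonstration stand-in for boto3.client("secretsmanager")."""

    def __init__(self, secrets: dict):
        self._secrets = secrets

    def get_secret_value(self, SecretId: str) -> dict:
        # Mirrors the response key that load_credentials reads.
        return {"SecretString": json.dumps(self._secrets[SecretId])}


if __name__ == "__main__":
    client = InMemorySecretsClient({
        "thirdparty/example-feed": {  # hypothetical secret name
            "endpoint": "https://example.invalid/api",
            "username": "svc_user",
            "password": "not-a-real-password",
        }
    })
    creds = load_credentials(client, "thirdparty/example-feed")
    print(creds["endpoint"], creds["username"])
```

In a real deployment the stand-in would be replaced by `boto3.client("secretsmanager")`, and the calling role would need permission to read only the secrets it uses, in line with the least-privilege approach described above.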

The Benefits

As a result, a highly secure and highly available data lake was built to handle near-real-time data. Serverless services reduced management overhead. Reporting was automated to reduce the workload on business users. The architecture, built from serverless and independent services, is robust and scalable enough to handle bulk loads of new data.