About the Company
The lending firm addresses gaps in the financial landscape by offering loan solutions designed specifically for individuals with poor or limited credit histories. It caters to people who face challenges with traditional banking, such as those with lower credit scores or those who need quick access to funds. In doing so, the firm promotes financial inclusivity and extends support to a broader range of higher-risk consumers.
Background of Business Problem
The client sought to modernize its existing data ingestion, ETL pipelines, and business intelligence reporting. Traditional data warehouses can struggle to scale as data volumes grow, and scaling up can be costly and time-consuming. Performance can degrade with large datasets and complex queries. Performance tuning, adding indexes, and optimizing queries can help, but this work is labor-intensive and requires significant expertise.
Our Approach & Solution
Infinite proposed a solution that uses serverless Spark to offload computation from the database to Spark, with the aim of significantly improving overall performance.
After evaluating several options, including OCI Data Flow, Snowflake, and Databricks, we converged on Databricks.
We took a more evolutionary approach on the ingestion side, using Azure Data Factory (ADF) to orchestrate the Databricks notebooks. We also proposed Databricks Workflows, but agreed with the client to focus on the core ETL flows.
We reverse-engineered the existing legacy SSIS packages and kept similar bounded contexts, but moved most of the computation from database queries to Spark DataFrames.
The proposed Medallion architecture provided clear egress points for data, specifically for the existing BI stored procedures. However, the client did not want to rewrite those stored procedures to work with Databricks SQL Analytics. We ended up copying the data to a Synapse endpoint so that the BI stored procedures could run in a backward-compatible mode.
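To make the layering concrete, here is a minimal sketch of the bronze, silver, and gold stages of a Medallion layout. It is an illustration only, not the client's pipeline: plain Python lists stand in for Spark DataFrames and Delta tables, and the record fields, the `loan_platform` source label, and the function names are all hypothetical.

```python
# Hypothetical raw loan records as they might arrive from a source system.
RAW_ROWS = [
    {"loan_id": "L-1", "state": "tx", "amount": "1200.50"},
    {"loan_id": "L-2", "state": "TX", "amount": "800"},
    {"loan_id": "L-2", "state": "TX", "amount": "800"},           # duplicate
    {"loan_id": "L-3", "state": "ca", "amount": "not-a-number"},  # bad amount
    {"loan_id": "L-4", "state": "CA", "amount": "300.25"},
]


def to_bronze(raw_rows):
    """Bronze: land the data as-is, adding only ingestion metadata."""
    return [dict(row, _source="loan_platform") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: validate, type, and de-duplicate the bronze records."""
    seen = set()
    silver = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["loan_id"] in seen:
            continue  # drop repeated loan ids
        seen.add(row["loan_id"])
        silver.append(
            {
                "loan_id": row["loan_id"],
                "state": row["state"].upper(),
                "amount": amount,
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate into a reporting-ready shape (total amount per state)."""
    totals = {}
    for row in silver_rows:
        totals[row["state"]] = totals.get(row["state"], 0.0) + row["amount"]
    return totals


bronze = to_bronze(RAW_ROWS)
silver = to_silver(bronze)
gold = to_gold(silver)
```

In a layout like this, the gold output is the natural egress point: it is the layer that would be copied to a downstream endpoint for existing reporting code to read.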
Business Outcomes
The solution brought ETL, data science, and BI workflows together on one platform. Infinite’s proposed solution helped unify structured and unstructured data processing, enabling comprehensive data assessment models.
Here are some of the key performance metrics:
- Resilient pipeline processing: checkpointing helps pipelines recover 3x faster. The number of job failures is monitored to identify stability issues and areas that need debugging.
- Business data generation improved by 2x.
- Metrics and tracing in Databricks notebooks gave on-call engineers key debugging pointers and made it possible to set up on-call alerting.
- We were able to provide the customer with the cost per job query for various data sources, such as Salesforce and loan management platforms.
- We also provided data lineage for the customer, which was needed for various compliance initiatives.
- Data processing and quality checks helped the customer with their KPI evaluations.
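The checkpointing idea behind the faster recovery can be sketched in a few lines. This is a generic illustration, not the Databricks or Spark checkpointing mechanism and not the client's code: the step names, the JSON checkpoint file, and `run_pipeline` are all hypothetical. Each completed step is recorded, so a rerun after a failure skips the work that already finished.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, checkpoint_path):
    """Run steps in order, skipping any already recorded in the checkpoint file.

    Returns the names of the steps executed during this call.
    """
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        path.write_text(json.dumps(done))
        executed.append(name)
    return executed


# Demo: the "transform" step fails on its first attempt only.
attempts = {"transform": 0}


def ingest():
    pass


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated transient failure")


def publish():
    pass


steps = [("ingest", ingest), ("transform", transform), ("publish", publish)]

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    try:
        run_pipeline(steps, checkpoint)
        first_run_failed = False
    except RuntimeError:
        first_run_failed = True
    # The rerun resumes from the failed step instead of starting over.
    resumed_steps = run_pipeline(steps, checkpoint)
```

On the rerun, `ingest` is skipped because it is already in the checkpoint, so only the failed step and the steps after it are executed.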