Pre-Screening Index

Company:

Coinbase

Role:

Associate Engineer, Compliance Technology

Project description:

Context: As a leading cryptocurrency exchange, Coinbase must prevent sanctioned individuals from using its platform, which requires screening users against regulatory sanctions lists through a screening API. Screening several users who share the exact same name incurs unnecessary cost, because each of them triggers a separate, redundant API call.

Solution: We built the Pre-Screening Index, a data processing layer that identifies duplicate screening cases before they reach the screening API. By grouping users who share a name and reusing the screening result already obtained for that name, it cut redundant screening API calls by 34%, simplifying the process and lowering costs.
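A minimal sketch of why deduplication reduces API calls, assuming names are matched exactly (as described above) and that each distinct name needs only one screening call. The function names and the sample batch are hypothetical; the 40% in this toy batch is illustrative only and is not the project's measured 34%.

```python
def screening_calls(names: list[str], deduplicate: bool) -> int:
    """Count screening API calls for a batch of user names.

    Without deduplication every user is screened individually; with it,
    each distinct (exact-match) name is screened only once.
    """
    return len(set(names)) if deduplicate else len(names)


def fraction_saved(names: list[str]) -> float:
    """Share of calls avoided by screening each distinct name once."""
    before = screening_calls(names, deduplicate=False)
    after = screening_calls(names, deduplicate=True)
    return (before - after) / before if before else 0.0


# Hypothetical batch: five users, three distinct names.
batch = ["Alice Smith", "Bob Lee", "Alice Smith", "Carol Diaz", "Bob Lee"]
print(screening_calls(batch, deduplicate=False))  # 5
print(screening_calls(batch, deduplicate=True))   # 3
print(fraction_saved(batch))                      # 0.4
```

The real saving depends on how many in-scope users share a name with another user, so it grows as the customer base grows.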

Technology:

Airflow, Python, SQL

Responsibilities and Achievements:

  • Saves the company $0.5M per year by reducing screening API calls by 34%. Savings are expected to increase as the customer base grows.

  • Served as the lead developer under the supervision of the engineering manager.

  • Made and implemented the system's design decisions.

  • Designed and executed end-to-end testing. 

  • Designed and implemented system health monitoring.

Technical design:  

  1. “Scheduler” tells Airflow the cadence and order in which to run the operations.

  2. “Extractor” pulls from several data sources in Snowflake to (a) identify all in-scope users and (b) categorize each user as a “root” (the first user with a given name) or a “leaf” (a later user with the same name as an existing root).

  3. “Disseminator” pulls screening results for roots, identifies any results that have not yet been processed, and propagates each new root result to that root's leaf users.

  4. Results are further processed in downstream systems.
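The Extractor and Disseminator steps above can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not the production code (which runs as Airflow tasks against Snowflake with SQL): the `User` record, the `created_at` ordering key used to decide which user is "first", the `(result_id, root_user_id, outcome)` result tuples, and the `"no_match"` outcome value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    name: str
    created_at: int  # hypothetical key that decides which user counts as "first"


def categorize(users):
    """Extractor sketch: the first user per exact name is the root, later ones are leaves."""
    roots: dict[str, User] = {}                  # name -> root user
    leaves_by_root: dict[str, list[User]] = {}   # root user_id -> leaf users
    for user in sorted(users, key=lambda u: (u.created_at, u.user_id)):
        root = roots.get(user.name)
        if root is None:
            roots[user.name] = user
            leaves_by_root[user.user_id] = []
        else:
            leaves_by_root[root.user_id].append(user)
    return roots, leaves_by_root


def disseminate(root_results, processed_result_ids, leaves_by_root):
    """Disseminator sketch: copy each not-yet-processed root result to its leaves.

    root_results: list of (result_id, root_user_id, outcome) tuples.
    processed_result_ids: result_ids handled in earlier runs; updated in place.
    Returns (leaf_user_id, outcome) records for downstream systems.
    """
    leaf_results = []
    for result_id, root_id, outcome in root_results:
        if result_id in processed_result_ids:
            continue  # already disseminated in a previous run
        for leaf in leaves_by_root.get(root_id, []):
            leaf_results.append((leaf.user_id, outcome))
        processed_result_ids.add(result_id)
    return leaf_results


users = [
    User("u1", "Alice Smith", 1),
    User("u2", "Bob Lee", 2),
    User("u3", "Alice Smith", 3),
    User("u4", "Alice Smith", 4),
]
roots, leaves = categorize(users)
print(sorted(u.user_id for u in roots.values()))  # ['u1', 'u2']

processed = {"r0"}  # r0 was handled in an earlier run
results = [("r0", "u2", "no_match"), ("r1", "u1", "no_match")]
print(disseminate(results, processed, leaves))  # [('u3', 'no_match'), ('u4', 'no_match')]
```

Tracking processed result IDs makes each run incremental: re-running the Disseminator on the same results produces no duplicate leaf records.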

Challenges: 

The project relies on Snowflake data sources that refresh from their upstream systems only a fixed number of times per day, so new data is not available immediately. I worked with several other engineering teams to increase the refresh cadence to meet the business requirements. This experience highlighted the trade-offs between system complexity and the timeliness and accuracy of the data.