Sevatec delivers enhanced organizational decision-making and accelerated access to data through next-generation data management platforms, business intelligence services, and advanced data modeling.
Data Ingestion / Wrangling
- Extract/Transform/Load (ETL): Automated, model-driven stream and batch processing enables timely access to information.
- Data Enrichment, Cleansing, and Entity Resolution: Integration and rationalization of data from disparate sources improves data quality and relevancy.
- Standardization and Governance: Enables data democratization and effective data sharing throughout organizations.
- Data Engineering: Maximizes cloud investments and reduces total cost of ownership across the data architecture.
Business Intelligence (BI) and Data Analytics
- Descriptive and Predictive Analytics: Derive actionable intelligence.
- Artificial Intelligence (AI) and Machine Learning (ML): Discover new insights and knowledge hidden in data.
- Visualization and Data Storytelling: Effectively communicate findings and provide mission situational awareness by visualizing anomalies in data.
- Geospatial Analysis: Layer correlated data to drive emergency preparedness.
Customer Challenges We Solve
- Poor data quality prevents integration and rationalization of disparate data sources.
- Lack of standardization prevents data sharing and limits the ability to propagate data across applications.
- Democratizing data without considering security increases operational risks.
- Legacy architectures limit an organization's ability to rapidly respond to ad hoc reporting needs.
- Inability to dynamically scale to meet increasing demand for data and I/O resources impacts performance and data availability.
Sevatec’s data methodology exploits the scalable computing power of the Cloud to unlock the potential of our customers’ data while reducing total cost of ownership in a world in which data is measured in petabytes. Our Data Ingest Mechanisms handle both batch and real-time streaming data sources, efficiently passing incoming data to computation and advanced enrichment processes that consolidate, standardize, and enrich data for further consumption. The Data Management Layer encapsulates data access, provides governance and security, and enables data propagation through ontology layers and APIs. Data repositories span a variety of persistent (AWS S3, RDBMS, NoSQL) and non-persistent (Apache Spark in-memory clusters) technologies to provide cost-effective and dynamically scalable options. This methodology is supported by distributed processing engines and by business intelligence, analytics, and AI/ML tools that unleash the power of Cloud computing to enable advanced data science experimentation.
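To make the dual batch/streaming ingest pattern concrete, the following Python/PySpark sketch shows the same standardization logic applied to a batch extract and to a live feed, both landing in an S3 data lake. The bucket names, Kafka broker, topic, and column names are illustrative placeholders, not values from any Sevatec system, and the streaming source assumes the Spark Kafka connector is available.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

# Batch ingest: read a raw extract, standardize a field, land it in the lake
raw = spark.read.option("header", True).csv("s3://example-bucket/landing/persons/")
clean = (raw
         .withColumn("last_name", F.upper(F.trim(F.col("last_name"))))
         .dropDuplicates(["source_record_id"]))
clean.write.mode("append").parquet("s3://example-bucket/lake/persons/")

# Streaming ingest: the same pipeline logic applied continuously to a live feed
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "person-updates")             # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS payload"))
(stream.writeStream
       .format("parquet")
       .option("path", "s3://example-bucket/lake/person_updates/")
       .option("checkpointLocation", "s3://example-bucket/checkpoints/person_updates/")
       .start())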
CASE STUDY #1
What happens when PII fails to accurately identify a real individual because of data quality issues? Name changes, typos, transposed dates, and missing data can seriously limit an organization’s ability to aggregate and analyze a person’s data. That data is typically strewn across decades of legacy systems, amounting to potentially billions of rows of “PII.” The ramifications of these data quality issues can cost an organization significant time and money and, in certain situations, impact national security.
As part of an initiative to advance a federal customer toward a person-centric analytics capability, Sevatec implemented a repeatable entity resolution solution using tools from Informatica’s entity resolution suite, most notably its Identity Match Option (IMO). We used the suite’s industry-standard algorithms (e.g., Jaro-Winkler, Hamming distance) in a fuzzy matching process to identify similar or duplicate entities both within and across datasets. Our team customized the process for systematic quantification and evaluation of record pairs, using matching scores, weighting factors, and thresholds to agglomerate records into a single entity or to identify connections between records, such as family relationships. The implemented process provided the flexibility to manage varying degrees of data quality issues (e.g., missing or incomplete data, transposed data, non-standard addresses) and to adjust thresholds to maximize entity resolution of source records while minimizing type II errors. Sevatec’s solution resolved and linked over 600 million source records across 11 disparate source systems, comprising more than 30 years of immigration data, into 50 million relevant person entities, all while maintaining the integrity of the original source data. This data was then stored in a data lake on Amazon S3, where data scientists use dynamically scalable Spark clusters managed by Databricks to analyze person data across historical datasets using R, Python, or Scala.
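Informatica’s IMO is a proprietary product, but the pair-scoring concept it applies can be sketched in plain Python. In the fragment below, the field names, weights, and thresholds are illustrative assumptions rather than the values tuned for the program described above.

def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 (no match) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):  # find characters matching within the window
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if not matches:
        return 0.0
    t, j = 0, 0  # count transpositions among matched characters
    for i in range(len1):
        if match1[i]:
            while not match2[j]:
                j += 1
            if s1[i] != s2[j]:
                t += 1
            j += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Boost the Jaro score for strings sharing a common prefix (up to 4 chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return score + prefix * p * (1 - score)

# Hypothetical field weights and thresholds for scoring a candidate record pair
WEIGHTS = {"first_name": 0.25, "last_name": 0.35, "dob": 0.25, "address": 0.15}
MERGE_THRESHOLD = 0.90  # agglomerate the pair into a single entity
LINK_THRESHOLD = 0.75   # flag a possible connection for review

def pair_score(a, b):
    """Weighted similarity across fields for two record dicts."""
    return sum(w * jaro_winkler(a.get(f, ""), b.get(f, "")) for f, w in WEIGHTS.items())

Raising MERGE_THRESHOLD trades recall for precision; in the terms used above, a higher threshold reduces false merges at the cost of more unresolved duplicates, which is why the thresholds were adjustable per data source.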
CASE STUDY #2
In today’s fast-paced environment, rapid access to information is critical to enabling timely, relevant, accurate, and informed decisions. While an organization may have all the data needed to respond, it is often limited in its ability to consolidate, process, and analyze large volumes of data quickly enough to turn that data into actionable information. Additionally, to react to rapidly evolving emergencies, including natural disasters, organization-specific data must be merged with real-time, unstructured, publicly available data such as weather or social media feeds.
Our team ingested and consolidated real property, asset, and energy sustainability data from across all agency components, bringing it together in a central data warehouse using Informatica ETL. Sevatec’s geospatial subject matter experts then formatted and enriched the data (using Oracle and ArcGIS orchestrated by Python scripts housed in a cloud-based environment) to power geospatially informed decision-making applications. GeoExplorer, as this solution is known, enabled rapid access to decision-focused visualizations and reports in emergent and rapidly evolving situations. For example, the CAPSIS GeoExplorer enabled the agency to use ArcGIS Online and Tableau to mash up geocoded building and asset information against live streaming hurricane updates, providing senior management with rich geospatial visualizations that included real-time impact analysis of potential dangers to government employees and physical assets as well as the general public.
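The mash-up pattern behind GeoExplorer can be illustrated with a minimal Python sketch. The file names and columns below are hypothetical, and GeoPandas stands in for the production Oracle/ArcGIS stack described above; the sketch flags buildings that fall inside a storm’s forecast impact polygon.

import pandas as pd
import geopandas as gpd

# Hypothetical inputs: geocoded building records and a hurricane forecast polygon
buildings = pd.read_csv("buildings.csv")  # columns: building_id, lon, lat, occupants
buildings = gpd.GeoDataFrame(
    buildings,
    geometry=gpd.points_from_xy(buildings.lon, buildings.lat),
    crs="EPSG:4326",
)
storm = gpd.read_file("hurricane_cone.geojson").to_crs("EPSG:4326")

# Spatial join: keep only the buildings that sit inside the forecast impact area
at_risk = gpd.sjoin(buildings, storm, predicate="within")
print(at_risk[["building_id", "occupants"]].sort_values("occupants", ascending=False))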
We are trusted talent, inspired to serve, partnered with government, to protect and improve the lives of Americans.
CMMI-DEV ML3, CMMI-SVC ML3
ISO 9001:2015, ISO/IEC 20000-1:2011, ISO/IEC 27001:2013