Data Engineering Vendor Evaluation Criteria: 35 Criteria for 2026
Most vendor evaluation guides give you a framework. This one gives you the criteria — 35 of them, with definitions, verification sources, red flags, and suggested weights, organized into seven groups.
The difference matters. Knowing how to run a scorecard process is covered in the data engineering vendor scorecard template and the CIO evaluation playbook. This article is the reference you pull up when you need to know exactly what criterion 3.3 means, how to test it, and what a failing response looks like.
It is a procurement reference. Use it that way.
What can be the criteria for vendor evaluation in 2026?
That question has a 2024 answer and a 2026 answer, and they are not the same.
In 2024, a reasonable data engineering vendor evaluation covered platform credentials, delivery process, team quality, contract terms, and basic security posture. AI/ML readiness was a soft plus — nice if present, not a disqualifier if absent.
In 2026, that changed. Three forces converged:
EU AI Act enforcement (full compliance required by August 2026 under Regulation EU 2024/1689) moved vendor AI risk management from a contractual afterthought to a legal requirement for any engagement touching EU data subjects.
Warehouse-native AI matured. Snowflake Cortex and Databricks Mosaic AI are now core platform capabilities — not experimental add-ons. A vendor who cannot articulate the trade-offs between the two cannot credibly architect a forward-looking platform.
Agentic pipelines entered production. LangGraph, CrewAI, and similar orchestration layers are running in live data platforms. Vendors without hands-on experience are not missing a nice-to-have; they are missing the direction the work is going.
The result: AI/ML Readiness is now a top-level evaluation group, not a sub-criterion buried under Innovation. That is the defining shift from 2024 to 2026 frameworks.
The 35 criteria below are organized into seven groups.
Each criterion below has four fields: what it measures, how to verify it (not trust it), the red flag that tells you the vendor is weak on it, and a suggested percentage weight within its group. The group weights and how to apply them across project archetypes are covered in the weighting section.
How is data engineering vendor evaluation different from generic IT vendor evaluation?
Generic IT evaluation criteria — responsiveness, SLA adherence, security posture, commercial flexibility — are not wrong. They are just incomplete for data work.
Three gaps matter most:
Platform fidelity is binary in data engineering. A consultancy either knows Snowflake’s workload isolation and Cortex well enough to make production-grade decisions or it doesn’t. There is no “good enough” middle ground that substitutes for a Snowflake Elite partnership with named certified architects. Generic IT evaluations collapse this distinction into a single “technical expertise” score.
Data quality is operationally fragile in a way most IT deliverables are not. A misconfigured API or a slow application can be patched at runtime. A broken data pipeline silently delivers wrong numbers to executives, finance models, and machine learning features — often for weeks before anyone notices. This makes observability maturity a first-class criterion that generic IT evaluations ignore entirely.
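To make that failure mode concrete: below is a minimal, hypothetical sketch of the freshness and volume checks a dedicated observability layer automates and extends. The `run_query` helper, the `loaded_at` column, and the thresholds are illustrative assumptions, not any particular tool’s API.

```python
from datetime import datetime, timedelta, timezone

def run_query(sql: str) -> list[tuple]:
    """Hypothetical stand-in for your warehouse client (Snowflake connector, Databricks SQL, etc.)."""
    raise NotImplementedError("wire up the real client here")

def check_table_health(table: str,
                       expected_daily_rows: int,
                       max_staleness: timedelta = timedelta(hours=6),
                       min_volume_ratio: float = 0.5) -> list[str]:
    """Return anomaly descriptions for one table; an empty list means healthy."""
    anomalies = []

    # Freshness: a pipeline can "succeed" while loading nothing new.
    (latest_loaded_at,) = run_query(f"SELECT MAX(loaded_at) FROM {table}")[0]
    if datetime.now(timezone.utc) - latest_loaded_at > max_staleness:
        anomalies.append(f"{table}: stale, last load at {latest_loaded_at}")

    # Volume: a silent upstream change can cut today's row count without raising any error.
    (rows_today,) = run_query(
        f"SELECT COUNT(*) FROM {table} WHERE loaded_at >= CURRENT_DATE"
    )[0]
    if rows_today < min_volume_ratio * expected_daily_rows:
        anomalies.append(f"{table}: volume drop ({rows_today} vs ~{expected_daily_rows} expected)")

    return anomalies
```

The point is not that a buyer should build this in-house; it is that a vendor who treats checks like these as optional extras has not internalized how data platforms actually fail.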
Handoff quality determines whether the engagement has a business outcome. Code delivered without documented lineage, without runbooks, without reproducible environment definitions, is a liability handed to the client. Generic IT evaluations lightly score “documentation.” Data engineering evaluations should treat it as a proxy for everything the vendor understands about operational maintenance.
The 35 criteria below reflect these differences. They do not replace generic IT vendor evaluation — they extend it.
The 7 groups of vendor evaluation criteria for data engineering
Before the detailed tables, a one-line description of each group:
| Group | Focus | Criteria |
|---|---|---|
| G1: Platform Fidelity | Depth of partnership and hands-on mastery of specific tools | 1.1–1.5 |
| G2: Delivery Maturity | How they build, test, promote, and observe data products | 2.1–2.5 |
| G3: AI/ML Readiness 2026 | Warehouse-native AI, agents, EU AI Act compliance | 3.1–3.5 |
| G4: Industry Fit | Vertical knowledge, regulatory fluency, domain-specific SLAs | 4.1–4.5 |
| G5: Commercial Hygiene | Contract structure, IP ownership, off-ramp protections | 5.1–5.5 |
| G6: Team & Talent | Named team policy, retention, subcontractor transparency | 6.1–6.5 |
| G7: Continuity & Risk | Financial stability, BCDR, cyber posture, exit readiness | 7.1–7.5 |
Below are the full tables for each group. Then a measurement-source map, a weighting section for three project archetypes, and the 35-item checklist.
Group 1: Platform Fidelity (5 criteria)
Platform fidelity is the most verifiable cluster in the evaluation. Partner tiers, certified engineer counts, and published reference architectures are public or auditable. There is no excuse for taking vendor claims at face value here.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 1.1 Partner tier (Snowflake / Databricks / dbt) | Official partnership classification: Snowflake Elite, Premier, or Registered; Databricks SI Premier or Registered; dbt Labs partner status | Check directly against Snowflake partner directory, Databricks partner listings, and dbt Labs partner directory. Do not accept screenshots. | Claims “Snowflake partner” but cannot confirm tier; or tier is Registered not Elite/Premier for a complex project | 30% |
| 1.2 Named certified engineers | Actual count of Snowflake SnowPro Advanced, Databricks Certified, or dbt-certified engineers assigned to your engagement — not total headcount across the firm | Request names and certification IDs. Validate via Snowflake/Databricks certification lookup portals. | “We have 40 Snowflake-certified engineers” with no ability to name any assigned to your project | 25% |
| 1.3 Cortex AI / Mosaic AI hands-on (2026) | Demonstrated production use of warehouse-native LLM features: Snowflake Cortex Analyst, Cortex Search, or Databricks Mosaic AI in a live client environment — not a sandbox demo | Request case study with client name, approximate data volume, and the specific Cortex or Mosaic AI feature used. Verify via reference call. | Describes Cortex/Mosaic AI using only marketing-page language; or conflates it with general OpenAI API integration | 20% |
| 1.4 SDK and open-source contributions | Active use of or contributions to Snowpark, delta-rs, PySpark, or published dbt packages on dbt Hub — signals engineers who read source code, not just documentation | Search GitHub for the firm’s org handle; check dbt Hub for published packages. Look for commit recency, not just existence. | Open-source “contributions” are a stale fork from 2022 with no commits; or no GitHub org presence at all | 15% |
| 1.5 Published reference architectures | Publicly available architecture guides, blog posts, or technical documentation demonstrating how the firm designs systems on their claimed platforms — not white-label vendor content | Request URLs. Look for author attribution, technical specificity, and publication date within the last 18 months. | “We have internal IP” with nothing public; or all published content is co-authored with the platform vendor’s marketing team | 10% |
Group 2: Delivery Maturity (5 criteria)
Delivery maturity is where most projects either succeed or accumulate invisible debt. A vendor who cannot separate development from production environments, or who treats observability as an afterthought, will hand you a platform that works in demos and fails in quarter-end runs.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 2.1 CI/CD for data | Separate dev/staging/prod environments with automated tests (dbt tests, Great Expectations, or equivalent) gating promotion to production | Request a sample CI/CD pipeline diagram and ask how a failing test is handled before a merge. Ask for an example of a deployment they rolled back. | “We use version control” without a gate between environments; or environments exist but promotion is manual with no automated test pass requirement | 30% |
| 2.2 Infrastructure-as-code coverage | Terraform or Pulumi for cloud infrastructure; dbt for transformation logic — with both committed to a repository and subject to review | Request a sample Terraform module or dbt project structure. Ask what percentage of their last three projects used IaC from day one. | “We can do Terraform if you need it”; or IaC was retrofitted after initial build because it was “faster to click through the console” | 25% |
| 2.3 Orchestration sophistication | Demonstrated competence with a specific orchestrator (Airflow, Dagster, or Prefect) — including cross-DAG dependencies, SLA monitoring, alerting, and environment promotion — not just “we support multiple orchestrators” | Ask them to walk through a specific DAG design decision: why they chose a sensor over a trigger in a given scenario, or how they handle partial pipeline failures. | Generic “we use Airflow” without specific knowledge of its scheduler, executor configuration, or failure handling; or “we use whatever the client has” | 25% |
| 2.4 Observability tooling | Hands-on production experience with at least one dedicated data observability tool (Monte Carlo, Bigeye, Anomalo, Datafold) — not just logging and alerting | Ask for a specific anomaly they caught in production via observability tooling that would not have been caught by standard monitoring. | No hands-on observability tool experience; or conflates database monitoring (CloudWatch, Datadog) with data observability | 10% |
| 2.5 Change-management process | Documented PR review standards, deployment cadence, and release notes discipline — evidence that changes are reviewed and traceable | Request a sample PR template or deployment checklist. Ask how they manage a breaking schema change across dependent pipelines. | No PR template; deploys happen whenever an engineer finishes work; schema changes are communicated informally via Slack | 10% |
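Criterion 2.3 is easiest to probe with specifics. As a reference point for that conversation, here is a minimal sketch of baseline failure handling in an Airflow 2.x DAG; the DAG name, callables, and thresholds are hypothetical, and a vendor with real orchestration depth should be able to explain why each default is set the way it is — and when to deviate from it.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical callables standing in for real extract/load logic.
def extract_orders(**_): ...
def load_to_warehouse(**_): ...

def notify_on_failure(context):
    # In a real platform this would page the on-call channel with task context.
    print(f"Task {context['task_instance'].task_id} failed")

default_args = {
    "owner": "data-platform",
    "retries": 2,                          # transient failures are retried...
    "retry_delay": timedelta(minutes=10),  # ...with a delay, not silently ignored
    "sla": timedelta(hours=2),             # breaches surface as SLA misses, not silence
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2026, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    extract >> load
```

Sensor-versus-trigger choices, partial-failure recovery, and backfill strategy go well beyond this sketch — which is exactly the territory the verification question for 2.3 is meant to reach.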
Group 3: AI/ML Readiness 2026 — the new fork in the road
This group did not exist in 2024 evaluation frameworks. It exists now because the delta between vendors who have production AI/ML experience and those who do not is widening fast. A 2026 data platform built by a vendor with no Cortex or Mosaic AI literacy will need redesign before it can support agentic use cases. That redesign is your problem, not theirs.
EU AI Act note: August 2026 marks full compliance enforcement for high-risk AI system providers. Any vendor operating in or serving EU markets must have a documented risk management process for AI systems. If they cannot describe it, they are operating without compliance infrastructure — a legal and operational liability you inherit.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 3.1 LLM-on-warehouse experience | Production delivery of features using Snowflake Cortex (Analyst, Search, Complete) or Databricks Mosaic AI — with the ability to articulate specific trade-offs between the two (latency, cost, governance model, feature completeness) | Ask: “For a client on Snowflake, when would you recommend Cortex Analyst over a standalone RAG pipeline, and when would you not?” If they cannot answer without hedging, they have not built it in production. | Lumps Cortex and Mosaic AI together as “AI features”; cannot articulate any performance or governance difference; references only ChatGPT or Azure OpenAI for all LLM use cases | 25% |
| 3.2 Agent orchestration in production | Demonstrated delivery of agentic data workflows using LangGraph, CrewAI, or custom orchestration — in a live client environment handling real data | Request an architecture overview of an agentic pipeline they built: what triggered the agent, what tools it called, how it handled failure, and how human oversight was maintained | “We’ve done agentic proofs of concept” or “we’re planning to launch an AI practice” — PoC work does not count as production experience | 25% |
| 3.3 EU AI Act preparedness | Vendor has a documented AI risk management process: risk classification methodology for AI systems, bias-testing pipeline, and a named compliance lead or external counsel responsible for AI Act adherence | Ask: “Walk me through your AI risk classification process for an EU engagement. Who owns it?” Verify by requesting a sample risk register or compliance framework document. | “We’re keeping an eye on it”; or compliance is described as “the client’s responsibility”; or they have not heard of Regulation EU 2024/1689 | 20% |
| 3.4 Model governance | Documented approach to ML model lineage, drift detection, and retraining triggers — including use of Population Stability Index (PSI) thresholds or equivalent metrics — and integration with the data platform’s existing lineage tooling | Ask how they connect model performance degradation detection to the upstream data pipeline that feeds the model. Look for specificity about PSI thresholds, monitoring cadence, and alerting. | Model governance described as “we document in Confluence”; no connection between data pipeline monitoring and model retraining decisions; no mention of drift metrics | 15% |
| 3.5 AI coding agent integration | Articulated policy on AI coding assistant use (Cursor, Cody, GitHub Copilot, Aider) in their delivery workflow — including guardrails: what code is reviewed by humans, what is auto-merged, and how they handle hallucinated dependencies | Ask: “Do your engineers use AI coding assistants on client engagements? What guardrails govern that?” Either “yes with policy” or “no, by design” are acceptable. “Yes, freely” without policy is not. | Unrestricted use of AI coding assistants with no review policy; cannot describe what their engineers do with AI-generated code before it reaches client infrastructure | 15% |
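To make criterion 3.4 concrete, here is a minimal Population Stability Index sketch. The quantile binning and the thresholds in the closing comment are common rules of thumb, not a fixed standard; what matters in evaluation is whether the vendor can connect a drift signal like this back to the upstream pipeline that feeds the model.

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI between a baseline (training-time) sample and a current sample of one feature."""
    # Bin edges come from the baseline distribution; quantiles keep bins populated.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # eps guards against empty bins producing log(0) or division by zero.
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Commonly cited rule-of-thumb thresholds: < 0.10 stable, 0.10-0.25 moderate shift
# (investigate), > 0.25 significant shift (consider retraining).
```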
Group 4: Industry Fit (5 criteria)
Industry fit is not about logos. A vendor can list healthcare clients and still not know what a claims adjudication cycle looks like in a data model. Depth matters more than breadth here.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 4.1 Vertical case studies at your scale | Documented delivery in your industry at a comparable data volume, team size, and platform complexity — not just “we’ve worked in financial services” | Request two case studies with approximate data volumes, number of pipelines, and what happened when something went wrong. Call the reference client. | Case studies are from five years ago, a different sub-vertical, or a project significantly smaller than yours | 30% |
| 4.2 Regulatory familiarity | Working knowledge of the specific compliance regime governing your data: HIPAA (healthcare), PCI-DSS (payments), GDPR (EU data subjects), FINRA (broker-dealer), or sector-specific audit requirements | Present a scenario: “We have PII from EU users flowing into our Snowflake warehouse. Walk me through how you’d design access controls and data retention.” Score the specificity of the answer. | Regulatory knowledge described at the framework level only — “we’re familiar with GDPR” — without specific design decisions; or relies entirely on “the legal team handles compliance” | 25% |
| 4.3 Domain glossary fluency | Engineers on the proposed team can speak the domain language without a translator: insurance loss ratios, fintech reconciliation cycles, healthcare NDC codes, retail sell-through rates — whatever applies to your business | Test this in the discovery call. Introduce two or three domain-specific terms and watch whether the proposed team engages with them or asks the business stakeholder to explain | Cannot define basic domain terms; engineers redirect domain questions to the account manager or subject matter expert rather than engaging directly | 20% |
| 4.4 Client logos in your vertical | Named current or recent clients in your industry — not a list of industry categories — available for reference | Ask for the names of two current or recent clients in your vertical and request permission to contact them. A vendor who cannot provide a single named client in your industry has not earned the claim. | A list of industry categories (“we serve healthcare, financial services, retail”) with no named accounts willing to speak as references | 15% |
| 4.5 Industry-specific SLAs | Demonstrated understanding that your industry imposes non-standard timing requirements: month-end close in finance, claims turnaround in insurance, data freshness for algorithmic trading, HIPAA breach notification windows | Ask: “Given our industry, what SLAs would you recommend for pipeline freshness and incident response?” A strong answer cites industry norms without prompting. | SLA discussion defaults entirely to generic uptime metrics; no mention of domain-specific timing requirements | 10% |
Group 5: Commercial Hygiene (5 criteria)
A vendor’s commercial terms reveal their risk posture and partnership philosophy faster than any technical discussion. Resistance to change-order caps or clear IP terms is not routine negotiation friction — it is a signal.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 5.1 Change-order rate cap | Contract clause limiting the cumulative value of change orders without re-scoping — typically 10–15% of the original SOW value before a formal scope review is required | Review the MSA/SOW draft. Ask: “What triggers a formal scope review?” If the answer is “when we decide it’s necessary,” the clause is absent. | No cap on change orders; “we manage scope collaboratively” without a contractual floor; or change-order language that effectively allows open-ended billing | 25% |
| 5.2 IP ownership clause | Unambiguous statement that the client owns 100% of all work product: custom code, data models, pipeline logic, documentation, and any derivative works | Read the IP clause directly — do not accept a verbal confirmation. The contract should use language like “all work product is work made for hire and is the exclusive property of Client.” | Joint IP language; “vendor retains rights to methodologies and frameworks” without clear delineation from client deliverables; “subject to prior art” carve-outs that could encumber custom work | 25% |
| 5.3 Milestone payments + holdback | Payment structure tied to delivery milestones with a 10% holdback released only upon successful acceptance testing and documentation handover | Review the SOW payment schedule. Confirm that the final payment tranche is contingent on documented acceptance criteria. | Front-loaded payment schedule; no holdback; acceptance criteria defined by the vendor rather than the client; “net-30 on invoice” without milestone contingency | 20% |
| 5.4 Off-ramp and transition clause | Contractual right to exit for convenience plus an obligation on the vendor to provide transition assistance — knowledge transfer, handover documentation, and a minimum notice-period cooperation window | Confirm the termination-for-convenience clause exists and that it includes a cooperation period (typically 30–60 days) with specific transition deliverables. | No termination-for-convenience clause; or exit clause exists but vendor has no transition cooperation obligation; or transition assistance is billable at standard rates with no cap | 20% |
| 5.5 SLA penalty structure | Contractual penalties (service credits, payment deductions) tied to measurable SLA failures — pipeline availability, data freshness, incident response time | Ask for the SLA schedule and penalty table. Look for specific thresholds: “If P95 pipeline latency exceeds X minutes for more than Y hours in a calendar month, Vendor credits Z% of monthly fees.” | SLA commitments without financial consequences; “we take SLAs seriously” without a penalty structure; or penalty caps so low they remove any incentive for vendor compliance | 10% |
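The penalty structure described in 5.5 is easy to sanity-check with arithmetic. A minimal sketch, with every number a placeholder for whatever the SLA schedule actually specifies:

```python
def monthly_service_credit(breach_hours: float,
                           monthly_fee: float,
                           allowed_breach_hours: float = 2.0,
                           credit_pct_per_hour: float = 0.5,
                           credit_cap_pct: float = 10.0) -> float:
    """Service credit owed for one month of SLA breaches (all parameters illustrative)."""
    excess = max(0.0, breach_hours - allowed_breach_hours)
    credit_pct = min(excess * credit_pct_per_hour, credit_cap_pct)
    return monthly_fee * credit_pct / 100.0

# Example: 7 breach hours against a 2-hour allowance at 0.5% per hour on a
# $40,000 monthly fee -> 2.5% credit -> $1,000.
print(monthly_service_credit(7, 40_000))  # 1000.0
```

If the credit cap in the contract is so low that this calculation never produces a number the vendor would notice, the penalty structure exists on paper only — which is the red flag in the table above.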
Group 6: Team & Talent (5 criteria)
Staffing risk is the most common source of project failure that evaluations fail to catch in time. The team you meet in the pitch is not always the team that shows up to build.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 6.1 Named-team policy | Specific named individuals assigned to your project — lead architect, lead engineer, and engagement manager — included in the SOW before signature, not “TBD based on availability” | Require names in the SOW as a condition of contract. Ask each proposed person to participate in at least one technical session before signing. | “We will confirm the team upon contract execution”; named architects in the proposal are partners who will not be doing delivery work; or the team changes between proposal and SOW | 30% |
| 6.2 Key personnel substitution clause | Contract clause requiring client approval for substitution of named key personnel — with a defined process (notice period, proposed replacement CV, client acceptance window) | Review the key personnel clause in the MSA. Confirm it requires advance notice (30 days minimum) and client approval before substitution takes effect. | No key personnel clause; or vendor retains unilateral right to substitute staff with only “equivalent experience” as the standard; or clause exists but approval is deemed granted if client doesn’t respond within 48 hours | 25% |
| 6.3 Geographic mix and time-zone overlap | Documented delivery team geography with enough overlap with your operating hours to support daily collaboration — typically a minimum of 4 hours of working-hour overlap | Ask for a team roster with locations and confirm which time zones have decision-making authority (not just execution capacity). | All senior architects are onshore; all execution is offshore with a 1-hour overlap window; or time zone overlap is described in terms of “asynchronous collaboration works fine for us” | 20% |
| 6.4 Retention rate of senior engineers | Annual retention rate for engineers at senior/staff level — a proxy for team stability and the likelihood that the people you evaluated will still be there at month six | Ask directly: “What is your annual retention rate for senior engineers over the past two years?” Acceptable answers include a specific percentage with context. | Cannot provide a number; deflects to general “we have a great culture” language; or Glassdoor reviews show a pattern of project-level churn | 15% |
| 6.5 Subcontractor disclosure | Full disclosure of which project work will be subcontracted, to what entity, in what geography, and why — with client approval rights over subcontractor use | Ask: “What percentage of this engagement do you expect to deliver via subcontractors, and will we have approval rights?” Require subcontractor disclosure in the SOW. | Subcontractor use is described as standard practice without disclosure; or the vendor treats this as confidential; or subcontractors are described as “extended team members” to obscure the arrangement | 10% |
Group 7: Continuity & Risk (5 criteria)
Continuity criteria are the ones organizations skip because they feel like low-probability events. They are also the ones that determine whether a vendor failure becomes a recoverable problem or a 12-month rebuild.
| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 7.1 Financial stability | Confidence that the vendor will exist and remain solvent through the project lifecycle — assessed through ownership structure (private/PE-backed/public), revenue run-rate transparency, and customer concentration risk | Ask: “Who owns the business, and have there been any ownership changes in the last 24 months?” For PE-backed firms, ask about fund vintage and remaining hold period. | Recent acquisition by a PE fund with a short expected hold; high dependence on one or two anchor clients who could leave; or vendor declines to discuss ownership or financial health | 30% |
| 7.2 BCDR plan | Documented Business Continuity and Disaster Recovery plan covering key personnel loss, office disruption, and cloud infrastructure failure — specific to how the vendor delivers data engineering work | Request the BCDR plan summary or ask for a walkthrough: “If your lead architect is unavailable for two weeks at a critical project phase, what is your response?” | No documented BCDR plan; or BCDR documentation is generic (“we use cloud infrastructure”) without specifics on personnel succession and project continuity | 25% |
| 7.3 Cybersecurity posture | SOC 2 Type II certification, ISO 27001 certification, or equivalent — with a recent penetration test (within 12 months) and a published vulnerability disclosure policy | Request the SOC 2 Type II report summary (not the full report — the summary is sufficient for procurement purposes). Confirm pen-test recency. | SOC 2 Type I only (point-in-time audit, not continuous controls); certification pending but not yet achieved; pen test more than 24 months old; no vulnerability disclosure policy | 20% |
| 7.4 Insurance coverage | Active cyber liability insurance and professional E&O (Errors & Omissions) coverage with limits commensurate with the project value — typically at least $5M each for enterprise engagements | Request certificates of insurance. Confirm coverage limits, policy expiry, and that your organization is named as an additional insured on the cyber policy. | Coverage limits of $1M or less for a multi-million-dollar engagement; policy expires before project completion; vendor cannot produce a certificate of insurance promptly | 15% |
| 7.5 Exit readiness | Evidence that the vendor has successfully transferred knowledge and platform ownership to a client at the end of a prior engagement — demonstrated through documented runbooks, pipeline catalogs, and reference client confirmation | Ask for a reference call specifically focused on the handover experience: what was delivered, was it sufficient, and what did the client need to build themselves. | No reference available for a completed handover; or transition materials described as “standard documentation” without specifics; or vendor has no history of disengagements — only ongoing managed service relationships | 10% |
How to measure each criterion: the source map
Every criterion has a measurement source. Knowing which source answers which question prevents evaluation theater — where vendors prepare polished responses to soft questions while the hard evidence never gets checked.
The practical rule: if a criterion can only be answered by the vendor themselves (RFP response or working session), it needs a second source — either a public signal or a reference call — before it can be scored at full confidence.
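That rule can be encoded directly in an internal scoring sheet. A minimal sketch — the source categories and the cap value are illustrative policy choices, not part of the framework:

```python
from dataclasses import dataclass

# Illustrative source categories; adapt to your own evidence log.
INDEPENDENT_SOURCES = {"public_directory", "reference_call", "github", "certificate"}

@dataclass
class Evidence:
    criterion: str      # e.g. "1.2 Named certified engineers"
    raw_score: float    # 1-5 as scored in the evaluation session
    sources: set[str]   # where the supporting evidence came from

def independently_verified(ev: Evidence) -> bool:
    """True when at least one non-vendor source backs the score."""
    return bool(ev.sources & INDEPENDENT_SOURCES)

def effective_score(ev: Evidence, unverified_cap: float = 3.0) -> float:
    # Cap (rather than zero out) criteria backed only by the vendor's own narrative
    # until a second source has been checked.
    return ev.raw_score if independently_verified(ev) else min(ev.raw_score, unverified_cap)
```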
What is the 3 vendor rule? (And why data engineering needs 4–6, not 3)
The “3 vendor rule” is procurement folk wisdom: ask for proposals from 7 vendors, bring 3 to final evaluation. The logic is clean — 3 gives you genuine competition without overwhelming the evaluation team.
For standard IT procurement, the rule is defensible. For data engineering vendor selection, it is too narrow.
Here is the problem: data engineering vendors differ on dimensions that only become visible when you have enough comparison points. Platform fidelity is not evenly distributed. A vendor who is a Snowflake Elite partner may have weak Databricks depth. A vendor with excellent delivery maturity may have no AI/ML readiness. A vendor with strong AI credentials may have thin commercial hygiene. Three vendors rarely give you enough triangulation to see these trade-offs clearly.
The empirical sweet spot, based on analysis of DataEngineeringCompanies.com’s 86-vendor scored dataset, is 4–6 vendors in serious evaluation:
- 4 vendors: minimum for meaningful technical triangulation on platform fidelity
- 5–6 vendors: appropriate for complex multi-platform or regulated-industry projects where risk criteria need broader comparison
The practical implication: your longlist should be 8–12. Your shortlist should be 4–6, not 3. The additional 1–3 vendors cost roughly 15% more evaluation time and provide disproportionately more signal on criteria where vendors cluster tightly.
The one exception: if you have an incumbent with a strong track record and the project is a natural extension of existing work, a 3-vendor final evaluation (incumbent + 2 challengers) is defensible. But for new platform builds or significant capability gaps, go to 4–6.
What are the 4 pillars of data engineering?
The four pillars of data engineering are:
- Collection — acquiring data from source systems (APIs, databases, event streams, files) with appropriate reliability, latency, and schema management
- Cleaning — transforming, validating, deduplicating, and enriching raw data to a state fit for downstream use
- Analysis-readiness — modeling data in structures (star schemas, OBT, lakehouses) that support the query patterns of the business and its tools
- Operationalisation — running pipelines in production with monitoring, alerting, SLA management, and the governance controls required by the business and its regulators
Each pillar maps directly to specific criteria in the 35-criterion framework:
| Pillar | Primary criteria | Key verification questions |
|---|---|---|
| Collection | 1.2 (named certified engineers), 2.3 (orchestration sophistication), 2.1 (CI/CD for data) | Can they ingest from your specific sources reliably? What is their retry and dead-letter handling approach? |
| Cleaning | 2.5 (change-management process), 4.3 (domain glossary fluency), 3.4 (model governance) | How do they manage schema evolution? What constitutes an acceptable data quality SLA? |
| Analysis-readiness | 1.3 (Cortex/Mosaic AI), 4.1 (vertical case studies), 1.5 (published reference architectures) | Can they model for your business questions, not just generic star schemas? Have they built for your query volumes? |
| Operationalisation | 2.4 (observability tooling), 7.2 (BCDR), 7.5 (exit readiness), 5.5 (SLA penalty structure) | What does “production-ready” mean to them? How do they handle incidents? What does handover look like? |
The pillar framework is useful for scoping. Before you assign weights to the 35 criteria, confirm which pillars carry the most risk for your project. A greenfield platform build has roughly equal weight across all four. A migration project typically has disproportionate risk in Collection and Cleaning. An AI enablement project puts the heaviest weight on Analysis-readiness and — increasingly — Operationalisation of model pipelines.
For more on the evaluation process that surrounds these pillars, the how to evaluate data engineering vendors guide covers the full CIO decision workflow.
How should the 35 criteria be weighted for different project types?
The 35 criteria do not carry equal weight across all project types. The tables below show suggested group-level weights for three common archetypes. Criteria weights within each group stay constant (as shown in the group tables above) — only the group-level allocation shifts.
Archetype A: Net-new platform build — Greenfield Snowflake or Databricks platform, no existing modern infrastructure, organisation building from scratch.
Archetype B: Modernisation / migration — Moving from legacy warehouse, on-prem Hadoop, or first-generation cloud to a modern platform. Data exists; the goal is reliability, scalability, and reduced operational cost.
Archetype C: AI/ML enablement — Existing modern platform; the project’s goal is to add AI/ML capabilities: feature stores, model pipelines, Cortex/Mosaic AI integration, or agentic data workflows.
| Group | Archetype A: New Build | Archetype B: Migration | Archetype C: AI/ML |
|---|---|---|---|
| G1: Platform Fidelity | 20% | 20% | 15% |
| G2: Delivery Maturity | 20% | 25% | 15% |
| G3: AI/ML Readiness | 15% | 5% | 30% |
| G4: Industry Fit | 10% | 15% | 10% |
| G5: Commercial Hygiene | 10% | 10% | 10% |
| G6: Team & Talent | 15% | 15% | 10% |
| G7: Continuity & Risk | 10% | 10% | 10% |
| Total | 100% | 100% | 100% |
Notes on the weightings:
- For Archetype B (migration), Delivery Maturity carries the highest weight because migration risk is predominantly an execution problem — cutover readiness, rollback discipline, and reconciliation rigor.
- For Archetype C (AI/ML), Group 3 jumps to 30% — double the next-closest group. Vendors without hands-on warehouse-native AI experience should not be shortlisted for this work.
- Group 5 (Commercial Hygiene) and Group 7 (Continuity & Risk) stay flat across all archetypes at 10% each. These are not project-type-dependent; they are baseline requirements regardless of what is being built.
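To illustrate how these group weights combine with the within-group weights from the criterion tables, here is a minimal scoring sketch. The group weights are taken from the Archetype C column above; the vendor scores are hypothetical 1–5 values from an evaluation session.

```python
# Group-level weights for Archetype C (AI/ML enablement), from the table above.
GROUP_WEIGHTS_AI_ML = {
    "G1": 0.15, "G2": 0.15, "G3": 0.30, "G4": 0.10,
    "G5": 0.10, "G6": 0.10, "G7": 0.10,
}

# Within-group weights for Group 3 (from its criterion table); scores are hypothetical.
G3_CRITERIA = {
    "3.1 LLM-on-warehouse":    {"weight": 0.25, "score": 4},
    "3.2 Agent orchestration": {"weight": 0.25, "score": 2},
    "3.3 EU AI Act":           {"weight": 0.20, "score": 3},
    "3.4 Model governance":    {"weight": 0.15, "score": 3},
    "3.5 AI coding policy":    {"weight": 0.15, "score": 4},
}

def group_score(criteria: dict) -> float:
    """Weighted average of 1-5 criterion scores within one group."""
    return sum(c["weight"] * c["score"] for c in criteria.values())

def weighted_contribution(group: str, criteria: dict) -> float:
    """This group's contribution to the vendor's overall 1-5 score."""
    return GROUP_WEIGHTS_AI_ML[group] * group_score(criteria)

print(round(group_score(G3_CRITERIA), 2))                  # 3.15
print(round(weighted_contribution("G3", G3_CRITERIA), 3))  # 0.945
```

The same structure repeats for the other six groups; summing the seven contributions gives the vendor’s overall weighted score on the same 1–5 scale.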
For the mechanics of applying these group weights and running the scoring sessions, see the vendor scorecard template.
Frequently asked questions
How many vendor evaluation criteria should an RFP contain?
More criteria create noise, not signal. An RFP issued to vendors should not ask about all 35 criteria — it should ask about the ten to twelve most differentiated ones for your project type.
The full 35-criterion framework is for your internal evaluation, not the vendor document. Use the RFP to elicit evidence. Use the scorecard to assess it. Sending a 35-question RFP to vendors produces 35 narrative answers that are difficult to compare, and it incentivizes length over clarity.
A well-structured RFP for data engineering typically contains 8–12 open-ended scenario questions targeting your highest-weight criteria, plus structured fields for certifications, team composition, and commercial assumptions. For process guidance, see RFP process best practices.
Should we share our weighted scorecard with the vendors?
No. Sharing the scorecard — including your weightings — before evaluation is complete allows vendors to optimize responses toward your scoring model rather than showing you their actual capabilities and approach. That defeats the purpose.
After selection, sharing the scorecard results with the winning vendor is useful: it establishes shared expectations and creates an accountability framework for the engagement. Sharing with unsuccessful vendors as part of a debrief is reasonable and professionally courteous.
What evaluation criteria changed most between 2024 and 2026?
Three changes are material:
Group 3 (AI/ML Readiness) did not exist as a discrete evaluation group in 2024. Criteria like LLM-on-warehouse experience, agent orchestration in production, and EU AI Act preparedness were either absent from frameworks entirely or buried as sub-criteria under “Innovation.” In 2026, they are a standalone group worth 15–30% of total weight depending on project type.
Observability maturity (criterion 2.4) moved from a tie-breaker to a baseline requirement. In 2024, hands-on experience with dedicated observability tools (Monte Carlo, Bigeye, Datafold) was a differentiator. In 2026, it is an expected competency. Vendors without it are missing a production-readiness standard that has become industry norm.
EU AI Act compliance (criterion 3.3) became a procurement criterion. Prior to the Act’s enforcement, AI governance was a soft discussion topic. With full compliance required by August 2026, any vendor operating in EU markets who cannot produce a documented AI risk management process is operating outside legal requirements. That is a disqualifying condition, not a scoring deduction.
For a broader view of what changed, the data engineering due diligence checklist covers the validation layer that sits alongside this evaluation framework.
Can a smaller boutique meet all 35 criteria?
Not all 35, but that is not the right test.
A boutique with 15–25 engineers will typically score lower on financial stability (7.1), BCDR plan depth (7.2), and subcontractor disclosure scope (6.5) — simply because those criteria assume a certain organizational scale. That is expected and should be weighted accordingly.
What smaller boutiques can and should score well on: named-team policy (6.1), platform fidelity at the individual-engineer level (1.2), orchestration sophistication (2.3), and domain glossary fluency (4.3). A 20-person Snowflake specialist with three SnowPro Advanced certified engineers and five reference clients in your vertical will outperform a 500-person generalist IT firm on the criteria that most predict project success.
The practical adjustment: for boutiques, raise the weight of Criteria 6.4 (retention rate) and 7.1 (financial stability) in your internal discussion, and add explicit questions about bench depth for key-personnel coverage. Then evaluate against the adjusted model — not a standard designed for large systems integrators.
Next step
The 35 criteria above define what to measure. The question of how to structure the process around them — from initial longlist to final recommendation — is covered in the data engineering partner selection hub.
If you are actively issuing an RFP, the discovery call question library gives you the verbal-format versions of the verification questions above. The POC scoping guide covers how to structure a paid proof of concept that tests criteria 2.1–2.5 and 3.1–3.2 in a controlled setting. For common RFP errors that cause evaluations to reach the wrong conclusion, see RFP mistakes to avoid.
If you want a matched shortlist of vetted vendors scored against these criteria, get matched or download the RFP checklist.
The 35-criterion checklist (print or copy as a 1-pager)
Group 1: Platform Fidelity
1. [ ] 1.1 Partner tier confirmed in official vendor directory (Elite/Premier, not self-reported)
2. [ ] 1.2 Named certified engineers verified by certification ID, not headcount claim
3. [ ] 1.3 Cortex AI / Mosaic AI production case study with client reference
4. [ ] 1.4 Open-source or SDK contributions verified on GitHub/dbt Hub with recent commits
5. [ ] 1.5 Published reference architecture with author attribution, dated within 18 months
Group 2: Delivery Maturity
6. [ ] 2.1 CI/CD pipeline with automated test gates between dev and production environments
7. [ ] 2.2 IaC coverage via Terraform/Pulumi and dbt, committed to repository from project start
8. [ ] 2.3 Orchestration depth: specific Airflow/Dagster/Prefect design decisions explained, not just claimed
9. [ ] 2.4 Production experience with Monte Carlo, Bigeye, Anomalo, or Datafold
10. [ ] 2.5 PR template and deployment checklist confirmed; breaking-change process documented
Group 3: AI/ML Readiness 2026
11. [ ] 3.1 Articulated trade-off between Snowflake Cortex and Databricks Mosaic AI for your use case
12. [ ] 3.2 Agentic workflow in production (LangGraph, CrewAI, or equivalent) — not PoC only
13. [ ] 3.3 Documented EU AI Act compliance process with named responsible owner
14. [ ] 3.4 Model governance with PSI-based drift detection tied to upstream pipeline monitoring
15. [ ] 3.5 AI coding assistant policy documented: scope, guardrails, human review requirements
Group 4: Industry Fit
16. [ ] 4.1 Vertical case study with comparable scale — reference client available to speak
17. [ ] 4.2 Regulatory scenario answered with specific design decisions (HIPAA/GDPR/PCI-DSS/FINRA)
18. [ ] 4.3 Domain terminology engaged directly by proposed delivery team, not routed to SME
19. [ ] 4.4 Named client in your vertical willing to take a reference call
20. [ ] 4.5 Industry-specific SLA recommendations given unprompted during discussion
Group 5: Commercial Hygiene
21. [ ] 5.1 Change-order rate cap (≤15% of SOW value) in the contract
22. [ ] 5.2 IP ownership clause: 100% work-for-hire language, no joint-IP or methodology carve-outs
23. [ ] 5.3 Milestone payment schedule with 10% holdback on final acceptance
24. [ ] 5.4 Termination-for-convenience clause with 30–60 day vendor cooperation obligation
25. [ ] 5.5 SLA penalty table with specific credit percentages tied to measurable failure thresholds
Group 6: Team & Talent
26. [ ] 6.1 Named team (lead architect, lead engineer, engagement manager) in SOW before signing
27. [ ] 6.2 Key personnel substitution clause: 30-day notice + client approval required
28. [ ] 6.3 Geographic mix confirmed: minimum 4-hour working-hour overlap with your team
29. [ ] 6.4 Senior engineer retention rate: specific percentage provided for last 2 years
30. [ ] 6.5 Subcontractor use disclosed: percentage, entity, geography, client approval right confirmed
Group 7: Continuity & Risk
31. [ ] 7.1 Ownership structure confirmed: PE/private/public, no undisclosed change-of-control in last 24 months
32. [ ] 7.2 BCDR plan reviewed: personnel succession and project continuity are explicit, not implied
33. [ ] 7.3 SOC 2 Type II or ISO 27001 confirmed; pen test within 12 months
34. [ ] 7.4 Certificate of insurance: cyber liability + E&O at ≥$5M, client named as additional insured
35. [ ] 7.5 Handover reference: prior client confirms runbooks, pipeline catalog, and knowledge transfer quality