Data Engineering Vendor Evaluation Criteria: 35 Criteria for 2026

By Peter Korpak · Chief Analyst & Founder · Verified May 29, 2026

Most vendor evaluation guides give you a framework. This one gives you the criteria — 35 of them, with definitions, verification sources, red flags, and suggested weights, organized into seven groups.

The difference matters. Knowing how to run a scorecard process is covered in the data engineering vendor scorecard template and the CIO evaluation playbook. This article is the reference you pull up when you need to know exactly what criterion 3.3 means, how to test it, and what a failing response looks like.

It is a procurement reference. Use it that way.

What are the criteria for vendor evaluation in 2026?

That question has a 2024 answer and a 2026 answer, and they are not the same.

In 2024, a reasonable data engineering vendor evaluation covered platform credentials, delivery process, team quality, contract terms, and basic security posture. AI/ML readiness was a soft plus — nice if present, not a disqualifier if absent.

In 2026, that changed. Three forces converged:

EU AI Act enforcement (full compliance required by August 2026 under Regulation EU 2024/1689) moved vendor AI risk management from a contractual afterthought to a legal requirement for any engagement touching EU data subjects.

Warehouse-native AI matured. Snowflake Cortex and Databricks Mosaic AI are now primary feature vectors — not experimental add-ons. A vendor who cannot articulate the trade-offs between the two cannot credibly architect a forward-looking platform.

Agentic pipelines entered production. LangGraph, CrewAI, and similar orchestration layers are running in live data platforms. Vendors without hands-on experience are not missing a nice-to-have; they are missing the direction the work is going.

The result: AI/ML Readiness is now a top-level evaluation group, not a sub-criterion buried under Innovation. That is the defining shift from 2024 to 2026 frameworks.

The 35 criteria below are organized into seven groups:

  • G1: Platform Fidelity (5 criteria)
  • G2: Delivery Maturity (5 criteria)
  • G3: AI/ML Readiness (5 criteria; new group for 2026)
  • G4: Industry Fit (5 criteria)
  • G5: Commercial Hygiene (5 criteria)
  • G6: Team & Talent (5 criteria)
  • G7: Continuity & Risk (5 criteria)

Each criterion below has four fields: what it measures, how to verify it (not trust it), the red flag that tells you the vendor is weak on it, and a suggested percentage weight within its group. The group weights and how to apply them across project archetypes are covered in the weighting section.

How is data engineering vendor evaluation different from generic IT vendor evaluation?

Generic IT evaluation criteria — responsiveness, SLA adherence, security posture, commercial flexibility — are not wrong. They are just incomplete for data work.

Three gaps matter most:

Platform fidelity is binary in data engineering. A consultancy either knows Snowflake’s workload isolation and Cortex well enough to make production-grade decisions or it doesn’t. There is no “good enough” equivalent to being a Snowflake Elite partner with named certified architects. Generic IT evaluations collapse this distinction into a single “technical expertise” score.

Data quality is operationally fragile in a way most IT deliverables are not. A misconfigured API or a slow application can be patched at runtime. A broken data pipeline silently delivers wrong numbers to executives, finance models, and machine learning features — often for weeks before anyone notices. This makes observability maturity a first-class criterion that generic IT evaluations ignore entirely.

Handoff quality determines whether the engagement has a business outcome. Code delivered without documented lineage, without runbooks, and without reproducible environment definitions is a liability handed to the client. Generic IT evaluations lightly score “documentation.” Data engineering evaluations should treat it as a proxy for everything the vendor understands about operational maintenance.

The 35 criteria below reflect these differences. They do not replace generic IT vendor evaluation — they extend it.

The 7 groups of vendor evaluation criteria for data engineering

Before the detailed tables, a one-line description of each group:

| Group | Focus | Criteria |
|---|---|---|
| G1: Platform Fidelity | Depth of partnership and hands-on mastery of specific tools | 1.1–1.5 |
| G2: Delivery Maturity | How they build, test, promote, and observe data products | 2.1–2.5 |
| G3: AI/ML Readiness 2026 | Warehouse-native AI, agents, EU AI Act compliance | 3.1–3.5 |
| G4: Industry Fit | Vertical knowledge, regulatory fluency, domain-specific SLAs | 4.1–4.5 |
| G5: Commercial Hygiene | Contract structure, IP ownership, off-ramp protections | 5.1–5.5 |
| G6: Team & Talent | Named team policy, retention, subcontractor transparency | 6.1–6.5 |
| G7: Continuity & Risk | Financial stability, BCDR, cyber posture, exit readiness | 7.1–7.5 |

Below are the full tables for each group. Then a measurement-source map, a weighting section for three project archetypes, and the 35-item checklist.


Group 1: Platform Fidelity (5 criteria)

Platform fidelity is the most verifiable cluster in the evaluation. Partner tiers, certified engineer counts, and published reference architectures are public or auditable. There is no excuse for taking vendor claims at face value here.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 1.1 Partner tier (Snowflake / Databricks / dbt) | Official partnership classification: Snowflake Elite, Premier, or Registered; Databricks SI Premier or Registered; dbt Labs partner status | Check directly against the Snowflake partner directory, Databricks partner listings, and the dbt Labs partner directory. Do not accept screenshots. | Claims “Snowflake partner” but cannot confirm tier; or tier is Registered, not Elite/Premier, for a complex project | 30% |
| 1.2 Named certified engineers | Actual count of Snowflake SnowPro Advanced, Databricks Certified, or dbt-certified engineers assigned to your engagement — not total headcount across the firm | Request names and certification IDs. Validate via the Snowflake/Databricks certification lookup portals. | “We have 40 Snowflake-certified engineers” with no ability to name any assigned to your project | 25% |
| 1.3 Cortex AI / Mosaic AI hands-on (2026) | Demonstrated production use of warehouse-native LLM features: Snowflake Cortex Analyst, Cortex Search, or Databricks Mosaic AI in a live client environment — not a sandbox demo | Request a case study with client name, approximate data volume, and the specific Cortex or Mosaic AI feature used. Verify via reference call. | Describes Cortex/Mosaic AI using only marketing-page language; or conflates it with general OpenAI API integration | 20% |
| 1.4 SDK and open-source contributions | Active use of or contributions to Snowpark, delta-rs, PySpark, or published dbt packages on dbt Hub — signals engineers who read source code, not just documentation | Search GitHub for the firm’s org handle; check dbt Hub for published packages. Look for commit recency, not just existence. | Open-source “contributions” are a stale fork from 2022 with no commits; or no GitHub org presence at all | 15% |
| 1.5 Published reference architectures | Publicly available architecture guides, blog posts, or technical documentation demonstrating how the firm designs systems on their claimed platforms — not white-label vendor content | Request URLs. Look for author attribution, technical specificity, and a publication date within the last 18 months. | “We have internal IP” with nothing public; or all published content is co-authored with the platform vendor’s marketing team | 10% |

Group 2: Delivery Maturity (5 criteria)

Delivery maturity is where most projects either succeed or accumulate invisible debt. A vendor who cannot separate development from production environments, or who treats observability as an afterthought, will hand you a platform that works in demos and fails in quarter-end runs.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 2.1 CI/CD for data | Separate dev/staging/prod environments with automated tests (dbt tests, Great Expectations, or equivalent) gating promotion to production | Request a sample CI/CD pipeline diagram and ask how a failing test is handled before a merge. Ask for an example of a deployment they rolled back. | “We use version control” without a gate between environments; or environments exist but promotion is manual with no automated test pass requirement | 30% |
| 2.2 Infrastructure-as-code coverage | Terraform or Pulumi for cloud infrastructure; dbt for transformation logic — with both committed to a repository and subject to review | Request a sample Terraform module or dbt project structure. Ask what percentage of their last three projects used IaC from day one. | “We can do Terraform if you need it”; or IaC was retrofitted after the initial build because it was “faster to click through the console” | 25% |
| 2.3 Orchestration sophistication | Demonstrated competence with a specific orchestrator (Airflow, Dagster, or Prefect) — including cross-DAG dependencies, SLA monitoring, alerting, and environment promotion — not just “we support multiple orchestrators” | Ask them to walk through a specific DAG design decision: why they chose a sensor over a trigger in a given scenario, or how they handle partial pipeline failures. | Generic “we use Airflow” without specific knowledge of its scheduler, executor configuration, or failure handling; or “we use whatever the client has” | 25% |
| 2.4 Observability tooling | Hands-on production experience with at least one dedicated data observability tool (Monte Carlo, Bigeye, Anomalo, Datafold) — not just logging and alerting | Ask for a specific anomaly they caught in production via observability tooling that would not have been caught by standard monitoring. | No hands-on observability tool experience; or conflates database monitoring (CloudWatch, Datadog) with data observability | 10% |
| 2.5 Change-management process | Documented PR review standards, deployment cadence, and release notes discipline — evidence that changes are reviewed and traceable | Request a sample PR template or deployment checklist. Ask how they manage a breaking schema change across dependent pipelines. | No PR template; deploys happen whenever an engineer finishes work; schema changes are communicated informally via Slack | 10% |
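The promotion gate described in criterion 2.1 can be sketched in a few lines. This is a hedged illustration, not dbt or Great Expectations code: the check names, the `order_id` key, and the sample rows are invented for the example. The shape of the gate is the point: a failing check produces a non-zero exit, which the CI system turns into a blocked merge.

```python
import sys

# Illustrative promotion gate (criterion 2.1). Real projects would use dbt
# tests or Great Expectations; the checks below are invented placeholders.

def no_null_keys(rows):
    """Primary keys must never be null."""
    return all(r.get("order_id") is not None for r in rows)

def row_count_positive(rows):
    """An empty extract usually means a broken source connection."""
    return len(rows) > 0

CHECKS = [no_null_keys, row_count_positive]

def gate(rows):
    """Return the names of failing checks; an empty list allows promotion."""
    return [check.__name__ for check in CHECKS if not check(rows)]

if __name__ == "__main__":
    sample = [{"order_id": 1}, {"order_id": 2}]
    failed = gate(sample)
    if failed:
        print(f"Promotion blocked by failing checks: {failed}")
        sys.exit(1)  # non-zero exit fails the CI job and blocks the merge
    print("All checks passed: promotion allowed")
```

In a working session, asking the vendor to walk through their equivalent of `gate()` and what happens on a non-zero exit quickly separates a real promotion gate from manual review.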

Group 3: AI/ML Readiness 2026 — the new fork in the road

This group did not exist in 2024 evaluation frameworks. It exists now because the delta between vendors who have production AI/ML experience and those who do not is widening fast. A 2026 data platform built by a vendor with no Cortex or Mosaic AI literacy will need redesign before it can support agentic use cases. That redesign is your problem, not theirs.

EU AI Act note: August 2026 marks full compliance enforcement for high-risk AI system providers. Any vendor operating in or serving EU markets must have a documented risk management process for AI systems. If they cannot describe it, they are operating without compliance infrastructure — a legal and operational liability you inherit.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 3.1 LLM-on-warehouse experience | Production delivery of features using Snowflake Cortex (Analyst, Search, Complete) or Databricks Mosaic AI — with the ability to articulate specific trade-offs between the two (latency, cost, governance model, feature completeness) | Ask: “For a client on Snowflake, when would you recommend Cortex Analyst over a standalone RAG pipeline, and when would you not?” If they cannot answer without hedging, they have not built it in production. | Lumps Cortex and Mosaic AI together as “AI features”; cannot articulate any performance or governance difference; references only ChatGPT or Azure OpenAI for all LLM use cases | 25% |
| 3.2 Agent orchestration in production | Demonstrated delivery of agentic data workflows using LangGraph, CrewAI, or custom orchestration — in a live client environment handling real data | Request an architecture overview of an agentic pipeline they built: what triggered the agent, what tools it called, how it handled failure, and how human oversight was maintained. | “We’ve done agentic proofs of concept” or “we’re planning to launch an AI practice” — PoC work does not count as production experience | 25% |
| 3.3 EU AI Act preparedness | Vendor has a documented AI risk management process: risk classification methodology for AI systems, bias-testing pipeline, and a named compliance lead or external counsel responsible for AI Act adherence | Ask: “Walk me through your AI risk classification process for an EU engagement. Who owns it?” Verify by requesting a sample risk register or compliance framework document. | “We’re keeping an eye on it”; or compliance is described as “the client’s responsibility”; or they have not heard of Regulation EU 2024/1689 | 20% |
| 3.4 Model governance | Documented approach to ML model lineage, drift detection, and retraining triggers — including use of Population Stability Index (PSI) thresholds or equivalent metrics — and integration with the data platform’s existing lineage tooling | Ask how they connect model performance degradation detection to the upstream data pipeline that feeds the model. Look for specificity about PSI thresholds, monitoring cadence, and alerting. | Model governance described as “we document in Confluence”; no connection between data pipeline monitoring and model retraining decisions; no mention of drift metrics | 15% |
| 3.5 AI coding agent integration | Articulated policy on AI coding assistant use (Cursor, Cody, GitHub Copilot, Aider) in their delivery workflow — including guardrails: what code is reviewed by humans, what is auto-merged, and how they handle hallucinated dependencies | Ask: “Do your engineers use AI coding assistants on client engagements? What guardrails govern that?” Either “yes with policy” or “no, by design” are acceptable. “Yes, freely” without policy is not. | Unrestricted use of AI coding assistants with no review policy; cannot describe what their engineers do with AI-generated code before it reaches client infrastructure | 15% |
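Criterion 3.4 mentions Population Stability Index (PSI) thresholds. For evaluators unfamiliar with the metric, here is a minimal sketch of the computation. The bin count, the epsilon smoothing, and the rule-of-thumb thresholds in the comments are common conventions, not values defined by this framework.

```python
import numpy as np

# PSI = sum((actual% - expected%) * ln(actual% / expected%)) over shared bins.
# Common convention: < 0.1 stable, 0.1-0.25 investigate, > 0.25 consider
# retraining. The 10-bin default and eps smoothing are assumptions.

def psi(expected, actual, bins=10, eps=1e-6):
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)   # feature distribution at training time
drifted = rng.normal(0.5, 1.0, 10_000)    # production distribution has shifted

print(f"PSI, no drift: {psi(baseline, rng.normal(0.0, 1.0, 10_000)):.3f}")
print(f"PSI, 0.5-sigma shift: {psi(baseline, drifted):.3f}")
```

A vendor who can explain where this calculation runs, what feeds `expected`, and what alert fires when a threshold trips has model governance. A Confluence page does not.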

Group 4: Industry Fit (5 criteria)

Industry fit is not about logos. A vendor can list healthcare clients and still not know what a claims adjudication cycle looks like in a data model. Depth matters more than breadth here.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 4.1 Vertical case studies at your scale | Documented delivery in your industry at a comparable data volume, team size, and platform complexity — not just “we’ve worked in financial services” | Request two case studies with approximate data volumes, number of pipelines, and what happened when something went wrong. Call the reference client. | Case studies are from five years ago, a different sub-vertical, or a project significantly smaller than yours | 30% |
| 4.2 Regulatory familiarity | Working knowledge of the specific compliance regime governing your data: HIPAA (healthcare), PCI-DSS (payments), GDPR (EU data subjects), FINRA (broker-dealer), or sector-specific audit requirements | Present a scenario: “We have PII from EU users flowing into our Snowflake warehouse. Walk me through how you’d design access controls and data retention.” Score the specificity of the answer. | Regulatory knowledge described at the framework level only — “we’re familiar with GDPR” — without specific design decisions; or relies entirely on “the legal team handles compliance” | 25% |
| 4.3 Domain glossary fluency | Engineers on the proposed team can speak the domain language without a translator: insurance loss ratios, fintech reconciliation cycles, healthcare NDC codes, retail sell-through rates — whatever applies to your business | Test this in the discovery call. Introduce two or three domain-specific terms and watch whether the proposed team engages with them or asks the business stakeholder to explain. | Cannot define basic domain terms; engineers redirect domain questions to the account manager or subject matter expert rather than engaging directly | 20% |
| 4.4 Client logos in your vertical | Named current or recent clients in your industry — not a list of industry categories — available for reference | Ask for the names of two current or recent clients in your vertical and request permission to contact them. A vendor who cannot provide a single named client in your industry has not earned the claim. | A list of industry categories (“we serve healthcare, financial services, retail”) with no named accounts willing to speak as references | 15% |
| 4.5 Industry-specific SLAs | Demonstrated understanding that your industry imposes non-standard timing requirements: month-end close in finance, claims turnaround in insurance, data freshness for algorithmic trading, HIPAA breach notification windows | Ask: “Given our industry, what SLAs would you recommend for pipeline freshness and incident response?” A strong answer cites industry norms without prompting. | SLA discussion defaults entirely to generic uptime metrics; no mention of domain-specific timing requirements | 10% |

Group 5: Commercial Hygiene (5 criteria)

A vendor’s commercial terms reveal their risk posture and partnership philosophy faster than any technical discussion. Resistance to change-order caps or IP clarity is not negotiable friction — it is a signal.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 5.1 Change-order rate cap | Contract clause limiting the cumulative value of change orders without re-scoping — typically 10–15% of the original SOW value before a formal scope review is required | Review the MSA/SOW draft. Ask: “What triggers a formal scope review?” If the answer is “when we decide it’s necessary,” the clause is absent. | No cap on change orders; “we manage scope collaboratively” without a contractual floor; or change-order language that effectively allows open-ended billing | 25% |
| 5.2 IP ownership clause | Unambiguous statement that the client owns 100% of all work product: custom code, data models, pipeline logic, documentation, and any derivative works | Read the IP clause directly — do not accept a verbal confirmation. The contract should use language like “all work product is work made for hire and is the exclusive property of Client.” | Joint IP language; “vendor retains rights to methodologies and frameworks” without clear delineation from client deliverables; “subject to prior art” carve-outs that could encumber custom work | 25% |
| 5.3 Milestone payments + holdback | Payment structure tied to delivery milestones with a 10% holdback released only upon successful acceptance testing and documentation handover | Review the SOW payment schedule. Confirm that the final payment tranche is contingent on documented acceptance criteria. | Front-loaded payment schedule; no holdback; acceptance criteria defined by the vendor rather than the client; “net-30 on invoice” without milestone contingency | 20% |
| 5.4 Off-ramp and transition clause | Contractual right to exit for convenience plus an obligation on the vendor to provide transition assistance — knowledge transfer, handover documentation, and a minimum notice-period cooperation window | Confirm the termination-for-convenience clause exists and that it includes a cooperation period (typically 30–60 days) with specific transition deliverables. | No termination-for-convenience clause; or exit clause exists but vendor has no transition cooperation obligation; or transition assistance is billable at standard rates with no cap | 20% |
| 5.5 SLA penalty structure | Contractual penalties (service credits, payment deductions) tied to measurable SLA failures — pipeline availability, data freshness, incident response time | Ask for the SLA schedule and penalty table. Look for specific thresholds: “If P95 pipeline latency exceeds X minutes for more than Y hours in a calendar month, Vendor credits Z% of monthly fees.” | SLA commitments without financial consequences; “we take SLAs seriously” without a penalty structure; or penalty caps so low they remove any incentive for vendor compliance | 10% |
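The P95-latency clause quoted in criterion 5.5 reduces to simple arithmetic. A hypothetical sketch follows; the 4-hour tolerance, the 12-hour tier boundary, and the 5%/10% credit rates are invented for illustration, since real values come from the negotiated SLA schedule.

```python
# Hypothetical service-credit schedule for criterion 5.5. All thresholds
# and rates below are invented examples, not recommended contract terms.

def service_credit(breach_hours: float, monthly_fee: float) -> float:
    """Credit owed when P95 pipeline latency exceeded the SLA threshold
    for breach_hours within a calendar month."""
    if breach_hours <= 4:
        return 0.0                           # within contractual tolerance
    if breach_hours <= 12:
        return round(monthly_fee * 0.05, 2)  # first penalty tier: 5%
    return round(monthly_fee * 0.10, 2)      # sustained breach tier: 10%

print(service_credit(3, 50_000))   # 0.0 -- no credit owed
print(service_credit(8, 50_000))   # 2500.0
print(service_credit(40, 50_000))  # 5000.0
```

The procurement test is whether the vendor's SLA schedule contains numbers you could drop into a function like this. If it cannot be expressed as arithmetic, it is not a penalty structure.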

Group 6: Team & Talent (5 criteria)

Staffing risk is the most common source of project failure that evaluations fail to catch in time. The team you meet in the pitch is not always the team that shows up to build.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 6.1 Named-team policy | Specific named individuals assigned to your project — lead architect, lead engineer, and engagement manager — included in the SOW before signature, not “TBD based on availability” | Require names in the SOW as a condition of contract. Ask each proposed person to participate in at least one technical session before signing. | “We will confirm the team upon contract execution”; named architects in the proposal are partners who will not be doing delivery work; or the team changes between proposal and SOW | 30% |
| 6.2 Key personnel substitution clause | Contract clause requiring client approval for substitution of named key personnel — with a defined process (notice period, proposed replacement CV, client acceptance window) | Review the key personnel clause in the MSA. Confirm it requires advance notice (30 days minimum) and client approval before substitution takes effect. | No key personnel clause; or vendor retains unilateral right to substitute staff with only “equivalent experience” as the standard; or clause exists but approval is deemed granted if the client doesn’t respond within 48 hours | 25% |
| 6.3 Geographic mix and time-zone overlap | Documented delivery team geography with enough overlap with your operating hours to support daily collaboration — typically a minimum of 4 hours of working-hour overlap | Ask for a team roster with locations and confirm which time zones have decision-making authority (not just execution capacity). | All senior architects are onshore; all execution is offshore with a 1-hour overlap window; or time zone overlap is described in terms of “asynchronous collaboration works fine for us” | 20% |
| 6.4 Retention rate of senior engineers | Annual retention rate for engineers at senior/staff level — a proxy for team stability and the likelihood that the people you evaluated will still be there at month six | Ask directly: “What is your annual retention rate for senior engineers over the past two years?” Acceptable answers include a specific percentage with context. | Cannot provide a number; deflects to general “we have a great culture” language; or Glassdoor reviews show a pattern of project-level churn | 15% |
| 6.5 Subcontractor disclosure | Full disclosure of which project work will be subcontracted, to what entity, in what geography, and why — with client approval rights over subcontractor use | Ask: “What percentage of this engagement do you expect to deliver via subcontractors, and will we have approval rights?” Require subcontractor disclosure in the SOW. | Subcontractor use is described as standard practice without disclosure; or the vendor treats this as confidential; or subcontractors are described as “extended team members” to obscure the arrangement | 10% |
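The 4-hour overlap test in criterion 6.3 can be checked mechanically once you have the team roster. A sketch using Python's standard `zoneinfo`; the 09:00–17:00 workday and the example zones are assumptions to substitute with your roster's locations.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Working-hour overlap between two team locations (criterion 6.3).
# The 09:00-17:00 workday and the sample zones are illustrative assumptions.

def overlap_hours(day, tz_a, tz_b, start=9, end=17):
    """Hours of overlap between two teams' local working windows on a date."""
    def window(tz):
        s = datetime(day.year, day.month, day.day, start, tzinfo=ZoneInfo(tz))
        return s, s + timedelta(hours=end - start)
    a_start, a_end = window(tz_a)
    b_start, b_end = window(tz_b)
    overlap = min(a_end, b_end) - max(a_start, b_start)
    return max(overlap.total_seconds() / 3600, 0.0)

# Example: a New York client and a Warsaw delivery team on 2 March 2026
# overlap for 2.0 hours -- below the 4-hour bar described above.
print(overlap_hours(datetime(2026, 3, 2), "America/New_York", "Europe/Warsaw"))
```

Run it for each location pair in the roster; any pair of decision-making roles that falls below 4.0 is the red flag described in the table.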

Group 7: Continuity & Risk (5 criteria)

Continuity criteria are the ones organizations skip because they feel like low-probability events. They are also the ones that determine whether a vendor failure becomes a recoverable problem or a 12-month rebuild.

| Criterion | Definition | How to verify | Red flag | Suggested weight in group |
|---|---|---|---|---|
| 7.1 Financial stability | Confidence that the vendor will exist and remain solvent through the project lifecycle — assessed through ownership structure (private/PE-backed/public), revenue run-rate transparency, and customer concentration risk | Ask: “Who owns the business, and have there been any ownership changes in the last 24 months?” For PE-backed firms, ask about fund vintage and remaining hold period. | Recent acquisition by a PE fund with a short expected hold; high dependence on one or two anchor clients who could leave; or vendor declines to discuss ownership or financial health | 30% |
| 7.2 BCDR plan | Documented Business Continuity and Disaster Recovery plan covering key personnel loss, office disruption, and cloud infrastructure failure — specific to how the vendor delivers data engineering work | Request the BCDR plan summary or ask for a walkthrough: “If your lead architect is unavailable for two weeks at a critical project phase, what is your response?” | No documented BCDR plan; or BCDR documentation is generic (“we use cloud infrastructure”) without specifics on personnel succession and project continuity | 25% |
| 7.3 Cybersecurity posture | SOC 2 Type II certification, ISO 27001 certification, or equivalent — with a recent penetration test (within 12 months) and a disclosed vulnerability disclosure policy | Request the SOC 2 Type II report summary (not the full report — the summary is sufficient for procurement purposes). Confirm pen-test recency. | SOC 2 Type I only (point-in-time audit, not continuous controls); certification pending but not yet achieved; pen test more than 24 months old; no vulnerability disclosure policy | 20% |
| 7.4 Insurance coverage | Active cyber liability insurance and professional E&O (Errors & Omissions) coverage with limits commensurate with the project value — typically at least $5M each for enterprise engagements | Request certificates of insurance. Confirm coverage limits, policy expiry, and that your organization is named as an additional insured on the cyber policy. | Coverage limits of $1M or less for a multi-million-dollar engagement; policy expires before project completion; vendor cannot produce a certificate of insurance promptly | 15% |
| 7.5 Exit readiness | Evidence that the vendor has successfully transferred knowledge and platform ownership to a client at the end of a prior engagement — demonstrated through documented runbooks, pipeline catalogs, and reference client confirmation | Ask for a reference call specifically focused on the handover experience: what was delivered, was it sufficient, and what did the client need to build themselves. | No reference available for a completed handover; or transition materials described as “standard documentation” without specifics; or vendor has no history of disengagements — only ongoing managed service relationships | 10% |

How to measure each criterion: the source map

Every criterion has a measurement source. Knowing which source answers which question prevents evaluation theater — where vendors prepare polished responses to soft questions while the hard evidence never gets checked.

[Source map graphic: the five measurement sources (RFP Response, Working Session, Reference Call, Public Signal, Contract Review) mapped against example criteria, including 1.1 Partner tier, 1.3 Cortex/Mosaic AI, 2.1 CI/CD for data, 3.3 EU AI Act readiness, 6.1 Named-team policy, 7.5 Exit readiness, 5.2 IP ownership, and 7.3 Cyber posture.]

The practical rule: if a criterion can only be answered by the vendor themselves (RFP response or working session), it needs a second source — either a public signal or a reference call — before it can be scored at full confidence.
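One way to make that rule mechanical in a scoring script or spreadsheet is to discount any score backed only by vendor-supplied evidence. The source labels and the 50% discount factor below are illustrative choices, not part of the framework itself.

```python
# Illustrative second-source rule: criteria evidenced only by the vendor's
# own materials are scored at reduced confidence. Labels and the 0.5
# discount are assumptions for the example.

VENDOR_ONLY = {"rfp_response", "working_session"}

def confident_score(raw_score: float, sources: set[str]) -> float:
    """Full credit only when an independent source corroborates."""
    if sources - VENDOR_ONLY:  # public signal, reference call, contract review
        return raw_score
    return raw_score * 0.5     # vendor-only evidence is discounted

print(confident_score(4.0, {"rfp_response", "reference_call"}))  # 4.0
print(confident_score(4.0, {"working_session"}))                 # 2.0
```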


What is the 3 vendor rule? (And why data engineering needs 4–6, not 3)

The “3 vendor rule” is procurement folk wisdom: ask for proposals from 7 vendors, bring 3 to final evaluation. The logic is clean — three finalists give you genuine competition without overwhelming the evaluation team.

For standard IT procurement, the rule is defensible. For data engineering vendor selection, it is too narrow.

Here is the problem: data engineering vendors differ on dimensions that only become visible when you have enough comparison points. Platform fidelity is not evenly distributed. A vendor who is a Snowflake Elite partner may have weak Databricks depth. A vendor with excellent delivery maturity may have no AI/ML readiness. A vendor with strong AI credentials may have thin commercial hygiene. Three vendors rarely give you enough triangulation to see these trade-offs clearly.

The empirical sweet spot, based on analysis of DataEngineeringCompanies.com’s 86-vendor scored dataset, is 4–6 vendors in serious evaluation:

  • 4 vendors: minimum for meaningful technical triangulation on platform fidelity
  • 5–6 vendors: appropriate for complex multi-platform or regulated-industry projects where risk criteria need broader comparison

The practical implication: your longlist should be 8–12. Your shortlist should be 4–6, not 3. The additional 1–3 vendors cost roughly 15% more evaluation time and provide disproportionately more signal on criteria where vendors cluster tightly.

The one exception: if you have an incumbent with a strong track record and the project is a natural extension of existing work, a 3-vendor final evaluation (incumbent + 2 challengers) is defensible. But for new platform builds or significant capability gaps, go to 4–6.


What are the 4 pillars of data engineering?

The four pillars of data engineering are:

  1. Collection — acquiring data from source systems (APIs, databases, event streams, files) with appropriate reliability, latency, and schema management
  2. Cleaning — transforming, validating, deduplicating, and enriching raw data to a state fit for downstream use
  3. Analysis-readiness — modeling data in structures (star schemas, one-big-table (OBT) designs, lakehouses) that support the query patterns of the business and its tools
  4. Operationalisation — running pipelines in production with monitoring, alerting, SLA management, and the governance controls required by the business and its regulators

Each pillar maps directly to specific criteria in the 35-criterion framework:

| Pillar | Primary criteria | Key verification questions |
|---|---|---|
| Collection | 1.2 (named certified engineers), 2.3 (orchestration sophistication), 2.1 (CI/CD for data) | Can they ingest from your specific sources reliably? What is their retry and dead-letter handling approach? |
| Cleaning | 2.5 (change-management process), 4.3 (domain glossary fluency), 3.4 (model governance) | How do they manage schema evolution? What constitutes an acceptable data quality SLA? |
| Analysis-readiness | 1.3 (Cortex/Mosaic AI), 4.1 (vertical case studies), 1.5 (published reference architectures) | Can they model for your business questions, not just generic star schemas? Have they built for your query volumes? |
| Operationalisation | 2.4 (observability tooling), 7.2 (BCDR), 7.5 (exit readiness), 5.5 (SLA penalty structure) | What does “production-ready” mean to them? How do they handle incidents? What does handover look like? |

The pillar framework is useful for scoping. Before you assign weights to the 35 criteria, confirm which pillars carry the most risk for your project. A greenfield platform build has roughly equal weight across all four. A migration project typically has disproportionate risk in Collection and Cleaning. An AI enablement project puts the heaviest weight on Analysis-readiness and — increasingly — Operationalisation of model pipelines.

For more on the evaluation process that surrounds these pillars, the how to evaluate data engineering vendors guide covers the full CIO decision workflow.


How should the 35 criteria be weighted for different project types?

The 35 criteria do not carry equal weight across all project types. The tables below show suggested group-level weights for three common archetypes. Criteria weights within each group stay constant (as shown in the group tables above) — only the group-level allocation shifts.

Archetype A: Net-new platform build — Greenfield Snowflake or Databricks platform, no existing modern infrastructure, organisation building from scratch.

Archetype B: Modernisation / migration — Moving from legacy warehouse, on-prem Hadoop, or first-generation cloud to a modern platform. Data exists; the goal is reliability, scalability, and reduced operational cost.

Archetype C: AI/ML enablement — Existing modern platform; the project’s goal is to add AI/ML capabilities: feature stores, model pipelines, Cortex/Mosaic AI integration, or agentic data workflows.

Group                     Archetype A: New Build   Archetype B: Migration   Archetype C: AI/ML
G1: Platform Fidelity     20%                      20%                      15%
G2: Delivery Maturity     20%                      25%                      15%
G3: AI/ML Readiness       15%                      5%                       30%
G4: Industry Fit          10%                      15%                      10%
G5: Commercial Hygiene    10%                      10%                      10%
G6: Team & Talent         15%                      15%                      10%
G7: Continuity & Risk     10%                      10%                      10%
Total                     100%                     100%                     100%
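To make the mechanics concrete, here is a minimal sketch of how group weights combine with per-criterion scores into a single vendor number. All scores and names are illustrative, the 1–5 scale and equal within-group criterion weights are simplifying assumptions for the sketch (your actual within-group weights come from the group tables above):

```python
# Hypothetical sketch: combine per-criterion scores (1-5 scale) with
# archetype group weights into one weighted vendor score.

# Archetype B (migration) group weights from the table above
ARCHETYPE_B_WEIGHTS = {
    "G1": 0.20, "G2": 0.25, "G3": 0.05, "G4": 0.15,
    "G5": 0.10, "G6": 0.15, "G7": 0.10,
}

def weighted_score(criterion_scores: dict[str, list[float]],
                   group_weights: dict[str, float]) -> float:
    """Average each group's criterion scores, then apply the group weight."""
    assert abs(sum(group_weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    total = 0.0
    for group, weight in group_weights.items():
        scores = criterion_scores[group]
        total += weight * (sum(scores) / len(scores))
    return total

# Example vendor: strong delivery, weak AI/ML (illustrative numbers only)
vendor = {
    "G1": [4, 4, 3, 3, 4],
    "G2": [5, 5, 4, 4, 5],
    "G3": [2, 1, 2, 2, 2],
    "G4": [4, 3, 3, 4, 3],
    "G5": [4, 4, 4, 3, 4],
    "G6": [4, 4, 3, 4, 3],
    "G7": [3, 4, 4, 4, 3],
}

print(round(weighted_score(vendor, ARCHETYPE_B_WEIGHTS), 2))  # 3.75
```

Note how the low Group 3 scores barely dent this vendor's total under Archetype B weights (G3 is only 5%), while the same scores under Archetype C weights (G3 at 30%) would pull the total down sharply. That asymmetry is the point of archetype-specific weighting.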

Notes on the weightings:

  • For Archetype B (migration), Delivery Maturity carries the highest weight because migration risk is predominantly an execution problem — cutover readiness, rollback discipline, and reconciliation rigor.
  • For Archetype C (AI/ML), Group 3 jumps to 30%, double the next closest group. Vendors without hands-on warehouse-native AI experience should not be shortlisted for this work.
  • Group 5 (Commercial Hygiene) and Group 7 (Continuity & Risk) stay flat across all archetypes at 10% each. These are not project-type-dependent; they are baseline requirements regardless of what is being built.

For the mechanics of applying these group weights and running the scoring sessions, see the vendor scorecard template.


Frequently asked questions

How many vendor evaluation criteria should an RFP contain?

More criteria create noise, not signal. An RFP issued to vendors should not ask about all 35 criteria — it should ask about the ten to twelve most differentiated ones for your project type.

The full 35-criterion framework is for your internal evaluation, not the vendor document. Use the RFP to elicit evidence. Use the scorecard to assess it. Sending a 35-question RFP to vendors produces 35 narrative answers that are difficult to compare, and it incentivizes length over clarity.

A well-structured RFP for data engineering typically contains 8–12 open-ended scenario questions targeting your highest-weight criteria, plus structured fields for certifications, team composition, and commercial assumptions. For process guidance, see RFP process best practices.

Should we share our weighted scorecard with the vendors?

No. Sharing the scorecard — including your weightings — before evaluation is complete allows vendors to optimize responses toward your scoring model rather than showing you their actual capabilities and approach. That defeats the purpose.

After selection, sharing the scorecard results with the winning vendor is useful: it establishes shared expectations and creates an accountability framework for the engagement. Sharing with unsuccessful vendors as part of a debrief is reasonable and professionally courteous.

What evaluation criteria changed most between 2024 and 2026?

Three changes are material:

Group 3 (AI/ML Readiness) did not exist as a discrete evaluation group in 2024. Criteria like LLM-on-warehouse experience, agent orchestration in production, and EU AI Act preparedness were either absent from frameworks entirely or buried as sub-criteria under “Innovation.” In 2026, they are a standalone group worth 15–30% of total weight depending on project type.

Observability maturity (criterion 2.4) moved from a tie-breaker to a baseline requirement. In 2024, hands-on experience with dedicated observability tools (Monte Carlo, Bigeye, Datafold) was a differentiator. In 2026, it is an expected competency. Vendors without it are missing a production-readiness standard that has become industry norm.

EU AI Act compliance (criterion 3.3) became a procurement criterion. Prior to the Act’s enforcement, AI governance was a soft discussion topic. With full compliance required by August 2026, any vendor operating in EU markets who cannot produce a documented AI risk management process is operating outside legal requirements. That is a disqualifying condition, not a scoring deduction.

For a broader view of what changed, the data engineering due diligence checklist covers the validation layer that sits alongside this evaluation framework.

Can a smaller boutique meet all 35 criteria?

Not all 35, but that is not the right test.

A boutique with 15–25 engineers will typically score lower on financial stability (7.1), BCDR plan depth (7.2), and subcontractor disclosure scope (6.5) — simply because those criteria assume a certain organizational scale. That is expected and should be weighted accordingly.

What smaller boutiques can and should score well on: named-team policy (6.1), platform fidelity at the individual-engineer level (1.2), orchestration sophistication (2.3), and domain glossary fluency (4.3). A 20-person Snowflake specialist with three SnowPro Advanced certified engineers and five reference clients in your vertical will outperform a 500-person generalist IT firm on the criteria that most predict project success.

The practical adjustment: for boutiques, raise the weight of Criteria 6.4 (retention rate) and 7.1 (financial stability) in your internal discussion, and add explicit questions about bench depth for key-personnel coverage. Then evaluate against the adjusted model — not a standard designed for large systems integrators.
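That adjustment can be made mechanical rather than ad hoc: multiply the criteria you want to emphasize by a bump factor, then renormalise so the group's internal weights still sum to 100%. A hedged sketch, assuming equal starting weights within Group 6 (the bump factor and starting weights are illustrative, not prescribed values):

```python
# Hypothetical sketch: bump selected criteria within a group, then
# renormalise so the group's internal weights still sum to 1.0.

def adjust_group_weights(weights: dict[str, float],
                         bumps: dict[str, float]) -> dict[str, float]:
    """Multiply bumped criteria by their factor, then renormalise."""
    raw = {c: w * bumps.get(c, 1.0) for c, w in weights.items()}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}

# Group 6 with equal within-group weights (assumption for this sketch)
group6 = {"6.1": 0.2, "6.2": 0.2, "6.3": 0.2, "6.4": 0.2, "6.5": 0.2}

# For a boutique evaluation, give 6.4 (retention rate) 1.5x emphasis
adjusted = adjust_group_weights(group6, {"6.4": 1.5})
print({c: round(w, 3) for c, w in adjusted.items()})
```

Renormalising keeps the group's total contribution to the overall score unchanged; only the emphasis inside the group shifts, so boutique and integrator scores remain comparable at the group level.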


Next step

The 35 criteria above define what to measure. The question of how to structure the process around them — from initial longlist to final recommendation — is covered in the data engineering partner selection hub.

If you are actively issuing an RFP, the discovery call question library gives you the verbal-format versions of the verification questions above. The POC scoping guide covers how to structure a paid proof of concept that tests criteria 2.1–2.5 and 3.1–3.2 in a controlled setting. For common RFP errors that cause evaluations to reach the wrong conclusion, see RFP mistakes to avoid.

If you want a matched shortlist of vetted vendors scored against these criteria, get matched or download the RFP checklist.


The 35-criterion checklist (print or copy as a 1-pager)

Group 1: Platform Fidelity

  1. [ ] 1.1 Partner tier confirmed in official vendor directory (Elite/Premier, not self-reported)
  2. [ ] 1.2 Named certified engineers verified by certification ID, not headcount claim
  3. [ ] 1.3 Cortex AI / Mosaic AI production case study with client reference
  4. [ ] 1.4 Open-source or SDK contributions verified on GitHub/dbt Hub with recent commits
  5. [ ] 1.5 Published reference architecture with author attribution, dated within 18 months

Group 2: Delivery Maturity

  6. [ ] 2.1 CI/CD pipeline with automated test gates between dev and production environments
  7. [ ] 2.2 IaC coverage via Terraform/Pulumi and dbt, committed to repository from project start
  8. [ ] 2.3 Orchestration depth: specific Airflow/Dagster/Prefect design decisions explained, not just claimed
  9. [ ] 2.4 Production experience with Monte Carlo, Bigeye, Anomalo, or Datafold
  10. [ ] 2.5 PR template and deployment checklist confirmed; breaking-change process documented

Group 3: AI/ML Readiness (2026)

  11. [ ] 3.1 Articulated trade-off between Snowflake Cortex and Databricks Mosaic AI for your use case
  12. [ ] 3.2 Agentic workflow in production (LangGraph, CrewAI, or equivalent) — not PoC only
  13. [ ] 3.3 Documented EU AI Act compliance process with named responsible owner
  14. [ ] 3.4 Model governance with PSI-based drift detection tied to upstream pipeline monitoring
  15. [ ] 3.5 AI coding assistant policy documented: scope, guardrails, human review requirements

Group 4: Industry Fit

  16. [ ] 4.1 Vertical case study with comparable scale — reference client available to speak
  17. [ ] 4.2 Regulatory scenario answered with specific design decisions (HIPAA/GDPR/PCI-DSS/FINRA)
  18. [ ] 4.3 Domain terminology engaged directly by proposed delivery team, not routed to SME
  19. [ ] 4.4 Named client in your vertical willing to take a reference call
  20. [ ] 4.5 Industry-specific SLA recommendations given unprompted during discussion

Group 5: Commercial Hygiene

  21. [ ] 5.1 Change-order rate cap (≤15% of SOW value) in the contract
  22. [ ] 5.2 IP ownership clause: 100% work-for-hire language, no joint-IP or methodology carve-outs
  23. [ ] 5.3 Milestone payment schedule with 10% holdback on final acceptance
  24. [ ] 5.4 Termination-for-convenience clause with 30–60 day vendor cooperation obligation
  25. [ ] 5.5 SLA penalty table with specific credit percentages tied to measurable failure thresholds

Group 6: Team & Talent

  26. [ ] 6.1 Named team (lead architect, lead engineer, engagement manager) in SOW before signing
  27. [ ] 6.2 Key personnel substitution clause: 30-day notice + client approval required
  28. [ ] 6.3 Geographic mix confirmed: minimum 4-hour working-hour overlap with your team
  29. [ ] 6.4 Senior engineer retention rate: specific percentage provided for last 2 years
  30. [ ] 6.5 Subcontractor use disclosed: percentage, entity, geography, client approval right confirmed

Group 7: Continuity & Risk

  31. [ ] 7.1 Ownership structure confirmed: PE/private/public, no undisclosed change-of-control in last 24 months
  32. [ ] 7.2 BCDR plan reviewed: personnel succession and project continuity are explicit, not implied
  33. [ ] 7.3 SOC 2 Type II or ISO 27001 confirmed; pen test within 12 months
  34. [ ] 7.4 Certificate of insurance: cyber liability + E&O at ≥$5M, client named as additional insured
  35. [ ] 7.5 Handover reference: prior client confirms runbooks, pipeline catalog, and knowledge transfer quality

Peter Korpak · Chief Analyst & Founder

Data-driven market researcher with 20+ years in market research and 10+ years helping software agencies and IT organizations make evidence-based decisions. Former market research analyst at Aviva Investors and Credit Suisse.

Previously: Aviva Investors · Credit Suisse · Brainhub · 100Signals
