Leveling the Playing Field for Edge AI Research Through High Quality Datasets

Published by EDGE AI FOUNDATION Datasets & Benchmarks Working Group:
  • Adam Fuks – NXP, Chair
  • Petrut Bogdan – Innatera
  • Vijay Janapa Reddi – Harvard University
  • Eiman Kanjo – Imperial College
  • Colby Banbury – Harvard University
  • Sam Al Attiyah – Imagimob
  • Xianghui Wang – Renesas
  • Emil Jorgenson Njor – Technical University of Denmark


Introduction and the Challenge of Edge AI

The past decade has seen remarkable advances in neural network (NN) techniques, including innovative topologies, training methods, quantization-aware approaches, data augmentation, and model compression. This progress has significantly boosted fields such as image recognition, powered by datasets like ImageNet, and natural language processing (NLP), driven by vast internet-scale corpora, enabling AI systems to rival and even surpass human performance on specific tasks. However, a critical challenge arises when deploying these increasingly complex AI systems on edge devices. Unlike cloud servers, edge devices operate under stringent constraints on power, memory, and compute. They are often battery-powered and thermally limited, demanding smaller, more efficient models that maintain accuracy while fitting within tight energy and memory budgets.


This realm is what we refer to as “edge AI.” Success here hinges on developing tailored techniques and establishing robust benchmarking methods. Currently, a significant barrier exists: a fragmented landscape and a lack of standardized, high-quality datasets that accurately reflect real-world edge use cases. While numerous publications claim efficiency gains in “tiny” or edge ML, they often rely on simplistic “toy” examples that fail to translate to production-ready applications. What’s missing is a universally accepted, credible benchmark for comparing performance in realistic environments.

The EDGE AI FOUNDATION’s Response and Goals

The EDGE AI FOUNDATION Datasets & Benchmarks Working Group was formed to address this gap directly. The primary objective is to create a level playing field for tinyML and edge AI research by providing the necessary infrastructure for effective benchmarking.

Our key goals are threefold:

  1. Curate Realistic, Appropriately Sized Datasets: We aim to develop and curate datasets that are both realistic and appropriately sized for edge devices. These datasets will be continuously expanded and refined through collaborative efforts with the community, ensuring they remain relevant and reflective of diverse, real-world scenarios.
  2. Support Open Research into Performance Trade-offs: A critical aspect of edge AI development is understanding the trade-offs between various performance metrics such as power consumption, memory usage, and accuracy. We will provide datasets that enable open research and facilitate thorough evaluation of these trade-offs in the context of edge deployments. (A minimal measurement sketch follows this list.)
  3. Foster Shared Learning and Optimization: By establishing a public repository of datasets specifically tailored to edge AI and tinyML use cases, we aim to foster a culture of shared learning and optimization within the ecosystem. This repository will empower researchers, developers, and companies to confidently evaluate their models against real-world benchmarks and align their innovations with practical deployment requirements.
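
As a concrete illustration of the second goal, the sketch below shows one way accuracy, latency, and model-size figures can be collected for a converted model on a held-out test set. It is a minimal, hypothetical example rather than an official benchmark harness: the model path and test arrays are placeholders, host-side timing is only a rough proxy for on-device latency, and power measurement is out of scope here.

    # Minimal sketch (not an official benchmark harness): collecting accuracy,
    # latency, and size figures for a converted TensorFlow Lite model.
    # "model.tflite", x_test, and y_test are user-supplied placeholders.
    import os
    import time
    import numpy as np
    import tensorflow as tf

    def evaluate_tflite(model_path, x_test, y_test):
        interpreter = tf.lite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        inp = interpreter.get_input_details()[0]
        out = interpreter.get_output_details()[0]

        correct, latencies = 0, []
        for x, y in zip(x_test, y_test):
            interpreter.set_tensor(inp["index"], x[np.newaxis, ...].astype(inp["dtype"]))
            start = time.perf_counter()
            interpreter.invoke()
            latencies.append(time.perf_counter() - start)
            pred = int(np.argmax(interpreter.get_tensor(out["index"])))
            correct += int(pred == y)

        return {
            "accuracy": correct / len(y_test),
            "median_latency_ms": 1e3 * float(np.median(latencies)),
            "model_size_kb": os.path.getsize(model_path) / 1024,
        }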

Importantly, the Working Group’s role is not to judge submission quality or conduct official benchmarks. Instead, we are focused on enabling honest, community-driven comparison by providing the necessary infrastructure. We are creating the tools and resources that will empower the community to drive innovation collaboratively.

Choosing Use Cases Thoughtfully

Selecting the right datasets and corresponding use cases is crucial for developing meaningful benchmarks. The edge AI community spans a broad range of applications, each with distinct technical requirements. To ensure our benchmarks are relevant and broadly applicable, we propose organizing use cases along several dimensions:

  • Real-time vs. Batched Processing: Distinguishing between tasks requiring instantaneous response (like fall detection) and those that benefit from batch analysis is critical.
  • Energy Constraints: Recognizing the significant impact of energy limitations, particularly for battery-powered devices like wearables and sensors, versus wall-powered devices.
  • Always-on Operation: Considering the unique challenges posed by applications that demand continuous inference, such as health monitoring or predictive maintenance.
  • Task Nature: Accounting for differences between classification tasks, regression tasks, and transformation tasks, each influencing model architecture and evaluation metrics.
  • Data Modality: Ensuring benchmarks reflect the specialized input types used in edge solutions, such as time-series data, images, or audio.

By mapping benchmarks to these categories, we aim to highlight a system’s actual capabilities in real-world scenarios, not just its performance on isolated, artificial tasks.
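
To make these dimensions concrete, the snippet below sketches one way they could be captured as machine-readable benchmark metadata. The field names, enums, and example profile are purely illustrative assumptions, not an agreed EDGE AI FOUNDATION schema.

    # Illustrative sketch only: encoding the use-case dimensions above as
    # benchmark metadata. Names and values are hypothetical.
    from dataclasses import dataclass
    from enum import Enum

    class Processing(Enum):
        REAL_TIME = "real-time"
        BATCHED = "batched"

    class Power(Enum):
        BATTERY = "battery"
        WALL = "wall"

    class Task(Enum):
        CLASSIFICATION = "classification"
        REGRESSION = "regression"
        TRANSFORMATION = "transformation"

    @dataclass
    class UseCaseProfile:
        name: str
        processing: Processing
        power: Power
        always_on: bool
        task: Task
        modality: str  # e.g. "image", "audio", "time-series"

    # Example: an always-on, battery-powered fall detector on accelerometer data.
    fall_detection = UseCaseProfile(
        name="fall-detection",
        processing=Processing.REAL_TIME,
        power=Power.BATTERY,
        always_on=True,
        task=Task.CLASSIFICATION,
        modality="time-series",
    )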

Improving Datasets Together

High-quality data is the bedrock of trustworthy machine learning. This includes not just training data, but also testing and validation data that accurately reflect the complexities of the real world. A poor test set can misrepresent a model’s performance, leading to inaccurate conclusions.

Therefore, the EDGE AI FOUNDATION is committed to creating and maintaining continually updated, diverse, and well-labeled datasets. For each dataset, we will:

  • Build on existing work: Enhancing and expanding upon proven datasets where possible.
  • Ensure variety: Capturing a wide range of real-world scenarios and edge cases.
  • Provide rich metadata: Ensuring accurate and comprehensive data labeling (a hypothetical dataset card follows this list).
  • Evolve continuously: Regularly updating test sets to stay aligned with state-of-the-art models.
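
As an illustration of what "rich metadata" could look like in practice, the sketch below shows a hypothetical machine-readable dataset card. Every key and value in it is a placeholder chosen for this example, not a published EDGE AI FOUNDATION format.

    # Hypothetical "dataset card" accompanying a curated dataset. All fields
    # and values are placeholders for illustration only.
    dataset_card = {
        "name": "example-visual-wake-words-extension",   # hypothetical name
        "version": "1.0",
        "modality": "image",
        "task": "binary classification (person / no person)",
        "provenance": "derived from MS COCO annotations",
        "label_schema": {0: "no-person", 1: "person"},
        "splits": ["train", "validation", "test"],
        "known_gaps": ["low-light scenes underrepresented"],  # example entry
        "last_updated": "YYYY-MM-DD",
    }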

Our initial focus is on Visual Wake Words, with future expansions planned for other modalities. Each dataset will be vetted for its generalization across edge use cases and will be equipped with the necessary metadata for effective benchmarking.
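
For context, Visual Wake Words is a person/no-person classification dataset derived from MS COCO annotations. The sketch below shows roughly how such labels can be derived with pycocotools; the annotation path and the 0.5% minimum-area rule are assumptions included for illustration, and the official Visual Wake Words recipe remains the authoritative reference.

    # Hedged sketch: deriving Visual-Wake-Words-style person / no-person labels
    # from MS COCO instance annotations using pycocotools. The minimum-area
    # threshold is an assumption; consult the official recipe for exact rules.
    from pycocotools.coco import COCO

    def person_labels(annotation_file, min_area_fraction=0.005):
        coco = COCO(annotation_file)  # e.g. a COCO "instances_*.json" file
        person_cat = coco.getCatIds(catNms=["person"])
        labels = {}
        for img_id in coco.getImgIds():
            img = coco.loadImgs(img_id)[0]
            img_area = img["width"] * img["height"]
            ann_ids = coco.getAnnIds(imgIds=img_id, catIds=person_cat, iscrowd=None)
            anns = coco.loadAnns(ann_ids)
            # Label 1 ("person") if any person annotation covers enough of the image.
            labels[img["file_name"]] = int(
                any(a["area"] / img_area > min_area_fraction for a in anns)
            )
        return labels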

Your Role and the Community’s Contribution

The success of this initiative hinges on the active participation of the edge AI community. We encourage contributions in several key areas:

  • Suggest datasets or base sets: Identify valuable starting points or areas for improvement.
  • Provide feedback: Offer insights on case coverage and diversity.
  • Contribute new test cases: Help create more realistic test scenarios.
  • Assist with labeling and annotations: Improve data quality and usability.
  • Expand edge case scenarios: Provide niche or underrepresented data.

Together, we can build a robust foundation that supports honest comparisons, accelerates development, and unlocks new possibilities for edge AI.

A Call to Action

We must move beyond toy benchmarks and embrace community-led, production-grade testing environments to truly advance edge AI and tinyML. We call on the EDGE AI FOUNDATION community to:

  • Share challenges: Help us prioritize use cases by highlighting the issues you’re facing.
  • Contribute datasets and evaluation techniques: Align your contributions with your organizational goals.
  • Collaborate on establishing optimization best practices: Ensure meaningful benchmarking methods.

Let’s collaborate to build a shared, open, and inclusive ecosystem that drives edge AI forward. Join us at joinus@edgeaifoundation.org and be part of this transformative journey.