About

Model Card Explorer is an open research platform built by Free Systems Lab to bring transparency to AI model governance and safety documentation.

What We Do

We systematically collect, version, and analyze the model cards and safety documentation published by major AI laboratories. Our platform extracts structured evaluation data from each card and enables cross-lab and cross-generation comparisons.

By tracking what AI companies disclose about their models' capabilities, limitations, and safety evaluations, we make it easier for researchers, policymakers, and the public to understand the state of AI governance.

Methodology

Safety Coverage Analysis

We collect policy documents from each AI lab (model cards, usage policies, responsible scaling frameworks, constitutional AI documents) and measure how well they cover 15 safety categories using semantic similarity.

  1. Documents are embedded using all-mpnet-base-v2, a sentence embedding model that produces 768-dimensional vectors.
  2. Each safety category has a written description, which is embedded with the same model.
  3. The cosine similarity between a document embedding and a category embedding is used as that document's coverage score for the category.
  4. Scores above 0.35 indicate meaningful coverage; scores above 0.70 indicate strong, dedicated policy coverage.
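The scoring and thresholding steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's pipeline: it assumes the document and category embeddings have already been computed (for example with the sentence-transformers call `SentenceTransformer("all-mpnet-base-v2").encode(...)`), it uses toy 3-dimensional vectors in place of 768-dimensional ones, and the category names are hypothetical placeholders, since this page does not list the 15 categories.

```python
import math

# Thresholds taken from the methodology text; everything else is illustrative.
MEANINGFUL = 0.35
STRONG = 0.70


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no coverage
    return dot / (norm_a * norm_b)


def coverage_level(score):
    """Map a similarity score to the coverage bands ("above" = strictly greater)."""
    if score > STRONG:
        return "strong"
    if score > MEANINGFUL:
        return "meaningful"
    return "none"


def score_document(doc_vec, category_vecs):
    """Return {category: (score, level)} for one document embedding."""
    result = {}
    for name, cat_vec in category_vecs.items():
        s = cosine_similarity(doc_vec, cat_vec)
        result[name] = (round(s, 3), coverage_level(s))
    return result


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings; category names are hypothetical.
    doc = [1.0, 0.0, 0.0]
    categories = {
        "category_a": [1.0, 0.0, 0.0],
        "category_b": [0.6, 0.8, 0.0],
        "category_c": [0.0, 0.0, 1.0],
    }
    for name, (score, level) in score_document(doc, categories).items():
        print(f"{name}: {score} ({level})")
```

Treating the thresholds as strict inequalities follows the wording "above 0.35" and "above 0.70", so a score of exactly 0.35 counts as no coverage in this sketch.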

Eval Extraction

We use LLM-based extraction to identify the benchmark results, safety evaluations, and capability assessments reported in each model card. The extracted data is stored in a structured format that can be searched and compared across cards.

  1. Model card content is sent to Claude Sonnet for structured extraction.
  2. Benchmark names, scores, variants, and metrics are normalized against a known benchmark registry.
  3. Results are linked to model families and generations for cross-generation comparison.
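The normalization and linking steps could look roughly like the following. This is a hypothetical sketch: the registry contents, the record fields (`benchmark`, `family`, `generation`, `score`), and the helper names are assumptions for illustration, and it only normalizes benchmark names, not variants or metrics. Names that do not match the registry are set aside for review rather than guessed.

```python
from collections import defaultdict

# Hypothetical registry: canonical benchmark name -> known aliases (lowercased).
BENCHMARK_REGISTRY = {
    "MMLU": {"mmlu", "massive multitask language understanding"},
    "GPQA Diamond": {"gpqa diamond", "gpqa-diamond"},
    "HumanEval": {"humaneval", "human eval", "human-eval"},
}

# Reverse index built once: alias -> canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in BENCHMARK_REGISTRY.items()
    for alias in aliases
}


def normalize_name(raw):
    """Return the canonical benchmark name, or None if it is not in the registry."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return ALIAS_TO_CANONICAL.get(key)


def group_by_generation(records):
    """Group results as {canonical: {(family, generation): score}}.

    Returns the grouped table plus the list of records whose benchmark
    name could not be matched to the registry.
    """
    table = defaultdict(dict)
    unmatched = []
    for rec in records:
        canonical = normalize_name(rec["benchmark"])
        if canonical is None:
            unmatched.append(rec)  # keep for manual review instead of guessing
            continue
        table[canonical][(rec["family"], rec["generation"])] = rec["score"]
    return dict(table), unmatched
```

Keying each benchmark's results by (family, generation) makes a cross-generation comparison a simple lookup within one benchmark's entry.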

Data Sources

We track 6 frontier AI laboratories and collect their publicly available documentation:

Anthropic
OpenAI
Google DeepMind
Meta AI
Mistral AI
xAI

Limitations

  1. Document-level embeddings may dilute brief mentions of a topic within longer documents.
  2. The 0.35 cosine similarity threshold for coverage was set through empirical validation, but it remains a judgment call.
  3. LLM-based eval extraction may miss or misinterpret results in unusual card formats.

Open Source

This project is fully open source. The code, data pipeline, and analysis methodology are available on GitHub. Contributions and feedback are welcome.