Good shape overall. A few tweaks would push it into the top tier.

Evaluate and improve models and agents using environments

Documentation

91

Contributing guide (5pt): 63

Contributing guide is too short for full depth credit (−6 pts). 400+ words earns the full +12 pts.

Add setup instructions, code style notes, and how to run tests.
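Before rescanning, it can help to check a draft guide against the criteria the report names: the 400-word threshold and the setup, code style, testing, and PR workflow sections. The following is a minimal sketch of such a check. It is not the scanner's actual logic, and the function name `audit_contributing` and the heading keywords are assumptions chosen for illustration.

```python
import re

# Hypothetical keyword sets, one per section the report asks for.
# The scanner's real matching rules are not documented here.
REQUIRED_SECTIONS = {
    "setup": ("setup", "getting started", "installation"),
    "code style": ("style", "lint", "format"),
    "testing": ("test",),
    "PR workflow": ("pull request", "workflow"),
}
MIN_WORDS = 400  # word threshold quoted in the report


def audit_contributing(text: str) -> dict:
    """Return the word count and the required sections with no matching heading."""
    # Collect Markdown heading titles (lines starting with one to six '#').
    headings = [
        m.group(1).lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", text, re.MULTILINE)
    ]
    # A section counts as present if any heading contains one of its keywords.
    missing = [
        name
        for name, keywords in REQUIRED_SECTIONS.items()
        if not any(k in h for h in headings for k in keywords)
    ]
    words = len(text.split())
    return {"words": words, "long_enough": words >= MIN_WORDS, "missing": missing}
```

Passing the contents of CONTRIBUTING.md to `audit_contributing` returns a dictionary whose `missing` list names the sections still to be written and whose `long_enough` flag reflects the 400-word threshold.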

Install and run instructions (9pt): 90

README documents how to install the project.

README (12pt): 100

README is present.

License (6pt): 100

Licensed under Apache-2.0.

Engineering

72

Linting and formatting (5pt): 0

No linter or formatter config found.

Add a linter config such as .eslintrc.json, .prettierrc, ruff.toml, or .golangci.yml to enforce consistent code style.
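To confirm locally that one of the suggested config files is in place, a small check like the one below can be run against the repository root. This is only a sketch: the file names come from the suggestion above, the helper name `find_lint_configs` is hypothetical, and the scanner may look in other locations that this check does not cover.

```python
from pathlib import Path

# File names taken from the report's suggestion; extend the tuple as needed.
LINT_CONFIG_NAMES = (".eslintrc.json", ".prettierrc", "ruff.toml", ".golangci.yml")


def find_lint_configs(root: str) -> list[str]:
    """Return the known linter/formatter config files present directly under root."""
    base = Path(root)
    # Only the top level of the repository is inspected, not subdirectories.
    return [name for name in LINT_CONFIG_NAMES if (base / name).is_file()]
```

Calling `find_lint_configs(".")` from the repository root returns an empty list when none of the listed files exists, which corresponds to the finding reported above.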

CI/CD (14pt): 72

CI is configured (.github/workflows/_build_container.yml).

Tests (18pt): 80

Test files detected (benchmarks/hotpotqa_closedbook/tests).

Reproducibility (6pt): 80

Lockfile present (uv.lock). Installs are reproducible.

Issue and PR templates (6pt): 100

Issue or PR templates present.

Project health

100

Dependency manifest (6pt): 100

Dependency manifest found (pyproject.toml).

Repository metadata (5pt): 100

Repository has a description.

Activity (5pt): 100

Actively maintained (pushed within the last month).

Housekeeping (3pt): 100

.gitignore present.

Repository health signals

Activity, community, and responsiveness at scan time

Activity

  • Commits (30d / 90d)
  • Forks: 195
  • Releases: 5 (latest 20d ago)

Community

  • Community health
  • authors own >50% of commits
  • Watchers: 996

Responsiveness

  • Median issue response: 17d 4h
  • Median PR merge time: 3h
  • Open issues: 446
Repository files (32 root entries)
  • .claude
  • .codex
  • .github
    Good: CI is configured (.github/workflows/_build_container.yml).
    Good: Issue or PR templates present.
  • benchmarks
    Good: Test files detected (benchmarks/hotpotqa_closedbook/tests).
  • cache
  • data
  • docs
  • environments
  • example_environments
  • fern
  • nemo_gym
  • resources_servers
  • responses_api_agents
  • responses_api_models
  • results
  • scripts
  • tests
  • .dockerignore
  • .gitignore
    Good: .gitignore present.
  • .pre-commit-config.yaml
  • .python-version
    Good: Environment pinned via .python-version.
  • ATTRIBUTIONS.md
  • CLAUDE.md
  • CODE_OF_CONDUCT.md
    Good: Code of conduct present.
  • codecov.yml
  • CONTRIBUTING.md
    Issue: Contributing guide is too short for full depth credit (−6 pts). 400+ words earns the full +12 pts.
    Fix: Add setup instructions, code style notes, and how to run tests.
    Issue: Contributing guide lacks a setup section (−12 pts).
    Fix: Show new contributors how to get a local dev environment running.
    Issue: Contributing guide lacks a code style section (−8 pts).
    Fix: Describe your linting/formatting rules and how to run them.
    Issue: Contributing guide lacks a testing section (−8 pts).
    Fix: Show contributors how to run the test suite (e.g. npm test, pytest, cargo test).
    Issue: Contributing guide lacks a PR workflow section (−8 pts).
    Fix: Explain how to fork, branch, and open a pull request so contributors know what to expect.
    Good: Contributing guide includes code examples.
  • LICENSE
    Good: Licensed under Apache-2.0.
  • Makefile
  • pyproject.toml
    Good: Dependency manifest found (pyproject.toml).
  • README.md
    Good: README is present.
    Good: README is well structured with multiple sections.
    Good: README includes screenshots or visuals. Great for first impressions.
    Good: README has code examples.
    Good: README links to a live demo or deployed app.
    Good: README includes status badges.
    Good: README documents how to install the project.
    Good: README documents how to run the project.
  • SECURITY.md
    Good: Security policy present.
  • uv.lock
    Good: Lockfile present (uv.lock). Installs are reproducible.