Latest comparison report: LLM Model Comparison Report
The Evopy testing system provides comprehensive testing capabilities for both the text2python and python2text modules. It allows you to test different language models (LLMs) and compare their performance, accuracy, and capabilities.
```mermaid
flowchart TD
    subgraph "Test Scripts"
        test["test.sh\nSingle Model Testing"] --> model_selection["Model Selection"]
        report["report.sh\nMulti-Model Comparison"] --> model_selection
        report_debug["report_debug.sh\nDetailed Diagnostics"] --> model_selection
    end
    model_selection --> ollama["Ollama API"]
    ollama --> available_models["Available Models"]
    subgraph "Test Types"
        query_tests["Basic Query Tests"]
        correctness_tests["Correctness Tests"]
        performance_tests["Performance Tests"]
    end
    available_models --> query_tests
    available_models --> correctness_tests
    available_models --> performance_tests
    query_tests --> results["Test Results"]
    correctness_tests --> results
    performance_tests --> results
    results --> report_generation["Report Generation"]
    report_generation --> md["Markdown Report"]
    report_generation --> html["HTML Report"]
    report_generation --> pdf["PDF Report"]
    md --> latest_link["Latest Report Link"]
```
test.sh - Single Model Testing
This script runs tests for a single model on all Evopy components.
```bash
./test.sh [--model=MODEL_NAME]
```
Results are saved in the test_results directory.
report.sh - Multi-Model Comparison
This script generates a comprehensive comparison report across multiple LLM models.
```bash
./report.sh [options]
```
Options:
```text
--model=NAME              Run tests only for the specified model
--format=FORMAT           Report format: all, md, html, pdf (default: all)
--trend=DAYS              Number of days for trend analysis (default: 30)
--compare=MODEL1,MODEL2   Compare only specified models
--metrics=METRICS         Selected metrics for analysis (default: all)
--only-report             Generate report without running tests
--help                    Display help information
```
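The following Python sketch gathers the option semantics in one place by mirroring the report.sh options with argparse. It is hypothetical and is not the script's actual parser, since report.sh is a shell script. The names parse_report_args and split_models are assumptions.

```python
import argparse

FORMATS = ("all", "md", "html", "pdf")


def split_models(value):
    """Turn "llama,bielik" into ["llama", "bielik"], ignoring empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_report_args(argv=None):
    """Hypothetical Python mirror of the report.sh options and their defaults."""
    parser = argparse.ArgumentParser(prog="report.sh")
    parser.add_argument("--model", help="Run tests only for the specified model")
    parser.add_argument("--format", choices=FORMATS, default="all",
                        help="Report format (default: all)")
    parser.add_argument("--trend", type=int, default=30,
                        help="Number of days for trend analysis (default: 30)")
    parser.add_argument("--compare", type=split_models,
                        help="Compare only the listed models, e.g. llama,bielik")
    parser.add_argument("--metrics", default="all",
                        help="Selected metrics for analysis (default: all)")
    parser.add_argument("--only-report", action="store_true",
                        help="Generate the report without running tests")
    # argparse adds --help automatically.
    return parser.parse_args(argv)
```

For example, parse_report_args(["--compare=llama,bielik", "--format=html"]) yields compare == ["llama", "bielik"] and format == "html", and trend keeps its default of 30.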
To generate a comprehensive comparison report:
```bash
# Generate a standard report for all models
./report.sh
# Compare only specific models
./report.sh --compare=llama,bielik
# Generate only HTML report without running tests
./report.sh --format=html --only-report
# Analyze trends for the last 60 days
./report.sh --trend=60
```
When ./report.sh lists the available models, select the ones to test by entering their numbers (e.g. 1 3 5), or enter all to test all available models.
```mermaid
sequenceDiagram
    participant User
    participant ReportScript as report.sh
    participant TestSystem as Test System
    participant Ollama
    participant ReportGen as Report Generator
    User->>ReportScript: Execute ./report.sh
    ReportScript->>Ollama: Query available models
    Ollama-->>ReportScript: Return model list
    ReportScript->>User: Display available models
    User->>ReportScript: Select models to test
    loop For each selected model
        ReportScript->>TestSystem: Run tests with model
        TestSystem->>Ollama: Execute queries
        Ollama-->>TestSystem: Return responses
        TestSystem-->>ReportScript: Save test results
    end
    ReportScript->>ReportGen: Generate comparison report
    ReportGen->>ReportGen: Create MD, HTML, PDF
    ReportGen-->>ReportScript: Update latest report link
    ReportScript-->>User: Display report location
```
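The selection step and the per-model loop in the diagram can be sketched in Python as follows. The sketch assumes models are listed with 1-based numbers and that a selection is either a space-separated list of those numbers or the word all. The names select_models and run_selected are hypothetical, and run_tests stands in for whatever runs the Evopy tests for one model.

```python
def select_models(available, selection):
    """Map "1 3 5" or "all" to model names, using 1-based numbering (assumed)."""
    selection = selection.strip()
    if selection.lower() == "all":
        return list(available)
    chosen = []
    for token in selection.split():
        index = int(token)  # raises ValueError for non-numeric input
        if not 1 <= index <= len(available):
            raise ValueError(f"no model with number {index}")
        name = available[index - 1]
        if name not in chosen:  # ignore repeated numbers
            chosen.append(name)
    return chosen


def run_selected(models, run_tests):
    """Run the tests once per selected model, as in the diagram's loop."""
    return {model: run_tests(model) for model in models}
```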
Evopy includes an automatic dependency repair system that detects and fixes missing imports in code executed in Docker containers.
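Python's ast module can illustrate the detection step. This minimal sketch is not the code in dependency_manager.py. It assumes a missing import shows up as an attribute access, such as json.dumps, on a name that is never imported or assigned. It only auto-imports names from a small, hypothetical allowlist of standard-library modules.

```python
import ast

# Hypothetical allowlist: only these standard-library modules are auto-imported.
SAFE_MODULES = {"os", "sys", "re", "json", "math", "time", "datetime", "random",
                "collections", "itertools", "pathlib", "shutil", "subprocess"}


def find_missing_imports(source):
    """Return allowlisted module names used as `name.attr` but never bound."""
    tree = ast.parse(source)
    bound, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os.path` binds `os`; `import numpy as np` binds `np`.
            bound.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            bound.update(a.asname or a.name for a in node.names)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            used.add(node.value.id)
    return sorted((used - bound) & SAFE_MODULES)


def add_missing_imports(source):
    """Prepend the missing imports (naive: ignores `from __future__` ordering)."""
    header = "".join(f"import {name}\n" for name in find_missing_imports(source))
    return header + source
```

For example, add_missing_imports("print(json.dumps([1]))") returns the same source with import json prepended.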
```mermaid
flowchart LR
    code["Python Code"] --> dependency_manager["dependency_manager.py"]
    dependency_manager --> analysis["Code Analysis"]
    analysis --> missing["Detect Missing Imports"]
    missing --> repair["Auto-Repair Code"]
    repair --> docker["docker_sandbox.py"]
    docker --> execution["Code Execution"]
    subgraph "Auto-Import Mechanism"
        std_modules["Standard Modules"]
        dynamic_import["Dynamic Import"]
    end
    execution --> error{"Error?"}
    error -->|Yes| auto_import["Auto-Import"]
    error -->|No| results["Results"]
    auto_import --> std_modules
    auto_import --> dynamic_import
    std_modules --> execution
    dynamic_import --> execution
```
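The runtime branch of the diagram (execute, catch the error, auto-import, run again) could look roughly like the sketch below. It is an illustration built on several assumptions, not the docker_sandbox.py logic. It runs code in-process with exec instead of in a container, and it treats only a NameError for an allowlisted standard-library module as repairable. It also re-runs the whole snippet after each import, so any side effects that happened before the failure are repeated.

```python
import importlib
import re

# Hypothetical allowlist of modules that may be imported automatically.
ALLOWED_MODULES = {"os", "sys", "re", "json", "math", "time", "datetime",
                   "random", "collections", "itertools", "pathlib"}

_NAME_ERROR = re.compile(r"name '([A-Za-z_][A-Za-z0-9_]*)' is not defined")


def run_with_auto_import(source, max_retries=5):
    """Execute source; on NameError for an allowed module, import it and retry."""
    namespace = {}
    for _ in range(max_retries + 1):
        try:
            exec(source, namespace)
            return namespace
        except NameError as exc:
            match = _NAME_ERROR.search(str(exc))
            if not match or match.group(1) not in ALLOWED_MODULES:
                raise  # not a repairable missing import
            name = match.group(1)
            # Bind the module in the globals used by exec, then run again.
            namespace[name] = importlib.import_module(name)
    raise RuntimeError("too many auto-import retries")
```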
Wait for all tests to complete. This may take some time depending on the number of models selected.
The report will be generated in the reports directory with a filename like comparison_report_YYYYMMDD_HHMMSS.md.
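The timestamp in the filename is zero-padded from year down to second, so sorting the names also sorts the reports by age. This minimal sketch assumes reports sit directly in the reports directory. The helpers report_filename and latest_report are hypothetical, and the sketch does not show how the Latest Report Link from the diagram is maintained.

```python
import re
from datetime import datetime
from pathlib import Path


def report_filename(when=None, ext="md"):
    """Build a name following the comparison_report_YYYYMMDD_HHMMSS pattern."""
    when = when or datetime.now()
    return f"comparison_report_{when:%Y%m%d_%H%M%S}.{ext}"


def latest_report(reports_dir, ext="md"):
    """Return the newest timestamped report in reports_dir, or None."""
    pattern = re.compile(rf"comparison_report_\d{{8}}_\d{{6}}\.{re.escape(ext)}")
    names = sorted(p.name for p in Path(reports_dir).iterdir()
                   if pattern.fullmatch(p.name))
    # Zero-padded timestamps sort lexicographically in chronological order.
    return Path(reports_dir) / names[-1] if names else None
```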
The enhanced report includes:
Executive Summary: Overview of the best-performing models and key findings
Evopy generates reports in multiple formats:
The Markdown report is the primary format, with full formatting and links to all test results.
An HTML version of the report is generated automatically for web viewing. To ensure Mermaid diagrams render properly, the HTML report loads the Mermaid library from a CDN and initializes it on page load:
```html
<script src="https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js"></script>
<script>
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>
```
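The snippet above only loads and initializes Mermaid. With startOnLoad enabled, Mermaid renders elements that carry the mermaid class. The sketch below shows one way to place diagram source into such elements. It assumes a hand-rolled Markdown-to-HTML step and is not necessarily how generate_report.py or pandoc handles diagrams. The names mermaid_fences_to_divs and inject_mermaid_script are hypothetical.

```python
import html
import re

TICKS = "`" * 3  # three backticks, built indirectly so the surrounding fence is not closed

MERMAID_SCRIPT = (
    '<script src="https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js"></script>\n'
    "<script>\n"
    "mermaid.initialize({ startOnLoad: true, theme: 'default' });\n"
    "</script>\n"
)

FENCE = re.compile("^" + TICKS + r"mermaid[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
                   re.MULTILINE | re.DOTALL)


def mermaid_fences_to_divs(markdown):
    """Turn mermaid code fences into <div class="mermaid"> blocks."""
    # Escape <, > and & so diagram text cannot break the HTML;
    # Mermaid decodes these entities before parsing.
    return FENCE.sub(
        lambda m: '<div class="mermaid">\n' + html.escape(m.group(1), quote=False) + "</div>",
        markdown,
    )


def inject_mermaid_script(page):
    """Add the Mermaid loader before </body>, or append it if there is no body tag."""
    if "</body>" in page:
        return page.replace("</body>", MERMAID_SCRIPT + "</body>", 1)
    return page + MERMAID_SCRIPT
```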
A PDF version is generated using wkhtmltopdf with landscape orientation for better readability of tables and diagrams.
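The PDF step amounts to a single wkhtmltopdf call. This minimal sketch assumes the HTML report already exists. The --orientation Landscape option matches the landscape layout described above. The optional --javascript-delay is also a standard wkhtmltopdf option, but using it here is an assumption: it may give Mermaid time to render before the page is captured. The helper names are hypothetical.

```python
import shutil
import subprocess


def wkhtmltopdf_command(html_path, pdf_path, js_delay_ms=None):
    """Build the wkhtmltopdf argument list for a landscape PDF."""
    cmd = ["wkhtmltopdf", "--orientation", "Landscape"]
    if js_delay_ms is not None:
        # Wait for JavaScript (e.g. Mermaid) before rendering; value in milliseconds.
        cmd += ["--javascript-delay", str(js_delay_ms)]
    return cmd + [str(html_path), str(pdf_path)]


def html_to_pdf(html_path, pdf_path, js_delay_ms=None):
    """Convert an HTML report to PDF, failing clearly if wkhtmltopdf is missing."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path, js_delay_ms), check=True)
```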
For the best experience viewing reports with Mermaid diagrams:
The HTML version is recommended for the best interactive experience with diagrams.
If a model isn’t available in your Ollama installation:
If you encounter permission issues with the .env file:
```bash
chmod u+w config/.env
```
If the report doesn’t include all models:
The system supports the following models:
Only models available in your Ollama installation will be listed for testing.
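To check which models your Ollama installation actually provides, you can query Ollama's local HTTP API. Its GET /api/tags endpoint lists the installed models, and the ollama list command shows the same information. The sketch below assumes Ollama's default address, http://localhost:11434. The helper names are assumptions rather than Evopy's own logic, as is the rule that a bare name like llama3 matches any installed tag.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def parse_model_names(payload):
    """Extract model names from an /api/tags JSON response."""
    data = json.loads(payload)
    return sorted(model["name"] for model in data.get("models", []))


def list_ollama_models(base_url=OLLAMA_URL, timeout=5):
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return parse_model_names(response.read().decode("utf-8"))


def filter_installed(supported, installed):
    """Keep supported models that are installed; "llama3" matches "llama3:latest"."""
    installed = set(installed)
    bases = {name.split(":", 1)[0] for name in installed}
    # A bare name matches any installed tag; a tagged name must match exactly.
    return [m for m in supported if m in installed or (":" not in m and m in bases)]
```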
To add new test cases or modify existing ones:
- test_queries.py for basic query tests
- tests/correctness/ for correctness tests
- tests/performance/ for performance tests

Evopy now supports generating reports in multiple formats using the generate_report.py script:
generate_report.py - Multi-Format Report Generator
This script generates comparison reports in multiple formats from existing test results.
```bash
python generate_report.py [--format=all|md|html|pdf] [--input=<results_dir>] [--output=<output_dir>]
```
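The --format option maps onto one output file per format. This minimal sketch of that mapping assumes each format is written to the output directory under the same base name. The helper output_paths is hypothetical and not part of generate_report.py.

```python
from pathlib import Path

EXTENSIONS = {"md": ".md", "html": ".html", "pdf": ".pdf"}


def output_paths(fmt, output_dir, stem):
    """Return {format: path} for --format=all|md|html|pdf."""
    if fmt == "all":
        formats = list(EXTENSIONS)  # md, html, pdf: HTML comes before the PDF built from it
    elif fmt in EXTENSIONS:
        formats = [fmt]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return {f: Path(output_dir) / (stem + EXTENSIONS[f]) for f in formats}
```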
To use all report formats, you’ll need:
- pandoc - for HTML conversion
- wkhtmltopdf - for PDF generation

Install these dependencies with:
```bash
sudo apt-get install pandoc wkhtmltopdf
```
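Before generating HTML or PDF output, it can help to confirm that both tools are on PATH. This minimal sketch uses shutil.which. It assumes, as the list above suggests, that HTML needs pandoc and that PDF needs wkhtmltopdf on top of the HTML step. The function name is hypothetical.

```python
import shutil


def available_formats(which=shutil.which):
    """Return the report formats that can be produced with the installed tools."""
    formats = ["md"]  # Markdown needs no external tool
    if which("pandoc"):
        formats.append("html")
        if which("wkhtmltopdf"):
            formats.append("pdf")  # PDF is rendered from the HTML report
    return formats
```

The which parameter exists so the check can be exercised without installing anything. On a machine with both tools installed, available_formats() returns ["md", "html", "pdf"].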
Reports can be generated and viewed in multiple formats:
```bash
# Generate only markdown report
python generate_report.py --format=md
# View in terminal
less reports/comparison_report_YYYYMMDD_HHMMSS.md
# or with a markdown viewer
glow reports/comparison_report_YYYYMMDD_HHMMSS.md

# Generate only HTML report
python generate_report.py --format=html
# Open in web browser
xdg-open reports/comparison_report_YYYYMMDD_HHMMSS.html

# Generate only PDF report
python generate_report.py --format=pdf
# Open in PDF viewer
xdg-open reports/comparison_report_YYYYMMDD_HHMMSS.pdf

# Generate reports in all formats
python generate_report.py --format=all
```