Table of Contents - F.748.44 (03/2025) - Assessment criteria for foundation models – Benchmark
1 Scope
2 References
3 Definitions
  3.1 Terms defined elsewhere
  3.2 Terms defined in this Recommendation
4 Abbreviations and acronyms
5 Conventions
6 Overview of benchmark for foundation models
  6.1 General
  6.2 Testing capabilities
  6.3 Testing datasets
  6.4 Testing method
  6.5 Testing tool
7 Requirements of foundation models
  7.1 Understanding ability
  7.2 Generation ability
  7.3 Reasoning ability
  7.4 Knowledge ability
  7.5 Reliability
  7.6 Robustness
8 Evaluation methods of foundation models
  8.1 Automated evaluation
  8.2 Manual evaluation
Appendix I – Use cases of benchmark testing