[Preview] Southeast Asia LLM Function Call Benchmark

Matichon Maneegard

19 Apr 2024

What is function calling?

Function calling is crucial for building agent-based systems. It lets an LLM use tools, where a tool is a function (method) or an API: instead of answering only in free text, the model returns a structured request that names the function to call and the arguments to pass.

This capability makes LLM applications more reliable when they must interact with non-LLM systems.
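To make the idea concrete, here is a minimal Python sketch of the application side of function calling: the model returns a structured call, and ordinary code parses it and runs the matching function. The tool `get_weather`, the `TOOLS` registry, and the `{"name": ..., "arguments": {...}}` reply format are hypothetical choices for illustration, not the format of any particular model, API, or Gorilla-eval.

```python
import json

# Hypothetical tool: a plain Python function the LLM may ask to call.
def get_weather(city: str, unit: str = "celsius") -> str:
    # Stub implementation; a real tool would query a weather API.
    return f"25 degrees {unit} in {city}"

# Hypothetical registry mapping tool names (as exposed to the model) to callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run the matching tool.

    Assumes the model replies with {"name": ..., "arguments": {...}}; this
    reply format is an illustrative assumption, not a specific vendor schema.
    """
    call = json.loads(model_output)         # raises json.JSONDecodeError on malformed output
    func = TOOLS[call["name"]]              # raises KeyError if the model names an unknown tool
    return func(**call["arguments"])        # raises TypeError on unexpected or missing arguments

# Example: the model decided to call get_weather for Bangkok.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Bangkok"}}')
print(result)
```

Because the call arrives as structured data, a malformed reply fails loudly at parsing, lookup, or argument binding instead of passing bad text silently to a downstream system.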

Method

I used part of Gorilla-eval (https://github.com/ShishirPatil/gorilla/tree/main/eval) to evaluate all models. This evaluation measures two metrics: a functionality score and a hallucination score.

First, the functionality score (higher is better) is the percentage of responses in which the LLM replies in the right format and with the right parameters.

Second, the hallucination score (lower is better) is the percentage of responses in which the LLM replies in the right format but with incorrect parameters.

If the LLM replies in an incorrect format, the response counts toward neither score.
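The three outcomes above can be sketched as a small Python scorer. This is a simplified, hypothetical version that follows the definitions in this post, not Gorilla-eval's actual checker: here a response counts as well formatted if it parses as a JSON object with `name` and `arguments` keys, and its parameters count as right only if the whole call matches the expected one exactly. The `load_model` call and its arguments are made-up test data.

```python
import json
from dataclasses import dataclass

@dataclass
class Scores:
    functionality: float  # % right format + right parameters (higher is better)
    hallucination: float  # % right format + wrong parameters (lower is better)

def classify(response: str, expected: dict) -> str:
    """Label one response as 'correct', 'hallucination', or 'bad_format'.

    Simplified assumption: "right format" means a JSON object with
    "name" and "arguments" keys; "right parameters" means an exact match.
    """
    try:
        call = json.loads(response)
    except json.JSONDecodeError:
        return "bad_format"
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        return "bad_format"
    if call["name"] == expected["name"] and call["arguments"] == expected["arguments"]:
        return "correct"
    return "hallucination"

def score(responses: list[str], expected: list[dict]) -> Scores:
    if not responses or len(responses) != len(expected):
        raise ValueError("need one expected call per response")
    labels = [classify(r, e) for r, e in zip(responses, expected)]
    n = len(labels)
    # Badly formatted responses stay in the denominator but in neither numerator.
    return Scores(
        functionality=100 * labels.count("correct") / n,
        hallucination=100 * labels.count("hallucination") / n,
    )

# Hypothetical test case: the same expected call, three different replies.
expected_calls = [{"name": "load_model", "arguments": {"repo": "pytorch/vision", "model": "resnet18"}}] * 3
responses = [
    '{"name": "load_model", "arguments": {"repo": "pytorch/vision", "model": "resnet18"}}',   # correct
    '{"name": "load_model", "arguments": {"repo": "pytorch/vision", "model": "resnet999"}}',  # wrong parameter
    'torch.hub.load(...)',                                                                     # wrong format
]
result = score(responses, expected_calls)
print(result)
```

Because wrong-format replies are counted in neither score, a model's two scores need not add up to 100%.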

I also plan to run the berkeley-function-call-leaderboard (https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard) later.

Score

Model	Method	Task	Functionality score (higher is better)	Hallucination score (lower is better)
Typhoon-7b	get_llm_responses_retriever	torchhub	0.54%	71.51%
SeaLLM-7b-v2.5	get_llm_responses_retriever	torchhub	46.77%	23.66%
SeaLLM-7b-v2	get_llm_responses_retriever	torchhub	2.69%	0.54%
OTG-7b	get_llm_responses_retriever	torchhub	3.22%	20.43%
OTG-13b	get_llm_responses_retriever	torchhub	2.15%	12.90%
OTG-70b	get_llm_responses_retriever	torchhub	27.96%	0.27%
Mixtral-8x22b	get_llm_responses_retriever	torchhub	53.23%	19.89%
Haiku	get_llm_responses_retriever	torchhub	64.52%	20.97%
GPT-3.5-turbo-0125	get_llm_responses_retriever	torchhub	62.27%	13.98%
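If the three cases in the Method section are exhaustive and mutually exclusive (my reading of the definitions, not a number the evaluation reports directly), the share of responses counted in neither score is 100% minus both scores. A short Python sketch of that arithmetic over the table above, with models ranked by functionality score:

```python
# (functionality %, hallucination %) per model, copied from the Score table.
RESULTS: dict[str, tuple[float, float]] = {
    "Typhoon-7b": (0.54, 71.51),
    "SeaLLM-7b-v2.5": (46.77, 23.66),
    "SeaLLM-7b-v2": (2.69, 0.54),
    "OTG-7b": (3.22, 20.43),
    "OTG-13b": (2.15, 12.90),
    "OTG-70b": (27.96, 0.27),
    "Mixtral-8x22b": (53.23, 19.89),
    "Haiku": (64.52, 20.97),
    "GPT-3.5-turbo-0125": (62.27, 13.98),
}

def neither_share(functionality: float, hallucination: float) -> float:
    """Percent of responses counted in neither score.

    Assumes every response is exactly one of: right format + right
    parameters, right format + wrong parameters, or wrong format.
    """
    return round(100.0 - functionality - hallucination, 2)

def ranked() -> list[tuple[str, float, float, float]]:
    """Rows of (model, functionality, hallucination, neither), best functionality first."""
    rows = [(m, f, h, neither_share(f, h)) for m, (f, h) in RESULTS.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for model, f, h, rest in ranked():
    print(f"{model:<20} functionality {f:6.2f}%  hallucination {h:6.2f}%  neither {rest:6.2f}%")
```

Read this way, a low hallucination score is only meaningful next to the functionality score: SeaLLM-7b-v2 scores 0.54% on hallucination, but 96.77% of its responses would fall into neither category.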

What's next

I plan to run the berkeley-function-call-leaderboard and keep adding new models to the list.

I will also publish my benchmark script on my GitHub. Stay tuned!

P.S. My waiting list includes Sailor-7b, Vistral-7b, and Komodo-7b. If your model is not on the list, please contact me and send me its model card.
