
Serverless GPU, GPU Grants and Much More 🤯

Matichon Maneegard

20 May 2025 — 3 min read

Hi everyone. Mati is here 👋👋

It has been a while since the last update (One-click deployment).

Today, I have a very big update about Float16.cloud.

Serverless GPU

Firstly, we are proud to announce our "Serverless GPU" service, powered by H100.

Key Features

Zero code changes required—say goodbye to Docker images 👋
The world's fastest cold start, under 100ms
Deployment mode for AI inference (Please see the examples)
Spot mode for AI training

The Main Differentiator Between Our Serverless GPU and Others

Design principle: we designed our Serverless GPU to be compatible with traditional server scripts, such as FastAPI server scripts.

Developers don't need to change their code to use our Serverless GPU.
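
To make "traditional server script" concrete, here is a minimal sketch of a plain HTTP inference server with no platform-specific SDK, decorators, or Dockerfile. It is an illustration only: it uses Python's standard-library http.server instead of FastAPI to stay dependency-free, and run_inference, InferenceHandler, make_server, and the /predict route are hypothetical names, not part of Float16's API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_inference(prompt: str) -> dict:
    """Placeholder for a real model call (hypothetical); it only upper-cases the prompt."""
    return {"prompt": prompt, "output": prompt.upper()}


class InferenceHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body such as {"prompt": "..."}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown route")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            prompt = str(payload["prompt"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'prompt' field")
            return
        body = json.dumps(run_inference(prompt)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep the example quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```

To try it locally, call make_server().serve_forever() and POST a JSON body to http://127.0.0.1:8000/predict. The point of the sketch is that nothing in the script refers to the hosting platform.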

Beyond zero code changes, you can also say goodbye to Docker images. Some serverless GPU platforms also offer zero code changes, but they require you to ship a Docker image to their servers, which can be cumbersome and slows down deployment. We recognized this problem and eliminated the trade-off.

Our service needs only your script and a requirements.txt file. The secret is our base environment, which comes pre-built with the necessary libraries and dependencies, such as:

Transformer Engine (for NVIDIA NeMo and mixed-precision training)
PyTorch-based NVIDIA NGC
NVIDIA Triton Inference Server, NVIDIA RAPIDS stack, NVIDIA Curator
Transformers (Hugging Face)
llama.cpp, ExLlamaV2, OpenCV

This setup provides an experience similar to using Google Colab in terms of pre-built dependencies.
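
Since the heavy libraries are already in the base environment, a requirements.txt for this kind of setup presumably only needs to list the extras. The sketch below shows one way to check that locally. The PREINSTALLED set uses assumed pip package names for some of the libraries listed above (it is not an official list), and extra_requirements is a hypothetical helper, not a Float16 tool.

```python
import re

# Assumed pip names for libraries the post lists as pre-built (not an official list).
PREINSTALLED = {
    "torch",
    "transformers",
    "transformer-engine",
    "opencv-python",
    "llama-cpp-python",
    "exllamav2",
}


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extra_requirements(requirements_text: str) -> list[str]:
    """Return the requirement lines whose package is not assumed to be pre-installed."""
    extras = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        # The package name ends at the first version specifier, extra, marker, or space.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
        if normalize(name) not in PREINSTALLED:
            extras.append(line)
    return extras
```

For example, given a file containing torch>=2.1, fastapi==0.110.0, and uvicorn[standard], the helper returns only the fastapi and uvicorn lines, which are the ones that would still need installing under these assumptions.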

Serverless GPU Examples

Examples are essential for getting started with any service. We have created a public GitHub repository to demonstrate how to use our service.

In addition to our "official" examples, we welcome "contributors" who want to demonstrate how to use our Serverless GPU. We also offer GPU Grants for "contributors" (see next section).

GPU Grants

We aim to accelerate impact within the ecosystem, benefiting both researchers and engineers.

We have decided to announce a Request for Grants (RFG) to provide GPU credits for "ANY" type of research. This also covers developers who want to create blog posts, content, benchmarks, etc.

Discord link: https://discord.gg/j2DVTMjr67

Call for Partners

We are looking for partners in 3 categories:

Consultant Agencies

How we can help:

Use cases
Reference architecture
Demo tools

Software Houses and System Integrators (SI)

How we can help:

GPU credits
Reference architecture
Developer relations

Learning Platforms

How we can help:

Matching discounts
Course outlines
Domain expertise

Please contact me directly: matichon[dot]man[at]float16[dot]cloud

Trust Center

We have already completed SOC 2 and set up our Trust Center with Vanta.

Contact Float16

Medium: Float16.cloud
Facebook: Float16.cloud
X: Float16.cloud
Discord: Float16.cloud
YouTube: Float16.cloud


