

This study explores the adaptation of Large Language Models (LLMs) for discrete classification tasks by leveraging native token probability distributions instead of traditional architectural modifications, such as custom classification heads. By mapping class labels to specific vocabulary tokens, the authors demonstrate that standard supervised fine-tuning (SFT) implicitly calibrates class probabilities. Empirical validation using a Qwen3-4B model on the AG News dataset reveals that fine-tuning naturally concentrates probability mass on target tokens, rendering explicit renormalization across the vocabulary unnecessary. The results establish a cost-effective (~$2) and scalable methodology for deploying well-calibrated classifiers within existing LLM infrastructures, maintaining compatibility with standard inference stacks while ensuring high accuracy and reliable confidence estimates.
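The core idea (map each class label to a vocabulary token, read class probabilities off the model's next-token distribution) can be sketched with toy logits. All numbers and names below are illustrative, not taken from the paper; the point is that when fine-tuning concentrates probability mass on the label tokens, renormalizing over only those tokens is nearly a no-op.

```python
import math

# Hypothetical mapping from AG News class labels to vocabulary indices.
label_tokens = {"World": 0, "Sports": 1, "Business": 2, "Sci/Tech": 3}

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy next-token logits after fine-tuning: indices 0-3 are the label
# tokens, the remaining 6 stand in for the rest of the vocabulary.
logits = [8.0, 2.0, 1.0, 0.5] + [-4.0] * 6

full = softmax(logits)  # distribution over the whole (toy) vocabulary
class_probs = {lbl: full[i] for lbl, i in label_tokens.items()}

# Mass already on the label tokens; renormalizing changes little.
mass = sum(class_probs.values())
renormed = {lbl: p / mass for lbl, p in class_probs.items()}

print(max(class_probs, key=class_probs.get))  # predicted class
print(round(mass, 4))  # close to 1.0 → explicit renormalization unneeded
```

With a real model, `logits` would come from a causal LM's final-position output; the argmax over label tokens gives the prediction, and the raw token probability serves directly as a confidence estimate.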