Few-shot font generation aims to synthesize font images with accurate content and consistent style from a limited set of reference samples. Existing font generation methods model both style and content competitively, but they remain limited in the joint global-local processing of style features and in the precise encoding of font structure, which makes it difficult to meet the strict requirements of few-shot scenarios for style detail preservation and structural accuracy. To address these issues, this paper proposes DBML-Font, a framework built upon a conditional diffusion model. Its Dual-Branch Multi-Level Feature Fusion Style Encoder (DMFF-SE) uses a parallel global-local dual-branch architecture to extract style features hierarchically. To integrate the global and local style features, the Multi-Level Feature Fusion Block (MLFF-Block) combines a dual-attention strategy with the Cross-layer Inverted Residual Mixer (CLIRM) to form a nonlinear cross-layer fusion mechanism, which strengthens the multi-level representation of global-local features. In addition, the proposed Geometric Structure Content Encoder (GeoStruct-CE) captures the intrinsic geometric characteristics of the source font, yielding efficient and precise encoding of glyph content. Experimental results show that DBML-Font consistently outperforms competing methods across multiple benchmark datasets, indicating that the dual-branch architecture and cross-layer fusion mechanism jointly enable effective modeling of global style consistency and local detail diversity.
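
To make the global-local dual-branch idea concrete, the following is a minimal NumPy sketch of one generic way to extract a global style descriptor and a set of local style descriptors from a single feature map and then fuse them with attention. It is an illustration under stated assumptions, not the DMFF-SE, MLFF-Block, or CLIRM design: the function names (`global_style`, `local_style`, `fuse`), the average-pooling branches, the patch size, and the single dot-product attention step are all hypothetical stand-ins chosen for brevity.

```python
import numpy as np


def global_style(feat):
    """Global branch (assumed): average all spatial positions, (H, W, C) -> (C,)."""
    return feat.mean(axis=(0, 1))


def local_style(feat, patch):
    """Local branch (assumed): average each non-overlapping patch, (H, W, C) -> (N, C)."""
    h, w, c = feat.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    # Axes after reshape: (patch row, row in patch, patch col, col in patch, channel).
    blocks = feat.reshape(h // patch, patch, w // patch, patch, c)
    return blocks.mean(axis=(1, 3)).reshape(-1, c)


def fuse(global_vec, local_vecs):
    """Hypothetical fusion: the global descriptor attends over the local descriptors."""
    scores = local_vecs @ global_vec / np.sqrt(global_vec.size)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    attended = weights @ local_vecs          # weighted sum of local descriptors, (C,)
    return np.concatenate([global_vec, attended]), weights


# Toy feature map standing in for an encoded reference glyph (random values).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))

g = global_style(feat)
l = local_style(feat, patch=4)
style, attn = fuse(g, l)
print(style.shape, attn.shape)
```

In this sketch the fused vector simply concatenates the global descriptor with an attention-weighted summary of the local ones; a multi-level scheme such as the one the paper describes would repeat a fusion step across several encoder layers rather than apply it once.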