PAK MING CHEUNG

Head of Quants · AI-Native HFT · Institutional Trading Systems Architect

MSc Computer Science, AI & Data Science (Merit)
London, United Kingdom
+44 7920 800830
www.PakCV.com
Philip Cheung

Professional Summary

Head of Quants at Quant Sigma (London), running institutional-grade HFT operations across Forex, CFDs, indices and cryptocurrency. MSc Computer Science (AI & Data Science, Merit) from the University of Wolverhampton, with a dissertation on LLM-Augmented High-Frequency Trading, fusing transformer-based market understanding with sub-millisecond execution.

Proven track record building and operating AI-native trading infrastructure: multi-provider LLM routing across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, Qwen, OpenRouter, NVIDIA and Cloudflare, benchmarked with TTFT p50/p95 analytics for latency-critical alpha research; production RAG systems with policy guardrails and human-review escalation for regulated decision flows; distributed ML inference pipelines serving millions of predictions daily at sub-100ms latency; transformer-based predictive models, reinforcement-learning execution, and full-stack platform delivery across AWS, GCP and Cloudflare.

Led the design, training and deployment of our in-house Qorinix LLM, a domain-tuned language model engineered for trading-grade decision support. Qorinix delivers 5x to 60x faster TTFT and response latency than mainstream frontier LLMs on comparable workloads, enabling sub-millisecond alpha annotation, guardrailed reasoning and evidence-packed signal generation fully inside the firm's own runtime.

Over more than a decade of systematic trading, scaled capital 20x+ with ~100% annualised returns through disciplined volatility-based sizing and cross-asset arbitrage. Passed the Securities & Futures Commission of Hong Kong licensing exams (Papers 1/7/8/12). Certifications include Harvard CS50x, the Google Cloud MLOps Specialization and Gemini Certified Educator. The rare engineer who builds institutional-grade HFT infrastructure and production LLM runtimes from the same keyboard.

Notable Achievements

Systematic Trading (2013 – Present)

  • Scaled trading capital 20x+ over a decade of disciplined systematic trading
  • ~100% annualised returns; controlled drawdowns via volatility-based sizing
  • Forex, CFDs, indices, crypto, derivatives · Early Bitcoin investor since 2013 (entry at around $80)

LLM Infrastructure & AI Platforms

  • Built LLM Arena multi-provider fast-inference benchmark with TTFT p50/p95 analytics
  • Shipped audit-grade AI decision runtime with policy engine, guardrails & human-review gate
  • Production LLM routing at sub-100ms TTFT, automatic fail-over & cost ledger

Technical Leadership

  • Led tech transformation from traditional to AI-augmented trading
  • Architected distributed ML inference pipelines serving 5M+ daily predictions
  • Mentored 6+ engineers · multi-cloud (AWS · GCP · Cloudflare) with finance-compliance rigour

Research & Publications

  • MSc dissertation: LLM-Augmented High-Frequency Trading, novel LLM×HFT fusion
  • 65% Sharpe-ratio improvement over baseline achieved in MSc research
  • Investment-grade quant performance reports for family-office mandates

Core Technical Competencies

LLM & Generative AI
GPT-5 · Claude · Gemini · OpenRouter · DeepSeek · Qwen · NVIDIA NIM · LangChain · RAG · Hugging Face
AI Platform Architecture
LLM Routing · Multi-Provider Fallback · Prompt Registry · Policy Engine · Guardrails · State Machines · Human-Review Gate · Audit Ledger · Evidence Pack · TTFT p50/p95
Quantitative Trading & HFT
HFT · Market Making · FIX API · iLink · CME Aurora · Equinix LD4 · Colocation · QuantConnect · MT4/MT5 · CCXT · Transformer Alpha
ML & Deep Learning
PyTorch · TensorFlow · Transformers · XGBoost · LightGBM · LSTM/GRU · RL · MLOps · A/B Testing
Programming & Frameworks
Python · C++ · TypeScript · Rust · Next.js 14/15 · React 18/19 · FastAPI · Node.js · MQL5
Data & Vector Stores
PostgreSQL · TimescaleDB · Redis · MongoDB · InfluxDB · Pinecone · Weaviate · pgvector
Cloud & DevOps
AWS · GCP · Cloudflare · Docker · Kubernetes · CI/CD · Vercel · Wrangler · OpenTelemetry
Blockchain & Billing
DeFi · Binance · Bybit · Glassnode · Solidity · Arbitrage · Stripe · Entitlements · Usage Ledger
Unified LLM × HFT Tech Stack

Professional Experience

Head of Quants · AI-Native HFT & Systems Architect

Quant Sigma, London
London, UK · December 2023 – Present

Leading quantitative research, AI-powered trading systems and execution infrastructure for institutional-grade HFT across traditional and crypto markets. Built an AI-native alpha-research and decision-runtime stack alongside the core low-latency execution platform.

Quantitative Strategy & Research
  • Multi-signal HFT models across Forex, CFDs, indices and crypto: trend following, mean reversion, volatility breakout, queue-based market making, XGBoost / LightGBM ensembles for microstructure signal blending
  • Implemented transformer-based predictive models for market-direction forecasting, delivering a material Sharpe-ratio uplift over baseline
  • Reinforcement-learning execution for adaptive position sizing and inventory control
  • Achieved ~100% annualised returns via volatility-based sizing, regime switching and scenario stress tests
Low-Latency Trading Infrastructure
  • Low-latency execution infra with millisecond performance, colocation at Equinix LD4, LD6 & AWS London region; real-time WebSocket market-data pipeline with deterministic ingestion
  • Distributed ML inference pipeline: <100ms latency for real-time signal generation
  • Event-driven order-book reconstruction with lock-free ring buffers and kernel-bypass networking; sub-microsecond jitter budget on the hot path
  • Backtesting framework: 5+ years tick-level data, transaction-cost modelling, walk-forward optimisation; real-time portfolio monitoring, volatility targeting, drawdown kill-switches
LLM & AI Platform Engineering
  • Built an internal multi-provider LLM routing layer with cost / latency / risk policy and automatic fail-over across OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, OpenRouter, NVIDIA and Cloudflare
  • Designed a prompt registry with version pinning, policy prompts and evidence-pack assembly for alpha research and decision support
  • Implemented a guardrail & rules engine with risk scoring and human-review escalation for regulated-grade decision workflows
  • Built the LLM Arena multi-provider fast-inference benchmark (TTFT p50/p95 analytics) and an audit ledger tying every AI call to workflow, policy, cost and support lineage
  • NLP sentiment engine ingesting 500K+ articles / social / alt-data daily; RAG over proprietary research and earnings corpora with grounded citation
  • MLOps discipline: model registry, evaluation suites, canary deployment, drift monitoring, per-strategy cost ledger
Qorinix LLM (In-House Frontier Model)
  • Led end-to-end design, training and deployment of the in-house Qorinix LLM, a domain-tuned language model for trading-grade decision support
  • Curated a proprietary trading-intelligence corpus (multi-billion tokens) spanning order-book transcripts, earnings calls, research notes and regulatory filings; mixture-of-experts routing with task-specialised heads for alpha reasoning, risk critique and compliance review
  • Achieved 5x to 60x faster TTFT and response latency vs mainstream frontier LLMs on comparable workloads through custom inference kernels, speculative decoding and trading-corpus fine-tuning
  • Integrated Qorinix into the firm's runtime for sub-millisecond alpha annotation, guardrailed reasoning and evidence-packed signal generation fully inside the firm's own perimeter
Python · C++ · PyTorch · TensorFlow · FastAPI · PostgreSQL · TimescaleDB · Redis · Docker · AWS · QuantConnect · MT5 · CCXT · FIX · LLM Router · RAG · LangChain · OpenTelemetry

Lead AI Engineer & Technical Architect · Quant & Algorithmic Trader

Pacific Cloud Computing Ltd.
Hong Kong & Remote UK · January 2015 – December 2024

Dual-track role. Spearheaded the firm's AI transformation (Dec 2021 – Dec 2024), establishing its AI/ML practice and intelligent systems processing millions of predictions daily. Throughout the full tenure, also ran a personal systematic-trading programme generating institutional-grade returns.

AI / ML Platform Leadership
  • Designed and deployed distributed ML inference system achieving <100ms latency for real-time predictions, serving 5M+ requests / day
  • Built comprehensive RAG system (LangChain + vector DB) reducing information retrieval time by 85% while maintaining 94% accuracy
  • Developed end-to-end MLOps pipeline with automated retraining + A/B testing, improving model performance 40% Q-over-Q
  • Implemented transformer-based sentiment analysis processing 500K+ documents daily for market intelligence
  • Established AI/ML best practices, conducted architecture reviews and mentored a team of 6 engineers
Systematic & Algorithmic Trading
  • Achieved Sharpe 1.5+ (Forex/CFDs) and 2–6 (crypto); generated 20x+ capital growth via strategic directional trades, arbitrage and volatility harvesting
  • Python-based predictive models and portfolio optimisation with rigorous volatility-based risk sizing
  • Backtesting with MT4/MT5 + Python automation + AI-driven feature discovery
  • Comprehensive blockchain / cryptocurrency expertise: DeFi protocols, yield farming, liquidity provision, cross-chain arbitrage
  • Authored investment-grade quantitative performance reports for family-office investments
Enterprise Platforms (2015 – 2021)
  • Enterprise Document Management System: multi-tenant SaaS, 100+ clients, 1M+ daily API requests, 99.9% uptime
  • Analytics Dashboard Platform: real-time visualisation, 10GB+ daily data, 60% reduction in report generation time
  • E-commerce Integration Suite: multi-platform API layer, automated inventory sync, 5+ payment-gateway integrations
Python · PyTorch · LangChain · Pinecone · FastAPI · React · Node.js · MongoDB · PostgreSQL · Docker · AWS · MT4/MT5 · Binance API · Kraken API · Glassnode

Senior Product Manager (Technical) / Web Manager

Groupon.com
Hong Kong · April 2013 – December 2014
  • Led strategic account management and coordinated with international merchants for brand positioning
  • Developed and maintained project plans, cost estimation and resource allocation
  • Oversaw inventory management, supply chain coordination, and pricing structure
  • Conducted comprehensive market analysis to support business operations

Senior Operations Manager

SoManyCall Telecom
Hong Kong · March 2008 – April 2013
  • Oversaw strategic and operational aspects within the telecommunications software sector
  • Achieved 25% annual growth rate through strategic leadership
  • Led multi-disciplinary team, implementing development programs
  • Managed entire project lifecycle for custom software solutions

Research & Selected Projects

LLM Arena, Multi-Provider Fast-Inference Benchmark · LLM Infra

Side-by-side real-time benchmark across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, Qwen, OpenRouter, NVIDIA and Cloudflare, streaming token-by-token with live comparison charts. Tracks TTFT, total time, tokens/sec with p50/p95 analytics, model-cost ledger and per-prompt leaderboard. Edge deployment on Cloudflare Pages; Workers-backed API proxy with per-user key isolation. Next.js 14 · TypeScript · streaming-SSE · D1.
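
As an illustration of the TTFT p50/p95 analytics mentioned above, here is a minimal Python sketch. It is not the LLM Arena code: the function names `measure_ttft_ms` and `ttft_percentiles` are hypothetical, and the token stream is assumed to be any Python iterator that yields tokens as they arrive.

```python
import statistics
import time


def measure_ttft_ms(token_stream):
    """Time, in milliseconds, until the stream yields its first token.

    Returns None if the stream ends without producing a token.
    """
    start = time.perf_counter()
    for _ in token_stream:
        return (time.perf_counter() - start) * 1000.0
    return None


def ttft_percentiles(samples_ms):
    """Return (p50, p95) for a list of TTFT samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99,
    # so index 49 is the median and index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]
```

Reporting p95 alongside p50 shows tail latency, which is usually what matters when a slow provider has to be routed around.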

Audit-Grade AI Decision Loop · Governance

Closed-loop flow: policy check → evidence-pack retrieval → prompt control → model router → action runtime → audit ledger → policy feedback. Every call versioned, costed, support-traceable and replayable, applied to trading research & risk sign-off. Human-in-the-loop escalation for regulated decisions, tamper-evident audit trail with content-addressed evidence packs, and per-strategy cost attribution.
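
One common way to get a tamper-evident trail with content-addressed evidence is a SHA-256 hash chain. The Python sketch below illustrates that general technique only and makes no claim about the production design: the class `AuditLedger`, the helper `content_address` and the record fields are all hypothetical.

```python
import hashlib
import json


def content_address(payload: bytes) -> str:
    """Address an evidence pack by the SHA-256 digest of its bytes."""
    return hashlib.sha256(payload).hexdigest()


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same.
    body = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditLedger:
    """Append-only ledger in which each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = _entry_hash(prev, record)
        self.entries.append({"prev": prev, "record": record, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```

Because every entry hashes its predecessor, changing an old record invalidates that entry and all later ones, so a replay can detect the edit.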

Sub-100ms Inference Mesh · ML Infra

Distributed ML inference pipeline serving millions of predictions / day at <100ms latency across signal generation, sentiment scoring and anomaly detection. Shard-aware request routing, warm-pool autoscaling, ONNX-compiled models with fp16 quantisation, and cache-through Redis tiering. Multi-region active-active deployment with health-weighted failover. FastAPI · Redis · TimescaleDB · Triton.

Real-Time Risk & P&L Dashboard · Private

Live portfolio monitoring with volatility targeting, drawdown kill-switches, per-strategy P&L attribution and cost-ledger reconciliation. Tick-level exposure dashboard across venues, automated breach alerts, scenario stress tests and VaR/ES by strategy with margin projection. Heatmaps for correlation drift and regime-change signals. FastAPI · React · TimescaleDB · WebSocket.
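
Volatility targeting and a drawdown kill-switch can be sketched in a few lines of Python. This is a generic textbook version, not the dashboard's implementation: the function names and the default values (10% target volatility, 3x leverage cap, 20% drawdown limit, 252 periods per year) are illustrative assumptions.

```python
import math
import statistics


def realized_vol(returns, periods_per_year=252):
    """Annualised sample standard deviation of per-period returns."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def target_leverage(returns, target_vol=0.10, max_leverage=3.0):
    """Scale exposure so that realised volatility matches the target, up to a cap."""
    vol = realized_vol(returns)
    if vol == 0:
        return max_leverage
    return min(max_leverage, target_vol / vol)


def drawdown_breached(equity_curve, max_drawdown=0.20):
    """True once equity falls more than max_drawdown below its running peak."""
    peak = equity_curve[0]
    for equity in equity_curve:
        peak = max(peak, equity)
        if (peak - equity) / peak > max_drawdown:
            return True
    return False
```

For example, `drawdown_breached([100, 120, 90])` returns True, because the fall from 120 to 90 is a 25% drawdown.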

Quantitative Backtesting Framework · Quant

Multi-asset backtesting engine with 5+ years tick-level data, transaction-cost modelling, slippage curves and walk-forward optimisation. Parallel parameter-sweeps across Ray workers, regime-segmented attribution, Monte-Carlo bootstrap on trade returns and overfit-aware model selection. Deterministic replays and hash-pinned data snapshots for audit. Python · QuantConnect · Ray · Parquet.
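
A Monte-Carlo bootstrap on trade returns can be illustrated with the standard library alone. The sketch below is a plain percentile bootstrap of the mean trade return. The function name `bootstrap_mean_ci` and its defaults are hypothetical, and the framework itself may resample differently, for example in blocks.

```python
import random
import statistics


def bootstrap_mean_ci(trade_returns, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean trade return.

    Trades are resampled with replacement; the seed keeps replays deterministic.
    """
    rng = random.Random(seed)
    n = len(trade_returns)
    means = sorted(
        statistics.fmean(rng.choices(trade_returns, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

If the lower bound of the interval is at or below zero, the backtest's average edge per trade is not clearly distinguishable from noise at the chosen confidence level.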

Ethereum Price Prediction Ensemble · AI/ML

LSTM · GRU · XGBoost stacked ensemble over 50+ technical indicators plus on-chain features (exchange flows, gas trends, active addresses). 78% directional accuracy on out-of-sample data; confidence-gated signal release and automatic retraining on regime-drift detection. Notebook-to-production pipeline with shadow trading prior to capital allocation.

Education

MSc Computer Science, AI & Data Science
University of Wolverhampton, UK
2023 – 2025 | Grade: Merit
Dissertation: LLM-Augmented High-Frequency Trading Strategy Development
Coursework: Deep Learning, NLP, Machine Learning, Neural Networks, Distributed Systems, MLOps

Bachelor of Business Administration
Hong Kong University of Science & Technology
1995
Marketing with Information Systems minor

Certifications

Gemini Certified Educator
Google | 2025–2028

HK Securities License (1,7,8,12)
HK SFC / HKSI | 2021

CS50x Computer Science
Harvard | 2024

MLOps Specialization
Google Cloud | 2024

Languages

English
Native proficiency
Cantonese
Native proficiency
Mandarin
Professional working proficiency