Mohd Ibrahim Afridi

Independent AI/ML Engineer & Entrepreneur

Verifiable AI, Evaluation & Safety, Human-Data Pipelines, Production Infra

I build evidence-bound AI systems that don't just answer — they show their work. My portfolio includes retrieval-augmented answering with verification, prompt contracts with CI gates, intent resolution middleware, and freshness-aware routers. I also publish research on scalable, safety-aware architectures (Grok-3 / Grok-3+) and prototype human-feedback safety tooling.

About Me

Building the future of AI with safety and verification at its core

I run small businesses to stay sharp on ops and execution: XCL3NT (AI-aware commerce) and Velroy Eagle Ventures (trade/logistics). I'm happiest when code ships with dashboards, tests, and rollback buttons.

Available For:

Remote roles in model behavior, evaluation, safety, and evidence-bound AI. Also open to select consulting and collaborations.

Hyderabad, India (Remote-ready; open to relocate)
+91-81217-66711
mohdibrahimaiml@outlook.com

Skills Snapshot

Comprehensive technical expertise across the AI/ML stack

AI/ML

PyTorch, Transformers, Sparse MoE, FP8/BF16, RL/DQN, RAG, embeddings

Reasoning & Verification

Evidence graphs, entailment/NLI, Z3/SMT checks

Eval & Safety

H/T/H, citation precision/coverage, refusal & tone, CoT verification

Backend & Infra

FastAPI/Flask, Docker, Kubernetes/Helm, Prometheus/Grafana, GitHub Actions

Frontend

React/Next.js, Tailwind, Streamlit/Gradio

Data & DB

PostgreSQL, Pandas, FAISS/BM25

Practices

Contracts in CI, canary/shadow deploys, observability-first design

Research Publications

Advancing the field of AI with published research and innovations

Grok-3: Architecture Beyond GPT-4

DOI: 10.5281/zenodo.15227014

Focus: Scalable architecture using Sparse MoE, FP8-optimized inference, memory efficiency, and hooks for formal safety (Z3/Lean).

Takeaway for industry: Better throughput/latency trade-offs with verifiability in mind; practical deployment considerations.

Grok-3+: Scalable, Safe & Energy-Optimized Architecture

DOI: 10.5281/zenodo.15341810

Focus: Extends Grok-3 with energy-aware deployment, hybrid precision (FP8/BF16), adaptive expert routing, and inline symbolic safety checks.

Takeaway: Deployment-first LLM design for edge/robotics and cost-sensitive workloads.

Dynamic Chains of Thought (D-CoT Reward Models) — Preprint

Zenodo Record

Focus: Reward modeling that scores both process and outcome, using critiques + verifier filtering to improve reasoning quality and reduce latency.

Flagship Projects

Production-ready AI systems showcasing cutting-edge research implementation

Evidence-Bound Answering System

Retrieval → Answer → Verification

Every answer carries inline citations, and the UI surfaces supporting and contradicting evidence.

✓ FastAPI services (retriever/answer/verifier)
✓ Next.js UI
✓ Prometheus/Grafana dashboards
✓ nightly eval report (SWA/NER/CR/ECE)
✓ Docker/Helm for canary & rollback
AI Systems

Prompt Contracts + Fuzzing CI

for Answer Engines

Treat prompts like code: YAML contracts, stress packs, diffs and CI gates to block behavior regressions.

✓ Disambiguation & citation contracts
✓ local/no-key MVP
✓ TF-IDF retriever
✓ TruthLens stub
✓ example GitHub Action
Testing & CI
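
To illustrate the "prompts like code" idea, here is a minimal sketch of a contract check and a CI-style gate. It is not the project's actual code: the contract fields (`must_match`, `must_not_match`, `max_words`) and the function names are hypothetical, and the contract is shown as the plain dict a YAML file might load into.

```python
import re

# Hypothetical contract, written as the dict a YAML file might load into.
CONTRACT = {
    "name": "citation_contract",
    "must_match": [r"\[\d+\]"],           # answer must contain a citation like [1]
    "must_not_match": [r"(?i)as an ai"],  # answer must not contain this boilerplate
    "max_words": 50,
}


def check_contract(answer: str, contract: dict) -> list:
    """Return a list of violations; an empty list means the answer passes."""
    violations = []
    for pattern in contract.get("must_match", []):
        if not re.search(pattern, answer):
            violations.append(f"missing required pattern: {pattern}")
    for pattern in contract.get("must_not_match", []):
        if re.search(pattern, answer):
            violations.append(f"forbidden pattern present: {pattern}")
    max_words = contract.get("max_words")
    if max_words is not None and len(answer.split()) > max_words:
        violations.append(f"answer exceeds {max_words} words")
    return violations


def gate(answers: list, contract: dict) -> bool:
    """CI-style gate: pass only if every answer satisfies the contract."""
    return all(not check_contract(a, contract) for a in answers)
```

In a CI job, a failing `gate(...)` would translate into a non-zero exit code so that a prompt change which breaks the contract blocks the merge.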

Proof-Answers

Proof-Carrying Answers

Each answer carries a machine-checkable evidence graph; a deterministic verifier validates sources and operations.

✓ Evidence Graph Language (EGL)
✓ proof planner stubs
✓ Streamlit demo
✓ tests/evals scaffolds
Verification
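
As a rough illustration of what "machine-checkable evidence graph" can mean, the sketch below verifies a tiny graph deterministically. It does not use the project's Evidence Graph Language; the node schema (`kind`, `doc`, `quote`, `op`, `inputs`, `value`), the sample documents, and the `verify` function are all assumptions made for this example, and the graph is assumed to be acyclic.

```python
# Hypothetical corpus the verifier trusts.
SOURCES = {
    "doc_a": "The plant produced 120 units in March.",
    "doc_b": "The plant produced 150 units in April.",
}

# Hypothetical evidence graph: source nodes quote a document,
# op nodes combine the values of earlier nodes.
GRAPH = {
    "n1": {"kind": "source", "doc": "doc_a", "quote": "120 units", "value": 120},
    "n2": {"kind": "source", "doc": "doc_b", "quote": "150 units", "value": 150},
    "n3": {"kind": "op", "op": "add", "inputs": ["n1", "n2"], "value": 270},
}

# Operations the verifier knows how to recompute.
OPS = {"add": sum, "max": max}


def verify(graph: dict, sources: dict, answer_id: str) -> bool:
    """Check every node the answer depends on; assumes an acyclic graph."""

    def check(node_id: str) -> bool:
        node = graph.get(node_id)
        if node is None:
            return False
        if node["kind"] == "source":
            # The quote must appear verbatim and must contain the claimed value.
            text = sources.get(node["doc"], "")
            return node["quote"] in text and str(node["value"]) in node["quote"]
        if node["kind"] == "op":
            op = OPS.get(node["op"])
            if op is None or not all(check(i) for i in node["inputs"]):
                return False
            # Recompute the operation and compare with the claimed result.
            return op([graph[i]["value"] for i in node["inputs"]]) == node["value"]
        return False

    return check(answer_id)
```

Because the verifier recomputes each operation instead of trusting the stated value, a wrong sum or a misquoted source makes the whole answer fail.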

UIRE

Universal Intent Resolution Engine

Detect ambiguity, ask micro-clarifiers, apply policy, and return a structured intent + final prompt.

✓ FastAPI /v1/*
✓ HTML demo UI
✓ privacy (hashed IDs)
✓ rate limiting
✓ Docker/Helm, pytest CI
Intent Processing
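
The detect, clarify, apply-policy, return-intent flow can be sketched in a few lines. This is an illustrative toy, not UIRE's API: the `Intent` fields, the `resolve` function, the ambiguous-term table, and the blocked-topic list are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ambiguous terms and the micro-clarifier to ask for each.
AMBIGUOUS_TERMS = {
    "python": "Do you mean the programming language or the snake?",
    "java": "Do you mean the programming language or the island?",
}
BLOCKED_TOPICS = {"password"}  # hypothetical policy list


@dataclass
class Intent:
    query: str
    status: str  # "resolved", "needs_clarification", or "blocked"
    clarifiers: list = field(default_factory=list)
    final_prompt: str = ""


def resolve(query: str, answers: Optional[dict] = None) -> Intent:
    """Apply policy, ask for missing clarifications, else build the final prompt."""
    answers = answers or {}
    words = query.lower().replace("?", " ").split()
    if BLOCKED_TOPICS & set(words):
        return Intent(query, "blocked")
    # Ask only about ambiguous terms the user has not already clarified.
    pending = [AMBIGUOUS_TERMS[w] for w in words
               if w in AMBIGUOUS_TERMS and w not in answers]
    if pending:
        return Intent(query, "needs_clarification", clarifiers=pending)
    context = "; ".join(f"{term} = {meaning}"
                        for term, meaning in sorted(answers.items()))
    final = f"{query} (context: {context})" if context else query
    return Intent(query, "resolved", final_prompt=final)
```

A caller would loop: call `resolve`, show any clarifiers to the user, then call `resolve` again with the collected answers until the status is `resolved` or `blocked`.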

Human-Guided Parametric-vs-Retrieval

Gating System

Decide per query: retrieve & cite, compute, clarify, or abstain — optimizing truthfulness/citations vs. latency/cost.

✓ Label → train → serve → eval loop
✓ Streamlit data studio
✓ FastAPI orchestrator
Routing
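
The four-way decision above can be pictured with a small router. The project describes a label, train, serve, eval loop, so its gate is presumably learned; the sketch below swaps in hand-written rules purely for illustration. The `Route` enum, the `route` function, the vague-reference list, and the `min_score` threshold are assumptions, and `retrieval_score` stands in for whatever confidence a retriever would report.

```python
import re
from enum import Enum


class Route(Enum):
    RETRIEVE = "retrieve_and_cite"
    COMPUTE = "compute"
    CLARIFY = "clarify"
    ABSTAIN = "abstain"


# Matches plain arithmetic such as "12 * 7 + 3".
ARITHMETIC = re.compile(r"^\s*\d+(\.\d+)?(\s*[-+*/]\s*\d+(\.\d+)?)+\s*$")
# Hypothetical list of words that signal an unresolved reference.
VAGUE_REFERENCES = {"it", "this", "that", "they"}


def route(query: str, retrieval_score: float, min_score: float = 0.3) -> Route:
    """Pick one action per query using simple, ordered rules."""
    if ARITHMETIC.match(query):
        return Route.COMPUTE
    words = re.findall(r"[a-z']+", query.lower())
    # Very short queries or dangling references need a clarifying question.
    if len(words) < 3 or VAGUE_REFERENCES & set(words):
        return Route.CLARIFY
    # Retrieve only when the retriever is confident enough; otherwise abstain.
    if retrieval_score >= min_score:
        return Route.RETRIEVE
    return Route.ABSTAIN
```

Raising `min_score` trades coverage for truthfulness: more queries end in `ABSTAIN`, fewer in weakly supported answers.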

TRUTHLENS

Claim → Evidence, No Hallucination

Turn a claim into verbatim, cited evidence grouped as support / contradict / neutral.

✓ Wikipedia retriever
✓ MiniLM ranker
✓ NLI classifier (fallback heuristics)
✓ Gradio UI
✓ ready for HF Spaces
Fact Checking
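
The support / contradict / neutral grouping can be sketched without any model. The labeling rule below is a toy heuristic (token overlap plus a negation check) invented for this example; it is not the project's NLI classifier or its fallback heuristics, and the function names, the `min_overlap` threshold, and the source labels are hypothetical.

```python
import re

NEGATIONS = {"not", "no", "never"}


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a sentence."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_label(claim: str, sentence: str, min_overlap: float = 0.6) -> str:
    """Toy stand-in for an NLI model: overlap decides relevance, negation decides polarity."""
    c, s = tokens(claim), tokens(sentence)
    content = c - NEGATIONS
    if not content:
        return "neutral"
    overlap = len(content & s) / len(content)
    if overlap < min_overlap:
        return "neutral"
    # If only one side is negated, treat the sentence as contradicting the claim.
    negation_differs = bool(c & NEGATIONS) != bool(s & NEGATIONS)
    return "contradict" if negation_differs else "support"


def group_evidence(claim: str, sentences: list) -> dict:
    """Group (source, sentence) pairs verbatim under their label."""
    groups = {"support": [], "contradict": [], "neutral": []}
    for source, sentence in sentences:
        groups[toy_label(claim, sentence)].append((source, sentence))
    return groups
```

Each sentence is stored unmodified next to its source label, which mirrors the "verbatim, cited evidence" goal even though the labeling itself is deliberately simplistic.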

Additional Projects

Comprehensive portfolio of AI/ML tools and research implementations

Model-behavior-writing-pack

Designer Proof Pack

Reviewer-first pack with taste rules, multilingual transcripts, ambiguity briefs, prereg/results CSVs, dashboards, and keep/iterate/kill gates.

Evaluation

Generative-Output-Comparison-Suite

H/T/H Eval Studio

Streamlit side-by-side evaluation of Helpfulness/Truthfulness/Harmlessness with Plotly dashboards and exports.

Evaluation

Human-Feedback-Safety-Simulator

RLHF + Safety Lab

Manage harm taxonomies, generate/label outputs, train preference models, and verify CoT; Streamlit UI with simulation mode.

Safety

Language-Model-Quality-Auditor

Human-in-the-Loop Eval

Flask app for structured ratings (helpfulness, correctness, coherence, empathy/tone, safety), JSON export, Postgres support.

Evaluation

LLM-PROMPT-CRAFTER

Streamlit Prompt Studio

Manage prompts with tone presets, YAML storage, optional GPT-4o integration; works offline with static templates.

Tools

DataLoaderSpeedrun

I/O Optimizations for PyTorch

Cached datasets, async prefetch, optional direct I/O, and a C++ zero-copy path; benchmark & Docker compose.

Performance

SparkETLPipeline

Flask UI for CSV ETL

One-click CSV ETL (filter nulls, write timestamped outputs) with a live UI; stubs for Spark, Great Expectations, Prefect.

Data Pipeline

K8sMultimodalInference

GPU BLIP on Kubernetes

Flask microservice for BLIP captioning with K8s manifests (Deployment/Service/HPA), Prometheus metrics, Grafana dashboard.

Infrastructure

Python Code Refactor

Streamlit Refactoring Tool

Clean up & refactor Python (PEP8, imports, AST hooks) with before/after diffs; batch mode & export.

Tools

BreezeMind-Pro

Privacy-First Productivity OS

Tasks, calendar, bills, errands, and ROI tracking with a local AI assistant (Ollama / Llama-3). Live demo noted in README.

Productivity

Career-Path-AI

DQN Career Recommender

Flask + PyTorch DQN app recommending career paths; learns from 1–5 star user feedback; Postgres backend.

AI/ML

Grok-3 (Research + Artifacts)

Research Implementation

Paper + notebooks (FP8 vs FP16, MoE routing, token-gen benchmarks, Z3 demo) and minimal inference script.

Research

Grok-3+ (Code + Paper)

Extended Architecture

Code scaffolding + demo UI + paper PDF for the energy- and safety-optimized extension of Grok-3.

Research

Business Ventures

Entrepreneurial experiences driving operational discipline

XCL3NT

Founder & CTO

AI-aware e-commerce operations for phone accessories. Building intelligent supply chain and customer experience systems.

Velroy Eagle Ventures

Owner

Imports/exports and supply chain operations. Exploring innovative approaches to international trade and logistics optimization.

Get in Touch

Let's collaborate on the future of AI

Location

Hyderabad, India (Remote-ready; open to relocate)