MarkTechPost
VibeThinker-3B: High-Performance Reasoning at 3B Parameters
VibeThinker-3B is a compact, open-source reasoning model that achieves performance comparable to far larger models on math and coding tasks by using a specialized 'Spectrum-to-Signal' post-training pipeline.
SpatialClaw: Using Code as an Action Interface for Spatial Reasoning
SpatialClaw is a training-free agent framework that improves spatial reasoning in VLMs by treating Python code—rather than structured tool calls—as the primary interface for perception and geometric tasks.
Building End-to-End Forecasting Pipelines with TimeCopilot
TimeCopilot provides a unified interface for forecasting that integrates statistical models, foundation models, anomaly detection, and LLM-driven interpretation into a single workflow.
Building Reliable AI Code Generation Pipelines with Salesforce CodeGen
To move AI-generated code from prototype to production, implement a multi-stage pipeline that includes automated unit testing, safety sandboxing, and model-based reranking to filter out hallucinated or insecure outputs.
Liquid AI's New 350M Multilingual Retrieval Models
Liquid AI has released LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M, two efficient, bidirectional retrieval models optimized for multilingual search across 11 languages.
Perplexity Brain: Self-Improving Memory for AI Agents
Perplexity's 'Brain' system shifts AI memory from user-centric profiles to agent-centric performance, using an overnight context graph to learn from past tasks, failures, and corrections to improve future efficiency.
Vercel's Eve: A Filesystem-First Framework for AI Agents
Vercel has released Eve, an open-source framework that treats AI agents as directories of files, mapping specific capabilities like tools, skills, and schedules to file paths to eliminate boilerplate and production plumbing.
The KV Cache Compression Race: TurboQuant vs OSCAR vs EpiCache
KV cache compression is the new frontier for scaling LLM inference, with TurboQuant, OSCAR, and EpiCache offering distinct strategies to balance memory footprint against model accuracy.
Qwen-RobotSuite: Three Foundation Models for Embodied AI
The Qwen team has released a suite of three specialized foundation models—RobotManip, RobotWorld, and RobotNav—designed to address data fragmentation in robotics through unified action representations, language-conditioned world modeling, and scalable navigation interfaces.
Building Memory-Efficient Transformers with xFormers
xFormers provides specialized kernels that avoid materializing large attention matrices, enabling linear memory scaling and efficient handling of variable-length sequences, GQA, and custom positional biases.
OpenAI's Deployment Simulation for Agentic Coding Risk Assessment
OpenAI has introduced a deployment simulation framework that uses simulated tool calls to evaluate the safety and reliability of agentic coding systems before they are deployed in real-world environments.
MiniMax Sparse Attention: Scaling Long Context with Block-Sparsity
MiniMax Sparse Attention (MSA) reduces the quadratic cost of long-context attention by using a two-branch, block-sparse approach that selects key-value blocks via a learned indexer, maintaining performance while fixing compute costs at O(kBk).
Sakana Marlin: Autonomous Enterprise Research via AB-MCTS
Sakana AI's Marlin is an enterprise research agent that uses Adaptive Branching Monte Carlo Tree Search (AB-MCTS) to autonomously generate 60–100 page research reports over 8-hour sessions.
Building Layout-Aware Parsing Pipelines with Docling Parse
Docling Parse enables fine-grained PDF extraction by providing character, word, and line-level coordinates, allowing developers to reconstruct document structure for advanced RAG and AI applications.
Standardizing AI Context with the Open Knowledge Format (OKF)
Google Cloud's Open Knowledge Format (OKF) provides a vendor-neutral, markdown-based specification for organizing internal knowledge, enabling AI agents to consume curated, portable context without proprietary APIs.
Atoms: Moving Beyond Code Generation to Full-Lifecycle AI Agents
Atoms shifts the 'vibe coding' paradigm from simple code generation to a multi-agent system that handles the entire product lifecycle, including research, development, deployment, and marketing.
Hermes Agent Enables Non-Blocking Asynchronous Subagents
Nous Research updated the Hermes Agent to support asynchronous subagent delegation, allowing parent agents to continue working while child agents execute tasks in the background.
Flash-KMeans: Accelerating Exact Clustering on GPUs
Flash-KMeans optimizes Lloyd's k-means algorithm for GPUs by restructuring dataflow to eliminate HBM bottlenecks, achieving up to 200x speedups over FAISS without sacrificing mathematical accuracy.
Hands-On Guide to FineWeb Corpus Processing and Analytics
Learn to stream, filter, deduplicate, and analyze large-scale web datasets like FineWeb using Python, MinHash, and tiktoken to prepare high-quality data for LLM training.
Z.ai Releases GLM-5.2 with 1M-Token Context for Coding Agents
Z.ai's new GLM-5.2 model introduces a 1M-token context window and variable 'thinking-effort' levels, enabling coding agents to process entire mid-sized repositories without needing constant summarization.
Building a QwenPaw Agent Workspace in Google Colab
A practical guide to deploying QwenPaw in Google Colab, featuring automated model provider configuration, custom skill development, and streaming API integration for agentic workflows.
Omnigent: A Meta-Harness for Composing and Governing AI Agents
Omnigent is an open-source meta-harness that standardizes the interface for diverse AI agents, enabling developers to compose, govern, and share agent sessions across terminal, web, and mobile environments.
Google's Gemini-SQL2 Sets New BIRD Benchmark Record
Google's Gemini-SQL2, powered by Gemini 3.1 Pro, achieved 80.04% execution accuracy on the BIRD text-to-SQL benchmark, outperforming all other single-model entries.
Spatial Graph Neural Networks for Urban Function Inference
A practical pipeline for urban function inference using city2graph, OSMnx, and PyTorch Geometric to classify POIs based on spatial relationships and graph topology.
Moonshot AI Releases Kimi K2.7-Code: Agentic Coding Model
Moonshot AI's new K2.7-Code model improves coding benchmarks by up to 31.5% over its predecessor while reducing reasoning-token usage by 30%, optimizing both performance and cost for long-horizon software engineering tasks.
xAI Launches Grok Build Plugin Marketplace for Terminal Agents
xAI has introduced a plugin marketplace for its Grok Build terminal agent, allowing developers to bundle skills, commands, and MCP/LSP configurations into installable packages with SHA-pinning for security.
Building 3D Medical Segmentation Pipelines with MONAI
This tutorial demonstrates an end-to-end 3D spleen segmentation pipeline using MONAI and a 3D UNet, covering data preprocessing, patch-based training, and sliding-window inference.
Zamba2-VL: Hybrid Mamba2-Transformer Vision-Language Models
Zyphra's Zamba2-VL models use a hybrid Mamba2-Transformer architecture to achieve near-linear time prefill and significantly lower time-to-first-token compared to dense Transformer-based VLMs.
Perplexity Integrates Deep Research into 'Computer' Orchestration
Perplexity has moved its Deep Research feature into 'Computer,' a multi-model orchestration system that breaks complex queries into subtasks and routes them across 20+ frontier models to generate reports, decks, and dashboards.
Moonshot AI Launches Kimi Work: A Local Desktop Agent
Kimi Work is a local desktop AI agent that automates tasks by accessing local files and your browser, powered by the Kimi K2.6 model and a 300-sub-agent swarm.