Data Engineering for Large Foundation Models / Najlacnejšie knihy
Data Engineering for Large Foundation Models

Code: 52898120

by Jun Yu, Chang Wen Chen

216.55


Forthcoming
Expected 14. 12. 2026

More about Data Engineering for Large Foundation Models

Book synopsis

Data quality has become a decisive foundation for large foundation models, shaping their capability, reliability, alignment, and real-world applicability. Data Engineering for Large Foundation Models: A Handbook provides a systematic and practice-oriented guide to data engineering for foundation models. Moving beyond a narrow focus on large language models, the book covers the data lifecycle behind language models, vision-language models, multimodal understanding systems, text-to-image and text-to-video generative models, reasoning models, agentic systems, and domain-specific AI applications.

The book presents a full-stack framework for building high-quality data pipelines for foundation-model development. It covers large-scale pre-training data engineering, including data sourcing, acquisition, cleaning, deduplication, decontamination, tokenization, serialization, efficient loading, and quality evaluation. It also addresses multimodal data engineering for image-text, document, video, and audio data, as well as post-training and alignment data construction, including SFT, preference data, RLHF, Chain-of-Thought reasoning data, tool-use data, agent memory, and multi-turn interaction data.

The book further examines data-centric AI systems, including synthetic data factories, knowledge distillation, enterprise-grade RAG and multimodal RAG pipelines, online feedback loops, knowledge updating, DataOps platforms, data governance, privacy protection, federated learning, and compliance-aware data engineering. Through end-to-end projects and reproducible system designs, readers gain hands-on experience with distributed pre-training data pipelines, domain-specific SFT datasets, multimodal instruction data factories, reasoning data flywheels, agent tool-use data factories, enterprise DataOps platforms, privacy-preserving pipelines, open-source model reproduction, and text-to-video training data pipelines. Using modern tools such as Ray, Spark, Dask, Parquet, WebDataset, vector databases, DVC, MLflow, and Airflow, this handbook equips data engineers, MLOps and DataOps professionals, AI researchers, and technical product teams to build reliable, scalable, and continuously improving foundation-model systems.

Copyright ©2008-26 najlacnejsie-knihy.sk. All rights reserved.