Toward Data-Centric Deep Learning for Adaptive Optimization in Large-Scale Systems

Alistair Pembroke

Abstract

With the rapid expansion of large-scale data-driven systems, optimizing performance in dynamic and heterogeneous environments has become increasingly challenging. Traditional model-centric approaches emphasize architecture design and overlook the intrinsic properties of data distributions, which limits generalization and robustness. To address this issue, this paper proposes a data-centric deep learning framework that focuses on data preprocessing, feature alignment, and self-supervised optimization. By integrating distribution-aware representation learning with adaptive feedback mechanisms, the proposed framework significantly improves performance under diverse conditions. In extensive experiments, it achieves higher accuracy, robustness, and efficiency than baseline methods. The results indicate that data-centric design is a critical paradigm for next-generation intelligent systems.

How to Cite
Pembroke, A. (2024). Toward Data-Centric Deep Learning for Adaptive Optimization in Large-Scale Systems. Journal of Computer Science and Software Applications, 4(8). Retrieved from https://mfacademia.org/index.php/jcssa/article/view/266
