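A Collaborative Approach to Aligning the Value of Artificial Intelligence Big Models

Journal of Jiangxi University of Finance and Economics, 2025, Vol. 0, Issue (4): 112-123.

Zheng Huang-jie

School of Law, Central South University, Changsha, Hunan 410083, China

Received: 2025-03-09 Revised: 2025-05-25 Online: 2025-07-25 Published: 2025-09-17
About the author: Zheng Huang-jie is a doctoral candidate at Central South University whose research focuses on artificial intelligence law. Contact: Yuppiejay@163.com.
Funding:
Major Project of the National Social Science Fund of China, "A Theoretical Interpretation and Practical Study of General Secretary Xi Jinping's Important Expositions on the Civil Code" (24&ZD123); Hunan Provincial Social Science Fund Project, "Research on the Obligation Norms of Artificial Intelligence Providers" (24YB0049)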

Abstract

Abstract: Compared with ChatGPT, the Chinese-developed large model DeepSeek has made notable innovations in model architecture, training paradigm, and inference mechanism. This suggests that large models have made a qualitative leap from a mechanical perception-and-feedback mechanism to a human-like paradigm of cognition and decision-making, which in turn gives rise to three risks: disorder in data input, loss of control in algorithm operation, and non-compliant content output. The core of value alignment is to move large models from instrumental rationality toward value rationality. Value alignment has a degree of necessity and feasibility, but it also faces technical challenges and normative difficulties, both of which stem from the absence of a sound value alignment system. A collaborative paradigm of technical regulation, ethical adjustment, and legal governance is therefore urgently needed. For technical regulation: build a data compliance framework based on the concept of risk tiering, strengthen traceable supervision across the full cycle of algorithmic decision-making, and optimize the liability-attribution framework for content governance. For ethical adjustment: improve the mechanism for distilling ethical consensus, improve the responsiveness of the ethics review system, and shape a network of responsibility among ethical actors. For legal governance: establish clearly tiered value alignment standards, clarify the nature of value alignment so that powers and responsibilities are unified, and design a dynamically evolving method for evaluating value alignment. These measures are intended to accelerate the formation of new quality productive forces and advance Chinese modernization.

Key words: large artificial intelligence models, value alignment, collaborative governance, alignment standards, alignment evaluation

CLC number: D922.16

Zheng Huang-jie. A Collaborative Approach to Aligning the Value of Artificial Intelligence Big Models[J]. Journal of Jiangxi University of Finance and Economics, 2025, 0(4): 112-123.
