思维工坊 Blog | 思维工坊 AI Community

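The Technology Behind Cursor: A Complete Guide to MCP Server Architecture

March 21, 2025 · 8 min read

Partner, 思维工坊

Introduction

In the wave of AI-assisted programming, Cursor, an emerging intelligent editor, is winning over more and more developers. One of the core technologies behind its intelligent features is the MCP (Model Context Protocol) architecture. This article looks at how an MCP server works, how it is structured, and the role it plays in a modern AI programming toolchain.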

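What Is an MCP Server?

An MCP server is essentially a middle layer that connects programming tools to AI services, giving intelligent IDEs such as Cursor backend support. Through a standardized protocol, the MCP server lets the editor obtain code intelligence services, including smart code completion, in-depth code analysis, and automated refactoring suggestions.

Core Components of the MCP Architecture

The MCP ecosystem consists of three core components that work together to give developers a smooth AI-assisted programming experience:

MCP client (such as the Cursor IDE): captures the user's coding context and intent
MCP server: the middle layer that processes requests and interacts with the AI service
AI service: provides the actual reasoning and generation capability (usually a large language model)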

[Figure: MCP architecture diagram]

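Key Functions of an MCP Server

A full-featured MCP server typically provides the following core services:

1. Code Intelligence Services

Code completion: context-aware code suggestions
Code analysis: detecting potential problems and assessing code quality
Automated refactoring: suggestions for improving code structure and quality

2. Session and State Management

Maintaining long-lived client sessions
Managing user context and programming environment information
Implementing bidirectional communication

3. AI Interaction Optimization

Building optimized AI prompts
Processing and formatting AI responses
Adapting to the needs of specific programming languages and frameworks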

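Why Is an MCP Server Needed?

A natural question in our technical discussions was: why is the MCP server needed as a middle layer? Why not connect the IDE to the AI service directly over an HTTP interface?

This question goes to the heart of protocol standardization. MCP is not meant to replace HTTP (in fact, MCP messages are often carried over HTTP); it adds conventions on top of the transport that are specific to the coding-assistant scenario:

Standardized interaction: a unified message format and session management mechanism
Domain-specific functionality: standard solutions for the particular needs of code editing and AI assistance
Ecosystem compatibility: one MCP server can serve multiple compliant clients
Capability discovery and extension: clients can automatically discover what a server supports and adapt to it

This design makes MCP a natural bridge between editors and AI services, much as SMTP is for email systems.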
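The capability discovery point above can be pictured as a set intersection: the client asks for the features it knows about and keeps only those the server advertises. The following is a minimal sketch under that assumption; the `Capability` names and the `negotiate` helper are invented for illustration and are not taken from any MCP specification or SDK.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capability names, invented for this sketch.
enum Capability { COMPLETION, ANALYSIS, REFACTORING }

class CapabilityNegotiation {

    // The client keeps only the features that the server also advertises.
    static Set<Capability> negotiate(Set<Capability> clientWants, Set<Capability> serverOffers) {
        Set<Capability> usable = EnumSet.noneOf(Capability.class);
        usable.addAll(clientWants);
        usable.retainAll(serverOffers);
        return usable;
    }

    public static void main(String[] args) {
        Set<Capability> client = EnumSet.allOf(Capability.class);
        Set<Capability> server = EnumSet.of(Capability.COMPLETION, Capability.ANALYSIS);
        // The client would disable refactoring in its UI for this server.
        System.out.println(negotiate(client, server));
    }
}
```

A client written this way can talk to servers of differing maturity without version checks, which is the compatibility benefit the list describes.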

MCP Server Implementation Example

The following Spring Boot skeleton of an MCP server shows the basic component structure:

@SpringBootApplication
public class McpServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(McpServerApplication.class, args);
    }
}

// MCP message entity class
class MCPMessage {
    private String id;
    private String type;
    private String content;
    private Map<String, Object> metadata;
    // ...other fields and methods
}

// MCP service implementation
@Service
public class MCPService {
    // Core functionality such as session management and message handling
    // ...
}

// API controller
@RestController
@RequestMapping("/api/mcp")
public class MCPController {
    // HTTP endpoints for connecting and for sending/receiving messages
    // ...
}

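Workflow Between the MCP Client, Server, and AI Service

When a user is coding in Cursor and requests intelligent assistance, the complete workflow is as follows:

Cursor (the MCP client) captures the code being edited, the cursor position, and the project context
The MCP server receives this information, processes the context, and builds a prompt suited to the AI service
The AI service (such as GPT-4) receives the prompt and generates a response
The MCP server processes the AI response, formats the result, and returns it to Cursor
Cursor integrates the result into the editor UI and shows it to the user

This layered architecture lets each component focus on its own core responsibility while keeping the overall system flexible and extensible.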
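The five steps above can be sketched as a small pipeline. This is only an illustration of the division of responsibilities: `EditorContext`, `buildPrompt`, `formatResponse`, and the prompt wording are all hypothetical, and the AI service is replaced by a stub function.

```java
import java.util.function.Function;

class AssistPipeline {

    // Step 1: editor context captured by the client. Field names are illustrative.
    static final class EditorContext {
        final String code;
        final int cursorOffset;

        EditorContext(String code, int cursorOffset) {
            this.code = code;
            this.cursorOffset = cursorOffset;
        }
    }

    // Step 2: the server turns the context into a prompt.
    static String buildPrompt(EditorContext ctx) {
        String beforeCursor = ctx.code.substring(0, ctx.cursorOffset);
        return "Complete the following code:\n" + beforeCursor;
    }

    // Step 4: the server normalizes the raw model output before returning it.
    static String formatResponse(String raw) {
        return raw.trim();
    }

    // Steps 2 to 4 chained; aiService stands in for the real model call (step 3).
    static String handle(EditorContext ctx, Function<String, String> aiService) {
        return formatResponse(aiService.apply(buildPrompt(ctx)));
    }

    public static void main(String[] args) {
        EditorContext ctx = new EditorContext("int x = ", 8);
        // A stub model that always returns the same completion.
        String suggestion = handle(ctx, prompt -> "  42;  ");
        // Step 5: the client would render this suggestion in the editor.
        System.out.println(suggestion);
    }
}
```

Because the model call is passed in as a function, the same server logic can be pointed at a different AI service without touching the client.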

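The Relationship Between Cursor and the MCP Server

An interesting observation is that when Cursor detects an available MCP server, it hands requests for intelligent features to the MCP server instead of calling its built-in AI service directly. This does not mean the AI service is bypassed entirely; the responsibilities simply move:

Without an MCP server: Cursor itself communicates with the AI service, processes the context, and formats the results
With an MCP server: the MCP server takes over these responsibilities, possibly using the same AI service or an alternative

This architecture gives advanced users and enterprises more flexibility: with a custom MCP server they can extend Cursor, add proprietary features, or integrate internal toolchains.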

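Conclusion

The MCP server architecture represents an important direction for AI-assisted programming tools. Through a standardized protocol and a middle-layer design, it addresses many of the challenges of integrating intelligent IDEs with AI services. As tools like Cursor become more widespread, understanding and optimizing the MCP architecture will be key to building efficient AI programming toolchains.

Whether you want to understand the internals of these tools or plan to build your own AI-assisted development environment, the MCP architecture offers a powerful and flexible framework that is worth exploring for any developer interested in the future of AI programming.

References

This article was compiled from technical discussions about the MCP architecture and aims to help developers understand the internals of AI-assisted programming tools.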

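Building an Enterprise Model Control Platform Server with Spring Boot 2.2

March 20, 2025 · 12 min read

杨杨杨大侠

Partner, 思维工坊

With AI and machine learning developing rapidly, enterprises are placing higher demands on how AI models are managed and deployed. The Model Control Platform (MCP), an emerging architectural pattern, manages multiple AI models, offers a unified access interface, and enforces enterprise-grade security policies. This article walks through building a simple but functional MCP Server with Spring Boot 2.2.

MCP Architecture Overview

Before diving into the code, let us look at the overall MCP architecture and the role of each component:

MCP Client: internal enterprise applications and integration points, such as the Cursor editor
MCP Gateway: the gateway that enforces enterprise security policies
MCP Router: manages access to multiple model providers
MCP Server: serves models, possibly in a hybrid mode (self-hosted and third-party)
MCP Host: the private or hybrid cloud environment

In this architecture the MCP Server plays the key role: it handles the actual model loading, inference, and resource management, and is the computational core of the whole system.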

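Prerequisites

Before starting, make sure your development environment meets the following requirements:

JDK 11 or later (the examples use Map.of, which requires Java 9+, and the Dockerfile is based on Java 11)
Maven 3.6+ or Gradle 6.0+
An IDE (IntelliJ IDEA or Spring Tool Suite recommended)
Basic knowledge of Spring Boot

Project Initialization

First, create a basic Spring Boot project, either with Spring Initializr (https://start.spring.io/) or directly in your IDE.

Project Dependencies

Our MCP Server needs the following core dependencies: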

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-security</artifactId>
    </dependency>
    <dependency>
        <groupId>io.jsonwebtoken</groupId>
        <artifactId>jjwt</artifactId>
        <version>0.9.1</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- Dependencies for a specific machine learning library can be added here -->
</dependencies>

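Project Structure

The recommended project structure is as follows:

src/main/java/com/example/mcpserver/
  - McpServerApplication.java     # application entry point
  - config/                       # configuration classes
  - controller/                   # REST API controllers
  - service/                      # business logic services
  - model/                        # data models and DTOs
  - exception/                    # custom exception handling
  - util/                         # utility classes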

Core Component Implementation

1. Data Model Design

First, let us define the base class that represents an AI model:

package com.example.mcpserver.model;

import lombok.Data;
import java.util.Map;

@Data
public class AIModel {
    private String id;
    private String name;
    private String version;
    private String path;
    private boolean loaded;
    private Map<String, Object> metadata;
}

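Next, we create data transfer objects (DTOs) for inference requests and responses. Each public class goes in its own file:

// InferenceRequest.java
package com.example.mcpserver.model.dto;

import lombok.Data;
import java.util.Map;

@Data
public class InferenceRequest {
    private String modelId;
    private String prompt;
    private Map<String, Object> parameters;
}

// InferenceResponse.java (same package and imports as above)
@Data
public class InferenceResponse {
    private String modelId;
    private String result;
    private long latency; // in milliseconds
    private Map<String, Object> metadata;
}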

2. Service Layer Implementation

We start by defining the interface of the model service:

package com.example.mcpserver.service;

import com.example.mcpserver.model.AIModel;
import com.example.mcpserver.model.dto.InferenceRequest;
import com.example.mcpserver.model.dto.InferenceResponse;
import java.util.List;

public interface ModelService {
    List<AIModel> listModels();
    AIModel getModel(String modelId);
    boolean loadModel(String modelId);
    boolean unloadModel(String modelId);
    InferenceResponse infer(InferenceRequest request);
}

Then we implement this interface:

package com.example.mcpserver.service.impl;

import com.example.mcpserver.model.AIModel;
import com.example.mcpserver.model.dto.InferenceRequest;
import com.example.mcpserver.model.dto.InferenceResponse;
import com.example.mcpserver.service.ModelService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import javax.annotation.PostConstruct;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class ModelServiceImpl implements ModelService {

    @Value("${mcp.models.base-path}")
    private String basePath;

    private final Map<String, AIModel> models = new ConcurrentHashMap<>();
    
    @PostConstruct
    public void init() {
        // Initialize the model list; a real application might load it from configuration or a database
        AIModel demoModel = new AIModel();
        demoModel.setId("gpt2");
        demoModel.setName("GPT-2");
        demoModel.setVersion("1.0");
        demoModel.setPath(basePath + "/gpt2");
        demoModel.setLoaded(false);
        demoModel.setMetadata(Map.of("type", "text-generation", "parameters", 124000000));
        
        models.put(demoModel.getId(), demoModel);
    }

    @Override
    public List<AIModel> listModels() {
        return new ArrayList<>(models.values());
    }

    @Override
    public AIModel getModel(String modelId) {
        return models.get(modelId);
    }

    @Override
    public boolean loadModel(String modelId) {
        AIModel model = models.get(modelId);
        if (model != null && !model.isLoaded()) {
            // The actual model loading code would go here; simplified for this example
            model.setLoaded(true);
            return true;
        }
        return false;
    }

    @Override
    public boolean unloadModel(String modelId) {
        AIModel model = models.get(modelId);
        if (model != null && model.isLoaded()) {
            // The actual model unloading code would go here; simplified for this example
            model.setLoaded(false);
            return true;
        }
        return false;
    }

    @Override
    public InferenceResponse infer(InferenceRequest request) {
        AIModel model = models.get(request.getModelId());
        if (model == null || !model.isLoaded()) {
            throw new RuntimeException("Model not available");
        }
        
        long startTime = System.currentTimeMillis();
        
        // The actual model inference code would go here; a placeholder implementation is used
        String result = "This is a sample response from " + model.getName();
        
        long latency = System.currentTimeMillis() - startTime;
        
        InferenceResponse response = new InferenceResponse();
        response.setModelId(model.getId());
        response.setResult(result);
        response.setLatency(latency);
        response.setMetadata(Map.of("tokenCount", 10));
        
        return response;
    }
}

3. REST API Controller

Next, we create a REST API controller that exposes the service interface:

package com.example.mcpserver.controller;

import com.example.mcpserver.model.AIModel;
import com.example.mcpserver.model.dto.InferenceRequest;
import com.example.mcpserver.model.dto.InferenceResponse;
import com.example.mcpserver.service.ModelService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/v1")
public class ModelController {

    @Autowired
    private ModelService modelService;

    @GetMapping("/models")
    public ResponseEntity<List<AIModel>> listModels() {
        return ResponseEntity.ok(modelService.listModels());
    }

    @GetMapping("/models/{modelId}")
    public ResponseEntity<AIModel> getModel(@PathVariable String modelId) {
        AIModel model = modelService.getModel(modelId);
        if (model == null) {
            return ResponseEntity.notFound().build();
        }
        return ResponseEntity.ok(model);
    }

    @PostMapping("/models/{modelId}/load")
    public ResponseEntity<Map<String, Boolean>> loadModel(@PathVariable String modelId) {
        boolean success = modelService.loadModel(modelId);
        return ResponseEntity.ok(Map.of("success", success));
    }

    @PostMapping("/models/{modelId}/unload")
    public ResponseEntity<Map<String, Boolean>> unloadModel(@PathVariable String modelId) {
        boolean success = modelService.unloadModel(modelId);
        return ResponseEntity.ok(Map.of("success", success));
    }

    @PostMapping("/infer")
    public ResponseEntity<InferenceResponse> infer(@RequestBody InferenceRequest request) {
        InferenceResponse response = modelService.infer(request);
        return ResponseEntity.ok(response);
    }

    @GetMapping("/health")
    public ResponseEntity<Map<String, String>> health() {
        return ResponseEntity.ok(Map.of("status", "OK"));
    }
}

4. Security Configuration

Finally, we add a basic security configuration:

package com.example.mcpserver.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import org.springframework.security.crypto.password.PasswordEncoder;

@Configuration
@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.csrf().disable()
            .authorizeRequests()
            .antMatchers("/api/v1/health").permitAll() // the health check endpoint needs no authentication
            .anyRequest().authenticated()
            .and()
            .httpBasic() // simple implementation; use JWT or OAuth2 in production
            .and()
            .sessionManagement().sessionCreationPolicy(SessionCreationPolicy.STATELESS);
    }

    @Bean
    public PasswordEncoder passwordEncoder() {
        return new BCryptPasswordEncoder();
    }
}

5. Configuration Properties

To complete the application, we define a few properties in application.properties:

server.port=8080
spring.application.name=mcp-server

# Model configuration
mcp.models.base-path=/opt/mcp/models
mcp.models.default=gpt2

# Security configuration
mcp.security.jwt.secret=mcpSecretKey
mcp.security.jwt.expiration=86400000

# Resource limits
mcp.resources.max-concurrent-requests=10
mcp.resources.request-timeout=30000

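A Practical Case: Integrating External Models

So far we have set up a basic MCP Server framework. Next, a practical case shows how to integrate an external model library.

Integrating HuggingFace Models with DJL (Deep Java Library)

First, add the DJL dependencies: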

<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>api</artifactId>
    <version>0.19.0</version>
</dependency>
<dependency>
    <groupId>ai.djl.huggingface</groupId>
    <artifactId>tokenizers</artifactId>
    <version>0.19.0</version>
</dependency>
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-engine</artifactId>
    <version>0.19.0</version>
</dependency>

Then create a dedicated model inference service:

package com.example.mcpserver.service.impl;

import ai.djl.Device;
import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.training.util.ProgressBar;
import org.springframework.stereotype.Service;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class DJLModelService {

    // ZooModel and Criteria take two type parameters: the input and output types
    private final Map<String, ZooModel<String, String>> loadedModels = new ConcurrentHashMap<>();
    
    public boolean loadHuggingFaceModel(String modelId, String modelPath) {
        try {
            // Depending on the model, a Translator that maps String to String
            // may have to be supplied with optTranslator(...)
            Criteria<String, String> criteria = Criteria.builder()
                .setTypes(String.class, String.class)
                .optModelPath(Paths.get(modelPath))
                .optDevice(Device.cpu())
                .optProgress(new ProgressBar())
                .build();
            
            ZooModel<String, String> model = criteria.loadModel();
            // Close any model previously registered under the same id
            ZooModel<String, String> previous = loadedModels.put(modelId, model);
            if (previous != null) {
                previous.close();
            }
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }
    
    // Note: parameters is accepted for API symmetry but not used in this simplified version
    public String generateText(String modelId, String prompt, Map<String, Object> parameters) {
        ZooModel<String, String> model = loadedModels.get(modelId);
        if (model == null) {
            throw new RuntimeException("Model not loaded: " + modelId);
        }
        
        try (Predictor<String, String> predictor = model.newPredictor()) {
            return predictor.predict(prompt);
        } catch (Exception e) {
            throw new RuntimeException("Inference failed", e);
        }
    }
    
    public void unloadModel(String modelId) {
        ZooModel<String, String> model = loadedModels.remove(modelId);
        if (model != null) {
            model.close();
        }
    }
}

Next, modify our ModelServiceImpl class to integrate this DJL service:

@Service
public class ModelServiceImpl implements ModelService {

    @Autowired
    private DJLModelService djlModelService;
    
    // the rest of the code stays the same
    
    @Override
    public boolean loadModel(String modelId) {
        AIModel model = models.get(modelId);
        if (model != null && !model.isLoaded()) {
            boolean success = djlModelService.loadHuggingFaceModel(modelId, model.getPath());
            if (success) {
                model.setLoaded(true);
                return true;
            }
        }
        return false;
    }
    
    @Override
    public boolean unloadModel(String modelId) {
        AIModel model = models.get(modelId);
        if (model != null && model.isLoaded()) {
            djlModelService.unloadModel(modelId);
            model.setLoaded(false);
            return true;
        }
        return false;
    }
    
    @Override
    public InferenceResponse infer(InferenceRequest request) {
        AIModel model = models.get(request.getModelId());
        if (model == null || !model.isLoaded()) {
            throw new RuntimeException("Model not available");
        }
        
        long startTime = System.currentTimeMillis();
        
        String result = djlModelService.generateText(
            request.getModelId(),
            request.getPrompt(),
            request.getParameters()
        );
        
        long latency = System.currentTimeMillis() - startTime;
        
        InferenceResponse response = new InferenceResponse();
        response.setModelId(model.getId());
        response.setResult(result);
        response.setLatency(latency);
        response.setMetadata(Map.of("tokenCount", result.split("\\s+").length));
        
        return response;
    }
}

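Advanced Extensions

So far we have built an MCP Server with the basic functionality in place. In a real production environment, you may need to consider the following advanced features:

1. Load Balancing and Concurrency Control

For high request volumes, we need load balancing and concurrency control. The configuration below covers the concurrency side with a bounded thread pool: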

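import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.util.concurrent.Executor;

@Configuration
@EnableAsync // required; without it @Async methods run synchronously
public class AsyncConfig {

    @Value("${mcp.resources.max-concurrent-requests}")
    private int maxConcurrentRequests;
    
    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Keep at least one core thread even for very small limits
        executor.setCorePoolSize(Math.max(1, maxConcurrentRequests / 2));
        executor.setMaxPoolSize(maxConcurrentRequests);
        // Threads beyond the core size are only created once this queue is full
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("mcp-task-");
        executor.initialize();
        return executor;
    }
}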

Then use asynchronous calls in the service layer:

@Async
public CompletableFuture<InferenceResponse> inferAsync(InferenceRequest request) {
    return CompletableFuture.completedFuture(infer(request));
}
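One detail of the pool settings above is easy to miss: Spring's ThreadPoolTaskExecutor is backed by a java.util.concurrent.ThreadPoolExecutor, which adds threads beyond the core size only after the queue is full. The sketch below uses the plain JDK executor to illustrate that behavior; the `PoolSizingDemo` class, the task count, and the 20 ms sleep are assumptions made for the demonstration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingDemo {

    // Runs `tasks` short jobs and reports the largest number of threads the pool ever used.
    static int largestPoolSize(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity));
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20); // simulate a short inference call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        // Same shape as a limit of 10: core 5, max 10, queue capacity 500.
        // With 50 tasks the queue never fills, so the pool stays at the core size.
        System.out.println(largestPoolSize(5, 10, 500, 50));
    }
}
```

In practice this means that with a queue capacity of 500, the maximum pool size is only reached after 500 requests are already waiting; a smaller queue is one way to make the upper limit take effect sooner.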

2. Model Caching and Memory Management

For large models, memory management is critical:

@Service
public class ModelCacheService {

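    // The default of 2 is an assumption; mcp.cache.max-models is not among the properties defined earlier
    @Value("${mcp.cache.max-models:2}")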
    private int maxModelsInMemory;
    
    private final Map<String, Long> lastUsedTime = new ConcurrentHashMap<>();
    
    public void trackModelUsage(String modelId) {
        lastUsedTime.put(modelId, System.currentTimeMillis());
    }
    
    public String findModelToEvict() {
        if (lastUsedTime.size() <= maxModelsInMemory) {
            return null;
        }
        
        return lastUsedTime.entrySet().stream()
            .min(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse(null);
    }
}
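The least-recently-used policy that the service above approximates with timestamps can also be expressed with an access-ordered LinkedHashMap from the JDK. The following is a minimal sketch of that alternative, not the article's implementation; the `LruModelTracker` class, its `touch` method, and the model ids in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU tracker for model ids; the class name and API are hypothetical.
class LruModelTracker {
    private final int maxModels;
    private final LinkedHashMap<String, Boolean> order;
    private String lastEvicted;

    LruModelTracker(int maxModels) {
        this.maxModels = maxModels;
        // accessOrder = true moves an entry to the end each time it is touched
        this.order = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                boolean evict = size() > LruModelTracker.this.maxModels;
                if (evict) {
                    lastEvicted = eldest.getKey();
                }
                return evict;
            }
        };
    }

    // Records a use of modelId; returns the id evicted to make room, or null.
    synchronized String touch(String modelId) {
        lastEvicted = null;
        order.put(modelId, Boolean.TRUE);
        return lastEvicted;
    }

    public static void main(String[] args) {
        LruModelTracker tracker = new LruModelTracker(2);
        tracker.touch("gpt2");
        tracker.touch("bert");
        tracker.touch("gpt2"); // gpt2 becomes the most recently used
        // Adding a third model evicts the least recently used one
        System.out.println(tracker.touch("t5"));
    }
}
```

The caller would pass the returned id to the unload routine. Unlike a timestamp map, the evicted entry is removed from the tracker automatically, so the map cannot grow past the limit.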

3. Monitoring and Metrics

Use Spring Boot Actuator to add monitoring metrics:

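import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsConfig {

    // Tag every metric with the application name
    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags("application", "mcp-server");
    }
    
    // Enables @Timed on arbitrary beans; requires spring-boot-starter-aop on the classpath
    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}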

Add the metrics annotation to the service method:

@Timed(value = "model.inference", description = "Time taken for model inference")
public InferenceResponse infer(InferenceRequest request) {
    // existing code
}

Deployment and Scaling

Docker Deployment

To simplify deployment, we can create a Dockerfile:

FROM openjdk:11-jre-slim

WORKDIR /app

COPY target/mcp-server-0.0.1-SNAPSHOT.jar app.jar

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "app.jar"]

Kubernetes Deployment

For more complex deployment scenarios, Kubernetes can be used:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: mcp-server:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "1"
            memory: "2Gi"
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: "prod"

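Best Practices Summary

When building an MCP Server, the following best practices are worth keeping in mind:

Modular design: encapsulate different functions in separate services and controllers
Resource management: watch the memory consumption of models and implement suitable caching and release mechanisms
Exception handling: use a global exception handler to return uniform error responses
Security: implement appropriate authentication and authorization
Observability: integrate monitoring and logging
Performance: use asynchronous processing and concurrency control to handle high load

Conclusion

With Spring Boot 2.2 we can quickly build a functional MCP Server that gives an enterprise unified model management and inference services. The architecture and implementation described here can serve as a base to extend and customize for specific business needs.

In real development you may need to adjust for the specific model libraries and frameworks you use, but the core architecture and interface design principles carry over. As enterprise AI applications keep developing, MCP as an architectural pattern will play an increasingly important role.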


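Calling Protobuf APIs from Java with Retrofit: A Detailed Guide

March 1, 2025 · 12 min read

杨杨杨大侠

Partner, 思维工坊

As microservice architectures become widespread, efficient communication between services matters more and more. This article explains in detail how to use Retrofit together with Protocol Buffers (Protobuf) for API calls in a Java project, to achieve high-performance, low-latency service-to-service communication.

1. Technical Background

1.1 About Retrofit

Retrofit is a type-safe HTTP client developed by Square for Android and Java. It turns an HTTP API into a Java interface, which makes API calls simple and intuitive. Its main features include:

Declarative API definitions
Pluggable serialization
Synchronous and asynchronous request handling
Request interception and customization
Good extensibility

1.2 About Protobuf

Protocol Buffers (Protobuf for short) is a language-neutral, platform-neutral, extensible mechanism developed by Google for serializing structured data. Compared with JSON and XML, Protobuf has the following advantages:

Compact encoding, so serialized payloads are small
Fast serialization and deserialization
Forward and backward compatibility
Generated code, which reduces boilerplate
Support for many programming languages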
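To make the size advantage concrete, here is a minimal hand-written sketch of how Protobuf encodes a single integer field on the wire: a tag (field number shifted left by 3, combined with wire type 0) followed by a base-128 varint. Real projects rely on the generated classes for this; the `VarintDemo` class, the field number 1, and the JSON string used for comparison are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class VarintDemo {

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Encodes one varint-typed field (wire type 0): the tag followed by the value.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tag = encodeVarint(((long) fieldNumber << 3) | 0);
        byte[] val = encodeVarint(value);
        out.write(tag, 0, tag.length);
        out.write(val, 0, val.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A message with a single integer field numbered 1 and value 123
        int protoSize = encodeVarintField(1, 123).length;
        // A comparable JSON document
        int jsonSize = "{\"user_id\":123}".getBytes(StandardCharsets.UTF_8).length;
        System.out.println(protoSize + " bytes (Protobuf) vs " + jsonSize + " bytes (JSON)");
    }
}
```

The field name never appears in the Protobuf payload, only its number, which is where most of the saving comes from and also why field numbers must stay stable across versions.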

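1.3 Why Combine Retrofit and Protobuf?

Using Retrofit together with Protobuf gives you the strengths of both:

Retrofit provides a concise way to call APIs
Protobuf provides an efficient data transfer format
Together they can noticeably improve API call performance, especially with large payloads and frequent calls

2. Environment Setup

2.1 Project Dependencies

Add the following dependencies to a Maven project: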

<!-- Retrofit core library -->
<dependency>
    <groupId>com.squareup.retrofit2</groupId>
    <artifactId>retrofit</artifactId>
    <version>2.9.0</version>
</dependency>

<!-- Protobuf dependency -->
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.19.4</version>
</dependency>

<!-- Retrofit Protobuf converter -->
<dependency>
    <groupId>com.squareup.retrofit2</groupId>
    <artifactId>converter-protobuf</artifactId>
    <version>2.9.0</version>
</dependency>

<!-- OkHttp client -->
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.9.3</version>
</dependency>

<!-- Logging interceptor -->
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>logging-interceptor</artifactId>
    <version>4.9.3</version>
</dependency>

For a Gradle project, add the following dependencies:

implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.google.protobuf:protobuf-java:3.19.4'
implementation 'com.squareup.retrofit2:converter-protobuf:2.9.0'
implementation 'com.squareup.okhttp3:okhttp:4.9.3'
implementation 'com.squareup.okhttp3:logging-interceptor:4.9.3'

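2.2 Installing the Protobuf Compiler

To use Protobuf you need the protoc compiler, which compiles .proto files into Java classes.

macOS:

brew install protobuf

Linux (Debian/Ubuntu):

apt-get install protobuf-compiler

Windows: download a prebuilt binary from the GitHub releases page.

2.3 Maven/Gradle Plugin Configuration

To automate Protobuf compilation, configure a Maven or Gradle plugin:

Maven configuration: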

<plugin>
    <groupId>org.xolstice.maven.plugins</groupId>
    <artifactId>protobuf-maven-plugin</artifactId>
    <version>0.6.1</version>
    <configuration>
        <protocExecutable>/usr/local/bin/protoc</protocExecutable>
        <protoSourceRoot>${project.basedir}/src/main/proto</protoSourceRoot>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Gradle configuration:

plugins {
    id 'com.google.protobuf' version '0.8.18'
}

protobuf {
    protoc {
        artifact = 'com.google.protobuf:protoc:3.19.4'
    }
    generateProtoTasks {
        all().each { task ->
            task.builtins {
                java {}
            }
        }
    }
}

3. Defining Protobuf Messages

3.1 Creating the .proto File

Create a user.proto file in the src/main/proto directory:

syntax = "proto3";

package com.example.proto;

option java_package = "com.example.proto";
option java_multiple_files = true;

// User request message
message UserRequest {
    int32 user_id = 1;
}

// User detail response message
message UserResponse {
    int32 user_id = 1;
    string username = 2;
    string email = 3;
    
    enum UserStatus {
        UNKNOWN = 0;
        ACTIVE = 1;
        INACTIVE = 2;
        SUSPENDED = 3;
    }
    
    UserStatus status = 4;
    repeated string roles = 5;
    UserProfile profile = 6;
}

// User profile sub-message
message UserProfile {
    string full_name = 1;
    string avatar_url = 2;
    string bio = 3;
    int64 created_at = 4; // Unix timestamp
}

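3.2 Compiling the Protobuf File

Run the Maven command to compile the proto file:

mvn protobuf:compile

Or the Gradle command:

./gradlew generateProto

After compilation, the corresponding Java classes are generated under target/generated-sources/protobuf/java (Maven) or build/generated/source/proto/main/java (Gradle).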

4. Configuring Retrofit

4.1 Creating the Retrofit Interface

package com.example.api;

import com.example.proto.UserRequest;
import com.example.proto.UserResponse;
import retrofit2.Call;
import retrofit2.http.Body;
import retrofit2.http.GET;
import retrofit2.http.POST;
import retrofit2.http.Path;

public interface UserApiService {
    
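    // Send a Protobuf request body with POST
    @POST("users")
    Call<UserResponse> createUser(@Body UserRequest request);
    
    // Fetch a single user
    @GET("users/{userId}")
    Call<UserResponse> getUser(@Path("userId") int userId);
    
    // Batch fetch; as defined, UserRequest carries one user_id, so a real batch
    // endpoint would likely need a message with a repeated id field
    @POST("users/batch")
    Call<UserResponse> batchGetUsers(@Body UserRequest request);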
}

4.2 Configuring the Retrofit Client

package com.example.config;

import com.example.api.UserApiService;
import okhttp3.OkHttpClient;
import okhttp3.logging.HttpLoggingInterceptor;
import retrofit2.Retrofit;
import retrofit2.converter.protobuf.ProtoConverterFactory;

import java.util.concurrent.TimeUnit;

public class RetrofitConfig {

    private static final String BASE_URL = "https://api.example.com/v1/";
    
    public static UserApiService createUserApiService() {
        // Configure the OkHttp client
        OkHttpClient client = createOkHttpClient();
        
        // Create the Retrofit instance with ProtoConverterFactory
        Retrofit retrofit = new Retrofit.Builder()
                .baseUrl(BASE_URL)
                .client(client)
                .addConverterFactory(ProtoConverterFactory.create())
                .build();
        
        // Create the API service interface
        return retrofit.create(UserApiService.class);
    }
    
    private static OkHttpClient createOkHttpClient() {
        // Create the logging interceptor (BODY level logs raw binary payloads; lower it in production)
        HttpLoggingInterceptor loggingInterceptor = new HttpLoggingInterceptor();
        loggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY);
        
        // Configure the OkHttp client
        return new OkHttpClient.Builder()
                .addInterceptor(loggingInterceptor)
                .connectTimeout(15, TimeUnit.SECONDS)
                .readTimeout(15, TimeUnit.SECONDS)
                .writeTimeout(15, TimeUnit.SECONDS)
                .build();
    }
}

4.3 Custom Protobuf Request Headers

The server needs to know that the request body is in Protobuf format. An interceptor can add the corresponding Content-Type header:

import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import java.io.IOException;

public class ProtobufRequestInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Request originalRequest = chain.request();
        
        // Add the Protobuf-specific Content-Type and Accept headers
        Request newRequest = originalRequest.newBuilder()
                .header("Content-Type", "application/x-protobuf")
                .header("Accept", "application/x-protobuf")
                .build();
        
        return chain.proceed(newRequest);
    }
}

Then add this interceptor to the OkHttpClient:

.addInterceptor(new ProtobufRequestInterceptor())

5. Calling Protobuf APIs with Retrofit

5.1 Synchronous Call Example

package com.example;

import com.example.api.UserApiService;
import com.example.config.RetrofitConfig;
import com.example.proto.UserRequest;
import com.example.proto.UserResponse;
import retrofit2.Call;
import retrofit2.Response;

import java.io.IOException;

public class UserApiClient {

    public static void main(String[] args) {
        // Create the API service
        UserApiService userApiService = RetrofitConfig.createUserApiService();
        
        try {
            // Build the request
            UserRequest request = UserRequest.newBuilder()
                    .setUserId(123)
                    .build();
            
            // Make the synchronous call
            Call<UserResponse> call = userApiService.getUser(request.getUserId());
            Response<UserResponse> response = call.execute();
            
            if (response.isSuccessful() && response.body() != null) {
                UserResponse userResponse = response.body();
                System.out.println("User ID: " + userResponse.getUserId());
                System.out.println("Username: " + userResponse.getUsername());
                System.out.println("Email: " + userResponse.getEmail());
                System.out.println("Status: " + userResponse.getStatus());
                System.out.println("Roles: " + String.join(", ", userResponse.getRolesList()));
                
                // Access the nested message
                if (userResponse.hasProfile()) {
                    System.out.println("Full name: " + userResponse.getProfile().getFullName());
                    System.out.println("Bio: " + userResponse.getProfile().getBio());
                }
            } else {
                System.err.println("API call failed: " + response.code());
                if (response.errorBody() != null) {
                    System.err.println(response.errorBody().string());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

5.2 Asynchronous Call Example

// Asynchronous call
Call<UserResponse> call = userApiService.getUser(userId);
call.enqueue(new Callback<UserResponse>() {
    @Override
    public void onResponse(Call<UserResponse> call, Response<UserResponse> response) {
        if (response.isSuccessful() && response.body() != null) {
            UserResponse userResponse = response.body();
            // Handle the response...
        } else {
            // Handle the error...
        }
    }
    
    @Override
    public void onFailure(Call<UserResponse> call, Throwable t) {
        // Handle network errors
        t.printStackTrace();
    }
});

5.3 Batch Request Example

// Build the batch request
UserRequest batchRequest = UserRequest.newBuilder()
        .setUserId(123) // set more fields as needed
        .build();

// Make the batch request call
Call<UserResponse> batchCall = userApiService.batchGetUsers(batchRequest);
batchCall.enqueue(new Callback<UserResponse>() {
    @Override
    public void onResponse(Call<UserResponse> call, Response<UserResponse> response) {
        if (response.isSuccessful() && response.body() != null) {
            // Handle the batch response
            UserResponse batchResponse = response.body();
            // Further processing...
        }
    }
    
    @Override
    public void onFailure(Call<UserResponse> call, Throwable t) {
        t.printStackTrace();
    }
});

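6. Performance Optimization and Best Practices

6.1 Connection Pool Tuning

// Configure the connection pool: up to 5 idle connections, kept alive for 30 seconds
ConnectionPool connectionPool = new ConnectionPool(5, 30, TimeUnit.SECONDS);

// Add the connection pool to the OkHttpClient
OkHttpClient client = new OkHttpClient.Builder()
        .connectionPool(connectionPool)
        // other settings...
        .build();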

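6.2 Request Compression

// Add the GZIP interceptor
OkHttpClient client = new OkHttpClient.Builder()
        .addInterceptor(new GzipRequestInterceptor())
        // other settings...
        .build();

// GZIP interceptor implementation
// Requires: okhttp3.{Interceptor, MediaType, Request, RequestBody, Response},
//           okio.{BufferedSink, GzipSink, Okio}, java.io.IOException
public class GzipRequestInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Request originalRequest = chain.request();
        RequestBody originalBody = originalRequest.body();
        
        // Compress only POST and PUT requests that have a body and are not already encoded
        boolean hasBodyMethod = originalRequest.method().equals("POST")
                || originalRequest.method().equals("PUT");
        if (!hasBodyMethod || originalBody == null
                || originalRequest.header("Content-Encoding") != null) {
            return chain.proceed(originalRequest);
        }
        
        // Wrap the body so that it is gzip-compressed as it is written
        RequestBody compressedBody = new RequestBody() {
            @Override
            public MediaType contentType() {
                return originalBody.contentType();
            }
            
            @Override
            public long contentLength() {
                return -1; // length after compression is not known in advance
            }
            
            @Override
            public void writeTo(BufferedSink sink) throws IOException {
                BufferedSink gzipSink = Okio.buffer(new GzipSink(sink));
                originalBody.writeTo(gzipSink);
                gzipSink.close();
            }
        };
        
        Request compressedRequest = originalRequest.newBuilder()
                .header("Content-Encoding", "gzip")
                .method(originalRequest.method(), compressedBody)
                .build();
        
        // Send the compressed request; the server must accept gzip-encoded request bodies
        return chain.proceed(compressedRequest);
    }
}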
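Compression does not always pay off: a gzip stream carries a fixed header and trailer, so very small payloads (and Protobuf messages are often small) can grow when compressed. The following JDK-only sketch shows one way to check this before deciding on a size threshold; the `GzipSizeCheck` class and its `worthCompressing` helper are hypothetical and are not part of OkHttp or Retrofit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSizeCheck {

    // Compresses a byte array with gzip.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompresses a gzip byte array.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True only when compression actually makes the payload smaller.
    static boolean worthCompressing(byte[] payload) {
        return gzip(payload).length < payload.length;
    }

    public static void main(String[] args) {
        byte[] tiny = {0x08, 0x7B};          // a two-byte Protobuf message
        byte[] large = new byte[10_000];     // a highly repetitive 10 KB payload
        System.out.println("tiny: " + worthCompressing(tiny));
        System.out.println("large: " + worthCompressing(large));
    }
}
```

A request interceptor could apply the same idea by skipping compression for bodies below a chosen size, with the threshold determined by measurement on real traffic.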

6.3 错误处理与重试

// Register the retry interceptor
OkHttpClient client = new OkHttpClient.Builder()
        .addInterceptor(new RetryInterceptor(3)) // retry up to 3 times
        // other configuration...
        .build();

// Retry interceptor: retries on network errors and server-side failures
public class RetryInterceptor implements Interceptor {
    private final int maxRetries;

    public RetryInterceptor(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();

        // One initial attempt plus up to maxRetries retries
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                Response response = chain.proceed(request);

                // Success or client error (4xx): do not retry.
                // On the final attempt, return the response unclosed so the caller can read it.
                boolean clientError = response.code() >= 400 && response.code() < 500;
                if (response.isSuccessful() || clientError || attempt == maxRetries) {
                    return response;
                }

                // Server error: release the connection before retrying
                response.close();
            } catch (IOException e) {
                // Network failure: rethrow once the retry budget is spent
                if (attempt == maxRetries) {
                    throw e;
                }
            }

            // Exponential backoff: 2 s, 4 s, 8 s, ...
            try {
                Thread.sleep(1000L << (attempt + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new java.io.InterruptedIOException("Interrupted while waiting to retry");
            }
        }

        // Only reachable if maxRetries is negative
        throw new IOException("No request attempt was made");
    }
}
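
The backoff schedule used by the interceptor can be isolated into a small helper, which makes it easier to test and to cap. The following is a minimal sketch under stated assumptions: the class name `Backoff`, the method `delayMillis`, and the 30-second cap are hypothetical and not part of the original code. Only the doubling rule (2 s, 4 s, 8 s for a 1 s base) mirrors the interceptor above.

```java
final class Backoff {
    /**
     * Delay before retry number {@code retry} (1-based): baseMillis * 2^retry,
     * capped at maxMillis. The cap is an illustrative addition.
     */
    static long delayMillis(int retry, long baseMillis, long maxMillis) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        int shift = Math.min(retry, 30);
        // If the doubled value would exceed the cap (or overflow), return the cap
        if (baseMillis > (maxMillis >> shift)) {
            return maxMillis;
        }
        return baseMillis << shift;
    }

    public static void main(String[] args) {
        // Print the schedule for the first five retries with a 1 s base and a 30 s cap
        for (int retry = 1; retry <= 5; retry++) {
            System.out.println("retry " + retry + ": " + delayMillis(retry, 1000, 30_000) + " ms");
        }
    }
}
```

Capping the delay keeps a long retry chain from stalling a caller for minutes. Production code often also adds random jitter so that many clients do not retry in lockstep.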

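7. Summary and Best Practices

7.1 Advantages of Retrofit + Protobuf

Performance: Protobuf serializes and deserializes quickly and produces compact payloads, reducing transfer time and bandwidth.
Type safety: Retrofit provides type-safe API calls, and Protobuf provides a type-safe data model.
Code generation: less hand-written serialization code means fewer mistakes.
Forward compatibility: Protobuf is designed for schema evolution, which makes API versioning easier.
Multi-language support: servers can be written in different languages as long as they follow the same .proto definitions.

7.2 When to Use It

High-performance microservice communication: scenarios with strict latency and throughput requirements.
Mobile app APIs: smaller payloads save data and battery.
Large-scale distributed systems: scenarios that must handle a high volume of API calls efficiently.
Cross-language service integration: services written in different languages that need efficient communication.

7.3 Caveats

Learning curve: Protobuf takes more effort to learn than JSON.
Debugging: the binary format is less readable than JSON and needs dedicated tools to inspect.
Flexibility: fields must be defined in advance, so it is less dynamic than JSON.
Toolchain dependency: you need the protoc compiler and a configured build plugin.

7.4 Recommended Practices

Design .proto files carefully: follow Protobuf best practices and manage field numbers with care.
Versioning: use package and option declarations to manage API versions.
Connection pooling: configure the OkHttp connection pool sensibly to avoid creating connections repeatedly.
Error handling: implement thorough error handling and retries.
Monitoring and logging: add appropriate logging and performance monitoring.
Compressed transfer: consider GZIP compression for large requests and responses.

Combining the strengths of Retrofit and Protobuf makes it possible to build efficient, reliable communication between Java microservices that meets the performance needs of modern distributed applications.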

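Hardware Requirements for Running Large Models Locally

February 17, 2025 · 5 min read

杨杨杨大侠

Partner at 思维工坊

Preface

As AI advances rapidly, more and more people want to deploy and run large language models (LLMs) locally. This article walks through the hardware needed to run them and gives simple rules of thumb, so that newcomers can quickly judge whether their machine is up to the task.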

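Basic Hardware Requirements

1. GPU Requirements

The GPU is the most important piece of hardware for running large models. Two metrics matter most:

VRAM (video memory) size
Compute capability (number of CUDA cores)

Rule of thumb:

A 7B-parameter model needs about 14GB of VRAM (basic formula: parameters in billions × 2 = VRAM in GB, because FP16 weights take 2 bytes per parameter; this covers the weights only, so leave headroom for the context cache)
A 13B-parameter model needs about 26GB of VRAM
A 33B-parameter model needs about 66GB of VRAM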
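
The rule of thumb above is simple arithmetic, and a small helper makes it easy to try other model sizes or precisions. The following is a minimal sketch: the class `VramEstimator` and its method are hypothetical names, and the estimate covers model weights only, using billions of parameters × bits per parameter ÷ 8.

```java
final class VramEstimator {
    /**
     * Weights-only VRAM estimate in GB.
     * billionParams: model size in billions of parameters (7 for a 7B model)
     * bitsPerParam:  16 for FP16, 8 for INT8, 4 for INT4
     */
    static double estimateGb(double billionParams, int bitsPerParam) {
        return billionParams * bitsPerParam / 8.0;
    }

    public static void main(String[] args) {
        double[] sizes = {7, 13, 33};
        for (double size : sizes) {
            System.out.printf("%.0fB: FP16 %.1f GB, INT8 %.1f GB, INT4 %.1f GB%n",
                    size, estimateGb(size, 16), estimateGb(size, 8), estimateGb(size, 4));
        }
    }
}
```

At 16 bits per parameter the helper reproduces the 14GB, 26GB, and 66GB figures listed above. Real usage is somewhat higher because the runtime also needs memory for the context and intermediate buffers.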

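Recommended GPUs:

Entry level: NVIDIA RTX 3060 12GB
Mid-range: NVIDIA RTX 4090 24GB
Professional: NVIDIA A100 80GB

2. RAM Requirements

Suggested RAM:

Minimum: 32GB
Recommended: 64GB
Ideal: 128GB

Rule of thumb:

System RAM should be at least 20% larger than the VRAM the model needs
For example, a 7B model needs about 14GB of VRAM, so plan for at least 17GB of RAM

3. CPU Requirements

Suggested processors:

Minimum: 8 cores
Recommended: 16 cores
Ideal: 32 cores

4. Storage

Disk space:

System disk: 256GB SSD
Model storage: at least 1TB (SSD recommended)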

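Quick Checks for Beginners

Check the GPU:

# Windows
- Right-click the desktop
- Select NVIDIA Control Panel
- Help -> System Information
- Check the amount of video memory

# Linux
nvidia-smi

Check the RAM:

# Windows
- Task Manager -> Performance

# Linux
free -h

Check the CPU:

# Windows
- Task Manager -> Performance

# Linux
lscpu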
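
The CPU and RAM checks can also be done from code. The following is a minimal sketch, not a definitive tool: the class and method names are hypothetical, the tiers mirror the core and RAM figures listed earlier in this article, and reading physical memory assumes the JVM exposes the `com.sun.management.OperatingSystemMXBean` extension (HotSpot-based JDKs do). It does not inspect the GPU; use `nvidia-smi` for that.

```java
import java.lang.management.ManagementFactory;

final class HardwareCheck {
    /** Maps a core count to the CPU tiers from this article (8 / 16 / 32 cores). */
    static String cpuTier(int cores) {
        if (cores >= 32) return "ideal";
        if (cores >= 16) return "recommended";
        if (cores >= 8) return "minimum";
        return "below minimum";
    }

    /** Maps installed RAM in GB to the RAM tiers from this article (32 / 64 / 128 GB). */
    static String ramTier(long ramGb) {
        if (ramGb >= 128) return "ideal";
        if (ramGb >= 64) return "recommended";
        if (ramGb >= 32) return "minimum";
        return "below minimum";
    }

    public static void main(String[] args) {
        // availableProcessors() counts logical processors, which may exceed physical cores
        int cores = Runtime.getRuntime().availableProcessors();

        // Assumption: the JVM provides the com.sun.management extension of this bean
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        // Round up, because the OS usually reports slightly less than the nominal size
        long ramGb = (long) Math.ceil(os.getTotalPhysicalMemorySize() / (1024.0 * 1024 * 1024));

        System.out.println("CPU: " + cores + " logical processors (" + cpuTier(cores) + ")");
        System.out.println("RAM: " + ramGb + " GB (" + ramTier(ramGb) + ")");
    }
}
```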

Minimum Configuration for Common Models

Model size	Minimum VRAM (FP16)	Recommended RAM	Example model
7B	14GB	32GB	LLaMA-7B
13B	26GB	64GB	LLaMA-13B
33B	66GB	128GB	LLaMA-33B

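Optimization Tips

Use quantization

INT8 quantization: cuts VRAM usage by about 50% compared with FP16
- 7B model: about 7GB of VRAM
- 13B model: about 13GB of VRAM
- 33B model: about 33GB of VRAM
INT4 quantization: cuts VRAM usage by about 75% compared with FP16
- 7B model: about 3.5GB of VRAM
- 13B model: about 6.5GB of VRAM
- 33B model: about 16.5GB of VRAM

Practical experience:

With a tool such as Ollama, a quantized 7B model runs smoothly on a GPU with 8GB of VRAM (e.g. RTX 3070)
For 13B models, use a GPU with 12GB of VRAM or more (e.g. RTX 3060)
Response speed may drop slightly, but this matters little for personal use

Load the model on the CPU

Trades some speed for a lower VRAM requirement
Suits users whose GPU falls short

Split the model

Distributed deployment across multiple GPUs
Hybrid CPU + GPU deployment

Summary

Choosing suitable hardware is the key to running large models successfully. Beginners should start with a smaller model, such as a 7B-parameter one, and move to larger models as they gain experience. Techniques such as quantization lower the hardware requirements, so large models can run on ordinary computers as well.

Remember: more powerful hardware is not always better. Choose a configuration that fits your actual needs and budget. Hopefully this article helps you better understand the hardware requirements for running large models!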

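Integrating SpringAI with Ollama (Part 1)

February 13, 2025 · 4 min read

杨杨杨大侠

Partner at 思维工坊

Preface

SpringAI is a framework from the Spring community for building AI applications. It provides a set of annotations and tools that help developers build AI applications quickly.

Hands-On

Starting Ollama Locally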

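To start Ollama locally, follow these steps:

Start Ollama: run the following command in a terminal to start a model:
```
ollama run <model-name>
```
For example, to run the codellama:13b-instruct model that is configured later in this article:
```
ollama run codellama:13b-instruct
```
Access the model: once the model is running, you can interact with it over HTTP. By default, Ollama serves its API at http://localhost:11434.

Test the model: you can send a request with curl, Postman, or a similar tool. For example, with curl:

curl http://localhost:11434/api/generate -d '{"model": "codellama:13b-instruct", "prompt": "你好，Ollama！", "stream": false, "options": {"num_predict": 50}}'

View the result: Ollama returns the generated text, which you can read in the terminal or in your tool.

With these steps, Ollama is running locally and ready for AI application development.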
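
The same request can be sent from plain Java before any Spring code is involved, which is a quick way to confirm that Ollama is reachable. The following is a minimal sketch under stated assumptions: it targets Ollama's default address http://localhost:11434 and its /api/generate endpoint, the class and helper names are hypothetical, and the hand-rolled JSON escaping handles only backslashes, quotes, and newlines (a real application would use a JSON library).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class OllamaGenerateDemo {
    /** Minimal JSON string escaping for illustration (backslash, quote, newline only). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds a non-streaming /api/generate request body. */
    static String buildBody(String model, String prompt) {
        return "{\"model\":\"" + escape(model) + "\",\"prompt\":\"" + escape(prompt)
                + "\",\"stream\":false}";
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        buildBody("codellama:13b-instruct", "Hello, Ollama!")))
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            // Raw JSON; the generated text is in the "response" field
            System.out.println(response.body());
        } catch (IOException e) {
            System.out.println("Could not reach Ollama: " + e);
        }
    }
}
```

Setting "stream" to false makes Ollama return one JSON object instead of a stream of partial results, which keeps the example short.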

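Creating the Java Application

To create a Java application that integrates SpringAI, you can use Maven or Gradle as the build tool. The steps below use Maven:

Create the Maven project: run the following command in a terminal to create a new Maven project:

mvn archetype:generate -DgroupId=com.example -DartifactId=springai-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Add dependencies: in springai-app/pom.xml, add SpringAI and the other required dependencies (JDK 17 or later is required). The Spring Boot and Lombok entries below omit their versions, so the project is assumed to inherit them, for example from spring-boot-starter-parent: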

 <properties>
     <java.version>17</java.version>
 </properties>

 <dependencyManagement>
     <dependencies>
         <dependency>
             <groupId>org.springframework.ai</groupId>
             <artifactId>spring-ai-bom</artifactId>
             <version>0.8.0</version>
             <type>pom</type>
             <scope>import</scope>
         </dependency>
     </dependencies>
 </dependencyManagement>

 <dependencies>
     <dependency>
         <groupId>org.springframework.boot</groupId>
         <artifactId>spring-boot-starter-web</artifactId>
     </dependency>

     <dependency>
         <groupId>org.springframework.boot</groupId>
         <artifactId>spring-boot-starter-actuator</artifactId>
     </dependency>

     <dependency>
         <groupId>org.springframework.ai</groupId>
         <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
     </dependency>

     <dependency>
         <groupId>org.springframework.boot</groupId>
         <artifactId>spring-boot-starter-test</artifactId>
         <scope>test</scope>
     </dependency>

     <dependency>
         <groupId>org.projectlombok</groupId>
         <artifactId>lombok</artifactId>
     </dependency>
 </dependencies>

Create the main application class: in src/main/java/com/example, create Application.java with the following code:

package com.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

   public static void main(String[] args) {
      SpringApplication.run(Application.class, args);
   }

}

Create the controller: in src/main/java/com/example, create ChatController.java with the following code:

package com.example;

import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.Objects;

@RestController
@RequestMapping("/chat")
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class ChatController {

   private final ChatClient chatClient;

   // Simple text chat endpoint
   @RequestMapping(value = "message")
   public Object msg(String prompt) {
      // Pass the prompt to chatClient.call and return "prompt:answer"
      String result = chatClient.call(prompt);
      return String.format("%s:%s", prompt, result);
   }
}

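Configure the application: add the following to src/main/resources/application.properties:

# Spring Boot application properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=codellama:13b-instruct
# Other configurations can be added as needed

Run the application: from the project root, start the application with:
```
mvn spring-boot:run
```

Test the application: send a request with curl or Postman to test the chat endpoint. The controller reads prompt as a request parameter, so send it form-encoded rather than as a JSON body:

curl -X POST http://localhost:8080/chat/message --data-urlencode "prompt=你好，SpringAI！"

With these steps, you have a working Java application that integrates SpringAI.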

Summary

In this article we created a ChatController that uses ChatClient to handle text chat requests, started the application with Maven, and tested the chat endpoint with curl or Postman. With that in place, you can try out SpringAI in your local environment.

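Beginners Can Deploy Large Models Locally Too!

February 12, 2025 · 4 min read

杨杨杨大侠

Partner at 思维工坊

Introduction

With the development of large language models (LLMs), models such as ChatGPT, Llama, and Mistral now provide strong AI support across many tasks. Many people assume large models can only run in the cloud, but they can be deployed locally as well!

This article is aimed at beginners and explains how to use Ollama to deploy a large model on your own computer and enjoy a local AI assistant.

1. Why Deploy Large Models Locally?

Compared with an online API, local deployment has these advantages:

Privacy: data never has to be uploaded to the cloud, which keeps personal information safe.
No API limits: you are not bound by call quotas or rate limits.
Low latency: local inference avoids network round trips, which can make responses faster.
Fully offline: suitable for environments without network access or with strict network security requirements.

Local deployment does have a slightly higher barrier than a cloud API: it needs some computing resources and environment setup.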

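2. Before You Deploy

2.1 Can Your Computer Run Large Models?

Large models are computationally demanding. Suggested configurations:

Tier	CPU	GPU	RAM
Minimum	i5 or better	No GPU / 4GB VRAM	8GB
Recommended	i7 / Ryzen 7	RTX 3060 (12GB VRAM)	16GB
High performance	i9 / Ryzen 9	RTX 4090 (24GB VRAM)	32GB

Without a discrete GPU, Ollama can still run on the CPU, but it will be slower.

2.2 Installing Ollama

Ollama is a tool that simplifies local deployment of large models. It supports Windows, macOS, and Linux.

Download Ollama: get it from the official Ollama website.
Install Ollama: run the installer for your operating system.
Verify the installation: after installing, run this in a terminal:
```
ollama --version
```
If a version number is printed, the installation succeeded.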

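3. Downloading and Running a Model

Ollama's model library offers popular open models such as Llama3, Mistral, and Gemma, which you download on demand.

3.1 Download the Llama3 Model

Run this in a terminal:

ollama pull llama3

Wait for the download to finish.

3.2 Run the Model

Run it directly:

ollama run llama3

You can now chat with the model right in the terminal!

3.3 Calling Ollama Through Its API

To use Ollama from your own application, call its HTTP API, which listens on port 11434 by default:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "你好，大模型！"}'

Or use Python:

import requests

# "stream": False returns one JSON object instead of a stream of partial results
response = requests.post("http://localhost:11434/api/generate", json={"model": "llama3", "prompt": "你好，大模型！", "stream": False})
print(response.json()["response"])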

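4. Making the Local Model Easier to Use

4.1 Web Interface

Ollama can be paired with a web UI:

git clone https://github.com/jmorganca/ollama-web.git
cd ollama-web
npm install
npm run dev

Then open http://localhost:3000 to use it.

4.2 Custom Models

To customize a model's behavior (this adjusts parameters such as temperature and does not retrain the weights), create a Modelfile:

FROM llama3
PARAMETER temperature 0.7

Then run:

ollama create mymodel -f Modelfile

After that you can start it with ollama run mymodel.

5. Summary

Ollama lets even beginners deploy large models locally with little effort. The core workflow is:

Install Ollama (easy to use, multi-platform)
Download a model (Llama3, Mistral, Gemma, and others)
Run and interact (from the terminal or through the API)
Improve the experience (web UI, custom models, and so on)

Local deployment gives us more freedom to explore AI. Give it a try!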
