A. What does webchat look like with EJabberd, an XMPP application server?
(For only $10/h, or 10% of the market price, I provide a full business software solution: mobile app and web app.)


1. Log in to an XMPP server; here I use my EJabberd server at http://112.213.94.35:5280/http-bind with the BOSH service (XMPP over BOSH).

2. A contact request from test2@rr1 is displayed at test1@rr1.

3. test2@rr1's contact name is displayed in test1@rr1's contact list after test1@rr1 accepts the contact request from test2@rr1.

4. test1@rr1 now appears in test2@rr1's contact list.

5. test2@rr1 sends a message to test1@rr1.

6. The message is displayed at test1@rr1 after being sent from test2@rr1.

7. A thank-you message from test1@rr1 to test2@rr1.

8. The thank-you message is displayed on test2@rr1's side.

9. What the EJabberd (XMPP application server) admin management site looks like.

B. How to set up and configure an XMPP server with EJabberd
10. Browse to http://www.process-one.net/en/ejabberd/archive/ to download the EJabberd server setup application for the Windows operating system.
11. Click to run the EJabberd setup file for Windows.
12. Welcome to the EJabberd Setup Wizard.
13. Accept the License Agreement.

14.  Installation Directory

15. EJabberd server domain

16. Administrator username

17.  Administrator password

18. Choose whether to run as a standalone server or as a node of a cluster.

19. Ready to install EJabberd server

20.  Installing EJabberd server

21.  Start EJabberd

22. Starting ejabberd ...

23. Click "admin interface" to browse to the EJabberd admin site.

24. Log in to the EJabberd server admin site.


25. What the EJabberd server admin site looks like.
26. Detail of a virtual host.
27. Download the XMPP JavaScript source and examples from http://conversejs.org/.

28. Open the ejabberd configuration file to change the local IP to the public IP.

29. You need to change the localhost IP to a public IP if you want the server to work with client applications worldwide.

30. Whenever you want to change the BOSH server, point it from https://bind.opkode.im to http://yourServerIP:5280/http-bind.

31. Customize EJabberd for the standard BlogSpot template: add the CSS (style.css) downloaded from Conversejs.org.
32. Add the chat panel and a script to display the panel and bind it to the BOSH service of EJabberd (see the sketch after this list).
33. What the chat panel looks like.
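
For reference, here is a minimal sketch of the kind of markup steps 31 and 32 add to the BlogSpot template. It assumes you load the converse.min.js and style.css files downloaded from conversejs.org; the option names follow the Converse.js documentation of that era, and yourServerIP is a placeholder for the public IP from step 29.

<link rel="stylesheet" href="style.css"/>
<script src="converse.min.js"></script>
<script>
    // Bind the chat panel to our own EJabberd BOSH endpoint
    // instead of the default https://bind.opkode.im (step 30).
    converse.initialize({
        bosh_service_url: 'http://yourServerIP:5280/http-bind',
        show_controlbox_by_default: true
    });
</script>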


Using ejabberd with the MySQL native driver
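
As a reference point, here is a sketch of the relevant pieces of the ejabberd configuration. Recent releases use a YAML file (ejabberd.yml); the old Windows installer used an Erlang-style ejabberd.cfg, so adapt the syntax to your version. The domain, credentials, and database name below are placeholders; the sql_* option names follow the ejabberd documentation:

hosts:
  - "rr1"                      # served XMPP domain (step 15)

listen:
  -
    port: 5280
    module: ejabberd_http
    request_handlers:
      "/http-bind": mod_bosh   # the BOSH endpoint used by the web chat

sql_type: mysql
sql_server: "localhost"
sql_database: "ejabberd"
sql_username: "ejabberd"
sql_password: "********"
default_db: sql                # store user data in MySQL instead of Mnesia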


You can test the XMPP web chat from the blog.

God bless us!

Thanks God and Thanks all!

For only $10/h, or 10% of the market price, I provide a full business software solution: mobile apps (iOS and Android), intranet portals with SharePoint Online or On-Premises, business web apps, business workflow, Power BI, E-Office, document management, chat systems, Customer Interaction Center, Dynamics AX, and Dynamics 365.

I would also be very happy to join your team to develop a mobile app or business web app.

Thank you!


Phone: +60 115 199 2689 (also on WhatsApp, Viber, Zalo)
Email or Skype: ThomasTrungVo@Hotmail.com
Gmail: SharePointTaskMaster@Gmail.com
LinkedIn: https://www.linkedin.com/in/abc365/

Comments:

  1. Please give me the complete source code.

    When I run the GitHub code in my browser, the toggle button does not pop up, so please help me. I can't do anything from the index.html file.
  2. Dear Kaushik,

    Please see more detail in steps 31, 32, and 33, which use the simple BlogSpot template.

    God bless us!
  3. Do we need to change style.css? Is it mandatory?
  4. I am using a public IP address for ejabberd chat but now want to shift to a domain. My previous apps use the IP address and the new ones use the domain, and the domain maps to that IP, so I want chat to work between the IP address and the domain. I can chat between two devices using the IP address, and also between two devices using the domain, but not with the IP on one device and the domain on the other. Please help me.
  5. Is it possible to use port 80 or 443 with ejabberd or JWChat?
  1. Bring the power of Gemini 2.0 to help you build applications faster and easier than ever before.

    Key Features

    - Coding assistance powered by Gemini 2.0.

    - Contextualized responses to your prompts to help guide you on what you're trying to do with your code.

    - Source citations regarding which documentation and code samples the assistant used to generate its responses.

    - Automated code reviews in GitHub with Gemini Code Assist for GitHub.

    - Also available in Firebase, Android Studio, VS Code, IntelliJ, Google Cloud Databases, BigQuery, Apigee, and more.



    Highlights

    • Through a natural language chat interface, you can quickly chat with Gemini Code Assist to get answers to cloud questions, or receive guidance on best practices.

    • Whether you are writing apps, calling APIs, or querying data, Gemini Code Assist can help complete your code while you write, or generate code blocks based on comments.

    • Gemini Code Assist can help you both generate code (including sharing citations) as well as debug code to get your app up and running in no time.

    • As you write functions and applications, Gemini Code Assist can also help you generate unit tests to help you make your code more robust and increase your test coverage, thereby reducing the risk of failures in production.

    • Gemini Code Assist can help you with development practices across most popular languages. In all, Gemini Code Assist supports 20+ languages, including Bash, C, C++, C#, Dart, Go, GoogleSQL, Java, JavaScript, Kotlin, Lua, MATLAB, PHP, Python, R, Ruby, Rust, Scala, SQL, Swift, TypeScript, and YAML.

    • Gemini Code Assist also supports infrastructure interfaces, including the gcloud CLI, KRM, and Terraform, making it easy for you to interact with various infrastructure layers.



  2. Gemini AI answers:



    Okay, I understand you want to use DeepSeek AI within your Flutter application. Since there isn't a specific official DeepSeek AI SDK for Flutter or Dart, you'll need to interact with the DeepSeek AI API using HTTP requests. Here's a breakdown of how to do it, along with some important considerations:

    Core Approach: Using HTTP Requests

    1. DeepSeek AI API: You'll be interacting with DeepSeek AI through their API. You'll need to refer to their official API documentation to understand the specific endpoints, request formats (JSON), authentication methods, and response structures.

    2. Flutter's http Package: Flutter provides the http package (a popular choice) to make HTTP requests. This package will allow you to send requests to the DeepSeek AI API and receive responses.

    3. Authentication: DeepSeek AI will likely require some form of authentication, such as an API key. You'll need to obtain an API key from them and include it in your HTTP requests.

    4. Request Structure: You'll format your requests as JSON objects. The structure of these objects will depend on what you want to do with the DeepSeek AI (e.g., generate text, analyze code, etc.).

    5. Response Handling: The DeepSeek API will send responses, also likely in JSON format. You'll need to parse these responses in your Flutter code to extract the information you need.


    Step-by-Step Guide

    Here's a more detailed breakdown of the steps involved:

    1. Set up DeepSeek AI Account and API Key

    Create an Account: If you haven't already, sign up for a DeepSeek AI account.

    Obtain API Key: Navigate to the API section of your DeepSeek AI account dashboard and create/retrieve your API key. Keep this key secure.


    2. Add http Package to Your Flutter Project

    - Open your pubspec.yaml file.

    - Under dependencies, add:

    dependencies:
      flutter:
        sdk: flutter
      http: ^1.1.0 # Check for the latest version on pub.dev

    - Run flutter pub get in your terminal to install the package.


    3. DeepSeek AI API Documentation

    - Carefully review the official DeepSeek AI API documentation. Pay close attention to the following:

        - Base URL: The base URL for their API endpoints.

        - Authentication: How to include your API key in requests (e.g., in headers).

        - Endpoints: The specific URLs for the different features (e.g., text generation, code completion).

        - Request Format: How to format your requests in JSON.

        - Response Format: How the responses will be structured in JSON.

    - In the current version, DeepSeek AI requires authorization:

    Authorization: "Bearer <your_api_key>"


    4. Create Dart Functions to Interact with DeepSeek AI

    Here's an example of how you might write a Dart function to send a text generation request to the DeepSeek API:

    import 'dart:convert';
    import 'package:http/http.dart' as http;

    class DeepSeekService {
      final String apiKey = 'YOUR_DEEPSEEK_API_KEY'; // Replace with your actual API key
      final String baseUrl = 'https://api.deepseek.com'; // Replace with the correct base URL

      Future<String> generateText(String prompt) async {
        final url = Uri.parse('$baseUrl/v1/chat/completions'); // or the correct endpoint
        final headers = {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer $apiKey',
        };

        final body = jsonEncode({
          "model": "deepseek-chat",
          "messages": [
            {"role": "user", "content": prompt}
          ]
        });

        try {
          final response = await http.post(url, headers: headers, body: body);

          if (response.statusCode == 200) {
            final responseData = jsonDecode(response.body);
            // Adjust this based on the actual DeepSeek response structure.
            return responseData['choices'][0]['message']['content'] ?? "Error";
          } else {
            print('Error: ${response.statusCode}');
            print('Response body: ${response.body}');
            return 'Error generating text. Status code: ${response.statusCode}';
          }
        } catch (e) {
          print('Exception: $e');
          return 'An error occurred: $e';
        }
      }
    }


    5. Use the DeepSeek AI Service in Your Flutter UI

    import 'package:flutter/material.dart';
    import 'deepseek_service.dart'; // Import your DeepSeek service file

    void main() => runApp(MyApp());

    class MyApp extends StatelessWidget {
      @override
      Widget build(BuildContext context) {
        return MaterialApp(
          home: DeepSeekDemo(),
        );
      }
    }

    class DeepSeekDemo extends StatefulWidget {
      @override
      _DeepSeekDemoState createState() => _DeepSeekDemoState();
    }

    class _DeepSeekDemoState extends State<DeepSeekDemo> {
      final DeepSeekService deepSeekService = DeepSeekService();
      final TextEditingController _textEditingController = TextEditingController();
      String _generatedText = '';

      @override
      Widget build(BuildContext context) {
        return Scaffold(
          appBar: AppBar(
            title: Text('DeepSeek AI Demo'),
          ),
          body: Padding(
            padding: const EdgeInsets.all(16.0),
            child: Column(
              children: [
                TextField(
                  controller: _textEditingController,
                  decoration: InputDecoration(
                    hintText: 'Enter your prompt',
                  ),
                ),
                SizedBox(height: 20),
                ElevatedButton(
                  onPressed: () async {
                    // Send the prompt to DeepSeek and show the result.
                    String prompt = _textEditingController.text;
                    String result = await deepSeekService.generateText(prompt);
                    setState(() {
                      _generatedText = result;
                    });
                  },
                  child: Text('Generate'),
                ),
                SizedBox(height: 20),
                Expanded(
                  child: SingleChildScrollView(
                    child: Text(_generatedText),
                  ),
                ),
              ],
            ),
          ),
        );
      }
    }


    Important Considerations

    Error Handling: The provided example includes basic error handling, but you should implement more robust error handling in a production app. This includes network errors, API errors (e.g., authentication failures, rate limits), and handling unexpected response formats.

    Rate Limits: Be aware of DeepSeek AI's API rate limits. If you exceed them, your API calls will be blocked. You may need to implement rate limiting or queuing in your app.

    Security: Never hardcode your API key directly into your code. Use environment variables or a secure configuration system to manage sensitive data.

    API Changes: APIs can change. Monitor the DeepSeek AI API documentation for any changes, and update your code accordingly.

    Async Operations: Remember that API calls are asynchronous operations. Use async and await appropriately to handle them without blocking the UI.

    UI/UX: Provide good feedback to the user while waiting for API responses (e.g., a loading indicator).

    Summary

    Integrating DeepSeek AI into Flutter requires direct interaction with their API using HTTP requests. The http package is your primary tool. Remember to consult the DeepSeek AI API documentation, handle errors properly, and keep security in mind. Let me know if you have any other questions.



  3. What does Gemini AI look like on Android Studio Meerkat?

    How do you use DeepSeek AI with Flutter?






  5.  Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.





    https://github.com/deepseek-ai/DeepSeek-VL2




  6.  Introducing DeepSeek LLM (Large Language Model), an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.




    1. Superior General Capabilities: DeepSeek LLM 67B Base outperforms Llama2 70B Base in areas such as reasoning, coding, math, and Chinese comprehension.

    2. Proficient in Coding and Math: DeepSeek LLM 67B Chat exhibits outstanding performance in coding (HumanEval Pass@1: 73.78) and mathematics (GSM8K 0-shot: 84.1, Math 0-shot: 32.6). It also demonstrates remarkable generalization abilities, as evidenced by its exceptional score of 65 on the Hungarian National High School Exam.

    3. Mastery in Chinese Language: Based on our evaluation, DeepSeek LLM 67B Chat surpasses GPT-3.5 in Chinese.


    We release DeepSeek LLM 7B/67B, including both base and chat models, to the public. To support a broader and more diverse range of research within both academic and commercial communities, we are providing access to intermediate checkpoints of the base model from its training process. Please note that the use of this model is subject to the terms outlined in the License section; commercial usage is permitted under those terms.

    https://huggingface.co/deepseek-ai/deepseek-llm-7b-base
    https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat
    https://huggingface.co/deepseek-ai/deepseek-llm-67b-base
    https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat








  7. When using expert parallelism (EP), different experts are assigned to different GPUs. Because the load of different experts may vary with the current workload, it is important to keep the load of different GPUs balanced. As described in the DeepSeek-V3 paper, we adopt a redundant-experts strategy that duplicates heavily loaded experts. Then, we heuristically pack the duplicated experts onto GPUs to ensure load balancing across different GPUs. Moreover, thanks to the group-limited expert routing used in DeepSeek-V3, we also attempt to place the experts of the same group on the same node to reduce inter-node data traffic, whenever possible.

    To facilitate reproduction and deployment, we open-source our deployed EP load-balancing algorithm in eplb.py. The algorithm computes a balanced expert replication and placement plan based on the estimated expert loads. Note that the exact method for predicting expert loads is outside this repo's scope; a common method is a moving average of historical statistics.

    Code of eplb.py (see the repository link below).




    The Algorithm

    The load balancing algorithm comes with two policies used for different cases.

    Hierarchical Load Balancing

    When the number of server nodes divides the number of expert groups, we use the hierarchical load balancing policy to harness the group-limited expert routing. We first pack the expert groups onto nodes evenly, ensuring the loads of different nodes are balanced. Then, we replicate the experts within each node. Finally, we pack the replicated experts onto individual GPUs to ensure different GPUs are load-balanced. The hierarchical load balancing policy can be used in the prefilling stage with a smaller expert-parallel size.

    Global Load Balancing

    In other cases, we use the global load balancing policy, which replicates the experts globally regardless of expert groups and packs the replicated experts onto individual GPUs. This policy can be adopted in the decoding stage with a larger expert-parallel size.
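
    As a rough illustration (this is not the actual eplb.py, which is written in Python and also handles expert groups and the node hierarchy), the global policy amounts to greedily replicating the heaviest experts and then packing replicas onto the least-loaded GPUs. A simplified TypeScript sketch with made-up names:

    // Simplified sketch of the global policy. loads[e] is the estimated
    // load of expert e; the result maps each GPU to the experts
    // (duplicates included) it should host.
    function globalLoadBalance(loads: number[], numGpus: number, slotsPerGpu: number): number[][] {
      // Start with one replica per expert.
      const replicas = loads.map((load, expert) => ({ expert, load }));
      while (replicas.length < numGpus * slotsPerGpu) {
        // Duplicate the replica carrying the most load and split its load.
        // (A simplification: the real algorithm divides an expert's load
        // evenly across all of its replicas.)
        replicas.sort((a, b) => b.load - a.load);
        const heavy = replicas[0];
        heavy.load /= 2;
        replicas.push({ expert: heavy.expert, load: heavy.load });
      }
      // Greedy packing: heaviest replica first, onto the least-loaded GPU
      // that still has a free slot.
      replicas.sort((a, b) => b.load - a.load);
      const gpuLoad: number[] = new Array(numGpus).fill(0);
      const placement: number[][] = Array.from({ length: numGpus }, () => []);
      for (const r of replicas) {
        let best = -1;
        for (let g = 0; g < numGpus; g++) {
          if (placement[g].length < slotsPerGpu && (best < 0 || gpuLoad[g] < gpuLoad[best])) {
            best = g;
          }
        }
        placement[best].push(r.expert);
        gpuLoad[best] += r.load;
      }
      return placement;
    }

    For example, globalLoadBalance([9, 3, 2], 2, 2) duplicates the heaviest expert (expert 0) and places one replica of it on each GPU. The hierarchical policy applies the same idea in two stages: expert groups onto nodes first, then replicas onto GPUs within each node.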



    https://github.com/deepseek-ai/EPLB/



  8. DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared experts isolation. It is trained from scratch on 2T English and Chinese tokens, and exhibits comparable performance with DeepSeek 7B and LLaMA2 7B, with only about 40% of the computation. For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public, which can be deployed on a single GPU with 40GB of memory without the need for quantization.
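
    As a rough illustration of those two strategies (this is not DeepSeek's actual code; the names, shapes, and gating details are made up), a forward pass with shared-expert isolation always applies the shared experts, while each token is routed to only the top-k of the many fine-grained experts:

    // Illustrative only: an "expert" is just a function on a feature vector.
    type Expert = (x: number[]) => number[];

    function moeForward(
      x: number[],
      sharedExperts: Expert[],   // isolated from routing: always active
      routedExperts: Expert[],   // many small, fine-grained experts
      gateScores: number[],      // router output, one score per routed expert
      topK: number,
    ): number[] {
      const out = new Array(x.length).fill(0);
      // 1. Shared experts capture common knowledge for every token.
      for (const e of sharedExperts) {
        e(x).forEach((v, i) => { out[i] += v; });
      }
      // 2. Select the top-k routed experts by gate score.
      const top = gateScores
        .map((score, idx) => ({ score, idx }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
      // 3. Mix the selected experts' outputs, weighted by normalized scores.
      const total = top.reduce((acc, t) => acc + t.score, 0);
      for (const t of top) {
        routedExperts[t.idx](x).forEach((v, i) => { out[i] += (t.score / total) * v; });
      }
      return out;
    }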


    The model code file



    https://github.com/deepseek-ai/DeepSeek-MoE




  9. Features

    🚀 High-performance data processing powered by DuckDB

    🌍 Scalable to handle PB-scale datasets

    🛠️ Easy operations with no long-running services 


    Performance

    We evaluated smallpond using the GraySort benchmark (script) on a cluster comprising 50 compute nodes and 25 storage nodes running 3FS. The benchmark sorted 110.5TiB of data in 30 minutes and 14 seconds, achieving an average throughput of 3.66TiB/min.


    ( DuckDB is an open-source column-oriented relational database management system (RDBMS). It is designed for high performance on complex queries against large databases in embedded configurations, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (for example, SQLite), DuckDB does not focus on transactional (OLTP) workloads; instead it is specialized for online analytical processing (OLAP) workloads. )
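
    For a feel of that embedded OLAP style, here is a tiny sketch using DuckDB's Node.js bindings (chosen purely for illustration; smallpond itself drives DuckDB from Python, and prices.parquet is a made-up file):

    import duckdb from 'duckdb';

    // In-process analytical database: no server to install or run.
    const db = new duckdb.Database(':memory:');

    // A columnar scan plus aggregation straight over a Parquet file on
    // disk, the kind of OLAP query DuckDB is specialized for.
    db.all(
      "SELECT ticker, min(price) AS low, max(price) AS high " +
      "FROM 'prices.parquet' GROUP BY ticker",
      (err, rows) => {
        if (err) throw err;
        console.table(rows);
      },
    );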


    Smallpond provides both high-level and low-level APIs

    Currently, smallpond provides two different APIs, supporting dynamic and static construction of data flow graphs respectively. For historical reasons, these two APIs use different scheduler backends and support different configuration options.

    1. The High-level API currently uses Ray as the backend, supporting dynamic construction and execution of data flow graphs.

    2. The Low-level API uses a built-in scheduler and only supports one-time execution of static data flow graphs. However, it offers more performance optimizations and richer configuration options.

    We are working to merge them so that in the future, you can use a unified high-level API and freely choose between Ray or the built-in scheduler.


    https://github.com/deepseek-ai/smallpond



  10. DeepGEMM is a library designed for clean and efficient FP8 General Matrix Multiplications (GEMMs) with fine-grained scaling, as proposed in DeepSeek-V3. It supports both normal and Mix-of-Experts (MoE) grouped GEMMs. Written in CUDA, the library requires no compilation at install time: all kernels are compiled at runtime using a lightweight Just-In-Time (JIT) module.

    Currently, DeepGEMM exclusively supports NVIDIA Hopper tensor cores. To address the imprecise FP8 tensor core accumulation, it employs CUDA-core two-level accumulation (promotion). While it leverages some concepts from CUTLASS and CuTe, it avoids heavy reliance on their templates or algebras. Instead, the library is designed for simplicity, with only one core kernel function comprising roughly 300 lines of code. This makes it a clean and accessible resource for learning Hopper FP8 matrix multiplication and optimization techniques.

    Despite its lightweight design, DeepGEMM's performance matches or exceeds expert-tuned libraries across various matrix shapes.

    (CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. )

    (CuTe is a collection of C++ CUDA template abstractions for defining and operating on hierarchically multidimensional layouts of threads and data. )


    Performance

    We test all shapes potentially used in DeepSeek-V3/R1 inference (including both prefilling and decoding, but without tensor parallelism) on H800 SXM5 with NVCC 12.8. All speedup metrics are calculated in comparison to our carefully optimized internal implementation based on CUTLASS 3.6.

    DeepGEMM does not perform well on some shapes; optimization PRs are welcome if you are interested.

    (Nvidia CUDA Compiler (NVCC) is a compiler by Nvidia intended for use with CUDA. )

    https://github.com/deepseek-ai/DeepGEMM



