DataWorkshop
DataWorkshop is the right place to learn and teach about machine learning and data science. https://
08/10/2024
Own your AI, not just its outputs! The main question is: What do you control when using closed LLM models like OpenAI, Anthropic, or Gemini? 🤔
Answer: You have virtually no control and it's almost certainly a vendor lock-in for your business... or is it?
Usually closed models means that you're:
--> locked into their rules
--> renting AI, not owning AI
--> limited in terms of auditability
--> at risk of data privacy concerns
--> facing compliance challenges (GDPR, HIPAA, EU ACT...)
--> your business becomes dependent on their decisions and pricing
This is a risky gamble, isn't it?
Open (weight) models might require more tuning and resources to achieve the same level of performance (tailored to your specific business problems) as closed models, but they offer greater flexibility and control.
To be clear, I'm not saying you shouldn't use OpenAI (or other closed models) at all. I often use closed models for prototyping. They're great for quick results, but consider the risks before going into production.
GPT (and other closed models) is impressive, but your business needs solutions, not popularity! Do you agree? Open LLMs offer custom solutions tailored to your data and needs, not just rented AI.
Important point: own your AI future. There are different options, but first, let's take this path and start considering possible approaches.
How do you manage this risk in production? Have you considered using open (weight) models in production?
26/09/2024
Yesterday, I gave a talk at NVIDIA (with their office in the background).
I talked about how you can run large language models (LLMs) on your own rules. To put it simply — how to run LLM models (kind of "ChatGPT") on your own server using open-weight (source) models like Llama, Mistral, Qwen and others.
The presentation was quite packed. I shared my own experiences, having struggled myself while structuring all the information. It’s a fresh and fragmented topic. Usually, there are two extremes: either people are deeply immersed in their niche (e.g. focus only on Managing the KV Cache) or they’re completely out of the loop.
So, when you want to dive into the subject and make an informed decision, it's not always easy. I came up with a basic algorithm for myself, consisting of at least 5 steps:
1. Hardware (Nvidia GPUs and others)
2. Software (inference tools)
3. LLM model
4. Optimization
5. API
Additionally, yesterday there were two other talks.
Prince Canuma gave an enthusiastic presentation promoting the "small models" approach and is deeply involved in open-source development. I learned about his library, FastMLX, which is definitely worth checking out. Not only is he doing great work, but his positive attitude and energy are truly infectious!
Mateusz Szczęsny from Nvidia spoke about NIM, explaining how it works, its architecture, and the core concepts. In short, it's an Nvidia product based on microservices, utilizing various components to enable running LLMs (including vLLM and TensorRT-LLM). Their focus is on making it easy for others to use.
Last, but not least! Big thanks for NVIDIA (for hosting us), and of course Agnieszka Rybak and Tomasz Sienkiewicz for inviting me and organizing this great event. Good job! I know how much effort it takes to make things happen—bravo! 👏👏👏
P.S. By the way, have you subscribed to my podcast about AI (Biznes Myśli in Polish)? I share a lot of practical insights related to AI (ML/LLM).
https://www.youtube.com/playlist?list=PLWOCRT27Z94XZzwcRI9-ExMyUXeBrF3W_
P.P.S. Follow me, like, and share this post. This motivates me to keep sharing my experience with you. Thanks!
Kliknij tutaj, aby odebrać Sponsorowane Ogłoszenie.
Kategoria
Skontaktuj się z firmę
Strona Internetowa
Adres
Kraków
30-024 TO 31–962