It's no secret that I think Llama 2 and its derivatives are the future of AI and ML. Rather than getting bigger and smarter, AI should strive to get smaller, cheaper, and faster; GPT-4 is already enough for 99.5% of applications. If you want to run Llama 2 via llama.cpp, you can check out my guide on how to do that. The problem with llama.cpp, however, is that before you can run anything you have to install all the dependencies, either download a binary or clone and build the repo, and make sure your drivers are working.