The Glossily Rendered Elephant in the Room or: Why We Are Building Our Own Models

With AI advancing at an accelerating rate and seemingly being integrated into everything, many companies face a decision: whether or not to use in-house models to provide AI-powered services. At Nuanced, we aim to balance innovation, customer satisfaction, privacy, and pricing. Our service detects and identifies AI-generated content, and we chose to develop and run our models ourselves, which we believe will uphold those commitments. There are many reasons behind this decision, and we believe that, moving forward, more and more companies may choose to do the same.

In an era when data breaches and privacy concerns are rampant, handling sensitive customer data with the utmost care is not just a compliance issue but a trust-building measure. By processing data locally, we ensure that customer information never leaves our controlled environment. This approach significantly reduces the risk of data exposure and aligns with stringent privacy regulations like GDPR and CCPA. We're able to enforce a zero personally identifiable information (PII) policy when training models, and, to mix metaphors, we've seen that the same elephant in the room never forgets.

By opting for local ML models, we're able to provide increased flexibility and customization, offering models that fit our customers' needs. It may be tempting to reach for the sledgehammer of an external ML API, but very often a tack hammer will do. Recent open-source models, tools, and techniques for local development and deployment of large language models (LLMs) and vision models make this easier than ever. The capabilities this affords us include:

Leveraging Fusion Models and Mixture of Experts: Fusion models that combine various types of data and expertise are essential when identifying AI-generated content. As noted by Epstein et al., having a model specialized for a given generative model has been shown to be integral to identifying that model's output.
White Box Advantages: Unlike the 'black box' nature of many API services, local models offer transparency in how data is processed and decisions are made. This 'white box' approach is crucial for sensitive applications where understanding the model's reasoning is as important as the outcome, as is the case when distinguishing between AI-generated content and authentic human content.
Continuous Improvements: With local models, we're not at the mercy of a third-party provider's update cycle. We can continuously improve and iterate on our models, ensuring that they evolve as rapidly as our product and customer needs do.
Customization for Different Customers: Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) enable us to tailor models to individual customer needs, which is hard to achieve with one-size-fits-all API solutions.
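To make the LoRA idea concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique, not our production code: the matrix sizes, the `alpha` scaling value, and the helper names (`matmul`, `lora_effective_weight`, `trainable_params`) are all assumptions chosen for the example. The frozen base weight W stays shared, and each customer would only need a small pair of low-rank matrices A and B.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank.

    w: frozen base weight, shape (d, k)
    a: trainable matrix, shape (r, k)
    b: trainable matrix, shape (d, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d, k)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d, k, r):
    """Return (full fine-tuning params, LoRA adapter params) for one d x k layer."""
    return d * k, r * (d + k)


# Hypothetical tiny layer, purely for illustration.
d, k, r, alpha = 6, 4, 2, 4.0
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen
a = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # small random init
b = [[0.0] * r for _ in range(d)]                               # zero init: adapter starts as a no-op

# Before any training, the adapted layer behaves exactly like the base layer.
assert lora_effective_weight(w, a, b, alpha) == w

full, adapter = trainable_params(4096, 4096, 8)
print(f"full fine-tune: {full:,} params; rank-8 adapter: {adapter:,} params")
```

For a single 4096 x 4096 layer, a rank-8 adapter trains 65,536 parameters instead of 16,777,216, which is why keeping a separate adapter per customer is practical where a separate full model per customer would not be.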

While initial investment and start-up friction are inherent in setting up in-house models, the long-term cost benefits are significant. By running models locally, we avoid ongoing API costs, which can quickly add up, especially as our user base grows. Additionally, having control over the models means we can optimize them for efficiency, further reducing operational costs. We achieve this by fine-tuning our models as customer requirements are uncovered, streamlining our inference pipelines, and honing our models.
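The trade-off between upfront investment and recurring API fees can be sketched as a simple break-even calculation. The function name `breakeven_months` and every figure below are hypothetical placeholders, not our actual costs; the point is only to show how the comparison depends on request volume.

```python
import math


def breakeven_months(upfront_cost, inhouse_monthly, api_cost_per_1k_requests, requests_per_month):
    """Return the first whole month at which cumulative in-house spend is
    no higher than cumulative API spend, or None if the API stays cheaper
    at this request volume."""
    api_monthly = api_cost_per_1k_requests * requests_per_month / 1000
    monthly_saving = api_monthly - inhouse_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront_cost / monthly_saving)


# Hypothetical figures, for illustration only.
high_volume = breakeven_months(
    upfront_cost=60_000,
    inhouse_monthly=4_000,
    api_cost_per_1k_requests=2.0,
    requests_per_month=5_000_000,
)
low_volume = breakeven_months(
    upfront_cost=60_000,
    inhouse_monthly=4_000,
    api_cost_per_1k_requests=2.0,
    requests_per_month=1_000_000,
)
print(high_volume, low_volume)
```

With these placeholder numbers, the high-volume case breaks even after 10 months, while the low-volume case never does. This matches the argument above: the case for in-house models strengthens as the user base grows.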

By prioritizing local ML models, we're making a conscious choice to invest in our product's future. This approach not only ensures that we're at the forefront of technology but also aligns with our core values of customer privacy, bespoke solutions, and cost-effectiveness.