Original poster: oliyiyi

Avoiding Complexity of Machine Learning Problems

Today, more and more product and engineering teams rely on machine learning (referred to as ML throughout this blog post). The abundance of open source tools and libraries also makes it much easier to learn, develop, and build ML models, even for people with little prior knowledge or experience. ML is a powerful tool for many problems, but it comes with costs: it can introduce complexity into systems that builds up over time and evolves into large technical debt. A recent publication by Google argues that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying ML (see Reference 1).

At Quora, we've been using ML to tackle many interesting problems such as ranking, search, recommendation, and spam detection (see References 2, 3, and 4). We are constantly evaluating new approaches and building new product features with ML. At the same time, we strive to be careful about the complexity that these models introduce, and we have developed principles and best practices to avoid or reduce it. In this blog post, we will share our thinking about complexity in ML systems and describe some of our approaches to mitigating it. Note that most of the problems and solutions in this post also apply to general software systems, and vice versa. However, we choose to focus on those that are especially important for ML.



Do you really need machine learning?

Before even thinking about complexity in your ML system, ask yourself if your product feature actually needs an ML solution. Sometimes, ML adds complexity to your system when you could just use a simpler heuristic algorithm that does not require feature engineering, model tuning, continuous training, or model deployment. However, when ML models already built for other purposes can be reused, it is the heuristic that adds complexity. A quick and dirty heuristic might seem like a short-term gain, but is really a long-term pain. Over time it becomes increasingly difficult to understand, depend on, and maintain all the ad-hoc heuristics. The product can also suffer when there are too many different ways to do similar things, resulting in inconsistent user-facing behavior. Therefore, it’s important to be aware of this tradeoff and consult with your team or ML specialists in your organization before investing heavily in any approach.

To evaluate whether an ML solution is appropriate for your problem, it is critical that good documentation is kept and shared within the organization. This way, it is possible to understand if there are product features or problems similar to yours that are already tackled using ML. There are also many resources on Quora and online about typical problems that can be solved with ML.

Let's take a look at a few examples at Quora. We have developed and productionized ML models for a number of ranking problems such as search result ranking, answer ranking, feed ranking, and digest ranking. In the ranking algorithm, an ML model produces a score that predicts whether a user will “engage” with the ranked result. Although it is not a typical ranking problem, the digest email scheduler can build on a similar ML model to predict the likelihood of a user opening the digest email. In contrast, detecting trending topics or events is often solved using heuristic algorithms that leverage time series analysis.
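To make the last point concrete, here is a minimal sketch of the kind of time-series heuristic that could flag a trending topic without any trained model. It is not Quora's implementation: the function name `is_trending`, the 24-hour window, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_trending(hourly_counts, window=24, threshold=3.0):
    """Flag the latest count if it sits far above the recent baseline.

    hourly_counts: activity counts for one topic, oldest first (hypothetical input).
    window: number of earlier points used as the baseline (assumed value).
    threshold: z-score above which the topic counts as trending (assumed value).
    """
    if len(hourly_counts) < window + 1:
        return False  # not enough history to form a baseline

    baseline = hourly_counts[-(window + 1):-1]  # the `window` points before the latest
    latest = hourly_counts[-1]
    mu = mean(baseline)
    sigma = stdev(baseline)

    if sigma == 0:
        # A perfectly flat baseline: any increase at all stands out.
        return latest > mu
    return (latest - mu) / sigma > threshold


quiet_history = [10, 12, 11, 9] * 6  # 24 hours of ordinary activity
print(is_trending(quiet_history + [11]))
print(is_trending(quiet_history + [40]))
```

A heuristic like this needs no feature engineering, training, or deployment pipeline, which is why it can be the simpler choice when no existing model fits the problem.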



What is complexity?

Nobody considers complexity a positive feature. However, not everyone agrees on when a system qualifies as complex, or on when a given tradeoff for simplicity is worth making. It is important to understand the different symptoms of complexity before we agree on how to treat them. So, what do we mean when we look at an ML system and say it is too complex? Below is a list of possible answers.

1. Too many different ways to do similar things

An ML system is too complex when there are too many different ways to do similar things. This creates complexity in at least two ways. First, engineers lose time trying to figure out the correct way to do what they need to do. Second, because things are implemented in different ways, maintenance overhead is added.

2. Not providing enough explanation or insight

If a system is hard to interpret from the outside and can only be understood as a “black box”, it is generally considered complex.

3. Undocumented functionality

Hard-to-understand, undocumented functionality also creates complexity in a system. The actual implementation might not be that complicated, but the fact that it is hard to understand without digging into the details adds complexity.

4. Non-reusable functionality

Functionality that cannot be reused in different contexts leads to different ways of doing similar things, and therefore adds complexity.

5. Requiring many steps to do a “simple” thing

Sometimes engineers may feel that an existing system or tool requires too many steps or is too complicated for their use cases. In this scenario, they are likely to come up with a brand new system or tool that is optimized for their specific use cases. While this might make the current implementation simpler, adding a different system or tool increases the overall complexity.

6. Requiring understanding of many tools

Similar to the previous scenario, complexity may arise from requiring engineers to understand many tools, or tools that are themselves complex. For example, if an engineer working on search result ranking needs to understand Python, C++, Gradient Boosted Decision Trees, and Matrix Factorization, and no abstraction spares them from having to understand all of these, the system is considered complex.

7. Unnecessary maintenance overhead

A system qualifies as complex if it adds unnecessary maintenance overhead. For example, it might increase the pager duty burden or add monitoring and retraining costs.
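Items 1 and 4 in the list above can be illustrated with a small sketch: one ranking function shared by several product surfaces, instead of a separate sort routine per surface. Everything here is hypothetical. The names `rank` and `engagement_score`, the `relevance` and `freshness` features, and the weights are stand-ins for a trained model, not a description of any real system.

```python
from typing import Callable, Dict, List

Item = Dict[str, float]  # hypothetical feature dictionary for one candidate


def rank(items: List[Item], score: Callable[[Item], float]) -> List[Item]:
    """Order candidates by descending predicted engagement score."""
    return sorted(items, key=score, reverse=True)


def engagement_score(item: Item) -> float:
    # Stand-in for a trained model's prediction; the weights are made up.
    return 0.7 * item["relevance"] + 0.3 * item["freshness"]


# Two different surfaces reuse the same ranking path and the same scorer.
search_results = [
    {"relevance": 0.2, "freshness": 0.9},
    {"relevance": 0.9, "freshness": 0.1},
]
digest_candidates = [
    {"relevance": 0.5, "freshness": 0.5},
    {"relevance": 0.4, "freshness": 1.0},
]

ranked_search = rank(search_results, engagement_score)
ranked_digest = rank(digest_candidates, engagement_score)
print(ranked_search[0], ranked_digest[0])
```

Because the scorer is passed in as a plain callable, a new surface can swap in its own model without introducing a second way to rank.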



What pushes engineers to complex solutions?

Engineers do not build complex solutions just for fun, but projects have constraints that might push them to build something unnecessarily complex.

1. Scrappiness

Intuitively, it seems that scrappiness should lead to a simple solution, since the goal is to get something working as soon as possible. However, that is rarely the case. As explained earlier, the fastest solution often reaches a local optimum, but it neither reuses anything that exists nor can be reused in the future. We think that scrappiness is generally good for development velocity, but it is also important to acknowledge its side effects and correct for them.

2. Lack of long-term vision

Engineers might be too focused on developing something for a specific problem, without paying much attention to whether the system is easy to maintain in the future or can support future use cases.

3. Lack of understanding

Not understanding what the current system does may lead to complexity. There might be a way to implement a new use case easily, but a lack of understanding makes it seem complicated or leads to solutions that are more complex than necessary.

4. Lack of flexibility in architecture

When an existing architecture is not flexible enough to adapt to a new use case, engineers need to decide between changing the existing architecture and doing a “one-off”. More often than not, “one-offs” are preferred because they are easier and quicker to implement.

5. Lack of feature selection

Engineers tend to be more excited about adding new features to an ML model than about removing old ones. Old features may no longer be useful after a certain number of iterations, and they make the model harder to understand and more complex.

6. Optimizing for accuracy

To optimize for accuracy, engineers often use approaches like ensembling, which combines results from multiple ML models in the system. While this is usually a good way to improve model quality, “overdoing” it may add significant complexity that small metric wins cannot justify.

7. Optimizing for performance

Sometimes optimizing for performance can also lead to overly complicated or obscure system implementations. For example, for performance reasons, engineers working on search may decide to implement the ranking infrastructure in C++, whereas the rest of the stack is written in Python, which makes the entire system more complex.

8. Dependencies

Building ML systems is hard because there are very few well-known design patterns. In addition, it is very common to have chains of dependencies between data sources and subsystems. It takes an experienced ML engineer to build an efficient yet simple system.
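As one way to act on item 5 above (lack of feature selection), the sketch below runs a simple ablation: it drops one feature at a time and flags those whose removal barely moves a validation metric. This is an illustrative approach under stated assumptions, not a method described in the post. The function `prunable_features`, the `evaluate` callback (assumed to retrain and return a higher-is-better metric), the tolerance, and the toy metric are all hypothetical.

```python
from typing import Callable, List, Sequence


def prunable_features(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    tolerance: float = 0.001,
) -> List[str]:
    """Return features whose individual removal costs at most `tolerance`.

    evaluate: hypothetical callback that trains on the given feature names
    and returns a validation metric where higher is better.
    """
    baseline = evaluate(list(features))
    candidates = []
    for name in features:
        remaining = [f for f in features if f != name]
        if baseline - evaluate(remaining) <= tolerance:
            candidates.append(name)
    return candidates


# Toy stand-in for "train and validate": each feature adds a fixed amount.
CONTRIBUTION = {"answer_length": 0.10, "legacy_flag": 0.0005, "unused_id": 0.0}


def toy_evaluate(feature_names: List[str]) -> float:
    return 0.5 + sum(CONTRIBUTION[f] for f in feature_names)


print(prunable_features(list(CONTRIBUTION), toy_evaluate))
```

Note that each feature is tested in isolation, so removing several flagged features together may cost more than the tolerance; a joint re-evaluation before deleting them is a reasonable safeguard.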



Reply 1
h2h2 posted on 2017-2-19 17:51:40
Thanks for sharing

Reply 2
paulinokok posted on 2017-2-19 22:27:24
thank you
