Original poster: oliyiyi

[LaTeX edition] Filler thread

1111
oliyiyi posted on 2015-10-11 07:47:52
This, as far as I’m concerned, is perfectly fine, especially since I agree with 98% of their views.
(My only quibble is around SQL—but that’s more an issue of my upbringing than of
disagreement.) What their unambiguous writing means is that you can focus on the craft and art
of data science and not be distracted by choices of which tools and methods to use. This
precision is what makes PDSwR practical. Let’s look at some specifics.

1112
oliyiyi posted on 2015-10-11 07:49:29
Practical tool set: R is a given. In addition, RStudio is the IDE of choice; I’ve been using RStudio
since it came out. It has evolved into a remarkable tool—integrated debugging is in the latest
version. The third major tool choice in PDSwR is Hadley Wickham’s ggplot2. While R has
traditionally included excellent graphics and visualization tools, ggplot2 takes R visualization to
the next level. (My practical hint: take a close look at any of Hadley’s R packages, or those of his
students.) In addition to those main tools, PDSwR introduces necessary secondary tools: a
proper SQL DBMS for larger datasets; Git and GitHub for source code version control; and knitr
for documentation generation.

1113
oliyiyi posted on 2015-10-11 07:53:08
Practical datasets: The only way to learn data science is by doing it. There’s a big leap from the
typical teaching datasets to the real world. PDSwR strikes a good balance between the need for a
practical (simple) dataset for learning and the messiness of the real world. PDSwR walks you
through how to explore a new dataset to find problems in the data, cleaning and transforming
when necessary.
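
As a rough illustration of the kind of first-pass exploration described above, here is a minimal sketch. It is written in Python with only the standard library rather than in R (the book's language), and the toy data, the column names, and the helper summarize_problems are all hypothetical, not taken from PDSwR. It only counts missing entries per column and flags non-positive numeric values; a real dataset would need checks chosen for its own domain.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data: column names and values are made up for illustration.
RAW = """custid,age,income,state
1,34,52000,CA
2,,61000,WA
3,0,-4000,CA
4,29,NA,
"""

# Assumption: empty strings and the literal "NA" both mean "missing".
MISSING = {"", "NA"}


def summarize_problems(text):
    """Count missing entries per column and flag non-positive numeric values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = Counter()
    suspicious = []
    for row in rows:
        # Tally missing markers column by column.
        for col, val in row.items():
            if val in MISSING:
                missing[col] += 1
        # An age or income of zero or below is treated as suspect here.
        for col in ("age", "income"):
            val = row[col]
            if val not in MISSING and float(val) <= 0:
                suspicious.append((row["custid"], col, float(val)))
    return dict(missing), suspicious


missing_counts, suspicious_values = summarize_problems(RAW)
print(missing_counts)
print(suspicious_values)
```

Whether a flagged value should be dropped, corrected, or kept is a judgment call that depends on how the data was collected, so the sketch reports problems without changing anything.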

1114
oliyiyi posted on 2015-10-11 08:01:50
Practical human relations: Data science is all about solving real-world problems for your
client—either as a consultant or within your organization. In either case, you’ll work with a
multifaceted group of people, each with their own motivations, skills, and responsibilities. As
practicing consultants, Nina and John understand this well. PDSwR is unique in stressing the
importance of understanding these roles while working through your data science project.

1115
oliyiyi posted on 2015-10-11 08:03:16
Practical modeling: The bulk of PDSwR is about modeling, starting with an excellent overview
of the modeling process, including how to pick the modeling method to use and, when done,
gauge the model’s quality. The book walks you through the most practical modeling methods
you’re likely to need. The theory behind each method is intuitively explained. A specific example
is worked through—the code and data are available on the authors’ GitHub site. Most
importantly, tricks and traps are covered. Each section ends with practical takeaways.

1116
oliyiyi posted on 2015-10-11 09:34:22
The figure on the cover of Practical Data Science with R is captioned “Habit of a Lady of
China in 1703.” The illustration is taken from Thomas Jefferys’ A Collection of the
Dresses of Different Nations, Ancient and Modern (four volumes), London, published
between 1757 and 1772. The title page states that these are hand-colored copperplate
engravings, heightened with gum arabic. Thomas Jefferys (1719–1771) was called
“Geographer to King George III.” He was an English cartographer who was the leading
map supplier of his day. He engraved and printed maps for government and
other official bodies and produced a wide range of commercial maps and atlases,
especially of North America. His work as a mapmaker sparked an interest in local
dress customs of the lands he surveyed and mapped; they are brilliantly displayed in
this four-volume collection.

1117
oliyiyi posted on 2015-10-11 09:36:42
The data scientist is responsible for guiding a data science project from start to finish.
Success in a data science project comes not from access to any one exotic tool,
but from having quantifiable goals, good methodology, cross-discipline interactions,
and a repeatable workflow.
This chapter walks you through what a typical data science project looks like:
the kinds of problems you encounter, the types of goals you should have, the tasks
that you’re likely to handle, and what sort of results are expected.

1118
oliyiyi posted on 2015-10-11 09:55:55
In defining the roles here, we’ve borrowed some ideas from the “surgical team”
perspective on software development in Frederick Brooks’s The Mythical Man-Month:
Essays on Software Engineering (Addison-Wesley, 1995), and also from the agile
software development paradigm.

1119
oliyiyi posted on 2015-10-11 09:57:26
Role | Responsibilities
Project sponsor | Represents the business interests; champions the project
Client | Represents end users’ interests; domain expert
Data scientist | Sets and executes analytic strategy; communicates with sponsor and client
Data architect | Manages data and data storage; sometimes manages data collection
Operations | Manages infrastructure; deploys final project results

1120
oliyiyi posted on 2015-10-11 09:58:26
CLIENT
While the sponsor is the role that represents the business interest, the client is the role
that represents the model’s end users’ interests. Sometimes the sponsor and client
roles may be filled by the same person. Again, the data scientist may fill the client role
if they can weigh business trade-offs, but this isn’t ideal.
The client is more hands-on than the sponsor; they’re the interface between the
technical details of building a good model and the day-to-day work process into which
the model will be deployed. They aren’t necessarily mathematically or statistically
sophisticated, but are familiar with the relevant business processes and serve as the
domain expert on the team. In the loan application example that we discuss later in
this chapter, the client may be a loan officer or someone who represents the interests
of loan officers.
As with the sponsor, you should keep the client informed and involved. Ideally
you’d like to have regular meetings with them to keep your efforts aligned with the
needs of the end users. Generally the client belongs to a different group in the organization
and has other responsibilities beyond your project. Keep meetings focused,
present results and progress in terms they can understand, and take their critiques to
heart. If the end users can’t or won’t use your model, then the project isn’t a success
in the long run.
