Behind Data Science

An interview with David Purdy

March 31, 2020 Kit Feber & David Purdy Season 1 Episode 1

In the debut episode of the Behind Data Science podcast, Kit Feber of Big Cloud sits down with Chief Data Scientist and Advisor to Startups and Founders, David Purdy. David's experience includes, but is not limited to, R&D in government and industry, relating to spatiotemporal forecasting, quantitative finance, personalised medicine, natural language processing, information retrieval, and other applications of statistics, especially machine learning. Contact Big Cloud via hi@bigcloud.io

spk_1:   0:00
Hello, listeners. Welcome to Big Cloud's machine learning and data science podcast. I'm Kit, one of Big Cloud's co-founders. For those of you that don't know, Big Cloud is a global recruiting firm. We hire data science and machine learning talent for exciting tech companies in Europe, Southeast Asia and North America. In this podcast, you can expect to hear leading minds in data science and AI talking about their field, discussing topics exciting and accessible to all listeners. Welcome to the podcast, and I'm very pleased to introduce the guest on this episode, David Purdy. Hello, David.

spk_0:   0:45
Hi, Kit. Thank you very much for having me today.

spk_1:   0:48
It's a pleasure. David, where are you based, for the benefit of the

spk_0:   0:51
listeners? So I'm in the San Francisco Bay Area and, uh, have been working both in Silicon Valley and San Francisco for a number of years.

spk_1:   1:00
And could you give people an overview of your background, David?

spk_0:   1:04
Sure. Thank you very much. So I started a little over 20 years ago in what is now called data science. I began my career at the National Security Agency, but I've worked in everything from natural language processing and web search to high-frequency trading and medicine, and led and started a variety of teams at Uber, and also have worked in autonomous vehicles. More recently, I've been looking to advise a variety of companies and founders on how to optimize and develop their data science strategy, using a lot of the experiences and insights I've gained over this time: how do you form the team? How do you set the goals? And then how do you accelerate and reach the highest possible velocity in generating ideas that may work, validating them and deploying them? So really, it's: how do you achieve the speed of light in your technical development?

spk_1:   1:54
We're gonna talk about getting a strategy right behind data science as well on the pod. David, for the benefit of the people listening, could you give us some insight into your inspirations, um, and people who have inspired you over the years?

spk_0:   2:08
Sure. So first off, in beginning my career at the NSA, one thing that is very notable is the history of work in urgent contexts to develop the talents and tools to achieve research breakthroughs. And this is notable at Bletchley Park, and then, looking back even further, Thomas Edison, one of the original founders of an industrial research laboratory, and subsequently work, ah, by Oppenheimer as well as the Apollo program. In each of these cases, there was something unprecedented that had to be achieved, a real sense of urgency, and very fascinating problems with fascinating and talented teams. And so it's a real joy when you can work on something that's unprecedented and hard, and also have to think very carefully: how do we get to the end results as quickly as possible?

spk_1:   2:59
I think a lot of people may have seen The Imitation Game, David, and I think the Bletchley project is dear to many people's hearts in the UK. But could you just give people insight as to why Alan Turing and Bletchley Park have interested you?

spk_0:   3:13
Sure. So for those who've seen The Imitation Game, it's spelled out fairly quickly early on, which is that the idea of approaching something as monumental as breaking the Enigma code using just human means is impossible, and something that Turing and others really recognized is that you have to create a platform for being able to test ideas and to deploy those algorithms. So in this case, they were testing various keys, I guess, for code breaking, and then iterating on that. But you also have to align the teams around that, and as a result, they were able to do something that had never been done before. It's, ah, it's a great example, if you will, of how a platform can enable substantial breakthroughs. And I highly recommend the movie.

spk_1:   4:03
It is quite a good point to make a distinction, David, about the difference in data science between the problems you're trying to solve. Could you just explain to listeners about a platform versus a singular model versus analysis?

spk_0:   4:16
Sure, sure. So the thing about data science is it's often presented as if data science is ambiguous. The reality is you have different planes of products or deliverables. A lot of data science has been about delivering insights and analytics to leadership on corporate strategy, industrial strategy, organizational strategy, this kind of thing. And for that you may leverage models, you may leverage a variety of statistical techniques. Another type of deliverable is the actual model that may go into production somewhere, that's deployed for all sorts of purposes. It could be for spam detection, it could be for driving a car. But in order to support these efforts, deployments of models, deployments of experimental frameworks, it's very, very empowering to have a platform. So you have to think very carefully: what are we developing? Are we developing something where the data scientists are the users, where the consumer is a computer, or where the consumer of the product is someone who's charged with making strategic decisions? And once you identify those goals, then you can work backwards to: how do we develop the practices and tools to move quickly?

spk_1:   5:29
And I think it's fair to say, knowing you, David, that you've built both platforms and also trained and developed thousands of models across your career. Is that correct?

spk_0:   5:42
That's right. In fact, all three of these avenues I've worked on. I have developed reporting systems and, um, processes for CEOs, COOs and CFOs, and, of course, have deployed thousands of models in different contexts, and have, in leveraging those experiences, designed tools and frameworks first for myself and then, uh, designed frameworks and systems for use by hundreds of data scientists so that they could also deploy thousands of models. And in the end, these have led to billions of dollars of incremental revenue, um, as well as greatly accelerated the process of bringing things to market.

spk_1:   6:23
And what would be a good example within your career, David, of where you've been working on trying to achieve the speed of light in research?

spk_0:   6:32
Sure. So I think a key example was when I worked in high-frequency trading. There the markets evolve very, very rapidly, and of course there's really no other environment where information comes in and your goal is to decide whether to act, and how to act on it, as quickly as possible. This is for the model that is in production. In addition, the markets evolve quickly. So there's this research challenge of how do we set up an environment where we're generating ideas that have a high chance of, uh, succeeding, testing those ideas very quickly, and then having virtually no friction from the development of the idea to the deployment of the idea, and then ensuring that the deployment is highly, highly reliable. In high-frequency trading, things can break very, very quickly, as a number of companies have unfortunately experienced. And so when you think about that, you want to have simple, reliable, scalable tools. You also want to have a research agenda that allows you to examine what has happened with the models, algorithms and strategies you've deployed, and then use those to generate hypotheses on subsequent innovations. For example, when I worked at Goldman Sachs, uh, the effort that I led on high-frequency trading in interest rate products was able to deploy 30 model bundles in 30 weeks. And so we were developing and testing many models and strategies, more frequently than once a week, and choosing the very best to deploy. And when we did so, we were able to show increasing returns on a daily basis. And so this was, ah, this was an example that I was very enthusiastic about. And really, as I've gone on in my career, I've wanted to be able to create those opportunities for others. It's, ah, it's exciting to have your work have an impact, and to have real clarity on what you're researching, what you're developing, and dig into really the deeper points of the system that you're trying to produce, rather than the more painful points of managing deployments or, uh, systems in production that may have unresolved issues.

spk_1:   8:49
Did the environment, i.e. the stakes being so high within trading, David, have any impact on the research agenda or the culture of research?

spk_0:   8:59
Actually, I've been fortunate: not only does trading have its own sort of relatively high stakes, but I've worked in national security, medicine, safety. Everything is important, really, in these environments. You want to have work that is, um, well, you should expect people will review your work. Really, science is about reproducibility. So if you have a system that is clunky, that is not well engineered, not well architected, well, then you're not actually going to spend as much of your time on developing high-quality research. If you have a research process that is not well organized, it's maybe not going to be as easy to reproduce. And so it's actually the totality of these experiences, realizing that I wanted to be able to create systems that were inspectable, that could be shared with others and reviewed by others, have more eyes on the problem. Uh, that's something that was very motivating.

spk_1:   9:58
And, um, are there any red flags or things that you would speak to, to highlight what you see as being inhibitors on achieving efficient, fast research and iteration, David?

spk_0:   10:11
So at a high level, I tend to think about two goals that everybody should really pursue. One's clarity, another is velocity. And the thing about clarity is: are you working backwards from the goal, are you working backwards from the strategic objective, the business objective? And then what do you need in terms of the technical stack? What do you need in terms of the team to build the technical stack? And this technical stack is: what data feeds do we need? What algorithms do we need? What infrastructure do we need? What kinds of models or loss functions do we need? All these kinds of things. And then we're building the team to build the technical stack to solve the business problem. So one inhibitor is just really lacking clarity and working in the wrong direction. So if you start off by building a team, then figure out the technical deliverables, and then try to find a, um, you know, a buyer for this thing that you've made, you're going to have a really slow go of it. The other is velocity, and velocity, as we learned in high school, is really speed and direction. Everybody has the same number of hours in the day, and you can work very hard and you can have high speed. But if you keep changing direction, or if you're not clear on the direction that you're ultimately going to go, you're just going to bounce around. And so if you think of the speed of light, it's really the fastest speed from point A to point B. And if you look back over the course of a project or an effort, you think, well, how often were we moving in this direction, and, given what we know, at what point did we ask the right questions? Did we make the right investments, and what would have guided us earlier? So it's how you set up the process that's important, and then also how you equip the process with this industrial research that will move you forward. In terms of inhibitors, one is, as I mentioned: what are you working towards? What are you working towards on business goals? What are you working towards on the technical stack? That's very important. Another is really alignment across the organization. So you have data scientists, product managers, engineers, various executives. How are they all empowered and aligned? And when you look at a company, say, like Goldman Sachs, it's an environment where there is a strong cultural push towards what I call lateral awareness, so that people can move very, very quickly and are very empowered to do so. And I think that's critical. It's not possible for one person to know everything, and it's not possible for, um, one person to solve every potential hitch along the way. So you have to think about it as a culture, you have to think about its practices.

spk_1:   12:57
And moving from Goldman's into what was a much smaller Uber than it is today, David, how did, um, how was the culture different to you, and yeah, how did the approaches differ in the relevant companies for you?

spk_0:   13:16
Well, so that's, ah, that's very interesting. So I joined Uber when it was about 1,700 employees, and it grew to 20-25,000 by the time that I left last year. And in that environment, Uber, first off, was legitimately data-first. Data is the basis of everything in terms of matching riders and drivers, pricing, routing, etcetera of trips, and then also managing the entire user experience, whether that's rider or driver, and subsequently everything from Uber Eats to Freight. Also, one similarity between the two is that Uber is a broker. It's a broker of transportation services. And, so you know, Goldman has been at that for over a century. What was important for being able to set up the right data frameworks within Uber was to recognize that you're dealing with the life cycle of a customer and the life cycle of a transaction. In terms of the differences, Uber, being much smaller, necessarily had to cobble together and leverage the best resources in and outside of Uber. Uh, that's very important. And of course it was in a massive growth phase, and in that environment, you need a very high degree of reliability for things that have never been done before. So that was certainly a very interesting situation to be in. There's a tolerance for things breaking that, just as I said, comes with things having never been done before. But it's, ah, it's a great environment.

spk_1:   15:02
David, one of the flagship projects that you worked on at Uber was Michelangelo. Could you talk about that for us, and maybe about the strategy behind it and getting that right, um, as part of the key parts of the process?

spk_0:   15:19
Sure. So when I arrived, I formed the first team of data scientists leveraging machine learning within Uber, and quickly saw that there was not a single common tool. Uh, there was no effort yet invested in making machine learning available and integrated into the products, and it was very clear that if you're going to go this route, you need this for everything from customer churn to fraud to ETA estimation; that machine learning would be critical for the company. And so I, looking back on the fact that I had developed, uh, machine learning toolkits and frameworks for myself and my teams in previous roles, pushed for having a machine learning platform. And there is one difference here, which is that this is an enterprise machine learning platform. The idea is, it's not just how do we get a single model developed, or how does a single user develop models and deploy them. It is about how do we create value across the entire company. And so there are entities, riders, drivers and trips, for instance, that many different teams have an interest in, and what you want to do is have an environment where the underlying infrastructure and, to the greatest extent, the underlying data is as reliable as possible, and the data scientists can focus on developing ideas on predictors, on models, on response variables, and investigating those as they look across Uber's hundreds of markets and many different products, and can create that value in the most efficient possible way. So rather than here's a library that a person could download and they could build models and upload them, it is: there's a resource that they can leverage, that others can leverage, and then as people develop something of value, they're putting it into this platform. So an example would be predictors on customer churn. Maybe those are useful for some form of marketing; it could be that they're used for, um, some form of product selection, what not. And so if one person has developed this, then they put it into a common feature store, and others can then leverage it. With Michelangelo, the idea was to capture that which is complex, that which needs to be reliable, and that which is really not within the expected responsibility set of a federated group of users, and put them into one environment and take out all of those frictions, all those impediments, all those things that, if you look at the course of the year, you really don't want your data scientists to have to spend time on. And for that there is a very slight friction of routing your data, routing your models through Michelangelo, but the payoff of being able to have this reliability, and the payoff of being able to use others', uh, contributions, is very, very significant.
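(A minimal sketch, in Python, of the publish-and-reuse feature store pattern David describes here. The FeatureStore class and its method names are hypothetical, invented purely for illustration; they are not Michelangelo's actual API.)

    from datetime import datetime

    class FeatureStore:
        """Toy in-memory stand-in for a shared, enterprise-wide feature store."""
        def __init__(self):
            self._features = {}  # (entity, feature_name) -> values + metadata

        def publish(self, entity, name, values, owner):
            # e.g. one team publishes churn-risk predictors keyed by rider id
            self._features[(entity, name)] = {
                "values": values, "owner": owner, "published": datetime.utcnow(),
            }

        def fetch(self, entity, names, entity_ids):
            # another team assembles a feature matrix from shared predictors
            return [[self._features[(entity, n)]["values"].get(i) for n in names]
                    for i in entity_ids]

    store = FeatureStore()
    store.publish("rider", "churn_risk", {101: 0.82, 102: 0.11}, owner="growth-ds")
    X = store.fetch("rider", ["churn_risk"], entity_ids=[101, 102])
    # X == [[0.82], [0.11]] -- reusable without re-deriving the predictor

The point of the sketch is the reuse: once one team has published a predictor like churn risk, every other team fetches the same validated values instead of rebuilding them.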

spk_1:   18:07
Would it be fair to classify it as a sort of machine-learning-as-a-service-style platform, David?

spk_0:   18:13
Well, it is, except I would put the word enterprise at the top, because there are machine-learning-as-a-service paradigms, and again, their value proposition is that they can help a user develop a model and then deploy that model, right? So Michelangelo does all of that, but it also is an environment where people who are not responsible for models can also inspect the models. If there is a model whose behavior is changing or breaking, an engineer or a product manager can go and look at various charts and diagnostics on and about the models, about the output of the models, the inputs of the models, this kind of thing. There are abilities to have alerts and warnings and so forth. And so the enterprise needs this, and the goal is really: yes, of course, you're delivering machine learning, but it is integrated into the output of the enterprise, so it's integrated into products, it's integrated into analytics, this kind of thing. And so I think that's very, very important, just realizing it's not an activity by itself. It's an activity that drives value for the company.

spk_1:   19:20
And I know, because I know you, that Michelangelo has been incredibly successful. Is it still functioning in Uber, David? Are people still using it on a daily basis?

spk_0:   19:32
To my knowledge, yes, there's actually a lot of usage. Hundreds of users and thousands of models developed and deployed with it. And, you know, it's, ah, it's very exciting. There have been significant results. So the last team that I led at Uber was the safety data science team, and they were across the entire customer and transaction lifecycle, um, and have really been able to have all sorts of results all over the world that have been very, very beneficial. And they had a lot of use of Michelangelo and its results, and the Michelangelo team was an extraordinary partner in the development of these really complex applications.

spk_1:   20:10
David, just because people might not immediately think about safety when they think about, you know, Uber or a ride-sharing company: why is the safety element so vital to Uber? And can you speak to what your team did?

spk_0:   20:23
Sure. So at a high level, safety is in the interests of everybody: riders, drivers, regulators, the public, employees, investors. And these are about rare events that are not anticipated in general. And so what you want to do is try to make an environment that is both measurably safer and, uh, from the experience and perception of users, you can commit to or facilitate a stronger sense of safety. And so there have been incidents around the world, in fact, from London to the United States to other countries and municipalities, um, requests of Uber to report on and address safety issues, and Uber had an amazing safety report released at the end of last year. To my knowledge, no other company has ever been as transparent about the nature and volume of incidents that have occurred in their ecosystem.

spk_1:   21:30
Could you just give us a sense of how big Uber's kind of global safety team is? Or is that an impossible question?

spk_0:   21:37
I guess I would put it as: there are hundreds of people who consider it a primary responsibility, but really thousands of people who are thinking about it and contributing to it, and many, many partners around the world, lots of nonprofit organizations, government organizations that are involved. And so it's truly hard to say. Um, and that's actually one of the things that was really amazing. I began my career at Uber thinking I'm going to develop a team that develops machine learning, and by my fourth team, I was with a wonderful team of people from around the world involved in safety operations. And that's truly remarkable in that, as I said, there are these things that happen at Uber. That's more than five billion trips per year. You have these interpersonal moments where, the vast majority of the time, everything is just as expected. Very occasionally there are issues that arise, and working with these folks from many different backgrounds, whether it's electrical engineering or consulting or operations research or psychology, and developing systems and processes and data science that could support these goals, was very, very exciting. And so I'd like to say that what we were doing was making rare events rarer, and, ah, as I mentioned earlier about the product goals, we were delivering on platforms, algorithms and analytics, and all of these are absolutely crucial.

spk_1:   23:13
And I have to ask, because we've spoken about this off air: you interviewed a huge number of aspiring Uber data scientists in your tenure, and obviously not all of them were successfully hired. But could you just explain, for listeners who are maybe, you know, interested in securing a job at a top tech company within data science, um, a little bit about how you interviewed people at Uber, and yeah, any insider tips for people to get a job there?

spk_0:   23:42
Sure, sure. So there's a number of things that are important. First off, all of this is a growth process. I was fortunate throughout my career to have a number of mentors, so having a mentor and getting feedback is very valuable. Another is to just continue to understand the problem space that you're working on. The technical stack is around methods, tools, infrastructure, code, all of these things, but really you're working towards: how do you have an impact? And so that means learning the business as well as the technology. Find those opportunities to work with engineering, with product, with folks working on marketing or even design, user experience, finance. Get inside their heads. What questions are they asking? What problems are they seeking to be insightful on, and working on strategy, getting involved in the nuts and bolts? So reliability tooling, at first, it's no fun, but the reality is that when things break, it's often in these, uh, foundational things. So instead of just sitting apart from folks, dive in and learn from your partners in engineering and elsewhere. And on top of this, working on writing and speaking is very important. You have to get used to explaining things simply. So listen to the people who can, and try to explain things to people who aren't like you. If you make an assumption that a person has a strong understanding of the topic that you're presenting, you're not stretching yourself, you're not trying to get to where they're coming from, and in some sense you're not necessarily mastering the material that you're trying to communicate, because you're already assuming people understand it. It really goes from there, depending on the path the person wants to pursue. There's a lot of, uh, different technical and organizational considerations, but I'd start with realizing that you're trying to learn and communicate with others.

spk_1:   25:50
That's really useful, David, and I know that we've spoken in the past about maybe there's a kind of gap emerging between, um, being able to design and build models versus actually deploying them and kind of making them work in production. Are you seeing that as a trend in the market, that employers are wanting data scientists that have almost, you know, software-engineering-level coding skills?

spk_0:   26:16
Yes, and that's actually the wrong mindset. So, really, if you take a data scientist who hasn't done much software engineering, and you say, look, we're going to put you on the critical path for a key release, and so what you're going to do is you're going to learn a bunch of things you've never done before, you're going to try and make them work, it's gonna be lightly reviewed, and then it's gonna be pushed out. Um, and you know, you have no experience with the whole life cycle of deploying code, of monitoring, of debugging, uh, but that's on you, even if it's your first time. That's, ah, that's not a good place to start. And that's sort of the first response, uh, the first generation that starts with: well, you have somebody developing models, and then they throw things over the wall, and somebody else re-implements them and deploys them, right? So you're moving some of the work back to the data scientist. Instead, really, and Google, Uber and others have noticed this: use your engineers at what they're really good at, building reliable, scalable systems. Think about what are those things that need to be done when you have somebody engaged for thousands of hours, and many people engaged for tens or hundreds of thousands of hours, on these algorithmic deliverables. So a lot of it around the release process can be simplified, instrumented, automated, and just as that's done for software engineering and software releases, it can be done for model releases.
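(A minimal sketch of what instrumenting and automating a model release might look like, analogous to CI gates for software releases. The release_gate function, the thresholds and the deploy() call below are hypothetical illustrations, not Uber's actual release tooling.)

    import time
    from sklearn.metrics import roc_auc_score

    def release_gate(candidate, baseline, X_val, y_val,
                     min_auc_gain=0.0, max_latency_ms=50.0):
        """Run the same automated checks for every candidate model release."""
        checks = {}
        cand_auc = roc_auc_score(y_val, candidate.predict_proba(X_val)[:, 1])
        base_auc = roc_auc_score(y_val, baseline.predict_proba(X_val)[:, 1])
        checks["beats_baseline"] = cand_auc >= base_auc + min_auc_gain

        start = time.perf_counter()
        candidate.predict_proba(X_val[:1])  # single-row scoring latency
        checks["latency_ok"] = (time.perf_counter() - start) * 1000.0 <= max_latency_ms
        return all(checks.values()), checks

    # Hypothetical usage: deploy only when every check passes.
    # ok, report = release_gate(new_model, current_model, X_val, y_val)
    # if ok:
    #     deploy(new_model)  # deploy() stands in for real release infrastructure

Because the same checks run identically on every release, data scientists get the reliability of an engineered pipeline without owning the deployment machinery themselves.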

spk_1:   27:47
That's really interesting. We're seeing more of a demand for the "full-stack data scientist", in inverted commas, or requests where people have written, you know, production C++ or Python coders, you know, within a data science position. So it's interesting to hear you talk about that. David, do you have any other words of advice or input for people that are within the data science field looking for a job at a bigger tech company?

spk_0:   28:17
Well, so regarding the production Python and C++: it doesn't mean that you can remove programming from the requirements. It is the requirements around the deployment process that really need attention in terms of the work of data scientists. It's a very reasonable expectation that a data scientist can implement mathematical code, uh, in any given language. It is not hard to pick up. Once you've worked in one language, you'll see a lot of patterns that can be translated to others, and that's very reasonable. It is more about the composition of the code, the relationships of different services, of different, ah, infrastructure pieces, that, if not well architected, will just lead to a lot of misery for everybody, frankly. Um, in terms of other advice, really try different things. Try during the downtime. Try to think about what you could do more quickly. Try to think of how do you set up a research agenda, looking out weeks, months, quarters, to solve different problems, and then working with stakeholders to ask: what do we need to do? Work backwards from those goals to investigate different models, different sources of data, et cetera. And then how do we develop the tools so that you can generate ideas, test them, and then deploy them, and you're on your way to solving problems?

spk_1:   29:44
Thank you, David. And I was gonna ask you about deep learning specifically, because, um, it's all anyone talks about at the moment in this field. What's your view on it, David? Are you seeing neural networks actually being fit for purpose and solving lots of problems, or do you think it may be a little bit overhyped? And yeah, could you shed some light on it for us?

spk_0:   30:07
Sure, absolutely. So, um, it is an area that I would highly recommend, but it's important to think of: how do you ask questions of data here? If you're thinking, I have ideas, I throw them against the wall to see what sticks, you can do that with deep learning, you can do that with traditional machine learning. That's not asking questions. That's not really developing a research agenda. Uh, with deep learning, absolutely, in work at teams that I've led at both Uber and in autonomous vehicles, uh, this has been a key component of vision-based systems. So there are a lot of traditional approaches in computer vision. These have come into different measures of utility and quality and so forth, and sometimes you can develop a product fairly quickly using these standard libraries, uh, drop predictors extracted from them into different machine learning models, and you're sort of off to the races. The thing is, these tools in traditional computer vision aren't necessarily optimized for your prediction goal, your inferential goal. Um, by incorporating deep learning, you're working more closely with the source material, images, videos, text, this kind of thing, and working directly towards the prediction or the inference problem that you're trying to tackle. It does take time. It's also very useful if you have, uh, somebody that you can consult. So at Uber, for instance, the Uber AI Labs team is an outstanding partner to, uh, program and product teams throughout the company as they're adopting deep learning. So they have a number of experts, and they provide consulting and support capabilities. It does take time to develop that awareness. But overall, uh, if somebody's already committed to data science and machine learning, then I would definitely say seek to go further.
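(A minimal sketch of the two approaches contrasted here: extracting predictors from a standard pretrained vision model to feed into a traditional model, versus training end-to-end toward your own goal. The choice of PyTorch/torchvision and ResNet-18 is an assumption made for illustration, not a description of any Uber system.)

    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    # Standard pretrained backbone; replacing the final layer with Identity
    # turns it into a generic feature extractor instead of a classifier.
    backbone = models.resnet18(pretrained=True)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    preprocess = T.Compose([
        T.Resize(224), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def extract_features(pil_images):
        # PIL images -> (n, 512) matrix of generic visual predictors,
        # ready to drop into a traditional model (GBM, logistic regression, ...)
        batch = torch.stack([preprocess(img) for img in pil_images])
        return backbone(batch).numpy()

    # Working "directly towards the prediction problem" instead means unfreezing
    # the backbone and training it end-to-end on your own loss, e.g.:
    # for p in backbone.parameters():
    #     p.requires_grad = True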

spk_1:   32:07
And you mentioned consulting there, David. Obviously, that's a focus of yours at the moment. How are you finding life as a consultant, David? Have you encountered interesting companies? Do you think it's the right kind of path moving forward?

spk_0:   32:23
Uh, so this is something that, I've been approached by a number of friends and companies over the years for advice on X or Y or Z. And if they're developing, if they're starting with, let's say, machine learning platforms as a service, given the work I've done at Uber, that was something that was very natural. And again there, when starting the Michelangelo team, I filled in as the product manager, speaking as the voice of the customer, i.e. data scientists: here's what we need to develop, right? And so that's an environment that is exciting, because it's about not just solving the problem within one company, so within one enterprise, but developing systems for the benefit of obviously as many companies as they can attract. And that's exciting. I've worked in a number of different industries, and so I would love to help folks benefit from that. There are various companies that are further along in their development of data science organizations and strategy, and reaching that point of: how do we improve the cadence? How do we improve the velocity? How do we improve the career growth for scientists? All these kinds of things. And on that, again, uh, I'm excited to help them out. So it goes everything from very technical to very strategic and leadership, uh, interests along the way. There are also some fascinating applications and startups, in everything from consumer-oriented to SaaS-focused ventures, that I'm advising, and I'm very excited about how these will be able to help people.

spk_1:   34:00
I was gonna ask you about interesting applications of machine learning. You've obviously got your finger on the pulse, David. Yeah, any problem spaces or challenges in machine learning that you think are particularly exciting over the, you know, next couple of years?

spk_0:   34:16
Yes. I can't quite get into some of these areas, given that I want to help some of these individuals and these organizations really get at them. But if you think about it, there are verticals, so this is industry, if you will, and there are activities, so something that may be common to many different companies regardless of their industry, that are not leveraging data much, much less machine learning, much less various advances in interfaces at all. It's not even that we're talking about going from, like, version seven to version eight. It is that there are just key holes in a lot of different, uh, spaces where companies exist, either to help other companies or to help consumers. And this is very, very exciting, from a standpoint of taking a large space and then decomposing it. Instead of thinking, OK, I want to make a known thing better, just finding massive holes is very exciting. That was something that happened for me at Uber: there was no tooling for data scientists. It isn't that I wanted to build a better mousetrap on machine learning platforms; there was no tooling. Similarly, there had been no system for predicting in real time the demand and supply around the world. So I moved, after the Michelangelo team was up and maturing, over to start the real-time forecasting. And so there are some exciting things where there are entire industries, just as Uber filled a gap in transportation, there are entire industries that are really not leveraging data and machine learning. And then there are a lot of activities done by people that really are fairly primitive, and, uh, I think we can do a lot to help them either work or, ah, engage with other companies or other people much more efficiently. That's as far as I can get into the specifics, but I think that there's really a lot of very foundational stuff that has yet to be touched.

spk_1:   36:21
Um, we are increasingly talking to candidates that are interested in working on machine learning or AI, but for good, or for the social impact, David. Are you seeing a trend for that in your network, or is it becoming a hot topic, as it were?

spk_0:   36:38
Honestly speaking, this is something that has been around for a very long time. How do you leverage it? In fact, this has motivated a lot of statistics over the course of more than a century. How do you improve everything from agriculture to education to medicine to social services? And how do you develop the right quantitative awareness? How do you develop the right forecasting? Because in these environments there are, uh, real consequences for individuals. Then there's also a real shortage of high-quality data, of actionable data, of talent. And so I think having more minds on this is very, very valuable. And there's a lot of, um, sort of first-world problems that people want to solve through clever advances in machine learning, but there's a lot of, uh, social issues that really are unaddressed across the board. And it's not just by data science and machine learning, but I think that this is one of the more stimulating and rewarding, uh, areas one could go into.

spk_1:   37:46
And I wanted to ask you about, um, conversion courses or online machine learning courses, Coursera or Udacity, for example. Being someone who's been in machine learning for over 20 years, what's your perception of them? And are they a good avenue for people to go down that are looking to move into machine learning, would you say?

spk_0:   38:09
Certainly they give you a lot of ability to work at your own pace and autonomously. So there's no question, there's no competition for having that flexibility, that flexible path for learning something. But at the end of the day, you're trying to develop a talent for solving problems, right? So focusing too much on, uh, what in the past people would call book knowledge is at the expense of not necessarily understanding how to solve a problem that you can't just look up in a book. And that's, um, what I would say: think about projects, think about opportunities to collaborate, think about hackathons, this kind of thing. Um, even if you're starting with something that has a very humble but time-limited goal of a project, whether it's a hackathon and it's just a couple of days, or a project that runs for a couple of weeks or a month, I think that's important. And you'll quickly reveal all of the things that you realize you didn't know, right? And so as you, um, look at that, it goes back to: what is the technical stack? What do we need to deliver that technical stack? And then, of course, just getting more fluency with your stack. So instead of just watching videos about, say, SQL, well, have a database, right? And then leverage it in some sort of an application. Um, that's, I think, very, very important. And you can tell quickly if a person has a lot of experience in, uh, and the mindset for solving problems, or if they're gonna need a lot of guidance on how to use the tools they already know in order to solve the problem.

spk_1:   39:47
Interesting. So I guess the message there is, you know, applying the learning and getting in the weeds and breaking stuff is the key part?

spk_0:   39:57
Absolutely. So use the resources, but they shouldn't be your only, uh, investment of time.

spk_1:   40:01
Have you read any interesting books recently, David?

spk_0:   40:04
So, I'm actually reading some works on Renaissance architecture lately. Uh, but, uh, I try to keep up with a lot of different aspects of data science and machine learning. But I guess I'm thinking a lot about how people have communicated their learnings and experience in the past, that's, ah, over the years. And so that's near and dear to my heart.

spk_1:   40:34
Interesting. Any data science, uh, machine learning books that you hold in particularly high regard?

spk_0:   40:40
Well, I've always loved, uh, The Elements of Statistical Learning, um, by Hastie, Tibshirani and Friedman. That's, ah, that's wonderful. Also, uh, the book Deep Learning, by, uh, Ian Goodfellow, one of the authors, is another great text. Honestly, a lot of what really interests me comes down to practice, and what folks have learnt from their practices, and for that I've just collected readings, um, from Google to, you know, decades or centuries ago. Uh, a couple of books that I also like are Advice for a Young Investigator. Uh, it's quite dated; it was written in the late 1800s, so its comments on social, ah, social behaviors and norms and so forth are a little, are more reflective of its time than the current year. But its advice on best practices for junior scientists is outstanding, and there are not a lot of books in its domain. I think that's a good book. And then another book that I've often shared with colleagues, um, ah, engineer colleagues, is called The Unwritten Laws of Engineering. And it starts with the premise that when you begin your career, you master the technical, um, expectations of your role, but nobody teaches you how to work with others, and it has a lot of precise and useful advice on how to manage projects, work with others, execute, communicate, etcetera.

spk_1:   42:28
They all sound very interesting. I think we'll link those books with the pod really soon, so listeners can get their hands on them, David, and have a good read. I just wanted to ask if you wanted to summarize or, um, yeah, go back to any key points, David, that you think listeners should pay particular attention to before we wrap up.

spk_0:   42:52
Well, in general, you're going to try to solve a lot of problems in this world. And so think about, as you're doing this, not only the issue that you're tackling, but reflect on it over time: what could you have done better, both in the process, in the algorithms, in the data used? Be humble. It's better to make many small iterations that overall are marching towards some goal than to take forever, especially earlier in your career, on something where you're not going to see the outcome for a long time. You're going to learn a lot along the way. And so really try to find the people that can give you feedback, and try to take humble, incremental steps, but always reflecting on what you could do better.

spk_1:   43:35
David, it's been a pleasure. I know how much you know and how much we have to talk about in this field, so I'm gonna drag you back onto the pod again in the near future, where we can cover some other interesting areas. But for now, thank you very much.

spk_0:   43:50
Thank you, Kit. It's always a pleasure.