Joho the Blog - Let's just see what happens

November 1, 2016

[liveblog][bkc] Paola Villarreal on Public Interest in Data Science

I’m at a Berkman Klein Center lunch time talk by Paola Villarreal [twitter: paw], a BKC fellow, on “Public Interest in Data Science.” (Paola points to a github page for her project info.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.


Public interest, she says, is the effecting of changes in social policies in the interest of the public, especially for the underdog. Data science extracts knowledge and insight from data in various forms, using math, statistics, research, info science, and computer science. What happens if you put data and tech in the hands of civil liberties orgs, human rights activists, media outlets, and governments? How might this affect liberty, justice, equality, transparency, and accountability?


She is going to talk about the Data for Justice project, which is supported by the Ford Foundation, the ACLU, and the Mozilla Foundation. The aim is to empower lawyers and advocates to make data-supported cases for improving justice in their communities.


The process: get the data, normalize it, process it, analyze it, visualize it … and then socialize it, inform change, and make it last! She cautions that it is crucial to make sure that you’ve identified the affected communities and that they’re involved in generating a solution. All the stakeholders should be involved in co-designing the solution.
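The get/normalize/analyze steps of that pipeline can be sketched in miniature. Everything below (county names, columns, numbers) is invented for illustration and is not the project's actual data:

```python
from collections import defaultdict

# Hypothetical records: (county, total drug convictions, convictions that
# relied on evidence handled by one analyst). All values are invented.
raw = [
    (" Suffolk", 120, 40),
    ("suffolk ", 80, 15),
    ("Essex", 150, 30),
    ("Essex", 50, 5),
]

# Normalize: trim whitespace and unify capitalization before aggregating.
totals = defaultdict(lambda: [0, 0])
for county, convictions, tainted in raw:
    key = county.strip().title()
    totals[key][0] += convictions
    totals[key][1] += tainted

# Analyze: the share of each county's convictions that relied on the
# questionable evidence.
shares = {c: t / n for c, (n, t) in totals.items()}
print(shares)
```

The normalization step matters: without it, "Suffolk" and " suffolk " would be counted as different counties and the aggregate shares would be wrong.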


Paola talks about the Annie Dookhan case. Dookhan was a chemist at a Massachusetts crime lab, who falsified evidence, possibly affecting 24,000 cases. Paola shows a table of data: the percentage of adults and juveniles convicted in drug cases and those whose evidence went through Dookhan. It’s a very high number: in some counties, over 25% of the drug convictions used possibly falsified data from Dookhan.


She shows a map of Boston that shows that marijuana-related police interactions occur mainly where people of color live. She plays a clip from marijuana,justiceos.org.


She lists her toolkit, which includes R, Stata, PostGIS, Ant (Augmented Narrative Toolkit), and Tableau.


But what counts is having an impact, she says. That means reaching out to journalists, community organizers, authorities, and lawmakers.


She concludes that data and tech do not do anything by themselves, and data scientists are only one part of a team with a common goal. The intersection of law and data is important. She concludes: Data and tech in the hands of people working with and for the public interest can have an impact on people’s lives.


Q&A

Q: Why are communities not more often involved?


A: It’s hard. It’s expensive. And data scientists are often pretty far removed from community organizing.


Q: Much of the data you’re referring to are private. How do you manage privacy when sharing the data?


A: In the Dookhan case, the data was impounded, and I used security measures. The Boston maps showing where incidents occurred smudged the info across a grid of about half a mile.


A: Kate Crawford talks about how important Paola’s research was in the Dookhan case. “It’s really valuable for the ACLU to have a scientist working on data like this.”


Q: What happened to the people who were tried with Dookhan evidence?


A: [ann] Special magistrates and special hearings were set up…


Q: [charlie nesson] A MOOC is considering Yes on 4 (marijuana legalization ballot question) and someone asked if there is a relationship between cannabis reform and Black Lives Matter. And you’ve answered that question. It’s remarkable that BLM hasn’t cottoned on to cannabis reform as a sister issue.


Q: I’ve been reading Cathy O’Neil‘s Weapons of Math Destruction [me too!] and I’m wondering if you could talk about your passion for social justice as a data scientist.


A: I’m Mexican. I learned to code when I was 12 because I had access to the Internet. I started working as a web developer at 15, and a few years later I was director of IT for the president’s office. I reflected on how I got that opportunity, and the answer was that it was thanks to open source. That inspired me.


Q: We are not looking at what happens to black women. They get criminalized even more often than black men. Also, has anyone looked at questions of environmental justice?


Q: How can we tell if a visualization is valid or is propaganda? Are there organizations doing this?


A: Great question, and I don’t know how to answer it. We publish the code, but of course not everyone can understand it. I’m not using AI or Deep Learning; I’m keeping it simple.


Q: What’s the next big data set you’re going to work on?


A: (She shows a visualization tool she developed that explores police budgets.)


Q: How do you work with journalists? Do you bring them in early?


A: We haven’t had that much interaction with them yet.


October 25, 2016

[liveblog] Tim Wu

Tim Wu [Twitter: superwuster] is giving a talk jointly sponsored by the Shorenstein Center and the Berkman Klein Center. His new book is The Attention Merchants.  He is introduced by Erie Meyer, a Shorenstein fellow this year.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Tim begins by noting that he was at the Berkman Center at its beginning, when it was pretty much just Charlie Nesson and Jonathan Zittrain.

He says that his new book is the “history of a business model”: the re-sale of human attention. This model has “long anchored the media,” but now has “exploded into all parts of our lives.” It’s part of many business models these days. Even the national parks are selling naming rights to trails.

“Maybe a thousand times a day, something tries to get us to spend maybe a micro-second” to notice something. “The deepest ambition of the book is to say that this is having an effect on the human condition.” He points to the casino effect where you get distracted by links and an hour later you say, “What just happened?” He’s concerned about a model that has us taking our attention away from people and our surroundings and into a commercial space.

The book is a history, he says. “Newspapers once upon a time were not a mass media.” In 1830 NY’s biggest paper’s circulation was 2,000. Papers were expensive. So Benjamin Day — “the first attention merchant” — lowered the price of his paper to a penny and covered a broader range of topics, “human interest stories for a mass audience.” E.g., the first story in his paper, The NY Sun, was about tragic lovers. He was selling his audience to the advertisers.

“We’re in a time when we’re almost addicted to free stuff — free content, free services.” But people have begun to realize that we are then the product. “What’s being resold is something very scarce: human attention.” And as food, shelter, clothing, etc., are abundant, so the scarce things become even more valuable. We have 168 hours in a week, and that is one of the last scarce resources. “The models of free are scrambling to get at that resource.”

ERIE: You say in the book that trash-talking grabs our attention.

TIM: Many of the current techniques are quite old. E.g., trolls. The NY Sun attracted competitors, including the NY Tribune. The Trib got attention by picking fights with other newspaper editors; its editor was the first troll. It worked. “We’ve seen recently that you can run an entire campaign just by insulting people.” The Sun fought back with even more salacious stories. E.g., it reported that a scientist had discovered life on the moon, including trees, horse-like animals, and man-bats. They never retracted it.

ERIE: As you point out, one of them grabbed attention by being pro-Abolition, which caused the others to become rabidly anti-Abolition.

TIM: The book doesn’t totally condemn that attention-seeking model, but it warns about its tendency to run to the most lurid content. This makes for constant ethical problems.

ERIE: You talk about the Oprah model…

TIM: “Oprah Winfrey is one of the great innovators in this area.” She was a fully integrated celebrity, production company, advertising company, and TV network, all in one. She created product endorsements that drove a lot of advertising. She also married the appeal of ministry (salvation, forgiveness, transcendence) and commercialism. By 1995, she was making more money in entertainment than anyone else, and she gave rise to celebrities who are themselves attention merchants, e.g., Martha Stewart and Donald Trump: the celebrity builds her/his own media empire. Tim expects this to be the future.

One of the subtexts of the book, Tim says, is that the value of human attention was not widely recognized until the 20th century, except by organized religion. The entities interested in what you spent your time doing, before the 20th century, were organized religions that wanted you praying, going to church, and in various ways keeping God on your mind. In some ways, Tim says, the story of the book is the story of government and business figuring out that this is a valuable resource. The government realizes it when it sees it can raise an army through propaganda. Industry, after government, realizes it can sell products if it has public attention.

ERIE: Can you talk about micro-celebrities?

TIM: There’s a fascinating change in celebrity. (Tim name-checks me for the line “In the future, everyone will be famous to 15 people.”) And reality TV offers the lottery of fame to anyone. This has some consistency with the American dream: everyone can have their own land and be sort of wealthy. “We have this idea that everyone can be famous.” The negative side is that in fact the disparities remain: it’s extremely hard to become famous, and the pursuit of it leads to empty lives. “It’s not like you write something and people read it.” The main reason is biological: “The default setting of our brain is to ignore everything.”

You can control attention to some degree, but it’s always darting around, and you can really only attend to one thing at once.

ERIE: You say the first ad blocker was a remote control…

TIM: In the 1920s, Zenith was a maverick company. The head of it (“The Commodore”) thought commercials had ruined radio. He had his engineers work on ad-blocking technology for TV. They came up with the remote control. Originally it was a gun so you could shoot out the commercials. There have been other revolts. In Paris, there was a revolt against posters; advertising there is still restricted to certain areas. We may be in another such period now. (He mentions the Brave browser, which blocks ads from the get-go.) “I believe in the power and legitimacy of revolts.”

ERIE: “You’ve said that if you have a mission in life, it’s to fight bullies.” What should aspiring entrepreneurs do?

TIM: I struggle with this. “A lot of people who have gone into tech have been very idealistic people.” The pay-for-content models haven’t worked so well. One chapter tells the story of decision-making at Google. At one point, it was bleeding money and didn’t know what to do, so they thought about advertising. But in 1996 Larry Page had written a manifesto that declared that advertising-funded search engines will always be biased and will never serve the interests of people. But Google thought it could square the circle with Adwords: a form of advertising that made the product better and didn’t bother people. That was true at the beginning. If an ad showed up, which usually didn’t happen, it’d be useful to you.

But the demands of the ad-based model have increased. The longer it goes on, the worse it gets. They’ve increasingly blurred the lines between the organic results and the ads. Google Maps shows us things and it’s sometimes unclear why. Most of the major platforms haven’t gotten much better for consumers over the past few years, but have gotten better for advertisers. A developer said, “The best minds of our generation have gone to getting people to click on ads.”

Tech is a key driver these days, he says. “Which has changed your life more? Government or tech?” I wish Google had considered a different kind of corporate form or model. “I give Wikipedia a lot of credit for going non-commercial. I give even more to the original creators of the Internet who just built it and put it out there.” E.g., the creator of email didn’t look for a business model. Likewise for the creators of the Internet Protocol or the Web.

ERIE: Have you ever clicked on an ad on purpose?

TIM: I think yes. I think I wanted to buy those razors.

Q & A

Q: Two positive examples: FB put out the call to register to vote. Services raise money for worthy causes.

A: Yes. Gathering up attention for some purpose isn’t inherently good or evil. The book argues for carving out quiet spaces, but I believe in the Habermasian public sphere.

Q: Platforms can abandon ads but show us content based on who pays them. How can we rebel against what we can’t see?

A: Ad-blockers are not the most sustainable form of rebellion. I’ve decided that my attitude that I should never pay for anything on the Web came from my adolescent years. You have to support the content you like. “There’s a difference between buying and supporting.”

Q: How about “The Society of the Spectacle”? And Kevin Kelly’s 1,000 True Fans theory?

A: Paid models support a much broader variety of content. Ad models require the underlying content to be, in general, mass content. That’s one of the reasons that TV has gotten better over the past fifteen years. Ad-supported TV drove to the middle. TV now gets 50% of its revenue from non-advertising.

Q: What’s been your hardest struggle to regain control of your attention?

A: All books probably come from a personal place. Control of attention is a struggle for me. One of the places I decided I needed to write this book was during a 10-day solo trip in the Utah desert. Time seemed to pass in very different ways. An hour could feel like a week. I felt like the modern regime was making me lose control. I like the Web, but I found I didn’t like the way I’d spent my time. I wish I’d spent time on activities I’d consciously chosen. I like Chapter 3 of J.S. Mill’s On Liberty: life is a matter of autonomy and self-development, and you need to make decisions that are yours.

Q: Is your book a manifesto for policy change, or a self-help book?

A: Can I have a third option?

Q: Are there policy implications?

A: I struggled with how much to make this legally prescriptive. Should I end the book with policy proposals? I decided not to, for a number of reasons. One had to do with craft: those last chapters of policy prescriptions, after a book covering 200 years, are usually pathetic. It’s very hard to regulate well. A lot of it has to do with how people conduct their lives. Policies aren’t sensitive to individual situations. I have complex feelings about it and didn’t want to cram into the book. And then people focus on those prescriptions at the expense of the rest of the book.

“If you get down to it, there is room for a new era of consumer protection” that tries to protect attention. Especially when it’s not consensual. E.g., the back of a taxi cab where you’re forced to be exposed to ads. “Non-consensual things reaching you…in law we call that ‘battery’.”

Q: Is commerce in attention span part of a democracy? People have to learn things they would not willingly learn.

A: If we perfect our filters, we may live in worlds where we learn only what we want to learn. I have complicated ideas about this. The penny press did a good job of creating the sense of a public and public opinion. But I resist the idea that to be a democracy we have to all attend to the same sources of information. “In the 19th century, America was a flourishing democracy and there was no national media,” and people lived in geographically defined filter bubbles. I don’t pine for the 1950s when everyone watched the same news broadcasts. Building one’s character means making your own information environment.


October 23, 2016

[Speculative Spoiler] WestWorld

Here’s a spoiler based on nothing. Please note that I’m never right.

The Man in Black (Ed Harris) will be revealed to be a robot. He was created by the dead co-founder for some reason, like to be the chaos principle that will drive the genetic algorithms, or some other such sciencey-sounding thing. (This would invert the Jurassic Park idea, in which the assumption that we can control nature is disproven. In WestWorld, according to my made-up spoiler, the park would have built in a principle of chaos.)

So, that’s settled.


October 14, 2016

What is it anyway?

I found this on Reddit. Can you tell what it is?



Click on the black stripe to find out: The gear that drives a lawn sprinkler


October 13, 2016

Michelle Obama speaking truth

These are words we need to hear.

I will so miss her voice. I hope she will stay where we can hear her.


October 12, 2016

[liveblog] Perception of Moral Judgment Made by Machines

I’m at the PAPIs conference where Edmond Awad [twitter] at the MIT Media Lab is giving a talk about “Moral Machine: Perception of Moral Judgment Made by Machines.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins with a hypothetical in which you can swerve a car to kill one person instead of staying on course and killing five. The audience chooses to swerve, and Edmond points out that we’re utilitarians. Second hypothetical: swerve into a barrier that will kill you but save the pedestrians. Most of us say we’d want the car to swerve. Edmond points out that this is a variation of the trolley problem, except now it’s a machine that’s making the decision for us.

Autonomous cars are predicted to reduce fatalities from accidents by 90%. He says his advisor’s research found that most people think a car should swerve and sacrifice the passenger, but they don’t want to buy such a car. They want everyone else to.

He connects this to the Tragedy of the Commons in which if everyone acts to maximize their good, the commons fails. In such cases, governments sometimes issue regulations. Research shows that people don’t want the government to regulate the behavior of autonomous cars, although the US Dept of Transportation is requiring manufacturers to address this question.

Edmond’s group has created the moral machine, a website that creates moral dilemmas for autonomous cars. There have been about two million users and 14 million responses.

Some national trends are emerging. E.g., Eastern countries tend to prefer to save passengers more than Western countries do. Now the MIT group is looking for correlations with other factors, e.g., religiousness, economics, etc. Also, what are the factors most crucial in making decisions?

They are also looking at the effect of automation levels on the assignment of blame. Toyota’s “Guardian Angel” model results in humans being judged less harshly: that mode has a human driver but lets the car override human decisions.

Q&A

In response to a question, Edmond says that Mercedes has said that its cars will always save the passenger. He raises the possibility of the owner of such a car being held responsible for plowing into a bus full of children.

Q: The solutions in the Moral Machine seem contrived. The cars should just drive slower.

A: Yes, the point is to stimulate discussion. E.g., it doesn’t raise the possibility of swerving to avoid hitting someone who is in some way considered to be more worthy of life. [I’m rephrasing his response badly. My fault!]

Q: Have you analyzed chains of events? Does the responsibility decay the further you are from the event?

A: This very quickly gets game-theoretical.


October 11, 2016

[liveblog] Bas Nieland, Datatrics, on predicting customer behavior

At the PAPIs conference, Bas Nieland, CEO and co-founder of Datatrics, is talking about how to predict the color of shoes your customer is going to buy. The company tries to “make data science marketeer-proof for marketing teams of all sizes.” It tries to create 360-degree customer profiles by bringing together info from all the data silos.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

They use some machine learning to create these profiles. The profile includes the buying phase, the best time to present choices to a user, and the type of persuasion that will get them to take the desired action. [Yes, this makes me uncomfortable.]

It is structured around a core API that talks to mongoDB and MySQL. They provide “workbenches” that work with the customer’s data systems. They use BigML to operate on this data.

The outcome is a set of models that can be used to make recommendations. They use visualizations so that marketeers can understand them. But the marketeers couldn’t figure out how to use even simplified visualizations. So they created visual decision trees. But still the marketeers couldn’t figure them out. So they turn the data into simple declarative phrases: which audience to contact, in which channel, with what content, and when. E.g.:

“To increase sales, contact your customers in the buying phase with high engagement through FB with content about jeans on sale on Thursday, around 10 o’clock.”

They predict the increase in sales for each action, and quantify in dollars the size of the opportunity. They also classify responses by customer type and phase.
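Turning structured model output into declarative phrases like the one quoted above is essentially templating. A minimal sketch, in which the field names and values are invented rather than Datatrics' actual schema:

```python
# A model's structured recommendation, turned into the kind of
# plain-language instruction a marketer can act on.
# All field names and values here are invented for illustration.
action = {
    "goal": "increase sales",
    "audience": "your customers in the buying phase with high engagement",
    "channel": "Facebook",
    "content": "jeans on sale",
    "when": "on Thursday, around 10 o'clock",
}

recommendation = ("To {goal}, contact {audience} through {channel} "
                  "with content about {content}, {when}.".format(**action))
print(recommendation)
```

The point of the design, per the talk, is that the model's output reaches the marketer only in this sentence form, never as a decision tree or chart.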

For a hotel chain, they connected 16,000 variables and 21M data points; BigML reduced that to 75 variables and created a predictive model that ended up getting the chain more customer conversions. E.g., if the model says someone is in the orientation phase, the website shows photos of recommended hotels. If in the decision phase, the user sees persuasive messages, e.g., “18 people have looked at this room today.” The messages themselves are chosen based on the customer’s profile.

Coming up: chatbot integration. It’s a “real conversation” [with a bot with a photo of an attractive white woman who is supposedly doing the chatting].

Take-aways: Start simple. Make ML very easy to understand. Make it actionable.

Q&A

Me: Is there a way built in for a customer to let your model know that it’s gotten her wrong. E.g., stop sending me pregnancy ads because I lost the baby.

Bas: No.

Me: Is that on the roadmap?

Bas: Yes. But not on a schedule. [I’m not proud of myself for this hostile question. I have been turning into an asshole over the past few years.]


[liveblog] Vinny Senguttuvan on Predicting Customers

Vinny Senguttuvan is Senior Data Scientist at METIS. Before that, he was at the Facebook-based gaming company High 5 Games, which had 10M users. His talk at PAPIs: “Predicting Customers.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The main challenge: Most of the players play for free. Only 2% ever spend money on the site, buying extra credits to play. (It’s not gambling because you never cash out.) 2% of those 2% contribute the majority of the revenue.

All proposed changes go through A/B testing. E.g., should we change the “Buy credits” button from blue to red? This is classic hypothesis testing: you put up both options and see which gets the better results. It’s important to remember that there’s a cost to making the change, so the A/B preference needs to be substantial enough. But often the differences are marginal, so you increase the sample size, which complicates the process. “A long list of changes means not enough time per change.” And you want to be sure that the change affects the paying customers positively, which means taking even longer.

When they don’t have enough samples, they can bring down the confidence level required to make the change. Or they could bias one side of the hypothesis. And you can assume the variables are independent and run simultaneous A-B tests on various variables. High 5 does all three. It’s not perfect but it works.
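The hypothesis-testing step can be sketched with a standard two-proportion z-test. This is a generic illustration with invented conversion numbers, not High 5's actual testing code:

```python
import math

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate different
    from A's? Returns the z statistic. (Illustrative sketch only.)"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error
    return (p_b - p_a) / se

# Small samples: a real-looking lift can still be inconclusive.
z_small = ab_z_test(20, 1000, 26, 1000)
# Same rates at 10x the sample size: the same lift becomes significant.
z_large = ab_z_test(200, 10000, 260, 10000)
print(z_small, z_large)
```

This is exactly the trade-off described above: a marginal difference needs a much larger sample before the z statistic clears the usual 1.96 threshold, which is why a long list of candidate changes means not enough time per change.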

Second, there is a popularity metric by which they rank or classify their 100 games. They constantly add games — the count went from 15 to 100 in two years — which continuously changes the ranking of the games. Plus, some games are launched locked. This complicates things. Vinny’s boss came up with a model of an n-dimensional casino, but it was too complex. Instead, they take two simple approaches: 1. An average-weighted spin. 2. Bayesian. Both predicted well but had flaws, so they used a type of average of both.
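The Bayesian approach can be illustrated with a smoothed (Bayesian) average, which keeps a newly launched game with only a few plays from swinging wildly in the rankings. The prior mean and prior weight below are invented choices, not High 5's:

```python
def bayesian_average(plays, score_sum, prior_mean=0.5, prior_weight=100):
    """Smoothed popularity score: with few plays the score stays near the
    prior mean; with many plays it converges to the observed average.
    (prior_mean and prior_weight are illustrative, not High 5's values.)"""
    return (prior_weight * prior_mean + score_sum) / (prior_weight + plays)

# A brand-new game with two great sessions shouldn't outrank an
# established hit with thousands of merely good sessions.
new_game = bayesian_average(plays=2, score_sum=1.8)        # raw avg 0.9
hit_game = bayesian_average(plays=5000, score_sum=4000.0)  # raw avg 0.8
print(new_game, hit_game)
```

The prior acts like `prior_weight` phantom plays at the average score, which is one simple way to handle games that launch with no history at all.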

Third: Survival analysis. They wanted to know how many users are still active a given time after they created their account, and when is a user at risk of discontinuing use. First, they grouped users into cohorts (people who joined within a couple of weeks of each other) and plotted survival rates over time. They also observed return rates of users after each additional day of absence. They also implement a Cox survival model. They found that newer users were more likely to decline in their use of the product; early users are more committed. This pattern is widespread. That means they have to continuously acquire new players. They also alert users when they reach the elbow of disuse.
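The cohort survival curves can be sketched with a simple empirical estimator (not the Cox model he mentions, which also handles covariates and censoring). All lifetimes below are invented:

```python
def survival_curve(lifetimes, horizon):
    """Fraction of a cohort still active at each day, given each user's
    observed days of activity. A bare empirical curve for illustration."""
    n = len(lifetimes)
    return [sum(1 for t in lifetimes if t >= day) / n
            for day in range(horizon + 1)]

early_cohort = [30, 45, 60, 90, 120]   # days active per user (invented)
late_cohort = [3, 5, 10, 14, 60]

s_early = survival_curve(early_cohort, 30)
s_late = survival_curve(late_cohort, 30)
print(s_early[30], s_late[30])
```

Plotting such curves per cohort is what reveals the pattern described above: later cohorts decline faster, so the company must keep acquiring new players.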

Fourth: Predictive lifetime value. Lifetime value = total revenue from a user over the entire time they use the product. This is significant because of costs: 10-15% of the revenue goes into ads to acquire customers. Their 365-day prediction model should be a time series, but they needed results faster, so they flipped it into a regression problem, predicting the 365-day revenue based on the user’s first-month data: how they spent, purchase count, days of play, player level achieved, and the date joined. [He talks about regression problems, but I can’t keep up.] At that point it cost $2 to acquire a customer from a FB ad, and $6 from mobile apps. But when they tested, the mobile acquisitions were more profitable than those that came through FB. It turned out that FB was counting as new users any player who hadn’t played in 30 days, and was re-charging for them. [I hope I got that right.]
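The regression reformulation (predicting 365-day revenue from first-month features) can be sketched with ordinary least squares; the feature set and all numbers below are invented, not High 5's data:

```python
import numpy as np

# Hypothetical first-month features per user: [spend, purchase_count,
# days_played] -> observed 365-day revenue. All values are invented.
X = np.array([[5.0, 2, 10],
              [0.0, 0, 3],
              [20.0, 8, 25],
              [2.0, 1, 7],
              [12.0, 5, 20]])
y = np.array([60.0, 1.0, 250.0, 25.0, 140.0])

# Add an intercept column and fit ordinary least squares.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict 365-day revenue for a new user from first-month behavior alone.
new_user = np.array([1.0, 8.0, 3, 12])   # intercept, spend, count, days
print(float(new_user @ coef))
```

The design choice is the one described in the talk: trading the "correct" time-series formulation for a regression that gives an answer after one month of data instead of a year.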

Fifth: Recommendation systems. Pandora notes the features of songs and uses them to recommend similar ones. YouTube makes recommendations based on relations among users. Non-negative matrix factorization gives you the ability to predict the score for a video that you know nothing about in terms of content. But what if the ratings are not clearly defined? At High 5, there are no explicit ratings. They calculated a rating based on how often a player plays a game, how long the sessions are, etc. And what do you do about missing values? Use averages. But there are too many zeroes in the system, so they use sparse matrix solvers. Plus, there is a semi-order to the games, so they used some human input. [Useful for library Stackscores?]
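Matrix factorization over only the observed entries is how such sparse implicit-rating systems avoid being swamped by missing values; it can be sketched with plain SGD. The ratings, dimensions, and hyperparameters below are all invented for illustration:

```python
import random

random.seed(0)

# Implicit "ratings": play-derived scores per (user, game); most pairs
# are missing (sparse). All data and sizes here are invented.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 2.0}
n_users, n_games, k = 3, 3, 2

# Factorize R ~ U @ V.T with SGD over observed entries only, so the many
# missing cells don't drag every prediction toward zero.
U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_games)]
lr, reg = 0.05, 0.01
for _ in range(2000):
    for (u, g), r in ratings.items():
        pred = sum(U[u][f] * V[g][f] for f in range(k))
        err = r - pred
        for f in range(k):
            U[u][f], V[g][f] = (U[u][f] + lr * (err * V[g][f] - reg * U[u][f]),
                                V[g][f] + lr * (err * U[u][f] - reg * V[g][f]))

# Predict a score for a (user, game) pair that was never observed.
score = sum(U[2][f] * V[0][f] for f in range(k))
print(round(score, 2))
```

This is the core trick from the talk: the learned user and game factors let you score a game a player has never touched, based only on co-play patterns.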


[liveblog] First panel: Building intelligent applications with machine learning

I’m at the PAPIs conference. The opening panel is about building intelligent apps with machine learning. The panelists are all representing companies. It’s Q&A with the audience; I will not be able to keep up well.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The moderator asks one of the panelists (Snejina Zacharia from Insurify) how AI can change a heavily regulated industry such as insurance. She replies that the insurance industry gets low marks for customer satisfaction, which is an opportunity. Also, they can leverage the existing platforms and build modern APIs on top of them. Also, they can explore how to use AI in existing functions, e.g., chatbots, and systems that let users just confirm their identification rather than enter all the data. They also let users pick from an AI-filtered list of carriers that are right for them. Also, personalization: predicting risk and adjusting the questionnaire based on the user’s responses.

Another panelist is working on mapping for a company that is not Google and that is owned by three car companies. So, when an Audi goes over a bump, and then a Mercedes goes over it, it will record the same data. On personalization: it’s ripe for change. People are talking about 100B devices being connected by 2020. People think that RFID tags didn’t live up to their early hype, but 10 billion RFID tags are going to be sold this year. These can provide highly personalized, highly relevant data. This will be the base for the next wave of apps. We need a standards body effort, and governments addressing privacy and security. Some standards bodies are working on it, e.g., Global Standards 1, which manages the barcode standard.

Another panelist: Why is marketing such a good opportunity for AI and ML? Marketers used to have a specific skill set. It’s an art: writing, presenting, etc. Now they’re being challenged by tech and have to understand data. In fact, now they have to think like scientists: hypothesize, experiment, redo the hypothesis… And now marketers are responsible for revenue. Being a scientist responsible for predictable revenue is driving interest in AI and ML. This panelist’s company uses data about companies and people to segmentize following up on leads, etc. [Wrong place for a product pitch, IMO, which is a tad ironic, isn’t it?]

Another panelist: The question is: how can we use predictive intelligence to make our applications better? Layer input intelligence on top of input-programming-output. For this we need a platform that provides services and is easy to attach to existing processes.

Q: Should we develop cutting edge tech or use what Google, IBM, etc. offer?

A: It depends on whether you’re an early adopter or straggler. Regulated industries have to wait for more mature tech. But if your bread and butter is based on providing the latest and greatest, then you should use the latest tech.

A: It also depends on whether you’re doing a vertically integrated solution or something broader.

Q: What makes an app “smart”? Is it: Dynamic, with rapidly changing data?

A: Marketers use personas, e.g., a handful of types. They used to be written in stone, just about. Smart apps update the personas after every campaign, every time you get new info about what’s going on in the market, etc.

Q: In B-to-C marketing, many companies have built the AI piece for advertising. Are you seeing any standardization or platforms on top of the advertising channels to manage the ads going out on them?

A: Yes, some companies focus on omni-channel marketing.

A: Companies are becoming service companies, not product companies. They no longer hand off to retailers.

A: It’s generally harder to automate non-digital channels. It’s harder to put a revenue number on, say, TV ads.


[liveblog] PAPIs: Cynthia Rudin on Regulating Greed

I’m at the PAPIs (Predictive Applications and APIs) [twitter: papistotio] conference at the NERD Center in Cambridge.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The first speaker is Cynthia Rudin, Director of the Prediction Analysis Lab at MIT. Her topic is “Regulating Greed over Time: An Important Lesson for Practical Recommender Systems.” It’s about her Lab’s entry in a data mining competition. (The entry did not win.) The competition was to design a better algorithm for Yahoo’s recommendation of articles. To create an unbiased data set, Yahoo showed people random articles for two weeks. Your algorithm had to choose which article from the pool to show a user. To evaluate a recommender system, they’d check whether your algorithm recommended the same article that was actually shown to the user; if the user clicked on it, that counted as an evaluation. [I don’t think I got this right.] Then you sent your algorithm to Yahoo, and they evaluated its clickthrough rate; you never got access to Yahoo’s data.

This is, she says, a form of the multi-armed bandit problem: one arm is better (more likely to pay out) but you don’t know which one. So you spend your time figuring out which arm is the best, and then you only pull that one. Yahoo and Microsoft are among the companies using multi-armed bandit systems for recommendation systems. “They’re a great alternative to massive A-B testing.” [No, I don’t understand this. Not Cynthia’s fault!]
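For readers new to the framing, here is a minimal epsilon-greedy bandit simulation; this is my own illustration in Python, not the algorithm from the talk, and the arm payout rates and epsilon value are made up:

```python
import random

def epsilon_greedy(true_rates, steps=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy multi-armed bandit.

    true_rates: the (unknown to the player) payout probability of each arm.
    With probability epsilon we explore a random arm; otherwise we exploit
    the arm with the best observed average payout so far.
    Returns the number of pulls each arm received.
    """
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(steps):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(len(true_rates))  # explore
        else:
            # exploit: pick the arm with the best observed win rate
            arm = max(range(len(true_rates)), key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls

# The best arm (index 1, a 9% payout rate) ends up pulled far more often.
pulls = epsilon_greedy([0.03, 0.09, 0.05])
```

The tension is exactly the one Rudin names: pulls spent discovering which arm is best are pulls not spent on the best arm.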

Because the team didn’t have access to Yahoo’s data, they couldn’t tune their algorithms to it. Nevertheless, they achieved a 9% clickthrough rate … and still lost (albeit by a tiny margin). Cynthia explains how they increased the efficiency of their algorithms, but it’s math so I can only here play the sound of a muted trumpet. It involves “decay exploration on the old articles” and a “peak grabber”: if an article gets more than 9 clicks out of the last 100 times it’s shown, they keep displaying it. If you have a good article, grab it. The dynamic version of the Peak Grabber had them continuing to show a peak article if its clickthrough rate was 14% above the global clickthrough rate.
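As I understood the static rule, it can be sketched like this; the 9-click threshold and 100-display window are from the talk, everything else is my invention:

```python
def peak_grabber(click_history, threshold=9, window=100):
    """Static 'peak grabber': keep displaying an article whose recent
    clickthrough clears a fixed threshold.

    click_history: 0/1 click outcomes for this article's past displays.
    Returns True if the article got more than `threshold` clicks in its
    last `window` displays, i.e., grab it and keep showing it.
    """
    recent = click_history[-window:]
    return sum(recent) > threshold

# 12 clicks in the last 100 displays clears the 9-click threshold.
keep = peak_grabber([1] * 12 + [0] * 88)  # → True
```

The dynamic version described in the talk would compare the article’s recent rate against the global clickthrough rate instead of a fixed count.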

“We were adjusting the exploration-exploitation tradeoff based on trends.” The phenomenon: you shouldn’t always explore. There are times when you should just stop and exploit the flowers.

Some data supports this. E.g., in England, on Boxing Day you should be done exploring and just put your best prices on things — not too high, not too low. When the clicks on your site are low, you should be exploring. When high, maybe not. “Life has patterns.” Multi-armed bandit techniques don’t know about these patterns.

Her group came up with a formal way of putting this. At each time there is a known reward multiplier: G(t). G is like the number of people in the store. When G is high, you want to exploit, not explore. In the lower zones you want to balance exploration and exploitation.
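A toy reading of “regulating greed over time” in code; this is my guess at the shape of the idea, not Rudin’s actual theorems, and the `g_high` threshold and base exploration rate are invented:

```python
import random

def regulated_epsilon(g, base_epsilon=0.2, g_high=100.0):
    """Exploration probability regulated by the reward multiplier G(t).

    When G(t) is high (lots of 'people in the store'), every display is
    valuable, so exploration shrinks and the system mostly exploits.
    When G(t) is low, exploration is cheap, so the base rate is kept.
    """
    return base_epsilon * min(1.0, g_high / max(g, 1e-9))

def choose_arm(g, estimates, rng):
    """Pick an arm, exploring with probability regulated_epsilon(g)."""
    if rng.random() < regulated_epsilon(g):
        return rng.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit

# In a busy period (G=1000) exploration drops to 2%, so we almost
# always exploit the best current estimate (index 1 here).
arm = choose_arm(1000, [0.1, 0.5, 0.2], random.Random(0))
```

The point of the sketch is only the shape: the exploration rate is a function of G(t) rather than a constant.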

So they created two theorems, each leading to an algorithm. [She shows the algorithm. I can’t type in math notation that fast.]

