Author: Devin Coldewey

Facebook’s AI team maps the whole population of Africa

A new map of nearly all of Africa shows exactly where the continent’s 1.3 billion people live down to the meter, which could help everyone from local governments to aid organizations. The map joins others like it from Facebook created by running satellite imagery through a machine learning model.

It’s not exactly that there was some mystery about where people live, but the degree of precision matters. You may know that a million people live in a given region, and that about half are in the bigger city and another quarter in assorted towns. But that leaves hundreds of thousands only accounted for in the vaguest way.

Fortunately you can always inspect satellite imagery and pick out the spots where small villages and isolated houses and communities are. The only problem is that Africa is big. Really big. Manually labeling the satellite imagery even from a single mid-sized country like Gabon or Malawi would take a huge amount of time and effort. And for many applications of the data, such as coordinating the response to a natural disaster or distributing vaccinations, time lost is lives lost.

Better to get it all done at once then, right? That’s the idea behind Facebook’s Population Density Maps project, which had already mapped several countries over the last couple years before the decision was made to take on the entire African continent.

Zoom in and you can see the difference between the new and old maps. It’s pretty significant.

“The maps from Facebook ensure we focus our volunteers’ time and resources on the places they’re most needed, improving the efficacy of our programs,” said Tyler Radford, executive director of the Humanitarian OpenStreetMap Team, one of the project’s partners.

The core idea is straightforward: Match census data (how many people live in a region) with structure data derived from satellite imagery to get a much better idea of where those people are.

“With just the census data, the best you can do is assume that people live everywhere in the district – buildings, fields, and forests alike,” said Facebook engineer James Gill. “But once you know the building locations, you can skip the fields and forests and only allocate the population to the buildings. This gives you very detailed 30 meter by 30 meter population maps.”
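The allocation Gill describes can be sketched in a few lines. This is a toy illustration, not Facebook's actual pipeline: it assumes a census count for one district and a boolean grid marking which 30-meter cells contain a detected building, then spreads the population evenly across those cells.

```python
import numpy as np

# Hypothetical 30 m grid for one census district: True where a
# building was detected, False for fields and forests.
buildings = np.array([
    [False, True,  False],
    [True,  True,  False],
    [False, False, True],
])

district_population = 400  # from census data

# Allocate the district's population evenly across building cells
# (a real map might weight this further, e.g. by building size).
density = np.zeros(buildings.shape)
density[buildings] = district_population / buildings.sum()

# Each of the 4 building cells gets 100 people; everywhere else stays 0.
print(density)
```

The key design point is that the census total is conserved: the district still sums to its official population, it's just no longer smeared across empty land.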

That’s several times more accurate than any extant population map of this size. The analysis is done by a machine learning agent trained on OpenStreetMap data from all over the world where people have labeled and outlined buildings and other features.

First the huge amount of Africa’s surface that obviously has no structure had to be removed from consideration, reducing the amount of space the team had to evaluate by a factor of a thousand or more. Then, using a region-specific algorithm (because things look a lot different in coastal Morocco than they do in central Chad), the model identifies patches that contain a building.

The map data, top left two images, is processed to find buildings, bottom left two; ultimately large tracts of land can be labeled as populated or not, as seen at right.

Throughout this process there’s a lot of double-checking by humans to make sure there are no regional biases or tendencies to mislabel in some way or another. The team has been doing it for some time, so it’s not their first rodeo, but the scale of “one country” vs. “all of Africa” is a bit different. Fortunately there have been some advances, the company’s AI team wrote in an explanatory blog post:

We’ve been able to simplify the problem to a straightforward binary classification task… Now, given an input image, a single neural net predicts whether the given image contains a building. This approach to classification is also significantly less computationally expensive than a segmentation-based approach because it allows us to use smaller neural nets and produce outputs with a smaller memory footprint.

With greater efficiency, in this case, also comes greater accuracy, since the algorithms will have learned from their previous attempts and more data is included to prevent false positives and negatives. The team found that of 1,000 patches labeled as containing buildings, 996 of them were correct. That kind of error rate sounds pretty acceptable to me, and is certainly better than the existing tools, which only gave you a vague “out there somewhere” when you asked about a small community or off-grid village.

If you’re wondering why Facebook is doing this in the first place, it has to do with their efforts over past years to identify populations with poor connectivity, so they can then beam internet down to them with lasers or the like. That’s all rather low priority at the moment, what with the company’s many problems, but the tools it was building clearly had humanitarian applications, and it’s nice to see that the baby was not thrown out with the bathwater.

Harvard-MIT initiative grants $750K to projects looking to keep tech accountable

Artificial intelligence, or what passes for it, can be found in practically every major tech company and, increasingly, in government programs. A joint Harvard-MIT program just unloaded $750,000 on projects looking to keep such AI developments well understood and well reported.

The Ethics and Governance in AI Initiative is a combination research program and grant fund operated by MIT’s Media Lab and Harvard’s Berkman-Klein Center. The small projects selected by the initiative are generally speaking aimed at using technology to keep people informed, or informing people about technology.

AI is an enabler of both good and ill in the world of news and information gathering, as the initiative’s director, Tim Hwang, said in a news release:

“On one hand, the technology offers a tremendous opportunity to improve the way we work — including helping journalists find key information buried in mountains of public records. Yet we are also seeing a range of negative consequences as AI becomes intertwined with the spread of misinformation and disinformation online.”

These grants are not the first the initiative has given out, but they are the first in response to an open call for ideas, Hwang noted.

The largest sum of the bunch, a $150K grant, went to MuckRock Foundation’s project Sidekick, which uses machine learning tools to help journalists scour thousands of pages of documents for interesting data. This is critical in a day and age when government and corporate records are so voluminous (for example, millions of emails leaked or revealed via FOIA) that it is basically impossible for a reporter or even team to analyze them without help.

Along the same lines is Legal Robot, which was awarded $100K for its plan to mass-request government contracts, then extract and organize the information within. This makes a lot of sense: People I’ve talked to in this sector have told me that the problem isn’t a lack of data but a surfeit of it, and poorly kept at that. Cleaning up messy data is going to be one of the first tasks any investigator or auditor of government systems will want to do.

Tattle is a project aiming to combat disinformation and false news spreading on WhatsApp, which as we’ve seen has been a major vector for it. It plans to use its $100K to establish channels for sourcing data from users, since of course much of WhatsApp is encrypted. Connecting this data with existing fact-checking efforts could help understand and mitigate harmful information going viral.

The Rochester Institute of Technology will be using its grant (also $100K) to look into detecting manipulated video, both designing its own techniques and evaluating existing ones. Close inspection of the media will render a confidence score that can be displayed via a browser extension.

Other grants are going to AI-focused reporting work by the Seattle Times and by newsrooms in Latin America, and to workshops training local media in reporting AI and how it affects their communities.

To be clear, the initiative isn’t investing in these projects — just funding them with a handful of stipulations, Hwang explained to TechCrunch over email.

“Generally, our approach is to give grantees the freedom to experiment and run with the support that we give them,” he wrote. “We do not take any ownership stake but the products of these grants are released under open licenses to ensure the widest possible distribution to the public.”

He characterized the initiative’s grants as a way to pick up the slack that larger companies seem to be leaving behind as they focus on consumer-first applications like virtual assistants.

“It’s naive to believe that the big corporate leaders in AI will ensure that these technologies are being leveraged in the public interest,” wrote Hwang. “Philanthropic funding has an important role to play in filling in the gaps and supporting initiatives that envision the possibilities for AI outside the for-profit context.”

You can read more about the initiative and its grantees here.

Facebook’s head of comms hits the road after 8 years at the company

Facebook head of communications Caryn Marooney is leaving for greener pastures, she announced today, on Facebook of course. She joins the growing number of executives and high-level employees departing the company during and after what may be its toughest year.

“I spent a lot of time over the winter holiday reflecting, and with the New Year, and after 8 years at Facebook, I’ve decided to step down as leader of the communications group,” Marooney wrote. “I’ve decided it’s time to get back to my roots: going deep in tech and product.”

She thanked CEO Mark Zuckerberg and COO Sheryl Sandberg, with whom she worked closely. The former commented to thank Marooney “for the dedication and brilliance you have brought to Facebook over the years.”

Certainly she saw Facebook during a period of intense growth and transition, though arguably the company’s entire history has been marked by those traits. But 2011’s Facebook was remarkably smaller and less complex — operationally, ethically and legally — so to have gone from that stage to the present must have been quite a ride.

Marooney is just the latest in what seems like a constant stream of high-profile departures over the last year.

Obviously in a large company there’s going to be turnover. But an average of one a month seems like a lot.

There’s no indication Marooney left because of any acute cause other than wanting to move on to the next thing. It’s just that a lot of people seem to be doing it at the same time.

Official emoji debut for disabled folks, service dogs, waffles and more

A gaggle of new emoji have just been approved by the Unicode Consortium, meaning they’ll be standard across any platforms that choose to support them. This batch includes some much-needed representation for people with various disabilities, new animals from guide dogs to otters, food and many more objects.

Folks with disabilities get a nice variety of new emoji, though of course these aren’t exhaustive (for example, how do you represent a learning disability or mental illness?). Still, Apple’s proposal for the new emoji points out the necessity of, for example, having both mechanical and manual wheelchairs:

The type of assistive technology that is used by individuals is very personal and mandated by their own disability need. For someone who cannot self-propel and therefore uses an electric wheelchair, it would not be realistic to only show a manual chair. For those who can use a manual version, it would not be realistic to insinuate that they have less mobility than they do. Therefore, these should be seen as two totally separate forms of assistive device.

These images, as usual, are only samples; the final emoji that will be used depend on your device or service. However, since Apple proposed these ones and they are of course a popular platform for emoji use, you can probably expect these to be very like the final ones.

There are lots of other useful things added as well. Guide and service dogs; otters and flamingos; some tasty food like waffles and butter (my breakfast can now finally be represented accurately); and some items particularly relevant to Indian users — a sari, diya lamp and tuk-tuk.

Adding support for people of different colors and genders, including non-gendered imagery, has been an ongoing process for the last few years. The latest addition is a pair of non-gendered people holding hands, with the full set of color variations. Expect more along these lines; other proposals have been made but haven’t yet been finalized.

You can browse the full list of new emoji here; expect them to be added to your favorite messaging app after a handful of months once art and code updates are final.

Snopes and AP stop fact checking for Facebook

Two of Facebook’s four fact-checking partners in the U.S. are no longer working for the program: Snopes, which recently rebuffed reports that its relationship with Facebook was strained, and the Associated Press. Both confirmed they are no longer performing fact checking for Facebook, but left open the possibility of future collaboration.

Snopes joined Facebook’s group of third-party fact checkers in 2016, at first volunteering its services and the next year accepting a lump-sum payment of $100,000 for its work. But the company said in a statement that it’s rethinking providing services like this at all:

At this time we are evaluating the ramifications and costs of providing third-party fact-checking services, and we want to determine with certainty that our efforts to aid any particular platform are a net positive for our online community, publication, and staff.

Snopes founder David Mikkelson added in a statement to TechCrunch that “we felt that the Facebook fact check partnership wasn’t working well for us as an organization.” I’ve asked Facebook for comment.

The news comes hot on the heels of a recent article in the Guardian by former Snopes employees who described the partnership as being “in disarray.”

But the Snopes founder strongly disagreed with that characterization, and the suggestion that Facebook had been interfering with or otherwise unduly influencing the fact-checking program. Mikkelson described the work as “literally just data entry,” and said that Facebook never told them what they should check, with a handful of exceptions like bringing high-profile hoaxes to checkers’ attention during the 2018 election.

In fact, Mikkelson said the main problem was a lack of engagement from Facebook. The tools, he said, were rudimentary, and checkers were limited in the number of articles they could evaluate. Meanwhile, the effect of the fact-checking program was poorly communicated both to partners and users. Was it working? How well? In what way? What changes are being made, if any, to the algorithms and systems involved?

In comments to Poynter, Snopes VP of operations Danny Green said the process needed to be improved:

With a manual system and a closed system — it’s impossible to keep on top of that stuff… It doesn’t seem like we’re striving to make third-party fact checking more practical for publishers — it seems like we’re striving to make it easier for Facebook. At some point, we need to put our foot down and say, ‘No. You need to build an API.’

This surely formed at least part of the reason why Snopes declined to renew its yearly contract with Facebook. It seems to be a coincidence that the announcement came shortly after yet another bad week for the latter; the contracts seem to be for calendar years so the decision not to rejoin would have been made some time ago.

It’s not the only U.S. fact checker declining to continue its work. The Associated Press confirmed to TechCrunch that it too is “not currently doing fact-checking work for Facebook.” In a statement, the news organization said that it “constantly evaluates how to best deploy its fact-checking resources, and that includes ongoing conversations with Facebook about opportunities to do important fact-checking work on its platform.”

Update: The AP representative contacted TechCrunch to say that although it is not doing fact checking work for the program, it is not leaving it altogether. I’ve asked for clarification on this point, but in the meantime I’ve adjusted the wording in the post and headline to reflect it.

Politifact confirmed it is staying with the program; I’ve also asked the fourth U.S. fact-checker and the AFP, which is a partner in multiple countries. I’ll update this post if I hear back.

Instagram outage forces millions to look directly at the world for nearly half an hour

Photo-sharing app and social network Instagram was briefly taken offline on Monday afternoon, causing nothing of consequence to occur other than a brief respite from one source of the constant deluge of inconsequential information to which we all voluntarily submit ourselves.

The service died at about 4:20, tragically the very moment when millions of people were turning to the app, for the third time that hour, desperately hoping to pass the time until the end of the workday. At least this was the case on the west coast of the U.S., the only location we are considering at this time.

The app launched fine but did not refresh feeds, and users were unable to scroll past however many posts were already cached; stories, which are also cached, were accessible but couldn’t be posted. I was able to send messages, but others weren’t.

Amazingly, even the website went down, and hard. Visitors received a “5xx Server Error,” which is not common — usually a server knows which of the various error codes to return. It seems to be back now, though.

The outage appeared to end, for some anyway, at about quarter of five, which means many of us are still at our desks, if we’re lucky enough to have them. If you were affected, here’s hoping your half hour was spent productively.

An Instagram representative told TechCrunch that they’re aware of the issue and working to fix it.

Facebook fears no FTC fine

Reports emerged today that the FTC is considering a fine against Facebook that would be the largest ever from the agency. Even if it were ten times the size of the largest, a $22.5 million bill sent to Google in 2012, the company would basically laugh it off. Facebook is made of money. But the FTC may make it provide something it has precious little of these days: accountability.

A Washington Post report cites sources inside the agency (currently on hiatus due to the shutdown) saying that regulators have “met to discuss imposing a record-setting fine.” We may as well say here that this must be taken with a grain of salt at the outset; that Facebook is non-compliant with terms set previously by the FTC is an established fact, so how much they should be made to pay is the natural next topic of discussion.

But how much would it be? The scale of the violation is hugely negotiable. Our summary of the FTC’s settlement requirements for Facebook indicates that it was:

  • barred from making misrepresentations about the privacy or security of consumers’ personal information;
  • required to obtain consumers’ affirmative express consent before enacting changes that override their privacy preferences;
  • required to prevent anyone from accessing a user’s material more than 30 days after the user has deleted his or her account;
  • required to establish and maintain a comprehensive privacy program designed to address privacy risks associated with the development and management of new and existing products and services, and to protect the privacy and confidentiality of consumers’ information; and
  • required, within 180 days, and every two years after that for the next 20 years, to obtain independent, third-party audits certifying that it has a privacy program in place that meets or exceeds the requirements of the FTC order, and to ensure that the privacy of consumers’ information is protected.

How many of those did it break, and how many times? Is it per user? Per account? Per post? Per offense? What is “accessing” under such and such a circumstance? The FTC is no doubt deliberating these things.

Yet it is hard to imagine them coming up with a number that really scares Facebook. A hundred million dollars is a lot of money, for instance. But Facebook took in more than $13 billion in revenue last quarter. Double that fine, triple it, and Facebook bounces back.

If even a fine ten times the size of the largest it has ever levied can’t faze the target, what can the FTC do to scare Facebook into playing by the book? Make it do what it’s already supposed to be doing, but publicly.

How many ad campaigns is a user’s data being used for? How many internal and external research projects? How many copies are there? What data specifically and exactly is it collecting on any given user, how is that data stored, who has access to it, to whom is it sold or for whom is it aggregated or summarized? What is the exact nature of the privacy program it has in place, who works for it, who do they report to, and what are their monthly findings?

These and dozens of other questions come immediately to mind as things Facebook should be disclosing publicly in some way or another, either directly to users in the case of how one’s data is being used, or in a more general report, such as what concrete measures are being taken to prevent exfiltration of profile data by bad actors, or how user behavior and psychology is being estimated and tracked.

Not easy or convenient questions to answer at all, let alone publicly and regularly. But if the FTC wants the company to behave, it has to impose this level of responsibility and disclosure. Because, as Facebook has already shown, it cannot be trusted to disclose it otherwise. Light touch regulation is all well and good… until it isn’t.

This may in fact be such a major threat to Facebook’s business — imagine having to publicly state metrics that are clearly at odds with what you tell advertisers and users — that it might attempt to negotiate a larger initial fine in order to avoid punitive measures such as those outlined here. Volkswagen spent billions not on fines, but in sort of punitive community service to mitigate the effects of its emissions cheating. Facebook too could be made to shell out in this indirect way.

What the FTC is capable of requiring from Facebook is an open question, since the scale and nature of these violations are unprecedented. But whatever they come up with, the part with a dollar sign in front of it — however many places it goes to — will be the least of Facebook’s worries.

Turns out the science saying screen time is bad isn’t science

A new study is making waves in the worlds of tech and psychology by questioning the basis of thousands of papers and analyses with conflicting conclusions on the effect of screen time on well-being. The researchers’ claim is that the science doesn’t agree because it’s bad science. So is screen time good or bad? It’s not that simple.

The conclusions only make the mildest of claims about screen time, essentially that as defined it has about as much effect on well-being as potato consumption. Instinctively we may feel that not to be true; technology surely has a greater effect than that — but if it does, we haven’t found a way to judge it accurately.

The paper, by Oxford scientists Amy Orben and Andrew Przybylski, amounts to a sort of king-sized meta-analysis of studies that come to some conclusion about the relationship between technology and well-being among young people.

Their concern was that the large datasets and statistical methods employed by researchers looking into the question — for example, thousands and thousands of survey responses interacting with weeks of tracking data for each respondent — allowed for anomalies or false positives to be claimed as significant conclusions. It’s not that people are doing this on purpose necessarily, only that it’s a natural result of the approach many are taking.

“Unfortunately,” write the researchers in the paper, “the large number of participants in these designs means that small effects are easily publishable and, if positive, garner outsized press and policy attention.” (We’re a part of that equation, of course, but speaking for myself at least I try to include a grain of salt with such studies, indeed with this one as well.)

In order to show this, the researchers essentially redid the statistical analysis for several of these large datasets (Orben explains the process here), but instead of only choosing one result to present, they collected all the plausible ones they could find.
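The spirit of that re-analysis can be sketched in miniature. This is not the authors' code; it's a toy that assumes a synthetic survey with a couple of tech-use measures and a couple of well-being measures, then computes every plausible pairing rather than reporting one cherry-picked result.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic, unrelated survey columns (so true effects are ~zero).
data = {
    "tv_hours": rng.normal(2, 1, n),
    "social_media_hours": rng.normal(1.5, 1, n),
    "life_satisfaction": rng.normal(7, 2, n),
    "self_esteem": rng.normal(5, 1.5, n),
}

tech = ["tv_hours", "social_media_hours"]
wellbeing = ["life_satisfaction", "self_esteem"]

# Instead of picking one tech/well-being pairing to headline, compute
# every specification and inspect the whole distribution of effects.
effects = {
    (t, w): np.corrcoef(data[t], data[w])[0, 1]
    for t, w in itertools.product(tech, wellbeing)
}

for spec, r in effects.items():
    print(spec, round(r, 3))
```

With large samples, a few of these correlations will clear the bar for statistical significance by chance alone; showing the full spread, rather than one endpoint of it, is the paper's basic corrective.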

For example, imagine a study where the app use of a group of kids was tracked, and they were surveyed regularly on a variety of measures. The resulting (fictitious, I hasten to add) paper might say it found kids who use Instagram for more than two hours a day are three times as likely to suffer depressive episodes or suicidal ideations. What the paper doesn’t say, and which this new analysis could show, is that the bottom quartile is far more likely to suffer from ADHD, or the top five percent reported feeling they had a strong support network.

In the new study, any and all statistically significant results like those I just made up are detected and compared with one another. Maybe a study came out six months later that found the exact opposite in terms of ADHD but also didn’t state it as a conclusion.

This figure from the paper shows a few example behaviors that have more or less of an effect on well-being.

Ultimately what the Oxford study found was that there is no consistent good or bad effect, and although a very slight negative effect was noted, it was small enough that factors like having a single parent or needing to wear glasses were far more important.

Yet, and this is important to understand, the study does not conclude that technology has no negative or positive effect; such a broad conclusion would be untenable on its face. The data it rounds up are (as some experts point out with no ill will toward the paper) simply inadequate to the task, and technology use is too variable to reduce to a single factor. Its conclusion is that studies so far have in fact been inconclusive and we need to go back to the drawing board.

“The nuanced picture provided by these results is in line with previous psychological and epidemiological research suggesting that the associations between digital screen-time and child outcomes are not as simple as many might think,” the researchers write.

Could, for example, social media use affect self-worth, either positively or negatively? Could be! But the ways that scientists have gone about trying to find out have, it seems, been inadequate.

In the future, the authors suggest, researchers should not only design their experiments more carefully, but be more transparent about their analysis. By committing to document all significant links in the dataset they create, whether they fit the narrative or hypothesis or go against it, researchers show that they have not rigged the study from the start. Designing and iterating with this responsibility in mind will produce better studies and perhaps even some real conclusions.

What should parents, teachers, siblings, and others take away from this? Not anything about screen time or whether tech is good or bad, certainly. Rather let it be another instance of the frequently learned lesson that science is a work in progress and must be considered very critically before application.

Your kid is an individual and things like social media and technology affect them differently from other kids; it may very well be that your informed opinion of their character and habits, tempered with that of a teacher or psychologist, is far more accurate than the “latest study.”

Orben and Przybylski’s study, “The association between adolescent well-being and digital technology use,” appears in today’s issue of the journal Nature Human Behaviour.

Facebook’s fact-checkers toil on

Facebook is fielding so many problems, oversights, scandals, and other miscellaneous ills that it wouldn’t surprise anyone to hear that its fact-checking program, undertaken last year after the network was confronted with its inaction in controlling disinformation, is falling apart. But in this case the reason you haven’t heard much about it isn’t because it’s a failure, but because fact-checking is boring and thankless — and being done quietly and systematically by people who are just fine with that.

The “falling apart” narrative was advanced in a recent article at The Guardian, and some of the problems noted in that piece are certainly real. But I was curious about the lack of documentation of the fact-checking process itself, so I talked with a couple of the people involved to get a better sense of it.

I definitely didn’t get the impression of a program in crisis at all, but rather one where the necessity of remaining hands-off with the editorial process and teams involved has created both apparent and real apathy when it comes to making real changes.

No bells, no whistles

Facebook likes to pretend that its research into AI will solve just about every problem it has. Unfortunately not only is that AI hugely dependent on human intelligence to work in the first place, but the best it can generally do is forward things on to human agents for final calls. Nowhere is that more obvious than in the process of fact-checking, in which it is trivial for machine learning agents to surface possibly dubious links or articles, but at this stage pretty much impossible for them to do any kind of real evaluation of them.

That’s where the company’s network of independent fact-checkers comes in. No longer among their number are two former Snopes staffers who left to work at another fact-checking concern — pointedly not involved with Facebook — and who clearly had major problems with the way the program worked. Most explosive was the accusation that Facebook had seemingly tried to prioritize fact checks that concerned an advertiser.

But it wasn’t clear from their complaints just how the program does work. I chatted with Snopes head David Mikkelson and checked in with Politifact editor Angie Drobnic Holan. They emphatically denied allegations of Facebook shenanigans, though they had their own reservations, and while they couldn’t provide exact details of the system they used, it sounds pretty straightforward.

“For the most part it’s literally just data entry,” explained Mikkelson. “When we fact-check something, we enter its URL into a database. You could probably dress it up in all kinds of bells and whistles, but we don’t really need or expect much more than that. We haven’t changed what we do or how we do it.”

Mikkelson described the Facebook system in broad terms. It’s a dashboard of links that are surfaced, as Facebook has explained before, primarily through machine learning systems that know what sort of thing to look for: weird URLs, bot promotion, scammy headlines, etc. They appear on the dashboard in some semblance of order, for instance based on traffic or engagement.

“It lists a thumbnail of what the item is, like is it an article or a video; there’s a column for estimated shares, first published date, etc,” said Mikkelson. “They’ve never given us any instructions on like, ‘please do the one with the most shares,’ or ‘do the most recent entry and work your way down,’ or whatever.”

In fact there’s no need to even use the dashboard that way at all.

“There’s no requirement that we undertake anything that’s in their database. If there’s something that isn’t in there, which honestly is most of what we do, we just add it,” Mikkelson said.

Passive partner or puppet master?

I asked whether there was any kind of pushback or interference at all from Facebook, as described in the Guardian story by Brooke Binkowski, who mentioned several such occasions that occurred during her time at Snopes.

Politifact’s Holan said she thought the suggestion was “very misleading.” In a statement, the organization said that “As with all our work, we decide what to fact-check and arrive at our conclusions without input from Facebook or any third party. Any claim suggesting otherwise is misinformed and baseless.”

“I realize Facebook’s reputation is kind of in the dumpster right now already,” Mikkelson said, “but this is damaging to all the fact-checking partners, including us. We would never have continued a working relationship with Facebook or any other partner that told us to couch fact checks in service of advertisers. It’s insulting to suggest.”

The question of receiving compensation for fact-checking was another of Binkowski’s qualms. On the one hand, it could be seen as a conflict of interest for Facebook to be paying for the service, since that opens all kinds of cans of worms — but on the other, it’s ridiculous to suggest this critical work can or should be done for free. Though at first, it was.

When the fact-checking team was first assembled in late 2016, Snopes wrote that it expected “to derive no direct financial benefit from this arrangement.” But eventually it did.

“When we published that, the partnership was in its earliest, embryonic stages — an experiment they’d like our help with,” Mikkelson said. Money “didn’t come up at all.” It wasn’t until the next year that Facebook mentioned paying fact checkers, though it hadn’t announced this publicly, and Snopes eventually did earn and disclose $100,000 coming from the company. Facebook had put bounties on high-profile political stories that were already on Snopes’s radar, as well as others in the fact-checking group.

The money came despite the fact that Snopes never asked for it or billed Facebook — a check arrived at the end of the year, he recalled, “with a note that said ‘vendor refuses to invoice.’ ”

Partners, but not pals

As for the mere concept of working for a company whose slippery methods and unlikeable leadership have been repeatedly pilloried over the last few years, it’s a legitimate concern. But Facebook is too important a platform to ignore on account of ethical lapses by higher-ups who are not involved in the day-to-day fact-checking operation. Millions of people still look to Facebook for their news.

To abandon the company because (for instance) Sheryl Sandberg hired a dirty PR firm to sling mud at critics would be antithetical to the mission that drove these fact-checking companies to the platform to begin with. After all, it’s not like Facebook had a sterling reputation in 2016, either.

Both Politifact and Snopes indicated that their discontent with the company was more focused on the lack of transparency within the fact-checking program itself. The tools are basic and feedback is nil. Questions like the following have gone unanswered for years:

What constitutes falsity? What criteria should and shouldn’t be considered? How should satire be treated if it is spreading as if it were fact? What about state-sponsored propaganda and disinformation? Have other fact checkers looked at a given story, and could or should their judgments inform one another’s? What is the immediate effect of marking a story false — does it stop spreading? Is there pushback from the community? Is the outlet penalized in other ways? What about protesting an erroneous decision?

The problem with Facebook’s fact-checking operation, as so often is the case with this company, is a lack of transparency with both users and partners. The actual fact-checking happens outside Facebook, and rightly so; it’s not likely to be affected or compromised by the company, and in fact if it tried, it might find the whole thing blowing up in its face. But while the checking itself is tamper-resistant, it’s not at all clear what effect, if any, it’s having, or how it will be improved or implemented in the future. Surely that’s relevant to everyone with a stake in this process?

In the year and a half or more the program has run, little has been communicated and little has changed, and what change there was has come slowly. But at the same time, thousands of articles have been checked by experts who are used to having their work go largely unrewarded — and despite Facebook’s lack of transparency with them and us, it seems unlikely that that work has also been ineffective.

For years Facebook was a rat’s nest of trash content and systematically organized disinformation. In many ways, it still is, but an organized fact-checking campaign works like constant friction acting against the momentum of this heap. It’s not flashy and the work will never be done, but it’s no less important for all that.

As with so many other Facebook initiatives, we hear a lot of promises and seldom much in the way of results. The establishment of a group of third parties contributing independently to a fact-checking database was a good step, and it would be surprising to hear it has had no positive effect.

Users and partners deserve to know how it works, whether it’s working, and how it’s being changed. That information would disarm critics and hearten allies. If Facebook continues to defy these basic expectations, however, it only further justifies and intensifies the claims of its worst enemies.

How Russia’s online influence campaign engaged with millions for years

Russian efforts to influence U.S. politics and sway public opinion were consistent and, as far as engaging with target audiences, largely successful, according to a report from Oxford’s Computational Propaganda Project published today. Based on data provided to Congress by Facebook, Instagram, Google, and Twitter, the study paints a portrait of the years-long campaign that’s less than flattering to the companies.

The report, which you can read here, was published today but given to some outlets over the weekend, summarizes the work of the Internet Research Agency, Moscow’s online influence factory and troll farm. The data cover various periods for different companies, but 2016 and 2017 showed by far the most activity.

A clearer picture

If you’ve only checked into this narrative occasionally during the last couple years, the Comprop report is a great way to get a bird’s-eye view of the whole thing, with no “we take this very seriously” palaver interrupting the facts.

If you’ve been following the story closely, the value of the report is mostly in deriving specifics and some new statistics from the data, which Oxford researchers were provided some seven months ago for analysis. The numbers, predictably, all seem to be a bit higher or more damning than those provided by the companies themselves in their voluntary reports and carefully practiced testimony.

Previous estimates have focused on the rather nebulous metric of “encountering” or “seeing” IRA content on these social networks. This had the dual effect of increasing the affected number — to over a hundred million on Facebook alone — but “seeing” could easily be downplayed in importance; after all, how many things do you “see” on the internet every day?

The Oxford researchers better quantify the engagement, on Facebook first, with more specific and consequential numbers. For instance, in 2016 and 2017, nearly 30 million people on Facebook actually shared Russian propaganda content, with similar numbers of likes garnered, and millions of comments generated.

Note that these aren’t ads that Russian shell companies were paying to shove into your timeline — these were pages and groups with thousands of users on board who actively engaged with and spread posts, memes, and disinformation on captive news sites linked to by the propaganda accounts.

The content itself was, of course, carefully curated to touch on a number of divisive issues: immigration, gun control, race relations, and so on. Many different groups (e.g. black Americans, conservatives, Muslims, LGBT communities) were targeted, and all generated significant engagement, as this breakdown of the above stats shows:

Although the targeted communities were surprisingly diverse, the intent was highly focused: stoke partisan divisions, suppress left-leaning voters, and activate right-leaning ones.

Black voters in particular were a popular target across all platforms, and a great deal of content was posted both to keep racial tensions high and to interfere with their actual voting. Memes were posted suggesting followers withhold their votes, or deliberately incorrect instructions on how to vote. These efforts were among the most numerous and popular of the IRA’s campaign; it’s difficult to judge their effectiveness, but certainly they had reach.

Examples of posts targeting black Americans.

In a statement, Facebook said that it was cooperating with officials and that “Congress and the intelligence community are best placed to use the information we and others provide to determine the political motivations of actors like the Internet Research Agency.” It also noted that it has “made progress in helping prevent interference on our platforms during elections, strengthened our policies against voter suppression ahead of the 2018 midterms, and funded independent research on the impact of social media on democracy.”

Instagram on the rise

Based on the narrative thus far, one might expect that Facebook — being the focus for much of it — was the biggest platform for this propaganda, and that it would have peaked around the 2016 election, when the evident goal of helping Donald Trump get elected had been accomplished.

In fact Instagram was receiving as much or more content than Facebook, and it was being engaged with on a similar scale. Previous reports disclosed that around 120,000 IRA-related posts on Instagram had reached several million people in the run-up to the election. The Oxford researchers conclude, however, that 40 accounts received in total some 185 million likes and 4 million comments during the period covered by the data (2015-2017).

A partial explanation for these rather high numbers may be that, also counter to the most obvious narrative, IRA posting in fact increased following the election — for all platforms, but particularly on Instagram.

IRA-related Instagram posts jumped from an average of 2,611 per month in 2016 to 5,956 in 2017; note that the numbers don’t match the above table exactly because the time periods differ slightly.

Twitter posts, while extremely numerous, are quite steady at just under 60,000 per month, totaling around 73 million engagements over the period studied. To be perfectly frank this kind of voluminous bot and sock puppet activity is so commonplace on Twitter, and the company seems to have done so little to thwart it, that it hardly bears mentioning. But it was certainly there, and often reused existing bot nets that previously had chimed in on politics elsewhere and in other languages.

In a statement, Twitter said that it has “made significant strides since 2016 to counter manipulation of our service, including our release of additional data in October related to previously disclosed activities to enable further independent academic research and investigation.”

Google too is somewhat hard to find in the report, though not necessarily because it has a handle on Russian influence on its platforms. Oxford’s researchers complain that Google and YouTube have been not just stingy, but appear to have actively attempted to stymie analysis.

Google chose to supply the Senate committee with data in a non-machine-readable format. The evidence that the IRA had bought ads on Google was provided as images of ad text and in PDF format whose pages displayed copies of information previously organized in spreadsheets. This means that Google could have provided the usable ad text and spreadsheets—in a standard machine-readable file format, such as CSV or JSON, that would be useful to data scientists—but chose to turn them into images and PDFs as if the material would all be printed out on paper.

This forced the researchers to collect their own data via citations and mentions of YouTube content. As a consequence their conclusions are limited. Generally speaking when a tech company does this, it means that the data they could provide would tell a story they don’t want heard.

For instance, one interesting point brought up by a second report published today, by New Knowledge, concerns the 1,108 videos uploaded by IRA-linked accounts on YouTube. These videos, a Google statement explained, “were not targeted to the U.S. or to any particular sector of the U.S. population.”

In fact, all but a few dozen of these videos concerned police brutality and Black Lives Matter, which as you’ll recall were among the most popular topics on the other platforms. Seems reasonable to expect that this extremely narrow targeting would have been mentioned by YouTube in some way. Unfortunately it was left to be discovered by a third party and gives one an idea of just how far a statement from the company can be trusted.

Desperately seeking transparency

In their conclusion, the Oxford researchers — Philip N. Howard, Bharath Ganesh, and Dimitra Liotsiou — point out that although the Russian propaganda efforts were (and remain) disturbingly effective and well organized, the country is not alone in this.

“During 2016 and 2017 we saw significant efforts made by Russia to disrupt elections around the world, but also political parties in these countries spreading disinformation domestically,” they write. “In many democracies it is not even clear that spreading computational propaganda contravenes election laws.”

“It is, however, quite clear that the strategies and techniques used by government cyber troops have an impact,” the report continues, “and that their activities violate the norms of democratic practice… Social media have gone from being the natural infrastructure for sharing collective grievances and coordinating civic engagement, to being a computational tool for social control, manipulated by canny political consultants, and available to politicians in democracies and dictatorships alike.”

Predictably, even social networks’ moderation policies became targets for propagandizing.

Waiting on politicians is, as usual, something of a long shot, and the onus is squarely on the providers of social media and internet services to create an environment in which malicious actors are less likely to thrive.

Specifically, this means that these companies need to embrace researchers and watchdogs in good faith instead of freezing them out in order to protect some internal process or embarrassing misstep.

“Twitter used to provide researchers at major universities with access to several APIs, but has withdrawn this and provides so little information on the sampling of existing APIs that researchers increasingly question its utility for even basic social science,” the researchers point out. “Facebook provides an extremely limited API for the analysis of public pages, but no API for Instagram.” (And we’ve already heard what they think of Google’s submissions.)

If the companies exposed in this report truly take these issues seriously, as they tell us time and again, perhaps they should implement some of these suggestions.