Telling Stories about Algorithms


Workshop: Telling Stories about Algorithms

Who: Susan McGregor

Date: Monday, November 9th

Time: 1:00pm EST / 6pm UTC+0

Language: English

RSVP here

Join Susan McGregor for a special storytime about algorithms. We talk a lot about algorithms without always having a good understanding of the forces and elements that shape them, despite the impact they are having on our society. By joining this workshop, you will learn:

  • What algorithms actually do and the components that make them up.
  • Why we need to do research on and report about them (and how to do so effectively)
  • How they impact our decisions on a daily basis and can perpetuate discrimination

Susan McGregor is an Associate Research Scholar at Columbia University's Data Science Institute, where she also co-chairs its Center for Data, Media & Society. McGregor's research is centered on security and privacy issues affecting journalists and media organizations. Her books, Information Security Essentials: A Guide for Reporters, Editors and Newsroom Leaders and Practical Python: Data Wrangling and Data Quality, will be out in 2021.

// We will be hosting a 25 minute post-workshop networking exercise to allow folks to meet others who share their interest, and strengthen collaborations across various lines. Make sure to schedule in 25 minutes extra on your calendar, if you are interested in joining //

>> Check out notes from other sessions here

Notes

  • What do we mean by algorithms? A process or set of rules to be followed in calculations or problem-solving operations by a computer.
  • It’s a set of steps for completing a process; a recipe is an algorithm.
  • Algorithms exist in many places and are increasingly being adopted into public systems. Algorithms are now being used for hiring decisions. In these systems, the algorithm is only one part of what needs to be investigated, interrogated, and understood.
  • Machine bias is not just happening in social media spaces; it’s happening in criminal justice systems, housing, social welfare systems, etc.
  • Algorithms are used to make decisions. They work in binaries, and they have an impact on people’s lives. While in some cases they support humans in making decisions, they can also curtail or take responsibility away from the humans who are making those decisions.
  • A machine learning model relies heavily on the data it uses. Machine learning algorithms and systems fundamentally depend on good data - if you don't have good ingredients, you won't get good results, no matter how good your recipe is (a minimal sketch of this point follows below).
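To make the ingredients point concrete, here is a minimal sketch in Python using scikit-learn. Everything in it (the synthetic dataset, the choice of logistic regression, the 30% label-corruption rate) is an assumption invented for illustration, not anything presented in the workshop: the same "recipe" trained on spoiled "ingredients" gives visibly worse results.

  # Illustrative only: same model ("recipe"), clean vs. corrupted labels
  # ("ingredients"). Bad data degrades results no matter the recipe.
  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Good ingredients: train on the labels as collected.
  clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

  # Spoiled ingredients: randomly flip 30% of the training labels.
  rng = np.random.default_rng(0)
  noisy = y_train.copy()
  flip = rng.random(len(noisy)) < 0.30
  noisy[flip] = 1 - noisy[flip]
  spoiled = LogisticRegression(max_iter=1000).fit(X_train, noisy)

  print("clean labels:  ", clean.score(X_test, y_test))
  print("flipped labels:", spoiled.score(X_test, y_test))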

  • Machine learning is the process of giving a computer a large set of data and asking it to make inferences about that data. The resulting classifier is then used to make predictions about new data, and those predictions trigger a decision about what should happen next. (The sketch after the component list below walks through this pipeline.)
  • Amazon, for example, trained their recruitment model on data about people who were already successful at the company - many of whom were white and male. So the system ended up measuring how similar applicants were to current employees, rather than predicting who was talented. The recruiting engine backfired: it treated white men as the model of success and penalized women. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
  • So the data really impacts the machine learning model.
  • It's really hard to tell stories about algorithms precisely because they are black boxes. But we can still critically engage with them, for example by asking questions (what data has been fed into them?) and even by testing their "answers" by feeding them very specific data.
  • "Data is a kinda pixelated version of what happens in the world"
  • Algorithms can only replicate the patterns they have "learned". They cannot innovate. If you want the system to look for bias, you have to tell it to look for bias.
  • Understanding how they operate in general also helps us ask the most important questions, and even test their answers. The key components of an algorithmic system are listed below; a sketch after the list maps them to code.

a) The use of data to represent real-world phenomena (operationalization). But the system is only inferring what it thinks the real world is - i.e., it is making guesses; it is not really showing us the real picture or truth. Very often machine learning systems are used as though they can predict the future, but they can't; they are only making assumptions or guesses based on the data they have. How the data sees us is different from what the truth is. That data-driven understanding is an impressionistic one, and that gap is where we start to see real problems with these systems.

b) The formal algorithm

c) The data set used to train the algorithm

d) The validation method and accuracy of the algorithm

e) Whether the decisions the algorithm makes have been tested for bias, etc.
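Here is a minimal, hypothetical sketch in Python with scikit-learn that maps these components onto code. The synthetic data, the choice of model, and the decision labels are all invented for illustration - this is not the workshop's own example; comments mark where components (a) through (e) show up.

  # Hypothetical sketch only: the components above, in code.
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import accuracy_score
  from sklearn.model_selection import train_test_split

  # (a) Operationalization: the world reduced to numbers. Synthetic here; in a
  #     hiring system these might be resume keywords, years of experience, etc.
  X, y = make_classification(n_samples=1000, n_features=5, random_state=1)

  # (c) The data set used to train the algorithm (and its labels: who counted
  #     as a "success" when the labels were made?).
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

  # (b) The formal algorithm: logistic regression, a common supervised method.
  model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

  # (d) The validation method and accuracy: held-out test data, plain accuracy.
  predictions = model.predict(X_test)
  print("accuracy:", accuracy_score(y_test, predictions))

  # The predictions then trigger decisions about what happens next.
  decisions = ["advance" if p == 1 else "reject" for p in predictions]

  # (e) Testing those decisions for bias does not happen automatically; someone
  #     has to write that check (see the bias sketch later in these notes).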

  • What we cannot know is precisely how the various data attributes (or features) are used by the algorithm to produce a decision.
  • What kinds of formal algorithms are there? Two types: supervised and unsupervised. Supervised methods require positive and negative examples of the phenomena they should make decisions about (labeled data).

Unsupervised methods generally cluster “unlabeled” data to help identify otherwise unrecognized patterns within it. (A minimal sketch of the two types follows.)
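A minimal side-by-side sketch of the two types in Python with scikit-learn (the synthetic data is invented for illustration): the supervised model is handed labels; the unsupervised one only sees the points and proposes clusters.

  # Illustrative only: supervised vs. unsupervised on the same points.
  from sklearn.cluster import KMeans
  from sklearn.datasets import make_blobs
  from sklearn.linear_model import LogisticRegression

  X, y = make_blobs(n_samples=300, centers=2, random_state=2)

  # Supervised: requires labeled positive/negative examples (y).
  classifier = LogisticRegression().fit(X, y)

  # Unsupervised: clusters the unlabeled data; there is no ground truth
  # to validate the clusters against.
  clusters = KMeans(n_clusters=2, n_init=10, random_state=2).fit_predict(X)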

  • Where did the data come from? That is a big question to ask. Machine learning requires large volumes of data, so it is almost always observational, e.g., collected from the real world. Important questions to ask:

- Who collected it?

- What features does it include?

- How often is it updated?

- Has it been tested for representativeness? (A small check is sketched below.)
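The representativeness question can be asked with very little code. A sketch in Python (the group labels and reference shares are invented for illustration; a real check would use real population figures, e.g. from a census):

  # Illustrative only: compare group shares in the data to a reference
  # population. All numbers are invented.
  from collections import Counter

  records = ["A", "A", "A", "B", "A", "A", "B", "A"]  # one group label per row
  reference_shares = {"A": 0.6, "B": 0.4}             # e.g. census shares

  counts = Counter(records)
  for group, expected in reference_shares.items():
      observed = counts[group] / len(records)
      print(f"{group}: observed {observed:.2f} vs. expected {expected:.2f}")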

  • How was the classifier validated? This only applies to supervised learning methods, since those are the ones with a “ground truth” to compare to. But the validation can only be as good as the labeled data was to begin with. This also highlights the fact that “unsupervised” methods effectively cannot be validated.
  • What is artificial intelligence?
  • What is the best thing the digital rights community can advocate for to help curtail some of the damage of data-driven machine systems?
  • Have the decisions been tested for inappropriate bias?

Algorithms can only replicate the patterns they have learned. Looking at the training data can help suggest whether inappropriate decision bias is likely, but examining the data and/or the formal algorithm is often not possible. (A minimal sketch of a simple bias check follows.)
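One simple form such a test can take is comparing the rate of favorable decisions across groups (a demographic-parity-style check). A sketch in Python with invented data:

  # Illustrative only: favorable-decision rate by group. A real audit would use
  # real decisions and protected attributes; these values are invented.
  groups = ["men", "men", "women", "women", "men", "women", "men", "women"]
  decisions = [1, 1, 0, 1, 1, 0, 1, 0]  # 1 = favorable decision

  by_group = {}
  for g, d in zip(groups, decisions):
      by_group.setdefault(g, []).append(d)

  for g, ds in by_group.items():
      print(f"{g}: favorable rate {sum(ds) / len(ds):.2f}")
  # A large gap between groups is a signal to investigate; the system will not
  # flag it on its own - you have to tell it to look.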

  • The best stories about algorithms have a human face. Who is impacted, and how? How have their lives been changed? Perhaps most importantly, does their experience, and/or the reality of the algorithm, violate important rules or social norms that should make us reconsider using algorithms to solve certain types of problems at all?
  • You can only optimize an algorithm for one thing! We can weigh many factors in our own decision making, but algorithmic systems are opaque; we cannot control how significantly the system weights one factor or another.
  • You need to understand the damage or abuse a tool can do. The ways a tool is failing are never visible to its creator. When we see crappy things happening, usually that’s the result of inadequate imagination and diversity in how the data and algorithm were put together.
  • Google Deep Dream is a good tool to show students how algorithmic biases work.
  • Algorithms on social media take inputs from both the users of the platforms and the platforms themselves. Most of the data comes from users: what you click on, what you watch, and for how long. All of that detail is being recorded and interpreted - the question is what you want the system to do with it. The algorithms behind social media feeds are unsupervised; there is so much information that it would be hard for humans to make decisions at that scale. What are platforms looking at as indicators that the algorithms are working? More than likely, making money will be the priority. (A toy feed-ranking sketch closes these notes.)
  • So the scientists are constantly doing trial and error - pushing certain areas and buttons and seeing what happens. It’s a very erratic process. If 90% of your income comes from advertising, then you are an advertising company (Facebook, etc.), so it is in their interest that engagement happens, even if it’s via flame wars.
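To illustrate the feed-ranking point, a toy sketch in Python. The signals and weights here are invented; real platforms' scoring functions are proprietary and vastly more complex. The shape of the logic is the point: user behavior goes in, an engagement score decides what gets shown, and whatever drives engagement - including flame wars - rises.

  # Toy illustration only: rank posts by an invented engagement score.
  posts = [
      {"id": 1, "clicks": 120, "watch_seconds": 30, "comments": 4},
      {"id": 2, "clicks": 40, "watch_seconds": 300, "comments": 90},  # heated thread
      {"id": 3, "clicks": 200, "watch_seconds": 15, "comments": 1},
  ]

  def engagement_score(post):
      # If the objective is engagement (i.e., ad revenue), heated threads win.
      return 0.5 * post["clicks"] + 0.2 * post["watch_seconds"] + 2.0 * post["comments"]

  feed = sorted(posts, key=engagement_score, reverse=True)
  print("ranked feed:", [p["id"] for p in feed])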