Ron Itelman
6 min read · Nov 2, 2018

The defining paper on this topic from Microsoft Research Machine Teaching Group:

Our Objective: Manually define a knowledge snippet using JSON-AI

This document (the Medium article) contains the machine teaching PDF, and the following paragraph is quoted from that PDF. We are going to mark this paragraph as a knowledge snippet.

Definition 2.1 (Machine learning research) Machine Learning research aims at making the learner better by improving ML algorithms. This field covers, for instance, any new variations or breakthroughs in deep learning, unsupervised learning, recurrent networks, convex optimization, and so on. Conversely, we see version control, concept decomposition, semantic data exploration, expressiveness of the teaching language, interpretability of the model, and productivity as having more in common with programming and human-computer interaction than with machine learning. These “machine teaching” concepts, however, are extraordinarily important to any practitioners of machine learning. Hence, we define a discipline aimed at improving these concepts as:

This is how the knowledge snippet is written in JSON-AI.

{
  "*.@i.story": {
    "origin": {
      "epoch t1": {
        "req": {
          "timestamp": "Nov 1, 2018"
        },
        "document": {
          "*.Intelligence.AI": {
            "url": "https://medium.com/p/98fcc01a4f79",
            "knowledge": {
              "snippet 1": {
                "url": "https://arxiv.org/pdf/1707.06742.pdf",
                "knowledge": {
                  "concepts": [
                    "Machine Learning Research",
                    "machine teaching concepts"
                  ],
                  "paragraph_1": {
                    "heading": "Definition 2.1",
                    "sentences": [
                      "Definition 2.1 (Machine learning research) Machine Learning research aims at making the learner better by improving ML algorithms.",
                      "This field covers, for instance, any new variations or breakthroughs in deep learning, unsupervised learning, recurrent networks, convex optimization, and so on.",
                      "Conversely, we see version control, concept decomposition, semantic data exploration, expressiveness of the teaching language, interpretability of the model, and productivity as having more in common with programming and human-computer interaction than with machine learning.",
                      "These “machine teaching” concepts, however, are extraordinarily important to any practitioners of machine learning.",
                      "Hence, we define a discipline aimed at improving these concepts as:"
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
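Hand-writing this much nesting makes it easy to drop a comma or a brace, so one option is to build the snippet as a Python dict and let the json module serialize it. The sketch below is only an illustration: the helper name make_snippet is hypothetical, the sentence list is shortened to one entry, and storing "concepts" as a JSON array is an assumption about how JSON-AI would represent a list of names.

```python
import json

def make_snippet(source_url, heading, concepts, sentences):
    """Build one knowledge snippet as a plain dict (hypothetical helper)."""
    return {
        "url": source_url,
        "knowledge": {
            # Assumption: concept names are stored as a JSON array.
            "concepts": list(concepts),
            "paragraph_1": {"heading": heading, "sentences": list(sentences)},
        },
    }

snippet = make_snippet(
    "https://arxiv.org/pdf/1707.06742.pdf",
    "Definition 2.1",
    ["Machine Learning Research", "machine teaching concepts"],
    ["Machine Learning research aims at making the learner better by improving ML algorithms."],
)

document = {
    "*.@i.story": {
        "origin": {
            "epoch t1": {
                "req": {"timestamp": "Nov 1, 2018"},
                "document": {
                    "*.Intelligence.AI": {
                        "url": "https://medium.com/p/98fcc01a4f79",
                        "knowledge": {"snippet 1": snippet},
                    }
                },
            }
        }
    }
}

# json.dumps takes care of balanced braces, commas, and string escaping.
text = json.dumps(document, indent=2, ensure_ascii=False)
```

Parsing the result back with json.loads is a cheap check that the snippet is well-formed before it is shared.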

“@i.magic.learn.concepts”

Machine Teaching Research

Definition 2.2 (Machine teaching research) Machine teaching research aims at making the teacher more productive at building machine learning models. We have chosen these definitions to minimize the intersection between the two fields and thus provide clarity and scoping. The two disciplines are complementary and can evolve independently. Of course, like any generalization, there are limitations. Curriculum learning (Bengio et al., 2009), for instance, could be seen as belonging squarely in the intersection because it involves both a learning algorithm and teacher behavior. Nevertheless, we have found these definitions useful to decide what to work on and what not to work on.

Concept

Definition 4.1 (Concept) A concept is a mapping from any example to a label value. For example, the concept of a recipe web page can be represented by a function that returns zero or one, based on whether a web page contains a cooking recipe. In another example, an address concept can be represented by a function that, given a document, returns a list of token ranges, each labeled “address”, “street”, “zip”, “state”, etc. Label values for a binary concept could be “Is” and “Is Not”. We may also allow an “Undecided” label which allows a teacher to postpone labeling decisions or ignore ambiguous examples. Postponing a decision is important because the concept may be evolving in the teacher’s head. An example of this is in (Kulesza et al., 2014).
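To make the "mapping from example to label value" idea concrete, here is a minimal sketch of a binary concept with an "Undecided" value. The function name and the keyword rules are invented for illustration; a real teacher's judgement would not reduce to rules like these.

```python
from typing import Callable, Optional

# A binary concept maps an example to True ("Is"), False ("Is Not"),
# or None ("Undecided").
BinaryConcept = Callable[[str], Optional[bool]]

def is_recipe_page(page_text: str) -> Optional[bool]:
    """Toy stand-in for a teacher's judgement; the keyword rules are made up."""
    text = page_text.lower()
    if "ingredients" in text and "preheat" in text:
        return True
    if "ingredients" in text or "recipe" in text:
        return None  # ambiguous: postpone the decision
    return False
```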

Feature

Definition 4.2 (Feature) A feature is a concept that assigns each example a scalar value. We usually use feature to denote a concept when emphasizing its use in a machine learning model. For example, the concept corresponding to the presence or absence of the word “recipe” in text examples might be a useful feature when teaching the recipe concept.
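The "recipe" word feature from the definition could be sketched as follows. The function name is hypothetical, and matching only the exact word (not "recipes") is a simplifying assumption.

```python
import re

def has_word_recipe(text: str) -> float:
    """Return 1.0 if the word "recipe" appears in the text, else 0.0."""
    match = re.search(r"\brecipe\b", text, flags=re.IGNORECASE)
    return 1.0 if match else 0.0
```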

Teacher

Definition 4.3 (Teacher) A teacher is the person who transfers concept knowledge to a learning machine. To clarify this definition of a teacher, the methods of knowledge transfer need to be defined. At this point, they include a) example selection (biased), b) labeling, c) schema definition (relationship between labels), d) featuring, and e) concept decomposition (where features are recursively defined as sub-models). The teachers are expected to make mistakes in all the forms of knowledge transfer. These teaching “bugs” are common occurrences.

Selection

Definition 4.4 (Selection) Selection is the process by which teachers gain access to an example that exemplifies useful aspects of a concept. Teachers can select specific examples by filtering the set of unlabeled examples. By choosing these filters deliberately, they can systematically explore the space and discover information relevant to concepts. For example, a teacher may discover insect recipes while building a recipe classifier by issuing a query on “source of proteins”. We note that uniform sampling and uncertainty sampling, which have no explicit input from a teacher, are likely of little use for discovering rare clusters of positive examples. Combinations of semantic filters involving trained models are even more powerful (e.g., “nutrition proteins” and low score with current classifier). This ability to find examples containing useful aspects of a concept enables the teacher to find useful features and provide the labels to train them. Furthermore, the selection choices themselves can be valuable documentation of the teaching process.
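The combination of a semantic filter with a low classifier score can be sketched as a simple filter over the unlabeled pool. Everything here is an assumption made for illustration: the function name select_examples, the use of plain substring matching as the "semantic" filter, the 0.5 threshold, and the toy scoring function.

```python
from typing import Callable, Iterable, List

def select_examples(
    unlabeled: Iterable[str],
    query_terms: List[str],
    score: Callable[[str], float],
    max_score: float = 0.5,
) -> List[str]:
    """Keep examples that mention every query term and that the current
    classifier scores at or below max_score."""
    selected = []
    for example in unlabeled:
        text = example.lower()
        matches_query = all(term.lower() in text for term in query_terms)
        if matches_query and score(example) <= max_score:
            selected.append(example)
    return selected

def toy_score(text: str) -> float:
    """Stand-in for the current recipe classifier (made up)."""
    return 0.9 if "recipe" in text.lower() else 0.1

pool = [
    "Cricket flour: nutrition and proteins per serving",
    "Chicken recipe: nutrition, proteins, and method",
    "Tennis results from the weekend",
]
candidates = select_examples(pool, ["nutrition", "proteins"], toy_score)
```

Here the teacher surfaces the cricket flour page, which matches the query but which the current classifier does not yet recognize as recipe-related.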

Label

Definition 4.5 (Label) A label is an (example, concept value) pair created by a teacher in relation to a concept. Teachers can provide labels by “looking at a column” in Figure 1. It is important to realize that the teachers do not know which programs are running in their heads when they evaluate the target concept values. If they knew the programs, they would transfer their knowledge in programmatic form to the machine and would not need machine learning. Teachers instead look at the available data of an example and “divine” its label. They do this by unconsciously evaluating sub-features and combining them to make labeling decisions. The feature spaces and the combination functions available to the teachers are beyond what is available through the training sets. This power is what makes the teachers valuable for the purpose of creating labels.

Schema

Definition 4.6 (Schema) A schema is a relationship graph between concepts. When multiple concepts are involved, a teacher can express relationships between them. For instance, the teacher could express that the concepts “Tennis” and “Soccer” are mutually exclusive, or that concept “Tennis” implies the concept “Sport”. These concept constraints are relationships between lines on the diagram (true across all examples). Separating knowledge captured by the schema from the knowledge captured by the labels allows information to be conveyed and edited at a high level. The implied labels can be changed simply by changing the concept relationship. For instance, “Golf” could be moved from being a sub-concept of “Sport” to being mutually exclusive or vice versa. Teachers can understand and change the semantics of a concept by reviewing its schema. Semantic decisions can be reversed without editing individual labels.
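One way to see how a schema yields implied labels is to store "implies" edges and mutually exclusive pairs, then derive everything that follows from a single positive label. The encoding and the function name implied_labels are assumptions for this sketch, not something the paper specifies.

```python
from typing import Dict, Iterable, Set, Tuple

def implied_labels(
    positive: str,
    implies: Dict[str, Set[str]],
    exclusive: Iterable[Tuple[str, str]],
) -> Dict[str, bool]:
    """Labels that follow from one positive label under the schema."""
    labels = {positive: True}
    # Follow "implies" edges transitively.
    stack = [positive]
    while stack:
        concept = stack.pop()
        for parent in implies.get(concept, set()):
            if parent not in labels:
                labels[parent] = True
                stack.append(parent)
    # Anything mutually exclusive with a true concept becomes false.
    for a, b in exclusive:
        if labels.get(a):
            labels.setdefault(b, False)
        if labels.get(b):
            labels.setdefault(a, False)
    return labels

sport_schema = {"Tennis": {"Sport"}, "Soccer": {"Sport"}}
tennis_labels = implied_labels("Tennis", sport_schema, [("Tennis", "Soccer")])
```

Moving “Golf” from a sub-concept of “Sport” to mutually exclusive only means changing the arguments; no individual label has to be edited.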

Generic Feature

Definition 4.7 (Generic feature) A generic feature is a set of related feature functions. Generic features are created by engineers in parametrizable form, and teachers instantiate individual features by providing useful and semantic parameters. For instance, a generic feature could be: “Log(1 + number of instances of words in list X in a document)” and an instantiation would be setting X to a list of car brands (useful for an automotive classifier).
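The generic feature in this definition maps naturally onto a function that returns a function: the engineer writes the parametrizable part, and the teacher supplies the word list. This is a sketch under assumptions: the names are hypothetical, the logarithm is taken as the natural log (the definition does not state a base), the brand list is just an example, and the simple tokenizer only handles single-word entries.

```python
import math
import re
from typing import Callable, Iterable

def make_word_list_feature(words: Iterable[str]) -> Callable[[str], float]:
    """Instantiate log(1 + number of occurrences of listed words)."""
    vocabulary = {word.lower() for word in words}

    def feature(document: str) -> float:
        # Lowercase alphanumeric tokens; multi-word entries are not supported.
        tokens = re.findall(r"[a-z0-9]+", document.lower())
        count = sum(1 for token in tokens if token in vocabulary)
        return math.log(1 + count)

    return feature

# The teacher instantiates the generic feature with a list of car brands.
car_brand_feature = make_word_list_feature(["Toyota", "Ford", "Honda"])
```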

Decomposition

Definition 4.8 (Decomposition) Decomposition is the act of using simpler concepts to express more complex ones. Whereas teachers do not have direct access to the program implementing their concept, they sometimes can infer how these programs work. Socrates used to teach by asking the right questions. The “right question” is akin to providing a useful sub-concept, whose value makes evaluating the overall concept easier. In other words, Socrates was teaching by decomposition rather than by examples. This ability is not equally available to teachers. It is learned. It is essential to scaling with complexity and with the number of teachers. It is the same ability that helps software engineers decompose functions into sub-functions. Software engineers also acquire this ability with experience. As in programming, teaching decompositions are not unique (in software engineering, switching from one decomposition to another is called refactoring).
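The parallel with splitting functions into sub-functions can be shown with a toy address concept expressed through two simpler sub-concepts. The function names, the regular expressions, and the five-digit US-style zip code are all assumptions for this sketch; in practice each sub-concept could itself be a trained sub-model rather than a rule.

```python
import re

def has_zip_code(text: str) -> bool:
    """Sub-concept: the text contains a five-digit number (US-style zip)."""
    return re.search(r"\b\d{5}\b", text) is not None

def has_street_word(text: str) -> bool:
    """Sub-concept: the text contains a common street word."""
    pattern = r"\b(street|st|avenue|ave|road|rd)\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def looks_like_address(text: str) -> bool:
    """The more complex concept, expressed through the two sub-concepts."""
    return has_zip_code(text) and has_street_word(text)
```

A different teacher might decompose the same concept another way, for example by city and state first, which is the teaching counterpart of refactoring.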

“@i.magic.end”

Written by Ron Itelman

I like human + machine learning systems | Principal — Digital, Data & Analytics at Clarkston Consulting
