Seed word in media classes

Top N texts (ranked by DIN) containing the seed word in each class

Note: The number of items in the tables depends on both the top-N value and the rate of co-occurrence.

Number of texts containing KW (within classes)

Note: KWs occurring only once (in one text) were removed from this table.

KWs unique to media classes

DIN values of KWs unique to individual media classes, identified against the rest of the WebMedia corpus.

KWs in media classes

This tab does not take into consideration the settings in the sidebar menu.

KWs in the table appear most frequently in the selected combination of media types. The numbers give the relative number of texts in a given media type in which a KW appears.
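As a rough illustration of what "relative number of texts" could mean, the sketch below computes, for one KW, the share of texts in each media type that contain it. The function name, the data layout and the media type labels are hypothetical, and the app may normalise the counts differently.

```python
from collections import Counter


def relative_text_counts(texts: list[tuple[str, set[str]]], kw: str) -> dict[str, float]:
    """Share of texts per media type that contain `kw`.

    `texts` is a list of (media_type, KWs of the text) pairs; this layout is
    an assumption made for the example.
    """
    totals: Counter[str] = Counter()
    hits: Counter[str] = Counter()
    for media_type, kws in texts:
        totals[media_type] += 1
        if kw in kws:
            hits[media_type] += 1
    return {media_type: hits[media_type] / totals[media_type] for media_type in totals}


# Hypothetical toy data: two media types with two and four texts.
toy_texts = [
    ("tabloid", {"celebrity", "scandal"}),
    ("tabloid", {"scandal"}),
    ("broadsheet", {"scandal", "policy"}),
    ("broadsheet", {"policy"}),
    ("broadsheet", {"budget"}),
    ("broadsheet", {"policy", "budget"}),
]
shares = relative_text_counts(toy_texts, "scandal")
```

Here "scandal" appears in both tabloid texts and in one of the four broadsheet texts.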

Differential Market Basket Analysis

This tab also does not take into consideration the settings in the sidebar menu.

Market Basket Analysis is a method for detecting co-occurrences of items in bins. In our case, the bins (or baskets) are texts and the items are their KWs (each text is represented by its list of KWs). An association is described by a rule of the form A => B, which can be read as: if a text contains KW A, it will probably also contain KW B. The strength of a rule is measured by three main variables: support, confidence and lift.

  • support: Fraction of texts that contain both KWs A and B,
  • confidence: How often the KW B appears in texts that contain the KW A,
  • lift: How much our confidence has increased that KW B will be present in a text given that KW A is already in the text.
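The three measures can be computed directly from counts of texts. The following is a minimal sketch for a single rule A => B, assuming each text is given as a set of KWs. The function name and the toy data are hypothetical and are not taken from the app itself.

```python
def rule_metrics(texts: list[set[str]], a: str, b: str) -> dict[str, float]:
    """Support, confidence and lift of the rule a => b over a list of texts."""
    n = len(texts)
    if n == 0:
        raise ValueError("At least one text is required.")
    n_a = sum(1 for kws in texts if a in kws)              # texts containing A
    n_b = sum(1 for kws in texts if b in kws)              # texts containing B
    n_ab = sum(1 for kws in texts if a in kws and b in kws)  # texts containing both

    support = n_ab / n                            # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0       # P(B | A)
    lift = confidence / (n_b / n) if n_b else 0.0  # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy corpus of four texts, each represented by its KWs.
texts = [
    {"election", "parliament", "vote"},
    {"election", "vote"},
    {"election", "budget"},
    {"budget", "tax"},
]
metrics = rule_metrics(texts, "election", "vote")
```

In this toy corpus, two of four texts contain both KWs (support 0.5), two of the three texts with "election" also contain "vote" (confidence 2/3), and "vote" occurs in half of all texts, so the lift is (2/3) / 0.5 = 4/3. A lift above 1 means that B is more likely in texts containing A than in the corpus overall.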

Media classes can be described by comparing the rules they contain. We can look at rules that have the same consequents (the right-hand side of the rule, hence rhs) but different antecedents (the left-hand side of the rule, hence lhs), at rules that share antecedents but differ in consequents, or at rules that are common to both selected media classes. These three perspectives are represented by tables A, B and C.

A. Same consequents (rhs), different antecedents (lhs)

B. Same antecedents (lhs), different consequents (rhs)

C. Rules common to both classes
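One possible way to derive the three tables from two rule sets is sketched below. It assumes that each rule is a simple (lhs, rhs) pair of single KWs and that tables A and B list pairs of rules, one from each class. The names and the example rules are hypothetical, and the app may build its tables differently.

```python
Rule = tuple[str, str]  # (lhs, rhs), i.e. antecedent => consequent


def compare_rules(rules_1: set[Rule], rules_2: set[Rule]):
    """Return the contents of tables A, B and C for two media classes."""
    # A: same consequent (rhs), different antecedents (lhs)
    same_rhs = sorted(
        (r1, r2) for r1 in rules_1 for r2 in rules_2
        if r1[1] == r2[1] and r1[0] != r2[0]
    )
    # B: same antecedent (lhs), different consequents (rhs)
    same_lhs = sorted(
        (r1, r2) for r1 in rules_1 for r2 in rules_2
        if r1[0] == r2[0] and r1[1] != r2[1]
    )
    # C: rules present in both classes
    common = sorted(rules_1 & rules_2)
    return same_rhs, same_lhs, common


# Hypothetical rules mined separately from two media classes.
class_1 = {("election", "vote"), ("parliament", "vote"), ("budget", "tax")}
class_2 = {("referendum", "vote"), ("budget", "deficit"), ("budget", "tax")}

table_a, table_b, table_c = compare_rules(class_1, class_2)
```

With these example rules, table A pairs the two class 1 rules ending in "vote" with the class 2 rule referendum => vote, table B pairs budget => tax with budget => deficit, and table C contains only budget => tax.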

Find texts containing KWs

Insert comma-separated KWs (at least 2) and select media class.
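A search of this kind could be implemented roughly as follows. The sketch assumes that a text matches only if it contains all of the entered KWs, that KWs are stored in lower case, and that each text is a dictionary with "id", "media_class" and "kws" fields. All of these are assumptions made for the example, not details confirmed by the app.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated string into distinct, trimmed, lower-cased KWs."""
    kws: list[str] = []
    for part in raw.split(","):
        kw = part.strip().lower()
        if kw and kw not in kws:
            kws.append(kw)
    if len(kws) < 2:
        raise ValueError("Insert at least 2 comma-separated KWs.")
    return kws


def find_texts(corpus: list[dict], raw_kws: str, media_class: str) -> list[str]:
    """Ids of texts in `media_class` that contain every requested KW."""
    wanted = set(parse_keywords(raw_kws))
    return [
        text["id"]
        for text in corpus
        if text["media_class"] == media_class and wanted <= text["kws"]
    ]


# Hypothetical corpus with made-up ids and class labels.
corpus = [
    {"id": "t1", "media_class": "class_1", "kws": {"election", "vote", "parliament"}},
    {"id": "t2", "media_class": "class_1", "kws": {"election", "budget"}},
    {"id": "t3", "media_class": "class_2", "kws": {"election", "vote"}},
]
hits = find_texts(corpus, "Election, vote", "class_1")
```

Only text t1 matches here: t2 lacks "vote" and t3 belongs to a different media class.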

Network of KWs in ALs