6.2.1 Matrix representation switched to LaTeX
@@ -27,8 +27,8 @@
 "id": "29c253cc-3060-4da3-bc9b-1aa5f5874db1",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:24.782957150Z",
-"start_time": "2025-12-15T13:56:24.757679243Z"
+"end_time": "2025-12-15T14:01:01.846362881Z",
+"start_time": "2025-12-15T14:01:01.832960844Z"
 }
 },
 "source": [
@@ -42,7 +42,7 @@
 " r\"./Sherlock_card.txt\"]"
 ],
 "outputs": [],
-"execution_count": 1
+"execution_count": 8
 },
 {
 "cell_type": "markdown",
@@ -57,8 +57,8 @@
 "id": "dee15bba-e43b-4e4b-b1a2-d798411820cb",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:24.879680892Z",
-"start_time": "2025-12-15T13:56:24.784572818Z"
+"end_time": "2025-12-15T14:01:01.904231100Z",
+"start_time": "2025-12-15T14:01:01.872546121Z"
 }
 },
 "source": [
@@ -66,7 +66,7 @@
 "vectorizer = CountVectorizer(input=\"filename\")"
 ],
 "outputs": [],
-"execution_count": 2
+"execution_count": 9
 },
 {
 "cell_type": "markdown",
@@ -84,8 +84,8 @@
 "id": "f94a5742-9b26-40ff-a093-b5f0f0bce12f",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:25.090650734Z",
-"start_time": "2025-12-15T13:56:24.882138584Z"
+"end_time": "2025-12-15T14:01:02.160184470Z",
+"start_time": "2025-12-15T14:01:01.928765726Z"
 }
 },
 "source": [
@@ -9017,7 +9017,7 @@
 ]
 }
 ],
-"execution_count": 3
+"execution_count": 10
 },
 {
 "cell_type": "markdown",
@@ -9028,17 +9028,23 @@
 "Across all documents, all unique features (in this case the distinct words) are counted. This gives us the `bag_of_words`.\n",
 "\n",
 "The matrix is structured as follows:\n",
-"```\n",
-" -> axis 1\n",
-"V - axis 0 - V\n",
-"\n",
-"|      | Word1 | Word2 | Word3 | ... |\n",
-"|------|-------|-------|-------|-----|\n",
-"| Doc1 |       |       |       |     |\n",
-"| Doc2 |       |       |       |     |\n",
-"| ...  |       |       |       |     |\n",
-"```\n",
-"\n",
+"$$\n",
+"\\begin{array}{cc}\n",
+" % Top row: empty corner on the left, arrow for axis 1 on the right\n",
+" & \\xrightarrow{\\hspace{1cm} \\text{axis 1 (words)} \\hspace{1cm}} \\\\\n",
+" % Bottom row: arrow for axis 0 on the left, matrix on the right\n",
+" \\begin{array}{c} \\text{axis 0} \\\\ \\text{(Docs)} \\\\ \\downarrow \\end{array} &\n",
+" \\begin{array}{|c|cccc|}\n",
+" \\hline\n",
+" & \\text{Word}_1 & \\text{Word}_2 & \\text{Word}_3 & \\dots \\\\\n",
+" \\hline\n",
+" \\text{Doc}_1 & n_{1,1} & n_{1,2} & n_{1,3} & \\dots \\\\\n",
+" \\text{Doc}_2 & n_{2,1} & n_{2,2} & n_{2,3} & \\dots \\\\\n",
+" \\vdots & \\vdots & \\vdots & \\vdots & \\ddots \\\\\n",
+" \\hline\n",
+" \\end{array}\n",
+"\\end{array}\n",
+"$$\n",
 "#### Total number of (unique) words\n",
 "The number of columns gives us the number of words across all texts, since each column represents exactly one word, i.e. no word is listed twice.\n",
 "This gives us a total of **8879** words that occur in the texts.\n",
@@ -9094,8 +9100,8 @@
 "id": "a43e2e80",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:25.132343345Z",
-"start_time": "2025-12-15T13:56:25.106818886Z"
+"end_time": "2025-12-15T14:01:02.198425286Z",
+"start_time": "2025-12-15T14:01:02.167549542Z"
 }
 },
 "source": [
@@ -9123,7 +9129,7 @@
 ]
 }
 ],
-"execution_count": 4
+"execution_count": 11
 },
 {
 "cell_type": "markdown",
@@ -9156,8 +9162,8 @@
 "id": "b0de993a-7aad-4126-938d-86bc4bd26d8e",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:26.904835204Z",
-"start_time": "2025-12-15T13:56:25.135153350Z"
+"end_time": "2025-12-15T14:01:04.058166213Z",
+"start_time": "2025-12-15T14:01:02.203172264Z"
 }
 },
 "source": [
@@ -9223,7 +9229,7 @@
 ]
 }
 ],
-"execution_count": 5
+"execution_count": 12
 },
 {
 "metadata": {},
@@ -9317,8 +9323,8 @@
 {
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:26.992285254Z",
-"start_time": "2025-12-15T13:56:26.965360172Z"
+"end_time": "2025-12-15T14:01:04.103470898Z",
+"start_time": "2025-12-15T14:01:04.073400774Z"
 }
 },
 "cell_type": "code",
@@ -9399,7 +9405,7 @@
 ]
 }
 ],
-"execution_count": 6
+"execution_count": 13
 },
 {
 "cell_type": "markdown",
@@ -9414,8 +9420,8 @@
 "id": "1cff3622-5a62-49bb-903f-4f98d9b044fb",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-12-15T13:56:27.029415899Z",
-"start_time": "2025-12-15T13:56:26.995416559Z"
+"end_time": "2025-12-15T14:01:04.139311264Z",
+"start_time": "2025-12-15T14:01:04.107288791Z"
 }
 },
 "source": [
@@ -9461,7 +9467,7 @@
 ]
 }
 ],
-"execution_count": 7
+"execution_count": 14
 },
 {
 "metadata": {},