Think of a 3D array as a Rubik's cube. If you look at it straight on, you see three columns and three rows; those are X and Y. Now move your head a little and you see there are more elements behind that front face; that is Z. So 0,0,0 is the first cube at the top left, 0,0,1 is the cube directly behind it, and 0,0,2 is the last one in Z for that top-left position.
Your tilemap is addressed the same way as the array in X and Y. Z lets you store as many different data items as you need in the stack behind each tile on that front face.
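
To make the cube picture concrete, here is a rough sketch in Python rather than in any engine's event system. The sizes and the meaning of each Z slot (tile ID, health, visited flag) are made up for illustration; use whatever your game needs.

```python
# Hypothetical sizes: a 3x3 grid of tiles with 3 data slots behind each one.
WIDTH, HEIGHT, DEPTH = 3, 3, 3

# Hypothetical meaning for each Z slot.
Z_TILE_ID, Z_HEALTH, Z_VISITED = 0, 1, 2

# grid[x][y][z], with every cell starting at 0.
grid = [[[0 for _ in range(DEPTH)] for _ in range(HEIGHT)] for _ in range(WIDTH)]

def set_cell(x, y, z, value):
    """Store a value at grid position (x, y), data slot z."""
    grid[x][y][z] = value

def get_cell(x, y, z):
    """Read the value at grid position (x, y), data slot z."""
    return grid[x][y][z]

# Top-left tile (0,0): store three different facts in its Z stack.
set_cell(0, 0, Z_TILE_ID, 7)
set_cell(0, 0, Z_HEALTH, 100)
set_cell(0, 0, Z_VISITED, 1)

# Everything stored behind the top-left tile, front to back.
top_left_stack = [get_cell(0, 0, z) for z in range(DEPTH)]
```

The X and Y you pass in are the same tile coordinates the tilemap uses, so a tile at grid position (2,1) keeps its extra data at (2,1,z) in the array.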
The tilemap has expressions to convert a screen coordinate to a tile grid position and back, so you can translate any position to a tile, and any tile to a position.
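
If it helps to see the arithmetic behind that kind of conversion, this is a minimal sketch in Python. It is not the engine's own expressions; it assumes square tiles of a fixed size (32 here, a made-up value), a tilemap whose top-left corner sits at the given origin, and that "tile to position" means the center of the tile. The function names are hypothetical.

```python
TILE_SIZE = 32  # hypothetical tile width/height in pixels

def position_to_tile(px, py, origin_x=0, origin_y=0):
    """Convert a pixel position to the (x, y) grid index of the tile under it."""
    tx = int((px - origin_x) // TILE_SIZE)
    ty = int((py - origin_y) // TILE_SIZE)
    return tx, ty

def tile_to_position(tx, ty, origin_x=0, origin_y=0):
    """Convert a tile grid index to the pixel position of that tile's center."""
    half = TILE_SIZE / 2
    return origin_x + tx * TILE_SIZE + half, origin_y + ty * TILE_SIZE + half

# A point at pixel (70, 40) falls on the tile in column 2, row 1.
tile = position_to_tile(70, 40)

# The center of that tile, back in pixel coordinates.
center = tile_to_position(*tile)
```

The grid index you get back is what you would use as X and Y when reading or writing the matching stack in the 3D array.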