Information-optimal Abstaining for Reliable Classification of Building Functions

In this paper, we analyze text mining in extremely noisy spatial datasets, a situation that arises when trying to map social media posts to aspects of the physical world.

From a machine learning perspective, we analyze whether a large Twitter sample can be used to assign building functions to individual buildings. In a nutshell, we assign each tweet from our sample to the nearest building from OpenStreetMap, exploiting our high-performance implementation as described in (missing reference).
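The nearest-building assignment can be illustrated with a minimal brute-force sketch. It is not the high-performance implementation referred to above: it assumes buildings are reduced to centroid points (real OpenStreetMap buildings are polygons), and the names `haversine_m`, `nearest_building`, and all coordinates are hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_building(tweet, buildings):
    """Return (building_id, distance_m) of the centroid closest to the tweet.

    Assumption: each building is represented by a single (lat, lon) centroid.
    """
    best_id, best_d = None, math.inf
    for b_id, (lat, lon) in buildings.items():
        d = haversine_m(tweet[0], tweet[1], lat, lon)
        if d < best_d:
            best_id, best_d = b_id, d
    return best_id, best_d

# Hypothetical building centroids (id -> (lat, lon)); not real OpenStreetMap data.
buildings = {
    "b1": (48.1500, 11.5800),
    "b2": (48.1510, 11.5820),
    "b3": (48.1490, 11.5790),
}
# Hypothetical tweet locations as (lat, lon).
tweets = [(48.1509, 11.5819), (48.1491, 11.5791)]

# One building id per tweet, in the order of the tweets.
assignment = [nearest_building(t, buildings)[0] for t in tweets]
```

The linear scan costs one distance computation per tweet and building, so a large sample would in practice call for a spatial index; the sketch only shows what "nearest building" means.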

The setting is extremely ill-posed for many reasons. The most pressing ones are:

Tweets are not necessarily geolocated where they originate (fake location, inaccurate location, etc.).
The content of tweets is rarely related to the surroundings of their origin.
The labels are incomplete and overlap significantly: aside from residential and commercial buildings, there are mixed-use buildings, industrial buildings, and many more.

Therefore, we expect only a very small fraction of these messages to be valuable. The question is how to find these few but powerful messages.

We successfully apply a technique from information theory known as information-optimal abstaining. The paper is as reproducible as possible and includes synthetic data generated by mixing movie reviews (in English) with two strongly overlapping corpora, Faust and Dr. Faustus, in German.
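The general idea of abstaining can be sketched with a plain confidence threshold. This is a simpler stand-in and not the information-optimal criterion of the paper: a classifier only commits to a label when its highest class probability reaches a threshold, and otherwise abstains. The function names, the threshold values, and the probabilities below are hypothetical.

```python
def predict_or_abstain(probs, threshold):
    """Return the index of the most likely class, or None to abstain."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None

def coverage_and_accuracy(prob_rows, labels, threshold):
    """Fraction of samples that get a label, and accuracy on those samples."""
    decided = 0
    correct = 0
    for probs, y in zip(prob_rows, labels):
        pred = predict_or_abstain(probs, threshold)
        if pred is None:
            continue  # abstained: the sample counts against coverage only
        decided += 1
        correct += int(pred == y)
    coverage = decided / len(labels)
    accuracy = correct / decided if decided else float("nan")
    return coverage, accuracy

# Hypothetical class probabilities (class 0, class 1) and true labels.
prob_rows = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.52, 0.48], [0.40, 0.60]]
labels = [0, 1, 1, 0, 0]

for threshold in (0.5, 0.8):
    cov, acc = coverage_and_accuracy(prob_rows, labels, threshold)
    print(f"threshold={threshold}: coverage={cov:.2f}, accuracy={acc:.2f}")
```

On this toy data, raising the threshold trades coverage for accuracy: fewer messages are labeled, but those that remain are labeled more reliably. This mirrors the goal of keeping only the few informative messages.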

Contact

Professur für
Big Geospatial Data Management

Lise-Meitner-Str. 9
85521 Ottobrunn
martin.werner@tum.de
