!! Assistance needed: Create ML - Scanning a document with multiple tables

Objective:

I am developing an app that uses machine learning (Core ML) to process photographs of documents, specifically documents that contain tables.

Step 1: Capturing the Image

The app starts by letting users photograph documents. What matters is not the whole page but specifically the sections that contain tables.

Step 2: Image Analysis through Machine Learning

Once the image is captured, it is passed to a machine learning model trained with Apple's Create ML and run inside the Swift app through Core ML. The model's task is twofold:

  1. Identifying the Table: Distinguish the table from other document information, ensuring it recognizes and isolates the table structure within the photograph.
  2. Ignoring Irrelevant Information: Concurrently, the model will disregard all non-table content, focusing the application's resources on the table data.
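A minimal sketch of Step 2, assuming the model is trained with Create ML's Object Detection template on photos where each table is annotated with a label named "table" (that label, the 0.5 threshold and the helper names are hypothetical). Vision runs the compiled model and returns VNRecognizedObjectObservation results whose bounding boxes are normalized with the origin at the bottom-left. The two plain helpers filter those results and convert a box to top-left pixel coordinates, so the table region can be cropped (for example with CGImage.cropping(to:)) before further processing. The Vision part only compiles on Apple platforms, hence the #if canImport(Vision) guard.

```swift
import Foundation
#if canImport(Vision)
import Vision
import CoreML
#endif

/// One detected region, in Vision's normalized coordinates (0...1, origin bottom-left).
struct Detection {
    let label: String
    let confidence: Float
    let normalizedBox: CGRect
}

/// Keep only detections with the table label above a confidence threshold.
/// "table" is a hypothetical label: use whatever class name your annotations use.
func tableRegions(in detections: [Detection],
                  label: String = "table",
                  minConfidence: Float = 0.5) -> [Detection] {
    detections.filter { $0.label == label && $0.confidence >= minConfidence }
}

/// Convert a normalized, bottom-left-origin rect into pixel coordinates with a
/// top-left origin, which is what CGImage cropping expects.
func pixelRect(for normalized: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth), h = CGFloat(imageHeight)
    return CGRect(x: normalized.minX * w,
                  y: (1 - normalized.maxY) * h,  // flip the y axis
                  width: normalized.width * w,
                  height: normalized.height * h)
}

#if canImport(Vision)
/// Run a Create ML object detector on an image through Vision.
/// `modelURL` must point to a compiled model (.mlmodelc), e.g. from
/// Bundle.main.url(forResource: "TableDetector", withExtension: "mlmodelc"),
/// where "TableDetector" is a hypothetical model name.
func detectTables(in image: CGImage, modelURL: URL) throws -> [Detection] {
    let mlModel = try MLModel(contentsOf: modelURL)
    let request = VNCoreMLRequest(model: try VNCoreMLModel(for: mlModel))
    request.imageCropAndScaleOption = .scaleFill
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    let observations = (request.results as? [VNRecognizedObjectObservation]) ?? []
    return observations.compactMap { observation in
        // Each observation carries labels sorted by confidence; take the best one.
        guard let best = observation.labels.first else { return nil }
        return Detection(label: best.identifier,
                         confidence: best.confidence,
                         normalizedBox: observation.boundingBox)
    }
}
#endif
```

Everything outside the `#if` block is plain Swift, so the filtering and coordinate logic can be unit-tested without a device or a trained model.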

Step 3: Data Extraction and Training

Once the table is identified, the real work begins: the app has to recognize the table's row and column structure, using a model trained on a dataset of example tables. This lets the app 'read' the table accurately, much as a human would, by working out how the information is organized into rows and columns.
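Step 3 does not necessarily require a second trained model. One simple alternative, sketched here under the assumption that every piece of text in the table already has a normalized bounding box (for example from Vision's text recognition), is a geometric heuristic: boxes whose vertical centers are close together form a row, and sorting each row by x gives the columns. The type names and the 0.01 tolerance are hypothetical, and the heuristic assumes a roughly upright table with one text box per cell; merged or multi-line cells would need more work.

```swift
import Foundation

/// A piece of recognized text and its normalized bounding box
/// (Vision convention: values in 0...1, origin at the bottom-left).
struct TextBox {
    let text: String
    let box: CGRect
}

/// Group text boxes into table rows. A box joins the current row when its
/// vertical center is within `rowTolerance` of that row's first box.
/// Rows come back top to bottom, and cells in each row left to right.
func groupIntoRows(_ boxes: [TextBox], rowTolerance: CGFloat = 0.01) -> [[String]] {
    // Higher midY means higher on the page in Vision coordinates.
    let sorted = boxes.sorted { $0.box.midY > $1.box.midY }
    var rows: [[TextBox]] = []
    for item in sorted {
        if let anchor = rows.last?.first,
           abs(anchor.box.midY - item.box.midY) <= rowTolerance {
            rows[rows.count - 1].append(item)
        } else {
            rows.append([item])
        }
    }
    // Order each row's cells by their left edge to get the columns.
    return rows.map { row in row.sorted { $0.box.minX < $1.box.minX }.map(\.text) }
}
```

If the heuristic turns out too fragile for real documents, that is the point where a model trained in Create ML could take over.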

Step 4: Information Storage

After the analysis, the app will extract this data and store it in a structured format: each identifiable value from the rows and columns is organized into a Dictionary or an object. This structure serves immediate use and also makes later data operations within the app easier.
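For Step 4, a minimal sketch, assuming the first row of the extracted table is a header row: each data row becomes a [String: String] dictionary keyed by column name, and those dictionaries can then be mapped into a typed Codable struct. The LineItem type and its Item/Price columns are hypothetical examples, not part of the original plan.

```swift
import Foundation

/// Turn a table (first row = header) into one dictionary per data row,
/// keyed by column header. Missing trailing cells become empty strings.
func rowsToDictionaries(_ table: [[String]]) -> [[String: String]] {
    guard let header = table.first else { return [] }
    return table.dropFirst().map { row -> [String: String] in
        var record: [String: String] = [:]
        for (index, column) in header.enumerated() {
            record[column] = index < row.count ? row[index] : ""
        }
        return record
    }
}

/// A typed "object" for a hypothetical two-column table (Item, Price).
/// Conforming to Codable lets it be saved later, e.g. with JSONEncoder.
struct LineItem: Codable, Equatable {
    let item: String
    let price: Decimal?  // nil when the cell is not a valid number
}

/// Map dictionary records onto the typed struct.
func lineItems(from records: [[String: String]]) -> [LineItem] {
    records.map { record in
        LineItem(item: record["Item"] ?? "",
                 price: record["Price"].flatMap { Decimal(string: $0) })
    }
}
```

Keeping the untyped dictionaries as an intermediate step means tables with unknown columns can still be stored, while known table layouts get a proper type.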

Conclusion:

Through these steps, the app goes from simply capturing an image to recognizing, reading, and storing the table data in a physical document, with machine learning doing the recognition work. The aim is to make handling this data more efficient and accurate.


Realistically, I have not found any good examples out there, so I am attempting to create my own ML model (with no experience 😅). Any guidance or help would be very much appreciated.

Posted by deadduck83

Replies

This sounds like a really interesting problem! A resource that you may find very useful in this exploration is this talk given at WWDC in 2021 on extracting document data using Vision. This may help you solve the first 3 steps that you propose.
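As a rough illustration of the Vision approach, here is a sketch (not taken from the talk) that runs Vision's built-in text recognition, VNRecognizeTextRequest, on an image of a table and keeps each line's best candidate string, its confidence and its normalized bounding box. The RecognizedLine type and the 0.5 confidence threshold are assumptions. The Vision part only compiles on Apple platforms, hence the #if canImport(Vision) guard.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// A recognized line of text with its normalized bounding box (hypothetical type).
struct RecognizedLine: Equatable {
    let text: String
    let confidence: Float
    let box: CGRect
}

/// Drop low-confidence results; the threshold is an arbitrary assumption.
func confidentLines(_ lines: [RecognizedLine], minConfidence: Float = 0.5) -> [RecognizedLine] {
    lines.filter { $0.confidence >= minConfidence }
}

#if canImport(Vision)
/// Recognize text in an image (ideally already cropped to the table) with Vision's OCR.
func recognizeLines(in image: CGImage) throws -> [RecognizedLine] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    // Table cells are often numbers or codes, where language correction can hurt.
    request.usesLanguageCorrection = false
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return (request.results ?? []).compactMap { observation in
        // Keep only the single best candidate string for each text region.
        guard let best = observation.topCandidates(1).first else { return nil }
        return RecognizedLine(text: best.string,
                              confidence: best.confidence,
                              box: observation.boundingBox)
    }
}
#endif
```

The bounding boxes are what make it possible to rebuild the rows and columns afterwards, since Vision returns recognized text regions rather than table cells.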