dealing with CSV files #19

Open
Nichtraucher opened this issue Aug 16, 2023 · 8 comments

@Nichtraucher

Nichtraucher commented Aug 16, 2023

Hello there,

I'd like to use Local NLP Backend, but it requires imported data as a CSV-file instead of a database-file. I can download the opencellid dataset as a CSV-file directly from their website, but I can't deselect the useless 3G data, and I can't edit the file manually because the spreadsheet editor complains that the file is too large. :-/

I noticed one can sign up for a free geolocation API at unwired. Could FastLacellsGenerator be amended to access this data? I have no idea how they provide their datasets...

cheers

@IzzySoft
Contributor

You can define via the config which data to accept:

RADIO="GSM|UMTS|LTE"

Would excluding UMTS help in your case? Then the resulting database would just have GSM and LTE cells.

Not sure which API exactly you mean, but according to this function, some unwired server is already used to obtain the OCI data. And via the config, you can filter that pretty well.

@Nichtraucher
Author

> Would excluding UMTS help in your case?
> You can define via the config which data to accept:

I know how to do that, but that's not the point. The above-mentioned backend requires the data as a CSV-file and not as a database-file. lacells-creator does create a CSV-file, but it doesn't support choosing the network type and the same goes for downloading the data set directly from the opencellid website. Deleting unnecessary network type fields in a CSV file manually isn't possible because spreadsheet editors can't handle spreadsheets of this size. Can FastLacellsGenerator be amended to let users choose between creating a CSV-file and a database-file?

> Not sure which API exactly you mean
> some unwired server is already used to obtain the OCI data

Unwired Labs offers two geolocation products/APIs with different datasets. The OpenCellid API gives access to data that is somehow community sourced, whereas the UnwiredLabs API provides access to a proprietary dataset. As far as I understand, the limitations for end-users are the same as those for the OpenCellid API (I'm not sure, though; their website lacks some information). Both datasets are probably loaded from the same server.

@IzzySoft
Contributor

> Can FastLacellsGenerator be amended to let users choose between creating a CSV-file and a database-file?

Should be possible. Someone would need to make the effort, though. Basically, the filtered *.csv files are stored at least temporarily. So all that would be needed is another if-then-else to either import them into SQLite and delete them afterwards, or simply keep them (moving them from their temp location to a final one); see the end of the flg script file.
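
The if-then-else described above could look roughly like the following. This is only a sketch under assumptions: `KEEP_CSV`, `CSV_OUT_DIR`, `finish_csv` and the table name `cells` are hypothetical names chosen for illustration, not settings or functions of the actual flg script.

```shell
#!/bin/sh
# Sketch only: KEEP_CSV, CSV_OUT_DIR and finish_csv are hypothetical names,
# not options or functions of the real flg script.
KEEP_CSV=true
CSV_OUT_DIR="./csv-out"

finish_csv() {
    csv_file="$1"   # filtered CSV in its temporary location
    db_file="$2"    # target SQLite database (used only when importing)
    if [ "$KEEP_CSV" = "true" ]; then
        # Keep the CSV: move it from the temp location to a final one.
        mkdir -p "$CSV_OUT_DIR"
        mv "$csv_file" "$CSV_OUT_DIR/"
    else
        # Import into SQLite, then delete the temporary CSV.
        # Assumes a table named "cells" already exists in the database.
        sqlite3 "$db_file" ".mode csv" ".import $csv_file cells" \
            && rm -f "$csv_file"
    fi
}

# Demo with a throwaway file containing one made-up row.
tmp_dir="$(mktemp -d)"
printf 'GSM,262,1,100,200\n' > "$tmp_dir/cells.csv"
CSV_OUT_DIR="$tmp_dir/final"
finish_csv "$tmp_dir/cells.csv" "$tmp_dir/lacells.db"
```

With `KEEP_CSV=true` the database is never touched, which matches the case where only the CSV is wanted.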

> Unwired labs offers two geolocation products/APIs with different datasets.

I unfortunately don't know anything about that second set, sorry.

@sobrus
Owner

sobrus commented Aug 17, 2023

If you only need to filter a CSV file, you can easily do it using wget, cat and grep (just like flg does in its data download step).

grep -E "^(UMTS)," input.csv > output_file.csv

(haven't tested it but it should be something like this)
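
To drop the 3G rows rather than keep them, the same idea works with the pattern listing the radio types to retain. A minimal sketch, assuming the radio type is the first field of each row; `filter_radio` is a hypothetical helper and the sample rows are made up:

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose first field (the radio type)
# matches the given alternation, e.g. "GSM|LTE".
filter_radio() {
    radio_keep="$1"
    input="$2"
    output="$3"
    grep -E "^(${radio_keep})," "$input" > "$output"
}

# Made-up sample rows, for demonstration only.
work="$(mktemp -d)"
cat > "$work/input.csv" <<'EOF'
GSM,262,1,100,200
UMTS,262,1,101,201
LTE,262,2,102,202
EOF

# Keep GSM and LTE, which drops the UMTS (3G) rows.
filter_radio "GSM|LTE" "$work/input.csv" "$work/output_file.csv"
cat "$work/output_file.csv"
```

Because grep streams the file line by line, this should also cope with files that are too large for a spreadsheet editor.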

Or, maybe even better, just export the output SQLite database to a CSV file:

https://www.sqlitetutorial.net/sqlite-export-csv/

Here you can also select the fields and the field order expected by Local NLP Backend.
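
A minimal sketch of such an export with the `sqlite3` command-line tool follows. The table name `cells` and its columns are assumptions made for this demo and may not match the schema flg creates; check the real schema with `.schema` first.

```shell
#!/bin/sh
# Sketch: the table name "cells" and its columns are assumptions for this
# demo, not necessarily the schema that flg creates.
db="$(mktemp -d)/demo.db"
out="${db%.db}.csv"

if command -v sqlite3 >/dev/null 2>&1; then
    # Build a tiny demo database with one made-up row.
    sqlite3 "$db" "CREATE TABLE cells (mcc INTEGER, mnc INTEGER, radio TEXT);
                   INSERT INTO cells VALUES (262, 1, 'GSM');"
    # -header adds column titles and -csv selects CSV output; the SELECT
    # list decides which fields are exported and in which order.
    # tr strips carriage returns in case the CLI writes CRLF line endings.
    sqlite3 -header -csv "$db" "SELECT radio, mcc, mnc FROM cells;" \
        | tr -d '\r' > "$out"
    cat "$out"
else
    echo "sqlite3 CLI not found; skipping demo" >&2
fi
```

Reordering the columns in the `SELECT` list is enough to change the field order in the exported file.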

@Nichtraucher
Author

Well, I didn't read the annotations in the config file properly and noticed just now that the script can be set to keep the csv file in the tmp-folder. Sorry about that! :-/

However, importing these files into the app doesn't work because they're missing the required column titles for the parameters (radio, mcc, mnc, etc.). I've opened an issue about it.

I noticed that the script does insert them into the database-file, correct? Perhaps it can also put them into the csv-file?

Nichtraucher changed the title from "Unwired Labs API / dealing with CSV files" to "dealing with CSV files" on Aug 20, 2023

@IzzySoft
Contributor

> I noticed that the script does insert them into the database-file, correct?

In the database, the columns exist thanks to the CREATE TABLE, so you just have to set up the INSERT statement appropriately. As for the CSV files, you could simply add the proper line at the top. The CSV files are not intended to be kept; they are just a temporary means of feeding the database. So the header is intentionally not added here, or it would break the INSERT, and once the INSERT is done, the CSV is removed anyway.

But yes, that could probably be changed. Going by the referenced issue:

echo "radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal" > "${OCI_FILE}.new"
cat "${OCI_FILE}" >> "${OCI_FILE}.new"

and you'd have a valid CSV for your purpose. Put that before the rm at the end of the script and move the resulting file where you want it to be (or replace ${OCI_FILE}.new accordingly). Though if I understand you correctly, you then wouldn't need the database file at all – which means the "proper implementation" would be to make that optional, too.
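
If the step might run more than once on the same file, a guarded variant avoids adding the header twice. A minimal sketch: `add_header` is a hypothetical helper, not part of flg, and the data row is made up. As noted above, it would have to run after the database import, since the header line would otherwise end up in the INSERT.

```shell
#!/bin/sh
HEADER="radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal"

# add_header is a hypothetical helper, not part of flg.
add_header() {
    file="$1"
    # Do nothing if the first line is already the header.
    if [ "$(head -n 1 "$file")" = "$HEADER" ]; then
        return 0
    fi
    # Write header plus original content to a new file, then replace the
    # original only if that succeeded.
    { printf '%s\n' "$HEADER"; cat "$file"; } > "$file.new" \
        && mv "$file.new" "$file"
}

# Demo with one made-up data row.
demo="$(mktemp)"
printf 'GSM,262,1,100,200,0,13.4,52.5,1000,5,1,0,0,0\n' > "$demo"
add_header "$demo"
add_header "$demo"   # second call is a no-op
```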

@Nichtraucher
Author

> Though if I understand you correctly, you then wouldn't need the database file at all – which means the "proper implementation" would be to make that optional, too.

Correct.

In the meantime, the backend developer amended the backend to accept CSV-files without the headers.
I'm not sure if this issue needs to be pursued?

@IzzySoft
Contributor

IzzySoft commented Sep 1, 2023

I don't know – I'm just a minor contributor here. If there is need/demand for it, why not implement it? It wouldn't be a huge task (I just don't have the time for it now). @sobrus needs to say if it's accepted.
