Talk:Toolhub/Data model/Feedback


Taxonomy v2 community feedback

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

Please provide your feedback using the following question prompts. Use the last question to tell us anything else you think we might need to know, or to provide extra responses to questions about specific categories.
When providing feedback, please consider letting us know if you belong to any of these audiences:

  • Admins
  • Organizers and program coordinators
  • Editors and content contributors
  • Readers and content consumers
  • Researchers
  • Developers
  • Other

Do these categories (attributes) make sense?

One of the main problems I found when using Toolhub for the first time is that I needed to know the name of the tool to find it easily. So virtually any taxonomy will make the experience better. I have looked at the proposal and it looks quite complete. I can't find something to add, but I'm sure that future use could make us find some gaps. Thanks for all this work! -Theklan (talk) 08:37, 3 August 2022 (UTC)

Thanks for your comment! I do hope that in the future we can do some user testing with the taxonomy and other elements in the Toolhub UI, to further refine what is there. TBurmeister (WMF) (talk) 15:28, 3 August 2022 (UTC)
I am confident it is difficult to use the tool without knowing in advance the name of the proper tools to employ. Luisloureiro (talk) 18:07, 11 August 2022 (UTC)

It would be good to have higher-kinded categories, I think. For instance, I want to see only data tools that work with entity data only in Wikidata (not Discussions, Images, Files, etc.). Perhaps Structured data, Linguistic data, Maps and geographic data, Books and bibliographic data, and maybe/maybe not Event data ... could all be bucketed into a more general "structured data" or simply "entity data"? For non-entity data, perhaps that bucket could be called "non-entity data"? --Thadguidry (talk) 15:01, 12 August 2022 (UTC)

Thanks! Other comments also suggested adding some more hierarchy to the list of content types, so that will be one of the changes I add to the proposed taxonomy as a result of this feedback. TBurmeister (WMF) (talk) 21:29, 1 September 2022 (UTC)
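
As a rough illustration of the bucketing suggested in this thread (not the taxonomy that was adopted), the grouping could be modeled as a two-level mapping. The bucket names and content types come from the comment above; the tool records and the `tools_in_bucket` helper are hypothetical and exist only for this sketch.

```python
# Two-level content-type hierarchy, following the grouping suggested above.
# Illustrative sketch only; this is not the taxonomy Toolhub adopted.
CONTENT_TYPE_BUCKETS = {
    "Entity data": [
        "Structured data",
        "Linguistic data",
        "Maps and geographic data",
        "Books and bibliographic data",
        "Event data",  # the comment marks this one as "maybe/maybe not"
    ],
    "Non-entity data": ["Discussions", "Images", "Files"],
}

# Hypothetical tool records; names and content types are made up.
TOOLS = [
    {"name": "example-lexeme-tool", "content_types": ["Linguistic data"]},
    {"name": "example-image-tool", "content_types": ["Images", "Files"]},
    {"name": "example-map-tool", "content_types": ["Maps and geographic data"]},
]


def tools_in_bucket(tools, bucket):
    """Return names of tools tagged with any content type in the bucket."""
    wanted = set(CONTENT_TYPE_BUCKETS[bucket])
    return [t["name"] for t in tools if wanted & set(t["content_types"])]


print(tools_in_bucket(TOOLS, "Entity data"))
```

Filtering on a top-level bucket this way would let someone browse all entity-data tools at once while the finer-grained values stay available for narrower searches.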

Do you feel like you understand what the labels (values) in each category mean? (are the values sensible?)

As someone who predominantly edits Wikidata, not Wikipedia, the list of content types would not be useful for me. I don't know where most of the things I've made or used would go or be. The list of tasks is slightly better but it still seems like it's designed around Wikipedia's needs. I'm not sure what categories have to do with labels, nor why archiving would go with cleanup. Drafts seems extremely English Wikipedia-specific. Images, audio/sound, video and books overlap with files - it's not clear whether selecting only "files" would include all tools for working on Commons files or if I would need to include the other four categories as well. What's the difference between audio and sound anyway? The list of platforms is really confusing. An on-wiki gadget, for example, could come under desktop, mobile, MediaWiki and web/browser. I would expect to be able to distinguish mobile apps from web tools which work on mobile, web tools which work on mobile from web tools which only work on desktop, web tools which work on desktop from browser extensions, and tools on external websites from on-wiki tools. What about command-line tools that can be used in PAWS? Do they count as web/browser tools too? - Nikki (talk) 17:50, 4 August 2022 (UTC)

Thanks for your comment. It's useful to hear that the list of platforms is confusing -- perhaps it would be best if we just exclude that from the final set of attributes. The primary content type for Wikidata would be "Structured data", but perhaps that doesn't cover everything. What content types would be useful to reflect the tools you've made or used for Wikidata work?
The sources we reviewed when creating this were lists that included Wikidata-specific tools, but it sounds like we should revisit the content types and tasks to make sure Wikidata use cases are covered. Thanks again for the feedback on this! TBurmeister (WMF) (talk) 14:20, 9 August 2022 (UTC)

Are there values that seem useless / you can't imagine why it would matter or be useful for cataloging or discovering tools?

Are there values that are missing?

I seem to be missing a field where I can type in the web app link. I was setting up information about https://ordia.toolforge.org/ but I cannot identify which field to use for the URL; the URL is not for the API. -- Finn Årup Nielsen (fnielsen) (talk) 20:21, 12 August 2022 (UTC)

@Fnielsen sorry that I let this sit without any response for so long.
The URL for a tool is currently part of the "core" toolinfo data. This means that it can be submitted as part of a toolinfo.json file or edited through the UI and API by the "owner" of the toolinfo record. It cannot, however, currently be edited by the community at large.
If the form you see when clicking the "edit tool" button starts with the "API URL" field then you are seeing the form for editing the "annotations" layer of a toolinfo record. We apparently do not have documentation here on metawiki yet about the specific fields that are part of the annotations layer, but they can be seen in the API documentation at https://toolhub.wikimedia.org/api-docs#put-/api/tools/-name-/annotations/.
Looking at https://toolhub.wikimedia.org/tools/toolforge-ordia/history I can see that the initial import of this record came from the toolinfo record at https://toolsadmin.wikimedia.org/tools/id/ordia. You can edit the toolinfo's core URL using the form at https://toolsadmin.wikimedia.org/tools/id/ordia/info/id/538/edit because you are a maintainer of the tool. The edit that is most likely needed is to check the "This is a webservice" checkbox near the bottom of the form. This will automatically set the toolinfo record's URL to https://ordia.toolforge.org/. -- BDavis (WMF) (talk) 21:27, 29 September 2022 (UTC)
I attempted to summarize this in the docs at https://meta.wikimedia.org/wiki/Toolhub#Editing_tool_URLs TBurmeister (WMF) (talk) 14:46, 4 October 2022 (UTC)
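
The distinction between the two layers described in this thread can be sketched roughly as follows. This is a minimal illustration, not Toolhub's implementation: the endpoint path is read off the API documentation link above, while the field sets, the `api_url` field name, and both helper functions are assumptions made for the example.

```python
# Sketch of the "core" vs. "annotations" split described above.
# Field names are assumptions for illustration, not Toolhub's schema.
CORE_FIELDS = {"name", "title", "description", "url"}  # owner-editable only
ANNOTATION_FIELDS = {"api_url"}  # community-editable (assumed field name)


def can_edit(field: str, is_owner: bool) -> bool:
    """Return True if a user may edit the given field of a toolinfo record."""
    if field in CORE_FIELDS:
        return is_owner
    if field in ANNOTATION_FIELDS:
        return True
    raise KeyError(f"unknown field: {field}")


def annotations_endpoint(tool_name: str) -> str:
    """Build the annotations URL, following the path shown in the API docs link."""
    return f"https://toolhub.wikimedia.org/api/tools/{tool_name}/annotations/"


print(can_edit("url", is_owner=False))
print(annotations_endpoint("toolforge-ordia"))
```

Under these assumptions, a community member who is not a maintainer could update annotation fields but would be refused on `url`, which matches the behavior described for the Ordia record.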

Anything else?

Really like this. Breaking down by category with colons. Audience: One thought I have here is that there's a ton of subgroups subsumed under "readers and content consumers". One category that seems particularly relevant here is students, who receive specific structural support in a variety of ways across various wikiprojects. Content types: looks like a thorough list; I wonder if this could be made a bit hierarchical in the future so that there's only 2-5 top-level data categories? Tasks: love this list. "Patrolling" is probably too broad, as you say. "Communication and supporting users" seems broad as well; that could include tasks related to education, to building community, etc. Programming languages: my personal thought here is that what's most useful about these attributes as a tool developer is seeing what other people are doing. Basically, I don't want to find myself "accidentally" doing something no one has ever done before, and so it's most useful for looking at solutions to "solved problems" and ensuring I'm adopting a tech stack that others are using. Subject domains: seems like a hard list to pin down. Tool "purpose" (and corresponding values) seems pretty different to me than e.g. article topic. To me this seems related to the intended audience and task; I wonder what "subject domain" hopes to specifically bring to the table? Overall, I like this taxonomy and echo the thought above that this would be great. Suriname0 (talk) 12:06, 3 August 2022 (UTC)

Thanks for your comment! This is all very useful feedback. I'm curious where you saw "tool purpose"? That isn't part of the proposed taxonomy, though it was in some previous lists. I concluded during my research that that concept is too fuzzy. (See "Concrete concepts and definitions" in the Design Principles)
Subject domain was something that appeared in enough other lists that it seemed worth proposing for inclusion, but I agree with you that it is hard to pin down. I think the main benefit of that category is that there may be tools that operate upon different types of content (article text, images, wikidata) but are targeted at use cases specific to domains like art, history, biology. For example, there are a bunch of tools if you search "monuments" in Toolhub. A category just for "monuments" would be too specific, but maybe it would be nice to be able to browse these tools as being related by subject domain? TBurmeister (WMF) (talk) 15:56, 3 August 2022 (UTC)
Ah, thanks for the clarifications. I think I [incorrectly] inferred "purpose" from "Subject Domains", but your definition and intent make sense. Suriname0 (talk) 21:46, 3 August 2022 (UTC)

I wanted to echo a lot of the above. In general, very much like what I'm seeing and appreciate the detail in describing the process! A few additional thoughts:

  • The "how have other people coded X" was how I initially used Toolforge's tool directory / search -- i.e. trying to find templates to build on. The programming languages are helpful for that. Perhaps a few tweaks:
    • I also agree with others that JSON is confusing as an entry there and is probably used too widely to be a useful tag.
    • Will there be a code search functionality too? For instance, I develop in Python and when I was starting on Toolforge, I really wanted to see examples of Python tools that specifically used the Flask library (a common library for developing APIs). Listing out individual libraries in the taxonomy feels like too much, but I wonder what additional functionality could support this use case?
    • Should databases be broken out as their own section? This is a common challenge of tools (which databases to use and how to make them work). For the taxonomy, SQL could be left in programming languages but then a second Database part where you'd have MySQL, PostgreSQL, etc.? There are few enough databases that it's probably a reasonable list.
  • I'm a bit confused around the Platforms -- what would be the difference between desktop, mobile/smartphone, and web/browser? In my limited understanding, if a tool has a web interface, it nominally works for all three. Maybe there are tools that were designed specifically for mobile phones but I assume for most, web/browser covers it.
  • Will all of these facets also have an "Other" option if you feel your tool fits outside of the taxonomy? Even if it's not the most useful for search, as someone who might be applying the taxonomy to their tools, it's always nice when you can find at least one checkbox that applies.

Thanks again! --Isaac (WMF) (talk) 16:30, 19 August 2022 (UTC)

Thanks for your comment! The Toolhub team and others are in agreement that "programming language" as proposed is too narrow of a concept, with too many different potential use cases and interpretations. We decided to instead keep the values relating to frameworks, programming languages, etc. covered by the existing "technology_used" field that is already present (as an uncontrolled field) in the data model. We made the same decision for "Platform", since there is significant overlap there for many tools, though some (like the Commons App) are mobile-only. We want to encourage people to curate their own lists of tools in Toolhub around topics they care about, and also add annotations (tags) that can supplement the taxonomy filtering functionality and also serve as input for future iterations on these categories and their values. So, for example, if a ton of people are tagging tools with what type of database it uses, that would be a signal we'd look for to consider adding that to future taxonomy versions. TBurmeister (WMF) (talk) 21:27, 1 September 2022 (UTC)
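
To make the "signal" idea above concrete, here is a small sketch of how tag frequency and the uncontrolled "technology_used" field might be queried. Only the `technology_used` field name comes from the discussion; the `tags` key, the sample records, and both helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical records. "technology_used" is the uncontrolled field named
# above; the "tags" key and all values are made up for this example.
TOOLS = [
    {"name": "tool-a", "technology_used": ["Python", "Flask"], "tags": ["mysql"]},
    {"name": "tool-b", "technology_used": ["Node.js"], "tags": ["MySQL", "maps"]},
    {"name": "tool-c", "technology_used": ["Python"], "tags": ["postgresql"]},
]


def frequent_tags(tools, min_count):
    """Return tags (lowercased) applied to at least min_count tools."""
    counts = Counter(tag.lower() for tool in tools for tag in set(tool["tags"]))
    return [tag for tag, n in counts.most_common() if n >= min_count]


def tools_using(tools, technology):
    """Case-insensitive match against the free-text technology_used values."""
    wanted = technology.lower()
    return [
        t["name"]
        for t in tools
        if wanted in (value.lower() for value in t["technology_used"])
    ]


print(frequent_tags(TOOLS, min_count=2))
print(tools_using(TOOLS, "flask"))
```

Because both fields are uncontrolled, the sketch normalizes case before comparing; a real analysis would likely also need to merge spelling variants before treating a tag as a candidate for a future taxonomy version.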

Additional feedback page for the proposed taxonomy

There are multiple Talk pages for Toolhub; we will monitor them all for feedback about the data model and taxonomy. Some people have left taxonomy feedback on the main Toolhub Talk page: https://meta.wikimedia.org/wiki/Talk:Toolhub. You may want to see what others have said there even if you left your comment here. No need to double-post, though - we'll gather the comments from everywhere! TBurmeister (WMF) (talk) 15:46, 3 August 2022 (UTC)

Where does JSON need to be listed?

I would suggest reshuffling some elements of the list, or maybe more: remove JSON from the proposed initial list of values under "What programming languages does the tool use?" and add it to the initial list of values under "With what type of content or data does the tool interact?". I see JSON more as a data formatting tool than a programming language. Maybe a taxonomy item for data format is required, with JSON, Pickle, etc. (to name just a few) as its list of values. DessalegnM (talk) 15:07, 11 August 2022 (UTC)

I do think JSON is a useful list of values Luisloureiro (talk) 18:16, 11 August 2022 (UTC)
Thanks for your comment! The Toolhub team and others are in agreement that "programming language" as proposed is too narrow of a concept, with too many different potential use cases and interpretations. We decided to instead keep the values relating to frameworks, programming languages, etc. covered by the existing "technology_used" field that is already present (as an uncontrolled field) in the data model. TBurmeister (WMF) (talk) 21:24, 1 September 2022 (UTC)

Programming languages

The proposed list of "programming languages" includes a number of items that are not programming languages (e.g., JSON [file/data format], MySQL [RDBMS], Node.js [runtime environment]). If you intend to keep that list of values, then the attribute needs to be renamed to a broader term; otherwise, remove all of the ones that are not programming languages. As an aside, MySQL should probably be Q850 for the RDBMS instead of the Python package.
Frameworks might be better suited to a separate uncontrolled attribute instead of being included with programming languages, since tool authors could use any number of frameworks, which would vary based on programming language.
-- JJMC89(T·C) 20:20, 13 August 2022 (UTC)
Thanks for your comment! The Toolhub team and others are in agreement that "programming language" as proposed is too narrow of a concept, with too many different potential use cases and interpretations. We decided to instead keep the values relating to frameworks, programming languages, etc. covered by the existing "technology_used" field that is already present (as an uncontrolled field) in the data model. TBurmeister (WMF) (talk) 21:24, 1 September 2022 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.